Mia Ridge looks at what is next for ‘open cultural data’ – data from cultural institutions that is made available for use in a machine-readable format under an open licence.
To define ‘open cultural data’, it is best to look at each term in turn. While the degree of openness required to be ‘open’ data can be contentious, at its simplest, ‘open’ refers to content that is available for use outside the institution that created it, whether for school homework projects, academic monographs or mobile phone apps. ‘Open’ may further refer to licences that clarify the permissions and restrictions placed on data, or to the use of non-proprietary digital technologies, or ideally, to a combination of both open licences and technologies. Cultural data is data about objects, publications (such as books, pamphlets, posters or musical scores), archival material, etc, created and distributed by museums, libraries, archives and other organisations. Data can refer to different types of content, from metadata or tombstone records (the basic titles, names, dates, places, materials, etc of a catalogue record), to entire collection records (including data such as researched and interpretive descriptions of objects, bibliographic data, related themes and narratives) to full digital surrogates of an object, document or book as images or transcribed text. To put that all together, ‘open cultural data’ is data from cultural institutions that is made available for use in a machine-readable format under an open licence.
Open cultural data is related to Linked Data (or Linked Open Data) but while Linked Data requires specific technical protocols to support connections in the ‘web of data’, open cultural data could be as simple as publishing a downloadable text file. Discussion of linked data and its precursor, the Semantic Web, often assumed that data would be openly available.
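To illustrate how low the technical barrier can be, the following minimal Python sketch reads a small downloadable text file of tombstone records. The file contents and column names (id, title, maker, date, materials) are hypothetical and chosen only for illustration; a real dataset would define its own fields.

```python
import csv
import io

# Hypothetical sample of a downloadable text file; the column names are
# illustrative and not drawn from any particular institution's export.
SAMPLE_CSV = """id,title,maker,date,materials
1,Tea bowl,Unknown,c. 1750,porcelain
2,Railway poster,Jane Doe,1932,ink on paper
"""


def load_records(text):
    """Parse CSV text into a list of dictionaries, one per catalogue record."""
    return list(csv.DictReader(io.StringIO(text)))


records = load_records(SAMPLE_CSV)
for record in records:
    # Print a one-line summary built from the tombstone fields.
    print(f"{record['title']} ({record['date']}), {record['materials']}")
```

Nothing here depends on Linked Data protocols: any programmer with a standard library can start working with a file like this, which is part of the appeal of simple downloadable datasets.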
Key moments in open cultural data
Journalism and politics were key drivers in the movement for open data in the early to mid-2000s, with TheyWorkForYou.com launched in 2004, and the Guardian calling for the government to ‘free our data’ in 2006.(1) In 2009, Tim Berners-Lee led the audience of a TED talk in a chant of ‘raw data now’, an apparently pivotal moment in the public awareness of open data,(2) and in the same year the US and UK governments launched their open data sites, data.gov and data.gov.uk.
Museum technologists followed these developments closely, and they helped inspire some of the key moments in the history of open cultural data highlighted here. By 2006 the cultural heritage sector in the UK was discussing how it might make its data available on the Semantic Web,(3) but progress on pure Linked Open Data has been slow, in part because of the complexity of the licensing, technology and vocabulary issues involved. The Flickr Commons project was launched in January 2008 with a pilot project of 1500 images from the Library of Congress licensed with “no known copyright restrictions”.(4) The aim was to increase access to public collections and provide a way for the public to share their knowledge. In April 2008, Sydney’s Powerhouse Museum became the first museum to join The Commons, and at the time of writing, there are 56 participating institutions from across the world. In March 2009, Brooklyn Museum became the first museum in the world to release an API, and within a few days three projects had been built on it.
In April 2009, the Powerhouse Museum released its collection metadata under a Creative Commons CC-BY-SA licence (meaning content could be used for any purpose, including commercial use, as long as it was attributed and shared under the same licence) with additional, museum-specific notes released for non-commercial use.(5) In May 2011 Yale released 250,000 high-quality images in the public domain to ‘more fully harness the potential of digital and networked technologies in service to scholarship as well as to creative use and reuse of our rich cultural heritage’.(6) In 2011 the British Museum launched a linked data service and a more open licence for access to its collections, and Europeana launched an API providing access to over 20 million objects from 1500 museums, libraries, archives and audio-visual collections aggregated from across Europe.(7, 8) In 2012, Europeana released its dataset for free re-use (including commercial use) without any restrictions under the Creative Commons CC0 public domain dedication.(9)
If you build it, they may not come
The Museum API wiki currently lists over 50 museum, gallery, library and archive APIs and machine-readable sources for open cultural data, including national and subject-specific aggregators. While a sample of the websites, mobile apps and games made with open cultural data demonstrates huge variety, not every dataset gets as much use as its organisation might expect.(9,10) Possible reasons for the under-use of open cultural data include confusing or incompatible licences, poor or inconsistent record quality within datasets, a lack of images or interesting descriptions, and undocumented or ambiguous vocabularies.
Just as user experience design is increasingly important for digital audiences, it is important to design usable data services for programmers. Developers are the link between a data service and the new audiences for their applications, so the choices made regarding technologies, licensing and data structures must be carefully considered and documented. Helping developers find cultural datasets by including them in wider open government directories also positions museums within the wider open data movement.
Museums often limit commercial use by licensing content for ‘non-commercial use only’, requiring attribution, or retaining copyright while allowing some uses of their content. However, there is no clear definition of ‘non-commercial use’ in Creative Commons licences, so some developers may choose not to risk using a dataset with an unclear licence.(12) Custom licences also make it difficult to integrate content with other openly-licensed datasets.
Towards more usable open cultural data
There is often a tension between the need for easy-to-use datasets using common vocabularies for simple ‘mash-up’ style applications and the need for more sophisticated data structures and specialised vocabularies to support internal uses, partnerships between museums, libraries and archives, or for use in research-led projects.(14) However, just as museums benefit from the network effect of using existing media platforms, they may also benefit from leveraging existing open data platforms such as Europeana (or the local equivalent). Other methods to meet common external needs and enhance the discoverability of cultural data are simple mark-up in web pages (such as schema.org) or downloadable datasets.
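As an illustration of simple mark-up, the Python sketch below maps a basic catalogue record to a schema.org description that could be embedded in a collection web page as JSON-LD. The input field names (title, maker, date, licence_url) and the sample record are hypothetical; the schema.org type and properties used (CreativeWork, name, creator, dateCreated, license) are one plausible choice rather than a recommended mapping.

```python
import json


def to_schema_org(record):
    """Map a simple catalogue record to a schema.org CreativeWork (JSON-LD)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": record["title"],
    }
    # Optional fields are included only when the record supplies them.
    if record.get("maker"):
        doc["creator"] = {"@type": "Person", "name": record["maker"]}
    if record.get("date"):
        doc["dateCreated"] = record["date"]
    if record.get("licence_url"):
        doc["license"] = record["licence_url"]
    return doc


def to_script_tag(doc):
    """Wrap the JSON-LD document in a script element for embedding in a page."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical record, used only to show the shape of the output.
sample = {
    "title": "Railway poster",
    "maker": "Jane Doe",
    "date": "1932",
    "licence_url": "https://creativecommons.org/licenses/by-sa/4.0/",
}
print(to_script_tag(to_schema_org(sample)))
```

Stating the licence in the mark-up itself means the permissions travel with each record, which goes some way towards addressing the licensing confusion described earlier.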
Internal use cases provide compelling reasons for implementing more complex, expressive models as Linked Data, with access to more structured data for re-use by the public a useful side effect.(15) Organisations like the BBC and Cooper-Hewitt Design Museum have been using open data sources like Wikipedia and MusicBrainz alongside other open cultural data sources like the Indianapolis Museum of Art and The Open Library to supplement the content on their own sites or simply make better use of their own content archives.(16) Some museums have re-used their open cultural datasets to reduce the amount of overhead required to participate in partnerships like the Google Art Project or Pelagios.(17)
Emerging licensing models may contribute to greater use of open cultural data. For example, many museums are making lower-resolution images available for re-use while reserving high-resolution versions for commercial sales and licensing. Ultimately, the key to unlocking the potential of open cultural data lies in the networked nature of the web of data. The internal and external benefits of linked data lie in linking to other sources as well as providing linkable sources.(18) Each open cultural dataset added to the web of data contributes to the wider network of content and knowledge and creates new possibilities for innovative experiences of our shared cultural heritage.
Mia Ridge is Chair of the Museums Computer Group, former Lead Web Developer at the Science Museum, London, and currently researching a PhD in digital humanities (Department of History, Open University), focusing on historians and scholarly crowdsourcing.
Notes
3. For example, through the Arts and Humanities Research Council funded ‘Semantic Web Think Tank’ research project http://culturalsemanticweb.wordpress.com/about/ and presentations at UK Museums and the Web 2005
12. See for example the discussion at http://www.niemanlab.org/2011/11/wired-releases-images-via-creative-commons-but-reopens-a-debate-on-what-noncommercial-means/
13. For example, data.gov appointed a ‘data evangelist’ to lead a team in outreach
14. See Dominic Oldman’s discussion of simple schemas for ‘basic information jukeboxes’ versus the ‘rich semantic framework’ provided by the descriptions, associations and taxonomies in the British Museum’s data in http://www.oldman.me.uk/blog/the-british-museum-cidoc-crm-and-the-shaping-of-knowledge/
15. See http://5stardata.info/ for information on steps towards ‘5 star linked data’ and http://www.museumsandtheweb.com/mw2011/papers/reprogramming_the_museum for a museum perspective on developing data services for internal use
18. For example, ‘Improving Linked Data’ discusses these issues http://blog.library.si.edu/2013/01/improving-linked-data/