Abstract / Description of output
How can we use data science to understand academic library holdings at scale? Can we use library catalogues to understand the historical growth of collections, acquisition practices, subject-level specialisms, biases, or how a collection reflects the library’s stated acquisition strategy? We present the Edinburgh University Library Metadata Visualizations Project, which used MARC (Machine-Readable Cataloging) metadata, the international standard for dissemination and searching of bibliographic data (Schudel 2006, Library of Congress 2019), as a rich source to understand holdings.
Library catalogue data is an example of a Humanities dataset that is complex, challenging, heterogeneous, fragmentary, multilingual, and ambiguous (Lazer et al 2009, Kitchin 2014, Guiliano and Ridge 2016, Underwood 2018, Alex et al 2019). Most data processing of MARC focusses on improvement of the records, although previous work has used MARC to understand biases (Diao and Cao 2016, Lavoie 2018), and for library analytics (Harper 2016).
MARC data for the University Library’s 1,297,311 print books (the “physical collection”, chosen to avoid the complexities of syndication to electronic sources) was downloaded from OCLC. Data was translated to CSV and cleaned using Python and Pandas scripts. Visualisations were created from samples of the data using Python, Microsoft Excel and Adobe InDesign. Our code is available on GitHub.
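To illustrate the kind of cleaning step described above, the following is a minimal sketch and not the project's actual code. It assumes the MARC records have already been flattened to tabular form. The column names (`title`, `pub_date`, `language`), the sample rows, and the `clean` helper are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical rows as they might look after a MARC-to-CSV conversion.
# Column names and values are invented for this sketch.
raw = pd.DataFrame({
    "title": ["  A history of Edinburgh /", "Flora Scotica", "Poems.", "Untitled"],
    "pub_date": ["c1987.", "[1821?]", "1923", "n.d."],
    "language": ["eng", "lat", "ENG ", None],
})


def clean(df):
    """Return a cleaned copy with normalised text and a numeric year."""
    out = df.copy()
    # Trim whitespace and trailing cataloguing punctuation from titles.
    out["title"] = out["title"].str.strip().str.rstrip(" /.")
    # Normalise language codes to lower case without surrounding spaces.
    out["language"] = out["language"].str.strip().str.lower()
    # Pull the first four-digit run out of free-text dates such as "c1987.".
    year_text = out["pub_date"].str.extract(r"(\d{4})", expand=False)
    # Nullable integer dtype keeps undated records as missing values.
    out["year"] = pd.to_numeric(year_text, errors="coerce").astype("Int64")
    out["decade"] = (out["year"] // 10) * 10
    return out


cleaned = clean(raw)
# Count records per decade; undated records are dropped by value_counts.
decade_counts = cleaned["decade"].value_counts().sort_index()
print(decade_counts)
```

A per-decade count of this kind is one possible input to a visualisation of collection growth over time. Real catalogue dates are considerably messier than these four samples, so the extraction pattern would need extending for date ranges and partially known years.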
Original language | English |
---|---|
Publication status | Published - 5 Sept 2022 |
Keywords / Materials (for Non-textual outputs)
- library science
- data visualisation
- digital library