The dataset includes metadata descriptions extracted from the Centre for Research Collections' online archival catalog using OAI-PMH EAD harvesting. Metadata descriptions were extracted from four metadata fields: an identifier (<unitid>), Biographical / Historical (<bioghist>), Scope and Contents (<scopecontent>), and Processing Information (<processinfo>). The descriptions were extracted in October 2020. The dataset includes five files that will be annotated for instances of gender bias, in an effort to create a gold standard dataset on which an algorithm can be trained to identify and classify gender bias in text.
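As a rough illustration of the field extraction described above, the following minimal sketch pulls the four named elements out of an EAD XML string. It is not the project's actual extraction script: the function names, the sample record, and the EAD 2002 namespace in the sample are all assumptions made for illustration, and the harvesting step over OAI-PMH is left out entirely.

```python
import xml.etree.ElementTree as ET

# The four EAD elements named in the dataset description.
FIELDS = ("unitid", "bioghist", "scopecontent", "processinfo")


def local_name(tag):
    """Drop any '{namespace}' prefix so matching works with or without one."""
    return tag.rsplit("}", 1)[-1]


def extract_fields(ead_xml):
    """Return a dict mapping each field name to the list of its text values.

    Hypothetical helper: collects every matching element anywhere in the
    record, so nested elements of the same name would be counted twice.
    """
    root = ET.fromstring(ead_xml)
    found = {name: [] for name in FIELDS}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in found:
            # Join all descendant text and collapse runs of whitespace.
            text = " ".join(" ".join(elem.itertext()).split())
            if text:
                found[name].append(text)
    return found


# Invented sample record, not taken from the CRC catalog.
SAMPLE = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <did><unitid>Coll-0000</unitid></did>
    <bioghist><p>Invented biography text.</p></bioghist>
    <scopecontent><p>Invented   scope
    text.</p></scopecontent>
  </archdesc>
</ead>"""

result = extract_fields(SAMPLE)
for field in FIELDS:
    print(field, result[field])
```

Matching on the local element name rather than a fixed namespace is a deliberate simplification here, since the namespace used in a harvested record depends on the EAD version the catalog exports.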
## Acknowledgments ##
This dataset has been created for a PhD project conducted in collaboration with Beatrice Alex, Benjamin Bach, and Melissa Terras (PhD supervisors); and with Rachel Hosker and the Centre for Research Collections (CRC). This group of collaborators will be involved in future uses of the data as this PhD project continues; specifically, in determining how to annotate the data for gender bias. Thanks are due to Scott Renton for his guidance in using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which was necessary to extract selections of metadata in Encoded Archival Description (EAD) XML format from the CRC's online archival catalog, ArchivesSpace.
Havens, L; Alex, B; Bach, B; Terras, M; Renton, S; Hosker, R; Centre for Research Collections, The. (2020). Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020, [dataset]. University of Edinburgh. School of Informatics. https://doi.org/10.7488/ds/2953.
|Date made available|19 Nov 2020|
|Geographical coverage|UK, UNITED KINGDOM|