Edinburgh Research Explorer

Human, yeast and pig genomics: sequence submissions and first sequence descriptions in the literature (1980-2015)

Dataset

Publisher: Edinburgh DataShare
Temporal coverage: 1 Jan 1980 - 31 Dec 2015
Date made available: 5 Dec 2019

Abstract

This data collection is derived from two sources: 1) submissions of DNA sequences of S. cerevisiae (yeast), Sus scrofa (pig) and Homo sapiens (human) to the European Nucleotide Archive (ENA), and 2) the first descriptions of these sequences in the scientific literature. The records cover 1980-2000 (yeast), 1985-2005 (human) and 1990-2015 (pig). Each species has two associated datasets: 1) a .csv file documenting the PubMed ID (PMID) of each article describing new sequences, all paper authors, all institutional affiliations of each author, the country of each institution, the year of first submission to the ENA (when available) and the year of article publication; 2) a .csv file documenting all institutions submitting to the ENA, the number of nucleotides sequenced and the year of submission to the database. The yeast submission data are provided sequence by sequence, with full dates and information about both submitting individuals and institutions, whereas the pig and human submission datasets offer aggregate figures per institution and per year. Some submission data have not been fully cleaned. The collection contains approximately 28,000 publications and 13.5 million sequence submissions. The software code used to obtain the submission and publication records can be found at https://github.com/UofGMarkWong/TRANSGENE. A publication describing the data collection and cleaning protocol is available at https://f1000research.com/articles/8-1200. Further information about the project within which this collection was generated is available at www.stis.ed.ac.uk/transgene.
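As a rough illustration of how the per-species publication file described above might be read, the following Python sketch computes the gap between first ENA submission and article publication for each paper. The column names (`pmid`, `submission_year`, `publication_year` and so on), the one-row-per-author layout and the sample rows are all assumptions made for illustration; the headers and layout of the published .csv files may differ, and the sample values are placeholders, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Placeholder rows with HYPOTHETICAL column names; these are not real records
# and the actual headers in the published .csv files may differ.
SAMPLE = """pmid,author,institution,country,submission_year,publication_year
P1,Author A,Institute X,UK,1991,1992
P1,Author B,Institute Y,France,1991,1992
P2,Author C,Institute X,UK,,1995
P3,Author D,Institute Z,USA,1996,1996
"""


def publication_lags(csv_text):
    """Map each PMID to (publication year - first submission year).

    Rows without a submission year are skipped, since that field is
    only recorded when available.
    """
    lags = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["submission_year"]:
            continue
        lags[row["pmid"]] = int(row["publication_year"]) - int(row["submission_year"])
    return lags


def papers_per_year(csv_text):
    """Count distinct papers per publication year.

    Assumes one row per author/affiliation, so rows are de-duplicated by PMID.
    """
    year_by_pmid = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        year_by_pmid[row["pmid"]] = int(row["publication_year"])
    return Counter(year_by_pmid.values())


lags = publication_lags(SAMPLE)
per_year = papers_per_year(SAMPLE)
```

To use this with the real files, replace `SAMPLE` with the file contents and adjust the column names to match the actual headers.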

Data Citation

Wong, Mark; Leng, Rhodri; Viry, Gil; Liscovsky Barrera, Rodrigo; Garcia-Sancho, Miguel. (2019). Human, yeast and pig genomics: sequence submissions and first sequence descriptions in the literature (1980-2015), 1980-2015 [dataset]. University of Edinburgh. Science, Technology and Innovation Studies. https://doi.org/10.7488/ds/2718.

ID: 128570026