Description
The following describes the citation network dataset that underpins the manuscript “A career in numbers: a citation network analysis of the work of RP Millar and his contribution to GnRH research” [1].
Data collection
We retrieved data from the Web of Science (WoS) Core Collection under the University of Edinburgh’s subscription in January 2024. We sought to retrieve all indexed papers of Professor Robert P. Millar (RPM). We searched using the query AU = (Millar, R), then retrieved the records that corresponded to his WoS profile (n = 428) and an additional 49 papers that were authored by RPM but had not been included in his WoS profile, validating the records against a CV of his published works. We retrieved the full citation history of these papers, as recorded by WoS, from other indexed records. By the date of retrieval, the 477 RPM papers had been cited 21,677 times by 11,138 documents; removing self-citations left 19,256 citations by 10,719 documents. We then retrieved all WoS metadata for the 477 RPM papers and the 10,719 citing papers, resulting in a dataset covering 11,196 documents.
Citation network dataset
We constructed a citation network dataset by parsing data from each paper’s full bibliography. It consists of:
i. An ‘edge list’ that records citation links from a citing to a cited document. This is constructed by assigning unique IDs to each retrieved paper and to every unique reference string contained in their bibliographies. The edge list is composed of a ‘Source’ column that contains the ID of the citing document and a ‘Target’ column that contains the ID of a document it cites, with one citation link per row. Given that we were only interested in citations between the WoS-retrieved documents, we discarded any reference string that represented a document outwith our search.
ii. A ‘node-attribute list’ that contains the ID of each document, with relevant metadata in adjacent columns to identify it, including authors, title of publication, journal, and year of publication. We also parsed into this dataset the WoS full citation count for each paper and the total number of references in each paper’s bibliography.
This results in a dataset containing 11,196 nodes and 115,834 edges between nodes. We removed a total of 67 papers for which metadata was incomplete and/or corrupted. We further focussed on the largest interconnected component, removing nodes with no connections (isolates) and smaller components that were detached from the main network. We excluded papers
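The edge-list filtering and largest-component steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy document IDs (P1 to P6, X9), and the assumption that reference strings have already been resolved to IDs are all hypothetical, and the component search treats citation links as undirected.

```python
from collections import defaultdict, deque

def build_edge_list(bibliographies, retrieved_ids):
    """Return (source, target) pairs, keeping only links between retrieved documents.

    `bibliographies` maps a citing document ID to the IDs in its bibliography
    (hypothetical input; reference strings are assumed to be resolved to IDs).
    """
    return [
        (source, target)
        for source, refs in bibliographies.items()
        for target in refs
        if source in retrieved_ids and target in retrieved_ids
    ]

def largest_component(nodes, edges):
    """Return the node set of the largest component, ignoring edge direction."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in neighbours.get(node, ()):
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Toy data: X9 lies outside the retrieved set, P6 is an isolate,
# and P4-P5 form a small detached component.
bibliographies = {
    "P1": ["P2", "X9"],
    "P2": ["P3"],
    "P3": [],
    "P4": ["P5"],
    "P5": [],
    "P6": [],
}
retrieved = set(bibliographies)

edges = build_edge_list(bibliographies, retrieved)
main = largest_component(retrieved, edges)
main_edges = [(s, t) for s, t in edges if s in main and t in main]
```

In this toy case the link to X9 is discarded, and only P1, P2, and P3 remain once isolates and the smaller component are removed.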
Data Citation
Leng, R. (2024). Citation network dataset covering the work of RP Millar and its citing literature [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11534257
| Date made available | 9 Jun 2024 |
|---|---|
| Publisher | Zenodo |