Extraction and analysis of referenced web links in large-scale scholarly articles

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


In this paper we report on a sub-task undertaken as part of Hiberlink, a project which is examining the phenomenon of reference rot within scholarly works. In our sub-task we aim to quantify and understand the nature of occurrence of links to web resources referenced from papers in very large-scale scholarly collections. We first introduce the challenges involved in extracting links from scholarly articles and develop and evaluate the accuracy of a set of link extraction systems. Secondly, five collections containing millions of scholarly articles with different characteristics (across different disciplines, time periods and publication types) are studied and we demonstrate that web resources are widely cited in scholarly publications and should be an important concern for digital preservation.
Original language: English
Title of host publication: IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014, London, United Kingdom, September 8-12, 2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 2
Publication status: Published - 2014
