In this paper we report on a sub-task undertaken as part of Hiberlink, a project examining the phenomenon of reference rot in scholarly works. The sub-task aims to quantify and understand the occurrence of links to web resources referenced from papers in very large-scale scholarly collections. First, we introduce the challenges involved in extracting links from scholarly articles, then develop a set of link extraction systems and evaluate their accuracy. Second, we study five collections containing millions of scholarly articles with different characteristics (across disciplines, time periods and publication types) and demonstrate that web resources are widely cited in scholarly publications and should be an important concern for digital preservation.
|Title of host publication|IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014, London, United Kingdom, September 8-12, 2014|
|Publisher|Institute of Electrical and Electronics Engineers (IEEE)|
|Number of pages|2|
|Publication status|Published - 2014|