Holes in the outline: Subject-dependent abstract quality and its implications for scientific literature search

Chien Yu Huang, Arlene Casey, Dorota Głowacka, Alan Medlar

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Scientific literature search engines typically index abstracts instead of the full-text of publications. The expectation is that the abstract provides a comprehensive summary of the article, enumerating key points for the reader to assess whether their information needs could be satisfied by reading the full-text. Furthermore, from a practical standpoint, obtaining the full-text is more complicated due to licensing issues, in the case of commercial publishers, and resource limitations of public repositories and pre-print servers. In this article, we use topic modelling to represent content in abstracts and full-text articles. Using Computer Science as a case study, we demonstrate that how well the abstract summarises the full-text is subfield-dependent. Indeed, we show that abstract representativeness has a direct impact on retrieval performance, with poorer abstracts leading to degraded performance. Finally, we present evidence that how well an abstract represents the full-text of an article is not random, but is a consequence of style and writing conventions in different subdisciplines and can be used to infer an “evolutionary” tree of subfields within Computer Science.
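
As a rough illustration of the idea of comparing topic-model representations of an abstract and its full text, the sketch below scores how closely two topic distributions agree. The abstract does not say which topic model or similarity measure was used, so the choice of Jensen-Shannon divergence, the function names `js_divergence` and `representativeness`, and the topic proportions are all assumptions made for this example, not the authors' method.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with no overlapping support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    # Mixture distribution halfway between p and q.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def representativeness(abstract_topics, fulltext_topics):
    """Hypothetical score in [0, 1]; 1 means identical topic mixtures."""
    return 1.0 - js_divergence(abstract_topics, fulltext_topics)


# Made-up topic proportions over three topics (illustrative only).
fulltext = [0.5, 0.3, 0.2]
good_abstract = [0.5, 0.3, 0.2]  # mirrors the full text exactly
poor_abstract = [1.0, 0.0, 0.0]  # mentions only the first topic

good_score = representativeness(good_abstract, fulltext)
poor_score = representativeness(poor_abstract, fulltext)
```

Under these assumptions, an abstract that omits topics present in the full text scores lower than one that covers them in the same proportions, which is one way to quantify the "holes" the title refers to.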

Original language: English
Title of host publication: CHIIR 2019 - Proceedings of the 2019 Conference on Human Information Interaction and Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 289-293
Number of pages: 5
ISBN (Electronic): 9781450360258
Publication status: Published - 8 Mar 2019
Event: 4th ACM SIGIR Conference on Information Interaction and Retrieval, CHIIR 2019 - Glasgow, United Kingdom
Duration: 10 Mar 2019 to 14 Mar 2019

Publication series

Name: CHIIR 2019 - Proceedings of the 2019 Conference on Human Information Interaction and Retrieval

Conference

Conference: 4th ACM SIGIR Conference on Information Interaction and Retrieval, CHIIR 2019
Country/Territory: United Kingdom
City: Glasgow
Period: 10/03/19 to 14/03/19

Keywords / Materials (for Non-textual outputs)

  • Scientific literature search
  • Term taxonomy
  • Topic models
