TY - GEN
T1 - Holes in the outline
T2 - 4th ACM SIGIR Conference on Human Information Interaction and Retrieval, CHIIR 2019
AU - Huang, Chien Yu
AU - Casey, Arlene
AU - Głowacka, Dorota
AU - Medlar, Alan
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/3/8
Y1 - 2019/3/8
AB - Scientific literature search engines typically index abstracts instead of the full text of publications. The expectation is that the abstract provides a comprehensive summary of the article, enumerating key points for the reader to assess whether their information needs could be satisfied by reading the full text. Furthermore, from a practical standpoint, obtaining the full text is more complicated, owing to licensing issues in the case of commercial publishers and to resource limitations of public repositories and pre-print servers. In this article, we use topic modelling to represent content in abstracts and full-text articles. Using Computer Science as a case study, we demonstrate that how well the abstract summarises the full text is subfield-dependent. Indeed, we show that abstract representativeness has a direct impact on retrieval performance, with poorer abstracts leading to degraded performance. Finally, we present evidence that how well an abstract represents the full text of an article is not random, but is a consequence of style and writing conventions in different subdisciplines and can be used to infer an “evolutionary” tree of subfields within Computer Science.
KW - Scientific literature search
KW - Term taxonomy
KW - Topic models
UR - https://www.scopus.com/pages/publications/85063125752
DO - 10.1145/3295750.3298953
M3 - Conference contribution
AN - SCOPUS:85063125752
T3 - CHIIR 2019 - Proceedings of the 2019 Conference on Human Information Interaction and Retrieval
SP - 289
EP - 293
BT - CHIIR 2019 - Proceedings of the 2019 Conference on Human Information Interaction and Retrieval
PB - Association for Computing Machinery, Inc
Y2 - 10 March 2019 through 14 March 2019
ER -