Abstract
When a reader is first introduced to an entity, its referring expression must describe the entity. For entities that are widely known, a single word or phrase often suffices. This paper presents the first study of how expressions that refer to the same entity develop over time. We track thousands of person and organization entities over 20 years of the New York Times (NYT). As entities move from hearer-new (first introduction to the NYT audience) to hearer-old (common knowledge) status, we show empirically that the referring expressions along this trajectory depend on the type of the entity and exhibit linguistic properties related to becoming common knowledge (e.g., shorter length, less use of appositives, more definiteness). These properties can also be used to build a model that predicts how long it will take for an entity to reach hearer-old status. Our results reach a 10-30% absolute improvement over a majority-class baseline.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing |
Place of Publication | Brussels, Belgium |
Publisher | Association for Computational Linguistics |
Pages | 4350-4359 |
Number of pages | 10 |
Publication status | Published - Nov 2018 |
Event | 2018 Conference on Empirical Methods in Natural Language Processing - Square Meeting Center, Brussels, Belgium. Duration: 31 Oct 2018 → 4 Nov 2018. http://emnlp2018.org/ |
Conference
Conference | 2018 Conference on Empirical Methods in Natural Language Processing |
---|---|
Abbreviated title | EMNLP 2018 |
Country/Territory | Belgium |
City | Brussels |
Period | 31/10/18 → 4/11/18 |
Internet address | http://emnlp2018.org/ |