Diachronic Embeddings for People in the News

Felix Hennig, Steven R. Wilson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Previous English-language diachronic change models based on word embeddings have typically used single tokens to represent entities, including names of people. This leads to two problems: ambiguity (one embedding representing several distinct and unrelated people) and unlinked references (several distinct embeddings representing the same person). In this paper, we show that applying named entity recognition and heuristic name linking before training a diachronic embedding model yields more accurate representations of references to people than the token-only baseline. Using a large corpus of news articles from The Guardian, we provide examples of several types of analysis that can be performed with these new embeddings. Further, we show that real-world events and changes in context can be detected using our proposed model.
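As a rough illustration of what a heuristic name-linking step might look like, the sketch below maps the PERSON mentions found in a single article to canonical names and then to single tokens that an embedding model could be trained on. It assumes an NER step has already extracted the mention strings in document order. The specific rule (a shorter mention that matches the final tokens of an earlier, longer name in the same article is linked to that name) and the helper names `link_person_mentions` and `to_token` are hypothetical choices for illustration, not necessarily the heuristic used in the paper.

```python
from typing import List


def link_person_mentions(mentions: List[str]) -> List[str]:
    """Map each PERSON mention from one article to a canonical name.

    Hypothetical heuristic (an assumption, not the paper's exact rule):
    a mention whose tokens equal the trailing tokens of an earlier,
    longer name in the same article (e.g. "Johnson" after
    "Boris Johnson") is linked to the most recent such name.
    """
    canonical: List[str] = []
    full_names: List[str] = []  # multi-token names seen so far, in order
    for mention in mentions:
        tokens = mention.split()
        match = None
        # Prefer the most recently mentioned full name.
        for name in reversed(full_names):
            name_tokens = name.split()
            if len(tokens) < len(name_tokens) and name_tokens[-len(tokens):] == tokens:
                match = name
                break
        if match is None:
            # Unlinked mention: keep it, and remember it if it is a full name.
            if len(tokens) > 1:
                full_names.append(mention)
            canonical.append(mention)
        else:
            canonical.append(match)
    return canonical


def to_token(name: str) -> str:
    """Join a canonical name into one token, e.g. "Boris Johnson" -> "Boris_Johnson"."""
    return "_".join(name.split())


if __name__ == "__main__":
    mentions = ["Boris Johnson", "Johnson", "Theresa May", "May", "Johnson"]
    print([to_token(n) for n in link_person_mentions(mentions)])
    # ['Boris_Johnson', 'Boris_Johnson', 'Theresa_May', 'Theresa_May', 'Boris_Johnson']
```

Linking only within a single article keeps the heuristic conservative: a bare "Johnson" in one story is not merged with a different Johnson named in another, which is exactly the ambiguity that single-token representations cannot avoid.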
Original language: English
Title of host publication: Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
Publisher: Association for Computational Linguistics (ACL)
Pages: 173-183
Number of pages: 11
ISBN (Electronic): 978-1-952148-80-4
Publication status: Published - 20 Nov 2020
Event: Fourth Natural Language Processing and Computational Social Science Workshop @ EMNLP 2020 - Virtual event
Duration: 20 Nov 2020 → 20 Nov 2020
https://sites.google.com/site/nlpandcss/home?authuser=0

Workshop

Workshop: Fourth Natural Language Processing and Computational Social Science Workshop @ EMNLP 2020
Abbreviated title: NLP+CSS 2020
City: Virtual event
Period: 20/11/20 → 20/11/20
