The evolution of scalar terms’ semantic structure

Fausto Carcassi, Marieke Schouwstra, Simon Kirby

Research output: Contribution to conference › Abstract › peer-review

Abstract

Large diachronic text corpora enable a data-based approach to the evolution and dynamics of language. The development of unsupervised methods for the inference of semantic similarity, semantic change and polysemy from such large datasets means that, in addition to measuring orthographic similarity or counting frequencies (e.g., Petersen et al. 2012, Bochkarev et al. 2015), it is also possible to measure meaning and therefore semantic evolution over time. This has given rise to a body of work – notably, scattered across multiple disciplines and thus at times carried out in parallel – that postulates and evaluates trends or laws in language dynamics relating to semantic change (e.g., Dubossarsky et al. 2016, Hamilton et al. 2016, Xu and Kemp 2015).
However, concerns have been raised regarding these corpus-based approaches. These concerns arise from the inherent sampling biases of corpora (Pechenick et al. 2015), the influence of world events on the composition of topics in corpora (Chesley and Baayen 2010, Lijffijt et al. 2012, Szmrecsanyi 2016), and, most recently, methodological problems with diachronic applications of distributed semantics methods, which have been shown to be more closely tied to frequency (and frequency change) than previously assumed (Dubossarsky et al. 2017). Additionally, there is a lack of gold-standard datasets for evaluating the performance of automatic semantic change measures (with the exception of some small test sets, e.g., Gulordava et al. 2011, Schlechtweg et al. 2017).
We review these developments and propose solutions to two of the aforementioned issues. We demonstrate a simple model capable of controlling for topical fluctuations in a corpus, and show that it accounts for a considerable amount of variance in diachronic word frequency changes. Furthermore, we discuss a tentative approach to controlling for potentially frequency-biased results of semantic change measures, and demonstrate its utility in a controlled test using simulations of change on artificially composed corpora.
Original language: English
Pages: 494
Publication status: Published - 31 Aug 2018
Event: Societas Linguistica Europaea 51st annual meeting - Tallinn, Estonia
Duration: 29 Aug 2018 to 1 Sep 2018

Conference

Conference: Societas Linguistica Europaea 51st annual meeting
Country/Territory: Estonia
City: Tallinn
Period: 29/08/18 to 1/09/18
