Embedding Words as Distributions with a Bayesian Skip-gram Model

Arthur Brazinskas, Serhii Havrylov, Ivan Titov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We introduce a method for embedding words as probability densities in a low-dimensional space. Rather than assuming that a word embedding is fixed across the entire text collection, as in standard word embedding methods, in our Bayesian model we generate it from a word-specific prior density for each occurrence of a given word. Intuitively, for each word, the prior density encodes the distribution of its potential ‘meanings’. These prior densities are conceptually similar to Gaussian embeddings of Vilnis and McCallum (2015). Interestingly, unlike the Gaussian embeddings, we can also obtain context-specific densities: they encode uncertainty about the sense of a word given its context and correspond to the approximate posterior distributions within our model. The context-dependent densities have many potential applications: for example, we show that they can be directly used in the lexical substitution task. We describe an effective estimation method based on the variational autoencoding framework and demonstrate the effectiveness of our embedding technique on a range of standard benchmarks.
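The following is a minimal, illustrative PyTorch sketch of the kind of model the abstract describes: each word type has its own Gaussian prior over latent embeddings, an inference network produces a context-dependent Gaussian approximate posterior, and training minimises a negative variational lower bound. This is not the authors' implementation; the dimensionality, the mean-pooled context encoder, the softmax likelihood over context words, and all names are assumptions made for illustration, and only the overall prior/posterior/ELBO structure follows the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianSkipGramSketch(nn.Module):
    """Illustrative Bayesian skip-gram: word-specific Gaussian priors over latent
    embeddings and a context-dependent Gaussian approximate posterior (VAE-style)."""

    def __init__(self, vocab_size: int, dim: int = 50):
        super().__init__()
        # Word-specific prior parameters (mean and log-variance per word type).
        self.prior_mu = nn.Embedding(vocab_size, dim)
        self.prior_logvar = nn.Embedding(vocab_size, dim)
        # Inference network: centre word + mean-pooled context -> posterior parameters.
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.ctx_emb = nn.Embedding(vocab_size, dim)
        self.post_mu = nn.Linear(2 * dim, dim)
        self.post_logvar = nn.Linear(2 * dim, dim)
        # Output embeddings used to score context words given a sampled latent vector.
        self.out_emb = nn.Embedding(vocab_size, dim)

    def posterior(self, centre, context):
        h = torch.cat([self.word_emb(centre), self.ctx_emb(context).mean(dim=1)], dim=-1)
        return self.post_mu(h), self.post_logvar(h)

    def forward(self, centre, context):
        # centre: (batch,) word ids; context: (batch, window) word ids.
        mu_q, logvar_q = self.posterior(centre, context)
        # Reparameterisation trick: one latent embedding per occurrence of the word.
        z = mu_q + torch.randn_like(mu_q) * torch.exp(0.5 * logvar_q)
        # Reconstruction term: log-probability of the observed context words given z.
        log_probs = F.log_softmax(z @ self.out_emb.weight.t(), dim=-1)   # (batch, vocab)
        rec = log_probs.gather(1, context).sum(dim=1).mean()
        # KL term between the context-dependent posterior and the word-specific prior.
        mu_p, logvar_p = self.prior_mu(centre), self.prior_logvar(centre)
        kl = 0.5 * (logvar_p - logvar_q
                    + (logvar_q.exp() + (mu_q - mu_p).pow(2)) / logvar_p.exp()
                    - 1.0).sum(dim=-1).mean()
        return kl - rec  # negative evidence lower bound (to be minimised)

# Toy usage: a vocabulary of 1000 types, 4 centre words with 5 context words each.
model = BayesianSkipGramSketch(vocab_size=1000)
centre = torch.randint(0, 1000, (4,))
context = torch.randint(0, 1000, (4, 5))
loss = model(centre, context)
loss.backward()

In this sketch, the word-specific prior plays the role of the Gaussian embedding (encoding the distribution of a word's potential meanings), while the context-dependent posterior is the density that the abstract notes can be used directly for tasks such as lexical substitution.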
Original language: English
Title of host publication: Proceedings of the 27th International Conference on Computational Linguistics
Publisher: Association for Computational Linguistics (ACL)
Pages: 1775–1789
Number of pages: 13
ISBN (Print): 978-1-948087-50-6
Publication status: Published - 31 Aug 2018
Event: 27th International Conference on Computational Linguistics - Santa Fe, United States
Duration: 20 Aug 2018 – 25 Aug 2018
http://coling2018.org/

Conference

Conference: 27th International Conference on Computational Linguistics
Abbreviated title: COLING 2018
Country/Territory: United States
City: Santa Fe
Period: 20/08/18 – 25/08/18
Internet address: http://coling2018.org/
