What the Vec? Towards Probabilistically Grounded Embeddings

Carl Allen, Ivana Balazevic, Timothy Hospedales

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Word2Vec (W2V) and GloVe are popular, fast and efficient word embedding algorithms. Their embeddings are widely used and perform well on a variety of natural language processing tasks. Moreover, W2V has recently been adopted in the field of graph embedding, where it underpins several leading algorithms. However, despite their ubiquity and relatively simple model architecture, a theoretical understanding of what the embedding parameters of W2V and GloVe learn, and why that is useful in downstream tasks, has been lacking. We show that different interactions between PMI vectors reflect semantic word relationships, such as similarity and paraphrasing, that are encoded in low-dimensional word embeddings under a suitable projection, theoretically explaining why embeddings of W2V and GloVe work. As a consequence, we also reveal an interesting mathematical interconnection between the considered semantic relationships themselves.
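
As a reader's aid, the "PMI vectors" mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it only applies the standard definition PMI(w, c) = log p(w, c) / (p(w) p(c)) to a small co-occurrence table. The function name `pmi_matrix` and the toy counts are hypothetical, and all counts are assumed to be strictly positive so that every logarithm is finite.

```python
import numpy as np


def pmi_matrix(counts: np.ndarray) -> np.ndarray:
    """Pointwise mutual information for a word-by-context count matrix.

    Assumes every entry of `counts` is strictly positive (illustrative only).
    """
    p_wc = counts / counts.sum()            # joint probabilities p(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)   # word marginals p(w)
    p_c = p_wc.sum(axis=0, keepdims=True)   # context marginals p(c)
    return np.log(p_wc) - np.log(p_w) - np.log(p_c)


# Hypothetical toy counts: rows are words, columns are context words.
counts = np.array([
    [4.0, 1.0, 1.0],
    [1.0, 4.0, 1.0],
    [1.0, 1.0, 4.0],
])

pmi = pmi_matrix(counts)
# Row i of `pmi` is the PMI vector of word i.
print(pmi)
```

Each row of `pmi` is one word's PMI vector. The abstract's claim concerns how interactions between such rows relate to low-dimensional embeddings; the sketch covers only the definition, not the projection.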
Original language: English
Title of host publication: Advances in Neural Information Processing Systems (NIPS 2019)
Publisher: Curran Associates Inc
Pages: 7465-7475
Number of pages: 13
Volume: 32
Publication status: Published - 14 Dec 2019
Event: 33rd Conference on Neural Information Processing Systems - Vancouver Convention Centre, Vancouver, Canada
Duration: 8 Dec 2019 to 14 Dec 2019
https://neurips.cc/

Conference

Conference: 33rd Conference on Neural Information Processing Systems
Abbreviated title: NeurIPS 2019
Country/Territory: Canada
City: Vancouver
Period: 8/12/19 to 14/12/19
