Embedding Words as Distributions with a Bayesian Skip-gram Model

Arthur Bražinskas, Serhii Havrylov, Ivan Titov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Distributed representations induced from large unlabeled text collections have had a large impact on many natural language processing (NLP) applications, providing an effective and simple way of dealing with data sparsity. Word embedding methods [1, 2, 3, 4] typically represent words as vectors in a low-dimensional space. In contrast, we encode them as probability densities. Intuitively, the densities will represent the distributions over possible ‘meanings’ of the word. Representing a word as a distribution has many attractive properties. For example, this lets us encode generality of terms (e.g., ‘animal’ is a hypernym of ‘dog’), characterize uncertainty about their meaning (e.g., a proper noun, such as ‘John’, encodes little about the person it refers to) or represent polysemy (e.g., ‘tip’ may refer to a gratuity or a sharp edge of an object). Capturing entailment (e.g., ‘run’ entails ‘move’) is especially important as it needs to be explicitly or implicitly accounted for in many NLP applications (e.g., question answering or summarization). Intuitively, distributions provide a natural way of encoding entailment: the entailment decision can be made by testing the level sets of the distributions for ‘soft inclusion’ (e.g., using the KL divergence [5]).
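The abstract does not give the model's details, but the KL-divergence entailment test it mentions can be illustrated for the common case of diagonal-covariance Gaussian word embeddings. The sketch below uses the closed-form KL divergence between two diagonal Gaussians as an asymmetric entailment score; the word names and the toy means/variances are hypothetical, chosen so that a specific word ('dog') is a narrow density lying inside the broader density of a general word ('animal').

```python
import numpy as np

def kl_diagonal_gaussians(mu0, var0, mu1, var1):
    """Closed-form KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) )."""
    d = len(mu0)
    return 0.5 * (
        np.sum(var0 / var1)                 # trace term
        + np.sum((mu1 - mu0) ** 2 / var1)   # mean-shift term
        - d                                 # dimensionality
        + np.sum(np.log(var1) - np.log(var0))  # log-det ratio
    )

# Hypothetical toy embeddings: 'dog' as a narrow density,
# 'animal' as a broad density that covers it.
mu_dog, var_dog = np.array([1.0, 0.5]), np.array([0.1, 0.1])
mu_animal, var_animal = np.array([0.8, 0.6]), np.array([1.0, 1.0])

# Asymmetry encodes the direction of entailment: since the narrow 'dog'
# density sits inside the broad 'animal' density, KL(dog || animal) is
# small, while KL(animal || dog) is large.
kl_forward = kl_diagonal_gaussians(mu_dog, var_dog, mu_animal, var_animal)
kl_backward = kl_diagonal_gaussians(mu_animal, var_animal, mu_dog, var_dog)
```

Thresholding the forward KL then yields a soft entailment decision: a small KL(specific || general) is evidence that the specific word's level sets are (softly) included in the general word's.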
Original language: English
Title of host publication: NIPS Bayesian Deep Learning Workshop 2016
Number of pages: 3
Publication status: Published - 10 Dec 2016
Event: NIPS Bayesian Deep Learning Workshop 2016 - Centre Convencions Internacional Barcelona, Barcelona, Spain
Duration: 10 Dec 2016 – 10 Dec 2016


Conference: NIPS Bayesian Deep Learning Workshop 2016
Abbreviated title: NIPS 2016

