Bayesian Word Sense Induction

Samuel Brody, Mirella Lapata

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Sense induction seeks to automatically identify word senses directly from a corpus. A key assumption underlying previous work is that the context surrounding an ambiguous word is indicative of its meaning. Sense induction is thus typically viewed as an unsupervised clustering problem where the aim is to partition a word’s contexts into different classes, each representing a word sense. Our work places sense induction in a Bayesian context by modeling the contexts of the ambiguous word as samples from a multinomial distribution over senses which are in turn characterized as distributions over words. The Bayesian framework provides a principled way to incorporate a wide range of features beyond lexical co-occurrences and to systematically assess their utility on the sense induction task.
The proposed approach yields improvements over state-of-the-art systems on a benchmark dataset.
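
The generative view described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a symmetric Dirichlet prior over sense proportions and uses a hypothetical toy inventory of two senses for "bank"; the names `SENSES`, `sample_dirichlet`, and `generate_context` are illustrative only. The sketch runs the model forward (sense proportions, then a sense per token, then a word per sense). Sense induction would go the other way and infer the hidden senses from observed contexts.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_context(sense_word_dists, alpha, length, rng):
    """Generate one context of the ambiguous word.

    Returns the context's distribution over senses and a list of
    (sense index, word) pairs, one per token.
    """
    theta = sample_dirichlet(alpha, rng)  # distribution over senses
    sense_ids = list(range(len(sense_word_dists)))
    tokens = []
    for _ in range(length):
        # Pick a sense for this token from the context's sense proportions.
        s = rng.choices(sense_ids, weights=theta)[0]
        # Pick a word from that sense's distribution over words.
        vocab = list(sense_word_dists[s].keys())
        probs = list(sense_word_dists[s].values())
        w = rng.choices(vocab, weights=probs)[0]
        tokens.append((s, w))
    return theta, tokens


# Hypothetical toy data: two senses of "bank" (assumed, not from the paper).
SENSES = [
    {"money": 0.5, "loan": 0.3, "account": 0.2},  # financial sense
    {"river": 0.6, "water": 0.3, "shore": 0.1},   # riverside sense
]

rng = random.Random(0)
theta, tokens = generate_context(SENSES, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta)
print(tokens)
```

A small Dirichlet parameter such as the assumed 0.5 tends to concentrate each context on one sense, which matches the intuition that a single occurrence of an ambiguous word usually carries one meaning.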
Original language: English
Title of host publication: EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30 - April 3, 2009
Publisher: Association for Computational Linguistics
Pages: 103-111
Number of pages: 9
Publication status: Published - 2009
