A Bayesian Mixture Model for Part-of-speech Induction Using Multiple Features

Christos Christodoulopoulos, Sharon Goldwater, Mark Steedman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., an HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and the token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to the state of the art, far better than a similar model without the one-class-per-type constraint. The additional features provide further benefit, and our final system outperforms the best published results on most of the 25 corpora tested.
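The abstract's core idea — a multinomial mixture where every word *type* gets a single class that must explain all of that type's token-level feature observations — can be illustrated with a toy collapsed Gibbs sampler. This is a minimal sketch under assumed symmetric Dirichlet priors (`alpha` over class proportions, `beta` over class-specific feature multinomials), not the paper's actual model or feature set; the function name and hyperparameter values are illustrative.

```python
import random
from collections import defaultdict

def induce_classes(type_features, num_classes=3, alpha=1.0, beta=0.5,
                   iters=50, seed=0):
    """Collapsed Gibbs sampler for a type-level multinomial mixture.

    type_features: dict mapping each word type to a list of token-level
    feature observations (e.g., context words). Each type is assigned ONE
    class, and all of its observations are explained by that class's
    feature multinomial (the one-class-per-type constraint).
    """
    rng = random.Random(seed)
    types = list(type_features)
    vocab = sorted({f for feats in type_features.values() for f in feats})
    V = len(vocab)

    # current assignments and sufficient statistics
    z = {t: rng.randrange(num_classes) for t in types}
    class_count = [0] * num_classes                      # types per class
    feat_count = [defaultdict(int) for _ in range(num_classes)]
    feat_total = [0] * num_classes
    for t in types:
        class_count[z[t]] += 1
        for f in type_features[t]:
            feat_count[z[t]][f] += 1
            feat_total[z[t]] += 1

    def remove(t):
        k = z[t]
        class_count[k] -= 1
        for f in type_features[t]:
            feat_count[k][f] -= 1
            feat_total[k] -= 1

    def add(t, k):
        z[t] = k
        class_count[k] += 1
        for f in type_features[t]:
            feat_count[k][f] += 1
            feat_total[k] += 1

    for _ in range(iters):
        for t in types:
            remove(t)  # resample this type with its counts held out
            weights = []
            for k in range(num_classes):
                # prior term: Dirichlet-multinomial over class proportions
                w = class_count[k] + alpha
                # likelihood of ALL of this type's observations under
                # class k, computed sequentially so that repeated
                # features within the type are handled correctly
                n = 0
                seen = defaultdict(int)
                for f in type_features[t]:
                    w *= (feat_count[k][f] + seen[f] + beta) / \
                         (feat_total[k] + n + beta * V)
                    seen[f] += 1
                    n += 1
                weights.append(w)
            # sample a new class proportional to the weights
            r = rng.random() * sum(weights)
            k, acc = num_classes - 1, 0.0
            for j, w in enumerate(weights):
                acc += w
                if r <= acc:
                    k = j
                    break
            add(t, k)
    return z
```

On a toy corpus where determiners and nouns occur in complementary contexts (e.g., `{"the": ["dog", "cat"], "a": ["dog", "cat"], "dog": ["the", "a"], "cat": ["the", "a"]}` with `num_classes=2`), the sampler tends to group the two determiners together and the two nouns together, since each type's entire context profile must be explained by one class.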
Original language: English
Title of host publication: Proceedings of the Conference on Empirical Methods in Natural Language Processing
Place of Publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
Pages: 638-647
Number of pages: 10
ISBN (Print): 978-1-937284-11-4
Publication status: Published - 2011

Publication series

Name: EMNLP '11
Publisher: Association for Computational Linguistics
