Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars

Mark Johnson, Sharon Goldwater

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

One of the reasons nonparametric Bayesian inference is attracting attention in computational linguistics is because it provides a principled way of learning the units of generalization together with their probabilities. Adaptor grammars are a framework for defining a variety of hierarchical nonparametric Bayesian models. This paper investigates some of the choices that arise in formulating adaptor grammars and associated inference procedures, and shows that they can have a dramatic impact on performance in an unsupervised word segmentation task. With appropriate adaptor grammars and inference procedures we achieve an 87% word token f-score on the standard Brent version of the Bernstein-Ratner corpus, which is an error reduction of over 35% over the best previously reported results for this corpus.
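
The word token f-score cited in the abstract is commonly computed by comparing the character spans of predicted words against those of the gold-standard words. The following is a minimal sketch of that usual definition, not the authors' evaluation script; the function names `word_spans` and `token_f_score` and the toy utterance are assumptions made for illustration and do not come from the paper or the corpus.

```python
def word_spans(words):
    """Return the set of (start, end) character spans, one per word."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def token_f_score(gold_utterances, predicted_utterances):
    """Token f-score: harmonic mean of token precision and token recall.

    A predicted word counts as correct only if both of its boundaries
    match a gold word in the same utterance.
    """
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_utterances, predicted_utterances):
        gold_spans, pred_spans = word_spans(gold), word_spans(pred)
        correct += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / n_pred
    recall = correct / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example (not taken from the corpus): the predicted
# segmentation wrongly merges "the" and "book" into one token.
gold = [["you", "want", "to", "see", "the", "book"]]
pred = [["you", "want", "to", "see", "thebook"]]
score = token_f_score(gold, pred)
```

In this toy case four of the five predicted tokens match gold tokens, so precision is 4/5, recall is 4/6, and the f-score is 8/11, roughly 0.73.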

Original language: English
Title of host publication: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Place of Publication: Boulder, Colorado
Publisher: Association for Computational Linguistics
Pages: 317-325
Number of pages: 9
Publication status: Published - 1 Jun 2009
