Abstract
One of the reasons nonparametric Bayesian inference is attracting attention in computational linguistics is that it provides a principled way of learning the units of generalization together with their probabilities. Adaptor grammars are a framework for defining a variety of hierarchical nonparametric Bayesian models. This paper investigates some of the choices that arise in formulating adaptor grammars and associated inference procedures, and shows that they can have a dramatic impact on performance in an unsupervised word segmentation task. With appropriate adaptor grammars and inference procedures we achieve an 87% word token f-score on the standard Brent version of the Bernstein-Ratner corpus, which is an error reduction of over 35% relative to the best previously reported results for this corpus.
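The abstract reports results as a word token f-score and as a relative error reduction. The sketch below shows one common way these quantities are computed for word segmentation, assuming a predicted word token counts as correct only when both of its boundaries match a gold token, and assuming "error" means 1 minus the f-score. The function names are hypothetical, the numbers in the usage lines are illustrative toy values rather than results from the paper, and the paper's own scoring script may differ in details.

```python
def token_spans(words):
    """Return the set of (start, end) character spans of the words in one utterance."""
    spans = set()
    pos = 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def token_f_score(gold_utts, pred_utts):
    """Word token f-score: a predicted token is correct only if its span exactly
    matches a gold token span (both boundaries correct)."""
    correct = gold_total = pred_total = 0
    for gold, pred in zip(gold_utts, pred_utts):
        if "".join(gold) != "".join(pred):
            raise ValueError("gold and predicted segmentations cover different strings")
        g, p = token_spans(gold), token_spans(pred)
        correct += len(g & p)
        gold_total += len(g)
        pred_total += len(p)
    if correct == 0:
        return 0.0
    precision = correct / pred_total
    recall = correct / gold_total
    return 2 * precision * recall / (precision + recall)


def error_reduction(f_old, f_new):
    """Relative error reduction, assuming error = 1 - f-score."""
    return ((1 - f_old) - (1 - f_new)) / (1 - f_old)


# Toy example: the second utterance is under-segmented ("acat" instead of "a cat").
gold = [["the", "dog"], ["a", "cat"]]
pred = [["the", "dog"], ["acat"]]
print(round(token_f_score(gold, pred), 4))  # → 0.5714
print(error_reduction(0.5, 0.75))           # illustrative values only
```

Under this definition, under-segmentation (merging words) and over-segmentation (splitting words) both cost precision and recall, because any token with a misplaced boundary fails to match.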
Original language | English |
---|---|
Title of host publication | Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics |
Place of Publication | Boulder, Colorado |
Publisher | Association for Computational Linguistics |
Pages | 317-325 |
Number of pages | 9 |
Publication status | Published - 1 Jun 2009 |