Bayesian Inference for PCFGs via Markov Chain Monte Carlo

Mark Johnson, Thomas Griffiths, Sharon Goldwater

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

This paper presents two Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference of probabilistic context-free grammars (PCFGs) from terminal strings. These algorithms provide an alternative to maximum-likelihood estimation using the Inside-Outside algorithm. We illustrate these methods by estimating a sparse grammar describing the morphology of the Bantu language Sesotho. We show that, with suitable priors, Bayesian techniques can infer linguistic structure in situations where maximum-likelihood methods such as the Inside-Outside algorithm produce only a trivial grammar.
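MCMC samplers for PCFGs with conjugate priors commonly include a step that resamples the rule probabilities, given how often each rule is used in the current parse trees. The sketch below shows only that step. It assumes symmetric Dirichlet priors on each nonterminal's rule probabilities. The abstract does not say which priors the paper uses, and the function name, data layout and `alpha` parameter are hypothetical. A Dirichlet draw is made by normalizing independent Gamma draws. With `alpha` below 1, the prior favours putting most probability on a few rules. This is one standard way to encourage the kind of sparse grammar the abstract mentions.

```python
import random
from collections import defaultdict


def sample_rule_probs(rule_counts, alpha, rng=random):
    """Hypothetical sketch: draw each nonterminal's rule probabilities
    from its Dirichlet posterior, assuming a symmetric Dirichlet(alpha) prior.

    rule_counts: dict mapping (lhs, rhs) -> number of times that rule occurs
        in the current parse trees. Every rule in the grammar must appear,
        including rules with count 0, or it receives no probability.
    alpha: symmetric Dirichlet pseudo-count (must be > 0); values below 1
        favour sparse rule distributions.
    Returns a dict mapping (lhs, rhs) -> probability. The probabilities of
    the rules that share a left-hand side sum to 1.
    """
    # Group rules by their left-hand-side nonterminal.
    by_lhs = defaultdict(list)
    for (lhs, rhs), n in rule_counts.items():
        by_lhs[lhs].append((rhs, n))

    theta = {}
    for lhs, rules in by_lhs.items():
        # Posterior is Dirichlet(count + alpha) for each rule of this lhs.
        # Sample it as independent Gamma(count + alpha, 1) draws, then normalize.
        draws = [(rhs, rng.gammavariate(n + alpha, 1.0)) for rhs, n in rules]
        total = sum(g for _, g in draws)
        for rhs, g in draws:
            theta[(lhs, rhs)] = g / total
    return theta


# Usage with made-up rule counts (illustrative only, not data from the paper).
counts = {
    ("Word", ("Stem",)): 3,
    ("Word", ("Prefix", "Stem")): 1,
    ("Stem", ("a",)): 0,
    ("Stem", ("b",)): 2,
}
theta = sample_rule_probs(counts, alpha=1.0, rng=random.Random(0))
```

In a full sampler, this step would alternate with a step that resamples parse trees for the strings under the current rule probabilities. That parse-sampling step is not shown here.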
Original language: English
Title of host publication: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference
Place of publication: Rochester, New York
Publisher: Association for Computational Linguistics
Pages: 139-146
Number of pages: 8
Publication status: Published, 1 Apr 2007
