Priors in Bayesian Learning of Phonological Rules

Sharon Goldwater, Mark Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes a Bayesian procedure for unsupervised learning of phonological rules from an unlabeled corpus of training data. Like Goldsmith's Linguistica program (Goldsmith, 2004b), whose output is taken as the starting point of this procedure, our learner returns a grammar that consists of a set of signatures, each of which consists of a set of stems and a set of suffixes. Our grammars differ from Linguistica's in that they also contain a set of phonological rules, which permit our grammars to collapse far more words into a signature than Linguistica can. Interestingly, the choice of a Bayesian prior turns out to be crucial for obtaining a learner that makes linguistically appropriate generalizations across a range of differently sized training corpora.

Original language: English
Title of host publication: Proceedings of the Seventh Meeting of the ACL Special Interest Group in Computational Phonology
Place of publication: Barcelona, Spain
Publisher: Association for Computational Linguistics
Pages: 35-42
Number of pages: 8
Publication status: Published - 1 Jul 2004
