Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

Micha Elsner, Sharon Goldwater, Jacob Eisenstein

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation; for instance, “the” might be realized as [Di] or [D@]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries. In both cases, modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.
Original language: English
Title of host publication: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Place of publication: Jeju Island, Korea
Publisher: Association for Computational Linguistics
Number of pages: 10
Publication status: Published - 1 Jul 2012

