A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

Micha Elsner, Sharon Goldwater, Naomi Feldman, Frank Wood

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a cognitive model of early lexical acquisition which jointly performs word segmentation and learns an explicit model of phonetic variation. We define the model as a Bayesian noisy channel; we sample segmentations and word forms simultaneously from the posterior, using beam sampling to control the size of the search space. Compared to a pipelined approach in which segmentation is performed first, our model is qualitatively more similar to human learners. On data with variable pronunciations, the pipelined approach learns to treat syllables or morphemes as words. In contrast, our joint model, like infant learners, tends to learn multiword collocations. We also conduct analyses of the phonetic variations that the model learns to accept and its patterns of word recognition errors, and relate these to developmental evidence.
Original language: English
Title of host publication: Proceedings of the Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics
Pages: 42-54
Number of pages: 13
ISBN (Print): 978-1-937284-97-8
Publication status: Published - 2013
