Contextual Dependencies in Unsupervised Word Segmentation

Sharon Goldwater, Thomas L. Griffiths, Mark Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may also shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies, respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that previous probabilistic models rely crucially on suboptimal search procedures.
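To make the unigram/bigram distinction concrete, here is a minimal Python sketch. It is not the authors' Bayesian models or their search procedure. It assumes a hypothetical toy corpus that is already segmented (standing in for an unsupervised learner's current hypothesis) and uses crude add-alpha smoothed scores that are not properly normalised over unseen words. All names (`CORPUS`, `ALPHA`, `unigram_logprob`, `bigram_logprob`, `segmentations`, `best_segmentation`) are hypothetical. A unigram score treats each word independently, so it cannot tell "the dog" from "dog the"; a bigram score conditions each word on its predecessor.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already split into words. In an unsupervised learner
# this would be the current hypothesised segmentation, not gold-standard data.
CORPUS = [["the", "dog"], ["the", "cat"], ["a", "dog"], ["a", "cat"], ["the", "dog"]]
BOUNDARY = "<s>"  # hypothetical utterance-start token used as bigram context
ALPHA = 0.1       # add-alpha smoothing constant (an assumption, not from the paper)

unigrams = Counter(w for utt in CORPUS for w in utt)
bigrams = Counter(pair for utt in CORPUS for pair in zip([BOUNDARY] + utt, utt))
# How often each token appears as the left context of a bigram.
contexts = Counter(prev for utt in CORPUS for prev in [BOUNDARY] + utt[:-1])
TOTAL = sum(unigrams.values())
V = len(unigrams)


def unigram_logprob(words):
    """Score each word independently of its neighbours (add-alpha smoothed)."""
    return sum(math.log((unigrams[w] + ALPHA) / (TOTAL + ALPHA * V)) for w in words)


def bigram_logprob(words):
    """Score each word conditioned on the word before it (add-alpha smoothed)."""
    logp, prev = 0.0, BOUNDARY
    for w in words:
        logp += math.log((bigrams[(prev, w)] + ALPHA) / (contexts[prev] + ALPHA * V))
        prev = w
    return logp


def segmentations(s):
    """Yield every split of s into non-empty words (2**(len(s)-1) of them)."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        for rest in segmentations(s[i:]):
            yield [s[:i]] + rest


def best_segmentation(s, score):
    """Exhaustively pick the highest-scoring segmentation; toy inputs only."""
    return max(segmentations(s), key=score)


if __name__ == "__main__":
    # Word order is invisible to the unigram score but not to the bigram score.
    print(unigram_logprob(["the", "dog"]), unigram_logprob(["dog", "the"]))
    print(bigram_logprob(["the", "dog"]), bigram_logprob(["dog", "the"]))
    for model in (unigram_logprob, bigram_logprob):
        print(model.__name__, best_segmentation("thedog", model))
```

On this toy corpus both scores pick ['the', 'dog'] for "thedog"; the difference shows up in the word-order check, where the two unigram scores are identical and the bigram score for "dog the" is far lower. Exhaustive enumeration grows as 2^(n-1) with string length, so a practical segmenter would need dynamic programming or sampling instead.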
Original language: English
Title of host publication: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
Place of publication: Sydney, Australia
Publisher: Association for Computational Linguistics
Pages: 673-680
Number of pages: 8
Publication status: Published, 1 Jul 2006
