Improving Statistical MT through Morphological Analysis

Sharon Goldwater, David McClosky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In statistical machine translation, estimating word-to-word alignment probabilities for the translation model can be difficult due to the problem of sparse data: most words in a given corpus occur at most a handful of times. With a highly inflected language such as Czech, this problem can be particularly severe. In addition, much of the morphological variation seen in Czech words is not reflected in either the morphology or syntax of a language like English. In this work, we show that using morphological analysis to modify the Czech input can improve a Czech-English machine translation system. We investigate several different methods of incorporating morphological information, and show that a system that combines these methods yields the best results. Our final system achieves a BLEU score of .333, as compared to .270 for the baseline word-to-word system.
Original language: English
Title of host publication: Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing
Place of Publication: Vancouver, British Columbia, Canada
Publisher: Association for Computational Linguistics
Pages: 676-683
Number of pages: 8
Publication status: Published - 1 Oct 2005
