Statistical Phrase-based Translation

Philipp Koehn, Franz Josef Och, Daniel Marcu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several previously proposed phrase-based translation models. Within our framework, we carry out a large number of experiments to better understand and explain why phrase-based models outperform word-based models. Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations. Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models do not have a strong impact on performance. Learning only syntactically motivated phrases degrades the performance of our systems.
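The abstract's two key ingredients, heuristic learning of phrase translations from word-based alignments and lexical weighting, can be illustrated with a small sketch. It assumes the word alignment is already given as a set of (foreign index, English index) links, collects every phrase pair whose words are aligned only to each other (no link leaves the pair), and scores a phrase pair by lexical weighting: each foreign word's probability w(f|e) is averaged over the English words it is aligned to, and an unaligned foreign word is scored against NULL. The function names, the brute-force enumeration, the `max_len=3` default, and the toy probability table are illustrative assumptions, not the authors' implementation.

```python
"""Sketch: phrase extraction from a word alignment plus lexical weighting.

Hypothetical helper names; not the authors' implementation.
"""


def extract_phrases(f_words, e_words, alignment, max_len=3):
    """Return (foreign phrase, English phrase) pairs consistent with `alignment`.

    alignment: set of (i, j) links meaning f_words[i] is aligned to e_words[j].
    A span pair is consistent if it contains at least one link and no link
    connects a word inside one span to a word outside the other span.
    Brute force over all span pairs of up to max_len words on each side.
    """
    pairs = []
    for f_start in range(len(f_words)):
        for f_end in range(f_start, min(f_start + max_len, len(f_words))):
            for e_start in range(len(e_words)):
                for e_end in range(e_start, min(e_start + max_len, len(e_words))):
                    has_link = False
                    consistent = True
                    for i, j in alignment:
                        in_f = f_start <= i <= f_end
                        in_e = e_start <= j <= e_end
                        if in_f and in_e:
                            has_link = True
                        elif in_f != in_e:
                            # A link leaves the phrase pair: not consistent.
                            consistent = False
                            break
                    if consistent and has_link:
                        pairs.append((" ".join(f_words[f_start:f_end + 1]),
                                      " ".join(e_words[e_start:e_end + 1])))
    return pairs


def lexical_weight(f_phrase, e_phrase, links, w):
    """Lexical weight p_w(f | e, a) of one phrase pair.

    links: set of (i, j) positions inside the phrase pair.
    w: dict mapping (foreign word, English word) -> w(f | e); the key
       (f, "NULL") is used for a foreign word with no link.
    """
    score = 1.0
    for i, f in enumerate(f_phrase):
        aligned = [j for (k, j) in links if k == i]
        if aligned:
            # Average the word translation probabilities over aligned English words.
            score *= sum(w[(f, e_phrase[j])] for j in aligned) / len(aligned)
        else:
            score *= w[(f, "NULL")]
    return score
```

For the sentence pair "das haus ist klein" / "the house is small" with a one-to-one diagonal alignment, `extract_phrases` returns the nine monotone phrase pairs of up to three words, such as ("das haus", "the house"), while rejecting pairs like ("das", "the house") because the link for "haus" would fall outside the pair.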
Original language: English
Title of host publication: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Place of publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
Pages: 48-54
Number of pages: 7
Publication status: Published - 2003
Event: 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL 2003) - Edmonton, Canada
Duration: 27 May 2003 to 1 Jun 2003

Publication series

Name: NAACL '03
Publisher: Association for Computational Linguistics

Conference

Conference: 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL 2003)
Country/Territory: Canada
City: Edmonton
Period: 27/05/03 to 1/06/03
