Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora

Trevor Cohn, Mirella Lapata

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Current phrase-based SMT systems perform poorly when using small training sets. This is a consequence of unreliable translation estimates and low coverage over source and target phrases. This paper presents a method which alleviates this problem by exploiting multiple translations of the same source phrase. Central to our approach is triangulation, the process of translating from a source to a target language via an intermediate third language. This allows the use of a much wider range of parallel corpora for training, and can be combined with a standard phrase-table using conventional smoothing methods. Experimental results demonstrate BLEU improvements for triangulated models over a standard phrase-based system.
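
The triangulation idea in the abstract can be illustrated with a minimal sketch. It assumes phrase tables are stored as nested dictionaries of conditional probabilities, and that the target phrase is independent of the source phrase given the intermediate phrase, so that p(t|s) is approximated by summing p(t|i) * p(i|s) over intermediate phrases i. The function names, the toy French/Spanish/English entries, and the use of linear interpolation as the smoothing step are illustrative assumptions, not details taken from the abstract.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Build a source-to-target table by marginalising over pivot phrases.

    src_to_pivot[s][i] holds p(i | s); pivot_to_tgt[i][t] holds p(t | i).
    Assumes t is conditionally independent of s given i (an assumption of
    this sketch).
    """
    table = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_to_pivot.items():
        for i, p_i_given_s in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for t, p_t_given_i in pivot_to_tgt.get(i, {}).items():
                table[s][t] += p_t_given_i * p_i_given_s
    return {s: dict(targets) for s, targets in table.items()}


def interpolate(direct, triangulated, lam=0.5):
    """Linearly interpolate a direct table with a triangulated one.

    lam is the weight on the direct table; the value 0.5 is arbitrary.
    """
    combined = {}
    for s in set(direct) | set(triangulated):
        d = direct.get(s, {})
        tri = triangulated.get(s, {})
        combined[s] = {
            t: lam * d.get(t, 0.0) + (1.0 - lam) * tri.get(t, 0.0)
            for t in set(d) | set(tri)
        }
    return combined


# Toy tables (hypothetical values): French -> Spanish and Spanish -> English.
fr_es = {"maison": {"casa": 0.75, "hogar": 0.25}}
es_en = {"casa": {"house": 0.5, "home": 0.5}, "hogar": {"home": 1.0}}

fr_en_triangulated = triangulate(fr_es, es_en)

# A small direct French -> English table that has seen only one translation.
fr_en_direct = {"maison": {"house": 1.0}}
fr_en_combined = interpolate(fr_en_direct, fr_en_triangulated, lam=0.5)
```

In this toy case the direct table covers only "house", while the triangulated table also proposes "home"; interpolating the two gives the combined table nonzero mass on both, which is the kind of coverage gain the abstract describes.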
Original language: English
Title of host publication: ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, June 23-30, 2007, Prague, Czech Republic
Pages: 728-735
Number of pages: 8
Publication status: Published - 2007
