Iterative, MT-based sentence alignment of parallel texts

Rico Sennrich, Martin Volk

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recent research has shown that MT-based sentence alignment is a robust approach for noisy parallel texts. However, using Machine Translation for sentence alignment causes a chicken-and-egg problem: to train a corpus-based MT system, we need sentence-aligned data, and MT-based sentence alignment depends on an MT system. We describe a bootstrapping approach to sentence alignment that resolves this circular dependency by computing an initial alignment with length-based methods. Our evaluation shows that iterative MT-based sentence alignment significantly outperforms widespread alignment approaches on our evaluation set, without requiring any linguistic resources other than the to-be-aligned bitext.
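The bootstrapping idea in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the three callables (`align_by_length`, `train_mt`, `align_by_mt`), the `max_iterations` parameter, and the stop-when-stable criterion are all hypothetical placeholders introduced here for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Alignment = List[Tuple[int, int]]  # (source sentence index, target sentence index)
Translator = Callable[[str], str]


def bootstrap_alignment(
    src: Sequence[str],
    tgt: Sequence[str],
    align_by_length: Callable[[Sequence[str], Sequence[str]], Alignment],
    train_mt: Callable[[List[Tuple[str, str]]], Translator],
    align_by_mt: Callable[[Sequence[str], Sequence[str], Translator], Alignment],
    max_iterations: int = 3,
) -> Alignment:
    """Hypothetical bootstrapping loop; every callable is a placeholder."""
    # Step 1: the initial alignment needs no MT system (length-based).
    alignment = align_by_length(src, tgt)
    for _ in range(max_iterations):
        # Step 2: train an MT system on the currently aligned sentence pairs.
        pairs = [(src[i], tgt[j]) for i, j in alignment]
        translate = train_mt(pairs)
        # Step 3: realign the same bitext using the new MT system.
        new_alignment = align_by_mt(src, tgt, translate)
        # Assumed stopping rule: quit once the alignment no longer changes.
        if new_alignment == alignment:
            break
        alignment = new_alignment
    return alignment
```

The loop only needs the bitext itself as input, which mirrors the abstract's claim that no other linguistic resources are required; how each step is actually realised is left to the callables.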
Original language: English
Title of host publication: NODALIDA 2011, Nordic Conference of Computational Linguistics
Publisher: Northern European Association for Language Technology (NEALT)
Publication status: Published - 1 May 2011
Event: The 18th Nordic Conference of Computational Linguistics - Riga, Latvia
Duration: 11 May 2011 to 13 May 2011

Conference

Conference: The 18th Nordic Conference of Computational Linguistics
Country/Territory: Latvia
City: Riga
Period: 11/05/11 to 13/05/11
