Abstract
Recent research has shown that sentence alignment based on machine translation (MT) is a robust approach for noisy parallel texts. However, using MT for sentence alignment causes a chicken-and-egg problem: training a corpus-based MT system requires sentence-aligned data, while MT-based sentence alignment itself depends on an MT system. We describe a bootstrapping approach to sentence alignment that resolves this circular dependency by computing an initial alignment with length-based methods. Our evaluation shows that iterative MT-based sentence alignment significantly outperforms widely used alignment approaches on our evaluation set, without requiring any linguistic resources other than the to-be-aligned bitext.
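
The bootstrapping idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. `length_based_align` is a simplified stand-in for a length-based aligner: it uses a crude character-length cost with made-up bead penalties, not the Gale-Church probabilistic model. `train_mt` and `mt_align` are hypothetical callables, supplied by the caller, for training an MT system on sentence pairs and for MT-based alignment.

```python
from typing import Callable, List, Tuple

# A bead links a tuple of source sentence indices to a tuple of target indices.
Bead = Tuple[Tuple[int, ...], Tuple[int, ...]]

# Allowed bead shapes (source count, target count) with a fixed penalty each.
# The penalty values are assumptions chosen for illustration.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 5.0, (0, 1): 5.0, (2, 1): 2.0, (1, 2): 2.0}


def length_based_align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Align by dynamic programming over character-length differences."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(s_len - t_len) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Trace the cheapest path back from the end of both texts.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((tuple(range(i - di, i)), tuple(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]


def bootstrap_align(
    src: List[str],
    tgt: List[str],
    train_mt: Callable,   # hypothetical: sentence pairs -> MT model
    mt_align: Callable,   # hypothetical: (src, tgt, model) -> beads
    iterations: int = 2,  # assumed default, not taken from the paper
) -> List[Bead]:
    """Start from a length-based alignment, then alternate MT training and MT-based alignment."""
    beads = length_based_align(src, tgt)
    for _ in range(iterations):
        # Use only beads that link material on both sides as training pairs.
        pairs = [
            (" ".join(src[i] for i in s), " ".join(tgt[j] for j in t))
            for s, t in beads
            if s and t
        ]
        model = train_mt(pairs)
        beads = mt_align(src, tgt, model)
    return beads
```

In this sketch the length-based step needs nothing beyond the bitext itself, which mirrors the abstract's point that no external linguistic resources are required. Each later iteration trains on the previous alignment and realigns with the resulting model.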
Original language | English |
---|---|
Title of host publication | NODALIDA 2011, Nordic Conference of Computational Linguistics |
Publisher | Northern European Association for Language Technology (NEALT) |
Publication status | Published - 1 May 2011 |
Event | The 18th Nordic Conference of Computational Linguistics, Riga, Latvia. Duration: 11 May 2011 → 13 May 2011 |
Conference
Conference | The 18th Nordic Conference of Computational Linguistics |
---|---|
Country/Territory | Latvia |
City | Riga |
Period | 11 May 2011 → 13 May 2011 |