Abstract
A parallel corpus is typically a collection of documents that are translations of each other. To be useful in NLP applications such as word alignment or machine translation, these documents must first be aligned at the sentence level. This paper is a user study that briefly reviews several sentence aligners and evaluates them by the performance of SMT systems trained on their output. We conducted experiments on two language pairs and showed that using a more advanced sentence alignment algorithm may yield gains of 0.5 to 1 BLEU point.
Original language | English |
---|---|
Title of host publication | Workshop on Creating Cross-language Resources for Disconnected Languages and Styles |
Pages | 6-10 |
Number of pages | 5 |
Publication status | Published - 1 May 2012 |
Event | Workshop on Creating Cross-language Resources for Disconnected Languages and Styles - Istanbul, Turkey; Duration: 27 May 2012 → 27 May 2012 |