Extrinsic evaluation of sentence alignment systems

Sadaf Abdul-Rauf, Mark Fishel, Patrik Lambert, Sandra Noubours, Rico Sennrich

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Parallel corpora are usually collections of documents that are translations of each other. To be useful in NLP applications such as word alignment or machine translation, they first have to be aligned at the sentence level. This paper is a user study that briefly reviews several sentence aligners and evaluates them based on the performance achieved by SMT systems trained on their output. We conducted experiments on two language pairs and showed that using a more advanced sentence alignment algorithm may yield gains of 0.5 to 1 BLEU points.
Original language: English
Title of host publication: Workshop on Creating Cross-language Resources for Disconnected Languages and Styles
Pages: 6-10
Number of pages: 5
Publication status: Published - 1 May 2012
Event: Workshop on Creating Cross-language Resources for Disconnected Languages and Styles - Istanbul, Turkey
Duration: 27 May 2012 - 27 May 2012

Workshop

Workshop: Workshop on Creating Cross-language Resources for Disconnected Languages and Styles
Country/Territory: Turkey
City: Istanbul
Period: 27/05/12 - 27/05/12
