Europarl: A Parallel Corpus for Statistical Machine Translation

Philipp Koehn

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

We collected a corpus of parallel text in 11 languages from the proceedings of the European Parliament, which are published on the web. This corpus has found widespread use in the NLP community. Here, we focus on its acquisition and its application as training data for statistical machine translation (SMT). We trained SMT systems for 110 language pairs, which reveal interesting clues into the challenges ahead.
Original language: English
Title of host publication: The Tenth Machine Translation Summit Proceedings of Conference
Editors: John Hutchins
Publisher: International Association for Machine Translation
Pages: 79-86
Number of pages: 8
Publication status: Published - 2005
