Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering

Philipp Koehn, Huda Khayrallah, Kenneth Heafield, Mikel L. Forcada

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities took part in this task.
Original language: English
Title of host publication: Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Place of Publication: Belgium, Brussels
Publisher: Association for Computational Linguistics
Pages: 726-739
Number of pages: 14
Publication status: Published - 31 Oct 2018
Event: EMNLP 2018 Third Conference on Machine Translation (WMT18) - Brussels, Belgium
Duration: 31 Oct 2018 to 1 Nov 2018
http://www.statmt.org/wmt18/

Workshop

Workshop: EMNLP 2018 Third Conference on Machine Translation (WMT18)
Abbreviated title: WMT18
Country/Territory: Belgium
City: Brussels
Period: 31/10/18 to 1/11/18
