Ten Years of WMT Evaluation Campaigns: Lessons Learnt

Ondrej Bojar, Christian Federmann, Barry Haddow, Philipp Koehn, Matt Post, Lucia Specia

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The WMT evaluation campaign (http://www.statmt.org/wmt16) has been run annually since 2006. It is a collection of shared tasks related to machine translation, in which researchers compare their techniques against those of others in the field. The longest-running task in the campaign is the translation task, where participants translate a common test set with their MT systems. In addition to the translation task, we have also included shared tasks on evaluation: both on automatic metrics (since 2008), which compare the reference to the MT system output, and on quality estimation (since 2012), where system output is evaluated without a reference. An important component of WMT has always been the manual evaluation, wherein human annotators are used to produce the official ranking of the systems in each translation task. This reflects the belief of the WMT organizers that human judgement should be the ultimate arbiter of MT quality. Over the years, we have experimented with different methods of improving the reliability, efficiency and discriminatory power of these judgements. In this paper, we report on our experiences in running this evaluation campaign, the current state of the art in MT evaluation (both human and automatic), and our plans for future editions of WMT.
Original language: English
Title of host publication: Proceedings of the LREC 2016 Workshop “Translation Evaluation – From Fragmented Tools and Data Sets to an Integrated Ecosystem”
Pages: 27-34
Number of pages: 8
Publication status: Published - 24 May 2016
Event: LREC 2016 Workshop “Translation Evaluation – From Fragmented Tools and Data Sets to an Integrated Ecosystem” - Portorož, Slovenia
Duration: 24 May 2016 to 24 May 2016
http://www.cracking-the-language-barrier.eu/mt-eval-workshop-2016/

Conference

Conference: LREC 2016 Workshop “Translation Evaluation – From Fragmented Tools and Data Sets to an Integrated Ecosystem”
Abbreviated title: LREC 2016
Country: Slovenia
City: Portorož
Period: 24/05/16 to 24/05/16