(Meta-) Evaluation of Machine Translation

Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz, Josh Schroeder

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


This paper evaluates the translation quality of machine translation systems for 8 language pairs: translating French, German, Spanish, and Czech to English and back. We carried out an extensive human evaluation which allowed us not only to rank the different MT systems, but also to perform higher-level analysis of the evaluation process. We measured timing and intra- and inter-annotator agreement for three types of subjective evaluation. We measured the correlation of automatic evaluation metrics with human judgments. This meta-evaluation reveals surprising facts about the most commonly used methodologies.
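This record does not say which statistics the paper uses for annotator agreement or for metric correlation. The sketch below therefore assumes two common choices: a kappa coefficient, K = (P(A) - P(E)) / (1 - P(E)), with a fixed chance-agreement rate P(E) (for example 1/5 under uniform chance on a five-point scale), and Spearman's rank correlation between system-level human scores and metric scores. All function names and the sample numbers are hypothetical and for illustration only.

```python
from typing import Sequence


def observed_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """P(A): fraction of items on which two annotators gave the same judgment."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty judgment lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def kappa(p_agree: float, p_chance: float) -> float:
    """Chance-corrected agreement K = (P(A) - P(E)) / (1 - P(E)).

    P(E) is assumed to be supplied by the caller (e.g. 1/5 for uniform
    chance on a five-point scale); this is an assumption, not the paper's setup.
    """
    return (p_agree - p_chance) / (1.0 - p_chance)


def ranks(scores: Sequence[float]) -> list[int]:
    """Rank positions, 1 = highest score. Assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(human: Sequence[float], metric: Sequence[float]) -> float:
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid only without ties."""
    n = len(human)
    if n != len(metric) or n < 2:
        raise ValueError("need two equal-length score lists with at least 2 systems")
    d2 = sum((h - m) ** 2 for h, m in zip(ranks(human), ranks(metric)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical five-point judgments from two annotators on five sentences.
    p_a = observed_agreement([3, 4, 2, 5, 3], [3, 4, 3, 5, 2])
    print(round(kappa(p_a, 0.2), 3))
    # Hypothetical system-level scores: human judgments vs. an automatic metric.
    print(round(spearman_rho([0.8, 0.6, 0.7, 0.4], [0.30, 0.25, 0.20, 0.10]), 3))
```

With these made-up inputs the annotators agree on 3 of 5 items (P(A) = 0.6), giving K = 0.5, and the metric swaps the order of two of the four systems, giving rho = 0.8. The closed-form Spearman formula is used here for brevity; with tied scores a tie-aware rank correlation would be needed instead.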
Original language: English
Title of host publication: Proceedings of the Second Workshop on Statistical Machine Translation
Place of publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
Number of pages: 23
Publication status: Published - 2007
