Abstract
This paper analyzes the translation quality of machine translation systems for 10 language pairs translating between Czech, English, French, German, Hungarian, and Spanish. We report the translation quality of over 30 diverse translation systems based on a large-scale manual evaluation involving hundreds of hours of effort. We use the human judgments of the systems to analyze automatic evaluation metrics for translation quality, and we report the strength of the correlation with human judgments at both the system level and the sentence level. We validate our manual evaluation methodology by measuring intra- and inter-annotator agreement, and by collecting timing information.
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Proceedings of the Third Workshop on Statistical Machine Translation (StatMT '08) |
| Place of publication | Stroudsburg, PA, USA |
| Publisher | Association for Computational Linguistics |
| Pages | 70-106 |
| Number of pages | 37 |
| ISBN (Print) | 978-1-932432-09-1 |
| Publication status | Published - 2008 |