HUME: Human UCCA-Based Evaluation of Machine Translation

Alexandra Birch, Omri Abend, Ondrej Bojar, Barry Haddow

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Human evaluation of machine translation normally uses sentence-level measures such as relative ranking or adequacy scales. However, these provide no insight into possible errors, and do not scale well with sentence length. We argue for a semantics-based evaluation, which captures what meaning components are retained in the MT output, thus providing a more fine-grained analysis of translation quality, and enabling the construction and tuning of semantics-based MT. We present a novel human semantic evaluation measure, Human UCCA-based MT Evaluation (HUME), building on the UCCA semantic representation scheme. HUME covers a wider range of semantic phenomena than previous methods and does not rely on semantic annotation of the potentially garbled MT output. We experiment with four language pairs, demonstrating HUME’s broad applicability, and report good inter-annotator agreement rates and correlation with human adequacy scores.
Original language: English
Title of host publication: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016
Place of Publication: Austin, Texas, USA
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 11
ISBN (Print): 978-1-945626-25-8
Publication status: Published - 5 Nov 2016
Event: 2016 Conference on Empirical Methods in Natural Language Processing - Austin, United States
Duration: 1 Nov 2016 - 5 Nov 2016


Conference: 2016 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2016
Country/Territory: United States

