The Feasibility of HMEANT as a Human MT Evaluation Metric

Alexandra Birch, Barry Haddow, Ulrich Germann, Maria Nadejde, Christian Buck, Philipp Koehn

Research output: Chapter in Book/Report/Conference proceeding · Conference contribution

Abstract

There has been a recent surge of interest in semantic machine translation, which standard automatic metrics struggle to evaluate. A family of measures called MEANT, which uses semantic role labels (SRL), has been proposed to overcome this problem. The human variant, HMEANT, has largely been evaluated by its correlation with human contrastive evaluation, the standard human evaluation method for the WMT shared tasks. In this paper we claim that for a human metric to be useful, it needs to be evaluated on intrinsic properties: it needs to be reliable; it needs to work across different language pairs; and it needs to be lightweight. Most importantly, however, a human metric must be discerning. We conclude that HMEANT is a step in the right direction, but has some serious flaws. Its reliance on verbs as heads of frames, and its assumption that annotators need only minimal guidelines, are particularly problematic.
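
The abstract refers to the MEANT family's use of semantic role labels without spelling out how a score is computed. As a rough illustration only, the sketch below shows the general shape of an HMEANT-style score as it is usually described: an F-measure over role fillers aligned between the MT output and the reference, with partially matched fillers discounted. The half weight for partial matches and the flat filler counts are assumptions for illustration, not the exact formulation evaluated in the paper.

# Minimal sketch of an HMEANT-style score (illustrative assumptions:
# flat filler counts and a 0.5 weight for partially correct fillers).
def hmeant_f(correct, partial, mt_roles, ref_roles, w_partial=0.5):
    """F-measure over role fillers aligned between MT output and reference.

    correct   -- fillers annotators judged fully correct
    partial   -- fillers annotators judged partially correct
    mt_roles  -- total role fillers annotated in the MT output
    ref_roles -- total role fillers annotated in the reference
    """
    if mt_roles == 0 or ref_roles == 0:
        return 0.0
    matched = correct + w_partial * partial
    precision = matched / mt_roles
    recall = matched / ref_roles
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 6 correct and 2 partial fillers, with 10 fillers annotated
# in the MT output and 9 in the reference, gives F of roughly 0.737.
print(round(hmeant_f(6, 2, 10, 9), 3))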
Original language: English
Title of host publication: Proceedings of the Eighth Workshop on Statistical Machine Translation
Place of publication: Sofia, Bulgaria
Publisher: Association for Computational Linguistics
Pages: 52-61
Number of pages: 10
Publication status: Published - 1 Aug 2013
