PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation

Liane Guillou, Christian Hardmeier

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

We present PROTEST, a test suite for the evaluation of pronoun translation by MT systems. The test suite comprises 250 hand-selected pronoun tokens and an automatic evaluation method which compares the translations of pronouns in MT output with those in the reference translation. Pronoun translations that do not match the reference are referred for manual evaluation. PROTEST is designed to support analysis of system performance at the level of individual pronoun groups, rather than to provide a single aggregate measure over all pronouns. We wish to encourage detailed analyses to highlight issues in the handling of specific linguistic mechanisms by MT systems, thereby contributing to a better understanding of those problems involved in translating pronouns. We present two use cases for PROTEST: a) for measuring improvement/degradation of an incremental system change, and b) for comparing the performance of a group of systems whose design may be largely unrelated. Following the latter use case, we demonstrate the application of PROTEST to the evaluation of the systems submitted to the DiscoMT 2015 shared task on pronoun translation.
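The automatic step described in the abstract (compare each pronoun translation with the reference, refer mismatches for manual evaluation, report per pronoun group) can be illustrated with a minimal Python sketch. This is not the PROTEST implementation: the function name `evaluate`, the record keys `group`, `reference`, and `mt`, the group labels, the sample pronouns, and the case-insensitive exact-match rule are all assumptions made for illustration.

```python
from collections import defaultdict


def evaluate(examples):
    """Compare MT pronoun translations with reference translations.

    `examples` is a list of dicts with the hypothetical keys
    'group', 'reference', and 'mt' (not PROTEST's actual data format).
    Returns per-group match counts and the examples that need manual review.
    """
    per_group = defaultdict(lambda: {"matched": 0, "total": 0})
    needs_manual = []
    for ex in examples:
        stats = per_group[ex["group"]]
        stats["total"] += 1
        # Assumption: a match is a case-insensitive exact string match.
        if ex["mt"].lower() == ex["reference"].lower():
            stats["matched"] += 1
        else:
            # Mismatches are not counted as errors; they are set aside
            # for a human annotator to judge.
            needs_manual.append(ex)
    return dict(per_group), needs_manual


# Illustrative data only; group labels and pronouns are made up.
examples = [
    {"group": "anaphoric", "reference": "elle", "mt": "elle"},
    {"group": "anaphoric", "reference": "elle", "mt": "il"},
    {"group": "pleonastic", "reference": "il", "mt": "Il"},
]

per_group, needs_manual = evaluate(examples)
for group, stats in per_group.items():
    print(group, stats["matched"], "of", stats["total"], "match the reference")
print(len(needs_manual), "example(s) referred for manual evaluation")
```

Keeping the counts separate per group, rather than summing them into one score, mirrors the abstract's stated aim of analysing performance at the level of individual pronoun groups.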
Original language: English
Title of host publication: LREC 2016, Tenth International Conference on Language Resources and Evaluation
Place of Publication: Portorož, Slovenia
Number of pages: 8
Publication status: Published - May 2016
Event: 10th edition of the Language Resources and Evaluation Conference - Portorož, Slovenia
Duration: 23 May 2016 - 28 May 2016


Conference: 10th edition of the Language Resources and Evaluation Conference
Abbreviated title: LREC 2016

