Pitfalls and outlooks in using COMET

Vilém Zouhar, Pinzhen Chen, Tsz Kin Lam, Nikita Moghe, Barry Haddow

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

The COMET metric has blazed a trail in the machine translation community, given its strong correlation with human judgements of translation quality. Its success stems from being a modified pre-trained multilingual model finetuned for quality assessment. However, it being a machine learning model also gives rise to a new set of pitfalls that may not be widely known. We investigate these unexpected behaviours from three aspects: 1) technical: obsolete software versions and compute precision; 2) data: empty content, language mismatch, and translationese at test time as well as distribution and domain biases in training; 3) usage and reporting: multi-reference support and model referencing in the literature. All of these problems imply that COMET scores are not comparable between papers or even technical setups and we put forward our perspective on fixing each issue. Furthermore, we release the sacreCOMET package that can generate a signature for the software and model configuration as well as an appropriate citation. The goal of this work is to help the community make more sound use of the COMET metric.
Original language: English
Title of host publication: Proceedings of the Ninth Conference on Machine Translation
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Publisher: Association for Computational Linguistics
Pages: 1272-1288
Number of pages: 17
ISBN (Electronic): 9798891761797
Publication status: Published - 16 Nov 2024
Event: Ninth Conference on Machine Translation - Miami, United States
Duration: 15 Nov 2024 to 16 Nov 2024

Conference

Conference: Ninth Conference on Machine Translation
Abbreviated title: WMT24
Country/Territory: United States
City: Miami
Period: 15/11/24 to 16/11/24
