On the Limits of Minimal Pairs in Contrastive Evaluation

Jannis Vamvas, Rico Sennrich

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Minimal sentence pairs are frequently used to analyze the behavior of language models. It is often assumed that model behavior on contrastive pairs is predictive of model behavior at large. We argue that two conditions are necessary for this assumption to hold: First, a tested hypothesis should be well-motivated, since experiments show that contrastive evaluation can lead to false positives. Second, test data should be chosen so as to minimize distributional discrepancy between evaluation time and deployment time. For a good approximation of deployment-time decoding, we recommend that minimal pairs be created based on machine-generated text, as opposed to human-written references. We present a contrastive evaluation suite for English–German MT that implements this recommendation.
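For readers unfamiliar with the setup, contrastive evaluation over minimal pairs can be sketched roughly as follows: a model scores a correct translation and a minimally different incorrect one, and the evaluation counts how often the correct variant receives the higher score. The sketch below is only an illustration of that general idea, not the authors' evaluation suite. The function names, the example sentences, and the `TOY_LOGPROBS` scores are all hypothetical; a real setup would obtain the scores from an MT model.

```python
from typing import Callable, Sequence, Tuple

# One minimal pair: (source, correct translation, contrastive translation).
MinimalPair = Tuple[str, str, str]

# Hypothetical scores standing in for model log-probabilities (assumption:
# a real evaluation would query an MT model instead of a lookup table).
TOY_LOGPROBS = {
    ("The dog barks.", "Der Hund bellt."): -1.2,
    ("The dog barks.", "Die Hund bellt."): -3.4,
    ("She sees him.", "Sie sieht ihn."): -2.0,
    ("She sees him.", "Sie sieht ihm."): -1.5,
}


def toy_score(source: str, target: str) -> float:
    """Return the illustrative score of `target` given `source`."""
    return TOY_LOGPROBS[(source, target)]


def contrastive_accuracy(
    pairs: Sequence[MinimalPair],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the correct variant scores strictly higher."""
    if not pairs:
        raise ValueError("at least one minimal pair is required")
    hits = sum(
        1 for source, correct, contrastive in pairs
        if score(source, correct) > score(source, contrastive)
    )
    return hits / len(pairs)


PAIRS = [
    ("The dog barks.", "Der Hund bellt.", "Die Hund bellt."),
    ("She sees him.", "Sie sieht ihn.", "Sie sieht ihm."),
]

accuracy = contrastive_accuracy(PAIRS, toy_score)
```

With these made-up scores, the toy scorer prefers the correct variant in the first pair but not in the second. Note that the abstract's recommendation concerns how the pairs themselves are constructed (from machine-generated text rather than human-written references), which this sketch does not model.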
Original language: English
Title of host publication: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Place of Publication: Punta Cana, Dominican Republic
Publisher: Association for Computational Linguistics
Number of pages: 11
ISBN (Electronic): 978-1-955917-06-3
Publication status: Published - 11 Nov 2021
Event: BlackboxNLP 2021: Analyzing and interpreting neural networks for NLP - Virtual
Duration: 11 Nov 2021 → 11 Nov 2021


Conference: BlackboxNLP 2021

