Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis

Seraphina Goldfarb-Tarrant, Björn Ross, Adam Lopez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Sentiment analysis (SA) systems are widely deployed in many of the world's languages, and there is well-documented evidence of demographic bias in these systems. In languages beyond English, scarcer training data is often supplemented with transfer learning using pre-trained models, including multilingual models trained on other languages. In some cases, even supervision data comes from other languages. Does cross-lingual transfer also import new biases? To answer this question, we use counterfactual evaluation to test whether gender or racial biases are imported when using cross-lingual transfer, compared to a monolingual transfer setting. Across five languages, we find that systems using cross-lingual transfer usually become more biased than their monolingual counterparts. We also find racial biases to be much more prevalent than gender biases. To spur further research on this topic, we release the sentiment models we used for this study, and the intermediate checkpoints throughout training, yielding 1,525 distinct models; we also release our evaluation code.
Original language: English
Title of host publication: The 2023 Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics
Pages: 5691–5704
Number of pages: 14
ISBN (Electronic): 979-8-89176-060-8
DOIs
Publication status: Published - 6 Dec 2023
Event: The 2023 Conference on Empirical Methods in Natural Language Processing, Singapore
Duration: 6 Dec 2023 to 10 Dec 2023
https://2023.emnlp.org/

Conference

Conference: The 2023 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2023
Country/Territory: Singapore
Period: 6/12/23 to 10/12/23
