Biases in Large Language Models: Origins, Inventory and Discussion

Roberto Navigli*, Simone Conia, Björn Ross

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

In this paper, we introduce and discuss the pervasive issue of bias in the large language models that are currently at the core of mainstream approaches to Natural Language Processing (NLP). We first introduce data selection bias, that is, the bias caused by the choice of texts that make up a training corpus. Then, we survey the different types of social bias evidenced in the text generated by language models trained on such corpora, ranging from gender to age, from sexual orientation to ethnicity, and from religion to culture. We conclude with directions focused on measuring, reducing, and tackling the aforementioned types of bias.
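To make the abstract's notion of social bias in generated text concrete, the following minimal sketch probes a masked language model with occupation templates and compares the probabilities it assigns to gendered pronouns. This is only an illustration of how such bias can be surfaced, not the measurement protocol used in the paper; the model name and the templates are illustrative assumptions.

    # Minimal sketch: probing a masked LM for gender-occupation associations.
    # Assumptions: the model ("bert-base-uncased") and the templates below are
    # illustrative choices, not taken from the paper.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")

    templates = [
        "[MASK] is a nurse.",
        "[MASK] is an engineer.",
    ]

    for template in templates:
        # `targets` restricts scoring to the two pronouns being compared.
        results = fill(template, targets=["he", "she"])
        scores = {r["token_str"].strip(): r["score"] for r in results}
        print(f"{template!r}: he={scores.get('he', 0):.4f}, "
              f"she={scores.get('she', 0):.4f}")

A large gap between the two pronoun probabilities for a given occupation template is one simple symptom of the gender bias discussed in the paper.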
Original language: English
Article number: 10
Pages (from-to): 1-21
Journal: Journal of Data and Information Quality
Volume: 15
Issue number: 2
DOIs
Publication status: Published - 22 Jun 2023

Keywords

  • bias in NLP
  • language models
