Investigating the capabilities and limitations of machine learning for identifying bias in English language data with information and heritage professionals

Lucy Havens*, Benjamin Bach, Melissa Terras, Beatrice Alex

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Despite numerous efforts to mitigate their biases, ML systems continue to harm already-marginalized people. While predominant ML approaches assume that bias can be removed and fair models can be created, we show that these goals are not always possible, nor always desirable. We reframe the problem of ML bias by creating models that identify biased language, drawing attention to a dataset’s biases rather than trying to remove them. Then, through a workshop, we evaluated the models for a specific use case: the workflows of information and heritage professionals. Our findings demonstrate the limitations of ML for identifying bias, owing to bias’s contextual nature, to the way approaches to mitigating it can simultaneously privilege and oppress different communities, and to its inevitability. We demonstrate the need to expand ML approaches to bias and fairness, providing a mixed-methods approach for investigating the feasibility of removing bias, or achieving fairness, in a given ML use case.
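To illustrate the “identify, don’t remove” framing the abstract describes, the sketch below flags potentially biased language in a catalogue description so that a professional can review it in context. It is a deliberately crude, lexicon-based simplification, not the paper’s ML models; the term list, function name, and example description are hypothetical.

    # Sketch of the "identify, don't remove" framing: surface potentially
    # biased language for human review instead of silently deleting it.
    # Lexicon matching is a simplification, NOT the paper's ML approach;
    # the term list and example text are hypothetical.
    import re

    GENDERED_RELATIONAL_TERMS = re.compile(
        r"\b(wife|husband|widow|spinster)\b", re.IGNORECASE
    )

    def flag_biased_language(text: str) -> list[tuple[int, int, str]]:
        """Return (start, end, term) spans of flagged language for review."""
        return [
            (m.start(), m.end(), m.group())
            for m in GENDERED_RELATIONAL_TERMS.finditer(text)
        ]

    description = "Papers of Mrs. John Smith, wife of the noted explorer."
    for start, end, term in flag_biased_language(description):
        print(f"chars {start}-{end}: {term!r}")  # surfaced, not removed

The paper’s actual models learn such spans from annotated data; the point of this sketch is only the design choice the abstract argues for: drawing attention to a dataset’s biases so that expert judgement, not automated deletion, decides what happens next.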
Original language: English
Title of host publication: CHI '25: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems
Editors: Naomi Yamashita, Vanessa Evers, Koji Yatani, Xianghua (Sharon) Ding, Bongshin Lee, Marshini Chetty, Phoebe Toups-Dugas
Publisher: ACM
Pages: 1-22
Number of pages: 22
ISBN (Electronic): 9798400713941
Publication status: Published - 25 Apr 2025
Event: 2025 Conference on Human Factors in Computing Systems, PACIFICO Yokohama, Yokohama, Japan
Duration: 26 Apr 2025 – 1 May 2025
https://chi2025.acm.org/

Conference

Conference: 2025 Conference on Human Factors in Computing Systems
Abbreviated title: CHI 2025
Country/Territory: Japan
City: Yokohama
Period: 26/04/25 – 1/05/25
Internet address: https://chi2025.acm.org/

Keywords

  • human-centred machine learning
  • human-centered AI
  • gender bias
  • bias data
  • language bias
  • cultural heritage

