Mind the data gap(s): Investigating power in speech and language datasets

Nina Markl

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Algorithmic oppression is an urgent and persistent problem in speech and language technologies. Considering power relations embedded in datasets before compiling or using them to train or test speech and language technologies is essential to designing less harmful, more just technologies. This paper presents a reflective exercise to recognise and challenge gaps and the power relations they reveal in speech and language datasets by applying principles of Data Feminism and Design Justice, and building on work on dataset documentation and sociolinguistics.
Original language: English
Title of host publication: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Editors: Bharathi Raja Chakravarthi, B Bharathi, John P. McCrae, Manel Zarrouk, Kalika Bali, Paul Buitelaar
Place of Publication: Dublin, Ireland
Publisher: Association for Computational Linguistics
Pages: 1-12
Number of pages: 12
ISBN (Electronic): 978-1-955917-43-8
Publication status: Published - 3 Jun 2022
Event: 2nd Workshop on Language Technology for Equality, Diversity, Inclusion 2022 - Dublin, Ireland
Duration: 27 May 2022 to 27 May 2022
Conference number: 2
https://sites.google.com/view/lt-edi-2022/home

Workshop

Workshop: 2nd Workshop on Language Technology for Equality, Diversity, Inclusion 2022
Abbreviated title: LT-EDI 2022
Country/Territory: Ireland
City: Dublin
Period: 27/05/22 to 27/05/22
