Language bias in self-supervised learning for automatic speech recognition

Edward Storey, Naomi Harte, Peter Bell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive data labelling. Recently, large Automatic Speech Recognition (ASR) models such as XLS-R have used SSL to train on over one hundred languages simultaneously. However, closer investigation shows that the bulk of the training data for XLS-R comes from a small number of languages. Biases learned through SSL have been shown to exist in multiple domains, but language bias in multilingual SSL ASR has not been thoroughly examined. In this paper, we use the Lottery Ticket Hypothesis (LTH) to identify language-specific subnetworks within XLS-R and test the performance of these subnetworks on a variety of languages. We show that, when fine-tuning, XLS-R bypasses traditional linguistic knowledge and builds only on weights learned from the languages that contribute the most data to pre-training.
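The subnetwork-identification step described above rests on the Lottery Ticket Hypothesis, which typically extracts a subnetwork by pruning the smallest-magnitude weights and keeping a binary mask over the rest. A minimal, illustrative sketch of that magnitude-pruning step (assumed mechanics only; XLS-R itself is a large transformer model, and the function and variable names here are hypothetical, not from the paper):

```python
import numpy as np

def magnitude_prune_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a binary mask keeping the largest-magnitude weights.

    In LTH-style pruning, weights below the magnitude threshold are
    zeroed out; the surviving weights define the subnetwork.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to prune
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    # k-th smallest magnitude acts as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.abs(weights) > threshold

# Toy example: prune 50% of a small random weight matrix
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
mask = magnitude_prune_mask(w, 0.5)
subnetwork = w * mask  # surviving weights form the candidate subnetwork
```

In the language-bias setting studied here, such a mask would be derived per fine-tuning language, so that the masks themselves can be compared to see which pre-training languages' weights each subnetwork relies on.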
Original language: English
Title of host publication: Proceedings of the 2024 IEEE Spoken Language Technology Workshop
Publisher: Institute of Electrical and Electronics Engineers
Pages: 1-6
Number of pages: 6
Publication status: Accepted/In press - 30 Aug 2024
Event: IEEE Spoken Language Technology Workshop 2024 - Banyan Tree Macau, Macau, China
Duration: 2 Dec 2024 - 5 Dec 2024
https://2024.ieeeslt.org

Publication series

Name: Proceedings of the IEEE Spoken Language Technology Workshop
Publisher: IEEE
ISSN (Print): 2639-5479

Conference

Conference: IEEE Spoken Language Technology Workshop 2024
Abbreviated title: SLT 2024
Country/Territory: China
City: Macau
Period: 2/12/24 - 5/12/24
Internet address: https://2024.ieeeslt.org

Keywords / Materials (for Non-textual outputs)

  • speech recognition
  • self-supervised learning
  • language bias
  • language-specific subnetworks
  • model pruning

