Leveraging Linguistic Knowledge for Accent Robustness of End-to-End Models

Andrea Carmantini, Steve Renals, Peter Bell

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Acoustic models are susceptible to differences in acoustic characteristics between the training and test distributions. Accent variability is a challenging source of such mismatch, and the variations within one accent often do not generalize to others. Consequently, end-to-end models that have only transcriptions as linguistic information need large amounts of data to learn how different accents realize their sounds. To aid recognition of accented speech, we make use of an accent-independent abstraction of phonemes, often called metaphonemes. We force our models to learn hidden representations that are correlated with metaphonemes using multi-task training. Our aim is to obtain a model that is more robust to accented speech and can, at the same time, adapt faster to different accents through the learned structure. Our experiments on the Common Voice corpus show better generalization when making use of this additional linguistic information, with a word error rate reduction of up to 12.6% compared to the baseline. Furthermore, the relative improvement when adapting an existing model by making use of the metaphonemes is higher than when using Byte Pair Encodings alone.
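
The multi-task idea in the abstract can be illustrated with a minimal sketch. This record does not give the paper's architecture or loss, so everything here is an assumption made for illustration: the per-frame softmax cross-entropy, the linear interpolation of the two losses, and the names `multitask_loss` and `aux_weight` are hypothetical and are not taken from the authors' implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of class `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multitask_loss(bpe_logits, bpe_target, meta_logits, meta_target, aux_weight=0.3):
    """Combine a main loss with an auxiliary metaphoneme loss.

    Assumption: the two tasks are mixed by linear interpolation with a
    hypothetical weight `aux_weight`; the paper may weight them differently.
    """
    main = cross_entropy(bpe_logits, bpe_target)    # main task: BPE output units
    aux = cross_entropy(meta_logits, meta_target)   # auxiliary task: metaphonemes
    return (1.0 - aux_weight) * main + aux_weight * aux


# Toy usage: one frame, three BPE classes and four metaphoneme classes.
loss = multitask_loss([2.0, 0.5, -1.0], 0, [0.1, 1.5, 0.0, -0.3], 1)
print(f"combined loss: {loss:.4f}")
```

In a real system both sets of logits would be computed from shared hidden layers, so the gradient of the auxiliary term is what pushes those layers toward representations correlated with metaphonemes.
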
Original language: English
Title of host publication: Proceedings of the 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Publisher: Institute of Electrical and Electronics Engineers
Pages: 803-810
Number of pages: 8
ISBN (Electronic): 978-1-6654-3739-4, 978-1-6654-3738-7
ISBN (Print): 978-1-6654-3740-0
Publication status: Published - 3 Feb 2022
Event: IEEE Automatic Speech Recognition and Understanding Workshop 2021 - Cartagena, Colombia
Duration: 13 Dec 2021 to 17 Dec 2021
https://asru2021.signalprocessingsociety.org/asru2021.org/index.html

Workshop

Workshop: IEEE Automatic Speech Recognition and Understanding Workshop 2021
Abbreviated title: ASRU 2021
Country/Territory: Colombia
City: Cartagena
Period: 13/12/21 to 17/12/21

Keywords / Materials (for Non-textual outputs)

  • speech recognition
  • acoustic model adaptation
  • accent adaptation
  • end-to-end models
