Leveraging Linguistic Knowledge for Accent Robustness of End-to-End Models

Andrea Carmantini, Steve Renals, Peter Bell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Acoustic models are susceptible to differences in acoustic characteristics between the training and test distributions. Accent is a challenging source of variability, and the variations within one accent often do not generalize to others. Consequently, end-to-end models whose only linguistic information is the transcription need large amounts of data to learn how different accents realize their sounds. To aid recognition of accented speech, we use an accent-independent abstraction of phonemes, often called metaphonemes. Using multi-task training, we force our models to learn hidden representations that are correlated with metaphonemes. Our aim is a model that is more robust to accented speech and that can, at the same time, adapt faster to different accents through the learned structure. Our experiments on the Common Voice corpus show better generalization when making use of this additional linguistic information, with a word error rate reduction of up to 12.6% compared to the baseline. Furthermore, the relative improvement when adapting an existing model is higher with metaphonemes than with Byte Pair Encodings alone.
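As a rough illustration of the multi-task idea mentioned in the abstract, the sketch below interpolates a main transcription loss with an auxiliary metaphoneme loss. It is not taken from the paper: the function names, the interpolation weight `aux_weight`, and the toy probabilities are all assumptions made for this example, and the authors' actual objective may differ.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the target class."""
    return -math.log(probs[target_index])


def multitask_loss(main_loss, aux_loss, aux_weight=0.3):
    """Weighted sum of a main loss and an auxiliary loss.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    if not 0.0 <= aux_weight <= 1.0:
        raise ValueError("aux_weight must be in [0, 1]")
    return (1.0 - aux_weight) * main_loss + aux_weight * aux_loss


# Made-up posteriors for a single output step.
bpe_probs = [0.7, 0.2, 0.1]        # main task: Byte Pair Encoding units
metaphoneme_probs = [0.5, 0.5]     # auxiliary task: metaphoneme classes

main = cross_entropy(bpe_probs, 0)
aux = cross_entropy(metaphoneme_probs, 1)
total = multitask_loss(main, aux, aux_weight=0.3)
```

Setting `aux_weight` to 0 recovers the transcription-only objective, which makes the auxiliary term easy to ablate in this sketch.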
Original language: English
Title of host publication: Proceedings of the 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Number of pages: 8
ISBN (Electronic): 978-1-6654-3739-4, 978-1-6654-3738-7
ISBN (Print): 978-1-6654-3740-0
Publication status: Published - 3 Feb 2022
Event: IEEE Automatic Speech Recognition and Understanding Workshop 2021 - Cartagena, Colombia
Duration: 13 Dec 2021 to 17 Dec 2021


Workshop: IEEE Automatic Speech Recognition and Understanding Workshop 2021
Abbreviated title: ASRU 2021


  • speech recognition
  • acoustic model adaptation
  • accent adaptation
  • end-to-end models

