Abstract / Description of output
Acoustic models are sensitive to differences in acoustic characteristics between the training and test distributions. Accent is a particularly challenging source of variability, and the variations within one accent often do not generalize to others. Consequently, end-to-end models that have only transcriptions as linguistic information need large amounts of data to learn how different accents realize their sounds. To aid the recognition of accented speech, we make use of an accent-independent abstraction of phonemes, often called metaphonemes. We use multi-task training to force our models to learn hidden representations that are correlated with metaphonemes. Our aim is to obtain a model that is more robust to accented speech and can, at the same time, adapt faster to different accents through the learned structure. Our experiments on the Common Voice corpus show better generalization when making use of this additional linguistic information, with a word error rate reduction of up to 12.6% compared to the baseline. Furthermore, the relative improvement when adapting an existing model by making use of the metaphonemes is higher than when using Byte Pair Encodings alone.
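The multi-task training mentioned in the abstract can be illustrated, under assumptions, as a weighted combination of a primary transcription loss and an auxiliary metaphoneme loss. The abstract does not say how the two objectives are combined, so the linear interpolation, the function name `multitask_loss`, and the `aux_weight` parameter below are hypothetical and are not taken from the paper.

```python
def multitask_loss(primary_loss: float, metaphoneme_loss: float,
                   aux_weight: float = 0.3) -> float:
    """Interpolate a primary loss with an auxiliary metaphoneme loss.

    Assumption: a simple linear interpolation. The weighting scheme
    actually used in the paper is not stated in the abstract.
    """
    if not 0.0 <= aux_weight <= 1.0:
        raise ValueError("aux_weight must lie in [0, 1]")
    # aux_weight = 0 recovers single-task training on the primary target.
    return (1.0 - aux_weight) * primary_loss + aux_weight * metaphoneme_loss


# Example with made-up loss values for one training batch.
combined = multitask_loss(primary_loss=2.0, metaphoneme_loss=4.0, aux_weight=0.25)
```

In a sketch like this, the auxiliary term is what pushes the shared hidden layers toward representations that correlate with metaphonemes, while the primary term still drives the transcription output.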
Original language | English |
---|---|
Title of host publication | Proceedings of the 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 803-810 |
Number of pages | 8 |
ISBN (Electronic) | 978-1-6654-3739-4, 978-1-6654-3738-7 |
ISBN (Print) | 978-1-6654-3740-0 |
Publication status | Published - 3 Feb 2022 |
Event | IEEE Automatic Speech Recognition and Understanding Workshop 2021, Cartagena, Colombia. Duration: 13 Dec 2021 → 17 Dec 2021. https://asru2021.signalprocessingsociety.org/asru2021.org/index.html |
Workshop
Workshop | IEEE Automatic Speech Recognition and Understanding Workshop 2021 |
---|---|
Abbreviated title | ASRU 2021 |
Country/Territory | Colombia |
City | Cartagena |
Period | 13/12/21 → 17/12/21 |
Internet address | https://asru2021.signalprocessingsociety.org/asru2021.org/index.html |
Keywords / Materials (for Non-textual outputs)
- speech recognition
- acoustic model adaptation
- accent adaptation
- end-to-end models