Abstract
This paper seeks to improve the performance of automatic speech recognition (ASR) systems on code-switched speech. Code-switching refers to the alternation of languages within a conversation, a phenomenon of growing importance given the rapid rise in the number of bilingual speakers worldwide. It is particularly challenging for ASR because code-switched speech and text data are scarce, even when each of the individual languages is well-resourced. This paper proposes to address this challenge by applying linguistic theories to generate more realistic code-switched text, which is needed for language modelling in ASR. Working with English-Spanish code-switching, we find that Equivalence Constraint theory and part-of-speech labelling are particularly helpful for text generation, and yield a 2% improvement in ASR performance.
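Equivalence Constraint theory holds that a switch between languages tends to occur only where the surface word order of the two languages coincides. As an illustration only, the Python sketch below approximates this with word alignments between a parallel English-Spanish sentence pair: a boundary is treated as a permissible switch point if no alignment link crosses it. The alignment format, the function names `ec_switch_points` and `generate_switched`, and the example sentence are hypothetical. This is a simplified alignment-based approximation, not the generation procedure used in the paper, and it ignores part-of-speech information and switches in the Spanish-to-English direction.

```python
from typing import List, Set, Tuple

# Hypothetical alignment format: (english_index, spanish_index) word-alignment links.
Alignment = Set[Tuple[int, int]]


def ec_switch_points(alignment: Alignment, len_src: int, len_tgt: int) -> List[Tuple[int, int]]:
    """Return boundaries (i, j) that no alignment link crosses.

    A boundary after the first i English words and the first j Spanish words is
    kept only if every link from the English prefix lands in the Spanish prefix
    and every link from the English suffix lands in the Spanish suffix.
    Boundaries at the very start or end are skipped (they give monolingual text).
    """
    points = []
    for i in range(1, len_src):
        for j in range(1, len_tgt):
            if all((s < i) == (t < j) for s, t in alignment):
                points.append((i, j))
    return points


def generate_switched(src: List[str], tgt: List[str], alignment: Alignment) -> List[str]:
    """Build English-prefix + Spanish-suffix sentences at each permissible boundary."""
    return [
        " ".join(src[:i] + tgt[j:])
        for i, j in ec_switch_points(alignment, len(src), len(tgt))
    ]


if __name__ == "__main__":
    english = "I saw the white house".split()
    spanish = "yo vi la casa blanca".split()
    # white <-> blanca and house <-> casa cross each other (adjective order differs).
    links = {(0, 0), (1, 1), (2, 2), (3, 4), (4, 3)}
    for sentence in generate_switched(english, spanish, links):
        print(sentence)
```

Because English places the adjective before the noun ("white house") while Spanish places it after ("casa blanca"), the boundary between "white" and "house" is rejected, so a sentence such as "I saw the white casa" is never generated; only switches such as "I saw the casa blanca" survive.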
Original language | English |
---|---|
Title of host publication | Proceedings of the 29th International Conference on Computational Linguistics |
Editors | Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na |
Publisher | ACL Anthology |
Pages | 7171-7176 |
Number of pages | 6 |
Volume | 29 |
Publication status | Published - 3 Nov 2022 |
Event | The 29th International Conference on Computational Linguistics, 2022 (Gyeongju, Korea, Republic of); Duration: 12 Oct 2022 → 17 Oct 2022; Conference number: 29; https://coling2022.org/ |
Publication series
Name | COLING |
---|---|
Publisher | ACL Anthology |
Number | 1 |
Volume | 29 |
ISSN (Electronic) | 2591-2093 |
Conference
Conference | The 29th International Conference on Computational Linguistics, 2022 |
---|---|
Abbreviated title | COLING 2022 |
Country/Territory | Korea, Republic of |
City | Gyeongju |
Period | 12/10/22 → 17/10/22 |
Internet address | https://coling2022.org/ |