Abstract / Description of output
Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness of the generated audio using an overlap-add approach. We investigate the impact of generated data on speech recognition in two scenarios: using in-domain CS text and a zero-shot approach with synthesized CS text. Empirical results highlight up to 34.4% and 16.2% relative reductions in Mixed-Error Rate and Word-Error Rate for in-domain and zero-shot scenarios, respectively. Lastly, we demonstrate that CS augmentation bolsters the model’s code-switching inclination and reduces its monolingual bias.
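The splice-and-smooth idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `splice_overlap_add`, the raised-cosine cross-fade, and the `overlap` parameter are assumptions chosen for illustration, and the segments are assumed to be pre-cut, single-channel arrays sharing one sample rate.

```python
import numpy as np


def splice_overlap_add(segments, overlap):
    """Join 1-D audio segments, cross-fading `overlap` samples at each joint.

    Hypothetical helper for illustration only; the paper's actual
    segment selection and smoothing details are not reproduced here.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    segments = [np.asarray(s, dtype=np.float64) for s in segments]
    if not segments:
        raise ValueError("need at least one segment")

    out = segments[0].copy()
    for seg in segments[1:]:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(seg))
        if n == 0:
            out = np.concatenate([out, seg])
            continue
        # Raised-cosine weights; fade_in + fade_out == 1 at every sample,
        # so a constant signal passes through the joint unchanged.
        t = (np.arange(n) + 0.5) / n
        fade_in = np.sin(0.5 * np.pi * t) ** 2
        fade_out = 1.0 - fade_in
        mixed = out[-n:] * fade_out + seg[:n] * fade_in
        out = np.concatenate([out[:-n], mixed, seg[n:]])
    return out


if __name__ == "__main__":
    sr = 16000  # assumed sample rate
    time = np.arange(sr // 4) / sr
    seg_a = np.sin(2 * np.pi * 220 * time)  # stand-in for a segment from language A
    seg_b = np.sin(2 * np.pi * 330 * time)  # stand-in for a segment from language B
    collage = splice_overlap_add([seg_a, seg_b], overlap=160)
    print(len(collage))
```

Each joint shortens the output by the number of overlapped samples, so a longer overlap trades a little duration for a smoother transition between segments.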
Original language | English |
---|---|
Title of host publication | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 12006-12010 |
Number of pages | 5 |
ISBN (Electronic) | 979-8-3503-4485-1 |
Publication status | Published - 18 Mar 2024 |
Event | 2024 IEEE International Conference on Acoustics, Speech and Signal Processing - Seoul, Korea, Republic of; Duration: 14 Apr 2024 → 19 Apr 2024; https://2024.ieeeicassp.org/ |
Publication series
Name | International Conference on Acoustics, Speech, and Signal Processing proceedings |
---|---|
Publisher | IEEE |
ISSN (Print) | 1520-6149 |
ISSN (Electronic) | 2379-190X |
Conference
Conference | 2024 IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Abbreviated title | ICASSP 2024 |
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 14/04/24 → 19/04/24 |
Keywords / Materials (for Non-textual outputs)
- Code-switching
- ASR
- data augmentation
- end-to-end
- zero-shot learning
Fingerprint
Dive into the research topics of 'SPEECH COLLAGE: CODE-SWITCHED AUDIO GENERATION BY COLLAGING MONOLINGUAL CORPORA'. Together they form a unique fingerprint.
Projects
Unmute : Opening Spoken Language Interaction to the Currently Unheard
Bell, P., Goldwater, S. & Renals, S.
1/12/20 → 30/11/23
Project: Research