SPEECH COLLAGE: CODE-SWITCHED AUDIO GENERATION BY COLLAGING MONOLINGUAL CORPORA

Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness of the generated audio using an overlap-add approach. We investigate the impact of generated data on speech recognition in two scenarios: using in-domain CS text and a zero-shot approach with synthesized CS text. Empirical results highlight up to 34.4% and 16.2% relative reductions in Mixed-Error Rate and Word-Error Rate for in-domain and zero-shot scenarios, respectively. Lastly, we demonstrate that CS augmentation bolsters the model’s code-switching inclination and reduces its monolingual bias.
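The splicing and overlap-add steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the window shape, overlap length, or how segments are selected, so the linear cross-fade, the `overlap` parameter, and the function name `splice_overlap_add` are all assumptions made for illustration.

```python
from typing import List, Sequence


def splice_overlap_add(segments: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Concatenate audio segments, cross-fading `overlap` samples at each join.

    Hypothetical helper: a generic linear cross-fade, assumed here as one
    simple form of overlap-add; the paper's exact scheme may differ.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    out: List[float] = []
    for seg in segments:
        seg = list(seg)
        # Limit the cross-fade to what both sides can actually supply.
        n = min(overlap, len(out), len(seg))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight for the incoming segment
            j = len(out) - n + i   # matching sample in the tail of the output
            # Weights (1 - w) and w sum to one, so a constant signal is preserved.
            out[j] = (1.0 - w) * out[j] + w * seg[i]
        # Samples after the overlapped region are appended unchanged.
        out.extend(seg[n:])
    return out
```

With two segments of lengths 4 and 4 and `overlap=2`, the result has 6 samples: the last two samples of the first segment are blended with the first two of the second, which avoids the abrupt discontinuity that plain concatenation would leave at the join.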
Original language: English
Title of host publication: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Pages: 1-5
Number of pages: 5
Publication status: Accepted/In press - 13 Dec 2023
Event: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024
https://2024.ieeeicassp.org/

Publication series

Name: International Conference on Acoustics, Speech, and Signal Processing proceedings
Publisher: IEEE
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/24 to 19/04/24
Keywords / Materials (for Non-textual outputs)

  • Code-switching
  • ASR
  • data augmentation
  • end-to-end
  • zero-shot learning
