The CSTR System for Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages

Ondřej Klejch, Electra Wallington, Peter Bell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes the CSTR submission to the Multilingual and Code-Switching ASR Challenges at Interspeech 2021. For the multilingual track of the challenge, we trained a multilingual CNN-TDNN acoustic model for Gujarati, Hindi, Marathi, Odia, Tamil and Telugu and subsequently fine-tuned the model on monolingual training data. A language model built on a mixture of training and CommonCrawl data was used for decoding. We also demonstrate that data crawled from YouTube can be used successfully to improve the acoustic model through semi-supervised training. These models, together with confidence-based language identification, achieve an average WER of 18.1%, a 41% relative improvement over the provided multilingual baseline model. For the code-switching track of the challenge, we again trained a multilingual model, this time on Bengali and Hindi technical lectures, and employed a language model trained on CommonCrawl Bengali and Hindi data mixed with in-domain English data, using a novel transliteration method to generate pronunciations for the English terms. The final model improves by 18% and 34% relative compared to our multilingual baseline. Both our systems were among the top-ranked entries to the challenge.
Original language: English
Title of host publication: Proceedings of Interspeech 2021
Publisher: International Speech Communication Association
Pages: 2881-2885
Number of pages: 5
Publication status: Published - 30 Aug 2021
Event: Interspeech 2021: The 22nd Annual Conference of the International Speech Communication Association - Brno, Czech Republic
Duration: 30 Aug 2021 to 3 Sept 2021
Conference number: 22
https://www.interspeech2021.org

Publication series

ISSN (Print): 1990-9772

Conference

Conference: Interspeech 2021
Country/Territory: Czech Republic
City: Brno
Period: 30/08/21 to 3/09/21

Keywords / Materials (for Non-textual outputs)

  • low-resource speech recognition
  • multilingual speech recognition
  • code switching
