Untranscribed web audio for low resource speech recognition

Andrea Carmantini, Peter Bell, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Speech recognition models are highly susceptible to mismatch in the acoustic and language domains between the training and the evaluation data. For low resource languages, it is difficult to obtain transcribed speech for target domains, while untranscribed data can be collected with minimal effort. Recently, a method applying lattice-free maximum mutual information (LF-MMI) to untranscribed data has been found to be effective for semi-supervised training. However, weaker initial models and domain mismatch can result in high deletion rates for the semi-supervised model. Therefore, we propose a method to force the base model to overgenerate possible transcriptions, relying on the ability of LF-MMI to deal with uncertainty.

On data from the IARPA MATERIAL programme, our new semi-supervised method outperforms the standard semi-supervised method, yielding significant gains when adapting for mismatched bandwidth and domain.
Original language: English
Title of host publication: Proceedings Interspeech 2019
Publisher: International Speech Communication Association
Number of pages: 5
Publication status: Published - 19 Sep 2019
Event: Interspeech 2019 - Graz, Austria
Duration: 15 Sep 2019 to 19 Sep 2019

Publication series

Publisher: International Speech Communication Association
ISSN (Electronic): 1990-9772


Conference: Interspeech 2019


  • speech recognition
  • semi-supervised training
  • domain adaptation
  • web data
