Untranscribed web audio for low resource speech recognition

Andrea Carmantini, Peter Bell, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Speech recognition models are highly susceptible to mismatch in the acoustic and language domains between the training and the evaluation data. For low resource languages, it is difficult to obtain transcribed speech for target domains, while untranscribed data can be collected with minimal effort. Recently, a method applying lattice-free maximum mutual information (LF-MMI) to untranscribed data has been found to be effective for semi-supervised training. However, weaker initial models and domain mismatch can result in high deletion rates for the semi-supervised model. Therefore, we propose a method to force the base model to overgenerate possible transcriptions, relying on the ability of LF-MMI to deal with uncertainty.

On data from the IARPA MATERIAL programme, our new semi-supervised method outperforms the standard semi-supervised method, yielding significant gains when adapting for mismatched bandwidth and domain.
Original language: English
Title of host publication: Proceedings Interspeech 2019
Publisher: International Speech Communication Association
Pages: 226-230
Number of pages: 5
Publication status: Published - 19 Sept 2019
Event: Interspeech 2019 - Graz, Austria
Duration: 15 Sept 2019 to 19 Sept 2019
https://www.interspeech2019.org/

Publication series

Publisher: International Speech Communication Association
ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2019
Country/Territory: Austria
City: Graz
Period: 15/09/19 to 19/09/19
Keywords / Materials (for Non-textual outputs)

  • speech recognition
  • semi-supervised training
  • domain adaptation
  • web data
