Edinburgh Research Explorer

Lattice-based lightly-supervised acoustic model training

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: Proceedings Interspeech 2019
Publisher: International Speech Communication Association
Pages: 1596-1600
Number of pages: 5
DOIs
Publication status: Published - 19 Sep 2019
Event: Interspeech 2019 - Graz, Austria
Duration: 15 Sep 2019 - 19 Sep 2019
https://www.interspeech2019.org/

Publication series

Name
Publisher: International Speech Communication Association
ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2019
Country: Austria
City: Graz
Period: 15/09/19 - 19/09/19
Internet address

Abstract

In the broadcast domain there is an abundance of related text data and partial transcriptions, such as closed captions and subtitles. This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model. Current approaches to light supervision typically filter the data based on matching error rates between the transcriptions and biased decoding hypotheses. In contrast, semi-supervised training does not require matching text data, instead generating a hypothesis using a background language model. State-of-the-art semi-supervised training uses lattice-based supervision with the lattice-free MMI (LF-MMI) objective function. We propose a technique to combine inaccurate transcriptions with the lattices generated for semi-supervised training, thus preserving uncertainty in the lattice where appropriate. We demonstrate that this combined approach reduces the expected error rates over the lattices, and reduces the word error rate (WER) on a broadcast task.
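To make the central idea concrete, the toy Python sketch below re-weights a drastically simplified "lattice" (a list of scored hypotheses) so that paths agreeing with an inaccurate subtitle transcript are boosted while competing paths are kept, preserving uncertainty. This is only an illustration of the concept under stated assumptions: the actual system operates on word/phone lattices with the LF-MMI objective, and the function names, scoring scheme, and boost parameter here are hypothetical.

```python
# Hypothetical, simplified illustration of combining an inaccurate transcript
# with a decoded lattice while preserving the lattice's uncertainty.
# A real lattice is an FST over words/phones; here it is just a list of
# (word sequence, log-score) pairs.

from typing import List, Tuple


def word_error_rate(ref: List[str], hyp: List[str]) -> float:
    """Levenshtein distance between word sequences, normalised by reference length."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


def combine_supervision(lattice: List[Tuple[List[str], float]],
                        transcript: List[str],
                        boost: float = 2.0) -> List[Tuple[List[str], float]]:
    """Boost lattice paths that agree with the (possibly inaccurate) transcript.
    No path is removed, so alternatives remain available where the transcript
    may be wrong; 'boost' is an illustrative interpolation weight."""
    combined = []
    for words, log_score in lattice:
        agreement = 1.0 - word_error_rate(transcript, words)
        combined.append((words, log_score + boost * agreement))
    return combined


if __name__ == "__main__":
    # Toy decoded lattice: two competing hypotheses with log-scores.
    lattice = [(["the", "cat", "sat"], -1.2),
               (["the", "cat", "sang"], -1.0)]
    subtitle = ["the", "cat", "sat"]  # inaccurate/partial transcription
    for words, score in combine_supervision(lattice, subtitle):
        print(" ".join(words), round(score, 2))
```

In this sketch the hypothesis matching the subtitle ends up with the higher combined score, but the alternative hypothesis is retained with a competitive score, which is the sense in which uncertainty is preserved.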

Research areas

  • Automatic speech recognition, lightly supervised training, LF-MMI, broadcast media


