Abstract / Description of output
In the broadcast domain there is an abundance of related text data and partial transcriptions, such as closed captions and subtitles. This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model. Current approaches to light supervision typically filter the data based on matching error rates between the transcriptions and biased decoding hypotheses. In contrast, semi-supervised training does not require matching text data, instead generating a hypothesis using a background language model. State-of-the-art semi-supervised training uses lattice-based supervision with the lattice-free MMI (LF-MMI) objective function. We propose a technique to combine inaccurate transcriptions with the lattices generated for semi-supervised training, thus preserving uncertainty in the lattice where appropriate. We demonstrate that this combined approach reduces the expected error rates over the lattices, and reduces the word error rate (WER) on a broadcast task.
Original language | English |
---|---|
Title of host publication | Proceedings Interspeech 2019 |
Publisher | International Speech Communication Association |
Pages | 1596-1600 |
Number of pages | 5 |
Publication status | Published - 19 Sept 2019 |
Event | Interspeech 2019 - Graz, Austria Duration: 15 Sept 2019 → 19 Sept 2019 https://www.interspeech2019.org/ |
Publication series
Name | |
---|---|
Publisher | International Speech Communication Association |
ISSN (Electronic) | 1990-9772 |
Conference
Conference | Interspeech 2019 |
---|---|
Country/Territory | Austria |
City | Graz |
Period | 15/09/19 → 19/09/19 |
Keywords / Materials (for Non-textual outputs)
- Automatic speech recognition
- lightly supervised training
- LF-MMI
- broadcast media
Fingerprint
Dive into the research topics of 'Lattice-based lightly-supervised acoustic model training'. Together they form a unique fingerprint.
Projects
- 3 Finished
- SUMMA - Scalable Understanding of Multilingual Media
  Renals, S., Birch-Mayne, A. & Cohen, S.
  1/02/16 → 31/01/19
  Project: Research
- Multi-domain speech recognition
  Non-EU industry, commerce and public corporations
  1/09/15 → 28/02/19
  Project: Research