Edinburgh Research Explorer

Complementary tasks for context-dependent deep neural network acoustic models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: INTERSPEECH 2015 – 16th Annual Conference of the International Speech Communication Association
Pages: 3610-3614
Number of pages: 5
Publication status: Published - 11 Sep 2015

Abstract

We have previously found that context-dependent DNN acoustic models for automatic speech recognition can be improved by using monophone targets as a secondary task for the network. This paper asks whether the improvements derive from the regularising effect of having a much smaller number of monophone outputs – compared to the typical number of tied states – or from the use of targets that are not tied to an arbitrary state clustering. We investigate the use of factorised targets for left and right context, and targets motivated by articulatory properties of the phonemes. We present results on a large-vocabulary lecture recognition task. Although the regularising effect of monophones appears to be important, all schemes give substantial improvements over the baseline single-task system, even though the cardinality of the outputs is relatively high.
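The secondary-task setup described in the abstract can be illustrated with a minimal sketch: a shared hidden representation feeds two softmax output heads (primary tied-state targets and secondary monophone targets), and the training loss is a weighted sum of the two cross-entropies. All sizes (`n_states`, `n_mono`), the weight `lam`, and the variable names below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # numerically stable row-wise softmax
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    # mean negative log-likelihood of the correct class per frame
    n = probs.shape[0]
    return -np.log(probs[np.arange(n), targets] + 1e-12).mean()

# Illustrative sizes: many tied states (primary task), few monophones (secondary)
n_hidden, n_states, n_mono = 128, 2000, 40

# shared hidden-layer output for a minibatch of 8 frames
h = rng.standard_normal((8, n_hidden))

# two output heads attached to the same shared representation
W_state = rng.standard_normal((n_hidden, n_states)) * 0.01
W_mono = rng.standard_normal((n_hidden, n_mono)) * 0.01

# per-frame targets for each task
state_targets = rng.integers(0, n_states, size=8)
mono_targets = rng.integers(0, n_mono, size=8)

lam = 0.3  # weight on the secondary (monophone) task; a tunable assumption

loss_main = cross_entropy(softmax(h @ W_state), state_targets)
loss_aux = cross_entropy(softmax(h @ W_mono), mono_targets)
loss = loss_main + lam * loss_aux  # combined multi-task objective
```

In this framing, the paper's question is whether the benefit comes from the small cardinality of the auxiliary head (a regularising effect) or from the nature of the auxiliary targets themselves, which the factorised-context and articulatory target sets probe by keeping the cardinality relatively high.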

