Complementary tasks for context-dependent deep neural network acoustic models

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

We have previously found that context-dependent DNN models for automatic speech recognition can be improved with the use of monophone targets as a secondary task for the network. This paper asks whether the improvements derive from the regularising effect of having a much smaller number of monophone outputs, compared to the typical number of tied states, or from the use of targets that are not tied to an arbitrary state clustering. We investigate the use of factorised targets for left and right context, and targets motivated by articulatory properties of the phonemes. We present results on a large-vocabulary lecture recognition task. Although the regularising effect of monophones seems to be important, all schemes give substantial improvements over the baseline single-task system, even though the cardinality of the outputs is relatively high.
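The multi-task setup described above can be illustrated with a small sketch: one shared hidden layer feeds two softmax output layers, one over tied context-dependent states and one over monophones, and the training loss is a weighted sum of the two cross-entropies. This is only a toy illustration and not the paper's implementation. The layer sizes, the ReLU activation, the `mono_weight` value, and all function names are assumptions made for the example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def multitask_loss(params, features, state_target, mono_target,
                   mono_weight=0.5):
    """Cross-entropy on tied states plus weighted cross-entropy on monophones.

    Both output layers read the same hidden representation, so gradients
    from the secondary (monophone) task would also shape the shared layer.
    """
    hidden = [max(0.0, h) for h in linear(*params["shared"], features)]  # ReLU: assumption
    state_post = softmax(linear(*params["state_head"], hidden))
    mono_post = softmax(linear(*params["mono_head"], hidden))
    state_ce = -math.log(state_post[state_target])
    mono_ce = -math.log(mono_post[mono_target])
    return state_ce + mono_weight * mono_ce, state_ce, mono_ce


# Toy dimensions, chosen only for illustration.
N_FEATS, N_HIDDEN, N_STATES, N_MONO = 8, 16, 50, 10

rng = random.Random(0)
params = {
    "shared": init_layer(rng, N_HIDDEN, N_FEATS),
    "state_head": init_layer(rng, N_STATES, N_HIDDEN),
    "mono_head": init_layer(rng, N_MONO, N_HIDDEN),
}
features = [rng.gauss(0.0, 1.0) for _ in range(N_FEATS)]
total, state_ce, mono_ce = multitask_loss(
    params, features, state_target=3, mono_target=1)
```

At recognition time only the tied-state output layer would normally be used; the monophone layer exists to influence the shared layer during training. The other secondary targets mentioned in the abstract (factorised left and right context, articulatory classes) would correspond to swapping or adding output layers in the same pattern.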
Original language: English
Title of host publication: INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association
Number of pages: 5
Publication status: Published - 11 Sept 2015

