We have previously found that context-dependent DNN models for automatic speech recognition can be improved with the use of monophone targets as a secondary task for the network. This paper asks whether the improvements derive from the regularising effect of having a much smaller number of monophone outputs, compared to the typical number of tied states, or from the use of targets that are not tied to an arbitrary state clustering. We investigate the use of factorised targets for left and right context, and targets motivated by articulatory properties of the phonemes. We present results on a large-vocabulary lecture recognition task. Although the regularising effect of monophones seems to be important, all schemes give substantial improvements over the baseline single-task system, even though the cardinality of the outputs is relatively high.
|Title of host publication||INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association|
|Number of pages||5|
|Publication status||Published - 11 Sep 2015|