Edinburgh Research Explorer

Multitask Learning of Context-Dependent Targets in Deep Neural Network Acoustic Models

Research output: Contribution to journal › Article

Original language: English
Pages (from-to): 238-247
Number of pages: 10
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Issue number: 2
Early online date: 17 Nov 2016
Publication status: Published - Feb 2017


This paper investigates the use of multitask learning to improve context-dependent deep neural network (DNN) acoustic models. The use of hybrid DNN systems with clustered triphone targets is now standard in automatic speech recognition. However, we suggest that using a single set of DNN targets in this manner may not be the most effective choice, since the targets are the result of a somewhat arbitrary clustering process that may not be optimal for discrimination. We propose to remedy this problem through the addition of secondary tasks predicting alternative context-dependent or context-independent targets. We present a comprehensive set of experiments on a lecture recognition task showing that DNNs trained through multitask learning in this manner give consistently improved performance compared to standard hybrid DNNs. The technique is evaluated across a range of data and output sizes. Improvements are seen when training uses the cross-entropy criterion and also when sequence training is applied.
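The multitask setup described above can be illustrated with a minimal sketch: a shared hidden representation feeds two softmax output heads, a primary head over clustered-triphone (tied-state) targets and a secondary head over context-independent targets, with the two cross-entropy losses combined by an interpolation weight. All sizes, the weight `lam`, and the target indices below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_tied, n_mono = 8, 12, 4  # assumed toy dimensions

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, target):
    """Negative log-probability of the correct class."""
    return -np.log(probs[target])

# Shared representation for one frame (stands in for the DNN's top hidden layer)
h = rng.standard_normal(n_hidden)

# Separate output weights for the primary and secondary tasks
W_tied = rng.standard_normal((n_hidden, n_tied))
W_mono = rng.standard_normal((n_hidden, n_mono))

p_tied = softmax(h @ W_tied)  # primary: clustered-triphone posteriors
p_mono = softmax(h @ W_mono)  # secondary: context-independent posteriors

lam = 0.3  # secondary-task interpolation weight (hypothetical value)
loss = cross_entropy(p_tied, 5) + lam * cross_entropy(p_mono, 1)
```

At recognition time only the primary head would be used; the secondary head exists solely to regularize the shared layers during training.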

