Multitask Learning of Context-Dependent Targets in Deep Neural Network Acoustic Models

Peter Bell, Pawel Swietojanski, Steve Renals

Research output: Contribution to journal › Article › peer-review


This paper investigates the use of multitask learning to improve context-dependent deep neural network (DNN) acoustic models. The use of hybrid DNN systems with clustered triphone targets is now standard in automatic speech recognition. However, we suggest that using a single set of DNN targets in this manner may not be the most effective choice, since the targets are the result of a somewhat arbitrary clustering process that may not be optimal for discrimination. We propose to remedy this problem through the addition of secondary tasks predicting alternative context-dependent or context-independent targets. We present a comprehensive set of experiments on a lecture recognition task showing that DNNs trained through multitask learning in this manner give consistently improved performance compared to standard hybrid DNNs. The technique is evaluated across a range of training data sizes and output layer sizes. Improvements are seen both when training uses the cross-entropy criterion and when sequence training is applied.
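The general multitask idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. A shared hidden layer feeds two softmax output layers: a primary head for clustered triphone (context-dependent) targets and a secondary head for an alternative target set, here assumed to be monophones. Training minimises a weighted sum of the two cross-entropy losses, so gradients from both tasks update the shared layer. The single ReLU hidden layer, the layer sizes, the weight `alpha`, the random stand-in labels, and every function name are hypothetical assumptions. The sketch covers cross-entropy training only, not sequence training.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_params(rng, n_in, n_hidden, n_primary, n_secondary):
    # Hypothetical architecture: one shared hidden layer plus two task-specific heads.
    return {
        "W_h": rng.normal(0, 0.1, (n_in, n_hidden)), "b_h": np.zeros(n_hidden),
        "W_p": rng.normal(0, 0.1, (n_hidden, n_primary)), "b_p": np.zeros(n_primary),
        "W_s": rng.normal(0, 0.1, (n_hidden, n_secondary)), "b_s": np.zeros(n_secondary),
    }

def forward(params, x):
    h = np.maximum(0.0, x @ params["W_h"] + params["b_h"])  # shared ReLU layer (assumption)
    p_primary = softmax(h @ params["W_p"] + params["b_p"])    # primary task: CD targets
    p_secondary = softmax(h @ params["W_s"] + params["b_s"])  # secondary task: alternative targets
    return h, p_primary, p_secondary

def cross_entropy(p, y):
    # Mean negative log-probability of the reference class per frame.
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

def multitask_step(params, x, y_primary, y_secondary, alpha=1.0, lr=0.1):
    """One gradient step on L = CE_primary + alpha * CE_secondary; returns L before the update."""
    h, pp, ps = forward(params, x)
    loss = cross_entropy(pp, y_primary) + alpha * cross_entropy(ps, y_secondary)
    n = len(x)
    # Gradient of mean softmax cross-entropy w.r.t. the logits is (p - onehot) / n.
    gp = pp.copy(); gp[np.arange(n), y_primary] -= 1.0; gp /= n
    gs = ps.copy(); gs[np.arange(n), y_secondary] -= 1.0; gs *= alpha / n
    # Both heads backpropagate into the shared layer; this is where the tasks interact.
    gh = (gp @ params["W_p"].T + gs @ params["W_s"].T) * (h > 0)
    grads = {
        "W_p": h.T @ gp, "b_p": gp.sum(axis=0),
        "W_s": h.T @ gs, "b_s": gs.sum(axis=0),
        "W_h": x.T @ gh, "b_h": gh.sum(axis=0),
    }
    for k, g in grads.items():
        params[k] -= lr * g
    return loss

# Usage with random stand-in data (hypothetical sizes and labels, not real alignments).
rng = np.random.default_rng(0)
params = init_params(rng, n_in=40, n_hidden=64, n_primary=200, n_secondary=40)
x = rng.normal(size=(32, 40))
y_cd = rng.integers(0, 200, size=32)  # stand-in clustered triphone labels
y_ci = rng.integers(0, 40, size=32)   # stand-in monophone labels
losses = [multitask_step(params, x, y_cd, y_ci, alpha=0.5) for _ in range(50)]

# At decoding time only the primary (context-dependent) head would be used.
_, p_cd, _ = forward(params, x)
```

Under this setup the secondary head acts only as a training-time regulariser on the shared representation. It can be discarded after training, leaving a standard hybrid DNN for decoding.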
Original language: English
Pages (from-to): 238-247
Number of pages: 10
Journal: IEEE/ACM Transactions on Audio, Speech and Language Processing
Issue number: 2
Early online date: 17 Nov 2016
Publication status: Published - 1 Feb 2017

