Speaker-aware training of LSTM-RNNs for acoustic modelling

Tian Tan, Yanmin Qian, Dong Yu, Souvik Kundu, Liang Lu, Xiong Xiao, Khe Chai SIM, Yu Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Long Short-Term Memory (LSTM) is a particular type of recurrent neural network (RNN) that can model long term temporal dynamics. Recently it has been shown that LSTM-RNNs can achieve higher recognition accuracy than deep feed-forword neural networks (DNNs) in acoustic modelling. However, speaker adaption for LSTM-RNN based acoustic models has not been well investigated. In this paper, we study the LSTM-RNN speaker-aware training that incorporates the speaker information during model training to normalise the speaker variability. We first present several speaker-aware training architectures, and then empirically evaluate three types of speaker representation: I-vectors, bottleneck speaker vectors and speaking rate. Furthermore, to factorize the variability in the acoustic signals caused by speakers and phonemes respectively, we investigate the speaker-aware and phone-aware joint training under the framework of multi-task learning. In AMI meeting speech transcription task, speaker-aware training of LSTM-RNNs reduces word error rates by 6.5% relative to a very strong LSTM-RNN baseline, which uses FMLLR features.
Original languageEnglish
Title of host publication2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PublisherInstitute of Electrical and Electronics Engineers (IEEE)
Pages5280-5284
Number of pages5
ISBN (Print)978-1-4799-9988-0
DOIs
Publication statusPublished - Mar 2016
Event41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - China, Shanghai, China
Duration: 20 Mar 201625 Mar 2016
https://www2.securecms.com/ICASSP2016/Default.asp

Conference

Conference41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Abbreviated titleICASSP 2016
Country/TerritoryChina
CityShanghai
Period20/03/1625/03/16
Internet address

Fingerprint

Dive into the research topics of 'Speaker-aware training of LSTM-RNNs for acoustic modelling'. Together they form a unique fingerprint.

Cite this