Abstract
We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). In the MDM case we explore a beamformed signal input representation compared with the direct use of multiple acoustic channels as a parallel input to the CNN. We have explored different weight sharing approaches, and propose a channel-wise convolution with two-way pooling. Our experiments, using the AMI meeting corpus, found that CNNs improve the word error rate (WER) by 6.5% relative compared to conventional deep neural network (DNN) models and 15.7% over a discriminatively trained Gaussian mixture model (GMM) baseline. For cross-channel CNN training, the WER improves by 3.5% relative over the comparable DNN structure. Compared with the best beamformed GMM system, cross-channel convolution reduces the WER by 9.7% relative, and matches the accuracy of a beamformed DNN.
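To make the channel-wise convolution with two-way pooling concrete, here is a minimal sketch, not the authors' implementation: the module name `ChannelWiseConv`, the layer sizes, and the filter-bank input shape are illustrative assumptions. The idea shown is that one set of filters is applied to every microphone channel independently, and max-pooling is then performed both along frequency (within a channel) and across the microphone channels.

```python
# Hedged sketch of channel-wise convolution with two-way pooling over
# multi-microphone filter-bank input. Assumed input shape:
# (batch, num_mic_channels, num_freq_bands, num_frames).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelWiseConv(nn.Module):
    def __init__(self, num_maps=128, filt_size=5, freq_pool=3):
        super().__init__()
        # One set of filters, applied independently to each microphone channel
        # (weights are shared across channels).
        self.conv = nn.Conv2d(in_channels=1, out_channels=num_maps,
                              kernel_size=(filt_size, 1))
        self.freq_pool = freq_pool

    def forward(self, x):
        # x: (batch, mic_channels, freq_bands, frames)
        b, c, f, t = x.shape
        # Fold microphone channels into the batch dimension so the same
        # filters convolve each channel separately (channel-wise convolution).
        y = self.conv(x.reshape(b * c, 1, f, t))              # (b*c, maps, f', t)
        # First pooling direction: max-pool along frequency within a channel.
        y = F.max_pool2d(y, kernel_size=(self.freq_pool, 1))  # (b*c, maps, f'', t)
        y = y.reshape(b, c, y.size(1), y.size(2), y.size(3))
        # Second pooling direction: max-pool across microphone channels.
        y = y.max(dim=1).values                               # (b, maps, f'', t)
        return y


if __name__ == "__main__":
    # 4 context windows, 8 microphones, 40 filter-bank bands, 11 frames.
    x = torch.randn(4, 8, 40, 11)
    print(ChannelWiseConv()(x).shape)
```

In this sketch the cross-channel max-pooling is what lets the network pick the most informative microphone per feature map, rather than relying on a separate beamforming front end; the actual filter sizes and pooling configuration in the paper may differ.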
| Original language | English |
|---|---|
| Pages (from-to) | 1120-1124 |
| Number of pages | 5 |
| Journal | IEEE Signal Processing Letters |
| Volume | 21 |
| Issue number | 9 |
| Early online date | 20 May 2014 |
| DOIs | |
| Publication status | Published - 1 Sept 2014 |