Abstract
We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). In the MDM case we explore a beamformed signal input representation compared with the direct use of multiple acoustic channels as a parallel input to the CNN. We have explored different weight sharing approaches, and propose a channel-wise convolution with two-way pooling. Our experiments, using the AMI meeting corpus, found that CNNs improve the word error rate (WER) by 6.5% relative compared to conventional deep neural network (DNN) models and 15.7% over a discriminatively trained Gaussian mixture model (GMM) baseline. For cross-channel CNN training, the WER improves by 3.5% relative over the comparable DNN structure. Compared with the best beamformed GMM system, cross-channel convolution reduces the WER by 9.7% relative, and matches the accuracy of a beamformed DNN.
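The improvements quoted above are relative WER reductions, meaning the drop in WER is expressed as a fraction of the baseline WER rather than as an absolute difference in percentage points. A minimal sketch of that arithmetic follows. The function name and the example WER values are hypothetical, chosen only to illustrate the calculation; they are not figures reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in percent (e.g. 50.0 for 50%).
    A positive result means the new system has a lower WER than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


if __name__ == "__main__":
    # Hypothetical values for illustration only (not from the paper).
    baseline = 50.0   # assumed baseline WER, in percent
    improved = 46.75  # assumed WER of the improved system, in percent
    reduction = relative_wer_reduction(baseline, improved)
    print(f"Relative WER reduction: {reduction:.1f}%")
```

With these assumed numbers, an absolute drop of 3.25 percentage points corresponds to a much larger relative figure, which is why the two conventions should not be mixed when comparing systems.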
| Original language | English |
|---|---|
| Pages (from-to) | 1120-1124 |
| Number of pages | 5 |
| Journal | IEEE Signal Processing Letters |
| Volume | 21 |
| Issue number | 9 |
| Early online date | 20 May 2014 |
| Publication status | Published - 1 Sept 2014 |
Publication title: Convolutional Neural Networks for Distant Speech Recognition
Projects
- 2 Finished
Natural Speech Technology
Renals, S. (Principal Investigator) & King, S. (Co-investigator)
1/05/11 → 31/07/16
Project: Research