Edinburgh Research Explorer

Convolutional Neural Networks for Distant Speech Recognition

Research output: Contribution to journal (Article)

Original language: English
Pages (from-to): 1120-1124
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 21
Issue number: 9
Early online date: 20 May 2014
Publication status: Published - 1 Sep 2014

Abstract

We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). In the MDM case, we compare a beamformed signal input representation with the direct use of multiple acoustic channels as parallel inputs to the CNN. We explore different weight-sharing approaches and propose a channel-wise convolution with two-way pooling. Our experiments, using the AMI meeting corpus, found that CNNs improve the word error rate (WER) by 6.5% relative compared to conventional deep neural network (DNN) models and by 15.7% relative over a discriminatively trained Gaussian mixture model (GMM) baseline. For cross-channel CNN training, the WER improves by 3.5% relative over the comparable DNN structure. Compared with the best beamformed GMM system, cross-channel convolution reduces the WER by 9.7% relative and matches the accuracy of a beamformed DNN.
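
The improvements quoted in the abstract are relative WER reductions, that is, the drop in WER expressed as a fraction of the baseline WER rather than as an absolute difference in percentage points. A minimal sketch of that arithmetic follows; the function name `relative_wer_reduction` and the example WER values are assumptions made for illustration and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction of a system over a baseline, in percent.

    Both arguments are word error rates in the same unit (for example, percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical WERs (in percent), chosen only to illustrate the arithmetic;
# they are NOT results reported in the paper.
hypothetical_baseline = 50.0
hypothetical_system = 45.0

reduction = relative_wer_reduction(hypothetical_baseline, hypothetical_system)
print(f"{reduction:.1f}% relative")
```

With these illustrative numbers, an absolute drop of 5 percentage points corresponds to a 10% relative reduction, which is why relative and absolute figures should not be compared directly.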

ID: 20099537