Edinburgh Research Explorer

Hybrid acoustic models for distant and multichannel large vocabulary speech recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 285-290
Number of pages: 6
ISBN (Print): 978-1-4799-2756-2
DOIs
Publication status: Published - 2013

Abstract

We investigate the application of deep neural network (DNN)-hidden Markov model (HMM) hybrid acoustic models for far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). We observe up to 8% absolute word error rate (WER) reduction from a discriminatively trained GMM baseline when using a single distant microphone, and a 4–6% absolute WER reduction when using beamforming on various combinations of array channels. By training the networks on audio from multiple channels, we find the networks can recover a significant part of the accuracy difference between the single distant microphone and beamformed configurations. Finally, we show that the accuracy of a network recognising speech from a single distant microphone can approach that of a multi-microphone setup by training with data from other microphones.
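
The abstract reports its gains as absolute WER reductions, which differ from relative reductions. The following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the reference length) and how the two kinds of reduction differ. It is not the authors' scoring pipeline: the function names are hypothetical, and the numbers in the usage lines are illustrative values, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference between the two error rates (in the same units as the inputs)."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Reduction expressed as a fraction of the baseline error rate."""
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (assumed, not taken from the paper).
baseline, improved = 0.5, 0.25
print(absolute_reduction(baseline, improved))  # absolute drop in WER
print(relative_reduction(baseline, improved))  # drop relative to the baseline
```

With these illustrative inputs, a fall from 50% to 25% WER is a 25-point absolute reduction but a 50% relative reduction, which is why the abstract specifies "absolute".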

ID: 12351513