Deep Beamforming Networks for Multi-channel Speech Recognition

Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Liang Lu, John Hershey, Michael L. Seltzer, Guoguo Chen, Yu Zhang, Michael Mandel, Dong Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Despite the significant progress in speech recognition enabled by deep neural networks, poor performance persists in some scenarios. In this work, we focus on far-field speech recognition, which remains challenging due to high levels of noise and reverberation in the captured speech signals. We propose to represent the stages of acoustic processing, namely beamforming, feature extraction, and acoustic modeling, as three components of a single unified computational network. The parameters of a frequency-domain beamformer are first estimated by a network based on features derived from the microphone channels. These filter coefficients are then applied to the array signals to form an enhanced signal. Conventional features are then extracted from this signal and passed to a second network that performs acoustic modeling for classification. The parameters of both the beamforming and acoustic modeling networks are trained jointly using back-propagation with a common cross-entropy objective function. In experiments on the AMI meeting corpus, we observed improvements from pre-training each sub-network with a network-specific objective function before joint training of both networks. The proposed method obtained a 3.2% absolute word error rate reduction compared to a conventional pipeline of independent processing stages.
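The filter-and-sum step described in the abstract (per-channel, per-frequency complex weights applied to the array signals, then feature extraction from the enhanced signal) can be illustrated with a minimal NumPy sketch. It assumes the common convention Y(f, t) = Σ_m w_m*(f) X_m(f, t), with one weight per microphone and frequency bin held fixed over the frames. In the paper these weights are predicted by a network; here the hypothetical helper `delay_and_sum_weights` stands in with classic delay-and-sum weights so the result can be checked. The function names, shapes, and the log-power feature are illustrative assumptions, not the authors' implementation, and the joint cross-entropy training is not shown.

```python
import numpy as np


def apply_beamformer(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Frequency-domain filter-and-sum beamforming (illustrative sketch).

    X: multichannel STFT, shape (M, F, T), complex (M mics, F bins, T frames).
    w: complex weights, shape (M, F); assumed fixed across frames.
    Returns Y with shape (F, T), where Y[f, t] = sum_m conj(w[m, f]) * X[m, f, t].
    """
    if X.shape[:2] != w.shape:
        raise ValueError("w must have shape (M, F) matching the first two axes of X")
    return np.einsum("mf,mft->ft", np.conj(w), X)


def delay_and_sum_weights(delays_s: np.ndarray, freqs_hz: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for network-predicted weights: delay-and-sum.

    Builds weights that undo a pure delay tau_m on each channel and average
    the aligned channels. Shape (M, F).
    """
    num_mics = len(delays_s)
    steering = np.exp(-2j * np.pi * np.outer(delays_s, freqs_hz))
    return steering / num_mics


def log_power_features(Y: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Log power spectrum of the enhanced signal, returned as (T, F) frames."""
    return np.log(np.abs(Y) ** 2 + eps).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, F, T = 4, 257, 50
    freqs = np.linspace(0.0, 8000.0, F)
    delays = np.array([0.0, 1e-4, 2e-4, 3e-4])  # assumed per-mic delays in seconds
    S = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
    # Each channel is the source STFT with a per-mic phase shift (pure delay model).
    X = np.exp(-2j * np.pi * np.outer(delays, freqs))[:, :, None] * S[None, :, :]
    Y = apply_beamformer(X, delay_and_sum_weights(delays, freqs))
    print("source recovered:", np.allclose(Y, S))
    print("feature shape:", log_power_features(Y).shape)
```

Because every operation here is differentiable with respect to `w`, the same computation written in an autodiff framework would let gradients from an acoustic-model loss flow back into whatever network produces the weights, which is the property that joint training relies on.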
Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 5745-5749
Number of pages: 5
ISBN (Print): 978-1-4799-9988-0
Publication status: Published - Mar 2016
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 20 Mar 2016 to 25 Mar 2016
https://www2.securecms.com/ICASSP2016/Default.asp

Conference

Conference: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Abbreviated title: ICASSP 2016
Country/Territory: China
City: Shanghai
Period: 20/03/16 to 25/03/16
