Recognition of overlapping speech using digital MEMS microphone arrays

Erich Zwyssig, Friedrich Faubel, Steve Renals, Mike Lincoln

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

This paper presents a new corpus of single-speaker and overlapping speech recorded with digital MEMS and analogue microphone arrays, together with results from speech separation and recognition experiments on this data. The corpus is a reproduction of the multi-channel Wall Street Journal audio-visual corpus (MC-WSJ-AV) and contains speech recorded in both a meeting room and an anechoic chamber, using two different microphone types and two different array geometries. The speech separation and recognition experiments used SRP-PHAT-based speaker localisation, superdirective beamforming and several post-processing schemes, such as residual echo suppression and binary masking. Our simple, cMLLR-based recognition system matches the performance of state-of-the-art ASR systems on the single-speaker task and outperforms them on overlapping speech. The corpus will be made publicly available via the LDC in spring 2013.
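The abstract names SRP-PHAT-based speaker localisation without describing it. The following is a minimal, illustrative sketch of the generic SRP-PHAT idea, not the authors' implementation. It assumes a uniform linear array, a single far-field source, free-field propagation at 343 m/s, and one short frame analysed with a naive DFT; the function names (`srp_phat_doa`, `circular_delay`) are hypothetical. For each candidate direction, the sketch sums the phase-transform-weighted cross-spectra of every microphone pair after steering them by the expected time difference of arrival, then picks the direction with the largest steered response.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed free-field value


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for a short demo frame)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def srp_phat_doa(frames, mic_positions, fs, candidate_angles_deg):
    """Return the candidate angle (degrees from the array axis) that maximises
    the SRP-PHAT steered response for a linear array, assuming a far-field
    source. frames[i] is one time-domain frame from microphone i, located at
    mic_positions[i] metres along the array axis."""
    n = len(frames[0])
    spectra = [dft(frame) for frame in frames]
    best_angle, best_power = None, -math.inf
    for angle in candidate_angles_deg:
        cos_theta = math.cos(math.radians(angle))
        power = 0.0
        for i in range(len(frames)):
            for j in range(i + 1, len(frames)):
                # Expected TDOA (seconds) between mics i and j for this direction.
                tau = (mic_positions[j] - mic_positions[i]) * cos_theta / SPEED_OF_SOUND
                for k in range(1, n // 2):  # positive frequencies, DC excluded
                    cross = spectra[i][k] * spectra[j][k].conjugate()
                    magnitude = abs(cross)
                    if magnitude < 1e-12:
                        continue  # PHAT weighting is undefined for empty bins
                    omega = 2.0 * math.pi * k * fs / n  # bin frequency in rad/s
                    # PHAT keeps only the phase; then steer by the candidate TDOA.
                    power += (cross / magnitude * cmath.exp(1j * omega * tau)).real
        if power > best_power:
            best_angle, best_power = angle, power
    return best_angle


def circular_delay(signal, samples):
    """Delay a frame by an integer number of samples, wrapping circularly."""
    n = len(signal)
    return [signal[(t - samples) % n] for t in range(n)]


if __name__ == "__main__":
    fs = 16000
    spacing = 2 * SPEED_OF_SOUND / fs  # gives a 1-sample lag per mic at 60 degrees
    positions = [m * spacing for m in range(4)]
    rng = random.Random(0)
    source = [rng.gauss(0.0, 1.0) for _ in range(64)]
    # A mic further along the axis hears a source at 60 degrees one sample earlier.
    frames = [circular_delay(source, -m) for m in range(4)]
    print(srp_phat_doa(frames, positions, fs, range(0, 181, 5)))  # prints 60
```

Integer circular delays keep the synthetic example exact in the DFT domain. Discarding the cross-spectrum magnitude lets every frequency bin vote equally on the delay, which is the usual motivation for PHAT weighting in reverberant rooms; a practical system would instead use an FFT, accumulate over many frames, and search a grid of directions suited to the actual array geometry.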

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Pages: 7068-7072
Number of pages: 5
Publication status: Published - 21 Oct 2013
Event: 38th IEEE International Conference on Acoustics, Speech, and Signal Processing - Vancouver, Canada
Duration: 26 May 2013 to 31 May 2013
https://www2.securecms.com/ICASSP2013/default.asp

Conference

Conference: 38th IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2013
Country/Territory: Canada
City: Vancouver
Period: 26/05/13 to 31/05/13

Keywords

  • ASR
  • MEMS microphones
  • microphone array
  • speech separation
  • WSJ
