Abstract
This paper presents a new corpus of single and overlapping speech recorded using digital MEMS and analogue microphone arrays, together with results from speech separation and recognition experiments on this data. The corpus is a reproduction of the multi-channel Wall Street Journal audio-visual corpus (MC-WSJAV), containing speech recorded in both a meeting room and an anechoic chamber, using two different microphone types and two different array geometries. The speech separation and speech recognition experiments were performed using SRP-PHAT-based speaker localisation, superdirective beamforming and multiple post-processing schemes, such as residual echo suppression and binary masking. Our simple, cMLLR-based recognition system matches the performance of state-of-the-art ASR systems on the single speaker task and outperforms them on overlapping speech. The corpus will be made publicly available via the LDC in spring 2013.
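As a rough illustration of the binary masking post-processing step named in the abstract, the sketch below keeps only those time-frequency bins in which one speaker's separated signal dominates the other's. The abstract does not state the masking rule, so the per-bin magnitude comparison, the function names `binary_mask` and `apply_mask`, and the toy values are all assumptions for illustration and should not be read as the authors' implementation.

```python
def binary_mask(target_mag, interferer_mag):
    """Return a 0/1 mask per time-frequency bin: 1.0 where the target is at least as strong."""
    return [
        [1.0 if t >= i else 0.0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_mag, interferer_mag)
    ]


def apply_mask(spectrogram, mask):
    """Multiply each bin of the spectrogram by the corresponding mask value."""
    return [
        [s * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(spectrogram, mask)
    ]


# Hypothetical magnitudes of two separated outputs, one per speaker.
# Rows are time frames, columns are frequency bins.
speaker_a = [[0.9, 0.2, 0.5], [0.1, 0.8, 0.3]]
speaker_b = [[0.3, 0.7, 0.5], [0.6, 0.4, 0.9]]

mask_a = binary_mask(speaker_a, speaker_b)   # bins where speaker A dominates (ties go to A)
masked_a = apply_mask(speaker_a, mask_a)     # speaker A with B-dominated bins zeroed
```

In a real system the magnitudes would come from a short-time Fourier transform of each beamformer output, and the masked spectrogram would be passed on to feature extraction or resynthesis.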
Original language | English |
---|---|
Title of host publication | 2013 IEEE International Conference on Acoustics, Speech and Signal Processing |
Pages | 7068-7072 |
Number of pages | 5 |
Publication status | Published - 21 Oct 2013 |
Event | 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, Vancouver, Canada. Duration: 26 May 2013 → 31 May 2013. https://www2.securecms.com/ICASSP2013/default.asp |
Conference
Conference | 38th IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Abbreviated title | ICASSP 2013 |
Country/Territory | Canada |
City | Vancouver |
Period | 26/05/13 → 31/05/13 |
Keywords / Materials (for Non-textual outputs)
- ASR
- MEMS microphones
- microphone array
- speech separation
- WSJ