Abstract
Due to limited computational resources, acoustic models of early automatic speech recognition (ASR) systems were built in low-dimensional feature spaces that incur considerable information loss at the outset of the process. Several comparative studies of automatic and human speech recognition suggest that this information loss can adversely affect the robustness of ASR systems. To mitigate that and allow for learning of robust models, we propose a deep 2D convolutional network in the waveform domain. The first layer of the network decomposes waveforms into frequency sub-bands, thereby representing them in a structured high-dimensional space. This is achieved by means of a parametric convolutional block defined via cosine modulations of compactly supported windows. The next layer embeds the waveform in an even higher-dimensional space of high-resolution spectro-temporal patterns, implemented via a 2D convolutional block. This is followed by a gradual compression phase that selects most relevant spectro-temporal patterns using wide-pass 2D filtering. Our results show that the approach significantly outperforms alternative waveform-based models on both noisy and spontaneous conversational speech (24% and 11% relative error reduction, respectively). Moreover, this study provides empirical evidence that learning directly from the waveform domain could be more effective than learning using hand-crafted features.
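The first-layer idea in the abstract, filters formed as cosine modulations of compactly supported windows, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Hann window, the evenly spaced centre frequencies, the filter length, and the function names `cosine_modulated_filters` and `decompose` are all assumptions made for illustration, and the parameters are fixed here rather than learned as they would be in a parametric convolutional block.

```python
import numpy as np


def cosine_modulated_filters(num_filters=8, filter_len=65, sample_rate=16000):
    """Build FIR filters as cosine modulations of a compactly supported window.

    Assumptions (not from the paper): a Hann window and centre frequencies
    spaced evenly strictly between 0 Hz and the Nyquist frequency.
    """
    # Time axis in seconds, centred on the middle tap of the filter.
    t = (np.arange(filter_len) - (filter_len - 1) / 2) / sample_rate
    window = np.hanning(filter_len)
    # Drop the two endpoints (0 Hz and Nyquist) from the evenly spaced grid.
    centre_freqs = np.linspace(0, sample_rate / 2, num_filters + 2)[1:-1]
    # Each row is the window multiplied by a cosine at one centre frequency.
    filters = window[None, :] * np.cos(
        2 * np.pi * centre_freqs[:, None] * t[None, :]
    )
    return filters, centre_freqs


def decompose(waveform, filters):
    """Convolve a 1D waveform with each filter.

    Returns an array of shape (num_filters, len(waveform)): one row per
    frequency sub-band, i.e. a 2D representation of the waveform.
    """
    return np.stack([np.convolve(waveform, h, mode="same") for h in filters])


# Usage: a pure tone placed at one centre frequency should concentrate
# its energy in the matching sub-band.
filters, freqs = cosine_modulated_filters()
time = np.arange(1600) / 16000
tone = np.sin(2 * np.pi * freqs[3] * time)
bands = decompose(tone, filters)
band_energy = (bands ** 2).sum(axis=1)
strongest_band = int(np.argmax(band_energy))
```

Stacking the sub-band outputs as rows gives a time-frequency array, which is the kind of structured input that a subsequent 2D convolutional block could operate on.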
Original language | English |
---|---|
Title of host publication | Proceedings of Interspeech 2020 |
Publisher | International Speech Communication Association |
Pages | 1654-1658 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 25 Oct 2020 |
Event | Interspeech 2020 - Virtual Conference, China Duration: 25 Oct 2020 → 29 Oct 2020 http://www.interspeech2020.org/ |
Publication series
Name | |
---|---|
ISSN (Electronic) | 1990-9772 |
Conference
Conference | Interspeech 2020 |
---|---|
Abbreviated title | INTERSPEECH 2020 |
Country | China |
City | Virtual Conference |
Period | 25/10/20 → 29/10/20 |
Internet address |
Keywords
- automatic speech recognition
- parametric filters
- deep convolutional networks
- raw speech
- robustness
Fingerprint Dive into the research topics of 'A Deep 2D Convolutional Network for Waveform-Based Speech Recognition'. Together they form a unique fingerprint.