Deep Scattering Power Spectrum Features for Robust Speech Recognition

Neethu M. Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The deep scattering spectrum consists of a cascade of wavelet transforms and modulus non-linearities. It generates features of different orders, with the first-order coefficients approximately equal to the Mel-frequency cepstrum and the higher-order coefficients recovering information lost at lower levels. We investigate the effect of including the information recovered by higher-order coefficients on the robustness of speech recognition. To that end, we also propose a modification of the original scattering transform tailored to noisy speech. In particular, instead of the modulus non-linearity we work with power coefficients and therefore use the squared modulus non-linearity. We quantify the robustness of scattering features using the word error rates of acoustic models trained on clean speech and evaluated on sets of utterances corrupted with different noise types. Our empirical results show that the second-order scattering power spectrum coefficients capture invariants relevant for noise robustness and that this additional information improves generalization to unseen noise conditions (almost 20% relative error reduction on Aurora 4). This finding can have important consequences for speech recognition systems, which typically discard the second-order information and keep only the first-order features (known for emulating MFCC and FBANK values) when representing speech.
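
The cascade described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian band-pass filters stand in for the wavelets, and the helper names (`gaussian_bandpass_bank`, `lowpass`, `scattering`), the centre frequencies, and the bandwidths are all assumptions chosen for illustration. The `power` flag switches between the modulus and the squared-modulus non-linearity.

```python
import numpy as np

def gaussian_bandpass_bank(n, centers, rel_bandwidth=0.25):
    """Frequency-domain Gaussian band-pass filters (illustrative stand-in for wavelets)."""
    freqs = np.fft.fftfreq(n)  # cycles per sample, in [-0.5, 0.5)
    # One filter per centre frequency; each passes positive frequencies only.
    return np.array([np.exp(-0.5 * ((freqs - fc) / (rel_bandwidth * fc)) ** 2)
                     for fc in centers])

def lowpass(n, cutoff):
    """Frequency-domain Gaussian low-pass filter used for time averaging."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * (freqs / cutoff) ** 2)

def scattering(x, centers1, centers2, cutoff=0.005, power=True):
    """Toy two-layer scattering cascade: filter, non-linearity, low-pass average."""
    n = len(x)
    # Squared modulus gives power coefficients; plain modulus is the original choice.
    nonlin = (lambda z: np.abs(z) ** 2) if power else np.abs
    phi = lowpass(n, cutoff)
    psi1 = gaussian_bandpass_bank(n, centers1)
    psi2 = gaussian_bandpass_bank(n, centers2)

    # First order: band-pass filter the signal, apply the non-linearity, then average.
    X = np.fft.fft(x)
    U1 = nonlin(np.fft.ifft(X[None, :] * psi1, axis=1))            # (J1, n)
    U1f = np.fft.fft(U1, axis=1)
    S1 = np.real(np.fft.ifft(U1f * phi, axis=1))                   # (J1, n)

    # Second order: filter each first-order envelope again to recover the
    # modulation content that the low-pass averaging removed.
    U2 = nonlin(np.fft.ifft(U1f[:, None, :] * psi2[None, :, :], axis=2))  # (J1, J2, n)
    S2 = np.real(np.fft.ifft(np.fft.fft(U2, axis=2) * phi, axis=2))       # (J1, J2, n)
    return S1, S2

# Usage on a synthetic signal: a tone plus a small amount of white noise.
rng = np.random.default_rng(0)
t = np.arange(2048)
x = np.sin(2 * np.pi * 0.05 * t) + 0.1 * rng.standard_normal(t.size)
centers1 = np.geomspace(0.02, 0.4, 8)   # first-layer centre frequencies (assumed)
centers2 = np.geomspace(0.01, 0.1, 4)   # second-layer centre frequencies (assumed)
S1, S2 = scattering(x, centers1, centers2, power=True)
S1_mod, S2_mod = scattering(x, centers1, centers2, power=False)
```

In this sketch the second-layer centre frequencies sit above the low-pass cutoff, so `S2` carries envelope modulations that `S1` averages away, which is the sense in which higher orders recover lost information.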
Original language: English
Title of host publication: Proceedings of Interspeech 2020
Publisher: International Speech Communication Association
Pages: 1673-1677
Number of pages: 5
Publication status: Published - 25 Oct 2020
Event: Interspeech 2020 - Virtual Conference, China
Duration: 25 Oct 2020 to 29 Oct 2020
http://www.interspeech2020.org/

Publication series

ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2020
Abbreviated title: INTERSPEECH 2020
Country/Territory: China
City: Virtual Conference
Period: 25/10/20 to 29/10/20

Keywords / Materials (for Non-textual outputs)

  • scattering coefficients
  • wavelet transform
  • robustness
  • deep scattering spectrum
  • power spectrum

Projects

  • SpeechWave

    Renals, S. & Bell, P.

    EPSRC

    1/03/18 to 21/05/22

    Project: Research
