On the Robustness and Training Dynamics of Raw Waveform Models

Erfan Loweimi, Peter Bell, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We investigate the robustness and training dynamics of raw waveform acoustic models for automatic speech recognition (ASR). It is known that the first layer of such models learns a set of filters, performing a form of time-frequency analysis. This layer is liable to be under-trained owing to vanishing gradients, which can negatively affect network performance. Through a set of experiments on the TIMIT, Aurora-4 and WSJ datasets, we investigate the training dynamics of the first layer by measuring the evolution of its average frequency response over training epochs. We demonstrate that the network efficiently learns an optimal set of filters with a high spectral resolution, and that the dynamics of the first layer correlate strongly with the dynamics of the cross-entropy (CE) loss and word error rate (WER). In addition, we study the robustness of raw waveform models in both matched and mismatched conditions. The accuracy of these models is found to be comparable to, or better than, that of their MFCC-based counterparts in matched conditions, and is notably improved by using a better alignment. The role of raw waveform normalisation was also examined, and an absolute WER reduction of up to 4.3% in mismatched conditions was achieved.
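
The abstract does not say how the average frequency response is computed. One plausible reading is the mean, across all first-layer filters, of each filter's magnitude response. The sketch below illustrates that reading only; the function names are hypothetical, and the two cosine-shaped toy filters stand in for learned kernels, which are not available here.

```python
import cmath
import math


def magnitude_response(taps, n_bins=64):
    """Magnitude of one FIR filter's frequency response at n_bins points in [0, pi)."""
    response = []
    for k in range(n_bins):
        omega = math.pi * k / n_bins  # normalised angular frequency
        h = sum(t * cmath.exp(-1j * omega * n) for n, t in enumerate(taps))
        response.append(abs(h))
    return response


def average_frequency_response(filters, n_bins=64):
    """Mean magnitude response across a bank of filters (assumed definition)."""
    responses = [magnitude_response(f, n_bins) for f in filters]
    return [sum(column) / len(filters) for column in zip(*responses)]


# Toy stand-in for learned first-layer kernels (an assumption, not the paper's data):
# two cosine filters centred at 0.1 and 0.3 cycles per sample.
length = 32
filters = [
    [math.cos(2 * math.pi * 0.1 * n) for n in range(length)],
    [math.cos(2 * math.pi * 0.3 * n) for n in range(length)],
]
afr = average_frequency_response(filters)
```

Computing `afr` from the first-layer weights saved after each epoch, and comparing the resulting curves, would give one way to track how the layer evolves during training.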
Original language: English
Title of host publication: Proceedings of Interspeech 2020
Publisher: International Speech Communication Association
Pages: 1001-1005
Number of pages: 5
Publication status: Published - 25 Oct 2020
Event: Interspeech 2020 - Virtual Conference, China
Duration: 25 Oct 2020 to 29 Oct 2020
http://www.interspeech2020.org/

Publication series

ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2020
Abbreviated title: INTERSPEECH 2020
Country/Territory: China
City: Virtual Conference
Period: 25/10/20 to 29/10/20

Keywords

  • ASR
  • acoustic modelling
  • raw waveform
  • training dynamics
  • average frequency response
