Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

In this paper we investigate the usefulness of the sign spectrum, and of its combination with the raw magnitude spectrum, in acoustic modelling for automatic speech recognition (ASR). The sign spectrum is a sequence of ±1s that captures one bit of the phase spectrum. It encodes information overlooked by the magnitude spectrum, enabling unique signal characterisation and reconstruction. In particular, we demonstrate that it carries information related to the temporal structure of the signal as well as to the source component of speech. Furthermore, we investigate combining it with the raw magnitude spectrum via multi-head CNNs at different fusion levels. Although, taken together, these two streams carry the same information as the raw waveform, the resulting performance is noticeably higher than that of the raw waveform and of classic features such as MFCCs and filterbanks. This has been observed and verified on the TIMIT, NTIMIT, Aurora-4 and WSJ tasks, with up to 14.5% relative word error rate (WER) reduction.
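The abstract does not state which transform the sign and magnitude spectra are taken from. The sketch below assumes, for illustration only, a real-valued orthonormal DCT-II applied to a single frame; for a real-valued spectrum the phase can only be 0 or π, so its sign is exactly one bit of phase. The helper names `dct_matrix`, `sign_magnitude` and `reconstruct` are hypothetical. Under this assumption, multiplying the two streams back together and inverting the transform recovers the frame, which illustrates why the pair is information-wise equivalent to the raw waveform.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: spectrum X = C @ x, inverse x = C.T @ X."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    m = np.arange(n)[None, :]  # time index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0, :] = np.sqrt(1.0 / n)  # rescale the DC row so that C is orthonormal
    return c


def sign_magnitude(frame):
    """Split a frame's real-valued spectrum into a sign stream (+/-1) and a raw magnitude stream.

    Assumption: the spectrum is an orthonormal DCT-II; the paper's transform may differ.
    """
    spec = dct_matrix(len(frame)) @ frame
    sign = np.where(spec >= 0, 1.0, -1.0)  # exact zeros mapped to +1 so the stream is strictly +/-1
    magnitude = np.abs(spec)
    return sign, magnitude


def reconstruct(sign, magnitude):
    """Recombine the streams (spectrum = sign * magnitude) and invert the transform."""
    return dct_matrix(len(sign)).T @ (sign * magnitude)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # stand-in for one speech frame
    s, mag = sign_magnitude(frame)
    print(np.allclose(reconstruct(s, mag), frame))  # True
```

Here `reconstruct` needs both streams; under the DCT assumption the round trip is exact up to floating-point error.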
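To make "fusion level" concrete, here is a minimal NumPy sketch with random, untrained weights. It contrasts two ends of the range: early fusion, where the sign and magnitude streams enter a single CNN as two input channels, and late fusion, where each stream has its own convolutional head and the resulting embeddings are concatenated. The function names (`conv1d_relu`, `init_layer`, `early_fusion`, `late_fusion`) and all layer sizes are hypothetical; the paper's actual architecture and fusion points are not given in the abstract and may differ.

```python
import numpy as np

# Illustrative only: random, untrained weights; sizes are arbitrary choices.
rng = np.random.default_rng(0)


def conv1d_relu(x, w, b):
    """Valid 1-D convolution along frequency followed by ReLU.

    x: (in_ch, F), w: (out_ch, in_ch, K), b: (out_ch,) -> (out_ch, F - K + 1)
    """
    k = w.shape[2]
    f_out = x.shape[1] - k + 1
    # Sliding windows over frequency, shape (in_ch, f_out, K).
    windows = np.stack([x[:, i:i + k] for i in range(f_out)], axis=1)
    y = np.einsum("oik,ifk->of", w, windows) + b[:, None]
    return np.maximum(y, 0.0)


def init_layer(out_ch, in_ch, k):
    """Small random weights and zero biases for one conv layer."""
    return rng.standard_normal((out_ch, in_ch, k)) * 0.1, np.zeros(out_ch)


def early_fusion(sign, mag, params):
    """Single head: sign and magnitude stacked as two input channels."""
    x = np.stack([sign, mag])  # (2, F)
    return conv1d_relu(x, *params["joint"]).ravel()


def late_fusion(sign, mag, params):
    """Two heads: each stream gets its own conv layer; embeddings are concatenated."""
    h_sign = conv1d_relu(sign[None, :], *params["sign"])
    h_mag = conv1d_relu(mag[None, :], *params["mag"])
    return np.concatenate([h_sign.ravel(), h_mag.ravel()])


if __name__ == "__main__":
    F, K, C = 64, 5, 8  # frequency bins, kernel width, channels per head
    params = {
        "joint": init_layer(C, 2, K),
        "sign": init_layer(C, 1, K),
        "mag": init_layer(C, 1, K),
    }
    spec = rng.standard_normal(F)  # stand-in for one frame's real-valued spectrum
    sign, mag = np.where(spec >= 0, 1.0, -1.0), np.abs(spec)
    print(early_fusion(sign, mag, params).shape)  # (480,) = 8 channels * 60 positions
    print(late_fusion(sign, mag, params).shape)   # (960,) = 2 heads * 8 * 60
```

In a full acoustic model the fused vector would typically feed further layers; an intermediate fusion level would merge the heads after some, but not all, of the convolutional layers.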
Original language: English
Title of host publication: Proceedings of Interspeech 2020
Publisher: International Speech Communication Association
Pages: 1644-1648
Number of pages: 5
Publication status: Published - 25 Oct 2020
Event: Interspeech 2020 - Virtual Conference, China
Duration: 25 Oct 2020 to 29 Oct 2020
http://www.interspeech2020.org/

Publication series

ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2020
Abbreviated title: INTERSPEECH 2020
Country: China
City: Virtual Conference
Period: 25/10/20 to 29/10/20

Keywords

  • Sign spectrum
  • raw magnitude spectrum
  • multi-head CNN
  • multi-stream processing

