Statistical normalisation of phase-based feature representation for robust speech recognition

E. Loweimi, Jon Barker, Thomas Hain

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

In earlier work we proposed a source-filter decomposition of speech through phase-based processing. The decomposition leads to novel speech features that are extracted from the filter component of the phase spectrum. This paper analyses this spectrum and the proposed representation by evaluating statistical properties at various points along the parametrisation pipeline. We show that the speech phase spectrum has a bell-shaped distribution, in contrast to the uniform distribution that is usually assumed. It is demonstrated that the uniform density (which implies that the corresponding sequence is least informative) is an artefact of phase wrapping and not an original characteristic of this spectrum. In addition, we extend the idea of statistical normalisation, usually applied to magnitude-based features, into the phase domain. Based on the statistical structure of the phase-based features, which is shown to be super-Gaussian in the clean condition, three normalisation schemes, namely Gaussianisation, Laplacianisation and table-based histogram equalisation, have been applied to improve robustness. Speech recognition experiments using Aurora-2 show that applying an optimal normalisation scheme at the right stage of the feature extraction process can produce average relative WER reductions of up to 18.6% across the 0-20 dB SNR conditions.
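The two statistical ideas in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the function names (wrap_phase, histogram_fractions, gaussianise, excess_kurtosis), the synthetic data, the spread of 20 radians and the rank-based form of Gaussianisation are all assumptions chosen for illustration. The first part shows how wrapping a bell-shaped quantity into the principal interval can make its histogram look close to uniform; the second maps a super-Gaussian (Laplacian) toy feature onto a standard normal through its empirical CDF.

```python
import math
import random
from statistics import NormalDist


def wrap_phase(phi):
    """Wrap an angle in radians into the principal interval [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi


def histogram_fractions(values, n_bins, lo, hi):
    """Fraction of values in each of n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so a value landing exactly on the upper edge stays in range.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def gaussianise(values):
    """Rank-based Gaussianisation (an assumed variant, for illustration).

    Each value is replaced by the standard normal quantile of its
    mid-rank empirical CDF value, so the ordering is preserved.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


def excess_kurtosis(values):
    """Sample excess kurtosis: 0 for a Gaussian, positive if super-Gaussian."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


rng = random.Random(0)

# Part 1: a hypothetical bell-shaped "unwrapped phase" whose spread is
# much larger than 2*pi, and its wrapped counterpart.
unwrapped = [rng.gauss(0.0, 20.0) for _ in range(20000)]
wrapped = [wrap_phase(p) for p in unwrapped]
wrapped_hist = histogram_fractions(wrapped, 8, -math.pi, math.pi)

# Part 2: a Laplacian (super-Gaussian) toy feature, then Gaussianised.
laplace = [rng.expovariate(1.0) * rng.choice((-1, 1)) for _ in range(5000)]
normalised = gaussianise(laplace)

print("wrapped-phase bin fractions:", [round(f, 3) for f in wrapped_hist])
print("excess kurtosis before:", round(excess_kurtosis(laplace), 2))
print("excess kurtosis after: ", round(excess_kurtosis(normalised), 2))
```

In this toy setting the wrapped histogram is nearly flat even though the underlying quantity is bell-shaped, and the rank-based mapping removes the heavy tails of the Laplacian feature. Which normalisation variant and which pipeline stage work best for real phase-based features is the question the paper itself addresses.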
Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of Publication: New Orleans, LA, USA
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 5310-5314
Number of pages: 5
ISBN (Electronic): 978-1-5090-4117-6
ISBN (Print): 978-1-5090-4118-3
Publication status: Published - 1 Mar 2017
Event: 42nd IEEE International Conference on Acoustics, Speech and Signal Processing - New Orleans, United States
Duration: 5 Mar 2017 to 9 Mar 2017
http://www.ieee-icassp2017.org/

Publication series

Publisher: IEEE
ISSN (Electronic): 2379-190X

Conference

Conference: 42nd IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP 2017
Country/Territory: United States
City: New Orleans
Period: 5/03/17 to 9/03/17

Keywords / Materials (for Non-textual outputs)

  • feature extraction
  • filtering theory
  • Gaussian processes
  • signal representation
  • speech processing
  • speech recognition
  • statistical analysis
  • statistical normalisation
  • phase-based feature representation
  • robust speech recognition
  • source-filter decomposition
  • phase-based processing
  • speech feature extraction
  • phase spectrum filter component
  • parametrisation pipeline
  • speech phase spectrum
  • bell-shaped distribution
  • phase wrapping
  • magnitude-based features
  • super-Gaussian
  • Gaussianisation
  • Laplacianisation
  • table-based histogram equalisation
  • Aurora-2
  • optimal normalisation scheme
  • Speech
  • Speech recognition
  • Speech processing
  • Histograms
  • Feature extraction
  • Market research
  • Wrapping
  • phase spectrum
  • phase distribution
