On the Usefulness of Statistical Normalisation of Bottleneck Features for Speech Recognition

Erfan Loweimi, Peter Bell, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Deep neural networks (DNNs) play a major role in state-of-the-art automatic speech recognition (ASR) systems. They can be used to extract features and to build probabilistic models for acoustic and language modelling. Despite their huge practical success, our theoretical understanding of them remains shallow. This paper investigates DNNs from a statistical standpoint. In particular, the effect of activation functions on the distributions of the pre-activations and activations is investigated and discussed from both analytic and empirical viewpoints. Among other findings, this study shows that the pre-activation density in the bottleneck layer can be well fitted by a diagonal Gaussian mixture model (GMM) with a few Gaussians, and explains how and why the ReLU activation function promotes sparsity. Motivated by the statistical properties of the pre-activations, the usefulness of statistical normalisation of bottleneck features was also investigated. To this end, methods such as mean(-variance) normalisation, Gaussianisation and histogram equalisation (HEQ) were employed, and a word error rate (WER) reduction of up to 2% (absolute) was achieved on the Aurora-4 task.
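The normalisation methods named in the abstract can be illustrated in a few lines. The following is a minimal sketch, not the authors' implementation: it assumes bottleneck features arrive as a list of T frames of D values per utterance and that statistics are computed per utterance. The abstract does not state the statistics granularity, the HEQ reference distribution or the tie handling used in the paper, and the function names `mean_variance_normalise` and `histogram_equalise` are hypothetical. Here HEQ maps each dimension's empirical CDF onto a standard normal, which is one common way to Gaussianise features.

```python
from statistics import NormalDist, fmean, pstdev


def mean_variance_normalise(frames, eps=1e-8):
    """Per-dimension mean-variance normalisation (MVN) of one utterance.

    frames: list of T frames, each a list of D bottleneck values.
    Statistics are computed over this utterance only (an assumption).
    """
    columns = list(zip(*frames))  # D sequences of length T
    means = [fmean(c) for c in columns]
    stds = [max(pstdev(c), eps) for c in columns]  # eps guards constant dims
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in frames]


def histogram_equalise(frames, reference=None):
    """Rank-based histogram equalisation (HEQ) of one utterance.

    Each dimension's empirical CDF is mapped through the inverse CDF of
    `reference`; the default standard normal makes this a Gaussianisation.
    Tied values receive distinct, consecutive ranks (a simplification).
    """
    if reference is None:
        reference = NormalDist(0.0, 1.0)
    num_frames, num_dims = len(frames), len(frames[0])
    out = [[0.0] * num_dims for _ in range(num_frames)]
    for j in range(num_dims):
        # Frame indices sorted by their value in dimension j.
        order = sorted(range(num_frames), key=lambda t: frames[t][j])
        for rank, t in enumerate(order):
            p = (rank + 0.5) / num_frames  # empirical CDF, strictly inside (0, 1)
            out[t][j] = reference.inv_cdf(p)
    return out


# Toy 4-frame, 2-dimensional "utterance" (made-up values, not from the paper).
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
mvn = mean_variance_normalise(frames)
heq = histogram_equalise(frames)
```

MVN only matches the first two moments of each dimension, whereas HEQ matches the whole marginal distribution, so it can also reshape skewed or multimodal densities such as the GMM-like pre-activation distributions described above. Passing a different `NormalDist` as `reference` changes the target distribution.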
Original language: English
Title of host publication: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of Publication: Brighton, United Kingdom
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 3862-3866
Number of pages: 5
ISBN (Electronic): 978-1-4799-8131-1
ISBN (Print): 978-1-4799-8132-8
Publication status: E-pub ahead of print - 17 Apr 2019
Event: 44th International Conference on Acoustics, Speech, and Signal Processing: Signal Processing: Empowering Science and Technology for Humankind - Brighton, United Kingdom
Duration: 12 May 2019 → 17 May 2019
Conference number: 44
https://2019.ieeeicassp.org/

Publication series

Publisher: IEEE
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 44th International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2019
Country/Territory: United Kingdom
City: Brighton
Period: 12/05/19 → 17/05/19

Keywords

  • Deep neural networks
  • bottleneck features
  • probability density
  • statistical normalisation

Projects

  • SpeechWave

    Renals, S. & Bell, P.

    EPSRC

    1/03/18 → 21/05/22

    Project: Research
