Use of Generalised Nonlinearity in Vector Taylor Series Noise Compensation for Robust Speech Recognition

Erfan Loweimi, Jon Barker, Thomas Hain

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Designing good normalisation to counter the effect of environmental distortions is one of the major challenges for automatic speech recognition (ASR). The Vector Taylor Series (VTS) method is a powerful and mathematically well-principled technique that can be applied in both the feature and model domains to compensate for both additive and convolutional noise. One limitation of this approach, however, is that it is tied to MFCC (and log-filterbank) features and does not extend to other representations, such as PLP, PNCC and phase-based front-ends, that use a power transformation rather than log compression. This paper aims to broaden the scope of the VTS method by deriving a new formulation that assumes a power transformation is used as the non-linearity during feature extraction. It is shown that conventional VTS, in the log domain, is a special case of the new extended framework. In addition, the new formulation introduces one more degree of freedom, which makes it possible to tune the algorithm so that the data better fit the statistical requirements of the ASR back-end. On the Aurora-4 tasks, the proposed approach provides average absolute performance improvements of up to 12.2% over MFCC and up to 2.0% over conventional VTS.
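
As a rough illustration of the idea in the abstract, and not the paper's actual derivation, the sketch below assumes the commonly used filterbank-domain environment model Y = X·H + N (phase term ignored) and compares a log non-linearity with a power non-linearity. The Box-Cox-style form (x^alpha - 1)/alpha is an assumed stand-in for a "generalised" non-linearity, chosen only because it tends to log(x) as alpha approaches 0. All function and parameter names are hypothetical.

```python
import math


def generalised_log(x, alpha):
    """Box-Cox-style compression (an assumption, not the paper's definition).

    Tends to log(x) as alpha -> 0, which is one way a log-domain method
    can appear as a special case of a power-domain one.
    """
    if alpha == 0.0:
        return math.log(x)
    return (x ** alpha - 1.0) / alpha


def log_domain_mismatch(x_log, n_log, h_log=0.0):
    """Noisy log-energy from clean speech x, noise n and channel h.

    Follows from Y = X*H + N with x = log X, n = log N, h = log H:
    y = x + h + log(1 + exp(n - x - h)).
    """
    return x_log + h_log + math.log1p(math.exp(n_log - x_log - h_log))


def power_domain_mismatch(x_pow, n_pow, alpha, h_lin=1.0):
    """Same model when features are plain powers, z = Z ** alpha (assumed form).

    Undo the compression, combine in the linear domain, then recompress.
    """
    x_lin = x_pow ** (1.0 / alpha)
    n_lin = n_pow ** (1.0 / alpha)
    return (x_lin * h_lin + n_lin) ** alpha


if __name__ == "__main__":
    # The power compression approaches the log as alpha shrinks.
    for alpha in (0.5, 0.1, 0.001):
        print(alpha, generalised_log(2.0, alpha), math.log(2.0))
```

A VTS-style method would linearise a mismatch function such as these around an operating point; the extra parameter alpha is what provides the additional degree of freedom mentioned in the abstract.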
Original language: English
Title of host publication: Proc. Interspeech 2016
Publisher: ISCA
Pages: 3798-3802
Number of pages: 5
Publication status: Published, 12 Sep 2016
Event: Interspeech 2016, San Francisco, United States
Duration: 8 Sep 2016 to 12 Sep 2016
http://www.interspeech2016.org/

Publication series

Publisher: ISCA
ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2016
Country/Territory: United States
City: San Francisco
Period: 8/09/16 to 12/09/16
