A new phase-based feature representation for robust speech recognition

E. Loweimi, S. M. Ahadi, T. Drugman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper introduces a novel phase-based feature representation for robust speech recognition. The method consists of four main parts: autoregressive (AR) model extraction, group delay function (GDF) computation, compression, and scale information augmentation. Coupling the GDF with an AR model yields a high-resolution estimate of the power spectrum with low frequency leakage. The compression step includes two stages similar to those of MFCC extraction, but without taking the logarithm of the output energies. The fourth part augments the phase-based feature vector with scale information, which is based on the Hilbert transform relations and complements the phase spectrum information. In the presence of additive and convolutional noise, the proposed method reduces the averaged error rates by 15% and 12%, respectively (SNR ranging from 0 to 20 dB), compared to standard MFCCs.
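The first two parts named in the abstract (AR model extraction and GDF computation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the autocorrelation method for AR estimation, the model order of 12, and the FFT size are all assumptions made for illustration, and the compression and scale-augmentation parts are omitted because the abstract does not specify them in enough detail.

```python
import numpy as np


def ar_coefficients(frame, order):
    """Estimate the AR polynomial [1, a_1, ..., a_p] (autocorrelation method).

    Assumed model: x[n] + a_1 x[n-1] + ... + a_p x[n-p] = e[n].
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Biased autocorrelation estimates at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)]) / n
    # Build the Toeplitz autocorrelation matrix and solve R a = -r[1:].
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], -r[1:])
    return np.concatenate(([1.0], a))


def ar_group_delay(a, n_fft=512):
    """Group delay (in samples) of the all-pole filter 1/A(z).

    Returns n_fft // 2 + 1 values on a uniform grid from 0 to pi.
    Assumes A(z) has no zeros exactly on the unit circle.
    """
    a = np.asarray(a, dtype=float)
    k = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)
    B = np.fft.rfft(k * a, n_fft)
    # The group delay of A(z) is Re{B/A}; inverting the filter flips the sign.
    return -np.real(B / A)


if __name__ == "__main__":
    # Hypothetical usage on a synthetic, Hamming-windowed frame.
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400) * np.hamming(400)
    a = ar_coefficients(frame, order=12)
    gd = ar_group_delay(a)
    print(gd.shape)
```

Computing the group delay from the AR polynomial rather than from the raw frame avoids explicit phase unwrapping, since only the ratio of two FFTs is needed.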
Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers
Pages: 7155-7159
Number of pages: 5
ISBN (Electronic): 978-1-4799-0356-6
Publication status: Published - 1 May 2013
Event: 38th IEEE International Conference on Acoustics, Speech, and Signal Processing - Vancouver, Canada
Duration: 26 May 2013 to 31 May 2013
https://www2.securecms.com/ICASSP2013/default.asp

Conference

Conference: 38th IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2013
Country/Territory: Canada
City: Vancouver
Period: 26/05/13 to 31/05/13

Keywords / Materials (for Non-textual outputs)

  • autoregressive processes
  • error statistics
  • feature extraction
  • Hilbert transforms
  • image representation
  • speech recognition
  • phase-based feature representation
  • robust speech recognition
  • AR model extraction
  • autoregressive model extraction
  • group delay function computation
  • GDF computation
  • scale information augmentation
  • power spectrum
  • high-resolution estimation
  • low frequency leakage
  • compression step
  • feature vector
  • Hilbert transform relations
  • phase spectrum information
  • additive noises
  • convolutional noises
  • averaged error rates
  • standard MFCC
  • Speech
  • Robustness
  • Speech recognition
  • Abstracts
  • Mel frequency cepstral coefficient
  • Speech phase spectrum
  • group delay
  • compression
  • scale information
