Abstract
This paper introduces a novel phase-based feature representation for robust speech recognition. The method consists of four main parts: autoregressive (AR) model extraction, group delay function (GDF) computation, compression, and scale information augmentation. Coupling the GDF with an AR model yields a high-resolution estimate of the power spectrum with low frequency leakage. The compression step comprises two stages similar to those of MFCC extraction, but without taking the logarithm of the output energies. The fourth part augments the phase-based feature vector with scale information, which is based on the Hilbert transform relations and complements the phase spectrum information. In the presence of additive and convolutional noise, the proposed method led to 15% and 12% reductions in the averaged error rates, respectively (SNR ranging from 0 to 20 dB), compared to standard MFCCs.
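
The first two stages named in the abstract (AR model extraction followed by GDF computation) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the AR estimator, the model order, or the FFT size, so the autocorrelation method, the function names `ar_coefficients` and `ar_group_delay`, and the default `n_fft=512` are all assumptions made for illustration. The compression and scale information stages are not sketched, since the abstract gives too little detail about them.

```python
import numpy as np


def ar_coefficients(frame, order):
    """Fit an AR model with the autocorrelation method (an assumed choice).

    Returns a = [1, a1, ..., ap] so that the model is H(z) = G / A(z),
    with A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def ar_group_delay(a, n_fft=512):
    """Group delay (in samples) of the all-pole model 1 / A(z).

    Uses the identity tau_A(w) = Re{ DFT(n * a[n]) / DFT(a[n]) } for the
    polynomial A, and tau_H = -tau_A because H = G / A.
    """
    a = np.asarray(a, dtype=float)
    idx = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)
    dA = np.fft.rfft(idx * a, n_fft)
    return -np.real(dA / A)
```

In this sketch a speech frame would typically be windowed before calling `ar_coefficients(frame, order)`, and the resulting coefficients passed to `ar_group_delay`. Because the autocorrelation method yields a minimum-phase `A(z)`, the division in `ar_group_delay` stays away from zeros on the unit circle.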
Original language | English |
---|---|
Title of host publication | 2013 IEEE International Conference on Acoustics, Speech and Signal Processing |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 7155-7159 |
Number of pages | 5 |
ISBN (Electronic) | 978-1-4799-0356-6 |
Publication status | Published - 1 May 2013 |
Event | 38th IEEE International Conference on Acoustics, Speech, and Signal Processing - Vancouver, Canada. Duration: 26 May 2013 → 31 May 2013. https://www2.securecms.com/ICASSP2013/default.asp |
Conference
Conference | 38th IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Abbreviated title | ICASSP 2013 |
Country/Territory | Canada |
City | Vancouver |
Period | 26/05/13 → 31/05/13 |
Internet address | https://www2.securecms.com/ICASSP2013/default.asp |
Keywords / Materials (for Non-textual outputs)
- autoregressive processes
- error statistics
- feature extraction
- Hilbert transforms
- image representation
- speech recognition
- phase-based feature representation
- robust speech recognition
- AR model extraction
- autoregressive model extraction
- group delay function computation
- GDF computation
- scale information augmentation
- power spectrum
- high-resolution estimation
- low frequency leakage
- compression step
- feature vector
- Hilbert transform relations
- phase spectrum information
- additive noises
- convolutional noises
- averaged error rates
- standard MFCC
- Speech
- Robustness
- Speech recognition
- Abstracts
- Mel frequency cepstral coefficient
- Speech phase spectrum
- group delay
- compression
- scale information