Long-Term Statistical Feature Extraction from Speech Signal and Its Application in Emotion Recognition

Erfan Loweimi, Mortaza Doulaty, Jon Barker, Thomas Hain

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper we propose a statistics-based parametrization framework that represents speech through a fixed-length supervector, which paves the way for capturing the long-term properties of the signal. A fixed-length representation of a variable-length pattern such as speech that preserves the task-relevant information allows the use of a wide range of powerful discriminative models that cannot otherwise handle variability in pattern length effectively. In the proposed approach, a GMM is trained for each class, and the posterior probabilities of the components of all the GMMs are computed for each data instance (frame), averaged over all frames of the utterance, and finally stacked into a supervector. The main benefits of the proposed method are that it makes feature extraction task-specific and performs a remarkable dimensionality reduction while preserving the discriminative capability of the extracted features. The method yields a 7.6 % absolute performance improvement over the baseline system, a GMM-based classifier, and reaches 87.6 % accuracy on the emotion recognition task. Human performance on the employed database (Berlin) is reportedly 84.3 %.
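
The supervector construction described in the abstract (per-class GMMs, per-frame component posteriors, averaging over frames, stacking) can be illustrated with a minimal sketch. This is not the authors' implementation: the diagonal covariances, the normalization of posteriors within each class GMM, the hand-set GMM parameters (no training step is shown), and all function and variable names are assumptions made only for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x (assumed form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(x, gmm):
    """Posterior of each component of one GMM given frame x.

    gmm is a list of (weight, mean, var) tuples. Normalizing within a
    single class GMM is an assumption; the abstract does not specify it.
    """
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def supervector(frames, class_gmms):
    """Average the per-frame posteriors over the utterance, then stack them."""
    vec = []
    for gmm in class_gmms:
        sums = [0.0] * len(gmm)
        for x in frames:
            for k, p in enumerate(component_posteriors(x, gmm)):
                sums[k] += p
        vec.extend(s / len(frames) for s in sums)
    return vec


# Hypothetical toy models: two classes, two components each, 2-D frames.
gmm_a = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
gmm_b = [(0.7, [-2.0, 1.0], [0.5, 0.5]), (0.3, [2.0, -1.0], [2.0, 2.0])]

short_utt = [[0.1, -0.2], [2.9, 3.1], [0.0, 0.3]]
long_utt = short_utt + [[3.2, 2.8], [-0.1, 0.1]]

sv_short = supervector(short_utt, [gmm_a, gmm_b])
sv_long = supervector(long_utt, [gmm_a, gmm_b])
print(len(sv_short), len(sv_long))
```

Both utterances map to vectors of the same length (the total number of components across all class GMMs) regardless of how many frames they contain, which is the property that lets a fixed-input discriminative classifier be applied afterwards.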
Original language: English
Title of host publication: Statistical Language and Speech Processing
Subtitle of host publication: SLSP 2015
Editors: Adrian-Horia Dediu, Carlos Martín-Vide, Klára Vicsi
Publisher: Springer, Cham
Pages: 173-184
Number of pages: 12
ISBN (Electronic): 978-3-319-25789-1
ISBN (Print): 978-3-319-25788-4
Publication status: Published - 2015
Event: 3rd International Conference on Statistical Language and Speech Processing - Budapest, Hungary
Duration: 24 Nov 2015 to 26 Nov 2015
http://grammars.grlmc.com/SLSP2015/

Publication series

Name: Lecture Notes in Computer Science
Volume: 9449
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd International Conference on Statistical Language and Speech Processing
Abbreviated title: SLSP 2015
Country: Hungary
City: Budapest
Period: 24/11/15 to 26/11/15