Abstract
In this paper, we propose an effective way of combining discriminative and generative models for emotion recognition from the speech signal. Finding a feature extraction algorithm that captures only the attributes pertinent to the task, while filtering out the other aspects of the data, turns out to be very challenging, if not impossible. We therefore propose an interface between the front-end and the back-end that compensates for the parameterization block's inability to suppress the irrelevant dimensions of the signal. This interface is a generative model: it performs substantial dimensionality reduction, allows a long-term feature to be extracted from each utterance, and paves the way for better classification of the data by a discriminative model. The method yields a 7.6% absolute improvement over the baseline system and reaches 87.6% accuracy on the emotion recognition task. Human performance on the same database is reportedly 84.3%.
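The abstract describes a pipeline in which a generative model sits between the frame-level front-end and a discriminative back-end, mapping a variable-length utterance to a compact fixed-length long-term feature. The sketch below illustrates this idea on toy data, under assumptions not stated in the paper: the generative interface is taken to be a diagonal-covariance GMM fit by EM, each utterance is summarized by its mean component-posterior vector, and the discriminative back-end is a simple perceptron. All names, the toy data, and the specific model choices are illustrative, not the authors' actual system.

```python
# Illustrative sketch (not the paper's method): a generative model as an
# interface between front-end frames and a discriminative classifier.
import numpy as np

rng = np.random.default_rng(0)

def fit_gmm(X, k=2, iters=50):
    """Fit a diagonal-covariance GMM with EM (assumed generative interface)."""
    n, d = X.shape
    # Deterministic init: seed the two components at far-apart frames.
    means = np.vstack([X[np.argmin(X[:, 0])], X[np.argmax(X[:, 0])]])
    var = np.ones((k, d))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities (posteriors) of each component per frame.
        logp = (-0.5 * (((X[:, None, :] - means) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
        logp -= logp.max(1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(1, keepdims=True)
        # M-step: re-estimate weights, means, diagonal variances.
        nk = r.sum(0)
        w = nk / n
        means = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - means ** 2 + 1e-6
    return means, var, w

def posteriors(X, means, var, w):
    """Component posteriors for each frame under the fitted GMM."""
    logp = (-0.5 * (((X[:, None, :] - means) ** 2) / var
                    + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
    logp -= logp.max(1, keepdims=True)
    r = np.exp(logp)
    return r / r.sum(1, keepdims=True)

# Toy "utterances": variable-length 2-D frame sequences from two classes.
def utterance(label, n):
    center = np.array([2.0, 2.0]) if label else np.array([-2.0, -2.0])
    return center + rng.normal(size=(n, 2))

utts = [(utterance(y, rng.integers(30, 60)), y) for y in [0, 1] * 20]

# Generative interface: fit the GMM on pooled frames, then summarize each
# utterance by its mean posterior vector -- a fixed-length long-term feature
# regardless of utterance duration (heavy dimensionality reduction).
pooled = np.vstack([u for u, _ in utts])
means, var, w = fit_gmm(pooled, k=2)
F = np.array([posteriors(u, means, var, w).mean(0) for u, _ in utts])
y = np.array([lab for _, lab in utts])

# Discriminative back-end: a perceptron on the long-term features.
Fb = np.hstack([F, np.ones((len(F), 1))])  # append bias term
W = np.zeros(Fb.shape[1])
for _ in range(100):
    for xi, yi in zip(Fb, y):
        pred = 1 if xi @ W > 0 else 0
        W += (yi - pred) * xi

acc = np.mean([(1 if xi @ W > 0 else 0) == yi for xi, yi in zip(Fb, y)])
print(f"toy training accuracy: {acc:.2f}")
```

The key property illustrated is that the generative stage absorbs utterance-length variability: every utterance, whatever its duration, becomes one k-dimensional vector, which makes the discriminative stage's job straightforward.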
| Field | Value |
| --- | --- |
| Original language | English |
| Title of host publication | USES 2015 - The University of Sheffield Engineering Symposium |
| Number of pages | 2 |
| Publication status | Published - 1 Jun 2015 |
Keywords
- Discriminative model
- Emotion recognition
- Front-end
- Generative model
- Speech signal