Abstract
An approach is proposed for categorizing voice samples according to the emotions expressed by the speaker. It uses Multi-Resolution Cochleagram (MRCG) and scalogram features in a novel way. Audio recordings from the EmoDB, EMOVO and SAVEE datasets are used to train and test predictive models built on different sets of speech features. Across five different classifiers, the study systematically evaluates the feature sets most commonly used in computational paralinguistic tasks (emobase, eGeMAPS and ComParE), as well as MRCG- and scalogram-derived features and their fusion. The datasets cover speech in three languages (German, Italian and English). MRCG features outperform emobase, eGeMAPS and ComParE on EmoDB (unweighted average recall, UAR = 59.15%) and SAVEE (UAR = 36.12%), while eGeMAPS gives the best overall UAR (33.84%) on EMOVO. A support vector machine (SVM) classifier yields the best UAR for EmoDB (80.05%), using the fusion of emobase, eGeMAPS, ComParE and MRCG, and for EMOVO (40.31%), using the fusion of emobase, eGeMAPS and ComParE. For SAVEE, random forests give the best result (46.55%) with the ComParE feature set.
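All results above are reported as unweighted average recall (UAR): the recall of each emotion class is computed separately and the per-class recalls are then averaged with equal weight, so small classes count as much as large ones. The minimal sketch below illustrates the metric in plain Python. The function name `unweighted_average_recall` and the toy labels are hypothetical and not taken from the paper or its datasets. As an assumption of this sketch, only classes that occur in the true labels are averaged.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class present in y_true weighs equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("UAR is undefined for empty input")
    support = defaultdict(int)  # number of true samples per class
    hits = defaultdict(int)     # correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            hits[t] += 1
    # Recall per class = correct predictions / true samples of that class
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels: a classifier that always predicts "anger"
y_true = ["anger", "anger", "anger", "anger", "neutral", "sadness"]
y_pred = ["anger"] * 6

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                                   # 0.6666666666666666
print(unweighted_average_recall(y_true, y_pred))  # 0.3333333333333333
```

In this toy case plain accuracy rewards the degenerate "always anger" classifier with 4/6 correct, while UAR averages the recalls 1.0, 0.0 and 0.0 to 1/3. This is why UAR is the usual choice when emotion classes are imbalanced.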
Original language | English |
---|---|
Pages | 581-585 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 30 Aug 2021 |
Event | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic. Duration: 30 Aug 2021 → 3 Sept 2021 |
Conference
Conference | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 |
---|---|
Country/Territory | Czech Republic |
City | Brno |
Period | 30/08/21 → 3/09/21 |
Keywords
- Affective computing
- Emotion recognition
- Social signal processing