In this work, we compare emotion recognition on two types of speech: spontaneous and acted dialogues. Experiments were conducted on the AVEC 2012 database of spontaneous dialogues and the IEMOCAP database of acted dialogues. We studied the performance of two types of acoustic features for emotion recognition: knowledge-inspired disfluency and nonverbal vocalisation (DIS-NV) features, and statistical Low-Level Descriptor (LLD) based features. Both Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) were built using each feature set on each emotional database. Our work aims to identify aspects of the data that constrain the effectiveness of models and features. Our results show that the performance of different types of features and models is influenced by the type of dialogue and the amount of training data. Because DIS-NVs are less frequent in acted dialogues than in spontaneous dialogues, the DIS-NV features perform better than the LLD features when recognizing emotions in spontaneous dialogues, but not in acted dialogues. The LSTM-RNN model gives better performance than the SVM model when there is enough training data, but the complex structure of an LSTM-RNN model may limit its performance when there is less training data available, and may also risk over-fitting. Additionally, we find that long-distance contexts may be more useful when performing emotion recognition at the word level than at the utterance level.
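As a loose, hypothetical illustration of the feature-set comparison described above — why a small set of informative features (DIS-NV-like) can outperform a larger set of uninformative ones (LLD-like) for a given dataset — here is a pure-Python nearest-centroid sketch on synthetic data. All names, dimensions, and numbers are invented for illustration and are not taken from the paper, which uses SVM and LSTM-RNN classifiers on real acoustic features.

```python
# Hypothetical sketch (not the paper's method): a tiny nearest-centroid
# classifier comparing a low-dimensional informative feature set against a
# higher-dimensional uninformative one, on synthetic "utterances".
import random

random.seed(0)

def make_data(n, dim, informative):
    """Synthetic utterances: binary label; informative features shift with the label."""
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        shift = 0.8 * label if informative else 0.0
        feats = [random.gauss(shift, 1.0) for _ in range(dim)]
        data.append((feats, label))
    return data

def centroid(rows, dim):
    """Per-dimension mean of a list of feature vectors."""
    return [sum(f[i] for f in rows) / len(rows) for i in range(dim)]

def nearest_centroid_accuracy(data, dim):
    """Train centroids on the first 150 samples, test on the remaining 50."""
    train, test = data[:150], data[150:]
    cents = {lbl: centroid([f for f, y in train if y == lbl], dim)
             for lbl in (0, 1)}
    correct = 0
    for feats, y in test:
        pred = min(cents, key=lambda l: sum((a - b) ** 2
                                            for a, b in zip(feats, cents[l])))
        correct += pred == y
    return correct / len(test)

# "DIS-NV-like": 5 informative dimensions; "LLD-like": 30 dimensions of pure noise.
acc_disnv = nearest_centroid_accuracy(make_data(200, 5, informative=True), 5)
acc_lld = nearest_centroid_accuracy(make_data(200, 30, informative=False), 30)
print(f"informative 5-d features:   {acc_disnv:.2f}")
print(f"uninformative 30-d features: {acc_lld:.2f}")
```

With the fixed seed, the informative low-dimensional features score well above chance while the noise features hover around it — mirroring, in toy form, the paper's finding that feature usefulness depends on whether the phenomenon (here, the label-correlated shift; in the paper, disfluencies in spontaneous speech) is actually present in the data.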
Title of host publication: Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 698-704
Number of pages: 7
Publication status: Published - 2015
Research topics of 'Emotion Recognition in Spontaneous and Acted Dialogues':
- School of Philosophy, Psychology and Language Sciences - Lecturer in Speech and Language Processing
- Institute of Language, Cognition and Computation
- Centre for Speech Technology Research