Emotion Recognition in Spontaneous and Acted Dialogues

Leimin Tian, Johanna Moore, Catherine Lai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work, we compare emotion recognition on two types of speech: spontaneous and acted dialogues. Experiments were conducted on the AVEC 2012 database of spontaneous dialogues and the IEMOCAP database of acted dialogues. We studied the performance of two types of acoustic features for emotion recognition: knowledge-inspired disfluency and nonverbal vocalisation (DIS-NV) features, and statistical Low-Level Descriptor (LLD) based features. Both Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) were built using each feature set on each emotional database. Our work aims to identify aspects of the data that constrain the effectiveness of models and features. Our results show that the performance of different types of features and models is influenced by the type of dialogue and the amount of training data. Because DIS-NVs are less frequent in acted dialogues than in spontaneous dialogues, the DIS-NV features perform better than the LLD features when recognizing emotions in spontaneous dialogues, but not in acted dialogues. The LSTM-RNN model gives better performance than the SVM model when there is enough training data, but the complex structure of an LSTM-RNN model may limit its performance when there is less training data available, and may also risk over-fitting. Additionally, we find that long-distance contexts may be more useful when performing emotion recognition at the word level than at the utterance level.
Original language: English
Title of host publication: Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 698-704
Number of pages: 7
ISBN (Print): 978-1-4799-9953-8
Publication status: Published - 2015
