Abstract
Attitudes play an important role in human communication. Models and algorithms for the automatic recognition of attitudes may therefore have applications in areas where successful communication and interaction are crucial, such as healthcare, education and digital entertainment. This paper focuses on the task of categorizing speaker attitudes using speech features. Data extracted from video recordings are used to train and test predictive models built on different sets of speech features. A novel attitude recognition approach using Multi-Resolution Cochleagram (MRCG) features is proposed. The results show that the MRCG feature set outperforms the feature sets most commonly used in computational paralinguistic tasks, including emobase, eGeMAPS and ComParE, in terms of attitude recognition accuracy for decision tree, 1-nearest-neighbour and random forest classifiers. Analysis of the results suggests that MRCG features contribute information not captured by these existing feature sets. Indeed, while the ComParE feature set gives slightly better results than MRCG features for support vector machine classifiers, fusing the existing feature sets with the new MRCG features improves on those results. Overall, with the addition of MRCG, the attitude recognition method proposed in this study achieves accuracy scores approximately 11 points higher than those reported in previous studies.
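The fusion experiment described in the abstract can be sketched as an early-fusion pipeline: train a classifier on each feature set alone and on the concatenation of both. The sketch below is a minimal illustration with synthetic data; the feature matrices, class labels and the `evaluate` helper are stand-ins, since the paper's actual feature extractors (emobase, eGeMAPS, ComParE, MRCG) and dataset are not reproduced here.

```python
# Hypothetical sketch of early feature-set fusion for attitude recognition.
# All data below is synthetic; the real study extracted acoustic feature
# sets (e.g. ComParE, MRCG) from speech in video recordings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 3, size=n)  # three attitude categories (placeholder)

# Synthetic stand-ins for an existing feature set and MRCG-like features,
# each weakly correlated with the label.
X_existing = rng.normal(size=(n, 30)) + y[:, None] * 0.3
X_mrcg = rng.normal(size=(n, 40)) + y[:, None] * 0.3

# Early fusion: concatenate the feature vectors of each utterance.
X_fused = np.hstack([X_existing, X_mrcg])

def evaluate(X, y):
    """Train/test split, fit a random forest, return test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

for name, X in [("existing-like", X_existing),
                ("MRCG-like", X_mrcg),
                ("fused", X_fused)]:
    print(f"{name}: accuracy = {evaluate(X, y):.3f}")
```

The same comparison can be repeated with other scikit-learn classifiers (decision tree, 1-nearest neighbour, SVM) by swapping the estimator inside `evaluate`.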
Publication status: Published - 12 May 2019
Event: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing - Brighton Conference Centre, Brighton, United Kingdom
Abbreviated title: ICASSP 2019
Period: 12 May 2019 → 17 May 2019