Attitude recognition of video bloggers using audio-visual descriptors

Fasih Haider, Loredana Sundberg Cerrato, Nick Campbell, Saturnino Luz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In social media, vlogs (video blogs) are a form of unidirectional communication in which vloggers (video bloggers) convey messages (opinions, thoughts, etc.) to a potential audience that cannot give them feedback in real time. In this kind of communication, a video blogger's non-verbal behaviour and personality impression tend to influence viewers' attention, because non-verbal cues are correlated with the messages the vlogger conveys. In this study, we use acoustic and visual features (body movements captured by low-level visual descriptors) to predict six different attitudes (amusement, enthusiasm, friendliness, frustration, impatience and neutral) annotated in the speech of 10 video bloggers. Automatic attitude detection can be helpful in a scenario where a machine must automatically give bloggers feedback on their performance, in terms of how well they engage the audience by displaying certain attitudes. Attitude recognition models are trained using a random forest classifier. Results show that: 1) acoustic features provide better accuracy than visual features; 2) while the fusion of audio and visual features does not increase overall accuracy, it improves the results for some attitudes and subjects; and 3) densely extracted histograms of flow provide better results than other visual descriptors. A three-class problem (positive, negative and neutral attitudes) has also been defined. Results for this setting show that feature fusion degrades overall classifier accuracy, and the classifiers perform better on the original six-class problem than on the three-class setting.
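The general setup described above can be sketched as feature-level fusion of audio and visual descriptors followed by a random forest classifier. The sketch below uses scikit-learn; the feature dimensions, synthetic data, and six-way label coding are illustrative assumptions, not the paper's actual descriptors or dataset.

```python
# Hedged sketch of the paper's setup: fuse audio and visual feature
# vectors per speech segment, then train a random forest on the six
# attitude classes. All data here is synthetic and for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_segments = 120
audio = rng.normal(size=(n_segments, 20))    # e.g. prosodic/spectral features
visual = rng.normal(size=(n_segments, 30))   # e.g. histogram-of-flow descriptors
labels = rng.integers(0, 6, size=n_segments)  # six attitudes, coded 0..5

# Feature-level fusion: concatenate the per-segment descriptors.
fused = np.hstack([audio, visual])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, fused, labels, cv=5)
print("mean CV accuracy: %.3f" % scores.mean())
```

The paper compares this fused setting against audio-only and visual-only classifiers (here, training on `audio` or `visual` alone), and reports that fusion helps only for some attitudes and subjects.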

Original language: English
Title of host publication: 2016 Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, MA3HMI 2016
Publisher: Association for Computing Machinery (ACM)
Pages: 38-42
Number of pages: 5
Publication status: Published - 12 Jan 2016
Event: 2016 Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, MA3HMI 2016 - Tokyo, Japan
Duration: 12 Nov 2016 - 16 Nov 2016

Conference

Conference: 2016 Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, MA3HMI 2016
Country/Territory: Japan
City: Tokyo
Period: 12/11/16 - 16/11/16

Keywords / Materials (for Non-textual outputs)

  • Attitude recognition
  • Emotion recognition
  • Expressive speech analysis
  • Instructional advice
  • Non-verbal behavior analysis
  • Social media
  • Video blogs
  • Vloggers
