High level visual and paralinguistic features extraction and their correlation with user engagement

Fasih Haider, Fahim A Salim, Saturnino Luz, Owen Conlan, Nick Campbell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


As more and more audio-visual content such as talks, lectures and presentations is made available online, it becomes increasingly difficult for prospective viewers to assess which videos they might find interesting or engaging. Automatic classification of content as engaging versus non-engaging could help viewers cope with this situation, and help presenters gauge their presentation skills. Such classification could also be useful for a variety of applications, including recommendation and personalized video segmentation. This paper explores camera views (e.g. close-up and distance shots) along with paralinguistic features which can be used to predict viewer engagement and to give speakers feedback on whether, and why, their talk is engaging. The TED talk data set (1340 videos) and user engagement ratings are used in this study. This paper also sheds light on how these engagement ratings are correlated with each other and with the liveliness of speech.
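The correlation analysis described above can be sketched as follows. This is not the paper's code: the feature values, rating counts, and the choice of pitch variability as a "liveliness" proxy are all illustrative assumptions; only the use of a correlation coefficient between a speech feature and engagement ratings comes from the abstract.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-talk values (made up for illustration):
# pitch standard deviation as a liveliness proxy, and a count of
# "engaging" ratings for the same five talks.
liveliness = [12.1, 30.5, 22.3, 8.7, 27.9]
engagement = [40, 95, 70, 25, 88]

print(round(pearson(liveliness, engagement), 3))
```

A strongly positive coefficient on real data would support the abstract's suggestion that livelier speech goes with higher engagement; in practice one would compute this per rating category to see how the categories relate to each other as well.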
Original language: English
Title of host publication: IEEE International Symposium on Signal Processing and Information Technology
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 6
ISBN (Electronic): 978-1-5090-0481-2
ISBN (Print): 978-1-5090-0480-5
Publication status: Published - 7 Dec 2015
