As more and more audio-visual content such as talks, lectures and presentations becomes available online, it is increasingly difficult for prospective viewers to assess which videos they might find interesting or engaging. Automatic classification of content as engaging or non-engaging could help viewers cope with this situation and help presenters gauge their presentation skills. It could also support a variety of applications, including recommendation and personalized video segmentation. This paper explores camera views (e.g. close-up and distance shots) along with paralinguistic features that can be used to predict viewer engagement and to give speakers feedback on whether, and why, their talk is engaging. The study uses the TED talk data set (1,340 videos) together with user engagement ratings. The paper also sheds light on how these engagement ratings correlate with each other and with the liveliness of speech.
Title of host publication: IEEE International Symposium on Signal Processing and Information Technology
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 6
Publication status: Published - 7 Dec 2015