Learning Tags from Unsegmented Videos of Multiple Human Actions

T. M. Hospedales, S. Gong, T. Xiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Providing methods to support semantic interaction with growing volumes of video data is an increasingly important challenge for data mining. To this end, there has been some success in recognising simple objects and actions in video; however, most of this work requires strongly supervised training data. The supervision cost of these approaches therefore renders them economically non-scalable for real-world applications. In this paper we address the problem of learning to annotate and retrieve semantic tags of human actions in realistic video data, given only sparsely provided tags of semantically salient activities. This is challenging because of (1) the multi-label nature of the learning problem and (2) the fact that realistic videos are often dominated by (semantically uninteresting) background activity unsupported by any tags of interest, leading to a strong irrelevant-data problem. To address these challenges, we introduce a new topic-model-based approach to video tag annotation. Our model simultaneously learns a low-dimensional representation of the video data, which of its dimensions are semantically relevant (supported by tags), and how to annotate videos with tags. Experimental evaluation on three different video action/activity datasets demonstrates the difficulty of this problem and the value of our contribution.
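The abstract's two coupled steps (learning a low-dimensional representation from unlabelled visual features, then mapping only the tag-supported dimensions to multi-label annotations) can be illustrated with a rough stand-in. The sketch below is NOT the authors' topic model: it substitutes multiplicative-update NMF for topic inference and ridge regression for the tag-annotation component, on synthetic bag-of-visual-words data. All sizes, thresholds, and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in (not the authors' model): each "video" is a
# bag-of-visual-words histogram; only some clips carry sparse tags.
n_videos, n_words, n_topics, n_tags = 30, 50, 5, 3

# Synthetic data: videos generated from latent topics; tags depend on topics.
topic_word = rng.dirichlet(np.ones(n_words) * 0.1, size=n_topics)
theta = rng.dirichlet(np.ones(n_topics) * 0.5, size=n_videos)
X = np.empty((n_videos, n_words))
for i in range(n_videos):
    p = theta[i] @ topic_word
    X[i] = rng.multinomial(200, p / p.sum())
tag_weights = rng.normal(size=(n_topics, n_tags))
Y = (theta @ tag_weights > 0.0).astype(float)  # multi-label tag matrix

# Step 1: learn a low-dimensional representation with multiplicative-update
# NMF (a crude substitute for the topic model's inference).
W = rng.random((n_videos, n_topics))
H = rng.random((n_topics, n_words))
for _ in range(200):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
W /= W.sum(axis=1, keepdims=True)  # rows become topic proportions

# Step 2: learn to annotate -- ridge regression from topic proportions to
# tags, using only the videos (here: the first 20) that carry tags.
train = slice(0, 20)
A = np.linalg.solve(W[train].T @ W[train] + 1e-3 * np.eye(n_topics),
                    W[train].T @ Y[train])

# Step 3: annotate the remaining, untagged videos by thresholding scores.
pred = (W[20:] @ A > 0.5).astype(float)
print("predicted tag matrix shape:", pred.shape)
```

In the actual paper this joint learning happens inside a single topic model rather than in two disjoint stages; the sketch only shows the shape of the problem (weak, sparse multi-label supervision over a learned low-dimensional representation).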
Original language: English
Title of host publication: 2011 IEEE 11th International Conference on Data Mining
ISBN (Print): 978-1-4577-2075-8
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 9
ISBN (Electronic): 978-0-7695-4408-3
Publication status: Published - 1 Dec 2011


