Weakly Supervised Gaussian Networks for Action Detection

Basura Fernando, Cheston Tan Yin Chet, Hakan Bilen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Detecting temporal extents of human actions in videos is a challenging computer vision problem that requires detailed manual supervision, including frame-level labels. This expensive annotation process restricts the deployment of action detectors to a limited number of categories. We propose a novel method, called WSGN, that learns to detect actions from weak supervision, using only video-level labels. WSGN learns to exploit both video-specific and dataset-wide statistics to predict the relevance of each frame to an action category. This strategy leads to significant gains in action detection on two standard benchmarks, THUMOS14 and Charades. Our method obtains excellent results compared to state-of-the-art methods that use similar features and loss functions on the THUMOS14 dataset. Similarly, our weakly supervised method is only 0.3% mAP behind a state-of-the-art supervised method on the challenging Charades dataset for action localization.
Original language: English
Title of host publication: 2020 IEEE Winter Conference on Applications of Computer Vision
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 10
ISBN (Electronic): 978-1-7281-6553-0
ISBN (Print): 978-1-7281-6554-7
Publication status: Published - 14 May 2020
Event: 2020 Winter Conference on Applications of Computer Vision - Aspen, United States
Duration: 1 Mar 2020 to 5 Mar 2020

Publication series

ISSN (Print): 2472-6737
ISSN (Electronic): 2642-9381


Conference: 2020 Winter Conference on Applications of Computer Vision
Abbreviated title: WACV 2020
Country/Territory: United States

