Abstract / Description of output
Typical video classification methods often divide a video into short clips, run inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independently ignores the temporal structure of the video sequence and increases the computational cost at inference time. In this paper, we propose a novel framework named FASTER, i.e., Feature Aggregation for SpatioTEmporal Redundancy. FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities. The FASTER framework can integrate high-quality representations from expensive models to capture subtle motion information and lightweight representations from cheap models to cover scene changes in the video. A new recurrent network (i.e., FAST-GRU) is designed to aggregate the mixture of different representations. Compared with existing approaches, FASTER can reduce the FLOPs by over 10× while maintaining state-of-the-art accuracy across popular datasets, such as Kinetics, UCF-101 and HMDB-51.
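The two ideas the abstract contrasts can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce FAST-GRU, whose details are not given here. The function names, the fixed "every `period`-th clip" schedule, and the cost figures in the usage note are all hypothetical, chosen only to show the arithmetic.

```python
from typing import List, Sequence


def average_clip_predictions(clip_scores: Sequence[Sequence[float]]) -> List[float]:
    """Baseline aggregation: average per-clip class scores into one video-level vector.

    clip_scores holds one score vector per clip, each with the same number of classes.
    """
    num_clips = len(clip_scores)
    if num_clips == 0:
        raise ValueError("need at least one clip")
    num_classes = len(clip_scores[0])
    return [
        sum(clip[c] for clip in clip_scores) / num_clips
        for c in range(num_classes)
    ]


def mixed_schedule_cost(
    num_clips: int, expensive_cost: float, cheap_cost: float, period: int
) -> float:
    """Total cost when clips 0, period, 2*period, ... use the expensive model
    and every other clip uses the cheap model (an assumed, illustrative schedule).
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    n_expensive = (num_clips + period - 1) // period  # ceiling division
    n_cheap = num_clips - n_expensive
    return n_expensive * expensive_cost + n_cheap * cheap_cost
```

With made-up costs of 100 units per clip for the expensive model and 10 for the cheap one, `mixed_schedule_cost(8, 100.0, 10.0, 4)` uses the expensive model on two of eight clips, against `8 * 100.0` for running the expensive model on every clip. The averaging function is the independent-clip baseline. In the framework the abstract describes, a learned recurrent aggregator takes the place of that simple average.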
Original language | English |
---|---|
Title of host publication | Proceedings of the AAAI Conference on Artificial Intelligence |
Publisher | AAAI Press |
Pages | 13098-13105 |
Number of pages | 8 |
ISBN (Print) | 978-1-57735-835-0 |
Publication status | Published - 3 Apr 2020 |
Event | 34th AAAI Conference on Artificial Intelligence - New York, United States; Duration: 7 Feb 2020 → 12 Feb 2020; Conference number: 34; https://aaai.org/Conferences/AAAI-19/ |
Publication series
Name | |
---|---|
Publisher | AAAI Press |
Number | 7 |
Volume | 34 |
ISSN (Print) | 2159-5399 |
ISSN (Electronic) | 2374-3468 |
Conference
Conference | 34th AAAI Conference on Artificial Intelligence |
---|---|
Abbreviated title | AAAI 2020 |
Country/Territory | United States |
City | New York |
Period | 7/02/20 → 12/02/20 |
Fingerprint
Dive into the research topics of 'FASTER Recurrent Networks for Efficient Video Classification'. Together they form a unique fingerprint.
Profiles
Laura Sevilla-Lara
- School of Informatics - Reader
- Institute of Perception, Action and Behaviour
- Language, Interaction, and Robotics
Person: Academic: Research Active