FASTER Recurrent Networks for Efficient Video Classification

Linchao Zhu, Du Tran, Laura Sevilla-Lara, Yi Yang, Matt Feiszli, Heng Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Video classification methods typically divide a video into short clips, run inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independently ignores the temporal structure of the video sequence and increases the computational cost at inference time. In this paper, we propose a novel framework named FASTER, i.e., Feature Aggregation for SpatioTEmporal Redundancy. FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities. The FASTER framework can integrate high-quality representations from expensive models to capture subtle motion information and lightweight representations from cheap models to cover scene changes in the video. A new recurrent network (i.e., FAST-GRU) is designed to aggregate the mixture of different representations. Compared with existing approaches, FASTER can reduce the FLOPs by over 10× while maintaining state-of-the-art accuracy across popular datasets, such as Kinetics, UCF-101 and HMDB-51.
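The contrast drawn in the abstract, between averaging independent clip-level predictions and recurrently aggregating a mix of expensive and cheap clip representations, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the every-fourth-clip schedule, the unit costs (10.0 and 1.0), and the fixed-gate blend are hypothetical and are not taken from the paper. In particular, the blend is a simple stand-in for a learned recurrent aggregator and is not FAST-GRU.

```python
def average_clip_predictions(clip_probs):
    """Baseline: average per-clip class probabilities into one video-level prediction."""
    n = len(clip_probs)
    num_classes = len(clip_probs[0])
    return [sum(p[c] for p in clip_probs) / n for c in range(num_classes)]


def schedule(num_clips, period):
    """Hypothetical schedule: expensive model on every `period`-th clip, cheap model otherwise."""
    return ["expensive" if i % period == 0 else "cheap" for i in range(num_clips)]


def total_cost(plan, cost):
    """Sum the per-clip cost (arbitrary units) of a schedule."""
    return sum(cost[model] for model in plan)


def aggregate(features, gate=0.5):
    """Fixed-gate running blend over clip features (a stand-in, not a learned GRU)."""
    h = [0.0] * len(features[0])
    for x in features:
        # New state mixes the previous state with the current clip feature.
        h = [(1 - gate) * hi + gate * xi for hi, xi in zip(h, x)]
    return h


# Assumed unit costs: the expensive model costs 10x the cheap one per clip.
COST = {"expensive": 10.0, "cheap": 1.0}

plan = schedule(num_clips=8, period=4)
mixed = total_cost(plan, COST)
all_expensive = total_cost(["expensive"] * 8, COST)
print(plan)
print(all_expensive / mixed)  # cost ratio under these assumed unit costs
```

Under these assumed costs the mixed schedule is roughly three times cheaper than running the expensive model on all eight clips. The saving depends entirely on the assumed cost gap and schedule, so the figure says nothing about the 10× reduction reported in the abstract.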
Original language: English
Title of host publication: Proceedings of the AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Pages: 13098-13105
Number of pages: 8
ISBN (Print): 978-1-57735-835-0
Publication status: Published - 3 Apr 2020
Event: 34th AAAI Conference on Artificial Intelligence - New York, United States
Duration: 7 Feb 2020 to 12 Feb 2020
Conference number: 34
https://aaai.org/Conferences/AAAI-19/

Publication series

Name
Publisher: AAAI Press
Number: 7
Volume: 34
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 34th AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI 2020
Country/Territory: United States
City: New York
Period: 7/02/20 to 12/02/20
