Abstract
The current state of the art in Video Classification is based on Bag-of-Words using local visual descriptors, most commonly Histogram of Oriented Gradients (HOG) and Histogram of Optical Flow (HOF) descriptors. While such a system is very powerful for classification, it is also computationally expensive. This paper addresses the problem of computational efficiency. Specifically: (1) We propose several speed-ups for densely sampled HOG and HOF descriptors and release Matlab code. (2) We investigate the trade-off between accuracy and computational efficiency of descriptors in terms of frame sampling rate and type of Optical Flow method. (3) We investigate the trade-off between accuracy and computational efficiency for the video representation, using either a k-means or hierarchical k-means based visual vocabulary, a Random Forest based vocabulary, or the Fisher kernel.
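As a rough illustration of the Bag-of-Words representation and the frame sampling rate mentioned in the abstract, the sketch below quantizes local descriptors against a visual vocabulary and builds a normalized histogram. It is not the paper's released Matlab code: the function names `assign_to_vocabulary`, `bow_histogram`, and `sample_frames` are hypothetical, the vocabulary is assumed to be precomputed (for example by k-means), and descriptors are assumed to be plain row vectors.

```python
import numpy as np


def assign_to_vocabulary(descriptors, vocabulary):
    """Return, for each descriptor, the index of its nearest visual word.

    descriptors: (n, d) array of local descriptors (e.g. HOG or HOF vectors).
    vocabulary:  (k, d) array of visual words (assumed precomputed).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Squared Euclidean distances via ||a||^2 - 2 a.b + ||b||^2,
    # which avoids building an (n, k, d) intermediate array.
    dist2 = (
        (descriptors ** 2).sum(axis=1)[:, None]
        - 2.0 * descriptors @ vocabulary.T
        + (vocabulary ** 2).sum(axis=1)[None, :]
    )
    return dist2.argmin(axis=1)


def bow_histogram(descriptors, vocabulary):
    """Build an L1-normalized Bag-of-Words histogram for one video."""
    words = assign_to_vocabulary(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


def sample_frames(num_frames, step):
    """Indices of frames kept when processing every `step`-th frame."""
    return list(range(0, num_frames, step))
```

Calling `bow_histogram` on all descriptors extracted from the frames returned by `sample_frames` yields one fixed-length vector per video. A larger `step` reduces the number of descriptors to compute and quantize, which is the kind of accuracy versus cost trade-off the abstract refers to.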
Original language | English |
---|---|
Title of host publication | Proceedings of International Conference on Multimedia Retrieval |
Place of Publication | New York, NY, USA |
Publisher | ACM |
Number of pages | 8 |
ISBN (Print) | 978-1-4503-2782-4 |
Publication status | Published - 2014 |
Keywords
- Computational Efficiency, HOF, HOG, Video Classification