Video classification with Densely extracted HOG/HOF/MBH features: an evaluation of the accuracy/computational efficiency trade-off

J. R. R. Uijlings, I. C. Duta, E. Sangineto, Nicu Sebe

Research output: Contribution to journal › Article › peer-review

Abstract

The current state of the art in video classification is based on Bag-of-Words using local visual descriptors. Most commonly these are histogram of oriented gradients (HOG), histogram of optical flow (HOF) and motion boundary histograms (MBH) descriptors. While this approach is very powerful for classification, it is also computationally expensive. This paper addresses the problem of computational efficiency. Specifically: (1) We propose several speed-ups for densely sampled HOG, HOF and MBH descriptors and release Matlab code; (2) We investigate the trade-off between accuracy and computational efficiency of descriptors in terms of frame sampling rate and type of Optical Flow method; (3) We investigate the trade-off between accuracy and computational efficiency for computing the feature vocabulary, using and comparing most of the commonly adopted vector quantization techniques: k-means, hierarchical k-means, Random Forests, Fisher Vectors and VLAD.
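To make the vocabulary step in point (3) concrete, the following is a minimal sketch of two of the encodings named in the abstract: hard-assignment Bag-of-Words over a k-means style vocabulary, and VLAD. It is not the authors' released Matlab code. The function names (`nearest_word`, `bow_histogram`, `vlad_encoding`), the two-word vocabulary and the 2-D toy descriptors are all assumptions made for illustration; real HOG/HOF/MBH descriptors have many more dimensions, and the vocabulary would be learned from training data.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary centre closest to the descriptor (squared Euclidean)."""
    best_index, best_dist = 0, math.inf
    for index, centre in enumerate(vocabulary):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centre))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors, vocabulary):
    """Hard-assignment Bag-of-Words: count descriptors per word, then L1-normalise."""
    counts = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def vlad_encoding(descriptors, vocabulary):
    """VLAD: sum residuals (descriptor - centre) per word, concatenate, L2-normalise."""
    dim = len(vocabulary[0])
    residuals = [[0.0] * dim for _ in vocabulary]
    for descriptor in descriptors:
        k = nearest_word(descriptor, vocabulary)
        for j in range(dim):
            residuals[k][j] += descriptor[j] - vocabulary[k][j]
    flat = [value for row in residuals for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm else flat


# Hypothetical toy data: a two-word vocabulary and four 2-D descriptors.
vocabulary = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [1.2, 1.0]]

histogram = bow_histogram(descriptors, vocabulary)
vlad = vlad_encoding(descriptors, vocabulary)
print(histogram)
print(vlad)
```

The histogram has one bin per visual word, whereas the VLAD vector has one block of descriptor dimensionality per word. This is one reason VLAD-style encodings are typically used with much smaller vocabularies than plain Bag-of-Words.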
Original language: English
Pages (from-to): 33-44
Number of pages: 12
Journal: International Journal of Multimedia Information Retrieval
Volume: 4
Issue number: 1
Publication status: Published - Mar 2015

Keywords

  • Video classification
  • HOG
  • HOF
  • MBH
  • Computational efficiency
