An approach for exploring a video via multimodal feature extraction and user interactions

Fahim A. Salim*, Fasih Haider, Owen Conlan, Saturnino Luz

*Corresponding author for this work

Research output: Contribution to journal › Article › Peer-review


Exploring the content of a video is typically inefficient due to the linear, streamed nature of the medium and its lack of interactivity. A video may be seen as a combination of a set of features, such as the visual track, the audio track, and a transcription of the spoken words. These features may be viewed as a set of temporally bounded parallel modalities. It is our contention that these modalities, together with the features derived from them, can be presented individually or in discrete combinations to allow deeper and more effective interactive exploration of content within different parts of a video. A novel system is proposed that supports video exploration by offering video content in an alternative representation: the extracted multimodal features are presented as an automatically generated, interactive multimedia webpage. This paper also presents a user study conducted to learn the usage patterns of the proposed system. The learned usage patterns may be used to build a template-driven representation engine that uses the features to offer a multimodal synopsis of a video, which may lead to more efficient exploration of video content.

Original language: English
Pages (from-to): 1-12
Number of pages: 12
Journal: Journal on Multimodal User Interfaces
Issue number: 4
Early online date: 13 Jul 2018
Publication status: Published - Dec 2018


  • Human media interaction
  • Multimedia analysis
  • Multimodal video processing
  • Video representation


