Clustering of Gaze During Dynamic Scene Viewing is Predicted by Motion

Parag K. Mital, Tim J. Smith, Robin L. Hill, John M. Henderson

Research output: Contribution to journal, Article, peer-review

Abstract / Description of output

Where does one attend when viewing dynamic scenes? Research into the factors influencing gaze location during static scene viewing has reported that low-level visual features contribute very little to gaze location, especially when opposed by high-level factors such as viewing task. However, the inclusion of transient features such as motion in dynamic scenes may result in a greater influence of visual features on gaze allocation and coordination of gaze across viewers. In the present study, we investigated the contribution of low- to mid-level visual features to gaze location during free-viewing of a large dataset of videos ranging in content and length. Signal detection analysis on visual features and Gaussian Mixture Models for clustering gaze were used to identify the contribution of visual features to gaze location. The results show that mid-level visual features including corners and orientations can distinguish between actual gaze locations and a randomly sampled baseline. However, temporal features such as flicker, motion, and their respective contrasts were the most predictive of gaze location. Additionally, moments in which all viewers’ gaze tightly clustered in the same location could be predicted by motion. Motion and mid-level visual features may influence gaze allocation in dynamic scenes, but it is currently unclear whether this influence is involuntary or due to correlations with higher order factors such as scene semantics.
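
The signal detection idea in the abstract, asking whether a feature's values at fixated locations can be told apart from its values at randomly sampled baseline locations, can be illustrated with a rough sketch. This is not the authors' implementation: the function name `roc_auc`, the variable names, and the synthetic data are assumptions made for illustration, and the area under the ROC curve is computed here with the standard pairwise-comparison (Mann-Whitney) formulation.

```python
import numpy as np


def roc_auc(gaze_values, baseline_values):
    """Area under the ROC curve for separating gaze from baseline samples.

    Equals the probability that a feature value drawn at a gaze location
    exceeds one drawn at a baseline location, with ties counted as half.
    0.5 means chance; values near 1.0 mean the feature separates the two sets.
    """
    gaze = np.asarray(gaze_values, dtype=float)
    base = np.asarray(baseline_values, dtype=float)
    # Compare every gaze sample with every baseline sample.
    greater = (gaze[:, None] > base[None, :]).sum()
    ties = (gaze[:, None] == base[None, :]).sum()
    return (greater + 0.5 * ties) / (gaze.size * base.size)


# Hypothetical synthetic data: a motion feature that tends to be higher
# at fixated locations than at randomly sampled ones.
rng = np.random.default_rng(0)
motion_at_gaze = rng.normal(loc=1.0, scale=1.0, size=500)
motion_at_random = rng.normal(loc=0.0, scale=1.0, size=500)

auc = roc_auc(motion_at_gaze, motion_at_random)
print(f"AUC for the synthetic motion feature: {auc:.3f}")
```

Under this reading, a feature whose AUC sits well above 0.5 is one that distinguishes actual gaze locations from the baseline, which is the sense in which the abstract calls temporal features "predictive".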
Original language: English
Pages (from-to): 5-24
Number of pages: 20
Journal: Cognitive Computation
Volume: 3
Issue number: 1
Publication status: Published - 2011

Keywords / Materials (for Non-textual outputs)

  • Visual attention
  • Eye movements
  • Clustering
  • Dynamic scenes
  • Features
