Learning to Detect Objects from Eye-Tracking Data

D.P. Papadopoulos, A.D.F. Clarke, F. Keller, V. Ferrari

Research output: Contribution to journal, Article, peer-review

Abstract

One of the bottlenecks in computer vision, especially in object detection, is the need for a large amount of training data. Typically, this is acquired by manually annotating images. In this study, we explore the possibility of using eye-trackers to provide training data for supervised machine learning. We have created a new large-scale eye-tracking dataset, collecting fixation data for 6270 images from the Pascal VOC 2012 database. These images cover 10 of the 20 classes included in the Pascal database. Each image was viewed by 5 observers, and a total of over 178k fixations have been collected. While previous attempts at using fixation data in computer vision were based on a free-viewing paradigm, we used a visual search task in order to increase the proportion of fixations on the target object. Furthermore, we divided the dataset into five pairs of semantically similar classes (cat/dog, bicycle/motorbike, horse/cow, boat/aeroplane and sofa/diningtable), with the observer having to decide which class each image belonged to. This kept the observer's task simple, while decreasing the chance of them using the scene gist to identify the target parafoveally. In order to alleviate the central bias in scene viewing, the images were presented to the observers with a random offset. The goal of our project is to use the eye-tracking information in order to detect and localise the attended objects. Our model so far, based on features representing the location of the fixations and an appearance model of the attended regions, can successfully predict the location of the target objects in over half of the images.
Original language: English
Pages (from-to): 488-488
Number of pages: 1
Journal: i-Perception
Volume: 5
Issue number: 5
Publication status: Published - Aug 2014
