Training object class detectors with click supervision

Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Training object class detectors typically requires a large set of images with objects annotated by bounding boxes. However, manually drawing bounding boxes is very time consuming. In this paper we greatly reduce annotation time by proposing center-click annotations: we ask annotators to click on the center of an imaginary bounding box which tightly encloses the object instance. We then incorporate these clicks into existing Multiple Instance Learning techniques for weakly supervised object localization, to jointly localize object bounding boxes over all training images. Extensive experiments on PASCAL VOC 2007 and MS COCO show that: (1) our scheme delivers high-quality detectors, performing substantially better than those produced by weakly supervised techniques, with a modest extra annotation effort; (2) these detectors in fact perform in a range close to those trained from manually drawn bounding boxes; (3) as the center-click task is very fast, our scheme reduces total annotation time by 9x to 18x.
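To make the idea of combining a center click with proposal scores concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each candidate box already has an appearance score from some weakly supervised model, and that a click can be turned into a Gaussian weight on the distance between the click and the box center. The function names, the multiplicative combination, and the `sigma` value are all illustrative assumptions; the abstract does not specify any of them.

```python
import math


def box_center(box):
    """Return the (x, y) center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def click_score(box, click, sigma=30.0):
    """Gaussian weight in (0, 1]: equals 1 when the box center lies on the click.

    sigma (in pixels) is a hypothetical tolerance for annotator error.
    """
    cx, cy = box_center(box)
    dx, dy = cx - click[0], cy - click[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))


def rerank_proposals(proposals, appearance_scores, click, sigma=30.0):
    """Multiply each appearance score by its click weight and pick the best box.

    Assumption: a simple product of the two scores; other combinations are possible.
    """
    combined = [
        score * click_score(box, click, sigma)
        for box, score in zip(proposals, appearance_scores)
    ]
    best_index = max(range(len(proposals)), key=combined.__getitem__)
    return proposals[best_index], combined


# Three made-up proposals; the second one is centered exactly on the click.
proposals = [(0, 0, 100, 100), (40, 40, 160, 160), (200, 200, 300, 300)]
appearance_scores = [0.9, 0.6, 0.95]
best_box, combined = rerank_proposals(proposals, appearance_scores, click=(100, 100))
```

In this toy setting the click overrides the raw appearance ranking: the proposal centered on the click is selected even though its appearance score is the lowest of the three. A single click constrains only the box center, not its size, so a full pipeline would still need other cues to choose among boxes that share a center.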
Original language: English
Title of host publication: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 10
ISBN (Electronic): 978-1-5386-0457-1
ISBN (Print): 978-1-5386-0458-8
Publication status: E-pub ahead of print - 9 Nov 2017
Event: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu 2017. - Hawaii, Honolulu, United States
Duration: 21 Jul 2017 to 26 Jul 2017

Publication series

ISSN (Print): 1063-6919


Conference: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu 2017.
Country/Territory: United States