End-to-end training of object class detectors for mean average precision

Paul Henderson, Vittorio Ferrari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a method for training CNN-based object class detectors directly using mean average precision (mAP) as the training loss, in a truly end-to-end fashion that includes non-maximum suppression (NMS) at training time. This contrasts with the traditional approach of training a CNN for a window classification loss, then applying NMS only at test time, when mAP is used as the evaluation metric in place of classification accuracy. However, mAP following NMS forms a piecewise-constant structured loss over thousands of windows, with gradients that do not convey useful information for gradient descent. Hence, we define new, general gradient-like quantities for piecewise-constant functions, which have wide applicability. We describe how to calculate these efficiently for mAP following NMS, enabling us to train a detector based on Fast R-CNN [1] directly for mAP. This model achieves equivalent performance to the standard Fast R-CNN on the PASCAL VOC 2007 and 2012 datasets, while being conceptually more appealing as the very same model and loss are used at both training and test time.
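The piecewise-constant behaviour mentioned in the abstract can be seen in a toy setting. The sketch below is not the authors' method or code; it is a minimal illustration that assumes a single image, a single class, greedy NMS, and non-interpolated average precision. All function names, boxes and scores are hypothetical. It shows that a small change to one window score leaves AP after NMS unchanged (so a finite-difference gradient is zero), while a change large enough to reorder the detections makes AP jump.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: indices of kept boxes, in order of decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


def average_precision(boxes, scores, gt_boxes, iou_thresh=0.5):
    """Non-interpolated AP of the detections that survive NMS."""
    matched, tp, ap = set(), 0, 0.0
    for rank, i in enumerate(nms(boxes, scores), start=1):
        # Match the detection to the best still-unmatched ground-truth box.
        best, best_iou = None, iou_thresh
        for g, gt in enumerate(gt_boxes):
            overlap = iou(boxes[i], gt)
            if g not in matched and overlap >= best_iou:
                best, best_iou = g, overlap
        if best is not None:
            matched.add(best)
            tp += 1
            ap += tp / rank  # precision at this true positive
    return ap / len(gt_boxes)


# Hypothetical toy data: two ground-truth objects, four scored windows.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (20, 20, 30, 30)]

base = average_precision(boxes, [0.9, 0.8, 0.7, 0.60], gt_boxes)
nudged = average_precision(boxes, [0.9, 0.8, 0.7, 0.61], gt_boxes)  # same ranking
jumped = average_precision(boxes, [0.9, 0.8, 0.7, 0.75], gt_boxes)  # ranking changes

print(base, nudged, jumped)
```

In this toy case the second window is suppressed by the first, and the last window only affects AP once its score overtakes the false positive at 0.7. Between such reordering points the loss is flat, which is why the abstract turns to gradient-like quantities rather than the ordinary gradient.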
Original language: English
Title of host publication: Computer Vision -- ACCV 2016
Publisher: Springer
Pages: 198-213
Number of pages: 15
ISBN (Electronic): 978-3-319-54193-8
ISBN (Print): 978-3-319-54192-1
DOIs
Publication status: Published - 11 Mar 2017
Event: 13th Asian Conference on Computer Vision - Taipei, Taiwan, Province of China
Duration: 20 Nov 2016 to 24 Nov 2016
http://www.accv2016.org/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
Volume: 10115
ISSN (Print): 0302-9743

Conference

Conference: 13th Asian Conference on Computer Vision
Abbreviated title: ACCV'16
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 20/11/16 to 24/11/16
