Abstract
Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.
Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark.
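The three contributions above are all loss-level changes, so they are easy to sketch. Below is a minimal PyTorch illustration of (i) the per-pixel minimum reprojection loss and (iii) auto-masking; the function and variable names (`l1_error`, `min_reprojection_loss`, `warped_frames`, `identity_frames`) are illustrative rather than taken from the authors' code release, and plain L1 stands in for the SSIM + L1 photometric error used in the paper.

```python
import torch

def l1_error(pred, target):
    # Per-pixel photometric error, averaged over colour channels -> (B, 1, H, W).
    # The paper combines SSIM with L1; plain L1 keeps this sketch short.
    return torch.abs(pred - target).mean(dim=1, keepdim=True)

def min_reprojection_loss(warped_frames, identity_frames, target):
    """Per-pixel minimum reprojection loss with auto-masking (a sketch).

    warped_frames:   source frames warped into the target view using the
                     predicted depth and relative pose, each (B, 3, H, W)
    identity_frames: the same source frames, unwarped, each (B, 3, H, W)
    target:          the target frame, (B, 3, H, W)
    """
    # (i) Minimum reprojection: at each pixel, keep only the smallest error
    # over all source frames, which robustly handles occlusions.
    reproj = torch.cat([l1_error(w, target) for w in warped_frames], dim=1)
    reproj_min, _ = reproj.min(dim=1, keepdim=True)

    # (iii) Auto-masking: where an *unwarped* source frame already matches the
    # target better than every warped one (static camera, or objects moving
    # with the camera), the pixel violates the motion assumptions and is ignored.
    identity = torch.cat([l1_error(s, target) for s in identity_frames], dim=1)
    identity_min, _ = identity.min(dim=1, keepdim=True)
    mask = (reproj_min < identity_min).float()

    return (mask * reproj_min).sum() / mask.sum().clamp(min=1.0)
```

Component (ii), full-resolution multi-scale sampling, needs no extra machinery: each intermediate-scale depth map is bilinearly upsampled to the input resolution (e.g. with `torch.nn.functional.interpolate`) before warping, so the loss at every scale is computed against the full-resolution target image.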
Original language | English
---|---
Title of host publication | 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
Publisher | Institute of Electrical and Electronics Engineers (IEEE)
Pages | 3827-3837
Number of pages | 11
ISBN (Electronic) | 978-1-7281-4803-8
ISBN (Print) | 978-1-7281-4804-5
DOIs |
Publication status | Published - 27 Feb 2020
Event | International Conference on Computer Vision 2019, Seoul, Korea, Republic of. Duration: 27 Oct 2019 → 2 Nov 2019. http://iccv2019.thecvf.com/
Publication series

Name |
---|---
Publisher | Institute of Electrical and Electronics Engineers (IEEE)
ISSN (Print) | 1550-5499
ISSN (Electronic) | 2380-7504
Conference

Conference | International Conference on Computer Vision 2019
---|---
Abbreviated title | ICCV 2019
Country/Territory | Korea, Republic of
City | Seoul
Period | 27/10/19 → 2/11/19
Internet address | http://iccv2019.thecvf.com/
Profiles
- Oisin Mac Aodha (Reader, School of Informatics)
  - Institute for Adaptive and Neural Computation
  - Data Science and Artificial Intelligence
  - Person: Academic: Research Active