Learning Stereo from Single Images

Jamie Watson, Oisin Mac Aodha, Daniyar Turmukhambetov, Gabriel J. Brostow, Michael Firman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Supervised deep networks are among the best methods for finding correspondences in stereo image pairs. Like all supervised approaches, these networks require ground-truth data during training. However, collecting large quantities of accurate dense correspondence data is very challenging. We propose that it is unnecessary to have such a high reliance on ground-truth depths or even corresponding stereo pairs. Inspired by recent progress in monocular depth estimation, we generate plausible disparity maps from single images. In turn, we use those flawed disparity maps in a carefully designed pipeline to generate stereo training pairs. Training in this manner makes it possible to convert any collection of single RGB images into stereo training data. This results in a significant reduction in human effort, with no need to collect real depths or to hand-design synthetic data. We can consequently train a stereo matching network from scratch on datasets like COCO, which were previously hard to exploit for stereo. Through extensive experiments we show that our approach outperforms stereo networks trained with standard synthetic datasets when evaluated on KITTI, ETH3D, and Middlebury. Code to reproduce our results is available at https://github.com/nianticlabs/stereo-from-mono/.
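The abstract does not spell out the pipeline, but its core idea, turning one image plus a disparity map into a second "right" view, can be illustrated with a minimal sketch. This is a simplified illustration under stated assumptions, not the authors' implementation (that lives in the repository linked above). The function names `depth_to_disparity` and `forward_warp` are hypothetical; images are plain Python lists of grayscale values; depth is converted with the standard rectified-stereo relation d = f * B / Z using an assumed focal length `focal_px` (in pixels) and baseline `baseline`; and each left-image pixel at column x is shifted to column x - d in the synthesized right view. When two pixels land on the same target, the one with the larger disparity (nearer the camera) is kept; targets that receive nothing are left as holes, which a real pipeline would still need to fill.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration only; not the authors' code.
Image = List[List[float]]  # grayscale image stored as rows of pixel values


def depth_to_disparity(depth: Image, focal_px: float, baseline: float) -> Image:
    """Rectified-stereo relation d = f * B / Z (assumes every depth Z > 0)."""
    return [[focal_px * baseline / z for z in row] for row in depth]


def forward_warp(
    left: Image, disparity: Image
) -> Tuple[List[List[Optional[float]]], List[List[bool]]]:
    """Synthesize a right view by moving each left pixel from column x to x - d.

    If several source pixels land on the same target, the one with the larger
    disparity (closer to the camera) wins, so near surfaces occlude far ones.
    Targets that receive no pixel stay None (disocclusion holes) and are marked
    False in the returned validity mask.
    """
    h, w = len(left), len(left[0])
    right: List[List[Optional[float]]] = [[None] * w for _ in range(h)]
    best = [[float("-inf")] * w for _ in range(h)]  # z-buffer keyed on disparity
    for y in range(h):
        for x in range(w):
            d = disparity[y][x]
            xr = int(round(x - d))  # target column in the right view
            if 0 <= xr < w and d > best[y][xr]:
                best[y][xr] = d
                right[y][xr] = left[y][x]
    valid = [[v is not None for v in row] for row in right]
    return right, valid
```

For example, `forward_warp([[10, 20, 30, 40, 50]], [[1, 1, 2, 2, 1]])` returns the right row `[30, 40, None, 50, None]`: the pixel with value 30 (disparity 2) occludes the pixel with value 20 (disparity 1) at column 0, the pixel at column 0 of the left image falls outside the frame, and columns 2 and 4 remain empty as disocclusion holes.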
Original language: English
Title of host publication: Computer Vision – ECCV 2020
Subtitle of host publication: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I
Publisher: Springer
Pages: 722-740
Number of pages: 19
ISBN (Electronic): 978-3-030-58452-8
ISBN (Print): 978-3-030-58451-1
Publication status: Published - 3 Nov 2020
Event: 16th European Conference on Computer Vision - Virtual conference
Duration: 23 Aug 2020 to 28 Aug 2020
https://eccv2020.eu/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
Volume: 12346
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th European Conference on Computer Vision
Abbreviated title: ECCV 2020
City: Virtual conference
Period: 23/08/20 to 28/08/20

Keywords

  • Stereo matching
  • Correspondence training data
