Vision-as-Inverse-Graphics: Obtaining a Rich 3D Explanation of a Scene from a Single Image

Lukasz Romaszko, Christopher K I Williams, Pol Moreno, Pushmeet Kohli

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract / Description of output

We develop an inverse graphics approach to the problem of scene understanding, obtaining a rich representation that includes descriptions of the objects in the scene and their spatial layout, as well as global latent variables such as the camera parameters and lighting. The framework's stages include object detection, the prediction of the camera and lighting variables, and the prediction of object-specific variables (shape, appearance and pose). This pipeline acts like the encoder of an autoencoder, with graphics rendering as the decoder. Importantly, the scene representation is interpretable and has a variable dimension that matches the number of detected objects plus the global variables. For the prediction of the camera latent variables we introduce a novel architecture termed Probabilistic HoughNets (PHNs), which provides a principled approach to combining information from multiple detections. We demonstrate the quality of the resulting reconstructions quantitatively on synthetic data and qualitatively on real scenes.
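Two ideas in the abstract can be made concrete with a small sketch: a scene representation whose length grows with the number of detected objects, and the combination of per-detection predictions of a camera variable. The Python below is a minimal illustration under stated assumptions, not the authors' code. The names `ObjectLatents`, `SceneLatents`, `gmm_log_density` and `combine_votes`, the field sizes, and the use of a single camera elevation angle are all hypothetical. For the combination step we assume that each detection outputs a Gaussian mixture over the camera variable and that detections are independent evidence, so their log-densities are summed on a discretised grid (a Hough-style accumulator) and the maximum is taken; the exact combination rule used by PHNs should be checked against the paper.

```python
# Hypothetical sketch; not the PHN implementation from the paper.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ObjectLatents:
    # Per-object variables named in the abstract; sizes are hypothetical.
    shape: np.ndarray        # e.g. a 1-D shape code
    appearance: np.ndarray   # e.g. an RGB colour
    pose: np.ndarray         # e.g. (x, y, z, yaw)


@dataclass
class SceneLatents:
    # Global variables shared by the whole scene (hypothetical sizes).
    camera: np.ndarray       # e.g. (elevation, height)
    lighting: np.ndarray     # e.g. (azimuth, elevation, intensity)
    objects: list = field(default_factory=list)

    def flatten(self) -> np.ndarray:
        """Concatenate the globals plus one block per detected object,
        so the vector length grows with the number of objects."""
        parts = [self.camera, self.lighting]
        for obj in self.objects:
            parts.extend([obj.shape, obj.appearance, obj.pose])
        return np.concatenate(parts)


def gmm_log_density(grid, weights, means, stds):
    """log(sum_k w_k * N(x; mu_k, sigma_k^2)) at every grid point x."""
    x = np.asarray(grid, dtype=float)[:, None]
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    sd = np.asarray(stds, dtype=float)
    log_comp = -0.5 * ((x - mu) / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))
    # Stable log-sum-exp over the mixture components.
    return np.logaddexp.reduce(np.log(w) + log_comp, axis=1)


def combine_votes(grid, votes):
    """Hough-style accumulator (assumed rule): add each detection's
    log-density vote, then return the best grid value and the accumulator."""
    acc = np.zeros(len(grid))
    for weights, means, stds in votes:
        acc += gmm_log_density(grid, weights, means, stds)
    return grid[np.argmax(acc)], acc


if __name__ == "__main__":
    grid = np.linspace(0.0, 90.0, 181)  # camera elevation in degrees, 0.5-degree steps
    votes = [
        ([0.5, 0.5], [30.0, 60.0], [5.0, 5.0]),  # ambiguous detection: two modes
        ([1.0], [32.0], [8.0]),                  # broad single mode
        ([0.6, 0.4], [28.0, 70.0], [6.0, 6.0]),  # another two-mode vote
    ]
    best, _ = combine_votes(grid, votes)
    print(f"combined elevation estimate: {best:.1f} deg")
```

With only the first vote, elevations of 30 and 60 degrees are equally likely; adding the other detections moves the combined maximum to about 30 degrees, illustrating why combining information from multiple detections helps resolve ambiguity that a single detection cannot.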
Original language: English
Title of host publication: ICCV 2017 Workshop on Geometry Meets Deep Learning
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 9
ISBN (Electronic): 978-1-5386-1034-3
ISBN (Print): 978-1-5386-1035-0
Publication status: Published - 23 Jan 2018
Event: Geometry Meets Deep Learning ICCV 2017 Workshop - Palazzo del Cinema, Venice, Italy
Duration: 28 Oct 2017

Publication series

ISSN (Electronic): 2473-9944


Conference: Geometry Meets Deep Learning ICCV 2017 Workshop


