Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

Nanbo Li, Cian Eastwood, Robert B Fisher

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Learning object-centric representations of multi-object scenes is a promising approach towards machine intelligence, facilitating high-level reasoning and control from visual sensory data. However, current approaches for unsupervised object-centric scene representation are incapable of aggregating information from multiple observations of a scene. As a result, these “single-view” methods form their representations of a 3D scene based only on a single 2D observation (view). Naturally, this leads to several inaccuracies, with these methods falling victim to single-view spatial ambiguities. To address this, we propose the Multi-View and Multi-Object Network (MulMON), a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views. To sidestep the main technical difficulty of the multi-object-multi-view scenario, namely maintaining object correspondences across views, MulMON iteratively updates the latent object representations for a scene over multiple views. To ensure that these iterative updates do indeed aggregate spatial information to form a complete 3D scene understanding, MulMON is asked to predict the appearance of the scene from novel viewpoints during training. Through experiments we show that MulMON resolves spatial ambiguities better than single-view methods, learning more accurate and disentangled object representations, and also achieves new functionality in predicting object segmentations for novel viewpoints. Our implementation and pretrained models are available on GitHub.
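
The idea of iteratively updating per-object latents across views can be illustrated with a toy sketch. This is not the MulMON network: it swaps in a conjugate Gaussian update over one scalar per object, and every name in it (`update_slot`, `aggregate_views`, the noise variances, the "depth" values) is hypothetical. It also assumes that observation i in every view already belongs to slot i, which is the correspondence that carrying the same slots forward from view to view is meant to preserve.

```python
import random


def update_slot(mean, var, obs, obs_var):
    """Conjugate Gaussian update of one slot's belief from one noisy observation."""
    gain = var / (var + obs_var)          # how much to trust the new view
    new_mean = mean + gain * (obs - mean)
    new_var = (1.0 - gain) * var          # uncertainty shrinks with each view
    return new_mean, new_var


def aggregate_views(views, num_slots, prior_mean=0.0, prior_var=1.0, obs_var=0.25):
    """Fold views in one at a time; each posterior becomes the next view's prior."""
    slots = [(prior_mean, prior_var)] * num_slots
    for view in views:
        # Slot i is updated only with observation i, so slot identity is kept
        # fixed across views (assumed here, not learned).
        slots = [update_slot(m, v, obs, obs_var) for (m, v), obs in zip(slots, view)]
    return slots


if __name__ == "__main__":
    rng = random.Random(0)
    true_depths = [0.8, -0.3, 1.5]        # hypothetical per-object quantity
    views = [[d + rng.gauss(0.0, 0.5) for d in true_depths] for _ in range(5)]
    for n in (1, 3, 5):
        beliefs = aggregate_views(views[:n], num_slots=3)
        print(n, [(round(m, 2), round(v, 3)) for m, v in beliefs])
```

Running the script prints each slot's mean and variance after 1, 3, and 5 views. The variance falls as views are added, which is the toy analogue of a single-view ambiguity being resolved by further observations.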

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Publisher: Curran Associates Inc
Pages: 5656-5666
Number of pages: 11
Publication status: Published - 6 Dec 2020
Event: Thirty-fourth Conference on Neural Information Processing Systems - Virtual Conference
Duration: 6 Dec 2020 → 12 Dec 2020
https://nips.cc/Conferences/2020

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 33
ISSN (Electronic): 1049-5258

Conference

Conference: Thirty-fourth Conference on Neural Information Processing Systems
Abbreviated title: NeurIPS 2020
City: Virtual Conference
Period: 6/12/20 → 12/12/20