Abstract
Generative object-centric scene representation learning is crucial for structured visual scene understanding. Built upon variational autoencoders (VAEs), current approaches infer a set of latent object representations to interpret a scene observation (e.g., an image) under the assumption that each part (e.g., a pixel) of a scene observation must be explained by one and only one object of the underlying scene. Despite the impressive performance these models have achieved in unsupervised scene factorization and representation learning, we show empirically that they often produce duplicate scene object representations, which directly harms scene factorization performance. In this paper, we address the issue by introducing a differentiable prior that explicitly forces the inference to suppress duplicate latent object representations. The extension is evaluated by adding it to three different unsupervised scene factorization approaches. The results show that models trained with the proposed method not only outperform the original models in scene factorization and have fewer duplicate representations, but also achieve better variational posterior approximations than the original models.
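The paper itself defines the prior formally; as a loose illustration of the general idea only (not the authors' actual formulation), a differentiable penalty that discourages two latent slots from encoding the same object could be sketched as a pairwise cosine-similarity term added to the VAE objective. All names and the exact functional form here are assumptions for illustration:

```python
import numpy as np

def duplicate_suppression_penalty(z, eps=1e-8):
    """Hypothetical sketch: given K object latents z of shape (K, D),
    penalize pairwise cosine similarity between distinct slots so that
    duplicate representations incur a higher loss."""
    # Normalize each slot's latent vector (eps guards against zero norms).
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    z_hat = z / (norms + eps)
    sim = z_hat @ z_hat.T          # (K, K) cosine-similarity matrix
    K = z.shape[0]
    off = sim - np.eye(K)          # zero out each slot's self-similarity
    # Mean squared off-diagonal similarity: near 0 for orthogonal slots,
    # near 1 when two slots carry (anti-)duplicate representations.
    return (off ** 2).sum() / (K * (K - 1))
```

In a training loop this term would simply be weighted and added to the usual ELBO-based loss; the actual method in the paper is a prior over the latents rather than an ad-hoc regularizer, so this sketch only conveys the intuition of differentiable duplicate suppression.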
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 32nd British Machine Vision Conference (BMVC 2021) |
| Publisher | British Machine Vision Conference |
| Number of pages | 12 |
| Publication status | Published - 25 Nov 2021 |
| Event | The 32nd British Machine Vision Conference - Virtual, 22 Nov 2021 → 25 Nov 2021 (https://www.bmvc2021.com/) |
Conference
| Conference | The 32nd British Machine Vision Conference |
|---|---|
| Abbreviated title | BMVC 2021 |
| Period | 22/11/21 → 25/11/21 |
| Internet address | https://www.bmvc2021.com/ |
Keywords
- object-centric representation learning
- variational autoencoders
- scene representation
Fingerprint
Dive into the research topics of 'Duplicate Latent Representation Suppression for Multi-object Variational Autoencoders'.