MVSAnywhere: Zero-shot multi-view stereo

Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel Brostow, Jamie Watson

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

Computing accurate depth from multiple views is a fundamental and longstanding challenge in computer vision. However, most existing approaches do not generalize well across different domains and scene types (e.g. indoor vs. outdoor). Training a general-purpose multi-view stereo model is challenging and raises several questions: how best to make use of transformer-based architectures, how to incorporate additional metadata when there is a variable number of input views, and how to estimate the range of valid depths, which can vary considerably across scenes and is typically not known a priori. To address these issues, we introduce MVSA, a novel and versatile Multi-View Stereo architecture that aims to work Anywhere by generalizing across diverse domains and depth ranges. MVSA combines monocular and multi-view cues with an adaptive cost volume to deal with scale-related issues. We demonstrate state-of-the-art zero-shot depth estimation on the Robust Multi-View Depth Benchmark, surpassing existing multi-view stereo and monocular baselines.
Original language: English
Title of host publication: Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers
Pages: 1-28
Number of pages: 28
Publication status: Accepted/In press, 26 Feb 2025
Event: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025, Music City Center, Nashville, United States
Duration: 11 Jun 2025 to 15 Jun 2025
https://cvpr.thecvf.com/

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers
ISSN (Print): 1063-6919
ISSN (Electronic): 2575-7075

Conference

Conference: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025
Abbreviated title: CVPR 2025
Country/Territory: United States
City: Nashville
Period: 11/06/25 to 15/06/25

Keywords / Materials (for Non-textual outputs)

  • computer vision and pattern recognition
