Click to grasp: Zero-shot precise manipulation via visual diffusion descriptors

Nikolas Tsagkas*, Jack Rome, Subramanian Ramamoorthy, Oisin Mac Aodha, Xiaoxuan Chris Lu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Precise manipulation that generalizes across scenes and objects remains a persistent challenge in robotics. Current approaches to this task depend heavily on large numbers of training instances to handle objects with pronounced visual and/or geometric part ambiguities. Our work explores grounding fine-grained part descriptors for precise manipulation in a zero-shot setting by using web-trained, text-to-image diffusion-based generative models. We frame the problem as a dense semantic part correspondence task. Given a user-defined click on a source image of a visually different instance of the same object, our model returns a gripper pose for manipulating the corresponding part. We require no manual grasping demonstrations, as we instead leverage the intrinsic object geometry and features. Practical experiments in a real-world tabletop scenario validate the efficacy of our approach and demonstrate its potential for advancing semantic-aware robotic manipulation.
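The abstract frames click transfer as dense semantic correspondence. The sketch below illustrates only that matching step, under stated assumptions: per-pixel descriptor maps for the source and target images are taken as already extracted (in the paper they come from a diffusion model; here they are synthetic arrays), and matching is assumed to be nearest-neighbour search under cosine similarity. The function name `transfer_click` is hypothetical, this is not the authors' implementation, and the geometry-based gripper-pose step is not shown.

```python
from __future__ import annotations

import numpy as np


def transfer_click(src_feats: np.ndarray, tgt_feats: np.ndarray,
                   click: tuple[int, int]) -> tuple[int, int]:
    """Hypothetical helper: map a (row, col) click in the source image to the
    target pixel whose descriptor has the highest cosine similarity.

    src_feats, tgt_feats: dense descriptor maps of shape (H, W, C).
    Assumption: descriptors are precomputed; no diffusion model is run here.
    """
    r, c = click
    # Descriptor at the clicked source pixel, L2-normalised.
    query = src_feats[r, c]
    query = query / (np.linalg.norm(query) + 1e-8)

    # Flatten the target map to (H*W, C) and L2-normalise every row.
    h, w, ch = tgt_feats.shape
    flat = tgt_feats.reshape(-1, ch)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)

    # Cosine similarity between the query and every target pixel.
    sims = flat @ query
    idx = int(np.argmax(sims))
    # Convert the flat index back to (row, col).
    return divmod(idx, w)


# Synthetic usage: plant a scaled copy of the clicked descriptor in the target.
rng = np.random.default_rng(0)
src = rng.normal(size=(32, 32, 16))
tgt = rng.normal(size=(24, 40, 16))
tgt[5, 17] = 3.0 * src[10, 20]
print(transfer_click(src, tgt, (10, 20)))  # (5, 17)
```

Cosine similarity is used here because it ignores descriptor magnitude, so the planted copy matches even though it is scaled; with real diffusion features, the best match would typically be an approximate rather than an exact one.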
Original language: English
Title of host publication: Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems
Publication status: Accepted/In press - 30 Jun 2024
Event: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems - Abu Dhabi, United Arab Emirates
Duration: 14 Oct 2024 to 18 Oct 2024
https://iros2024-abudhabi.org/

Conference

Conference: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems
Abbreviated title: IROS 2024
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 14/10/24 to 18/10/24
