A visual language model for estimating object pose and structure in a generative visual domain

Siddharth Narayanaswamy, Andrei Barbu, Jeffrey Mark Siskind

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a generative domain of visual objects by analogy to the generative nature of human language. Just as small inventories of phonemes and words combine in a grammatical fashion to yield myriad valid words and utterances, a small inventory of physical parts combine in a grammatical fashion to yield myriad valid assemblies. We apply the notion of a language model from speech recognition to this visual domain to similarly improve the performance of the recognition process over what would be possible by only applying recognizers to the components. Unlike the context-free models for human language, our visual language models are context sensitive and formulated as stochastic constraint-satisfaction problems. And unlike the situation for human language where all components are observable, our methods deal with occlusion, successfully recovering object structure despite unobservable components. We demonstrate our system with an integrated robotic system for disassembling structures that performs whole-scene reconstruction consistent with a language model in the presence of noisy feature detectors.
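The central idea in the abstract, that a language model over whole assemblies can correct noisy per-component recognizers, can be illustrated with a toy sketch. This is not the authors' system. The slot names, detector confidences, the single grammar rule, and the exhaustive search are all assumptions made for illustration; the abstract does not say how the stochastic constraint-satisfaction problem is solved.

```python
from itertools import product

LABELS = ("present", "absent")

# Hypothetical detector confidences per slot (assumed numbers, not from the paper).
detector = {
    "left_support": {"present": 0.6, "absent": 0.4},
    # Mostly occluded, so the detector output is unreliable.
    "right_support": {"present": 0.4, "absent": 0.6},
    "top_beam": {"present": 0.9, "absent": 0.1},
}


def satisfies_grammar(assembly):
    """Toy constraint: a top beam must rest on both supports."""
    if assembly["top_beam"] == "present":
        return (assembly["left_support"] == "present"
                and assembly["right_support"] == "present")
    return True


def independent_estimate(detector):
    """Pick the most confident label for each slot in isolation."""
    return {slot: max(scores, key=scores.get)
            for slot, scores in detector.items()}


def constrained_estimate(detector):
    """Pick the highest-scoring joint labeling that satisfies the grammar.

    Exhaustive enumeration is used here only because the toy problem is tiny.
    """
    slots = list(detector)
    best, best_score = None, -1.0
    for labels in product(LABELS, repeat=len(slots)):
        assembly = dict(zip(slots, labels))
        if not satisfies_grammar(assembly):
            continue
        score = 1.0
        for slot, label in assembly.items():
            score *= detector[slot][label]
        if score > best_score:
            best, best_score = assembly, score
    return best, best_score


independent = independent_estimate(detector)
constrained, score = constrained_estimate(detector)
print("independent:", independent, "valid:", satisfies_grammar(independent))
print("constrained:", constrained, "score:", score)
```

In this sketch the per-slot estimate labels the occluded support as absent, which yields an assembly the toy grammar rejects, while the constrained estimate recovers the occluded support because the clearly visible beam requires it.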
Original language: English
Title of host publication: 2011 IEEE International Conference on Robotics and Automation
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 4854-4860
Number of pages: 7
ISBN (Electronic): 978-1-61284-385-8
ISBN (Print): 978-1-61284-386-5
Publication status: Published - 15 Aug 2011
Event: 2011 IEEE International Conference on Robotics and Automation - Shanghai, China
Duration: 9 May 2011 to 13 May 2011

Publication series

Publisher: IEEE
ISSN (Print): 1050-4729
ISSN (Electronic): 1050-4729

Conference

Conference: 2011 IEEE International Conference on Robotics and Automation
Abbreviated title: ICRA 2011
Country: China
City: Shanghai
Period: 9/05/11 to 13/05/11
