Structure Inference for Bayesian Multisensory Scene Understanding

Research output: Contribution to journal › Article › peer-review


We investigate a solution to the problem of multisensory scene understanding by formulating it in the framework of Bayesian model selection and structure inference. Humans robustly associate multimodal data as appropriate, but previous modelling work has focused largely on optimal fusion, leaving segregation unaccounted for and unexploited by machine perception systems. We illustrate a unifying Bayesian solution to multi-sensor perception and tracking which accounts for both integration and segregation by explicit probabilistic reasoning about data association in a temporal context. Such explicit inference of multimodal data association is also of intrinsic interest for higher-level understanding of multisensory data. We illustrate this using a probabilistic implementation of data association in a multi-party audio-visual scenario, where unsupervised learning and structure inference are used to automatically segment, associate and track individual subjects in audio-visual sequences. Indeed, the structure-inference-based framework introduced in this work provides the theoretical foundation needed to satisfactorily explain many confounding results in human psychophysics experiments involving multimodal cue integration and association.
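The model-selection idea at the heart of the abstract can be illustrated with a minimal sketch: compare the evidence for a "common cause" model (both cues generated by one latent source, i.e. integration) against an "independent causes" model (each cue has its own source, i.e. segregation), then form the posterior over the two structures. All numerical values below (noise levels `sig_a`, `sig_v`, prior width `sig_p`, structure prior `p_common`, and the grid bounds) are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def model_posterior(x_a, x_v, sig_a=1.0, sig_v=2.0, sig_p=5.0, p_common=0.5):
    """Posterior probability that two noisy cues (e.g. audio and visual
    position estimates) share a common cause, via Bayesian model comparison.
    All parameter values are illustrative assumptions."""
    s = np.linspace(-30.0, 30.0, 2001)      # grid over latent source position
    ds = s[1] - s[0]
    prior = np.exp(-0.5 * (s / sig_p) ** 2) / (np.sqrt(2 * np.pi) * sig_p)
    lik_a = np.exp(-0.5 * ((x_a - s) / sig_a) ** 2) / (np.sqrt(2 * np.pi) * sig_a)
    lik_v = np.exp(-0.5 * ((x_v - s) / sig_v) ** 2) / (np.sqrt(2 * np.pi) * sig_v)
    # Integration structure: one latent source generates both observations,
    # so the evidence marginalises a joint likelihood over a single s.
    ev_common = np.sum(lik_a * lik_v * prior) * ds
    # Segregation structure: each observation has its own independent source,
    # so the evidence factorises into two separate marginals.
    ev_indep = (np.sum(lik_a * prior) * ds) * (np.sum(lik_v * prior) * ds)
    # Posterior over structure by Bayes' rule.
    num = ev_common * p_common
    return num / (num + ev_indep * (1 - p_common))

# Nearby cues favour integration; widely separated cues favour segregation.
print(model_posterior(0.0, 0.5))    # > 0.5: cues likely share a cause
print(model_posterior(0.0, 15.0))   # < 0.5: cues likely have separate causes
```

The evidence integrals are computed by brute-force discretisation of the latent position rather than in closed form, purely to keep the sketch transparent; the same comparison, extended over time and over many possible associations, is what the paper's structure inference performs.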
Original language: English
Pages (from-to): 2140-2157
Number of pages: 18
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Issue number: 12
Publication status: Published - Dec 2008


  • Informatics
  • Pattern Recognition
  • Scene Analysis
  • Sensor fusion


