Probabilistic inference for solving (PO) MDPs

Marc Toussaint, Stefan Harmeling, Amos Storkey

Research output: Working paper

Abstract

The development of probabilistic inference techniques has made considerable progress in recent years, in particular with respect to exploiting the structure (e.g., factored, hierarchical or relational) of discrete and continuous problem domains. We show that these techniques can also be used for solving Markov Decision Processes (MDPs) or partially observable MDPs (POMDPs) when formulated in terms of a structured dynamic Bayesian network (DBN). The approach is based on an equivalence between maximization of the expected future return in the time-unlimited MDP and likelihood maximization in a related mixture of finite-time MDPs. This allows us to use expectation maximization (EM) for computing optimal policies, using arbitrary inference techniques in the E-step. Unlike previous approaches, we can show that this actually optimizes the discounted expected future return for arbitrary reward functions and without assuming an ad hoc finite total time. We first develop the approach for standard MDPs and demonstrate it using exact inference on a discrete maze and Gaussian belief state propagation in non-linear stochastic optimal control problems. Then we present an extension for solving POMDPs. We consider an agent model that includes an internal memory variable used for gating reactive behaviors. Using exact inference on the respective DBN, the EM algorithm solves complex maze problems by learning appropriate internal memory representations.
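The following is a minimal sketch (not the authors' code) of the EM idea described above, on a small tabular MDP: the discounted return E[sum_t gamma^t r_t] is proportional to the likelihood of a binary reward event under a mixture over horizons T with prior P(T) = (1-gamma) gamma^T, so policy optimization can be phrased as likelihood maximization. The toy MDP, the horizon cutoff K, and all variable names are illustrative assumptions; the E-step here is exact tabular inference and the M-step reweights the policy by the inferred reward responsibilities.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.95

# Random toy MDP: transitions P[s, a, s'] and rewards r(s, a) scaled into [0, 1]
# so that r can be read as P(R=1 | s, a), the probability of the reward event.
P = rng.dirichlet(np.ones(S), size=(S, A))      # shape (S, A, S)
r = rng.random((S, A))                          # "reward probability"
mu0 = np.full(S, 1.0 / S)                       # initial state distribution

pi = np.full((S, A), 1.0 / A)                   # start from a uniform policy

def expected_return(pi, K=500):
    """Discounted return of pi via truncated forward propagation of marginals."""
    ret, p_s = 0.0, mu0.copy()
    for t in range(K):
        p_sa = p_s[:, None] * pi                # joint over (s, a) at time t
        ret += (gamma ** t) * np.sum(p_sa * r)
        p_s = np.einsum('sa,sap->p', p_sa, P)   # propagate one step forward
    return ret

K = 500                                         # horizon cutoff (assumption)
for it in range(100):
    # E-step (exact inference): backward messages beta_k(s, a) give the
    # probability of the reward event k steps ahead under the current policy;
    # their discounted sum Q(s, a) plays the role of a Q-function.
    beta = r.copy()
    Q = r.copy()
    for k in range(1, K):
        v = np.sum(pi * beta, axis=1)           # value of successor states
        beta = np.einsum('sap,p->sa', P, v)
        Q += (gamma ** k) * beta
    # M-step: reweight the policy by the responsibilities, which here reduces
    # to pi_new(a|s) proportional to pi_old(a|s) * Q(s, a).
    pi = pi * Q
    pi /= pi.sum(axis=1, keepdims=True)

print("expected discounted return:", expected_return(pi))
```

Because only the relative weights over actions matter for a given state, the forward (state-occupancy) messages cancel in the per-state normalization, which is why the M-step above needs only the backward quantities.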
Original language: English
Number of pages: 22
Publication status: Published - Dec 2006

Keywords

  • Markov Decision Process, probabilistic inference, POMDP, planning
