Abstract
The development of probabilistic inference techniques has made considerable progress in recent
years, in particular with respect to exploiting the structure (e.g., factored, hierarchical or relational)
of discrete and continuous problem domains. We show that these techniques can also be used for solving
Markov Decision Processes (MDPs) and partially observable MDPs (POMDPs) when formulated in terms of
a structured dynamic Bayesian network (DBN). The approach is based on an equivalence between maximization
of the expected future return in the time-unlimited MDP and likelihood maximization in a related
mixture of finite-time MDPs. This allows us to use expectation maximization (EM) for computing optimal
policies, using arbitrary inference techniques in the E-step. Unlike previous approaches, we can show that
this actually optimizes the discounted expected future return for arbitrary reward functions, without
assuming an ad hoc finite total time.
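
As a rough sketch of the equivalence referred to above (the notation is ours, not taken from the paper, and it assumes rewards are scaled so that $R_t \in [0,1]$ can be read as the probability of a binary reward event $R$): giving the horizon $T$ a geometric prior turns the discounted return into a mixture likelihood,

$$
\mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_t \;\middle|\; \pi\right]
\;\propto\; \sum_{T=0}^{\infty} P(T)\, P(R = 1 \mid T, \pi),
\qquad P(T) = (1-\gamma)\,\gamma^{T},
$$

so maximizing the discounted expected future return over the policy $\pi$ coincides with maximizing the likelihood $P(R = 1 \mid \pi)$ of the reward event in the mixture of finite-time MDPs, which is exactly the form to which EM applies.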
We first develop the approach for standard MDPs and demonstrate it using exact inference on a discrete
maze and Gaussian belief state propagation in non-linear stochastic optimal control problems. Then we
present an extension for solving POMDPs. We consider an agent model that includes an internal memory
variable used for gating reactive behaviors. Using exact inference on the respective DBN, the EM algorithm
solves complex maze problems by learning appropriate internal memory representations.
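
To make the EM loop in the discrete case concrete, the following is a minimal tabular sketch (our own illustration, not code from the paper). It assumes nonnegative rewards that can be read as reward-event probabilities and uses the closed form the E- and M-steps then take: evaluate $Q^{\pi}$ under the current policy and reweight, $\pi_{\text{new}}(a \mid s) \propto \pi_{\text{old}}(a \mid s)\, Q^{\pi}(s,a)$. The function name, toy transition model, and variable names are all invented for illustration.

```python
import numpy as np

def em_policy_iteration(P, R, gamma=0.95, iters=200):
    """EM-style policy iteration for a small tabular MDP (illustrative sketch).

    P : (A, S, S) array with P[a, s, s2] = p(s2 | s, a)
    R : (S, A) array of nonnegative rewards, read as P(reward event | s, a)
    Returns a stochastic policy pi of shape (S, A).
    """
    A, S, _ = P.shape
    pi = np.full((S, A), 1.0 / A)                     # start from the uniform policy

    for _ in range(iters):
        # E-step (closed form): evaluate the current policy.
        # V solves the linear system V = R_pi + gamma * P_pi V.
        R_pi = np.einsum('sa,sa->s', pi, R)           # expected one-step reward per state
        P_pi = np.einsum('sa,ast->st', pi, P)         # state-to-state transitions under pi
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        Q = R + gamma * np.einsum('ast,t->sa', P, V)  # action values under pi

        # M-step: reweight the policy by the action values,
        # pi_new(a|s) proportional to pi_old(a|s) * Q(s,a).
        new_pi = pi * Q
        denom = new_pi.sum(axis=1, keepdims=True)
        new_pi = np.where(denom > 0, new_pi / np.maximum(denom, 1e-300), pi)

        if np.allclose(new_pi, pi, atol=1e-12):
            break
        pi = new_pi
    return pi


if __name__ == "__main__":
    # Tiny 2-state, 2-action MDP; all values are made up purely for illustration.
    P = np.array([[[0.9, 0.1], [0.1, 0.9]],           # action 0
                  [[0.2, 0.8], [0.8, 0.2]]])          # action 1
    R = np.array([[0.0, 1.0],                         # state 0: action 1 is rewarding
                  [1.0, 0.0]])                        # state 1: action 0 is rewarding
    print(em_policy_iteration(P, R))
```

In the maze experiments described above, the E-step would instead be carried out by exact inference in the DBN (and by Gaussian belief state propagation in the continuous control case); the reweighting M-step plays the same role.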
| Original language | English |
| --- | --- |
| Number of pages | 22 |
| Publication status | Published - Dec 2006 |
Keywords
- Markov Decision Process, probabilistic inference, POMDP, planning