Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations

Kousha Etessami, Alistair Stewart, Mihalis Yannakakis

Research output: Contribution to journal, Article, peer-review

Abstract

We show that one can compute the least non-negative solution (a.k.a. least fixed point) for a system of probabilistic min (max) polynomial equations, to any desired accuracy ε > 0, in time polynomial in both the encoding size of the system and in log(1/ε). These are Bellman optimality equations for important classes of infinite-state Markov Decision Processes (MDPs), including Branching MDPs (BMDPs), which generalize classic multitype branching stochastic processes. We thus obtain the first polynomial time algorithm for computing, to any desired precision, optimal (maximum and minimum) extinction probabilities for BMDPs. Our algorithms are based on a novel generalization of Newton’s method which employs linear programming in each iteration. We also provide P-time algorithms for computing an ε-optimal policy for both maximizing and minimizing extinction probabilities in a BMDP, whereas we note a hardness result for computing an exact optimal policy. Furthermore, improving on prior results, we provide more efficient P-time algorithms for qualitative analysis of BMDPs, i.e., for determining whether the maximum or minimum extinction probability is 1, and, if so, computing a policy that achieves this. We also observe some complexity consequences of our results for branching simple stochastic games, which generalize BMDPs.
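
To make the notion of a least fixed point of a max (min) polynomial equation concrete, here is a minimal sketch for a hypothetical single-type BMDP with two actions. The offspring probabilities and the names `least_fixed_point` and `actions` are illustrative assumptions, not taken from the article. The sketch uses plain value iteration from 0, which is not the generalized Newton's method with linear programming described above and can converge slowly on other instances.

```python
# Hypothetical one-type BMDP: each action is a probabilistic polynomial
# giving the extinction probability after one step of branching.
#   action a: 2 offspring with prob 0.6, 0 offspring with prob 0.4
#   action b: 2 offspring with prob 0.7, 0 offspring with prob 0.3
actions = [
    lambda x: 0.6 * x**2 + 0.4,
    lambda x: 0.7 * x**2 + 0.3,
]


def least_fixed_point(actions, choose, iterations=200):
    """Approximate the least non-negative solution of
    x = choose(P_a(x) for each action a) by iterating from x = 0.

    `choose` is `max` (maximize extinction) or `min` (minimize it).
    This is plain value iteration, used here only for illustration.
    """
    x = 0.0
    for _ in range(iterations):
        x = choose(p(x) for p in actions)
    return x


max_extinction = least_fixed_point(actions, max)  # least solution of x = max(P_a, P_b)
min_extinction = least_fixed_point(actions, min)  # least solution of x = min(P_a, P_b)
print(max_extinction, min_extinction)
```

In this toy instance the iterates approach 2/3 for the maximum and 3/7 for the minimum extinction probability; both equations also have the solution x = 1, which is why the least solution is the one of interest.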
Original language: English
Pages (from-to): 34-62
Number of pages: 57
Journal: Mathematics of Operations Research
Volume: 45
Issue number: 1
Early online date: 5 Dec 2019
Publication status: Published - 28 Feb 2020
