TY - JOUR
T1 - Recursive Stochastic Games with Positive Rewards
AU - Etessami, Kousha
AU - Wojtczak, Dominik
AU - Yannakakis, Mihalis
PY - 2019/7/19
Y1 - 2019/7/19
N2 - We study the complexity of a class of Markov decision processes and, more generally, stochastic games, called 1-exit Recursive Markov Decision Processes (1-RMDPs) and 1-exit Recursive Simple Stochastic Games (1-RSSGs), with strictly positive rewards. These are a class of finitely presented countable-state zero-sum turn-based stochastic games that subsume standard finitestate MDPs and Condon’s simple stochastic games. They correspond to optimization and game versions of several classic stochastic models, with rewards. In particular, they correspond to the MDP and game versions of multi-type branching processes and stochastic context-free grammars with strictly positive rewards. The goal of the two players in the game is to maximize/minimize the total expected reward generated by a play of the game. Such stochastic models arise naturally as models of probabilistic procedural programs with recursion, and the problems we address are motivated by the goal of analyzing the optimal/pessimal expected running time in such a setting. We first show that in such games both players have optimal deterministic “stackless and memoryless” optimal strategies. We then provide polynomial-time algorithms for computing the exact optimal expected reward (which may be infinite, but is otherwise rational), and optimal strategies, for both the maximizing and minimizing single-player versions of the game, i.e., for (1-exit) Recursive Markov Decision Processes (1-RMDPs). It follows that the quantitative decision problem for positive reward 1-RSSGs is in NP ∩ coNP. We show that Condon’s wellknown quantitative termination problem for finite-state simple stochastic games (SSGs) which she showed to be in NP ∩ coNP reduces to a special case of the reward problem for 1-RSSGs, namely, deciding whether the value is ∞. By contrast, for finite-state SSGs with strictly positive rewards, deciding if this expected reward value is ∞ is solvable in P-time. We also show that there is a simultaneous strategy improvement algorithm that converges in a finite number of steps to the value and optimal strategies of a 1-RSSG with positive rewards.
AB - We study the complexity of a class of Markov decision processes and, more generally, stochastic games, called 1-exit Recursive Markov Decision Processes (1-RMDPs) and 1-exit Recursive Simple Stochastic Games (1-RSSGs), with strictly positive rewards. These are a class of finitely presented countable-state zero-sum turn-based stochastic games that subsume standard finitestate MDPs and Condon’s simple stochastic games. They correspond to optimization and game versions of several classic stochastic models, with rewards. In particular, they correspond to the MDP and game versions of multi-type branching processes and stochastic context-free grammars with strictly positive rewards. The goal of the two players in the game is to maximize/minimize the total expected reward generated by a play of the game. Such stochastic models arise naturally as models of probabilistic procedural programs with recursion, and the problems we address are motivated by the goal of analyzing the optimal/pessimal expected running time in such a setting. We first show that in such games both players have optimal deterministic “stackless and memoryless” optimal strategies. We then provide polynomial-time algorithms for computing the exact optimal expected reward (which may be infinite, but is otherwise rational), and optimal strategies, for both the maximizing and minimizing single-player versions of the game, i.e., for (1-exit) Recursive Markov Decision Processes (1-RMDPs). It follows that the quantitative decision problem for positive reward 1-RSSGs is in NP ∩ coNP. We show that Condon’s wellknown quantitative termination problem for finite-state simple stochastic games (SSGs) which she showed to be in NP ∩ coNP reduces to a special case of the reward problem for 1-RSSGs, namely, deciding whether the value is ∞. By contrast, for finite-state SSGs with strictly positive rewards, deciding if this expected reward value is ∞ is solvable in P-time. We also show that there is a simultaneous strategy improvement algorithm that converges in a finite number of steps to the value and optimal strategies of a 1-RSSG with positive rewards.
U2 - 10.1016/j.tcs.2018.12.018
DO - 10.1016/j.tcs.2018.12.018
M3 - Article
VL - 777
SP - 308
EP - 328
JO - Theoretical Computer Science
JF - Theoretical Computer Science
SN - 0304-3975
ER -