Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

Josiah P. Hanna, Peter Stone, Scott Niekum

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains, we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias for when the model transition function is estimated with i.i.d. trajectories. This bound broadens our understanding of the conditions under which model-based methods have high bias. Finally, we empirically evaluate our proposed methods and analyze the settings in which different bootstrapping off-policy confidence interval methods succeed and fail.
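The general idea of a bootstrap lower confidence bound can be illustrated with a minimal sketch. It is not the authors' method: it assumes that each trajectory has already been turned into a scalar estimate of the evaluation policy's return (for example, by simulating a learned model), and it applies a plain percentile bootstrap to those numbers. The function name `bootstrap_lower_bound` and the parameters `delta`, `n_resamples`, and `seed` are hypothetical, chosen only for illustration.

```python
import random


def bootstrap_lower_bound(returns, delta=0.05, n_resamples=2000, seed=0):
    """Approximate (1 - delta) lower confidence bound on the mean of `returns`.

    Assumption: `returns` holds one scalar estimate of policy performance
    per trajectory. How those estimates are produced (model-based or
    otherwise) is outside the scope of this sketch.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    rng = random.Random(seed)
    n = len(returns)
    means = []
    for _ in range(n_resamples):
        # Resample n estimates with replacement and record their mean.
        sample = [returns[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    # The delta-quantile of the bootstrap means serves as the lower bound.
    index = min(n_resamples - 1, max(0, int(delta * n_resamples)))
    return means[index]


if __name__ == "__main__":
    estimates = [9.5, 10.2, 8.7, 11.1, 10.0, 9.8, 10.4, 9.1]
    print(bootstrap_lower_bound(estimates, delta=0.05))
```

A bound obtained this way is approximate: unlike exact concentration-inequality bounds, the percentile bootstrap carries no strict guarantee at small sample sizes, which is the trade-off the abstract describes.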
Original language: English
Title of host publication: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems
Place of Publication: Richland, SC
Publisher: International Foundation for Autonomous Agents and Multiagent Systems
Number of pages: 9
Publication status: Published - 8 May 2017
Event: 16th International Conference on Autonomous Agents and Multiagent Systems 2017 - Sao Paulo, Brazil
Duration: 8 May 2017 to 12 May 2017


Conference: 16th International Conference on Autonomous Agents and Multiagent Systems 2017
Abbreviated title: AAMAS 2017
City: Sao Paulo

Keywords / Materials (for Non-textual outputs)

  • reinforcement learning
  • bootstrapping
  • off-policy evaluation

