Towards a Data Efficient Off-Policy Policy Gradient

Josiah Hanna, Peter Stone

Research output: Contribution to conference › Paper › peer-review

Abstract / Description of output

The ability to learn from off-policy data -- data generated from past interaction with the environment -- is essential to data-efficient reinforcement learning. Recent work has shown that the use of off-policy data not only allows the re-use of data but can even improve performance in comparison to on-policy reinforcement learning. In this work, we investigate whether a recently proposed method for learning a better data-generating policy, commonly called a behavior policy, can also increase the data efficiency of policy gradient reinforcement learning. Empirical results demonstrate that with an appropriately selected behavior policy we can estimate the policy gradient more accurately. The results also motivate further work on developing methods for adapting the behavior policy as the policy we are learning changes.
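The abstract's core idea is estimating the policy gradient from data collected by a separate behavior policy, corrected with importance sampling. The paper itself provides no code, so the following is only a minimal sketch of a trajectory-wise importance-sampled REINFORCE estimator under assumed conventions: a linear-softmax policy, undiscounted returns, and illustrative names such as `off_policy_gradient` and the `(phi, action, reward)` data layout, none of which come from the paper.

```python
import numpy as np

def softmax_policy(theta, phi):
    """Action probabilities for a linear-softmax policy.

    phi: (n_actions, n_features) feature matrix for one state.
    theta: (n_features,) policy parameters.
    """
    logits = phi @ theta
    logits -= logits.max()              # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def off_policy_gradient(trajectories, behavior_probs, theta):
    """Importance-sampled REINFORCE gradient estimate.

    trajectories: list of episodes; each episode is a list of
        (phi, action, reward) tuples collected under the behavior policy.
    behavior_probs: matching list of beta(a_t | s_t) values, i.e. the
        probability the behavior policy assigned to each taken action.
    """
    grad = np.zeros_like(theta)
    for episode, betas in zip(trajectories, behavior_probs):
        rho = 1.0                        # trajectory importance weight
        score = np.zeros_like(theta)     # sum of grad-log-pi terms
        ret = 0.0                        # total (undiscounted) return
        for (phi, a, r), b in zip(episode, betas):
            pi = softmax_policy(theta, phi)
            rho *= pi[a] / b             # pi_theta(a|s) / beta(a|s)
            score += phi[a] - pi @ phi   # grad log pi for linear-softmax
            ret += r
        # Unbiased off-policy REINFORCE term: rho(tau) * grad log p(tau) * R(tau)
        grad += rho * score * ret
    return grad / len(trajectories)

# Toy usage: one two-step episode, 2 actions, 3 features, data gathered
# by a uniform behavior policy (beta = 0.5 for every action).
phi = np.eye(3)[:2]
episode = [(phi, 0, 1.0), (phi, 1, 0.0)]
g = off_policy_gradient([episode], [[0.5, 0.5]], theta=np.zeros(3))
```

The trajectory-wise weight keeps the estimator unbiased but can have high variance over long horizons, which is consistent with the abstract's point that the choice of behavior policy matters for how accurately the gradient can be estimated.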
Original language: English
Pages: 320-323
Number of pages: 4
Publication status: Published - 15 Mar 2018
Event: AAAI 2018 Spring Symposium Series: Data-Efficient Reinforcement Learning - Palo Alto, United States
Duration: 26 Mar 2018 - 28 Mar 2018
https://www.prowler.io/events/aaai-symposium

Symposium

Symposium: AAAI 2018 Spring Symposium Series
Country/Territory: United States
City: Palo Alto
Period: 26/03/18 - 28/03/18
Internet address: https://www.prowler.io/events/aaai-symposium

Keywords

  • reinforcement learning
  • policy evaluation
  • off-policy
