Reducing Sampling Error in Policy Gradient Learning

Josiah P. Hanna, Peter Stone

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper studies a class of reinforcement learning algorithms known as policy gradient methods. Policy gradient methods optimize the performance of a policy by estimating the gradient of the expected return with respect to the policy parameters. One of the core challenges in applying policy gradient methods is obtaining an accurate estimate of this gradient. Most policy gradient methods rely on Monte Carlo sampling to estimate the gradient. When only a limited number of environment steps can be collected, Monte Carlo policy gradient estimates may suffer from sampling error: samples receive more or less weight than they would in expectation. In this paper, we introduce the Sampling Error Corrected policy gradient estimator, which corrects these inaccurate Monte Carlo weights. Our approach treats the observed data as if it had been generated by a different policy than the one that actually generated it. It then applies importance sampling between the two policies, in the process correcting the inaccurate Monte Carlo weights. Under a limiting set of assumptions, we show that this gradient estimator has lower variance than the Monte Carlo gradient estimator. We show experimentally that our approach improves the learning speed of two policy gradient methods compared to standard Monte Carlo sampling, even when the theoretical assumptions fail to hold.
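The idea sketched in the abstract can be illustrated in a toy setting. The bandit problem, policy parameterization, and all function names below are illustrative assumptions, not the paper's implementation: the Monte Carlo estimator weights each sample by its empirical frequency, while the sampling-error-corrected weighting treats the data as if it came from the empirical (maximum-likelihood) policy and importance-samples back to the true policy.

```python
import numpy as np

# Hypothetical 3-armed bandit with a softmax policy (illustrative setup,
# not from the paper).
rng = np.random.default_rng(0)
theta = np.array([0.2, -0.1, 0.0])    # softmax logits (policy parameters)
rewards = np.array([1.0, 0.5, 0.2])   # deterministic reward per action

def pi(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    # d/dtheta log softmax(theta)[a] = one_hot(a) - pi(theta)
    g = -pi(theta)
    g[a] += 1.0
    return g

p = pi(theta)
actions = rng.choice(3, size=20, p=p)  # a limited batch of samples

# Monte Carlo estimator: every sample gets weight 1/N, so each action
# effectively contributes with its empirical frequency.
g_mc = np.mean([rewards[a] * grad_log_pi(theta, a) for a in actions], axis=0)

# Sampling-error-corrected weighting: pretend the data was generated by the
# empirical policy pi_hat and importance-sample back to pi_theta, i.e.
# reweight each sample by pi_theta(a) / pi_hat(a).
counts = np.bincount(actions, minlength=3)
pi_hat = counts / counts.sum()
w = np.where(counts > 0, p / np.maximum(pi_hat, 1e-12), 0.0)
g_sec = np.mean([w[a] * rewards[a] * grad_log_pi(theta, a) for a in actions],
                axis=0)
```

With this reweighting, every action observed at least once contributes exactly its true probability under the current policy rather than its noisy empirical frequency, which is the sense in which sampling error is "corrected" in this sketch.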
Original language: English
Title of host publication: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems
Place of Publication: Richland, SC
Publisher: International Foundation for Autonomous Agents and Multiagent Systems
Pages: 1016–1024
Number of pages: 9
ISBN (Print): 9781450363099
Publication status: Published - 8 May 2019
Event: International Conference on Autonomous Agents and Multi-Agent Systems 2019 - Montreal, Canada
Duration: 13 May 2019 – 17 May 2019
http://aamas2019.encs.concordia.ca/

Conference

Conference: International Conference on Autonomous Agents and Multi-Agent Systems 2019
Abbreviated title: AAMAS 2019
Country/Territory: Canada
City: Montreal
Period: 13/05/19 – 17/05/19
Internet address: http://aamas2019.encs.concordia.ca/

Keywords / Materials (for Non-textual outputs)

  • policy gradient
  • importance sampling
  • reinforcement learning
  • Monte Carlo
