Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Off-Policy Actor-Critic (OffP-AC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference learning, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a flexible and augmented meta-critic that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to existing meta-learning algorithms, the meta-critic is rapidly learned online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic is designed for off-policy learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning benefits a variety of continuous control tasks when combined with the contemporary OffP-AC methods DDPG, TD3 and SAC.
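
To make the mechanism described above concrete, below is a minimal PyTorch sketch of one plausible reading of the scheme: the actor takes a differentiable lookahead step on the main critic loss plus the meta-learned auxiliary loss, and the meta-critic is then updated so that the looked-ahead actor scores better under the main loss alone. All names (Actor, QCritic, MetaCritic, meta_critic_step), the network sizes, the single-step lookahead and the random batches are illustrative assumptions rather than the authors' implementation; torch.func.functional_call assumes PyTorch 2.0 or later.

    import torch
    import torch.nn as nn
    from torch.func import functional_call  # assumes PyTorch >= 2.0

    def mlp(in_dim, out_dim):
        # Two-layer network; the width is arbitrary for this sketch.
        return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

    class Actor(nn.Module):
        def __init__(self, s_dim, a_dim):
            super().__init__()
            self.pi = mlp(s_dim, a_dim)

        def forward(self, s):
            return torch.tanh(self.pi(s))  # bounded continuous actions

    class QCritic(nn.Module):
        """Action-value function Q(s, a); its temporal-difference training is omitted."""
        def __init__(self, s_dim, a_dim):
            super().__init__()
            self.q = mlp(s_dim + a_dim, 1)

        def forward(self, s, a):
            return self.q(torch.cat([s, a], dim=-1))

    # The meta-critic has the same (s, a) -> scalar shape, but its output is
    # read as an auxiliary loss for the actor, not as a value estimate.
    MetaCritic = QCritic

    def meta_critic_step(actor, critic, meta_critic, meta_opt, s_train, s_val, lr=1e-3):
        # 1) Differentiable lookahead: update the actor with the main critic
        #    loss plus the meta-learned auxiliary loss.
        params = dict(actor.named_parameters())
        a = functional_call(actor, params, (s_train,))
        actor_loss = -critic(s_train, a).mean() + meta_critic(s_train, a).mean()
        grads = torch.autograd.grad(actor_loss, list(params.values()), create_graph=True)
        updated = {k: p - lr * g for (k, p), g in zip(params.items(), grads)}

        # 2) Meta-objective: the updated actor should do better under the main
        #    loss on a held-out batch; backpropagate through the lookahead
        #    update into the meta-critic's parameters.
        a_val = functional_call(actor, updated, (s_val,))
        meta_loss = -critic(s_val, a_val).mean()
        meta_opt.zero_grad()
        meta_loss.backward()  # also leaves grads on actor/critic; real code would clear them
        meta_opt.step()

    # Hypothetical usage, with random batches standing in for replay-buffer samples.
    s_dim, a_dim = 8, 2
    actor, critic = Actor(s_dim, a_dim), QCritic(s_dim, a_dim)
    meta = MetaCritic(s_dim, a_dim)
    meta_opt = torch.optim.Adam(meta.parameters(), lr=1e-3)
    meta_critic_step(actor, critic, meta, meta_opt,
                     torch.randn(32, s_dim), torch.randn(32, s_dim))

In the actual OffP-AC setting, the training and held-out batches would be sampled from the replay buffer and the critic itself would be trained by temporal-difference updates alongside this step; those pieces are omitted to keep the sketch short.
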
Original language: English
Title of host publication: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Publisher: Curran Associates Inc
Pages: 17662-17673
Number of pages: 12
Publication status: Published - 6 Dec 2020
Event: Thirty-Fourth Conference on Neural Information Processing Systems - Virtual Conference
Duration: 6 Dec 2020 – 12 Dec 2020
https://nips.cc/Conferences/2020

Conference

Conference: Thirty-Fourth Conference on Neural Information Processing Systems
Abbreviated title: NeurIPS 2020
City: Virtual Conference
Period: 6/12/20 – 12/12/20
Internet address: https://nips.cc/Conferences/2020
