Abstract
In reinforcement learning, domain randomisation is an increasingly popular technique for learning more general policies that are robust to domain shifts at deployment. However, naively aggregating information from randomised domains may lead to high variance in gradient estimates and an unstable learning process. To address this issue, we present a peer-to-peer online distillation strategy for RL, termed P2PDRL, in which multiple workers are each assigned to a different environment and exchange knowledge through mutual regularisation based on the Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation to new environments at test time.
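To make the mutual-regularisation idea concrete, the sketch below shows one possible training step in PyTorch: a handful of Gaussian policy "workers", each consuming data from its own randomised environment, are each updated with a placeholder policy-gradient loss plus a KL penalty toward their peers' action distributions. All names, network sizes, the placeholder task loss, and the fixed coefficient `KL_COEF` are illustrative assumptions, not the paper's actual implementation, which builds on a full actor-critic training loop omitted here.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

# Hypothetical sizes and coefficient; chosen only for illustration.
OBS_DIM, ACT_DIM, N_WORKERS, KL_COEF = 8, 2, 4, 0.1

class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy head for one worker."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh())
        self.mu = nn.Linear(64, ACT_DIM)
        self.log_std = nn.Parameter(torch.zeros(ACT_DIM))

    def forward(self, obs):
        h = self.body(obs)
        return Normal(self.mu(h), self.log_std.exp())

policies = [GaussianPolicy() for _ in range(N_WORKERS)]
optims = [torch.optim.Adam(p.parameters(), lr=3e-4) for p in policies]

def training_step(batches):
    """One update: each worker minimises its own RL objective plus the
    mean KL divergence from its policy to every peer's policy, evaluated
    on its own (domain-randomised) observations."""
    for i, (policy, opt) in enumerate(zip(policies, optims)):
        obs, act, adv = batches[i]  # data collected in worker i's environment
        dist = policy(obs)
        # Placeholder task loss: a vanilla policy-gradient surrogate.
        task_loss = -(dist.log_prob(act).sum(-1) * adv).mean()
        # Mutual regularisation: match peers' action distributions,
        # treating peers as fixed targets (no gradient through peers).
        kls = []
        for j, peer in enumerate(policies):
            if j == i:
                continue
            with torch.no_grad():
                peer_dist = peer(obs)
            kls.append(kl_divergence(dist, peer_dist).sum(-1).mean())
        loss = task_loss + KL_COEF * torch.stack(kls).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

# Example usage with random placeholder data standing in for rollouts.
batches = [(torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM), torch.randn(32))
           for _ in range(N_WORKERS)]
training_step(batches)
```

Treating peers as fixed targets in the KL term is one design choice among several; the key point the sketch illustrates is that each worker trains only on its own randomised domain while the KL penalty keeps the workers' policies from drifting apart.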
Original language | English |
---|---|
Title of host publication | Proceedings of The 13th Asian Conference on Machine Learning |
Editors | Vineeth N. Balasubramanian, Ivor Tsang |
Publisher | PMLR |
Pages | 1237-1252 |
Number of pages | 16 |
Publication status | Published - 17 Nov 2021 |
Event | 13th Asian Conference on Machine Learning (Virtual), 17 Nov 2021 → 19 Nov 2021, http://www.acml-conf.org/2021/
Publication series
Name | Proceedings of Machine Learning Research |
---|---|
Publisher | PMLR |
Volume | 157 |
ISSN (Electronic) | 2640-3498 |
Conference
Conference | 13th Asian Conference on Machine Learning |
---|---|
Abbreviated title | ACML 2021 |
Period | 17/11/21 → 19/11/21 |
Internet address | http://www.acml-conf.org/2021/
Keywords
- domain randomisation
- deep reinforcement learning
- mutual learning