Abstract
We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular we establish state of the art performance on Montezuma’s Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using
demonstrations or having access to the underlying state of the game, and occasionally completes the first level.
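As a rough illustration of the mechanism described in the abstract, the sketch below computes an RND-style exploration bonus: a fixed, randomly initialized target network embeds each observation, a trainable predictor network is regressed onto those embeddings, and the per-observation prediction error serves as the intrinsic reward. The network sizes, learning rate, and helper names (`make_embedding_net`, `rnd_bonus`, `beta`) are illustrative assumptions, not the authors' exact architecture or hyperparameters.

```python
import torch
import torch.nn as nn

def make_embedding_net(obs_dim: int, feat_dim: int) -> nn.Module:
    # Small MLP embedding; the paper uses convolutional networks on Atari frames.
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

obs_dim, feat_dim = 64, 32            # assumed sizes, for illustration only
target = make_embedding_net(obs_dim, feat_dim)
predictor = make_embedding_net(obs_dim, feat_dim)
for p in target.parameters():         # the target network stays fixed and random
    p.requires_grad_(False)

optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def rnd_bonus(obs: torch.Tensor) -> torch.Tensor:
    """Intrinsic reward: squared error between predictor and fixed target features."""
    with torch.no_grad():
        target_feat = target(obs)
    pred_feat = predictor(obs)
    error = (pred_feat - target_feat).pow(2).mean(dim=-1)   # per-observation error
    # Train the predictor on the same error so frequently visited states
    # become predictable and yield small bonuses over time.
    optimizer.zero_grad()
    error.mean().backward()
    optimizer.step()
    return error.detach()

# Usage: combine with the extrinsic reward, e.g. r = r_ext + beta * rnd_bonus(obs).
batch_of_obs = torch.randn(16, obs_dim)
print(rnd_bonus(batch_of_obs))
```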
Original language | English |
---|---|
Title of host publication | 7th International Conference on Learning Representations (ICLR 2019) |
Pages | 1-17 |
Number of pages | 17 |
Publication status | Published - 9 May 2019 |
Event | Seventh International Conference on Learning Representations, New Orleans, United States, 6 May 2019 → 9 May 2019, https://iclr.cc/ |
Conference
Conference | Seventh International Conference on Learning Representations |
---|---|
Abbreviated title | ICLR 2019 |
Country/Territory | United States |
City | New Orleans |
Period | 6/05/19 → 9/05/19 |
Internet address | https://iclr.cc/ |
Profiles
- Amos Storkey
- School of Informatics - Personal Chair of Machine Learning & Artificial Intelligence
- Institute for Adaptive and Neural Computation
- Data Science and Artificial Intelligence
Person: Academic: Research Active