Abstract
We propose using an exploratory self-organised policy to initialise the parameters of the function approximator in a reinforcement learning policy, based on the value function of the exploratory probe, in a low-dimensional task. For high-dimensional problems we exploit the property of the exploratory behaviour to establish coordination among the degrees of freedom of a robot without any explicit knowledge of the robot's configuration or the environment. The approach is illustrated by learning tasks on a six-legged robot. Results show that initialisation based on the exploratory value function improves learning speed in the low-dimensional task, and that some correlation towards a higher reward can be acquired in the high-dimensional task.
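A minimal sketch of the general idea described in the abstract, not the authors' implementation: value estimates are collected under a purely exploratory (here, uniform-random) policy and used to initialise the weights of a linear value-function approximator before any reward-driven learning begins. The feature map, environment dynamics, reward, and all parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_features, gamma = 20, 8, 0.95
features = rng.normal(size=(n_states, n_features))  # assumed feature map

def exploratory_returns(n_episodes=200, horizon=50):
    """Monte Carlo value estimates under a uniform-random (exploratory) policy."""
    totals, counts = np.zeros(n_states), np.zeros(n_states)
    for _ in range(n_episodes):
        s = rng.integers(n_states)
        trajectory, rewards = [], []
        for _ in range(horizon):
            trajectory.append(s)
            rewards.append(1.0 if s == n_states - 1 else 0.0)  # assumed reward
            s = (s + rng.choice([-1, 1])) % n_states           # assumed dynamics
        g = 0.0
        for s_t, r_t in zip(reversed(trajectory), reversed(rewards)):
            g = r_t + gamma * g  # discounted return from state s_t
            totals[s_t] += g
            counts[s_t] += 1
    return totals / np.maximum(counts, 1)

# Initialise the approximator weights from the exploratory value function
# by least-squares regression onto the features.
v_exploratory = exploratory_returns()
w, *_ = np.linalg.lstsq(features, v_exploratory, rcond=None)

# Subsequent learning (e.g. TD(0)) would start from w instead of zeros,
# which is the source of the learning-speed improvement the abstract
# reports for the low-dimensional task.
```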
Original language | English |
---|---|
Title of host publication | ADVANCES IN ARTIFICIAL LIFE, ECAL 2013 |
Publisher | MIT Press |
Pages | 641-648 |
Number of pages | 8 |
Publication status | Published - 2013 |