Self-Organisation of Generic Policies in Reinforcement Learning

Simon C. Smith, J. Michael Herrmann

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose the use of an exploratory self-organised policy to initialise the parameters of the function approximation in a reinforcement learning policy, based on the value function of the exploratory probe in a low-dimensional task. For high-dimensional problems we exploit the property of the exploratory behaviour to establish coordination among the degrees of freedom of a robot without any explicit knowledge of the configuration of the robot or the environment. The approach is illustrated by learning tasks in a six-legged robot. Results show that the initialisation based on the exploratory value function improves the learning speed in the low-dimensional task and that some correlation towards a higher reward can be acquired in the high-dimensional task.
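As a rough illustration of the initialisation scheme the abstract describes, the Python sketch below pretrains a linear value-function approximator on exploratory rollouts and reuses those weights to initialise the task learner. Everything here is an assumption for illustration: the names `td0_fit`, `rollout`, and `N_FEATURES` are hypothetical, and a toy intrinsic reward (magnitude of state change) stands in for the self-organised exploratory probe, which the paper itself does not reduce to this form.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 8          # assumed size of the state-feature vector
GAMMA, ALPHA = 0.95, 0.05

def td0_fit(transitions, w=None):
    """Fit linear value-function weights with TD(0) on logged transitions."""
    if w is None:
        w = np.zeros(N_FEATURES)
    for phi, r, phi_next in transitions:
        td_error = r + GAMMA * (w @ phi_next) - (w @ phi)
        w = w + ALPHA * td_error * phi
    return w

def rollout(n, reward_fn):
    """Generate toy (phi, r, phi_next) transitions; a stand-in for robot data."""
    out = []
    for _ in range(n):
        phi, phi_next = rng.normal(size=N_FEATURES), rng.normal(size=N_FEATURES)
        out.append((phi, reward_fn(phi, phi_next), phi_next))
    return out

# Phase 1: exploratory probe. The intrinsic reward (state-change magnitude)
# is a crude placeholder for the self-organised exploration signal.
w_explore = td0_fit(rollout(1000, lambda p, q: float(np.linalg.norm(q - p))))

# Phase 2: task learning starts from the exploratory value function
# instead of zeros, which is the initialisation the abstract proposes.
w_task = td0_fit(rollout(1000, lambda p, q: float(q[0])), w=w_explore.copy())
print("probe-initialised weights:", np.round(w_explore, 3))
```

The design point is only that phase 2 inherits its starting weights from phase 1 rather than from a zero or random initialisation; the reported speed-up in the low-dimensional task is attributed to that warm start.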
Original language: English
Title of host publication: ADVANCES IN ARTIFICIAL LIFE, ECAL 2013
Publisher: MIT Press
Pages: 641-648
Number of pages: 8
DOIs
Publication status: Published - 2013
