Lifelong Learning of Structure in the Space of Policies.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We address the problem faced by an autonomous agent that must achieve quick responses to a family of qualitatively related tasks, such as a robot interacting with different types of human participants. We work in the setting where the tasks share a state-action space and have the same qualitative objective but differ in the dynamics and reward process. We adopt a transfer approach where the agent attempts to exploit common structure in learnt policies to accelerate learning in a new one. Our technique consists of a few key steps. First, we use a probabilistic model to describe the regions in state space which successful trajectories seem to prefer. Then, we extract policy fragments from previously-learnt policies for these regions as candidates for reuse. These fragments may be treated as options with corresponding domains and termination conditions extracted by unsupervised learning. Then, the set of reusable policies is used when learning novel tasks, and the process repeats. The utility of this method is demonstrated through experiments in the simulated soccer domain, where the variability comes from the different possible behaviours of opponent teams, and the agent needs to perform well against novel opponents.
Original language: English
Title of host publication: Lifelong Machine Learning: Papers from the 2013 AAAI Spring Symposium
Publisher: AAAI Press
Pages: 21-26
Number of pages: 6
Publication status: Published - 2013
