Task Variability in Autonomous Robots: Offline Learning for Online Performance

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A problem faced by autonomous robots is that of achieving quick, efficient operation in unseen variations of their tasks, after experiencing only a subset of these variations sampled offline at training time. We model task variability as a family of MDPs that differ in their transition dynamics and reward processes. When it is not possible to experiment in the new world, e.g., in real-time settings, a policy for a novel instance may be defined by averaging over the policies of the offline instances. This is suboptimal in the general case, so we propose an alternative model that draws on the methodology of hierarchical reinforcement learning: we learn partial policies for partial goals (subtasks) in the offline MDPs, in the form of options, and treat solving a novel MDP as the sequential composition of these partial policies. Our procedure utilises a modified version of option interruption for control switching, in which the interruption signal is acquired from offline experience. We also show that further performance advantages can be attained when the task decomposes into concurrent subtasks, allowing us to devise an alternative control structure that emphasises flexible switching and concurrent use of policy fragments. We demonstrate the utility of these ideas in example gridworld domains exhibiting task variability.
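To make the sequential-composition idea concrete, the following is a minimal Python sketch of value-based option interruption: run the option with the highest estimated value, and switch whenever another option's estimate dominates the current one's. The environment (CorridorEnv), the hand-built options, and the offline value table q_offline are illustrative assumptions, not constructs from the paper; in the paper the interruption signal is acquired from offline experience rather than hand-specified.

```python
import numpy as np

rng = np.random.default_rng(0)

class CorridorEnv:
    """Toy 1-D corridor: states 0..n-1, actions in {-1, +1}, goal at n-1."""
    def __init__(self, n=10):
        self.n, self.s = n, 0
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = min(max(self.s + a, 0), self.n - 1)
        done = self.s == self.n - 1
        return self.s, (1.0 if done else -0.01), done

class Option:
    """A partial policy plus a per-state termination probability."""
    def __init__(self, policy, beta):
        self.policy, self.beta = policy, beta

n = 10
go_right = Option(policy=[+1] * n, beta=[0.05] * n)  # hand-built subtask policies
go_left  = Option(policy=[-1] * n, beta=[0.05] * n)
options = [go_right, go_left]

# Stand-in for option values learned from the offline task instances:
# q_offline[s, i] estimates the value of running option i from state s.
q_offline = np.zeros((n, 2))
q_offline[:, 0] = 1.0  # "go right" dominates everywhere in this toy case

def run_with_interruption(env, options, q, max_steps=100):
    """Sequential composition of options with value-based interruption."""
    s = env.reset()
    current = int(np.argmax(q[s]))           # start with the best option
    for _ in range(max_steps):
        s, r, done = env.step(options[current].policy[s])
        if done:
            return s
        best = int(np.argmax(q[s]))
        # Interrupt when another option dominates under the offline
        # estimates, or when the running option terminates naturally.
        if q[s][best] > q[s][current] or rng.random() < options[current].beta[s]:
            current = best
    return s

print(run_with_interruption(CorridorEnv(n), options, q_offline))  # reaches state 9
```

The design choice worth noting is that switching is driven entirely by value estimates learned before deployment, so no online experimentation is needed in the novel instance.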
Original language: English
Title of host publication: Proceedings of the 5th International Workshop on Evolutionary and Reinforcement Learning for Autonomous Robot Systems (ERLARS)
Number of pages: 8
Publication status: Published - 2012
