Composing Diverse Policies for Temporally Extended Tasks

Daniel Angelov, Yordan Hristov, Michael Burke, Subramanian Ramamoorthy

Research output: Contribution to journal › Article › peer-review

Abstract

Robot control policies for temporally extended and sequenced tasks are often characterized by discontinuous switches between different local dynamics. These change-points are commonly exploited in hierarchical motion planning to build approximate models and to facilitate the design of local, region-specific controllers. However, implementing such a pipeline becomes combinatorially challenging for complex temporally extended tasks, especially when the sub-controllers operate on different information streams, time scales and action spaces. In this paper, we introduce a method that can compose diverse policies comprising motion planning trajectories, dynamic motion primitives and neural network controllers. We introduce a global goal scoring estimator that uses local, per-motion-primitive dynamics models and corresponding activation state-space sets to sequence diverse policies in a locally optimal fashion. We use expert demonstrations to convert what is typically viewed as a gradient-based learning process into a planning process, without explicitly specifying pre- and post-conditions. We first illustrate the proposed framework on an MDP benchmark to showcase robustness to action and model dynamics mismatch, and then on a particularly complex physical gear assembly task, solved on a PR2 robot. We show that the proposed approach successfully discovers the optimal sequence of controllers and solves both tasks efficiently.
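To make the sequencing idea in the abstract more concrete, here is a minimal Python sketch, assuming a greedy one-step lookahead. It is not the authors' estimator. Each controller is wrapped in a hypothetical `Primitive` with an activation-set membership test (`can_activate`) and a local dynamics model (`predict`). The hypothetical `choose_next` picks, among the primitives that can start from the current state, the one whose predicted end state scores best under a goal score, and `plan_sequence` chains these choices. The abstract indicates that the paper obtains these components from expert demonstrations rather than from hand-specified pre- and post-conditions. Here the activation sets, dynamics models and `goal_score` are hand-written lambdas on a toy 1-D state, purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

State = float  # hypothetical 1-D state, for illustration only


@dataclass
class Primitive:
    """Hypothetical wrapper around any controller (planner, DMP, neural net)."""
    name: str
    can_activate: Callable[[State], bool]  # membership test for the activation set
    predict: Callable[[State], State]      # local dynamics model: predicted end state


def choose_next(state: State, primitives: Sequence[Primitive],
                goal_score: Callable[[State], float]) -> Optional[Primitive]:
    """Pick the applicable primitive whose predicted outcome scores highest."""
    applicable = [p for p in primitives if p.can_activate(state)]
    if not applicable:
        return None
    return max(applicable, key=lambda p: goal_score(p.predict(state)))


def plan_sequence(state: State, primitives: Sequence[Primitive],
                  goal_score: Callable[[State], float],
                  is_goal: Callable[[State], bool],
                  max_steps: int = 20) -> List[str]:
    """Greedily chain primitives using their dynamics models until the goal is reached."""
    plan: List[str] = []
    for _ in range(max_steps):
        if is_goal(state):
            break
        nxt = choose_next(state, primitives, goal_score)
        if nxt is None:
            break  # no primitive can start from this state
        plan.append(nxt.name)
        state = nxt.predict(state)  # roll forward with the model, not the real system
    return plan


if __name__ == "__main__":
    GOAL = 5.0
    primitives = [
        Primitive("coarse", lambda x: x < 3.0, lambda x: x + 3.0),
        Primitive("fine", lambda x: x >= 3.0, lambda x: x + 1.0),
        Primitive("back", lambda x: True, lambda x: x - 1.0),
    ]
    plan = plan_sequence(0.0, primitives,
                         goal_score=lambda x: -abs(x - GOAL),
                         is_goal=lambda x: abs(x - GOAL) < 1e-9)
    print(plan)  # → ['coarse', 'fine', 'fine']
```

In this toy run, the "coarse" primitive is only active far from the goal and the "fine" primitive only near it. The planner therefore switches controllers at the boundary of their activation sets, which loosely mirrors the change-points described above. A greedy lookahead like this can be trapped in local optima. That is one reason a learned, globally informed goal score matters more than the search procedure itself.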
Original language: English
Pages (from-to): 2658-2665
Number of pages: 9
Journal: IEEE Robotics and Automation Letters
Volume: 5
Issue number: 2
Early online date: 10 Feb 2020
Publication status: Published, 30 Apr 2020

Keywords

  • Motion and Path Planning
  • Learning and Adaptive Systems
  • Learning from Demonstration

