We present Inference and Distillation for Option Learning (IDOL), a multitask option-learning framework based on Planning-as-Inference. IDOL employs a hierarchical prior and variational-posterior factorisation to learn temporally extended options that allow the higher-level master policy to make decisions with lower frequency, speeding up training on new tasks. IDOL autonomously learns the temporal extension of each option and avoids suboptimal solutions where multiple options learn similar behaviour. We demonstrate that this improves performance on new tasks compared to both strong hierarchical and flat transfer-learning baselines.
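To make the control structure in the abstract concrete, the following is a minimal, hypothetical sketch of an options-style rollout: a high-level master policy selects an option, the option acts for a temporally extended stretch, and control returns to the master only when the option terminates. Every function here (`master_policy`, `option_action`, `option_terminates`, the toy dynamics) is an illustrative placeholder, not IDOL's learned prior/posterior components.

```python
NUM_OPTIONS = 3

def master_policy(state):
    # Placeholder high-level policy: maps the state to an option index.
    return state % NUM_OPTIONS

def option_action(option, state):
    # Placeholder low-level policy for the currently active option.
    return option * 10 + (state % 10)

def option_terminates(option, steps_in_option):
    # Placeholder termination rule standing in for the learned
    # temporal extension of each option.
    return steps_in_option >= option + 1

def rollout(horizon=10):
    """Run one episode and count how often the master policy acts."""
    state, option, steps_in_option = 0, None, 0
    master_decisions = 0
    for _ in range(horizon):
        if option is None or option_terminates(option, steps_in_option):
            option = master_policy(state)  # master acts only here
            steps_in_option = 0
            master_decisions += 1
        _action = option_action(option, state)  # option acts every step
        state += 1                              # trivial toy dynamics
        steps_in_option += 1
    return master_decisions
```

Because options persist for several steps, `rollout(10)` yields fewer master decisions than environment steps, which is the temporal-abstraction property the abstract attributes to IDOL's learned options.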
| Number of pages | 10 |
| Publication status | Published - 8 Dec 2018 |
| Event | Workshop on Probabilistic Reinforcement Learning and Structured Control @ NeurIPS 2018: Infer to Control - Montréal, Canada |
| Duration | 8 Dec 2018 → 8 Dec 2018 |
| Workshop | Workshop on Probabilistic Reinforcement Learning and Structured Control @ NeurIPS 2018 |
| Abbreviated title | Infer2Control @ NeurIPS 2018 |
| Period | 8/12/18 → 8/12/18 |