A heuristic strategy for learning in partially observable and non-Markovian domains

Matteo Leonetti, Subramanian Ramamoorthy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Robotic applications are characterized by highly dynamic domains, where the agent has neither full control of the environment nor full observability. In such cases a Markovian model of the domain, able to capture all the aspects that the agent might need to predict, is generally unavailable or excessively complex. Moreover, robots impose tight constraints on the amount of experience they can afford, shifting the focus of behavior learning from reaching optimality in the limit to making the best use of the little information available. We consider the problem of finding the best deterministic policy in a Non-Markovian Decision Process, with special attention to sample complexity and to the transitional behavior before such a policy is reached. We would like robotic agents to learn in real time while deployed in the environment, and their behavior to be acceptable even while learning.
Original language: English
Title of host publication: Proceedings of the 3rd International Workshop on Evolutionary and Reinforcement Learning for Autonomous Robot Systems (ERLARS 2010)
Number of pages: 4
Publication status: Published - 2010
