We propose a method to generate agent controllers, represented as state machines, for acting in partially observable environments. Such controllers constrain the search space, applying techniques from Hierarchical Reinforcement Learning. We define a multi-step process in which a simulator is employed to generate possible execution traces. These traces are then used to induce a non-deterministic state machine that represents all reasonable behaviors, given the approximate models and planners used in simulation. The state machine will have multiple possible choices in some of its states. These states are choice points, and we defer the learning of those choices until the agent is deployed in the actual environment. The resulting controller can therefore adapt to the actual environment while limiting the search space in a sensible way.
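The process described in the abstract, inducing a non-deterministic machine from traces and deferring choices to deployment, can be illustrated with a minimal sketch. The trace format, the `induce_machine` and `Controller` names, and the epsilon-greedy Q-learning rule used to resolve choice points are all illustrative assumptions, not the paper's actual algorithm.

```python
import random
from collections import defaultdict

def induce_machine(traces):
    """Induce a non-deterministic controller from simulated traces.

    Each trace is assumed to be a list of (state, observation, action)
    triples; the machine maps (state, observation) to the set of actions
    observed there. Entries with more than one action are choice points.
    """
    machine = defaultdict(set)
    for trace in traces:
        for state, obs, action in trace:
            machine[(state, obs)].add(action)
    return dict(machine)

class Controller:
    """Resolves choice points online with an epsilon-greedy Q-learner
    (one illustrative way to learn the deferred choices at deployment)."""

    def __init__(self, machine, epsilon=0.1, alpha=0.5):
        self.machine = machine
        self.epsilon = epsilon  # exploration rate at choice points
        self.alpha = alpha      # learning rate for the Q-value update
        self.q = defaultdict(float)  # Q-value per ((state, obs), action)

    def act(self, state, obs):
        actions = sorted(self.machine[(state, obs)])
        if len(actions) == 1:  # fixed behavior: nothing left to learn
            return actions[0]
        if random.random() < self.epsilon:  # explore a choice point
            return random.choice(actions)
        return max(actions, key=lambda a: self.q[((state, obs), a)])

    def update(self, state, obs, action, reward):
        # Simple running-average update toward the observed reward.
        key = ((state, obs), action)
        self.q[key] += self.alpha * (reward - self.q[key])
```

At deployment the agent follows the machine, exploring only at choice points and reinforcing the actions that pay off in the actual environment, so the search space stays restricted to the behaviors the simulator deemed reasonable.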
Title of host publication: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Publisher: International Foundation for Autonomous Agents and Multiagent Systems
Number of pages: 2
Publication status: Published - 2012