Abstract
We introduce gain-based policies for exploration in active learning problems. For exploration in multi-armed bandits with known reward variances, an ideal gain-maximization exploration policy is described within a unified framework that also covers error-based and counter-based exploration. For realistic situations without prior knowledge of the reward variances, we establish an upper bound on the gain function, yielding a practical gain-maximization exploration policy that achieves optimal exploration asymptotically. Finally, we extend the gain-maximization exploration scheme to partially observable environments. Approximating the environment by a set of local bandits, the agent actively selects its actions by maximizing the discounted gain in learning the local bandits. The resulting gain-based exploration not only outperforms random exploration but also produces the curiosity-driven behavior observed in natural agents.
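The abstract does not spell out the gain function itself, so the sketch below is only an illustration of the general idea: pick the arm whose next pull is expected to reduce the estimation error the most, substituting an upper bound on the reward variance when the true variances are unknown. The class name `GainExplorer`, the 1/n error-reduction formula, and the Popoviciu-style variance bound are all assumptions made for this example, not definitions taken from the paper.

```python
# Minimal sketch of gain-based arm selection for a multi-armed bandit.
# Assumed (not from the paper): the "gain" of pulling arm i is the expected
# drop in the variance of its mean-reward estimate,
#   gain_i = var_i / n_i - var_i / (n_i + 1),
# with an optimistic upper bound on var_i used while an arm has few pulls.
import numpy as np

class GainExplorer:
    def __init__(self, n_arms, reward_range=1.0):
        self.counts = np.zeros(n_arms)               # pulls per arm
        self.means = np.zeros(n_arms)                # running mean reward per arm
        self.m2 = np.zeros(n_arms)                   # running sum of squared deviations
        self.var_bound = (reward_range ** 2) / 4.0   # crude upper bound on reward variance

    def gain(self):
        n = np.maximum(self.counts, 1.0)
        # empirical variance, replaced by the upper bound for rarely pulled arms
        var = np.where(self.counts > 1, self.m2 / n, self.var_bound)
        # expected reduction in estimation error from one more pull of each arm
        return var / n - var / (n + 1.0)

    def select(self):
        # pull the arm whose next observation is expected to be most informative
        return int(np.argmax(self.gain()))

    def update(self, arm, reward):
        # Welford-style online update of the per-arm mean and variance statistics
        self.counts[arm] += 1
        delta = reward - self.means[arm]
        self.means[arm] += delta / self.counts[arm]
        self.m2[arm] += delta * (reward - self.means[arm])

# Example usage on a toy 3-armed Bernoulli bandit (illustrative only).
rng = np.random.default_rng(0)
explorer = GainExplorer(n_arms=3)
for _ in range(100):
    arm = explorer.select()
    explorer.update(arm, float(rng.random() < [0.2, 0.5, 0.8][arm]))
```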
Original language | English |
---|---|
Title of host publication | Natural Computation, 2007. ICNC 2007. Third International Conference on |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 177-182 |
Number of pages | 6 |
Volume | 1 |
ISBN (Print) | 978-0-7695-2875-5 |
DOIs | |
Publication status | Published - 1 Aug 2007 |
Keywords
- decision making
- knowledge acquisition
- learning (artificial intelligence)
- counter-based exploration
- error-based exploration
- gain-maximization exploration policy
- multi-armed bandits
- optimal exploration asymptotically
- partially observable environments
- Entropy
- Estimation error
- Gain measurement
- Learning
- Redundancy
- Robots
- Testing
- Upper bound