Gain-based Exploration: From Multi-armed Bandits to Partially Observable Environments

B. Si, J. M. Herrmann, K. Pawelzik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


We introduce gain-based policies for exploration in active learning problems. For exploration in multi-armed bandits with the knowledge of reward variances, an ideal gain-maximization exploration policy is described in a unified framework which also includes error-based and counter-based exploration. For realistic situations without prior knowledge of reward variances, we establish an upper bound on the gain function, resulting in a realistic gain-maximization exploration policy which achieves the optimal exploration asymptotically. Finally, we extend the gain-maximization exploration scheme towards partially observable environments. Approximating the environment by a set of local bandits, the agent actively selects its actions by maximizing discounted gain in learning local bandits. The resulting gain-based exploration not only outperforms random exploration, but also produces curiosity-driven behavior which is observed in natural agents.
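The abstract does not state the gain function itself, so the following is only a minimal sketch of the idea for the known-variance bandit case. It assumes, purely for illustration, that the gain of pulling an arm is the expected reduction in the squared error of its sample mean, var/n - var/(n+1) = var/(n(n+1)). The names `expected_gain` and `gain_based_exploration` are hypothetical and do not come from the paper.

```python
import random


def expected_gain(variance, count):
    """Expected drop in squared error of a sample mean after one more pull.

    Assumption (not taken from the paper): gain = var/n - var/(n+1).
    """
    return variance / (count * (count + 1))


def gain_based_exploration(variances, pulls, seed=0):
    """Pull every arm once, then always pull the arm with the largest gain.

    Rewards are drawn from zero-mean Gaussians with the given variances,
    which is an illustrative choice only.
    """
    rng = random.Random(seed)
    n_arms = len(variances)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(pulls):
        if t < n_arms:
            arm = t  # initial pass so that every count is at least 1
        else:
            arm = max(
                range(n_arms),
                key=lambda i: expected_gain(variances[i], counts[i]),
            )
        reward = rng.gauss(0.0, variances[arm] ** 0.5)
        counts[arm] += 1
        sums[arm] += reward
    means = [s / c for s, c in zip(sums, counts)]
    return counts, means


if __name__ == "__main__":
    counts, means = gain_based_exploration([1.0, 4.0], pulls=300)
    print(counts, means)
```

Under this assumed gain, equal variances make the rule pick the least-sampled arm, which is a counter-based exploration scheme; unequal variances shift pulls towards the noisier arms. Whether this matches the unified framework in the paper cannot be confirmed from the abstract alone.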
Original language: English
Title of host publication: Natural Computation, 2007. ICNC 2007. Third International Conference on
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 6
ISBN (Print): 978-0-7695-2875-5
Publication status: Published - 1 Aug 2007


  • decision making
  • knowledge acquisition
  • learning (artificial intelligence)
  • counter-based exploration
  • error-based exploration
  • gain-maximization exploration policy
  • multi-armed bandits
  • optimal exploration asymptotically
  • partially observable environments
  • Entropy
  • Estimation error
  • Gain measurement
  • Learning
  • Redundancy
  • Robots
  • Testing
  • Upper bound


