Abstract
Structured models of decision making often assume an agent is aware of all possible states and actions in advance. This assumption is sometimes untenable. In this paper, we learn influence diagrams from both domain exploration and expert assertions in a way that guarantees convergence to optimal behaviour, even when the agent starts out unaware of actions or belief variables that are critical to success. Our experiments show that our agent learns optimal behaviour on both small and large decision problems, and that allowing an agent to conserve information upon discovering new possibilities results in faster convergence.
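For readers unfamiliar with the formalism, the sketch below shows how an influence diagram is evaluated once its structure and parameters are known: a decision is chosen by maximizing expected utility over the chance nodes. This is a generic illustration, not the learning algorithm described in the paper; the node names, probabilities, and utilities are all hypothetical.

```python
# Illustrative only: evaluating a tiny influence diagram with one chance
# node (Weather), one decision node (Umbrella), and one utility node.
# All names and numbers are made up for the example.

# Chance node: P(Weather)
p_weather = {"sunny": 0.7, "rainy": 0.3}

# Utility node: U(Weather, Umbrella)
utility = {
    ("sunny", "take"): 20, ("sunny", "leave"): 100,
    ("rainy", "take"): 70, ("rainy", "leave"): 0,
}

def expected_utility(action):
    """Expected utility of an action, marginalizing over the chance node."""
    return sum(p * utility[(w, action)] for w, p in p_weather.items())

# Optimal policy: pick the action with maximal expected utility.
best = max(["take", "leave"], key=expected_utility)
print(best, expected_utility(best))  # → leave 70.0
```

The learning problem the paper addresses is harder: the agent may initially be unaware that some of these nodes (actions or belief variables) exist at all, and must still converge to the optimal policy as they are discovered.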
Original language | English |
---|---|
Title of host publication | Proceedings of the 36th International Conference on Machine Learning (ICML) |
Editors | Kamalika Chaudhuri, Ruslan Salakhutdinov |
Place of Publication | Long Beach, USA |
Publisher | PMLR |
Pages | 2941-2950 |
Number of pages | 10 |
Volume | 97 |
Publication status | E-pub ahead of print - 3 Jul 2019 |
Event | Thirty-sixth International Conference on Machine Learning |
Event venue | Long Beach Convention Center, Long Beach, United States |
Event duration | 9 Jun 2019 → 15 Jun 2019 |
Conference number | 36 |
Event URL | https://icml.cc/Conferences/2019 |
Publication series
Name | Proceedings of Machine Learning Research |
---|---|
Publisher | PMLR |
Volume | 97 |
ISSN (Electronic) | 2640-3498 |
Conference
Conference | Thirty-sixth International Conference on Machine Learning |
---|---|
Abbreviated title | ICML 2019 |
Country/Territory | United States |
City | Long Beach |
Period | 9/06/19 → 15/06/19 |
Internet address | https://icml.cc/Conferences/2019 |