Data collection is costly. A machine learning model requires input data to produce an output prediction, but accurate input is rarely free to obtain. For example, in the social sciences it may require collecting samples; in signal processing, it may involve investing in expensive, accurate sensors. The problem of allocating a budget across the collection of different input variables is largely overlooked in machine learning, but it is important under real-world constraints. Given that the noise level on each input feature depends on how much resource has been spent gathering it, and given a fixed budget, we ask how to allocate that budget to maximise our expected reward. The optimal model parameters, however, depend on the choice of budget allocation, so searching the space of possible budgets is costly. Using doubly stochastic gradient methods, we propose a solution that accommodates expressive models and massive datasets while still providing an interpretable budget allocation for feature gathering at test time.
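The joint problem described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear model, the squared-error loss, the noise model (variance `noise_scale / budget` per feature), the softmax parameterisation of the budget, and the per-feature floor `min_share` are all assumptions made here for illustration, and `allocate_budget` is a hypothetical name. Each step is stochastic in two ways, sampling a minibatch of data and a fresh draw of feature noise, and updates the model weights and the budget allocation together.

```python
import numpy as np


def allocate_budget(X, y, total_budget, noise_scale, steps=3000,
                    batch_size=64, lr_w=0.01, lr_theta=0.02,
                    min_share=0.05, seed=0):
    """Jointly fit linear weights and a per-feature budget (illustrative only).

    Assumed noise model: feature j is observed as x_j + sigma_j * z with
    sigma_j**2 = noise_scale[j] / budget[j], so more budget means less noise.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)        # model parameters
    theta = np.zeros(d)    # unconstrained budget parameters
    # Budget left to distribute after every feature gets the assumed floor.
    free = total_budget - d * min_share

    def budget_from(theta):
        s = np.exp(theta - theta.max())
        s /= s.sum()
        return s, min_share + free * s   # entries always sum to total_budget

    for _ in range(steps):
        idx = rng.integers(0, n, size=batch_size)   # randomness 1: minibatch
        z = rng.standard_normal((batch_size, d))    # randomness 2: feature noise
        s, b = budget_from(theta)
        sigma = np.sqrt(noise_scale / b)
        x_noisy = X[idx] + sigma * z                # reparameterised noisy inputs
        r = x_noisy @ w - y[idx]                    # residuals of the linear model

        # Gradient of the mean squared error with respect to the weights.
        grad_w = 2.0 * (r[:, None] * x_noisy).mean(axis=0)
        # Chain rule through sigma_j = sqrt(noise_scale_j / b_j).
        dsigma_db = -0.5 * np.sqrt(noise_scale) * b ** -1.5
        grad_b = 2.0 * (r[:, None] * z).mean(axis=0) * w * dsigma_db
        # Chain rule through the softmax that maps theta to budget shares.
        grad_theta = free * s * (grad_b - s @ grad_b)

        w -= lr_w * grad_w
        theta -= lr_theta * grad_theta

    return w, budget_from(theta)[1]


# Synthetic usage: the third feature is irrelevant to the target.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 3))
y = X @ np.array([3.0, 1.0, 0.0]) + 0.1 * rng.standard_normal(2000)
weights, budget = allocate_budget(X, y, total_budget=3.0,
                                  noise_scale=np.ones(3))
print("budget per feature:", np.round(budget, 3))
```

Under these assumptions the returned `budget` vector is directly readable as the amount to spend on each feature, which is the sense in which such an allocation is interpretable. The floor `min_share` is only there to keep the sampled noise bounded in this toy setting.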
|Publication status|Published - 2016|
|Event|Reliable Machine Learning in the Wild: NIPS 2016 Workshop - Centre Convencions Internacional, Barcelona, Spain (Duration: 9 Dec 2016 → …)|
|Workshop|Reliable Machine Learning in the Wild|
|Period|9/12/16 → …|