An adaptive pre-filtering technique for error-reduction sampling in active learning

Michael Davy*, Saturnino Luz

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Error-reduction sampling (ERS) is a high-performing but computationally expensive query selection strategy for active learning. Subset optimisation has been proposed to reduce the computational expense by applying ERS to only a subset of examples from the pool. This paper compares two techniques used to construct the subset: random sub-sampling and pre-filtering. We focus on pre-filtering, which populates the subset with more informative examples by filtering the unlabelled pool with a query selection strategy. We establish whether pre-filtering outperforms sub-sampling, examine the effect of subset size, and propose a novel adaptive pre-filtering technique that dynamically switches between several alternative pre-filtering techniques using a multi-armed bandit algorithm. Empirical evaluations conducted on two benchmark text categorisation datasets demonstrate that pre-filtered ERS achieves higher accuracy than sub-sampled ERS. The proposed adaptive pre-filtering technique is also shown to be competitive with the optimal pre-filtering technique on the majority of problems and is never the worst technique.
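The ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which bandit algorithm, reward signal, or pre-filtering strategies the paper uses, so the epsilon-greedy bandit, the two scoring functions (`uncertainty`, `extremity`), the placeholder reward, and all names below are assumptions made for illustration. Pool items are represented simply as predicted positive-class probabilities.

```python
import random

def random_subsample(pool, k, rng):
    """Sub-sampling: pick k pool items uniformly at random."""
    return rng.sample(pool, min(k, len(pool)))

def prefilter(pool, k, score):
    """Pre-filtering: keep the k pool items a cheap strategy scores highest."""
    return sorted(pool, key=score, reverse=True)[:k]

def uncertainty(p):
    """Hypothetical cheap strategy: highest when p is near 0.5."""
    return 1.0 - abs(p - 0.5) * 2.0

def extremity(p):
    """Hypothetical alternative strategy: prefers confident predictions."""
    return abs(p - 0.5) * 2.0

class EpsilonGreedyBandit:
    """Epsilon-greedy bandit over named arms (assumed, not from the paper)."""

    def __init__(self, arms, epsilon, rng):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}

    def select(self):
        # Try every arm once before exploiting.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

rng = random.Random(0)
pool = [rng.random() for _ in range(200)]
filters = {"uncertainty": uncertainty, "extremity": extremity}
bandit = EpsilonGreedyBandit(filters, epsilon=0.1, rng=rng)

for _ in range(50):
    arm = bandit.select()
    subset = prefilter(pool, 10, filters[arm])
    # The expensive ERS step would run on `subset` only. The reward here is
    # a placeholder (mean uncertainty of the subset), not the paper's signal.
    reward = sum(uncertainty(p) for p in subset) / len(subset)
    bandit.update(arm, reward)
```

In this sketch the bandit picks one pre-filter per active-learning iteration, and only the resulting subset would be passed to the expensive ERS step. `random_subsample` is the sub-sampling baseline that the abstract compares pre-filtering against.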

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008
Pages: 682-691
Number of pages: 10
Publication status: Published - 30 Dec 2008
Event: IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008 - Pisa, Italy
Duration: 15 Dec 2008 to 19 Dec 2008

