TY - GEN
T1 - An adaptive pre-filtering technique for error-reduction sampling in active learning
AU - Davy, Michael
AU - Luz, Saturnino
PY - 2008/12/30
Y1 - 2008/12/30
N2 - Error-reduction sampling (ERS) is a high-performing but computationally expensive query selection strategy for active learning. Subset optimisation has been proposed to reduce computational expense by applying ERS to only a subset of examples from the pool. This paper compares techniques used to construct the subset, namely random sub-sampling and pre-filtering. We focus on pre-filtering, which populates the subset with more informative examples by filtering from the unlabelled pool using a query selection strategy. In this paper we establish whether pre-filtering outperforms sub-sampling optimisation, examine the effect of subset size, and propose a novel adaptive pre-filtering technique which dynamically switches between several alternative pre-filtering techniques using a multi-armed bandit algorithm. Empirical evaluations conducted on two benchmark text categorisation datasets demonstrate that pre-filtered ERS achieves higher accuracy than sub-sampled ERS. The proposed adaptive pre-filtering technique is also shown to be competitive with the optimal pre-filtering technique on the majority of problems and is never the worst technique.
AB - Error-reduction sampling (ERS) is a high-performing but computationally expensive query selection strategy for active learning. Subset optimisation has been proposed to reduce computational expense by applying ERS to only a subset of examples from the pool. This paper compares techniques used to construct the subset, namely random sub-sampling and pre-filtering. We focus on pre-filtering, which populates the subset with more informative examples by filtering from the unlabelled pool using a query selection strategy. In this paper we establish whether pre-filtering outperforms sub-sampling optimisation, examine the effect of subset size, and propose a novel adaptive pre-filtering technique which dynamically switches between several alternative pre-filtering techniques using a multi-armed bandit algorithm. Empirical evaluations conducted on two benchmark text categorisation datasets demonstrate that pre-filtered ERS achieves higher accuracy than sub-sampled ERS. The proposed adaptive pre-filtering technique is also shown to be competitive with the optimal pre-filtering technique on the majority of problems and is never the worst technique.
UR - http://www.scopus.com/inward/record.url?scp=62449088420&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2008.52
DO - 10.1109/ICDMW.2008.52
M3 - Conference contribution
AN - SCOPUS:62449088420
SN - 9780769535036
T3 - Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008
SP - 682
EP - 691
BT - Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008
T2 - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008
Y2 - 15 December 2008 through 19 December 2008
ER -