Optimising Selective Sampling for Bootstrapping Named Entity Recognition

Markus Becker, Ben Hachey, Beatrice Alex, Claire Grover

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Training a statistical named entity recognition system in a new domain requires costly manual annotation of large quantities of in-domain data. Active learning promises to reduce the annotation cost by selecting only highly informative data points. This paper describes a real (non-simulated) active learning experiment to bootstrap a named entity recognition system for a new domain: radio astronomical abstracts. We evaluate several committee-based metrics for quantifying the disagreement between classifiers built using multiple views, and demonstrate that the choice of metric can be optimised in simulation experiments with existing annotated data from different domains. A final evaluation shows substantial savings in annotation cost compared to a randomly sampled baseline.
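This record does not say which committee-based metrics the paper compares. As a rough illustration only, the sketch below assumes a committee of classifiers trained on different feature views, each producing a per-token label distribution. It scores unlabelled sentences with two standard disagreement measures, average KL divergence to the committee mean and vote entropy, and selects the sentences with the highest scores for annotation. All function names and the labels (`O`, `B-SRC`) are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: one label distribution per token per committee member,
# e.g. {"O": 0.7, "B-SRC": 0.3}.
Dist = Dict[str, float]
Metric = Callable[[Sequence[Dist]], float]


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """KL(p || q) over the union of labels; eps guards against log(0)."""
    labels = set(p) | set(q)
    return sum(
        p.get(l, 0.0) * math.log((p.get(l, 0.0) + eps) / (q.get(l, 0.0) + eps))
        for l in labels
        if p.get(l, 0.0) > 0.0
    )


def kl_to_mean(dists: Sequence[Dist]) -> float:
    """Average KL divergence of each committee member from the committee mean."""
    labels = set().union(*dists)
    k = len(dists)
    mean = {l: sum(d.get(l, 0.0) for d in dists) / k for l in labels}
    return sum(kl_divergence(d, mean) for d in dists) / k


def vote_entropy(dists: Sequence[Dist]) -> float:
    """Entropy of the members' hard (argmax) votes for one token."""
    votes = [max(d, key=d.get) for d in dists]
    k = len(votes)
    counts = {v: votes.count(v) for v in set(votes)}
    return -sum((c / k) * math.log(c / k) for c in counts.values())


def sentence_disagreement(token_dists: List[Sequence[Dist]], metric: Metric) -> float:
    """Average the token-level disagreement over one sentence."""
    if not token_dists:
        return 0.0
    return sum(metric(t) for t in token_dists) / len(token_dists)


def select_batch(pool: Dict[str, List[Sequence[Dist]]], metric: Metric, k: int) -> List[str]:
    """Return the ids of the k pool sentences the committee disagrees on most."""
    ranked = sorted(
        pool, key=lambda sid: sentence_disagreement(pool[sid], metric), reverse=True
    )
    return ranked[:k]


if __name__ == "__main__":
    # Two hypothetical view classifiers: they agree on every token of s1,
    # but disagree on the second token of s2.
    agree = [{"O": 0.9, "B-SRC": 0.1}, {"O": 0.9, "B-SRC": 0.1}]
    split = [{"O": 0.8, "B-SRC": 0.2}, {"O": 0.1, "B-SRC": 0.9}]
    pool = {"s1": [agree, agree], "s2": [agree, split]}
    print(select_batch(pool, kl_to_mean, k=1))  # ['s2']
```

Averaging token scores per sentence is only one design choice; summing them, or taking the maximum, would weight long or locally ambiguous sentences differently. The abstract's point is that such choices can be compared in simulation on existing annotated data before the real annotation loop is run.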
Original language: English
Title of host publication: Proceedings of the International Conference on Machine Learning (ICML-2005) Workshop on Learning with Multiple Views
Number of pages: 7
Publication status: Published - 2005

Keywords / Materials (for Non-textual outputs)

  • seer

