Class distribution estimation in imprecise domains based on supervised learning

Víctor González-Castro*, Rocío Alaiz-Rodríguez, Enrique Alegre

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Quantification, or proportion estimation, plays an important role in many practical classification problems. On the one hand, a machine that automatically classifies an element into a group of predefined classes will make suboptimal decisions if the class distribution in the test (real) domain differs from the one assumed in learning. Estimating the new class distribution is necessary in order to adapt the classifier to the new operational conditions. On the other hand, there are some real domains where the quantification task itself is the main goal. Some fields, such as quality control, direct marketing, tendency study or some textual recognition tasks, require methods that can reliably estimate the proportion of elements within each category without any concerns about how each element has been classified individually. We describe several quantification techniques that rely on supervised learning and provide these estimations based on: (a) the classifier confusion matrix, (b) the posterior probability estimations, and (c) distributional divergence measures. We illustrate these techniques, as well as their robustness against the base classifier performance, in a practical seminal quality control setting where the ultimate goal is to quantify the proportion of sperm cells with damaged/intact acrosome.
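
As a rough illustration of approach (a), the sketch below applies the widely used "adjusted count" correction for a binary problem: the fraction of test elements predicted positive is corrected with the classifier's true and false positive rates, which would be read off a confusion matrix estimated on labelled data. This is a minimal sketch under stated assumptions and not necessarily the chapter's exact formulation; the function name `adjusted_count` and all numbers are hypothetical.

```python
def adjusted_count(predicted_positive_rate, tpr, fpr):
    """Estimate the true positive-class proportion in a test set.

    predicted_positive_rate: fraction of test elements the classifier labels positive.
    tpr, fpr: true/false positive rates taken from a confusion matrix
              estimated on labelled data (assumed to carry over to the test domain).
    """
    if tpr <= fpr:
        raise ValueError("the correction needs a classifier with tpr > fpr")
    estimate = (predicted_positive_rate - fpr) / (tpr - fpr)
    # Sampling noise can push the raw estimate outside [0, 1], so clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical example: a classifier with TPR 0.9 and FPR 0.2 applied to a
# test set whose (unknown in practice) positive proportion is 0.3.
tpr, fpr = 0.9, 0.2
true_prevalence = 0.3
observed = true_prevalence * tpr + (1 - true_prevalence) * fpr  # naive count
estimate = adjusted_count(observed, tpr, fpr)
print(f"naive count: {observed:.2f}, adjusted estimate: {estimate:.2f}")
```

The naive count is biased toward the classifier's error profile, while the corrected estimate recovers the underlying proportion as long as the error rates measured in learning still hold in the test domain.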

Original language: English
Title of host publication: Perspectives on Pattern Recognition
Publisher: Nova Science Publishers Inc
Pages: 187-202
Number of pages: 16
ISBN (Print): 9781612091181
Publication status: Published - 1 Dec 2011
Externally published: Yes
