Abstract
When developing a classification model that assigns observations of unknown class to one of several specified classes on the basis of the feature values associated with each observation, it is often desirable to base the classifier on a limited number of features. Mathematical programming discriminant analysis methods for developing classification models can be extended to perform feature selection. Classification accuracy can serve as the feature selection criterion in a mixed integer programming (MIP) model in which a binary variable is associated with each training sample observation, but the number of binary variables required limits the size of problems to which this approach can be applied. This paper develops heuristic feature selection methods for problems with large numbers of observations. These heuristic procedures, which are based on the MIP model for maximizing classification accuracy, are then applied to three credit scoring data sets.
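The abstract does not state the MIP model itself. As a rough illustration only, a generic two-class formulation of this kind might look like the sketch below. Every symbol is an assumption introduced here and not the paper's notation: $x_{ij}$ is the value of feature $j$ for training observation $i$, $G_1$ and $G_2$ are the two classes, $w_j$ and $c$ are the discriminant weights and cutoff, $z_i$ is the per-observation binary variable, $y_j$ is a binary feature-selection variable, $k$ is the feature limit, and $M$, $U$, $\varepsilon$ are a large constant, a weight bound, and a small separation margin.

```latex
% Illustrative sketch only: symbols and constraints are assumptions,
% not the formulation used in the paper.
\begin{align*}
\min_{w,\,c,\,z,\,y}\quad & \sum_{i \in G_1 \cup G_2} z_i \\
\text{s.t.}\quad
& \sum_{j=1}^{p} w_j x_{ij} \le c - \varepsilon + M z_i, && i \in G_1, \\
& \sum_{j=1}^{p} w_j x_{ij} \ge c + \varepsilon - M z_i, && i \in G_2, \\
& -U y_j \le w_j \le U y_j, && j = 1,\dots,p, \\
& \sum_{j=1}^{p} y_j \le k, \\
& z_i \in \{0,1\},\quad y_j \in \{0,1\}.
\end{align*}
```

In this sketch, setting $z_i = 1$ relaxes the classification constraint for observation $i$, so minimizing $\sum_i z_i$ maximizes training accuracy, while $y_j = 0$ forces $w_j = 0$ and thereby excludes feature $j$. Because there is one $z_i$ per training observation, the number of binary variables grows with the sample size, which is the limitation that motivates heuristic procedures.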
Original language | English |
---|---|
Pages (from-to) | 804-812 |
Number of pages | 9 |
Journal | Journal of the Operational Research Society |
Volume | 61 |
Issue number | 5 |
Early online date | 8 Apr 2009 |
Publication status | Published - May 2010 |
Keywords
- discriminant analysis
- mathematical programming
- credit scoring