Abstract
Generalized degrees of freedom (GDF), as defined by Ye (1998, JASA 93:120–131), represent the sensitivity of model fits to perturbations of the data. GDF can be computed for any statistical model, making it possible, in principle, to derive the effective number of parameters of machine-learning approaches and thus to compute information-theoretical measures of fit. We compare GDF with cross-validation and find that the latter provides a computationally cheaper and more robust alternative. For Bernoulli-distributed data, GDF estimates were unstable and inconsistently sensitive to the number of data points perturbed simultaneously. Cross-validation, in contrast, also performs well for binary data and across very different machine-learning approaches.
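As a rough illustration of the data-perturbation idea behind GDF, the sketch below estimates Ye's GDF as the summed sensitivity of each fitted value to noise added to its own observation: the response is repeatedly perturbed, the model refitted, and each fitted value regressed on its perturbation. This is a minimal sketch, not the paper's implementation; the function name `estimate_gdf`, the noise scale `tau`, the replicate count `n_reps`, and the scikit-learn-style `fit`/`predict` interface are all illustrative assumptions.

```python
import numpy as np

def estimate_gdf(model, X, y, tau=0.1, n_reps=100, rng=None):
    """Monte Carlo estimate of generalized degrees of freedom,
    GDF = sum_i d E[yhat_i] / d y_i, obtained by regressing each
    fitted value on the perturbation added to its observation.
    `tau` and `n_reps` are illustrative defaults, not values from
    the paper."""
    rng = np.random.default_rng(rng)
    n = len(y)
    deltas = rng.normal(0.0, tau, size=(n_reps, n))  # perturbations of y
    fits = np.empty((n_reps, n))
    for t in range(n_reps):
        model.fit(X, y + deltas[t])   # refit on perturbed response
        fits[t] = model.predict(X)
    # Per-observation least-squares slope of yhat_i on delta_i across
    # replicates; the GDF estimate is the sum of these slopes.
    d_centered = deltas - deltas.mean(axis=0)
    f_centered = fits - fits.mean(axis=0)
    slopes = (d_centered * f_centered).sum(axis=0) / (d_centered ** 2).sum(axis=0)
    return slopes.sum()

if __name__ == "__main__":
    from sklearn.linear_model import LinearRegression
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ rng.normal(size=5) + rng.normal(size=200)
    # For OLS the estimate should be near the trace of the hat matrix,
    # i.e. the parameter count (5 slopes + intercept = 6).
    print(estimate_gdf(LinearRegression(), X, y, rng=1))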
| Original language | English |
| --- | --- |
| Pages (from-to) | 1382-1396 |
| Number of pages | 15 |
| Journal | Communications in Statistics - Simulation and Computation |
| Volume | 47 |
| Issue number | 5 |
| Early online date | 18 Apr 2017 |
| DOIs | |
| Publication status | Published - 28 May 2018 |
Keywords
- Boosted regression trees
- Data perturbation
- Model complexity
- Random forest