Computing AIC for black-box models using generalized degrees of freedom: A comparison with cross-validation

Severin Hauenstein*, Simon N. Wood, Carsten F. Dormann

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Generalized degrees of freedom (GDF), as defined by Ye (1998, JASA 93:120–131), represent the sensitivity of model fits to perturbations of the data. GDF can be computed for any statistical model, making it possible, in principle, to derive the effective number of parameters in machine-learning approaches and thus to compute information-theoretical measures of fit. We compare GDF with cross-validation and find that the latter provides a less computer-intensive and more robust alternative. For Bernoulli-distributed data, GDF estimates were unstable and inconsistently sensitive to the number of data points perturbed simultaneously. Cross-validation, in contrast, also performs well for binary data and across very different machine-learning approaches.
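To make the perturbation idea concrete, below is a minimal sketch of a Monte Carlo GDF estimate in the spirit of Ye (1998): the response is repeatedly perturbed with small Gaussian noise, the model is refit, and for each observation the slope of its fitted value with respect to its own perturbation is accumulated. The resulting GDF can then stand in for the parameter count in AIC (AIC ≈ −2 log-likelihood + 2·GDF). The function `estimate_gdf`, its `fit_predict` argument, and the perturbation settings are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def estimate_gdf(y, fit_predict, sigma=0.1, n_perturb=50, seed=None):
    """Monte Carlo estimate of generalized degrees of freedom (Ye 1998).

    fit_predict(y_perturbed) must refit the black-box model to the
    perturbed response and return fitted values at the original
    design points. Hypothetical sketch, not the paper's code.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    deltas = np.empty((n_perturb, n))  # noise added to each response
    fits = np.empty((n_perturb, n))    # fitted values after refitting
    for t in range(n_perturb):
        delta = rng.normal(scale=sigma, size=n)
        deltas[t] = delta
        fits[t] = fit_predict(np.asarray(y) + delta)
    # GDF = sum over observations of d(fitted_i)/d(y_i),
    # estimated as the regression slope of fits on perturbations.
    gdf = 0.0
    for i in range(n):
        slope = np.polyfit(deltas[:, i], fits[:, i], deg=1)[0]
        gdf += slope
    return gdf
```

In practice, `fit_predict` would wrap whatever learner is being assessed (e.g., a random forest or boosted regression trees), and the perturbation scale `sigma` and number of replicates `n_perturb` govern the bias-variance trade-off of the estimate, which is where the computational cost discussed in the abstract arises.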

Original language: English
Pages (from-to): 1382-1396
Number of pages: 15
Journal: Communications in Statistics - Simulation and Computation
Volume: 47
Issue number: 5
Early online date: 18 Apr 2017
DOIs
Publication status: Published - 28 May 2018

Keywords

  • Boosted regression trees
  • Data perturbation
  • Model complexity
  • Random forest

