A novel generalised extreme value gradient boosting decision tree for the class imbalanced problem in credit scoring

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

The performance of credit scoring models can be compromised when dealing with imbalanced datasets, where the number of defaulted borrowers is significantly lower than that of non-defaulters. To address this challenge, we propose a gradient boosting decision tree with the generalised extreme value distribution model (GEV-GBDT). Our approach replaces the conventional symmetric logistic sigmoid function with the asymmetric cumulative distribution function of the GEV distribution as the activation function. We derive a novel loss function based on the maximum likelihood estimation of the GEV distribution within the boosting framework. This modification allows the model to focus more on the minority class by emphasising the tail of the response curve, and the shape parameter of the GEV distribution offers flexibility in controlling the model’s emphasis on minority samples. We examine the performance of this approach using four real-life loan datasets. The empirical results show that the GEV-GBDT model achieves superior classification performance compared to other commonly used imbalanced learning methods, including the synthetic minority oversampling technique and the cost-sensitive framework. Furthermore, we conduct performance tests on several datasets with varying imbalance ratios and find that GEV-GBDT performs better on extremely imbalanced datasets.
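As a rough illustration of the idea described in the abstract, the sketch below swaps the logistic sigmoid for a GEV cumulative distribution function and derives the per-sample negative log-likelihood and its gradient, which is the quantity a boosting round would fit. It is not the authors' implementation: the parameterisation (location 0, scale 1, default probability p = F(f)), the default shape value `xi=-0.25`, and the function names `gev_cdf`, `gev_nll` and `gev_nll_grad` are all assumptions made here for illustration, since the abstract does not state them.

```python
import numpy as np

def gev_cdf(f, xi=-0.25):
    """GEV CDF with location 0 and scale 1 (assumed parameterisation)."""
    f = np.asarray(f, dtype=float)
    if abs(xi) < 1e-8:
        return np.exp(-np.exp(-f))          # Gumbel limit as xi -> 0
    base = np.maximum(1.0 + xi * f, 1e-12)  # clip to stay inside the support
    return np.exp(-base ** (-1.0 / xi))

def gev_nll(y, f, xi=-0.25, eps=1e-12):
    """Per-sample Bernoulli negative log-likelihood with p = gev_cdf(f)."""
    y = np.asarray(y, dtype=float)
    p = np.clip(gev_cdf(f, xi), eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def gev_nll_grad(y, f, xi=-0.25, eps=1e-12):
    """Gradient of gev_nll with respect to the raw score f."""
    y = np.asarray(y, dtype=float)
    p = np.clip(gev_cdf(f, xi), eps, 1.0 - eps)
    t = -np.log(p)                 # t = (1 + xi * f) ** (-1 / xi)
    dp_df = p * t ** (1.0 + xi)    # derivative of the CDF w.r.t. f
    return dp_df * ((1.0 - y) / (1.0 - p) - y / p)

# Hypothetical raw scores and labels (1 = default), for illustration only.
scores = np.array([-1.0, 0.0, 0.5, 1.0])
labels = np.array([1.0, 0.0, 1.0, 0.0])
print(gev_cdf(scores))
print(gev_nll_grad(labels, scores))
```

Unlike the sigmoid, this response curve is not symmetric about zero: a raw score of 0 maps to exp(-1) rather than 0.5, and changing `xi` changes how heavily one tail is weighted. To use such a loss as a custom objective in a boosting library, the second derivative would typically also be needed; it is omitted here.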

Original language: English
Pages (from-to): 1-18
Number of pages: 18
Journal: Journal of the Operational Research Society
Early online date: 1 Nov 2024
Publication status: E-pub ahead of print - 1 Nov 2024

Keywords / Materials (for Non-textual outputs)

  • credit scoring
  • gradient boosting decision tree
  • generalised extreme value distribution
  • imbalanced sample
