Abstract
Many credit scoring studies suffer from potential sample selection bias because they focus exclusively on accepted applicants. To address this issue, previous works have proposed reject inference (RI) strategies that first estimate the repayment ability of rejected applicants and then incorporate these estimates as additional supervision signals to refine credit scoring models. However, existing RI methods often fail to account for the local characteristics and default patterns inherent in diverse applicant groups. Our study introduces a novel ‘divide and conquer’ graph-based RI framework, named SAIL, that captures inter-individual differences relevant to credit scoring. This framework comprises (1) Spectral clustering for the categorisation of accepted and rejected applicants, (2) isolation forests for identifying Anomalies among rejected cases, (3) Iterative relabelling mechanisms incorporating the Label spreading and self-learning algorithms for relabelling rejected samples, and (4) binary classification on the relabelled dataset. Using a unique loan dataset, we find that our proposed framework significantly enhances the efficacy of credit scoring models and outperforms other popular RI techniques in predicting defaults. Furthermore, our ablation studies confirm the crucial role of each component of our framework in enhancing prediction accuracy. Our work provides a comprehensive and adaptive RI framework for financial institutions to improve their loan decision-making and risk management.
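The four stages listed in the abstract can be illustrated with a rough sketch built from scikit-learn components. This is not the authors' implementation: the function name `sail_sketch`, the synthetic data, all parameter values (`n_clusters`, `gamma`), and the choice of logistic regression as the final classifier are assumptions made for illustration. A single pass of label spreading stands in for the iterative relabelling with self-learning, which is omitted here.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelSpreading


def sail_sketch(X_acc, y_acc, X_rej, n_clusters=2, seed=0):
    """Hypothetical helper: relabel rejected cases, then fit a classifier."""
    X_all = np.vstack([X_acc, X_rej])
    n_acc = len(X_acc)

    # 1) Spectral clustering over accepted and rejected applicants together.
    clusters = SpectralClustering(
        n_clusters=n_clusters, affinity="rbf", random_state=seed
    ).fit_predict(X_all)

    # 2) Isolation forest on rejected cases; fit_predict returns -1 for anomalies.
    inlier = IsolationForest(random_state=seed).fit_predict(X_rej) == 1

    # 3) Within each cluster, spread labels from accepted cases to the
    #    non-anomalous rejected cases (-1 marks "unlabelled" for LabelSpreading).
    y_rej = np.full(len(X_rej), -1)
    for c in range(n_clusters):
        acc_idx = np.where(clusters[:n_acc] == c)[0]
        rej_idx = np.where((clusters[n_acc:] == c) & inlier)[0]
        if len(rej_idx) == 0 or len(np.unique(y_acc[acc_idx])) < 2:
            continue  # nothing to relabel, or only one class to propagate
        X_c = np.vstack([X_acc[acc_idx], X_rej[rej_idx]])
        y_c = np.concatenate([y_acc[acc_idx], np.full(len(rej_idx), -1)])
        model = LabelSpreading(kernel="rbf", gamma=0.5).fit(X_c, y_c)
        y_rej[rej_idx] = model.transduction_[len(acc_idx):]

    # 4) Binary classification on accepted plus relabelled rejected cases.
    keep = y_rej != -1
    X_train = np.vstack([X_acc, X_rej[keep]])
    y_train = np.concatenate([y_acc, y_rej[keep]])
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return clf, y_rej


# Synthetic stand-in data (assumption): 1 = default, 0 = repaid.
rng = np.random.default_rng(0)
X_acc = rng.normal(size=(120, 4))
y_acc = (X_acc[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)
X_rej = rng.normal(loc=0.5, size=(60, 4))

clf, y_rej = sail_sketch(X_acc, y_acc, X_rej)
pred_rej = clf.predict(X_rej)
```

In this sketch, rejected cases that are flagged as anomalies, or that fall in a cluster without both outcome classes among accepted cases, keep the label `-1` and are left out of the final training set.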
Original language | English |
---|---|
Article number | 114106 |
Pages (from-to) | 1-30 |
Number of pages | 30 |
Journal | Annals of Operations Research |
Early online date | 10 May 2025 |
Publication status | E-pub ahead of print - 10 May 2025 |
Keywords
- reject inference
- credit scoring
- graph theory
- semi-supervised machine learning