AIMS: In this paper, we describe the development and validation of 10-year type 2 diabetes mellitus (T2DM) risk prediction models based on large-scale survey data.
METHODS: Data from the Survey of Health, Ageing and Retirement in Europe (SHARE), collected in 12 European countries from participants aged 50 or older, were used to build and validate the prediction models. The models used 53 variables describing the participants' behavioural characteristics as well as their physical and mental health. To account for the strongly imbalanced outcome, each instance was weighted by the inverse of the proportion of its outcome label when the regularized logistic regression model was fitted.
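The weighting scheme can be illustrated with a minimal sketch. The abstract does not state the penalty type, solver or hyperparameters, so this example assumes an L2 penalty fitted by plain batch gradient descent; the function names (`inverse_proportion_weights`, `fit_weighted_logreg`), the defaults and the toy data are hypothetical and are not the authors' implementation. Each label with count c among n instances receives weight n/c, so both outcome classes contribute the same total weight to the loss.

```python
import math
from collections import Counter


def inverse_proportion_weights(labels):
    """Weight each instance by the inverse of its label's proportion.

    A label that makes up a fraction c/n of the sample gets weight n/c, so
    every class contributes the same total weight (n) to the loss.
    """
    n = len(labels)
    counts = Counter(labels)
    return [n / counts[y] for y in labels]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logreg(X, y, weights, lam=0.1, lr=0.5, epochs=2000):
    """Hypothetical weighted, L2-regularized logistic regression (gradient descent).

    Minimises sum_i(s_i * logloss_i) / sum_i(s_i) + (lam / 2) * ||w||^2,
    where s_i are the instance weights; the intercept b is not penalized.
    """
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    total = sum(weights)
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi, si in zip(X, y, weights):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = si * (p - yi)  # weighted residual of instance i
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        for j in range(d):
            w[j] -= lr * (grad_w[j] / total + lam * w[j])
        b -= lr * grad_b / total
    return w, b


if __name__ == "__main__":
    print(inverse_proportion_weights([0, 0, 0, 0, 1]))  # [1.25, 1.25, 1.25, 1.25, 5.0]
    labels = [0] * 8 + [1] * 2  # toy, strongly imbalanced outcome
    X = [[i / 9] for i in range(10)]  # one toy feature scaled to [0, 1]
    w, b = fit_weighted_logreg(X, labels, inverse_proportion_weights(labels))
    print(w, b)
```

For comparison, scikit-learn's `class_weight="balanced"` option uses n_samples / (n_classes * count), which differs from the n/c weights above only by the constant factor 1/n_classes; that constant does, however, change the effective strength of a fixed regularization parameter.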
RESULTS: A pooled sample of 16,363 individuals was used to build and validate a global regularized logistic regression model, which achieved an area under the receiver operating characteristic curve (AUROC) of 0.702 (95% CI: 0.698-0.706). Additionally, we measured the performance of local, country-specific models, whose AUROC ranged from 0.578 (0.565-0.592) to 0.768 (0.749-0.787).
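The AUROC can be read as the probability that a randomly chosen participant who develops T2DM receives a higher predicted risk than a randomly chosen participant who does not, so 0.5 corresponds to chance and 1.0 to perfect ranking. The sketch below computes it from ranks via the Mann-Whitney U statistic, assuming 0/1 outcome labels; the function name `auroc` and the toy values are illustrative. The abstract does not say how the confidence intervals were obtained, so they are not sketched here.

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic (labels must be 0/1).

    Equals the probability that a randomly chosen positive instance is scored
    higher than a randomly chosen negative one, counting ties as 1/2.
    """
    n = len(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative instance")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores order[i..j] and give each the average rank.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The rank-based form runs in O(n log n), which matters for a sample of this size; a pairwise comparison of every positive with every negative gives the same value but scales quadratically.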
CONCLUSIONS: We have developed and validated a survey-based 10-year T2DM risk prediction model for use across 12 European countries. Our results demonstrate the importance of re-calibrating the models, as well as the benefit of pooling data from multiple countries, which reduces the variance and consequently increases the precision of the results.
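The abstract does not specify which re-calibration method was used. As an illustration only, the sketch below implements one common option, recalibration-in-the-large: a single offset is added to the model's log-odds so that the mean predicted risk in a target country matches the observed incidence there. Inverse-proportion weighting also pushes raw predicted probabilities towards the minority class, so they should not be read as absolute 10-year risks without some such correction. The function names, the bisection bounds and the toy values are hypothetical, and the input probabilities must lie strictly between 0 and 1.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p):
    # Log-odds of a probability strictly between 0 and 1.
    return math.log(p / (1.0 - p))


def recalibrate_intercept(probs, observed_rate, lo=-20.0, hi=20.0, iters=100):
    """Return the offset a for which mean(sigmoid(logit(p) + a)) == observed_rate.

    The mean predicted risk increases monotonically with a, so bisection
    on [lo, hi] converges to the required offset.
    """
    if not 0.0 < observed_rate < 1.0:
        raise ValueError("observed_rate must lie strictly between 0 and 1")
    logits = [logit(p) for p in probs]

    def mean_risk(a):
        return sum(sigmoid(z + a) for z in logits) / len(logits)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_risk(mid) < observed_rate:
            lo = mid  # predictions still too low on average: shift upwards
        else:
            hi = mid
    return (lo + hi) / 2


def apply_offset(probs, a):
    # Shift every prediction by the same amount on the log-odds scale.
    return [sigmoid(logit(p) + a) for p in probs]


if __name__ == "__main__":
    country_probs = [0.2, 0.6, 0.9, 0.4]  # toy weighted-model outputs
    a = recalibrate_intercept(country_probs, observed_rate=0.15)
    print(a, apply_offset(country_probs, a))
```

Because only the intercept changes, the ranking of participants, and therefore the AUROC, is unaffected; only the absolute risk estimates are adjusted.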