Impact of data source choice on multimorbidity measurement: a comparison study of 2.3 million individuals in the Welsh National Health Service

Clare MacRae, Daniel R Morales, Stewart W Mercer, Nazir I Lone, Andrew Lawson, Emily Jefferson, David McAllister, Marjan van den Akker, Alan David Marshall, Sohan Seth, Anna Rawlings, Jane Lyons, Ronan Lyons, Amy Mizen, Eleojo Abubakar, Chris Dibben, Bruce Guthrie

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Measurement of multimorbidity in research is variable, including choice of data source used to ascertain conditions. We compared estimated prevalence of multimorbidity and associations with mortality using different data sources.

Cross-sectional study of SAIL Databank data including 2,340,027 individuals of all ages living in Wales on 01 January 2019. Comparison of the prevalence of multimorbidity and its 47 constituent conditions using data from primary care (PC), hospital inpatient (HI), and linked PC-HI data sources, and examination of associations between condition count and 12-month mortality.

Using linked PC-HI data compared with only HI data, multimorbidity was more prevalent (32.2% versus 16.4%), and the population of people identified as having multimorbidity was younger (mean age 62.5 versus 66.8 years) and included more women (54.2% versus 52.6%). Individuals with multimorbidity in both PC and HI data had stronger associations with mortality than those with multimorbidity only in HI data (adjusted odds ratio 7.9 [95% CI 7.6-8.2] versus 4.7 [95% CI 4.5-4.8] in people with ≥4 conditions). Prevalence of conditions identified using only PC versus only HI data was significantly higher for 37/47 conditions and significantly lower for 10/47: the PC/HI ratio was highest for depression (14.2 [95% CI 14.1-14.4]) and lowest for aneurysm (0.51 [95% CI 0.5-0.5]). Agreement in ascertainment of conditions between the two data sources varied considerably, being slight for five (Kappa <0.20), fair for 13 (Kappa 0.21-0.40), moderate for 17 (Kappa 0.41-0.60), and substantial for 12 (Kappa 0.61-0.80) conditions; by body system, agreement was lowest for mental and behavioural disorders. Percentage agreement (individuals with a condition identified in both PC and HI data) was lowest for anxiety (4.6%) and highest for coronary artery disease (62.9%).
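
As an illustration of the agreement statistics reported above, the following minimal Python sketch computes Cohen's kappa, a PC/HI prevalence ratio, and a percentage agreement for one condition from a 2x2 cross-tabulation of the two data sources. It is not the study's code. The class and function names are hypothetical, the counts are invented, and the denominator used for percentage agreement (everyone with the condition in either source) is an assumption, since the abstract does not define it. The kappa bands follow the ranges quoted in the abstract.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical cross-tabulation of one condition across two sources."""
    both: int      # recorded in primary care (PC) and hospital inpatient (HI) data
    pc_only: int   # recorded in PC data only
    hi_only: int   # recorded in HI data only
    neither: int   # recorded in neither source


def cohen_kappa(t: TwoByTwo) -> float:
    """Cohen's kappa: chance-corrected agreement between the two sources."""
    n = t.both + t.pc_only + t.hi_only + t.neither
    observed = (t.both + t.neither) / n
    pc_yes = (t.both + t.pc_only) / n
    hi_yes = (t.both + t.hi_only) / n
    expected = pc_yes * hi_yes + (1 - pc_yes) * (1 - hi_yes)
    return (observed - expected) / (1 - expected)


def kappa_band(kappa: float) -> str:
    """Label kappa using the bands quoted in the abstract."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def prevalence_ratio(t: TwoByTwo) -> float:
    """Ratio of PC-ascertained to HI-ascertained prevalence (same population)."""
    return (t.both + t.pc_only) / (t.both + t.hi_only)


def percentage_agreement(t: TwoByTwo) -> float:
    """Assumed definition: share of people with the condition in either
    source who have it recorded in both sources."""
    return 100 * t.both / (t.both + t.pc_only + t.hi_only)


# Invented counts for illustration only; not taken from the study.
example = TwoByTwo(both=40, pc_only=10, hi_only=20, neither=30)
kappa = cohen_kappa(example)
print(f"kappa={kappa:.2f} ({kappa_band(kappa)})")
print(f"PC/HI prevalence ratio={prevalence_ratio(example):.2f}")
print(f"percentage agreement={percentage_agreement(example):.1f}%")
```

A ratio above 1 corresponds to a condition that is recorded more often in primary care than in hospital inpatient data, as reported for depression; a ratio below 1 corresponds to the reverse, as reported for aneurysm.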

Use of single data sources may underestimate prevalence when measuring multimorbidity and many important conditions (especially mental and behavioural disorders). Caution should be used when interpreting findings of research examining individual and multiple long-term conditions using single data sources. Where available, researchers using electronic health data should link primary care and hospital inpatient data to generate more robust evidence to support evidence-based healthcare planning decisions for people with multimorbidity.
Original language: English
Article number: 309
Number of pages: 25
Journal: BMC Medicine
Publication status: Published - 15 Aug 2023

Keywords / Materials (for Non-textual outputs)

  • electronic health records
  • multimorbidity
  • epidemiology
