TY - JOUR
T1 - Ethnicity data resource in population-wide health records
T2 - completeness, coverage and granularity of diversity
AU - CVD-COVID-UK/COVID-IMPACT Consortium
AU - Pineda-Moncusí, Marta
AU - Allery, Freya
AU - Delmestri, Antonella
AU - Bolton, Thomas
AU - Nolan, John
AU - Thygesen, Johan H.
AU - Handy, Alex
AU - Banerjee, Amitava
AU - Denaxas, Spiros
AU - Tomlinson, Christopher
AU - Denniston, Alastair K.
AU - Sudlow, Cathie
AU - Akbari, Ashley
AU - Wood, Angela
AU - Collins, Gary S.
AU - Petersen, Irene
AU - Coates, Laura C.
AU - Khunti, Kamlesh
AU - Prieto-Alhambra, Daniel
AU - Khalid, Sara
N1 - Publisher Copyright: © 2024. The Author(s).
PY - 2024/2/22
Y1 - 2024/2/22
N2 - Intersectional social determinants including ethnicity are vital in health research. We curated a population-wide data resource of self-identified ethnicity data from over 60 million individuals in England primary care, linking it to hospital records. We assessed ethnicity data in terms of completeness, consistency, and granularity and found one in ten individuals do not have ethnicity information recorded in primary care. By linking to hospital records, ethnicity data were completed for 94% of individuals. By reconciling SNOMED-CT concepts and census-level categories into a consistent hierarchy, we organised more than 250 ethnicity sub-groups including and beyond “White”, “Black”, “Asian”, “Mixed” and “Other”, and found them to be distributed in proportions similar to the general population. This large observational dataset presents an algorithmic hierarchy to represent self-identified ethnicity data collected across heterogeneous healthcare settings. Accurate and easily accessible ethnicity data can lead to a better understanding of population diversity, which is important to address disparities and influence policy recommendations that can translate into better, fairer health for all.
AB - Intersectional social determinants including ethnicity are vital in health research. We curated a population-wide data resource of self-identified ethnicity data from over 60 million individuals in England primary care, linking it to hospital records. We assessed ethnicity data in terms of completeness, consistency, and granularity and found one in ten individuals do not have ethnicity information recorded in primary care. By linking to hospital records, ethnicity data were completed for 94% of individuals. By reconciling SNOMED-CT concepts and census-level categories into a consistent hierarchy, we organised more than 250 ethnicity sub-groups including and beyond “White”, “Black”, “Asian”, “Mixed” and “Other”, and found them to be distributed in proportions similar to the general population. This large observational dataset presents an algorithmic hierarchy to represent self-identified ethnicity data collected across heterogeneous healthcare settings. Accurate and easily accessible ethnicity data can lead to a better understanding of population diversity, which is important to address disparities and influence policy recommendations that can translate into better, fairer health for all.
U2 - 10.1038/s41597-024-02958-1
DO - 10.1038/s41597-024-02958-1
M3 - Article
SN - 2052-4463
VL - 11
JO - Scientific Data
JF - Scientific Data
IS - 1
M1 - 221
ER -