TY - JOUR
T1 - Harmonising electronic health records for reproducible research: challenges, solutions and recommendations from a UK-wide COVID-19 research collaboration
AU - Abbasizanjani, Hoda
AU - Torabi, Fatemeh
AU - Bedston, Stuart
AU - Bolton, Thomas
AU - Davies, Gareth
AU - Denaxas, Spiros
AU - Griffiths, Rowena
AU - Herbert, Laura
AU - Hollings, Sam
AU - Keene, Spencer
AU - Khunti, Kamlesh
AU - Lowthian, Emily
AU - Lyons, Jane
AU - Mizani, Mehrdad A.
AU - Nolan, John
AU - Sudlow, Cathie
AU - Walker, Venexia
AU - Whiteley, William
AU - Wood, Angela
AU - Akbari, Ashley
N1 - Funding Information:
The British Heart Foundation Data Science Centre (Grant No SP/19/3/34678, awarded to Health Data Research (HDR) UK) funded co-development (with NHS Digital) of the trusted research environment, provision of linked datasets, data access, user software licences, computational usage, and data management and wrangling support, with additional contributions from the HDR UK Data and Connectivity component of the UK Government Chief Scientific Adviser’s National Core Studies programme to coordinate national COVID-19 priority research. Consortium partner organisations funded the time of contributing data analysts, biostatisticians, epidemiologists, and clinicians. This work was supported by the Con-COV team funded by the Medical Research Council (Grant Number: MR/V028367/1). This work was supported by Health Data Research UK, which receives its funding from HDR UK Ltd (HDR-9006) funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation (BHF) and the Wellcome Trust. This work was supported by the ADR Wales programme of work. The ADR Wales programme of work is aligned to the priority themes as identified in the Welsh Government’s national strategy: Prosperity for All. ADR Wales brings together data science experts at Swansea University Medical School, staff from the Wales Institute of Social and Economic Research, Data and Methods (WISERD) at Cardiff University and specialist teams within the Welsh Government to develop new evidence which supports Prosperity for All by using the SAIL Databank at Swansea University, to link and analyse anonymised data. 
ADR Wales is part of the Economic and Social Research Council (part of UK Research and Innovation) funded ADR UK (Grant ES/S007393/1). This work was supported by the Wales COVID-19 Evidence Centre, funded by Health and Care Research Wales.
Funding Information:
This work is carried out with the support of the BHF Data Science Centre led by HDR UK (BHF Grant No. SP/19/3/34678). This study makes use of de-identified data held in the SAIL Databank and NHS Digital’s TRE for England and made available via the BHF Data Science Centre’s CVD-COVID-UK/COVID-IMPACT consortium. This work uses data provided by patients and collected by the NHS as part of their care and support. We would also like to acknowledge all data providers who make health-relevant data available for research. This study makes use of anonymised data held in the Secure Anonymised Information Linkage (SAIL) Databank. We would also like to acknowledge all data providers who make anonymised data available for research. We wish to acknowledge the collaborative partnership that enabled acquisition and access to the de-identified data, which led to this output. The collaboration was led by the Swansea University Health Data Research UK team under the direction of the Welsh Government Technical Advisory Cell (TAC) and includes the following groups and organisations: the SAIL Databank, Administrative Data Research (ADR) Wales, Digital Health and Care Wales (DHCW), Public Health Wales, NHS Shared Services Partnership (NWSSP) and the Welsh Ambulance Service Trust (WAST). All research conducted has been completed under the permission and approval of the SAIL independent Information Governance Review Panel (IGRP) project number 0911. We would also like to thank Caroline E Dale, Samantha Ip, Rochelle Knight, Reecha Sofat, Jonathan Sterne, and Rohan Takhar, who supported this work in various approved CVD-COVID-UK project proposals. Lead author: Hoda Abbasizanjani.
Senior author: Ashley Akbari. CVD-COVID-UK/COVID-IMPACT Consortium: Hoda Abbasizanjani, Fatemeh Torabi, Thomas Bolton, Gareth Davies, Spiros Denaxas, Rowena Griffiths, Sam Hollings, Spencer Keene, Kamlesh Khunti, Jane Lyons, Mehrdad A Mizani, John Nolan, Cathie Sudlow, Venexia Walker, William Whiteley, Angela Wood, Ashley Akbari. A full list of members and their affiliations can be found at https://www.hdruk.ac.uk/projects/cvd-covid-uk-project.
Publisher Copyright:
© 2023, The Author(s).
PY - 2023/1/16
Y1 - 2023/1/16
N2 - BACKGROUND: The CVD-COVID-UK consortium was formed to understand the relationship between COVID-19 and cardiovascular diseases through analyses of harmonised electronic health records (EHRs) across the four UK nations. Beyond COVID-19, data harmonisation and common approaches enable analysis within and across independent Trusted Research Environments. Here we describe the reproducible harmonisation method developed using large-scale EHRs in Wales to accommodate the fast and efficient implementation of cross-nation analysis in England and Wales as part of the CVD-COVID-UK programme. We characterise current challenges and share lessons learnt. METHODS: Serving the scope and scalability of multiple study protocols, we used linked, anonymised individual-level EHR, demographic and administrative data held within the SAIL Databank for the population of Wales. The harmonisation method was implemented as a four-layer reproducible process, starting from raw data in the first layer. Then each of the layers two to four is framed by, but not limited to, the characterised challenges and lessons learnt. We achieved curated data as part of our second layer, followed by extracting phenotyped data in the third layer. We captured any project-specific requirements in the fourth layer. RESULTS: Using the implemented four-layer harmonisation method, we retrieved approximately 100 health-related variables for the 3.2 million individuals in Wales, which are harmonised with corresponding variables for > 56 million individuals in England. We processed 13 data sources into the first layer of our harmonisation method: five of these are updated daily or weekly, and the rest at various frequencies providing sufficient data flow updates for frequent capturing of up-to-date demographic, administrative and clinical information. CONCLUSIONS: We implemented an efficient, transparent, scalable, and reproducible harmonisation method that enables multi-nation collaborative research.
With a current focus on COVID-19 and its relationship with cardiovascular outcomes, the harmonised data has supported a wide range of research activities across the UK.
AB - BACKGROUND: The CVD-COVID-UK consortium was formed to understand the relationship between COVID-19 and cardiovascular diseases through analyses of harmonised electronic health records (EHRs) across the four UK nations. Beyond COVID-19, data harmonisation and common approaches enable analysis within and across independent Trusted Research Environments. Here we describe the reproducible harmonisation method developed using large-scale EHRs in Wales to accommodate the fast and efficient implementation of cross-nation analysis in England and Wales as part of the CVD-COVID-UK programme. We characterise current challenges and share lessons learnt. METHODS: Serving the scope and scalability of multiple study protocols, we used linked, anonymised individual-level EHR, demographic and administrative data held within the SAIL Databank for the population of Wales. The harmonisation method was implemented as a four-layer reproducible process, starting from raw data in the first layer. Then each of the layers two to four is framed by, but not limited to, the characterised challenges and lessons learnt. We achieved curated data as part of our second layer, followed by extracting phenotyped data in the third layer. We captured any project-specific requirements in the fourth layer. RESULTS: Using the implemented four-layer harmonisation method, we retrieved approximately 100 health-related variables for the 3.2 million individuals in Wales, which are harmonised with corresponding variables for > 56 million individuals in England. We processed 13 data sources into the first layer of our harmonisation method: five of these are updated daily or weekly, and the rest at various frequencies providing sufficient data flow updates for frequent capturing of up-to-date demographic, administrative and clinical information. CONCLUSIONS: We implemented an efficient, transparent, scalable, and reproducible harmonisation method that enables multi-nation collaborative research.
With a current focus on COVID-19 and its relationship with cardiovascular outcomes, the harmonised data has supported a wide range of research activities across the UK.
KW - COVID-19
KW - Common data model
KW - Data harmonisation
KW - Electronic health record
KW - NHS Digital TRE for England
KW - Population health
KW - Reproducible research
KW - SAIL Databank
KW - Trusted Research Environments
U2 - 10.1186/s12911-022-02093-0
DO - 10.1186/s12911-022-02093-0
M3 - Article
C2 - 36647111
VL - 23
JO - BMC Medical Informatics and Decision Making
JF - BMC Medical Informatics and Decision Making
SN - 1472-6947
IS - 1
M1 - 8
ER -