TY - JOUR
T1 - FAIR data pipeline
T2 - provenance-driven data management for traceable scientific workflows
AU - Mitchell, Sonia Natalie
AU - Lahiff, Andrew
AU - Cummings, Nathan
AU - Hollocombe, Jonathan
AU - Boskamp, Bram
AU - Field, Ryan
AU - Reddyhoff, Dennis
AU - Zarebski, Kristian
AU - Wilson, Antony
AU - Viola, Bruno
AU - Burke, Martin
AU - Archibald, Blair
AU - Bessell, Paul
AU - Blackwell, Richard
AU - Boden, Lisa A
AU - Brett, Alys
AU - Brett, Sam
AU - Dundas, Ruth
AU - Enright, Jessica
AU - Gonzalez-Beltran, Alejandra N
AU - Harris, Claire
AU - Hinder, Ian
AU - Hughes, Christopher David
AU - Knight, Martin
AU - Mano, Vino
AU - McMonagle, Ciaran
AU - Mellor, Dominic
AU - Mohr, Sibylle
AU - Marion, Glenn
AU - Matthews, Louise
AU - McKendrick, Iain J
AU - Pooley, Christopher Mark
AU - Porphyre, Thibaud
AU - Reeves, Aaron
AU - Townsend, Edward
AU - Turner, Robert
AU - Walton, Jeremy
AU - Reeve, Richard
N1 - Funding Information:
The work was funded by the Science and Technology Facilities Council under grant no. ST/V006126/1, Biotechnology and Biological Sciences Research Council (grant nos. BB/M003949/1, BB/R012679/1 and BB/S001034/1), Engineering and Physical Sciences Research Council (grant nos. EP/T004878/1 and EP/V054236/1), Medical Research Council (grant nos. MC_UU_00022/2 and MR/R00241X), Natural Environment Research Council (grant nos. NE/T004193/1 and NE/T010355/1), the Scottish Government Rural and Environment Science and Analytical Services Division (grants ‘Centre of Expertise in Animal Disease Outbreaks’ and ‘Strategic Research Programme’), Scottish Government Chief Scientist Office (grant SPHSU17), the UK Atomic Energy Authority, supported by BEIS, the French National Research Agency (ANR) (IDEXLYON project, grant no. ANR-16-IDEX-0005) and Boehringer Ingelheim Animal Health France (The Veterinary Public Health (VPH) hub).
Publisher Copyright:
© 2022 The Authors.
PY - 2022/10/3
Y1 - 2022/10/3
N2 - Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of 'following the science' are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.
KW - COVID-19
KW - Data Management
KW - Humans
KW - Pandemics
KW - Software
KW - Workflow
U2 - 10.1098/rsta.2021.0300
DO - 10.1098/rsta.2021.0300
M3 - Article
C2 - 35965468
SN - 1364-503X
VL - 380
JO - Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
JF - Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
IS - 2233
M1 - 20210300
ER -