A roadmap and rich metadata catalogue for the analysis of federated sensitive data

Much of the data held by public organisations such as government departments
contains sensitive information about UK citizens and businesses. Since the
Digital Economy Act 2017, progress has been made in enabling accredited
researchers to access these data for studies that are in the public benefit.
Access is provided through accredited organisations called Digital Economy Act
Accredited Processing Environments; twelve such organisations currently exist
in the UK. Accredited researchers can submit proposals to an Accredited
Processing Environment to gain access to the data held by that organisation.
Four of these organisations provide Trusted Research Environments, in which
researchers carry out their studies using analytical tools provided within the
environment, so that the data never leave it. This approach is the most secure
way to handle sensitive data.

The challenge for researchers is to answer research questions that require
datasets to be combined before analysis when those datasets are held by more
than one Accredited Processing Environment. We identify two barriers.
First, researchers cannot see the datasets themselves, and free-text metadata
descriptions alone do not allow them to assess whether combining the datasets
is feasible or would support a sensible analysis. Second, the policies that
govern access are specific to each agreement between a data owner and a
Trusted Research Environment, and therefore differ from one environment to
another.
To overcome these barriers we propose the following.

We will agree and deliver a roadmap that allows researchers to discover, apply
for and analyse data held at any of the UK's four national Trusted Research
Environments (TREs), namely the Office for National Statistics (ONS)
Integrated Data Service (IDS), the Scottish National Safe Haven, the SAIL
Databank and the Northern Ireland Statistics and Research Agency (NISRA),
through a single front door. As part of this work we will agree the standards,
policies and procedures needed to enable researchers to analyse data combined
from more than one government organisation. We will publish templates of all
agreements so that other data investments can adopt them when taking
federation forward.

We will develop software that automatically creates a rich metadata catalogue
for specific datasets in Trusted Research Environments, and we will agree
enhanced metadata standards that let researchers judge whether they could
perform a given analysis if they had access to the data. This enables
researchers to decide whether the investment needed to combine datasets is
worthwhile, because they can establish in advance whether the necessary data
are present.
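To illustrate the kind of check that rich metadata makes possible, the following minimal Python sketch assumes a hypothetical catalogue entry recording each dataset's variables, linkage keys and time coverage. The class DatasetMetadata, the function assess_linkage, the field names, the dataset names and the feasibility rule (at least one shared linkage key, an overlapping time period and all required variables present) are illustrative assumptions, not the project's metadata standard or software.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical rich metadata entry for one dataset held in a TRE."""
    name: str
    tre: str                      # environment that holds the data
    variables: frozenset[str]     # variables available for analysis
    linkage_keys: frozenset[str]  # identifiers usable for record linkage
    coverage_start: date
    coverage_end: date


def assess_linkage(a: DatasetMetadata, b: DatasetMetadata,
                   needed: set[str]) -> dict:
    """Judge, from metadata alone, whether combining a and b looks feasible.

    Assumed rule: the datasets share a linkage key, their time coverage
    overlaps, and together they contain every variable the analysis needs.
    """
    shared_keys = a.linkage_keys & b.linkage_keys
    overlaps = max(a.coverage_start, b.coverage_start) <= min(a.coverage_end, b.coverage_end)
    missing = needed - (a.variables | b.variables)
    return {
        "shared_keys": sorted(shared_keys),
        "overlapping_period": overlaps,
        "missing_variables": sorted(missing),
        "feasible": bool(shared_keys) and overlaps and not missing,
    }


# Hypothetical entries; no real dataset or variable names are implied.
survey = DatasetMetadata(
    name="hypothetical_scottish_business_survey",
    tre="Scottish National Safe Haven",
    variables=frozenset({"business_id", "region", "employment"}),
    linkage_keys=frozenset({"business_id"}),
    coverage_start=date(2015, 1, 1),
    coverage_end=date(2020, 12, 31),
)
register = DatasetMetadata(
    name="hypothetical_ons_business_register",
    tre="ONS Integrated Data Service",
    variables=frozenset({"business_id", "sector", "turnover"}),
    linkage_keys=frozenset({"business_id"}),
    coverage_start=date(2018, 1, 1),
    coverage_end=date(2022, 12, 31),
)

result = assess_linkage(survey, register, {"employment", "turnover", "sector"})
print(result["feasible"])  # True
```

The point of the sketch is that the researcher never touches the records themselves: the decision is made entirely from the catalogue entries. Which fields such entries actually carry would be set by the enhanced metadata standard agreed in the project.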

We will demonstrate the use of rich metadata through a specific use case:
linking business data from the Scottish Government, HMRC and the Office for
National Statistics. We will develop a secure query link between the Scottish
National Safe Haven and ONS to enable analysis of the combined datasets. Our
intention is to enable new policy-relevant insights whilst paving the way for
ongoing federation across the Trusted Research Environments of the UK nations.
