Edinburgh Research Explorer

Providing bespoke synthetic data for the UK Longitudinal Studies and other sensitive data with the synthpop package for R

Research output: Contribution to journal (Article)

Open Access permissions

Accepted author manuscript, 1.45 MB, PDF document
Licence: Creative Commons: Attribution (CC-BY)

Original language: English
Pages (from-to): 1-12
Journal: Statistical Journal of the IAOS
Publication status: Published - 21 Jan 2017


Synthetic data methods were designed to address the conflicting demands placed on data holders to unlock the research and policy potential of microdata while at the same time preserving the confidentiality of individuals. Recently, these methods have become more widely recognized in the UK and the provision of bespoke synthetic data has been approved to expand the use of one of the UK Longitudinal Studies. The process of producing useful synthetic data involves, however, a substantial investment of research time, as it always requires some customising for the characteristics of an individual data set. At the same time, a substantial part of it can be automated and this is essential when the process has to be conducted rapidly and on a regular basis. This paper describes the application of synthetic data to the UK Longitudinal Studies, details implementation process for the Scottish Longitudinal Study and presents methods used in an R package synthpop that has been developed to facilitate production of non-disclosive entirely synthetic data. A reproducible example using open data is given to illustrate the synthesising procedure and to provide insights into quality of synthetic data generated using different automated approaches.


ID: 30881107