SIPHER’s Synthetic Population for Individuals in Great Britain: Creation, Validation, and Examples of Application

Andreas Hoehn, University of Glasgow
Nik Lomax, University of Leeds
Kashif Zia, University of Glasgow
Emma Comrie, Public Health Scotland
Fraser Bell, Greater Manchester Combined Authority
Gillian Fergie, University of Glasgow
Alison Heppenstall, University of Glasgow
Robin Purshouse, University of Sheffield
Jo Winterbottom, University of Glasgow
Petra Meier, University of Glasgow

Unlike some other countries, Great Britain (GB) does not have a comprehensive register-based data system. This limits the availability of individual-level data for researchers, analysts, and policymakers seeking to understand the impact and interaction of factors such as health, employment, and housing at a granular spatial resolution. Creating full-scale synthetic populations via spatial microsimulation can help where the required data do not exist or are not readily available outside national safe haven settings. We describe the creation and validation of a full-scale synthetic dataset for the adult population in GB. Using a combinatorial optimisation algorithm (simulated annealing), our dataset combines individual-level information from the Understanding Society main survey with aggregate-level population statistics obtained from the UK Census 2011 and 2020 population projections. The resulting dataset, SIPHER’s Synthetic Population for Individuals, is nationally representative with respect to the following characteristics: age, sex, highest qualification, ethnicity, marital status, economic activity, general health, household tenure, and household type. Results of external and internal validation suggest that our dataset is well suited to applications examining health and socioeconomic outcomes at the level of individuals and across small areas. To demonstrate the utility of the data, we present examples in which our dataset has been used in policy-relevant applications seeking to provide insights into some of GB’s most urgent societal challenges, such as the current cost-of-living crisis and the factors contributing to the stagnation of population health improvements. The dataset will soon become available to all registered users of the UK Data Service.
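
The abstract names simulated annealing as the combinatorial optimisation method but gives no implementation details. The following Python sketch illustrates the general idea for a single small area under simplifying assumptions: survey respondents are drawn with replacement, fit is measured by total absolute error against the area's aggregate counts, and the temperature follows a geometric cooling schedule. The function names, parameters, and toy data are hypothetical and are not taken from the SIPHER workflow.

```python
import math
import random
from collections import Counter


def total_absolute_error(selection, survey, targets):
    """Sum of |simulated count - target count| over all target categories.

    Assumes every (variable, category) pair present in the survey
    also appears as a key in `targets`.
    """
    counts = Counter()
    for idx in selection:
        for variable, category in survey[idx].items():
            counts[(variable, category)] += 1
    return sum(abs(counts[key] - target) for key, target in targets.items())


def anneal_area(survey, targets, population, steps=5000,
                temp=5.0, cooling=0.999, seed=0):
    """Pick `population` survey respondents (with replacement) for one area."""
    rng = random.Random(seed)
    # Start from a random selection of respondent indices.
    selection = [rng.randrange(len(survey)) for _ in range(population)]
    error = total_absolute_error(selection, survey, targets)
    best, best_error = list(selection), error
    for _ in range(steps):
        if best_error == 0:
            break  # perfect fit to the aggregate constraints
        # Propose replacing one selected respondent with a random one.
        slot = rng.randrange(population)
        old = selection[slot]
        selection[slot] = rng.randrange(len(survey))
        new_error = total_absolute_error(selection, survey, targets)
        delta = new_error - error
        # Accept improvements always; accept worse fits with a
        # probability that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            error = new_error
            if error < best_error:
                best, best_error = list(selection), error
        else:
            selection[slot] = old  # reject the proposal
        temp = max(temp * cooling, 1e-6)  # geometric cooling
    return best, best_error


# Toy survey microdata and aggregate targets (illustrative values only).
survey = [
    {"sex": "F", "age": "16-44"},
    {"sex": "F", "age": "45+"},
    {"sex": "M", "age": "16-44"},
    {"sex": "M", "age": "45+"},
]
targets = {
    ("sex", "F"): 6, ("sex", "M"): 4,
    ("age", "16-44"): 3, ("age", "45+"): 7,
}
selection, error = anneal_area(survey, targets, population=10)
print("selected respondent indices:", selection)
print("total absolute error:", error)
```

In a full-scale application this loop would run once per small area with many more constraint variables, and the error would typically be updated incrementally rather than recomputed from scratch for each proposal.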

Presented in Session 13, Flash Session: Data Infrastructures for Population Research