Analyzing Biases in Genealogies Using Demographic Microsimulation

Liliana P. Calderón-Bernal, Max Planck Institute for Demographic Research
Diego Alburez-Gutierrez, Max Planck Institute for Demographic Research
Emilio Zagheni, Max Planck Institute for Demographic Research

Genealogies are promising sources for addressing many questions in historical and kinship demography. So far, an incomplete understanding of the biases that affect their representativeness has hindered their full exploitation. Here, we report on a series of experiments on synthetic populations aimed at understanding how different sources of bias in ascendant genealogies can affect the accuracy of demographic estimates. We use the SOCSIM demographic microsimulation program and data for Sweden from the Human Fertility Collection (1751-1890), the Human Fertility Database (1891-2022), and the Human Mortality Database (1751-2022). We analyze three sources of bias: selection in direct lineages, incomplete reconstruction of family trees, and missing information on some subpopulations. We evaluate their effect by comparing common demographic measures estimated from ‘fully-recorded’ and ‘bias-infused’ synthetic populations. Our results show that including only direct lineages leads to an underestimation of the Total Fertility Rate (TFR) before the onset of fertility decline (ca. -39%, i.e. about 0.61 times the fully-recorded value) and an overestimation of life expectancy at birth (e0) over the first two centuries (ca. +42.2%). However, after adding selected collateral kin, the accuracy of the estimates improves: TFR is underestimated by only 0.11% during the first century and e0 is overestimated by only 1.5% over the whole period.
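The abstract reports the TFR bias both as a signed percentage and as a ratio. The two are the same comparison between a measure estimated from the bias-infused population and the same measure from the fully-recorded one. A minimal sketch of that arithmetic follows; the function names and the TFR values are hypothetical and chosen only so the numbers match the order of magnitude quoted above, not taken from the paper's data or code.

```python
def relative_bias_pct(biased: float, reference: float) -> float:
    """Signed difference of a biased estimate from the reference, in percent."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return 100.0 * (biased - reference) / reference


def bias_ratio(biased: float, reference: float) -> float:
    """Biased estimate expressed as a multiple of the reference."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return biased / reference


# Hypothetical illustrative values (NOT results from the paper):
tfr_full = 4.00    # TFR from the fully-recorded synthetic population
tfr_direct = 2.44  # TFR from a genealogy restricted to direct lineages

pct = relative_bias_pct(tfr_direct, tfr_full)  # roughly -39 (underestimation)
ratio = bias_ratio(tfr_direct, tfr_full)       # roughly 0.61

print(f"relative bias: {pct:.1f}%  ratio: {ratio:.2f}")
```

A negative percentage indicates underestimation and a positive one overestimation, so the same function would apply to the life expectancy comparison.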


Presented in Session 8: Harnessing the Power of Genealogical Data: Opportunities and Challenges