A Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation (2024)

Chapter: Appendix B: Inferences Based on Multiple Synthetic Data

Suggested Citation: "Appendix B: Inferences Based on Multiple Synthetic Data." National Academies of Sciences, Engineering, and Medicine. 2024. A Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation. Washington, DC: The National Academies Press. doi: 10.17226/27169.

Appendix B

Inferences Based on Multiple Synthetic Data

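Denote the number of synthetic datasets by $m$, the estimate of the parameter of interest $\theta$ from the $j$-th synthetic dataset by $\hat{\theta}^{(j)}$, and the within-set variability of $\hat{\theta}^{(j)}$ by $w^{(j)}$, for $j = 1, \ldots, m$ (Liu, 2022). Let $W = m^{-1} \sum_{j=1}^{m} w^{(j)}$ (average within-set variability) and $B = \sum_{j=1}^{m} (\hat{\theta}^{(j)} - \hat{\theta})^2 / (m - 1)$ (between-set variability), where $\hat{\theta} = m^{-1} \sum_{j=1}^{m} \hat{\theta}^{(j)}$ is the average of the $m$ estimates.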
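As an illustration of these definitions, the following minimal Python sketch computes the averaged estimate, $W$, and $B$ from per-dataset results. The function name `summarize` and the input numbers are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the report.

```python
from statistics import fmean, variance


def summarize(estimates, within_variances):
    """Return (theta_hat, W, B) for m synthetic datasets.

    estimates[j]        : point estimate from the j-th synthetic dataset
    within_variances[j] : its within-set variance estimate w^(j)
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")
    theta_hat = fmean(estimates)      # average of the m estimates
    W = fmean(within_variances)       # average within-set variability
    B = variance(estimates)           # divisor m - 1: between-set variability
    return theta_hat, W, B


# Hypothetical results from m = 5 synthetic datasets.
theta_hat, W, B = summarize(
    [10.0, 12.0, 11.0, 13.0, 9.0],
    [0.5, 0.6, 0.4, 0.5, 0.5],
)
print(theta_hat, W, B)
```

Here `statistics.variance` is used for $B$ because it is the sample variance with divisor $m - 1$, which matches the definition above.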

Fully synthetic data: The final estimate of $\theta$ over the $m$ synthetic sets is $\hat{\theta} = m^{-1} \sum_{j=1}^{m} \hat{\theta}^{(j)}$, and its estimated variance is $T = (1 + m^{-1})B - W$. Hypothesis testing and confidence interval construction are based on the asymptotic assumption that $T^{-1/2}(\hat{\theta} - \theta) \sim N(0, 1)$.
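The fully synthetic combining rule can be sketched in Python as follows, assuming the variance estimator $T = (1 + m^{-1})B - W$ and the normal reference distribution. The function name, the inputs, and the decision to raise an error when $T$ is not positive are assumptions made for this illustration, not part of the report.

```python
import math
from statistics import NormalDist


def fully_synthetic_inference(theta_hat, W, B, m, level=0.95):
    """Return (T, (lower, upper)) for fully synthetic data.

    theta_hat : average of the m per-dataset estimates
    W, B      : average within-set and between-set variability
    """
    T = (1.0 + 1.0 / m) * B - W
    if T <= 0:
        # Because W is subtracted, T can be non-positive in a given sample.
        # This sketch simply refuses to build an interval in that case.
        raise ValueError("variance estimate T is not positive")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # standard normal quantile
    half_width = z * math.sqrt(T)
    return T, (theta_hat - half_width, theta_hat + half_width)


# Hypothetical inputs: m = 5, theta_hat = 11.0, W = 0.5, B = 2.5.
T, (lo, hi) = fully_synthetic_inference(11.0, 0.5, 2.5, 5)
print(T, lo, hi)
```

With these hypothetical inputs, $T = 1.2 \times 2.5 - 0.5 = 2.5$, and the interval is centered on the averaged estimate.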

Partially synthetic data with or without differential privacy (DP): The final estimate of $\theta$ over the $m$ synthetic sets is $\hat{\theta} = m^{-1} \sum_{j=1}^{m} \hat{\theta}^{(j)}$, and the variance estimator is $T = W + m^{-1}B$. Hypothesis testing and confidence interval construction are based on the asymptotic assumption that $T^{-1/2}(\hat{\theta} - \theta) \sim t_v(0, 1)$, where the degrees of freedom are $v = (m - 1)\left(1 + \frac{W}{m^{-1}B}\right)^{2}$. Although the inferential approach based on multiple synthetic datasets is the same with or without DP, the between-set variance component $B$ captures different things in the two cases. For DP data synthesis, $B$ also includes the extra variability introduced by the randomized mechanisms used to achieve the DP guarantees, which is absent in the case without DP.
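The partially synthetic combining rule can be sketched in the same way, assuming the variance estimator $T = W + m^{-1}B$ and the degrees-of-freedom form $v = (m - 1)(1 + W/(m^{-1}B))^{2}$. The function name and inputs are hypothetical, and returning infinite degrees of freedom when $B = 0$ is a convention adopted only for this sketch.

```python
import math


def partially_synthetic_inference(W, B, m):
    """Return (T, v) for partially synthetic data, with or without DP.

    W, B : average within-set and between-set variability
    m    : number of synthetic datasets
    """
    if m < 2:
        raise ValueError("need at least two synthetic datasets")
    between = B / m                    # the m^{-1} B term
    T = W + between                    # variance estimator
    if between == 0:
        # No between-set variation: the t reference reduces to a normal one.
        return T, math.inf
    v = (m - 1) * (1.0 + W / between) ** 2
    return T, v


# Hypothetical inputs: m = 5, W = 0.5, B = 2.5.
T, v = partially_synthetic_inference(0.5, 2.5, 5)
print(T, v)
```

With these inputs, $m^{-1}B = 0.5$, so $T = 1.0$ and $v = 4 \times (1 + 1)^2 = 16$. A confidence interval would then use the $t$ quantile with $v$ degrees of freedom in place of the normal quantile; the Python standard library does not provide that quantile, so it is left out of the sketch.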
