A Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation (2024)

Chapter: Appendix D: Technical Details for Geography Variables


Suggested Citation: "Appendix D: Technical Details for Geography Variables." National Academies of Sciences, Engineering, and Medicine. 2024. A Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation. Washington, DC: The National Academies Press. doi: 10.17226/27169.

Appendix D

Technical Details for Geography Variables

Hierarchical modeling is often conveniently thought of as having three stages: a data model (e.g., a likelihood), a process model, and a parameter model. Using the notation [Y|X] to denote the conditional distribution of Y given X, one may consider the following distributions: [data|process, parameters], [process|parameters], and [parameters]. Then, using Bayes’ rule, the posterior distribution of the process and parameters given the data can be expressed as [process, parameters|data] ∝ [data|process, parameters] × [process|parameters] × [parameters].

The well-known Fay-Herriot (FH) model (Fay & Herriot, 1979) can be expressed as

Z = Y + ε,
Y = Xβ + ξ,

where ε ~ N(0, D), D = diag( $σ_{1}^{2}$ , $σ_{2}^{2}$ , … , $σ_{N}^{2}$ ), ξ ~ N(0, $σ_{ξ}^{2}$ I), Z is the vector of direct estimators, and X is the matrix of known covariates with associated parameters β_i (i = 0, 1, … , p). In this context, $σ_{i}^{2}$ is the sampling error variance for area i (i = 1, … , N) and p is the number of covariates (β₀ is the intercept). Thus, the model can be expressed hierarchically, with [Z|Y, β, $σ_{ξ}^{2}$ ] the distribution of the data given the process and parameters, [Y|β, $σ_{ξ}^{2}$ ] the distribution of the process given the parameters, and [β, $σ_{ξ}^{2}$ ] the distribution of the parameters.
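To make the shrinkage behavior of the FH model concrete, the following is a minimal Python sketch, not the method used in the report. It assumes the model variance $σ_{ξ}^{2}$ is known (in practice it would be estimated or given a prior), estimates β by weighted least squares, and combines each direct estimate with its regression prediction. The function name `fh_shrinkage` and all simulated values are hypothetical and chosen only for illustration.

```python
import numpy as np

def fh_shrinkage(z, X, samp_var, sigma2_xi):
    """Shrinkage estimate of Y in the FH model with sigma2_xi treated as known.

    z         : (N,) direct estimates
    X         : (N, p+1) covariates, first column of ones for the intercept
    samp_var  : (N,) known sampling variances sigma_i^2 (the diagonal of D)
    sigma2_xi : scalar model variance (assumed known here for illustration)
    """
    z = np.asarray(z, dtype=float)
    X = np.asarray(X, dtype=float)
    samp_var = np.asarray(samp_var, dtype=float)

    # Marginally, Z_i ~ N(x_i' beta, sigma2_xi + sigma_i^2), so beta is
    # estimated by weighted least squares with weights 1 / (sigma2_xi + sigma_i^2).
    w = 1.0 / (sigma2_xi + samp_var)
    beta_hat = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))

    # Shrinkage factor: share of the total variance attributed to the process model.
    gamma = sigma2_xi / (sigma2_xi + samp_var)

    # Convex combination of the direct estimate and the regression prediction.
    y_hat = gamma * z + (1.0 - gamma) * (X @ beta_hat)
    return y_hat, beta_hat, gamma

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([1.0, 2.0])
sigma2_xi = 0.5
samp_var = rng.uniform(0.2, 2.0, size=N)
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2_xi), size=N)  # process model
z = y + rng.normal(scale=np.sqrt(samp_var))                       # data model
y_hat, beta_hat, gamma = fh_shrinkage(z, X, samp_var, sigma2_xi)
```

Areas with large sampling variance receive a small `gamma` and are pulled toward the regression prediction, while areas with precise direct estimates are left nearly unchanged.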

The multivariate spatio-temporal mixed effects model (MSTM; Bradley et al., 2015) is described next. For ease of exposition, this discussion presents the multivariate spatial case. Details surrounding the spatio-temporal case can be found in Bradley et al. (2015). Similar to the FH model, one has

Z ~ MVN(Y, D)
Y = Xβ + Sη + ξ.

Here, MVN denotes the multivariate normal distribution, D contains the known sampling error variances, S is a matrix of spatial basis functions with associated coefficients given by the elements of η (see Bradley et al., 2015, for a comprehensive discussion), and ξ ~ MVN(0, $σ_{ξ}^{2}$ I) represents an additional error term to capture fine-scale variation. Importantly, the FH model can be viewed as a special case of the MSTM: in the univariate setting with the basis-function term Sη omitted, the process model reduces to Y = Xβ + ξ.
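The structure of the spatial model can be sketched in a few lines of Python. This is an illustration under several assumptions that are not taken from the source: a single response variable, Gaussian radial basis functions centered at hypothetical knots, a coefficient prior η ~ MVN(0, K) with an arbitrary K, and mutual independence of η, ξ, and the sampling errors. Under those assumptions the marginal covariance of Z is SKS′ + $σ_{ξ}^{2}$ I + D, and passing a basis matrix with zero columns gives the FH covariance.

```python
import numpy as np

def gaussian_basis(coords, knots, bandwidth):
    """Hypothetical basis: S[i, k] = exp(-||coords_i - knots_k||^2 / (2 * bandwidth^2))."""
    d2 = ((coords[:, None, :] - knots[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def simulate_mstm(X, beta, S, K, sigma2_xi, samp_var, rng):
    """Draw (Z, Y) from Z ~ MVN(Y, D), Y = X beta + S eta + xi, eta ~ MVN(0, K)."""
    n, r = S.shape
    eta = rng.multivariate_normal(np.zeros(r), K) if r > 0 else np.zeros(0)
    xi = rng.normal(scale=np.sqrt(sigma2_xi), size=n)   # fine-scale variation
    y = X @ beta + S @ eta + xi                         # process model
    z = y + rng.normal(scale=np.sqrt(samp_var))         # data model, D = diag(samp_var)
    return z, y

def marginal_cov(S, K, sigma2_xi, samp_var):
    """Cov(Z) after integrating out eta and xi: S K S' + sigma2_xi * I + D."""
    n = S.shape[0]
    return S @ K @ S.T + sigma2_xi * np.eye(n) + np.diag(samp_var)

# Illustrative setup (all values are assumptions).
rng = np.random.default_rng(1)
n, r = 30, 4
coords = rng.uniform(0.0, 1.0, size=(n, 2))
knots = np.array([[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]])
S = gaussian_basis(coords, knots, bandwidth=0.3)
K = 0.7 * np.eye(r)
X = np.column_stack([np.ones(n), coords[:, 0]])
beta = np.array([1.0, -0.5])
samp_var = rng.uniform(0.1, 1.0, size=n)
sigma2_xi = 0.2

z, y = simulate_mstm(X, beta, S, K, sigma2_xi, samp_var, rng)
cov_mstm = marginal_cov(S, K, sigma2_xi, samp_var)

# With no basis functions the covariance is diagonal, as in the FH model.
cov_fh = marginal_cov(np.zeros((n, 0)), np.zeros((0, 0)), sigma2_xi, samp_var)
```

The difference `cov_mstm - cov_fh` is the spatially structured part SKS′, which is what allows areas to borrow strength from one another beyond the shared regression.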

For the unit-level case, one first considers a model for an ignorable design (Battese et al., 1988). Specifically, consider the linear mixed-effects model

y_ij = x_ij β + v_i + ε_ij,

where y_ij is the response for unit j in area i (i = 1, … , m), x_ij is the vector of fixed covariates associated with unit j in area i, β is the vector of associated regression coefficients, and v_i is the area-level random effect for area i, with the v_i iid normal with mean zero and variance $σ_{v}^{2}$ . Finally, the ε_ij are sampling error random effects, iid normal with mean zero and variance $σ_{ε}^{2}$ . Importantly, this model can be rewritten in the form of a hierarchical model.
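One way to write the three stages for this unit-level model is sketched below in LaTeX. The source does not specify a prior for the parameters, so the parameter model is left generic, and independence of the area effects and the unit-level errors is assumed.

```latex
\begin{align*}
\text{Data model:} \quad
  & y_{ij} \mid v_i, \beta, \sigma_{\epsilon}^{2} \sim
    N\!\left(x_{ij}\beta + v_i,\ \sigma_{\epsilon}^{2}\right), \\
\text{Process model:} \quad
  & v_i \mid \sigma_{v}^{2} \sim N\!\left(0,\ \sigma_{v}^{2}\right), \\
\text{Parameter model:} \quad
  & [\beta, \sigma_{v}^{2}, \sigma_{\epsilon}^{2}].
\end{align*}
```

Under these assumptions, integrating out the area effect gives Var(y_ij) = $σ_{v}^{2}$ + $σ_{ε}^{2}$ and Cov(y_ij, y_ij′) = $σ_{v}^{2}$ for two distinct units j and j′ in the same area, so units within an area are positively correlated.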

In the case of an informative sample design, one path forward proceeds through the Bayesian pseudo-likelihood (PL; Savitsky & Toth, 2016). The PL is given by

$PL(θ) = \prod_{i} f(y_{i} | θ)^{w_{i}}$ ,

where unit i ranges over the sample and w_i denotes the sample weight for unit i, scaled to sum to the sample size. Combined with a suitable prior distribution on the model parameters θ, this leads to a pseudo-posterior distribution.
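The pseudo-posterior can be illustrated with a deliberately simple case. The Python sketch below assumes a normal model f(y_i | θ) = N(θ, σ²) with σ² known and a conjugate normal prior on θ; these choices, the function names, and the toy data are assumptions made for illustration and are not taken from Savitsky and Toth (2016). Raising each density to the power w_i multiplies its contribution to the log-likelihood by w_i, so the usual conjugate update applies with weighted sums.

```python
import numpy as np

def scale_weights(raw_w):
    """Rescale sampling weights so that they sum to the sample size n."""
    raw_w = np.asarray(raw_w, dtype=float)
    return raw_w * (raw_w.size / raw_w.sum())

def log_pseudo_likelihood(theta, y, w, sigma2):
    """log PL(theta) = sum_i w_i * log f(y_i | theta), with f = N(theta, sigma2)."""
    y = np.asarray(y, dtype=float)
    log_f = -0.5 * np.log(2.0 * np.pi * sigma2) - 0.5 * (y - theta) ** 2 / sigma2
    return float(np.sum(w * log_f))

def pseudo_posterior_normal_mean(y, w, sigma2, prior_mean, prior_var):
    """Pseudo-posterior of theta under the prior theta ~ N(prior_mean, prior_var).

    The weights enter as effective counts: the data precision is sum(w) / sigma2,
    which equals n / sigma2 once the weights are scaled to sum to n.
    """
    y = np.asarray(y, dtype=float)
    precision = 1.0 / prior_var + w.sum() / sigma2
    mean = (prior_mean / prior_var + np.sum(w * y) / sigma2) / precision
    return mean, 1.0 / precision

# Toy data (illustrative): larger responses carry larger weights.
y = np.array([1.0, 2.0, 4.0, 7.0])
raw_w = np.array([10.0, 10.0, 30.0, 50.0])
w = scale_weights(raw_w)

post_mean, post_var = pseudo_posterior_normal_mean(
    y, w, sigma2=1.0, prior_mean=0.0, prior_var=1e6
)
weighted_mean = np.sum(w * y) / w.sum()
```

With a nearly flat prior, the pseudo-posterior mean is essentially the weighted sample mean rather than the unweighted one, which is how the weights correct for the informative design. Scaling the weights to sum to the sample size keeps the pseudo-posterior variance on the scale implied by n observations rather than by the population size.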