Hierarchical modeling is conveniently described in three stages: a data model (e.g., a likelihood), a process model, and a parameter model. Using the notation [Y|X] to denote the conditional distribution of Y given X, the three stages correspond to the distributions [data|process, parameters], [process|parameters], and [parameters]. Then, by Bayes’ rule, the posterior distribution of the process and parameters given the data can be expressed as [process, parameters|data] ∝ [data|process, parameters] × [process|parameters] × [parameters].
The well-known Fay-Herriot (FH) model (Fay & Herriot, 1979) can be expressed as
Z = Y + ε
Y = Xβ + ξ,
where ε ~ N(0, D), D = diag(σ_1^2, σ_2^2, … , σ_N^2), ξ ~ N(0, σ_ξ^2 I), Z is the vector of direct estimators, and X is the matrix of known covariates with associated parameters β_k (k = 0, 1, … , p). In this context, σ_i^2 is the sampling error variance for area i (i = 1, … , N) and p is the number of covariates (β_0 is the intercept). Thus, the model can be expressed hierarchically, with [Z|Y, β, σ_ξ^2] the distribution of the data given the process and parameters, [Y|β, σ_ξ^2] the distribution of the process given the parameters, and [β, σ_ξ^2] the distribution of the parameters.
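The data and process stages of the FH model combine in closed form when the parameters are held fixed. The following is a minimal sketch, assuming β and the variance of ξ are known: conditional on them, each Y_i is normal with a mean that shrinks the direct estimate toward the regression prediction. The names `fh_conditional_posterior`, `x_beta`, `d`, and `sigma2_xi` are illustrative and do not come from the source; `d` holds the sampling error variances (the diagonal of D) and `sigma2_xi` is the variance of ξ.

```python
def fh_conditional_posterior(z, x_beta, d, sigma2_xi):
    """Mean and variance of [Y_i | Z_i, beta, sigma2_xi] for each area.

    z         : direct estimates, one per area
    x_beta    : regression predictions (row i of X times beta), one per area
    d         : known sampling error variances (diagonal of D)
    sigma2_xi : variance of the process-level error xi (assumed known here)
    """
    means, variances = [], []
    for z_i, m_i, d_i in zip(z, x_beta, d):
        # Shrinkage weight: close to 1 when the direct estimate is precise.
        gamma = sigma2_xi / (sigma2_xi + d_i)
        means.append(gamma * z_i + (1.0 - gamma) * m_i)
        variances.append(gamma * d_i)
    return means, variances


# Hypothetical inputs for three areas (not data from the source).
z = [10.0, 12.0, 8.0]
x_beta = [9.0, 11.0, 9.0]
d = [1.0, 4.0, 0.25]
means, variances = fh_conditional_posterior(z, x_beta, d, sigma2_xi=1.0)
```

Areas with a large sampling error variance are pulled more strongly toward the regression prediction. In a full Bayesian analysis the parameters would also be given a prior and sampled rather than fixed.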
The multivariate spatio-temporal mixed effects model (MSTM) of Bradley et al. (2015) is described next. For ease of exposition, this discussion presents the multivariate spatial case; details of the spatio-temporal case can be found in Bradley et al. (2015). Similar to the FH model, one has
Z ~ MVN(Y, D)
Y = Xβ + Sη + ξ.
Here, MVN denotes the multivariate normal distribution, D is the diagonal matrix of known sampling error variances, S is a matrix of spatial basis functions with associated coefficients given by the elements of η (see Bradley et al., 2015, for a comprehensive discussion), and ξ ~ MVN(0, σ_ξ^2 I), with variance parameter σ_ξ^2, is an additional error term that captures fine-scale variation. Importantly, the FH model can be viewed as a special case of the MSTM in which the basis-function term Sη is omitted.
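The structure of the MSTM mean can be made concrete with a small simulation. This is a sketch under stated assumptions: the Gaussian radial basis functions, one-dimensional locations, knots, and all numerical values are hypothetical stand-ins chosen for illustration, not the basis functions or settings of Bradley et al. (2015). The helper names (`gaussian_basis`, `matvec`, `simulate_mstm`) are likewise illustrative.

```python
import math
import random


def gaussian_basis(locations, knots, bandwidth):
    """N x r matrix S of Gaussian radial basis functions (illustrative choice)."""
    return [[math.exp(-0.5 * ((s - k) / bandwidth) ** 2) for k in knots]
            for s in locations]


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def simulate_mstm(X, beta, S, eta, sigma2_xi, d, rng):
    """Draw (Y, Z) from Y = X beta + S eta + xi and Z ~ MVN(Y, D), D = diag(d)."""
    mean = [a + b for a, b in zip(matvec(X, beta), matvec(S, eta))]
    Y = [m + rng.gauss(0.0, math.sqrt(sigma2_xi)) for m in mean]
    Z = [y + rng.gauss(0.0, math.sqrt(d_i)) for y, d_i in zip(Y, d)]
    return Y, Z


# Hypothetical setup: five areas on a line, two basis functions.
locations = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, s] for s in locations]           # intercept plus one covariate
S = gaussian_basis(locations, knots=[1.0, 3.0], bandwidth=1.0)
beta = [2.0, 0.5]
eta = [1.0, -1.0]
d = [0.5, 0.5, 1.0, 1.0, 2.0]

rng = random.Random(0)
Y, Z = simulate_mstm(X, beta, S, eta, sigma2_xi=0.25, d=d, rng=rng)

# Setting eta to zero removes the basis-function term and gives an FH-type draw.
Y_fh, Z_fh = simulate_mstm(X, beta, S, [0.0, 0.0], 0.25, d, random.Random(0))
```

Setting the coefficients η to zero leaves Y = Xβ + ξ, which is how the FH model sits inside this specification.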
For the unit-level case, one first considers a model for an ignorable design (Battese et al., 1988). Specifically, consider the linear mixed-effects model
y_ij = x_ij β + v_i + ε_ij,
where y_ij is the response for unit j in area i (i = 1, … , m), x_ij is the vector of fixed covariates associated with unit j in area i, β is the vector of associated regression coefficients, and the v_i are area-level random effects, iid normal with mean zero and variance σ_v^2. Finally, the ε_ij are sampling errors, iid normal with mean zero and variance σ_ε^2. Importantly, this model can be rewritten in the form of a hierarchical model.
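As with the area-level model, the hierarchical form of the unit-level model gives a closed-form conditional distribution for each area effect when the other parameters are held fixed. The sketch below assumes β and both variances are known; the function name and the argument names `sigma2_v` (variance of the area effects) and `sigma2_e` (variance of the unit-level errors) are illustrative and not taken from the source.

```python
def area_effect_posterior(y, x, beta, sigma2_v, sigma2_e):
    """Mean and variance of [v_i | y_i, beta, sigma2_v, sigma2_e] for one area.

    y : responses for the sampled units in the area
    x : covariate vectors, one per unit (same order as y)
    """
    n = len(y)
    # Residuals after removing the fixed-effects part x_ij * beta.
    resid = [y_j - sum(b * x_jk for b, x_jk in zip(beta, x_j))
             for y_j, x_j in zip(y, x)]
    mean_resid = sum(resid) / n
    # Shrinkage weight: grows toward 1 as the area sample size increases.
    gamma = sigma2_v / (sigma2_v + sigma2_e / n)
    return gamma * mean_resid, gamma * sigma2_e / n


# Hypothetical area with four sampled units (not data from the source).
y = [5.0, 7.0, 6.0, 8.0]
x = [[1.0, 1.0], [1.0, 2.0], [1.0, 1.0], [1.0, 2.0]]
post_mean, post_var = area_effect_posterior(
    y, x, beta=[3.0, 1.0], sigma2_v=1.0, sigma2_e=4.0
)
```

The mean residual in the area is shrunk toward zero, with less shrinkage for areas that have more sampled units.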
In the case of an informative sample design, one path forward proceeds through the Bayesian pseudo-likelihood (PL; Savitsky & Toth, 2016). The PL is given by
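∏_{i=1}^{n} f(y_i | θ)^{w_i},
where unit i ranges over the n sampled units, f(y_i | θ) is the model density of the response y_i given the model parameters θ, and w_i denotes the sample weight for unit i, scaled so that the weights sum to the sample size. Combined with a suitable prior distribution on θ, this leads to a pseudo-posterior distribution.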
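To illustrate how the weights enter, the following sketch evaluates a log pseudo-likelihood in which each unit's log density is multiplied by its scaled weight. The normal model with known variance, the function names, and the numbers are assumptions made for illustration only; they are not the model or data of Savitsky and Toth (2016).

```python
import math


def scale_weights(raw_weights):
    """Rescale sample weights so that they sum to the sample size n."""
    n = len(raw_weights)
    total = sum(raw_weights)
    return [n * w / total for w in raw_weights]


def log_pseudo_likelihood(y, weights, mu, sigma2):
    """Sum over sampled units of w_i * log N(y_i | mu, sigma2)."""
    return sum(
        w * (-0.5 * math.log(2.0 * math.pi * sigma2)
             - 0.5 * (y_i - mu) ** 2 / sigma2)
        for y_i, w in zip(y, weights)
    )


def weighted_mean(y, weights):
    """Maximizer of the log pseudo-likelihood in mu for this normal model."""
    return sum(w * y_i for y_i, w in zip(y, weights)) / sum(weights)


# Hypothetical sample of three units with unequal sampling weights.
y = [1.0, 2.0, 6.0]
w = scale_weights([10.0, 30.0, 60.0])
mu_hat = weighted_mean(y, w)
log_pl = log_pseudo_likelihood(y, w, mu_hat, sigma2=1.0)
```

With equal weights the scaled weights are all one and the expression reduces to the ordinary log-likelihood. Adding the log prior density of θ to this quantity gives the log pseudo-posterior up to an additive constant.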