The rstap
package introduced Spatial Temporal Aggregated Predictor or STAP models, as a method by which the effect of Built Environment Features (BEFs) measured as a point pattern could be incorporated into pre-existing regression methods. The rsstap
package builds on this framework by modifying the way in which the spatial temporal exposure effect is estimated. We’ll begin this demonstration by loading the relevant libraries, before getting into the modelling notation and code examples.
Restricting our attention to only spatial exposure for the following illustrative example, the spatial aggregated predictor both rstap
and rsstap
fit models of the following form:
\[ E[g(\mu_i)] = Z_i^{T}\delta + f(\mathcal{D}_i) \quad i=1,...,n \]
While link function \(g(\cdot)\), univariate mean \(\mu_i\) and covariates \(Z_i\) for subject \(i=1,...,n\) are identified from a typical regression model, the function \(f(\mathcal{D}_i)\) novelly represents the \(i\)th subject’s cumulative exposure across space. In rstap
This was accomplished by incorporating distances, \(d\) through a spatial exposure function \(f(\mathcal{D}_i) = \sum_{d \in \mathcal{D}_i} \mathcal{K}_s(d,\theta)\) - typically some exponential decay function, like \(\exp(-\frac{d}{\theta})\) or \(\exp(-(\frac{d}{\theta})^{\eta})\). Unfortunately, estimating this model through standard methods (e.g. MCMC) requires numerous evaluations of the kernel function summed across the, again typically numerous, distances in the sets \(\mathcal{D}_i\) , of subject-BEF distances.
To remedy this situation, the rsstap
package adopts a regression spline basis representation, \(f(d) = \sum_{j=0}^{J} \beta_j\phi_j(d)\), of the spatial exposure effect, removing the kernel and allowing for estimation on a sum of the function of the distances:
\[ E[g(\mu_i)] = Z_i^{T}\delta + \sum_{j=0}^{J}\sum_{d \in \mathcal{D}_i}\beta_j\phi_j(d) \]
Since the sum across distances, \(\sum_{d\in \mathcal{D}_i} \phi_j(d)\) can be done prior to model fitting, standard methods (essentially a bayesian equivalent to mgcv::gam
) can be used, dramatically decreasing computational complexity. Specifically, we use the No U-Turn sampler variant available via rstan
for full bayesian inference of the posterior distribution of these functional terms. Finally, information about priors for rsstap
models can be found here.
We’ll illustrate this using a simple dataset that contains outcome BMI
as a function of a binary covariate sex
and simulated fast food restaurant (FFR) exposure. These data are contained in the benvo
package.
As can be seen below, the model formula and syntax are akin to the typical lm
function, with the addition of a sap(FFR)
term to represent the fact that we’ll be modelling FFR exposure as a spatial aggregated predictor. Let’s fit this model using the standard sampling defaults of 2000 samples drawn from 4 independent chains, using the first 1000 from each as burn-in.
fit <- sstap_lm(BMI ~ sex + sap(FFR), example_benvo)
We’ll examine the model fit by first looking at a print out summary of the model followed by the posterior predictive checks - a basic model check tool.
fit #> #> family: gaussian [identity] #> formula: BMI ~ sex + sap(FFR) #> observations: 1000 #> fixed effects: 12 #> ------ #> Median MAD #> (Intercept) 25.9 0.1 #> sex -2.1 0.1 #> s(FFR).1 3.1 0.9 #> s(FFR).2 2.9 0.1 #> s(FFR).3 3.1 0.1 #> s(FFR).4 2.9 0.1 #> s(FFR).5 2.7 0.1 #> s(FFR).6 1.4 0.1 #> s(FFR).7 0.1 0.1 #> s(FFR).8 0.0 0.1 #> s(FFR).9 -0.1 0.1 #> s(FFR).10 0.7 0.9 #> #> Smoothing terms: #> Median MAD_SD #> smooth_precision[s(FFR)1] 4.0 1.8 #> smooth_precision[s(FFR)2] 0.1 0.1
ppc(fit)
Next we’ll take a look at the spatial effect of one single FFR on the expected BMI of subjects as a function of space. We’ll overlay the true function (since this is simulated data) in red on top.
plot(fit) + truth
While the curves won’t always be so well defined, this example illustrates the basics of fitting stap models using the spline implimentation via rsstap. The next section will go through the basics of spatial-temporal estimation without adjusting for within subject correlation.
In this quick example, we’ll briefly run through the same function calls as before, now using a longitudinal simulated dataset that contains both spatial and temporal data. Note that while a longitudinal model should take into account correlation across a given subject’s measurements, we’ll stick to the simpler model now for exposition (see the longitudinal vignette for how to account for these terms). Let’s first look at the longitudinal_HFS
data that comes with the rsstap package
, renamed here as bdf2
for brevity.
bdf2 #> Subject Data: #> ---------------------------: #> Observations: 596 #> Columns: 6 #> Num Subjects: 300 #> #> BEF Data: #> ---------------------------: #> Number of Features: 1 #> Features: #> Name Measures #> 1 HFS Distance-Time
Looks like this data contains 300 subjects with both distances and times measured for the subjects. In order to fit the same kind of model as before, but now taking into account both space and time, we’ll use the stap
designation in the model formula. This is equivalent to using the t2
spline basis expansion from mgcv
.
fit <- sstap_lm(BMI ~ sex + stap(HFS), benvo = bdf2) #> Warning in check_for_longitudinal_benvo(benvo): This Benvo was constructed with longitudinal data #> but sstap_lm does not adjust for within subject correlation. #> Be advised that parameter standard errors may be overoptimistic.
We’ll examine the posterior predictive checks first to verify that indeed the model can recover the true simulated parameters.
ppc(fit)
Then, onto the spatio-temporal effects. Calling the plot method on this fitted object will produce a contour plot. rsstap
also provides a plot_xsection
which plots the cross section of the surface at a fixed value and a plot3D
function via the rplotly
package that can plot the surface in 3 dimensions.
plot(fit)
plot_xsection(fit,fixed_val=5)
plot3D(fit)
With both spatial and spatio-temporal model-fitting now demonstrated, it is our hope that you have a greater familiarity of some of the basics of rsstap
and the STAP modeling framework.
This concludes the introductory vignette. With built environment - subject data, the rsstap
package can be used to estimate how nearby built environment features may effect the outcome of interest via the STAP modeling framework.
See the other vignettes for elaborations on the spline implementation of spatial temporal aggregated predictors in a longitudinal context, wherein subject specific and between/within effects are estimated as well as so-called “network effects” of BEFs .