Motivation

In the rsstap Introductory vignette, we introduced the STAP formulation of models implemented using splines and demonstrated how the rsstap package uses rstan and mgcv to fit these models in a context where observations are modeled as independent of one another. In this vignette We’ll introduce the longitudinal modeling framework in rsstap that incorporates the familiar lme4 model formula syntax in addition to the between-within decomposition parameterization and syntax used to estimate between-within subject effect estimates of built environment exposure.

Simple Model

We’ll start by loading the relevant libraries and examining the included longitudinal dataset which consists of measurements on subjects “BMI” (outcome) as a function of simulated nearby “healthy food stores” (HFS) over time, similar to the longitudinal data used in the Introductory vignette:

library(rbenvo)
library(rsstap)
bdf
#> Subject Data:
#> ---------------------------:
#> Observations:  1171
#> Columns:  10
#> Num Subjects:  600
#> 
#> BEF Data:
#> ---------------------------:
#> Number of Features:  1
#> Features: 
#>   Name      Measures
#> 1  HFS Distance-Time
bef_summary(bdf)
#> # A tibble: 12 x 8
#> # Groups:   BEF, Distance_Time [2]
#>    BEF   Distance_Time Measurement  Lower Median Upper Number Num_NA
#>    <chr> <chr>               <int>  <dbl>  <dbl> <dbl>  <int>  <int>
#>  1 HFS   Distance                1 0.0261  0.487 0.974   5741      0
#>  2 HFS   Distance                2 0.0245  0.504 0.971   3524      0
#>  3 HFS   Distance                3 0.0284  0.514 0.974   1436      0
#>  4 HFS   Distance                4 0.0216  0.546 0.982    411      0
#>  5 HFS   Distance                5 0.0205  0.535 0.979     57      0
#>  6 HFS   Distance                6 0.213   0.432 0.967      9      0
#>  7 HFS   Time                    1 0.113   2.45  4.86    5741      0
#>  8 HFS   Time                    2 0.167   2.60  4.88    3524      0
#>  9 HFS   Time                    3 0.184   2.57  4.85    1436      0
#> 10 HFS   Time                    4 0.107   2.49  4.80     411      0
#> 11 HFS   Time                    5 0.148   2.56  4.89      57      0
#> 12 HFS   Time                    6 0.709   1.22  4.48       9      0

We’ll fit this model using the included sex and year covariate information, modeling the spatial temporal exposure to HFS’ through a tensor spline basis function expansion and two subject specific terms - accounting for within subject outcome correlation:

\[ E[BMI_{ij}|b_{i1},b_{i2}] = \alpha + I(Female_i)\delta_1 + (\text{year}_{ij})\delta_2 + f(\text{HFS Exposure}_{ij}) + b_{i1} + (\text{year}_{ij})b_{i2} \\\quad i = 1,...,N ; j= 1,...,n_i, \] Where \[ (b_{i1},b_{i2}) = \mathbf{b}_i \sim N(0,\Sigma), \]

akin to the standard mixed effects regression formulation as in the lme4 package and

\[ f(\text{HFS Exposure}_{ij}) = \sum_{l=1}^{L}\sum_{(d,t) \in\mathcal{S}_{ij}} \beta_l\phi_l(d,t), \]

with \(\phi_l(\cdot)\) the basis function evaluation of all relevant subject-HFS distances and times at measurement \(j\).

In order to fit this model in rsstap we can use the code below, setting the optional sampler arguments for rstan similar to how was demonstrated in the Introductory vignette.

fit <- sstap_lmer(BMI ~ sex + year + stap(HFS) + (year|ID), benvo = bdf)

Examining the model output, we’ll first print the model quick summary followed by a graph of the posterior predictive checks to check model fit.

fit
#> 
#>  family:       gaussian [identity]
#>  formula:      BMI ~ sex + year + stap(HFS) + (year | ID)
#>  observations: 1171
#>  fixed effects:  28
#> ------
#>             Median MAD 
#> (Intercept) 32.1    0.1
#> sex         -2.2    0.1
#> year         0.1    0.0
#> t2(HFS).1    0.1    0.6
#> t2(HFS).2    0.1    0.5
#> t2(HFS).3    0.0    0.3
#> t2(HFS).4   -0.2    0.5
#> t2(HFS).5    0.1    0.4
#> t2(HFS).6   -0.4    0.3
#> t2(HFS).7    0.2    0.3
#> t2(HFS).8    0.1    0.3
#> t2(HFS).9    0.0    0.2
#> t2(HFS).10  -0.1    0.4
#> t2(HFS).11  -0.1    0.3
#> t2(HFS).12   0.0    0.2
#> t2(HFS).13  -0.3    0.5
#> t2(HFS).14  -0.2    0.4
#> t2(HFS).15   0.0    0.3
#> t2(HFS).16   0.2    0.5
#> t2(HFS).17   0.3    0.5
#> t2(HFS).18  -0.1    0.3
#> t2(HFS).19   0.3    0.4
#> t2(HFS).20   0.1    0.2
#> t2(HFS).21   0.1    0.3
#> t2(HFS).22  -0.2    0.3
#> t2(HFS).23  -0.1    0.3
#> t2(HFS).24   0.2    0.3
#> t2(HFS).25   0.1    0.3
#> 
#> Smoothing terms:
#>                            Median MAD_SD
#> smooth_precision[t2(HFS)1] 2.9    1.5   
#> smooth_precision[t2(HFS)2] 2.6    1.4   
#> smooth_precision[t2(HFS)3] 2.4    1.3   
#> smooth_precision[t2(HFS)4] 2.1    1.3   
#> 
#> Error terms:
#>  Groups   Name        Std.Dev. Corr  
#>  ID       (Intercept) 0.16774        
#>           year        0.29006  -0.244
#>  Residual             1.19162        
#> Num. levels: ID 600
ppc(fit)

Finally we’ll look at a 3d plot and 2d plot cross sections of the HFS effect across space and time. This will allow us to establish whether there is any effect at all, and what it looks like.

plot3D(fit)