library(sbcrs)
library(rstan)
#> rstan (Version 2.19.2, GitRev: 2e1f913d3ca3)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores()).
#> To avoid recompilation of unchanged Stan programs, we recommend calling
#> rstan_options(auto_write = TRUE)

In this example, we will calibrate two versions of Neal’s funnel.1

$p(y,x) = \mathsf{normal}(y|0,3) * \prod_{n=1}^9 \mathsf{normal}(x_n|0,\exp(y/2))$

When parameterized in terms of $$x_n$$ and $$y$$, this model is extremely difficult for Stan to sample from. Per the Stan User’s Guide: “The probability contours are shaped like ten-dimensional funnels. The funnel’s neck is particularly sharp because of the exponential function applied to $$y$$.” Ideally, simulation-based calibration should detect the sampling pathologies in this model.

The stan code for this model (without re-parameterization) is:

parameters {
real y;
vector[9] x;
}
model {
y ~ normal(0, 3);
x ~ normal(0, exp(y/2));
}

Compile the Stan model:

funnel <- stan_model(file = system.file('stan', 'funnel.stan', package = 'sbcrs'))

Create an SBC object called sbc1 with functions to: 1) generate simulated values of x and y, and 2) call rstan::sampling, passing in funnel as the model to be sampled.

sbc1 <- SBC$new( params = function(seed, data) { set.seed(seed + 10) y <- rnorm(1, 0, 3) x <- rnorm(9, 0, exp(y/2)) list(y = y, x = x) }, sampling = function(seed, data, params, modeled_data, iters) { sampling(funnel, data = data, seed = seed, chains = 1, iter = 2 * iters, warmup = iters) }) Calibration involves generating data and parameters, and sampling from a Stan model many times. To speed up this process, take advantage of all of your machine’s cores. doParallel::registerDoParallel(cores = parallel::detectCores()) Perform the calibration. sbc1$calibrate(N = 256, L = 50)
sbc1$plot() sbc1$plot('y')

sbc1$plot('x') sbc1$summary()
#>
#>
#>    iq expected.outside actual.outside
#>  0.50             0.50      0.7647059
#>  0.98             0.02      0.4117647

The model is evidently poorly conditioned.

The reparameterized model is based on new primitives: x_raw and y_raw, which follow independent, standard normal distributions. From these newly defined primitives, it then builds up x and y based on their analytical definitions (see above). The stan code for the reparameterized model is:

parameters {
real y_raw;
vector[9] x_raw;
}
transformed parameters {
real y;
vector[9] x;

y = 3.0 * y_raw;
x = exp(y/2) * x_raw;
}
model {
y_raw ~ std_normal(); // implies y ~ normal(0, 3)
x_raw ~ std_normal(); // implies x ~ normal(0, exp(y/2))
}

Compile the (reparameterized) Stan model:

funnel_reparameterized <- stan_model(file = system.file('stan', 'funnel_reparameterized.stan', package = 'sbcrs'))

Create an SBC object called sbc2a with functions to: 1) generate simulated values of x_raw and y_raw, and 2) call rstan::sampling, passing in funnel_reparameterized as the model to be sampled.

sbc2a <- SBC$new( params = function(seed, data) { set.seed(seed + 10) y_raw <- rnorm(1, 0, 1) x_raw <- rnorm(9, 0, 1) list(y_raw = y_raw, x_raw = x_raw) }, sampling = function(seed, data, params, modeled_data, iters) { sampling(funnel_reparameterized, data = data, seed = seed, chains = 1, iter = 2 * iters, warmup = iters) }) Perform the calibration. sbc2a$calibrate(N = 256, L = 50)
sbc2a$plot() sbc2a$plot('y_raw')

sbc2a$plot('x_raw') sbc2a$summary()
#>
#>
#>    iq expected.outside actual.outside
#>  0.50             0.50      0.4313725
#>  0.98             0.02      0.0000000

Rather than comparing simulated and sampled values of x_raw and y_raw, we might instead wish to compare x and y. In the code that creates the object sbc2b, note that the params() function returns a list with named values x and y. Hence the calibration is performed on those parameters.

sbc2b <- SBC$new( params = function(seed, data) { set.seed(seed + 10) y_raw <- rnorm(1, 0, 1) x_raw <- rnorm(9, 0, 1) y <- 3.0 * y_raw; x <- exp(y/2) * x_raw; list(x = x, y = y) }, sampling = function(seed, data, params, modeled_data, iters) { sampling(funnel_reparameterized, data = data, seed = seed, chains = 1, iter = 2 * iters, warmup = iters) }) Perform the calibration. sbc2b$calibrate(N = 256, L = 50)
sbc2b$plot() sbc2b$plot('y')

sbc2b$plot('x') sbc2b$summary()
#>
#>
#>    iq expected.outside actual.outside
#>  0.50             0.50      0.3921569
#>  0.98             0.02      0.0000000