stanify
Introduction
Stanify is a library for translating Vensim models into probabilistic programs defined with the Stan language. It is composed of two parts: a source-to-source code translator for model conversion and a miniature modeling language, V2S, for specifying bayesian models on top of the Vensim model.
Stanify aims to bring together the powerful, intuitive interface for designing dynamic models of Vensim and the robust inference performance of Stan. By using Stanify users can specify priors for Vensim variables, declare observational models, and much more on top of their existing Vensim model without having to write any Stan code.
A simple example: SBC for the predator-prey model
The Stanify workflow begins after you have created a Vensim model. For our example, we will be using a simple predator-prey model.
Suppose we would like to infer the measurement error of the observed predator and prey populations to the Lotka-Volterra model's calculated populations. This involves the assumption that the true observed populations follow a normal distribution centered around the calculated population with unknown standard deviation parameters $\sigma$.
We would like to check whether inference for the model is well-calibrated using Simulation-Based Calibration. This process involves simulating 'fake data' from the generative model, which is used to re-fit the model. Then the re-fitted parameter values' are compared against the 'true' parameter values used to generate the fake data.
Assume we have the stock variables predator
and prey
defined to be the population of the respective species at a
given time. Our goal is to express that the observed values of the population follow a parameterized distribution. This
would be described like the following:
$$ \begin{aligned} {prey_{observed}}_t &\sim \mathrm{normal}(prey_t, {\sigma_{prey}}) \\ {predator_{observed}}_t &\sim \mathrm{normal}(predator_t, {\sigma_{predator}}) \\ \sigma_{prey} &\sim \mathrm{cauchy}(0, 5) \\ \sigma_{predator} &\sim \mathrm{cauchy}(0, 5) \\ \end{aligned} $$
Here we have defined two additional parameters: $\sigma_{prey}$ and $\sigma_{predator}$, which are to be inferred with MCMC.
Once we have a clear picture of what inference is to be done with the Vensim model, it is straightforward to convert this into code:
from stanify.builders.vensim2stan import Vensim2Stan
import numpy as np
vensim_model_path = "vensim_models/prey_predator_wopnoise.mdl"
We write the above observation error model as a string holding the V2S code. V2S is the syntax used to describe such models in a compact manner. For more information regarding V2S, refer to the Introduction to the V2S syntax section.
The V2S code that corresponds to the above observation model would be written as the following:
v2s_code = """
prey_obs[timesteps] ~ normal(prey[timesteps], sigma_prey);
predator_obs[timesteps] ~ normal(predator[timesteps], sigma_predator);
sigma_prey<lower=0> ~ lognormal(0, 1);
sigma_predator<lower=0> ~ lognormal(0, 1);
"""
Important: Note that in Stanify all variable names defined in Vensim are converted into lowercase and then into an
identifier. This means that if you had a variable named AVG RETURN TIME
in Vensim, in V2S you need to reference it as
avg_return_time
.
Since we now have the Vensim model and V2S code that describes the statistical model we wish to perform inference,
we will invoke Vensim2Stan
, which is the entry point for all things related to Stanify:
timesteps = np.arange(0, 10, dtype=np.float32) + 1e-6
v2s = Vensim2Stan(v2s_code, vensim_model_path, data_variable=["predator_obs", "prey_obs"], initial_time=0,
integration_times=timesteps)
The first two arguments are the V2S code string and the path to the Vensim model. data_variable
argument denotes for
SBC, which parameters should be generated for 'fake data'. In our case, we will generate both prey_obs
and
predator_obs
as fake data for SBC. initial_time
and integration_times
indicate the starting timestep and which
timesteps to calculate the integrals for the differential equations.
As explained above, SBC requires two steps; generating fake data and re-fitting parameters using the fake data.
Normally, this involves running two separate Stan programs repeatedly. However, Vensim2Stan provides a convenience
method run_sbc()
which can automatically perform SBC. For our example, we will be generating 1000 fake datasets, which
each will be used to fit parameters over 1000 iterations, for each dataset:
sbc_result_inferencedata = v2s.run_sbc(n_fits=1000, n_draws=1000, n_chains=4)
Running the above function returns an arviz.inferencedata object holding the generated fake data and the fitted parameters. These values will be the basis for plotting SBC results for analysis.
The first type of plot is a rank histogram plot - by the assumption of SBC, a well calibrated model in the context of SBC implies the uniformity of ranks:
plot_rank_hist(sbc_result_inferencedata, "sigma_prey")
plot_rank_hist(sbc_result_inferencedata, "sigma_predator")
A supplementary plot is the rank ECDF plot which also aids in identifying (un)uniformity:
plot_ecdf(sbc_idata, "sigma_prey")
plot_ecdf(sbc_idata, "sigma_prey")
Note on performing just inference(fitting parameters)
By default, Stanify performs SBC for the specified V2S-vensim model. If you're not interested in SBC and would just like
to fit your existing data to the model, you can use the generated data2draws.stan
Stan program to perform inference.
Introduction to the V2S syntax
The V2S syntax is designed to help integrate Vensim subscripts with Stan. The syntax is very simple and should get you up and running with hierarchical models.
It is heavily influenced by both the Rat PPL and the subscript feature of Vensim.
Basic Syntax
Let's take a hierarchical predator-prey model as an example. Suppose we defined two stock variables, prey, predator
for each prey/predator quantities, just like the example we used in the previous section.
But also suppose each stock variable is augmented with a region subscript, such that in Vensim, you would've written:
prey[region] = INTEG(prey birth rate[region]-prey death rate[region], 30)
predator[region] = INTEG(predator birth rate[region]-predator death rate[region], 4)
Ignore the right-hand side for now; we're just interested in prey
and predator
values, which have a subscript
region
like so with two distinct regions:
region = A, B
Suppose that the observed values are normally distributed, with some unknown measurement error existing for each region. The assumption is that the measurement error follows $\mathrm{cauchy}(0, 5)$. Mathematically this would mean:
$$ \begin{aligned} {prey_{observed}}_{region,t} &\sim \mathrm{normal}(prey, {\sigma_{prey}}_{region}) \\ {predator_{observed}}_{region,t} &\sim \mathrm{normal}(predator, {\sigma_{predator}}_{region}) \\ {\sigma_{prey}}_{region} &\sim \mathrm{cauchy}(0, 5) \\ {\sigma_{predator}}_{region} &\sim \mathrm{cauchy}(0, 5) \\ \end{aligned} $$
The V2S model corresponding to the above specification would be written as so:
prey_observed[region, timesteps] ~ normal(prey[region, timesteps], sigma_prey[region]);
predator_observed[region, timesteps] ~ normal(predator[region, timesteps], sigma_predator[region]);
sigma_prey<lower=0.0>[region] ~ cauchy(0, 5);
sigma_predator<lower=0.0>[region] ~ cauchy(0, 5);
Let's pick this apart piece by piece:
prey_observed[region, timesteps] ~ normal(prey[region, timesteps], sigma_prey[region]);
predator_observed[region, timesteps] ~ normal(predator[region, timesteps], sigma_predator[region]);
The first two lines here define prey_observed
and predator_observed
as sampled variables, which are drawn from a
normal distribution. The bracket notation [region, timesteps]
appended to the variables indicate that variables should be created
for every value of the subscript region
. Since region
has two values(A
and B
), prey_observed
and
predator_observed
would each become a 2-by-(# of timesteps) array.
Note that there's a subscript timesteps
, which wasn't present in the Vensim model. This is a special subscript for V2S
that must be present for stock variables. It indicates obviously the time index of the stock variable.
sigma_prey<lower=0.0>[region] ~ cauchy(0, 5);
sigma_predator<lower=0.0>[region] ~ cauchy(0, 5);
Here we see the same bracket syntax again, which tells V2S to make a sigma parameter for each region and time. Additionally, we have another syntax that indicates the lower bound of $\sigma$ be zero, since it would be the standard deviation.
The exact syntax for defining lower and upper bounds are of the following form:
<lower=n, upper=m>
where n, m
is a number. You can use both of them at the same time or just a single one; this depends on the modeling
scenario. Note that by default when you omit the bounds it will be interpreted as an unbounded parameter:
<lower=-inf, upper=inf>
LHS Variable Ambiguity
Sometimes You have a static data variable defined in Vensim and would like to use that variable as a data variable, even though it was used on the LHS of a sampling statement.
Suppose I had some variable numberofpeople
in the Vensim model, and it had a static value of 4. There are 2 ways to
interpret the following V2S code:
mean ~ normal(0, 1);
numberofpeople ~ normal(mean, 1);
It can mean either:
numberofpeople
is a parameter, which is simulated to be drawn from thenormal(mean, 1)
distribution.numberofpeople
is known, but is drawn from the distribution.
Normally in V2S it will default to interpreting it as the first case, which will then override the static value(4
)
defined in the Vensim model and declare it to be a Stan parameter becoming a quantity to be sampled.
If you wish to treat the variable as data, simply add a @
before the variable usage:
@numberofpeople ~ normal(mean, 1);
Now V2S will not consider numberofpeople
as a parameter, given it exists in the Vensim model and is static.
Variable Transformations (WIP)
V2S supports assignment statements(=
) which are processed by generating code for the "transformed parameters" block in
Stan.
The specific order between constraints(bound transformations) and variable transformations are done according to the following order:
- Parameters are drawn from the unconstrained space
- Parameters are constrained, if applicable.
- Variable transformations are calculated based on the constrained values from step 2.
Note that density evaluations are also performed on the constrained parameters.
1r''' 2# Introduction 3 4Stanify is a library for translating Vensim models into probabilistic programs defined with the Stan language. 5It is composed of two parts: a source-to-source code translator for model conversion and a miniature modeling language, 6V2S, for specifying bayesian models on top of the Vensim model. 7 8Stanify aims to bring together the powerful, intuitive interface for designing dynamic models of Vensim and the robust 9inference performance of Stan. By using Stanify users can specify priors for Vensim variables, declare observational 10models, and much more on top of their existing Vensim model without having to write any Stan code. 11 12## A simple example: SBC for the predator-prey model 13 14The Stanify workflow begins after you have created a Vensim model. For our example, we will be using a simple 15predator-prey model. 16 17Suppose we would like to infer the measurement error of the observed predator and prey populations to the Lotka-Volterra 18 model's calculated populations. This involves the assumption that the true observed populations follow a normal 19distribution centered around the calculated population with unknown standard deviation parameters $\sigma$. 20 21We would like to check whether inference for the model is well-calibrated using Simulation-Based Calibration. This 22process involves simulating 'fake data' from the generative model, which is used to re-fit the model. Then the re-fitted 23parameter values' are compared against the 'true' parameter values used to generate the fake data. 24 25Assume we have the stock variables `predator` and `prey` defined to be the population of the respective species at a 26given time. Our goal is to express that the observed values of the population follow a parameterized distribution. This 27would be described like the following: 28 29$$ 30\begin{aligned} 31{prey_{observed}}_t &\sim \mathrm{normal}(prey_t, {\sigma_{prey}}) \\\ 32{predator_{observed}}_t &\sim \mathrm{normal}(predator_t, {\sigma_{predator}}) \\\ 33\sigma_{prey} &\sim \mathrm{cauchy}(0, 5) \\\ 34\sigma_{predator} &\sim \mathrm{cauchy}(0, 5) \\\ 35\end{aligned} 36$$ 37 38Here we have defined two additional parameters: $\sigma_{prey}$ and $\sigma_{predator}$, which are to be inferred with 39MCMC. 40 41Once we have a clear picture of what inference is to be done with the Vensim model, it is straightforward to convert 42this into code: 43 44``` 45from stanify.builders.vensim2stan import Vensim2Stan 46import numpy as np 47 48vensim_model_path = "vensim_models/prey_predator_wopnoise.mdl" 49``` 50 51We write the above observation error model as a string holding the V2S code. V2S is the syntax used to describe such 52models in a compact manner. For more information regarding V2S, refer to the 53[Introduction to the V2S syntax](#introduction-to-the-v2s-syntax) section. 54 55The V2S code that corresponds to the above observation model would be written as the following: 56 57``` 58v2s_code = """ 59prey_obs[timesteps] ~ normal(prey[timesteps], sigma_prey); 60predator_obs[timesteps] ~ normal(predator[timesteps], sigma_predator); 61sigma_prey<lower=0> ~ lognormal(0, 1); 62sigma_predator<lower=0> ~ lognormal(0, 1); 63""" 64``` 65 66**Important**: Note that in Stanify all variable names defined in Vensim are converted into lowercase and then into an 67identifier. This means that if you had a variable named `AVG RETURN TIME` in Vensim, in V2S you need to reference it as 68`avg_return_time`. 69 70Since we now have the Vensim model and V2S code that describes the statistical model we wish to perform inference, 71we will invoke `Vensim2Stan`, which is the entry point for all things related to Stanify: 72 73``` 74timesteps = np.arange(0, 10, dtype=np.float32) + 1e-6 75 76v2s = Vensim2Stan(v2s_code, vensim_model_path, data_variable=["predator_obs", "prey_obs"], initial_time=0, 77 integration_times=timesteps) 78``` 79 80The first two arguments are the V2S code string and the path to the Vensim model. `data_variable` argument denotes for 81SBC, which parameters should be generated for 'fake data'. In our case, we will generate both `prey_obs` and 82`predator_obs` as fake data for SBC. `initial_time` and `integration_times` indicate the starting timestep and which 83timesteps to calculate the integrals for the differential equations. 84 85---- 86As explained above, SBC requires two steps; generating fake data and re-fitting parameters using the fake data. 87Normally, this involves running two separate Stan programs repeatedly. However, Vensim2Stan provides a convenience 88method `run_sbc()` which can automatically perform SBC. For our example, we will be generating 1000 fake datasets, which 89 each will be used to fit parameters over 1000 iterations, for each dataset: 90 91``` 92sbc_result_inferencedata = v2s.run_sbc(n_fits=1000, n_draws=1000, n_chains=4) 93``` 94 95Running the above function returns an 96[arviz.inferencedata](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) object holding the 97generated fake data and the fitted parameters. These values will be the basis for plotting SBC results for analysis. 98 99The first type of plot is a rank histogram plot - by the assumption of SBC, a well calibrated model in the context 100of SBC implies the uniformity of ranks: 101 102``` 103plot_rank_hist(sbc_result_inferencedata, "sigma_prey") 104plot_rank_hist(sbc_result_inferencedata, "sigma_predator") 105``` 106 107A supplementary plot is the rank ECDF plot which also aids in identifying (un)uniformity: 108``` 109plot_ecdf(sbc_idata, "sigma_prey") 110plot_ecdf(sbc_idata, "sigma_prey") 111``` 112 113### Note on performing just inference(fitting parameters) 114 115By default, Stanify performs SBC for the specified V2S-vensim model. If you're not interested in SBC and would just like 116 to fit your existing data to the model, you can use the generated `data2draws.stan` Stan program to perform inference. 117 118# Introduction to the V2S syntax 119 120The V2S syntax is designed to help integrate Vensim subscripts with Stan. The syntax is very simple and should get you 121up and running with hierarchical models. 122 123It is heavily influenced by both the Rat PPL and the subscript feature of Vensim. 124 125### Basic Syntax 126 127Let's take a hierarchical predator-prey model as an example. Suppose we defined two stock variables, `prey, predator` 128for each prey/predator quantities, just like the example we used in the previous section. 129 130But also suppose each stock variable is augmented with a region subscript, such that in Vensim, you would've written: 131``` 132prey[region] = INTEG(prey birth rate[region]-prey death rate[region], 30) 133predator[region] = INTEG(predator birth rate[region]-predator death rate[region], 4) 134``` 135 136Ignore the right-hand side for now; we're just interested in `prey` and `predator` values, which have a subscript 137`region` like so with two distinct regions: 138``` 139region = A, B 140``` 141Suppose that the observed values are normally distributed, with some unknown measurement error existing for each 142region. The assumption is that the measurement error follows $\mathrm{cauchy}(0, 5)$. Mathematically this would mean: 143 144$$ 145\begin{aligned} 146{prey_{observed}}_{region,t} &\sim \mathrm{normal}(prey, {\sigma_{prey}}_{region}) \\\ 147{predator_{observed}}_{region,t} &\sim \mathrm{normal}(predator, {\sigma_{predator}}_{region}) \\\ 148{\sigma_{prey}}_{region} &\sim \mathrm{cauchy}(0, 5) \\\ 149{\sigma_{predator}}_{region} &\sim \mathrm{cauchy}(0, 5) \\\ 150\end{aligned} 151$$ 152 153The V2S model corresponding to the above specification would be written as so: 154``` 155prey_observed[region, timesteps] ~ normal(prey[region, timesteps], sigma_prey[region]); 156predator_observed[region, timesteps] ~ normal(predator[region, timesteps], sigma_predator[region]); 157sigma_prey<lower=0.0>[region] ~ cauchy(0, 5); 158sigma_predator<lower=0.0>[region] ~ cauchy(0, 5); 159``` 160 161Let's pick this apart piece by piece: 162``` 163prey_observed[region, timesteps] ~ normal(prey[region, timesteps], sigma_prey[region]); 164predator_observed[region, timesteps] ~ normal(predator[region, timesteps], sigma_predator[region]); 165``` 166 167The first two lines here define `prey_observed` and `predator_observed` as sampled variables, which are drawn from a 168normal distribution. The bracket notation `[region, timesteps]` appended to the variables indicate that variables should be created 169for every value of the subscript `region`. Since `region` has two values(`A` and `B`), `prey_observed` and 170`predator_observed` would each become a **2-by-(# of timesteps) array**. 171 172Note that there's a subscript `timesteps`, which wasn't present in the Vensim model. This is a special subscript for V2S 173 that **must be present for stock variables**. It indicates obviously the time index of the stock variable. 174 175``` 176sigma_prey<lower=0.0>[region] ~ cauchy(0, 5); 177sigma_predator<lower=0.0>[region] ~ cauchy(0, 5); 178``` 179Here we see the same bracket syntax again, which tells V2S to make a sigma parameter for each region and time. Additionally, we 180have another syntax that indicates the lower bound of $\sigma$ be zero, since it would be the standard deviation. 181 182The exact syntax for defining lower and upper bounds are of the following form: 183``` 184<lower=n, upper=m> 185``` 186where `n, m` is a number. You can use both of them at the same time or just a single one; this depends on the modeling 187scenario. Note that by default when you omit the bounds it will be interpreted as an unbounded parameter: 188``` 189<lower=-inf, upper=inf> 190``` 191 192### LHS Variable Ambiguity 193 194Sometimes You have a static data variable defined in Vensim and would like to use that variable as a data variable, even 195though it was used on the LHS of a sampling statement. 196 197Suppose I had some variable `numberofpeople` in the Vensim model, and it had a static value of 4. There are 2 ways to 198interpret the following V2S code: 199``` 200mean ~ normal(0, 1); 201numberofpeople ~ normal(mean, 1); 202``` 203 204It can mean either: 205 206- `numberofpeople` is a parameter, which is simulated to be drawn from the `normal(mean, 1)` distribution. 207- `numberofpeople` is known, but is drawn from the distribution. 208 209Normally in V2S it will default to interpreting it as the first case, which will then override the static value(`4`) 210defined in the Vensim model and declare it to be a Stan parameter becoming a quantity to be sampled. 211 212If you wish to treat the variable as *data*, simply add a `@` before the variable usage: 213 214``` 215@numberofpeople ~ normal(mean, 1); 216``` 217 218Now V2S will not consider `numberofpeople` as a parameter, given it exists in the Vensim model and **is static**. 219 220### Variable Transformations (WIP) 221<strike> 222V2S supports assignment statements(`=`) which are processed by generating code for the "transformed parameters" block in 223Stan. 224 225The specific order between constraints(bound transformations) and variable transformations are done according to the 226following order: 227 2281. Parameters are drawn from the unconstrained space 2292. Parameters are constrained, if applicable. 2303. Variable transformations are calculated **based on the constrained values** from step 2. 231 232Note that density evaluations are also performed on the **constrained parameters**. 233</strike> 234'''