stanify

Introduction

Stanify is a library for translating Vensim models into probabilistic programs defined with the Stan language. It is composed of two parts: a source-to-source code translator for model conversion and a miniature modeling language, V2S, for specifying bayesian models on top of the Vensim model.

Stanify aims to bring together the powerful, intuitive interface for designing dynamic models of Vensim and the robust inference performance of Stan. By using Stanify users can specify priors for Vensim variables, declare observational models, and much more on top of their existing Vensim model without having to write any Stan code.

A simple example: SBC for the predator-prey model

The Stanify workflow begins after you have created a Vensim model. For our example, we will be using a simple predator-prey model.

Suppose we would like to infer the measurement error of the observed predator and prey populations to the Lotka-Volterra model's calculated populations. This involves the assumption that the true observed populations follow a normal distribution centered around the calculated population with unknown standard deviation parameters $\sigma$.

We would like to check whether inference for the model is well-calibrated using Simulation-Based Calibration. This process involves simulating 'fake data' from the generative model, which is used to re-fit the model. Then the re-fitted parameter values' are compared against the 'true' parameter values used to generate the fake data.

Assume we have the stock variables predator and prey defined to be the population of the respective species at a given time. Our goal is to express that the observed values of the population follow a parameterized distribution. This would be described like the following:

$$ \begin{aligned} {prey_{observed}}_t &\sim \mathrm{normal}(prey_t, {\sigma_{prey}}) \\ {predator_{observed}}_t &\sim \mathrm{normal}(predator_t, {\sigma_{predator}}) \\ \sigma_{prey} &\sim \mathrm{cauchy}(0, 5) \\ \sigma_{predator} &\sim \mathrm{cauchy}(0, 5) \\ \end{aligned} $$

Here we have defined two additional parameters: $\sigma_{prey}$ and $\sigma_{predator}$, which are to be inferred with MCMC.

Once we have a clear picture of what inference is to be done with the Vensim model, it is straightforward to convert this into code:

from stanify.builders.vensim2stan import Vensim2Stan
import numpy as np

vensim_model_path = "vensim_models/prey_predator_wopnoise.mdl"

We write the above observation error model as a string holding the V2S code. V2S is the syntax used to describe such models in a compact manner. For more information regarding V2S, refer to the Introduction to the V2S syntax section.

The V2S code that corresponds to the above observation model would be written as the following:

v2s_code = """
prey_obs[timesteps] ~ normal(prey[timesteps], sigma_prey);
predator_obs[timesteps] ~ normal(predator[timesteps], sigma_predator);
sigma_prey<lower=0> ~ lognormal(0, 1);
sigma_predator<lower=0> ~ lognormal(0, 1);
"""

Important: Note that in Stanify all variable names defined in Vensim are converted into lowercase and then into an identifier. This means that if you had a variable named AVG RETURN TIME in Vensim, in V2S you need to reference it as avg_return_time.

Since we now have the Vensim model and V2S code that describes the statistical model we wish to perform inference, we will invoke Vensim2Stan, which is the entry point for all things related to Stanify:

timesteps = np.arange(0, 10, dtype=np.float32) + 1e-6

v2s = Vensim2Stan(v2s_code, vensim_model_path, data_variable=["predator_obs", "prey_obs"], initial_time=0,
                  integration_times=timesteps)

The first two arguments are the V2S code string and the path to the Vensim model. data_variable argument denotes for SBC, which parameters should be generated for 'fake data'. In our case, we will generate both prey_obs and predator_obs as fake data for SBC. initial_time and integration_times indicate the starting timestep and which timesteps to calculate the integrals for the differential equations.

As explained above, SBC requires two steps; generating fake data and re-fitting parameters using the fake data. Normally, this involves running two separate Stan programs repeatedly. However, Vensim2Stan provides a convenience method run_sbc() which can automatically perform SBC. For our example, we will be generating 1000 fake datasets, which each will be used to fit parameters over 1000 iterations, for each dataset:

sbc_result_inferencedata = v2s.run_sbc(n_fits=1000, n_draws=1000, n_chains=4)

Running the above function returns an arviz.inferencedata object holding the generated fake data and the fitted parameters. These values will be the basis for plotting SBC results for analysis.

The first type of plot is a rank histogram plot - by the assumption of SBC, a well calibrated model in the context of SBC implies the uniformity of ranks:

plot_rank_hist(sbc_result_inferencedata, "sigma_prey")
plot_rank_hist(sbc_result_inferencedata, "sigma_predator")

A supplementary plot is the rank ECDF plot which also aids in identifying (un)uniformity:

plot_ecdf(sbc_idata, "sigma_prey")
plot_ecdf(sbc_idata, "sigma_prey")

Note on performing just inference(fitting parameters)

By default, Stanify performs SBC for the specified V2S-vensim model. If you're not interested in SBC and would just like to fit your existing data to the model, you can use the generated data2draws.stan Stan program to perform inference.

Introduction to the V2S syntax

The V2S syntax is designed to help integrate Vensim subscripts with Stan. The syntax is very simple and should get you up and running with hierarchical models.

It is heavily influenced by both the Rat PPL and the subscript feature of Vensim.

Basic Syntax

Let's take a hierarchical predator-prey model as an example. Suppose we defined two stock variables, prey, predator for each prey/predator quantities, just like the example we used in the previous section.

But also suppose each stock variable is augmented with a region subscript, such that in Vensim, you would've written:

prey[region] = INTEG(prey birth rate[region]-prey death rate[region], 30)
predator[region] = INTEG(predator birth rate[region]-predator death rate[region], 4)

Ignore the right-hand side for now; we're just interested in prey and predator values, which have a subscript region like so with two distinct regions:

region = A, B

Suppose that the observed values are normally distributed, with some unknown measurement error existing for each region. The assumption is that the measurement error follows $\mathrm{cauchy}(0, 5)$. Mathematically this would mean:

$$ \begin{aligned} {prey_{observed}}_{region,t} &\sim \mathrm{normal}(prey, {\sigma_{prey}}_{region}) \\ {predator_{observed}}_{region,t} &\sim \mathrm{normal}(predator, {\sigma_{predator}}_{region}) \\ {\sigma_{prey}}_{region} &\sim \mathrm{cauchy}(0, 5) \\ {\sigma_{predator}}_{region} &\sim \mathrm{cauchy}(0, 5) \\ \end{aligned} $$

The V2S model corresponding to the above specification would be written as so:

prey_observed[region, timesteps] ~ normal(prey[region, timesteps], sigma_prey[region]);
predator_observed[region, timesteps] ~ normal(predator[region, timesteps], sigma_predator[region]);
sigma_prey<lower=0.0>[region] ~ cauchy(0, 5);
sigma_predator<lower=0.0>[region] ~ cauchy(0, 5);

Let's pick this apart piece by piece:

prey_observed[region, timesteps] ~ normal(prey[region, timesteps], sigma_prey[region]);
predator_observed[region, timesteps] ~ normal(predator[region, timesteps], sigma_predator[region]);

The first two lines here define prey_observed and predator_observed as sampled variables, which are drawn from a normal distribution. The bracket notation [region, timesteps] appended to the variables indicate that variables should be created for every value of the subscript region. Since region has two values(A and B), prey_observed and predator_observed would each become a 2-by-(# of timesteps) array.

Note that there's a subscript timesteps, which wasn't present in the Vensim model. This is a special subscript for V2S that must be present for stock variables. It indicates obviously the time index of the stock variable.

sigma_prey<lower=0.0>[region] ~ cauchy(0, 5);
sigma_predator<lower=0.0>[region] ~ cauchy(0, 5);

Here we see the same bracket syntax again, which tells V2S to make a sigma parameter for each region and time. Additionally, we have another syntax that indicates the lower bound of $\sigma$ be zero, since it would be the standard deviation.

The exact syntax for defining lower and upper bounds are of the following form:

<lower=n, upper=m>

where n, m is a number. You can use both of them at the same time or just a single one; this depends on the modeling scenario. Note that by default when you omit the bounds it will be interpreted as an unbounded parameter:

<lower=-inf, upper=inf>

LHS Variable Ambiguity

Sometimes You have a static data variable defined in Vensim and would like to use that variable as a data variable, even though it was used on the LHS of a sampling statement.

Suppose I had some variable numberofpeople in the Vensim model, and it had a static value of 4. There are 2 ways to interpret the following V2S code:

mean ~ normal(0, 1);
numberofpeople ~ normal(mean, 1);

It can mean either:

numberofpeople is a parameter, which is simulated to be drawn from the normal(mean, 1) distribution.
numberofpeople is known, but is drawn from the distribution.

Normally in V2S it will default to interpreting it as the first case, which will then override the static value(4) defined in the Vensim model and declare it to be a Stan parameter becoming a quantity to be sampled.

If you wish to treat the variable as data, simply add a @ before the variable usage:

@numberofpeople ~ normal(mean, 1);

Now V2S will not consider numberofpeople as a parameter, given it exists in the Vensim model and is static.

Variable Transformations (WIP)

~~V2S supports assignment statements(=) which are processed by generating code for the "transformed parameters" block in Stan.~~

The specific order between constraints(bound transformations) and variable transformations are done according to the following order:

Parameters are drawn from the unconstrained space

Parameters are constrained, if applicable.

Variable transformations are calculated based on the constrained values from step 2.

~~Note that density evaluations are also performed on the constrained parameters.~~

View Source

  1r'''
  2# Introduction
  3
  4Stanify is a library for translating Vensim models into probabilistic programs defined with the Stan language.
  5It is composed of two parts: a source-to-source code translator for model conversion and a miniature modeling language,
  6V2S, for specifying bayesian models on top of the Vensim model.
  7
  8Stanify aims to bring together the powerful, intuitive interface for designing dynamic models of Vensim and the robust
  9inference performance of Stan. By using Stanify users can specify priors for Vensim variables, declare observational
 10models, and much more on top of their existing Vensim model without having to write any Stan code.
 11
 12## A simple example: SBC for the predator-prey model
 13
 14The Stanify workflow begins after you have created a Vensim model. For our example, we will be using a simple
 15predator-prey model.
 16
 17Suppose we would like to infer the measurement error of the observed predator and prey populations to the Lotka-Volterra
 18 model's calculated populations. This involves the assumption that the true observed populations follow a normal
 19distribution centered around the calculated population with unknown standard deviation parameters $\sigma$.
 20
 21We would like to check whether inference for the model is well-calibrated using Simulation-Based Calibration. This
 22process involves simulating 'fake data' from the generative model, which is used to re-fit the model. Then the re-fitted
 23parameter values' are compared against the 'true' parameter values used to generate the fake data.
 24
 25Assume we have the stock variables `predator` and `prey` defined to be the population of the respective species at a
 26given time. Our goal is to express that the observed values of the population follow a parameterized distribution. This
 27would be described like the following:
 28
 29$$
 30\begin{aligned}
 31{prey_{observed}}_t &\sim \mathrm{normal}(prey_t, {\sigma_{prey}}) \\\
 32{predator_{observed}}_t &\sim \mathrm{normal}(predator_t, {\sigma_{predator}}) \\\
 33\sigma_{prey} &\sim \mathrm{cauchy}(0, 5) \\\
 34\sigma_{predator} &\sim \mathrm{cauchy}(0, 5) \\\
 35\end{aligned}
 36$$
 37
 38Here we have defined two additional parameters: $\sigma_{prey}$ and $\sigma_{predator}$, which are to be inferred with
 39MCMC.
 40
 41Once we have a clear picture of what inference is to be done with the Vensim model, it is straightforward to convert
 42this into code:
 43
 44```
 45from stanify.builders.vensim2stan import Vensim2Stan
 46import numpy as np
 47
 48vensim_model_path = "vensim_models/prey_predator_wopnoise.mdl"
 49```
 50
 51We write the above observation error model as a string holding the V2S code. V2S is the syntax used to describe such
 52models in a compact manner. For more information regarding V2S, refer to the
 53[Introduction to the V2S syntax](#introduction-to-the-v2s-syntax) section.
 54
 55The V2S code that corresponds to the above observation model would be written as the following:
 56
 57```
 58v2s_code = """
 59prey_obs[timesteps] ~ normal(prey[timesteps], sigma_prey);
 60predator_obs[timesteps] ~ normal(predator[timesteps], sigma_predator);
 61sigma_prey<lower=0> ~ lognormal(0, 1);
 62sigma_predator<lower=0> ~ lognormal(0, 1);
 63"""
 64```
 65
 66**Important**: Note that in Stanify all variable names defined in Vensim are converted into lowercase and then into an
 67identifier. This means that if you had a variable named `AVG RETURN TIME` in Vensim, in V2S you need to reference it as
 68`avg_return_time`.
 69
 70Since we now have the Vensim model and V2S code that describes the statistical model we wish to perform inference,
 71we will invoke `Vensim2Stan`, which is the entry point for all things related to Stanify:
 72
 73```
 74timesteps = np.arange(0, 10, dtype=np.float32) + 1e-6
 75
 76v2s = Vensim2Stan(v2s_code, vensim_model_path, data_variable=["predator_obs", "prey_obs"], initial_time=0,
 77                  integration_times=timesteps)
 78```
 79
 80The first two arguments are the V2S code string and the path to the Vensim model. `data_variable` argument denotes for
 81SBC, which parameters should be generated for 'fake data'. In our case, we will generate both `prey_obs` and
 82`predator_obs` as fake data for SBC. `initial_time` and `integration_times` indicate the starting timestep and which
 83timesteps to calculate the integrals for the differential equations.
 84
 85----
 86As explained above, SBC requires two steps; generating fake data and re-fitting parameters using the fake data.
 87Normally, this involves running two separate Stan programs repeatedly. However, Vensim2Stan provides a convenience
 88method `run_sbc()` which can automatically perform SBC. For our example, we will be generating 1000 fake datasets, which
 89 each will be used to fit parameters over 1000 iterations, for each dataset:
 90
 91```
 92sbc_result_inferencedata = v2s.run_sbc(n_fits=1000, n_draws=1000, n_chains=4)
 93```
 94
 95Running the above function returns an
 96[arviz.inferencedata](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) object holding the
 97generated fake data and the fitted parameters. These values will be the basis for plotting SBC results for analysis.
 98
 99The first type of plot is a rank histogram plot - by the assumption of SBC, a well calibrated model in the context
100of SBC implies the uniformity of ranks:
101
102```
103plot_rank_hist(sbc_result_inferencedata, "sigma_prey")
104plot_rank_hist(sbc_result_inferencedata, "sigma_predator")
105```
106
107A supplementary plot is the rank ECDF plot which also aids in identifying (un)uniformity:
108```
109plot_ecdf(sbc_idata, "sigma_prey")
110plot_ecdf(sbc_idata, "sigma_prey")
111```
112
113### Note on performing just inference(fitting parameters)
114
115By default, Stanify performs SBC for the specified V2S-vensim model. If you're not interested in SBC and would just like
116 to fit your existing data to the model, you can use the generated `data2draws.stan` Stan program to perform inference.
117
118# Introduction to the V2S syntax
119
120The V2S syntax is designed to help integrate Vensim subscripts with Stan. The syntax is very simple and should get you
121up and running with hierarchical models.
122
123It is heavily influenced by both the Rat PPL and the subscript feature of Vensim.
124
125### Basic Syntax
126
127Let's take a hierarchical predator-prey model as an example. Suppose we defined two stock variables, `prey, predator`
128for each prey/predator quantities, just like the example we used in the previous section.
129
130But also suppose each stock variable is augmented with a region subscript, such that in Vensim, you would've written:
131```
132prey[region] = INTEG(prey birth rate[region]-prey death rate[region], 30)
133predator[region] = INTEG(predator birth rate[region]-predator death rate[region], 4)
134```
135
136Ignore the right-hand side for now; we're just interested in `prey` and `predator` values, which have a subscript
137`region` like so with two distinct regions:
138```
139region = A, B
140```
141Suppose that the observed values are normally distributed, with some unknown measurement error existing for each
142region. The assumption is that the measurement error follows $\mathrm{cauchy}(0, 5)$. Mathematically this would mean:
143
144$$
145\begin{aligned}
146{prey_{observed}}_{region,t} &\sim \mathrm{normal}(prey, {\sigma_{prey}}_{region}) \\\
147{predator_{observed}}_{region,t} &\sim \mathrm{normal}(predator, {\sigma_{predator}}_{region}) \\\
148{\sigma_{prey}}_{region} &\sim \mathrm{cauchy}(0, 5) \\\
149{\sigma_{predator}}_{region} &\sim \mathrm{cauchy}(0, 5) \\\
150\end{aligned}
151$$
152
153The V2S model corresponding to the above specification would be written as so:
154```
155prey_observed[region, timesteps] ~ normal(prey[region, timesteps], sigma_prey[region]);
156predator_observed[region, timesteps] ~ normal(predator[region, timesteps], sigma_predator[region]);
157sigma_prey<lower=0.0>[region] ~ cauchy(0, 5);
158sigma_predator<lower=0.0>[region] ~ cauchy(0, 5);
159```
160
161Let's pick this apart piece by piece:
162```
163prey_observed[region, timesteps] ~ normal(prey[region, timesteps], sigma_prey[region]);
164predator_observed[region, timesteps] ~ normal(predator[region, timesteps], sigma_predator[region]);
165```
166
167The first two lines here define `prey_observed` and `predator_observed` as sampled variables, which are drawn from a
168normal distribution. The bracket notation `[region, timesteps]` appended to the variables indicate that variables should be created
169for every value of the subscript `region`. Since `region` has two values(`A` and `B`), `prey_observed` and
170`predator_observed` would each become a **2-by-(# of timesteps) array**.
171
172Note that there's a subscript `timesteps`, which wasn't present in the Vensim model. This is a special subscript for V2S
173 that **must be present for stock variables**. It indicates obviously the time index of the stock variable.
174
175```
176sigma_prey<lower=0.0>[region] ~ cauchy(0, 5);
177sigma_predator<lower=0.0>[region] ~ cauchy(0, 5);
178```
179Here we see the same bracket syntax again, which tells V2S to make a sigma parameter for each region and time. Additionally, we
180have another syntax that indicates the lower bound of $\sigma$ be zero, since it would be the standard deviation.
181
182The exact syntax for defining lower and upper bounds are of the following form:
183```
184<lower=n, upper=m>
185```
186where `n, m` is a number. You can use both of them at the same time or just a single one; this depends on the modeling
187scenario. Note that by default when you omit the bounds it will be interpreted as an unbounded parameter:
188```
189<lower=-inf, upper=inf>
190```
191
192### LHS Variable Ambiguity
193
194Sometimes You have a static data variable defined in Vensim and would like to use that variable as a data variable, even
195though it was used on the LHS of a sampling statement.
196
197Suppose I had some variable `numberofpeople` in the Vensim model, and it had a static value of 4. There are 2 ways to
198interpret the following V2S code:
199```
200mean ~ normal(0, 1);
201numberofpeople ~ normal(mean, 1);
202```
203
204It can mean either:
205
206- `numberofpeople` is a parameter, which is simulated to be drawn from the `normal(mean, 1)` distribution.
207- `numberofpeople` is known, but is drawn from the distribution.
208
209Normally in V2S it will default to interpreting it as the first case, which will then override the static value(`4`)
210defined in the Vensim model and declare it to be a Stan parameter becoming a quantity to be sampled.
211
212If you wish to treat the variable as *data*, simply add a `@` before the variable usage:
213
214```
215@numberofpeople ~ normal(mean, 1);
216```
217
218Now V2S will not consider `numberofpeople` as a parameter, given it exists in the Vensim model and **is static**.
219
220### Variable Transformations (WIP)
221<strike>
222V2S supports assignment statements(`=`) which are processed by generating code for the "transformed parameters" block in
223Stan.
224
225The specific order between constraints(bound transformations) and variable transformations are done according to the
226following order:
227
2281. Parameters are drawn from the unconstrained space
2292. Parameters are constrained, if applicable.
2303. Variable transformations are calculated **based on the constrained values** from step 2.
231
232Note that density evaluations are also performed on the **constrained parameters**.
233</strike>
234'''