Building a new stochastic SEIR model to deal with probability distributions
probability distributions
modeling
SEIR
epidemiology
stochastic
COVID-19
Author
Jeffrey Post
Published
March 31, 2020
Motivation for write-up
This is the 4th part of a multi-part blog series on modeling in epidemiology.
The COVID-19 pandemic has brought a lot of attention to the study of epidemiology, and more specifically to the various mathematical models that are used to inform public health policies. Everyone has been trying to understand the growth or slowing of new cases and to predict the necessary healthcare resources. This blog post attempts to explain the foundations of some of the most used models and enlighten the reader on two key points.
The first blog post introduced the concepts of compartmentalization and disease dynamics. The second part looked at a deterministic numerical solution of the SEIR model and at the effects of the parameters \(\beta\), \(\sigma\), and \(\gamma\).
Part 3 made the argument that most models ignore individual-level disease dynamics in favor of averaging population-level \(\sigma\) and \(\gamma\) parameters and showed some big discrepancies between actual COVID-19 probability distributions for those parameters and those used in research.
This 4th part is where I build a numerical SEIR model that takes into account these probability distributions in order to tweak the model as close to COVID-19 data as possible.
Building a stochastic model
As opposed to the deterministic model from Part 2, this model focuses on individual-level disease dynamics to model the propagation of the disease.
The basic idea of this model is to have a dataframe with the number of rows equal to the population size (each individual is a row) and two columns:
State column to describe the state of each individual (S, E, I, or R)
Day column to save the day of transition of the individual into that state
However, the population-level rates of transmission still apply here, i.e. whether a person goes from S → E depends on three factors:
the number of contacts the person has per unit time (given by \(r\))
the chance a given contact is with an I - infectious individual (the higher the number of I, the higher the chance)
the chance of an S contracting the disease from a contact with an I (given by \(\rho\))
This is done stochastically.
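The three factors above can be folded into a single per-day infection probability. The sketch below is a minimal standard-library illustration, not the model code used later in this post. It assumes \(\beta = r\rho\), so that each susceptible is infected with probability \(\beta I / p\) on a given day. The function name `daily_infections` and the numbers passed to it are hypothetical.

```python
import random

def daily_infections(num_S, num_I, p, r, rho, rng):
    """Count how many susceptibles become exposed in one day.

    Assumes beta = r * rho, so each susceptible is infected with
    probability beta * I / p on that day (capped at 1).
    """
    beta = r * rho
    prob = min(1.0, beta * num_I / p)
    # One uniform draw per susceptible individual
    return sum(1 for _ in range(num_S) if rng.random() < prob)

rng = random.Random(42)
new_E = daily_infections(num_S=9000, num_I=1000, p=10000, r=10, rho=0.05, rng=rng)
print(new_E)  # expected value is 9000 * 0.05 = 450, the draw varies around it
```

Drawing one random number per susceptible is what makes two runs with identical parameters differ, which is the point of a stochastic model.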
Once a person becomes E, their progression is unique to them. This progression is calculated in advance for computational reasons, which also lets us use whichever time distributions we want.
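To make the idea of a pre-computed progression concrete, here is a small sketch that draws one latent period per individual up front and then checks whether a given individual is due to leave E. It uses only the standard library rather than scipy. The gamma shape, shift, and scale are the ones that the `seir_model_stoch` function in this post passes to scipy, while the helper `leaves_E` is a hypothetical name introduced for illustration.

```python
import random
import statistics

rng = random.Random(0)
p = 10000  # population size

# Exponential latent period with a mean of 5.2 days
# (random.expovariate takes the rate, i.e. 1 / mean)
latent_expon = [rng.expovariate(1 / 5.2) for _ in range(p)]

# Shifted gamma latent period, using the same shape, shift, and scale
# that the model code passes to scipy's gamma.rvs
shape, shift, scale = 1.8, 0.9, (5.2 - 1.8) / 0.9
latent_gamma = [shift + rng.gammavariate(shape, scale) for _ in range(p)]

def leaves_E(day_exposed, today, latent_days):
    """True once the individual has spent their pre-drawn time in E."""
    return today - day_exposed >= latent_days

print(round(statistics.mean(latent_expon), 2))
print(round(min(latent_gamma), 2))
print(leaves_E(day_exposed=3, today=10, latent_days=latent_gamma[0]))
```

Because the durations are drawn once, the daily loop only compares elapsed time with a stored number, and swapping the distribution changes a single line.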
Code
import pandas as pd
import numpy as np
import math
import plotly.graph_objects as go
import plotly.express as px
from scipy.stats import expon
from scipy.stats import gamma
from scipy.stats import weibull_min
from numpy.random import default_rng
rng = default_rng()

import plotly
import plotly.io as pio
from IPython.display import display, HTML

## Tomas Mazak's workaround
plotly.offline.init_notebook_mode()
display(HTML(
    #'<script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_SVG"></script>'
    '<script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_SVG"></script>'
    '<script src="https://cdn.plot.ly/plotly-3.0.1.js" charset="utf-8"></script>'
))
pio.renderers.default = "plotly_mimetype+notebook_connected"
pio.templates.default = "plotly_dark"

# Let's build a numerical solution
def seir_model(init, parms, days):
    S_0, E_0, I_0, R_0 = init
    Epd, Ipd, Rpd = [0], [0], [0]
    S, E, I, R = [S_0], [E_0], [I_0], [R_0]
    dt = 0.1
    t = np.linspace(0, days, int(days/dt))
    sigma, beta, gam = parms
    for _ in t[1:]:
        next_S = S[-1] - beta*S[-1]*I[-1]*dt
        Epd.append(beta*S[-1]*I[-1]*dt)
        next_E = E[-1] + (beta*S[-1]*I[-1] - sigma*E[-1])*dt
        Ipd.append(sigma*E[-1]*dt)
        next_I = I[-1] + (sigma*E[-1] - gam*I[-1])*dt
        Rpd.append(gam*I[-1]*dt)
        next_R = R[-1] + (gam*I[-1])*dt
        S.append(next_S)
        E.append(next_E)
        I.append(next_I)
        R.append(next_R)
    return np.stack([S, E, I, R, Epd, Ipd, Rpd]).T
Creating the initial population dataframe
Below is a function to create the initial population dataframe:
\(p\) is the population number
\(num_E\) is the number of people exposed on day 0
\(num_I\) is the number of infectious on day 0
\(num_R\) is the number of people recovered on day 0
Code
# Need this new function for model below:
def make_df(p, num_E, num_I, num_R):
    df = pd.DataFrame(np.full((p,1), 'S').T[0], columns=['State'])
    df['Day'] = 0
    tochange = df.loc[rng.choice(p, size=num_E+num_I+num_R, replace=False),'State'].index
    df.loc[tochange[0:num_E],'State'] = 'E'
    df.loc[tochange[num_E:num_I+num_E],'State'] = 'I'
    df.loc[tochange[num_E+num_I:num_E+num_I+num_R],'State'] = 'R'
    return df
Building the model
Code
def seir_model_stoch(beta, p, num_E, num_I, num_R, days, T_Latent, T_Infectious):
    # Initialize population dataframe with data given by user
    df = make_df(p, num_E, num_I, num_R)

    # This variable is used to track daily value of beta if it varies over time
    xxbeta = np.array([], dtype=float)

    # Initialize the arrays to return
    # Below are numbers of S, E, I, R total
    S = np.array([], dtype=int)
    E = np.array([], dtype=int)
    I = np.array([], dtype=int)
    R = np.array([], dtype=int)
    # Below are the daily additions in S, E, I, R
    Spd = np.array([], dtype=int)
    Epd = np.array([], dtype=int)
    Ipd = np.array([], dtype=int)
    Rpd = np.array([], dtype=int)

    b = beta

    # Stochastic model so use random values to decide on progression
    rand = np.random.random(size=(p, days))

    # Depending if you want exponential or gamma distribution for T_Latent
    # (any value other than 'expon' selects the gamma distribution)
    if T_Latent == 'expon':
        EtoI = expon.rvs(loc=0, scale=5.2, size=p)
    else:
        EtoI = gamma.rvs(1.8, loc=0.9, scale=(5.2-1.8)/0.9, size=p)

    # Depending if you want exponential, gamma, or Weibull distribution for T_Infectious
    # Uses distributions found on blog part 3
    # (any value other than 'expon' or 'gamma' selects the Weibull distribution)
    if T_Infectious == 'expon':
        ItoR = expon.rvs(loc=0, scale=28.85, size=p)
    elif T_Infectious == 'gamma':
        ItoR = gamma.rvs(4, loc=3, scale=4.25, size=p)
    else:
        ItoR = weibull_min.rvs(2.3, loc=2, scale=20.11, size=p)

    # Iterate over every day the simulation is run
    for j in range(0, days-1):

        # Record daily beta values (append today's value to the running record)
        xxbeta = np.append(xxbeta, b[j])

        # First we get the index of the individuals that will change state today:

        # Random number tells you which 'S' have been exposed on this day
        StoE_index = df.loc[(df.State == 'S') & (rand[:,j] < b[j]*len(np.where(df.State=='I')[0])/p)].index

        # For each row, if a person has been a certain number of days in E, they will go to I
        # This follows EtoI variable which is either exponential or gamma distributed according to above
        EtoI_index = df.loc[(df.State == 'E') & (j-df.Day >= EtoI)].index

        # Similarly as above
        # For each row, if a person has been a certain number of days in I, they will go to R
        # This follows ItoR variable which is exponential, gamma, or Weibull distributed according to above
        ItoR_index = df.loc[(df.State == 'I') & (j-df.Day >= ItoR)].index

        # Use indexes collected above to populate per day values
        Epd = np.append(Epd, len(StoE_index))
        Ipd = np.append(Ipd, len(EtoI_index))
        Rpd = np.append(Rpd, len(ItoR_index))

        # Now we use the indexes collected above to change the actual population dataframe to the new states
        # (iloc works with these labels because df keeps its default 0..p-1 index)
        df.iloc[ItoR_index] = ['R', j]
        df.iloc[EtoI_index] = ['I', j]
        df.iloc[StoE_index] = ['E', j]

        # Append the S, E, I, and R arrays
        S = np.append(S, len(np.where(df.State=='S')[0]))
        E = np.append(E, len(np.where(df.State=='E')[0]))
        I = np.append(I, len(np.where(df.State=='I')[0]))
        R = np.append(R, len(np.where(df.State=='R')[0]))

        # Code below for control measures to reduce beta values
        # if ((I[-1] > 1000) & (Ipd[-1] > 399)):
        #     b = beta2
        # elif ((I[-1] > 1000) & (Ipd[-1] < 400)):
        #     b = beta3

    # Count the individuals seeded on day 0 in the per-day totals
    Epd[0] += num_E
    Ipd[0] += num_I
    Rpd[0] += num_R

    return S, E, I, R, Epd, Ipd, Rpd, xxbeta
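Since the model reads the transmission rate as `b[j]`, `beta` has to hold one value per simulated day. The sketch below shows one way to build such a schedule for a pre-planned drop in \(\beta\), a simpler alternative to the commented-out reactive rule in the function. The helper name `make_beta_schedule`, the switch day, and the reduced value 0.2 are assumptions made for illustration only.

```python
def make_beta_schedule(days, beta_before, beta_after, switch_day):
    """Return one beta value per simulated day (hypothetical helper).

    Days before switch_day use beta_before, later days use beta_after.
    """
    return [beta_before if day < switch_day else beta_after for day in range(days)]

# Illustrative values: beta drops from 0.5 to 0.2 on day 60 of a 200-day run
beta_schedule = make_beta_schedule(days=200, beta_before=0.5, beta_after=0.2, switch_day=60)
print(beta_schedule[59], beta_schedule[60])  # 0.5 0.2
```

A list like this can be indexed by day in the same way as the `0.5*np.ones(days)` arrays used in the runs in this post.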
Sanity check
Let’s first make sure the stochastic model above gives results similar to the deterministic model previously used in part 2 if we use an exponential distribution for \(T_{Latent}\) and \(T_{Infectious}\).
E → I
So let’s first set all individuals to exposed on day 0 and see the progression to I with exponential and gamma distributions.
Code
# Define parameters for stochastic model
days = 20
p = 10000
num_E = 10000
num_I = 0
num_R = 0
beta_stoch = 0.5*np.ones(days)

# Comparing with previous deterministic model
init = 0, p, 0, 0
sigma = 1/5.2  # 1/5.2 --> 5.2 days on average to go from E --> I
beta_det = 0.5
gam = 1/28.85  # 1/28.85 --> 28.85 days on average to go from I --> R
parms = sigma, beta_det, gam

# Run deterministic simulation
results_avg = seir_model(init, parms, days)

# Run stochastic simulation with exponential distribution
results_stoch_exp = seir_model_stoch(beta_stoch, p, num_E, num_I, num_R, days, 'expon', 'expon')

# Run stochastic simulation with gamma distribution
# (any T_Latent value other than 'expon', here 1, selects gamma)
results_stoch_gam = seir_model_stoch(beta_stoch, p, num_E, num_I, num_R, days, 1, 'expon')
Code
fig = go.Figure(data=[
    go.Scatter(name='Exponential', x=np.arange(len(results_stoch_exp[0])), y=100*(1-results_stoch_exp[1]/p), line={'dash':'dash', 'color':'red'}),
    go.Scatter(name='Gamma', x=np.arange(len(results_stoch_gam[0])), y=100*(1-results_stoch_gam[1]/p), line={'dash':'dash', 'color':'green'}),
    go.Scatter(name='Deterministic', x=np.linspace(0,days,days*10), y=100*(1-results_avg.T[1]/p), line={'dash':'dot', 'color':'blue'}),
])

fig.update_layout(
    title='Number of E moving to I over time when all population is exposed on day 0',
    xaxis_title='Days',
    yaxis_title='Percent of exposed having become infectious',
    legend=dict(
        x=1,
        y=1,
        traceorder="normal",
    )
)

fig.show()
So we can see that using the exponential distribution for \(T_{Latent}\) in our stochastic model gives a curve that closely resembles the deterministic model from part 2.
Using the gamma distribution, on the other hand, forces the model to follow the individual-level disease progression.
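There is also a closed-form check for the exponential case. With everyone exposed on day 0 there are no susceptibles left, so the deterministic equation for E reduces to \(dE/dt = -\sigma E\), and the share that has become infectious by day \(t\) is \(1 - e^{-\sigma t}\). The sketch below compares that expression with a forward Euler integration using the same step size as `seir_model`. It is a stand-alone illustration and does not call the functions from this post.

```python
import math

sigma = 1 / 5.2   # rate of leaving E, as in the sanity check
dt = 0.1          # same step size as the deterministic model
days = 20

# Forward Euler on dE/dt = -sigma * E, starting with everyone exposed
E = 1.0
for _ in range(int(days / dt)):
    E += -sigma * E * dt

euler_pct = 100 * (1 - E)
exact_pct = 100 * (1 - math.exp(-sigma * days))
print(round(euler_pct, 2), round(exact_pct, 2))
```

The two percentages agree closely, which is why the deterministic curve and the exponential stochastic curve overlap.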
I → R
Now let’s set all individuals to infectious on day 0 and see the progression to R with exponential, gamma, and Weibull distributions.
Code
# Define parameters for stochastic model
days = 100
p = 10000
num_E = 0
num_I = 10000
num_R = 0
beta_stoch = 0.5*np.ones(days)

# Comparing with previous average deterministic model
init = 0, 0, p, 0
sigma = 1/5.2  # 1/5.2 --> 5.2 days on average to go from E --> I
beta_det = 0.5
gam = 1/28.85  # 1/28.85 --> 28.85 days on average to go from I --> R
parms = sigma, beta_det, gam

# Run deterministic simulation
results_avg = seir_model(init, parms, days)

# T_Latent is passed as 1 below; it has no effect here because nobody starts in E or S

# Run stochastic simulation with exponential distribution
results_stoch_exp = seir_model_stoch(beta_stoch, p, num_E, num_I, num_R, days, 1, 'expon')

# Run stochastic simulation with gamma distribution
results_stoch_gam = seir_model_stoch(beta_stoch, p, num_E, num_I, num_R, days, 1, 'gamma')

# Run stochastic simulation with Weibull distribution
results_stoch_wei = seir_model_stoch(beta_stoch, p, num_E, num_I, num_R, days, 1, 'weibull')
Code
fig = go.Figure(data=[
    go.Scatter(name='Exponential', x=np.arange(len(results_stoch_exp[0])), y=100*(1-results_stoch_exp[2]/p), line={'dash':'dash', 'color':'red'}),
    go.Scatter(name='Gamma', x=np.arange(len(results_stoch_gam[0])), y=100*(1-results_stoch_gam[2]/p), line={'dash':'dash', 'color':'green'}),
    go.Scatter(name='Weibull', x=np.arange(len(results_stoch_wei[0])), y=100*(1-results_stoch_wei[2]/p), line={'dash':'dash', 'color':'orange'}),
    go.Scatter(name='Deterministic', x=np.linspace(0,days,days*10), y=100*(1-results_avg.T[2]/p), line={'dash':'dot', 'color':'blue'}),
])

fig.update_layout(
    title='Number of I moving to R over time when all population is infectious on day 0',
    xaxis_title='Days',
    yaxis_title='Percent of infectious having become recovered',
    legend=dict(
        x=1,
        y=1,
        traceorder="normal",
    )
)

fig.show()
So we can see that using the exponential distribution for \(T_{Infectious}\) in our stochastic model gives a curve that closely resembles the deterministic model from part 2.
Using the gamma or Weibull distribution again forces the model to follow the individual-level disease progression, and results in a vastly different picture for the progression from I → R.
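The difference can also be read off the distributions directly, without running a simulation. The sketch below computes the fraction of people still infectious \(t\) days after becoming infectious for each of the three distributions, using the parameters from the model code. It is a stand-alone illustration with only the standard library: the gamma case uses the closed form that exists when the shape is an integer (here 4), and the function names are hypothetical.

```python
import math

def surv_expon(t, scale=28.85):
    """Fraction still infectious at day t with exponential durations."""
    return math.exp(-t / scale)

def surv_gamma(t, shape=4, loc=3.0, scale=4.25):
    """Same for a shifted gamma with integer shape (Erlang closed form)."""
    if t <= loc:
        return 1.0
    x = (t - loc) / scale
    return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(shape))

def surv_weibull(t, c=2.3, loc=2.0, scale=20.11):
    """Same for a shifted Weibull."""
    if t <= loc:
        return 1.0
    return math.exp(-(((t - loc) / scale) ** c))

for day in (10, 30, 60):
    print(day, round(surv_expon(day), 3), round(surv_gamma(day), 3), round(surv_weibull(day), 3))
```

Early on, the exponential distribution has already moved a sizeable share of people to R while the gamma and Weibull distributions have barely started. Late in the period the ordering flips, because the exponential distribution has a long tail of people who stay infectious.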
Comparing deterministic with stochastic SEIR models
Now that we know our model works, let’s quickly see the effect of stochasticity on the model.
We use the deterministic model from blog part 2 as a basis, so the stochastic model here uses exponential distributions for \(\sigma\) and \(\gamma\).
Code
# Define parameters for stochastic model
days = 200
p = 10000
num_E = 1
num_I = 0
num_R = 0
beta_stoch = 0.5*np.ones(days)

# Define parameters for deterministic model (expressed as fractions of the population)
init = 1-(num_E/p)-(num_I/p)-(num_R/p), num_E/p, num_I/p, num_R/p
sigma = 1/5.2  # 1/5.2 --> 5.2 days on average to go from E --> I
beta_det = 0.5
gam = 1/28.85  # 1/28.85 --> 28.85 days on average to go from I --> R
parms = sigma, beta_det, gam

# Run deterministic simulation
results_avg = seir_model(init, parms, days)

# Run 3 stochastic simulations
results_stoch1 = seir_model_stoch(beta_stoch, p, num_E, num_I, num_R, days, 'expon', 'expon')
results_stoch2 = seir_model_stoch(beta_stoch, p, num_E, num_I, num_R, days, 'expon', 'expon')
results_stoch3 = seir_model_stoch(beta_stoch, p, num_E, num_I, num_R, days, 'expon', 'expon')
As you can see here, it is difficult to tell how much the gamma distributed \(\sigma\) differs from the exponential distributed model (other than just timing).
The infectious peak might be a little lower and slightly delayed with the gamma distribution, but it is hard to tell for sure from this.
The peak of exposed individuals seems to be a bit higher and delayed with gamma distribution versus exponential distribution.
\(\gamma\): exponential, gamma, or Weibull distribution
In this section we want to examine the effect of having \(T_{Infectious}\) be gamma or Weibull distribution on the SEIR model.
As you can see here, it is a lot easier to differentiate between the two.
A gamma distributed \(\gamma\) results in a higher peak of infectious people and underlines how using the usual deterministic models can vastly underestimate peak infectious people.
Overall, both the gamma and Weibull distributions were very close to the actual distribution for COVID-19 \(T_{Infectious}\), so it makes sense that the simulations result in similar curves here.
Impact of distribution of \(T_{Infectious}\) on Infectious Peak
In the plots above we can see the peak of infectious individuals is higher in the simulations done with Gamma or Weibull distributions than in those done with the exponential distribution.
Note we have not changed anything for \(\beta\) and in the simulations above we have the following:
So while we have a higher \(R_0\) when using the exponential distribution for \(T_{Infectious}\), the peak of infectious individuals is lower than in the simulations using gamma and Weibull distributions with lower \(R_0\).
We had previously seen that increasing \(R_0\) resulted in higher infectious peaks, but this is only true when comparing similar distributions.
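For a rough sense of the \(R_0\) values involved, the sketch below assumes that \(R_0\) equals \(\beta\) times the mean infectious period, which reduces to \(\beta/\gamma\) in the exponential case. The means are computed from the distribution parameters used in the model code (shift plus the standard mean of each distribution). These are estimates under that assumption and may differ from the values used elsewhere in this series.

```python
import math

beta = 0.5  # same beta in every simulation above

# Mean infectious period for each distribution, from the parameters
# passed to scipy in the model code (shift + mean of the unshifted law)
mean_expon = 28.85
mean_gamma = 3 + 4 * 4.25
mean_weibull = 2 + 20.11 * math.gamma(1 + 1 / 2.3)

for name, mean_T in [("exponential", mean_expon),
                     ("gamma", mean_gamma),
                     ("Weibull", mean_weibull)]:
    # Assumption: R_0 = beta * mean infectious period
    print(name, round(mean_T, 2), round(beta * mean_T, 2))
```

Under this assumption the exponential run has an \(R_0\) of roughly 14, against roughly 10 for the gamma and Weibull runs, even though its infectious peak is the lowest of the three.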
Discussion
We can see that the actual distributions of \(\sigma\) and \(\gamma\) matter for the resulting SEIR models.
\(R_0\)
In part 1 we saw that \(R_0\) was fully characterized by \(\beta\) and \(\gamma\) in the sense that
\[R_0 = \frac{\beta}{\gamma}\]
We can clearly see here, however, that \(R_0\) is not a good enough measure to indicate the peak of infectious individuals, which is closely related to the peak number of sick individuals, which in turn determines the required healthcare resources.
The actual distribution of \(T_{Infectious}\) must be taken into account to estimate the true values of the peaks.
Further questions
A couple of questions remain to be answered:
How can we control the spread of an epidemic?
How can we evaluate \(\beta\) from the data collected on a population level?