Motivation for write-up

This is the 5th part of a multi-part series blog post on modeling in epidemiology.

The COVID-19 pandemic has brought a lot of attention to the study of epidemiology and more specifically to the various mathematical models that are used to inform public health policies. Everyone has been trying to understand the growth or slowing of new cases and trying to predict the necessary sanitary resources. This blog post attempts to explain the foundations for some of the most used models and enlighten the reader on two key points.

The 1st part of the blog series showed an epidemic could occur when $R_0 > 1$ and that it was fully characerized by the average $\beta$ and the average $T_{Infectious}$.

We have also seen that the higher $R_0$, the faster and the higher the peak infectious will be.

The latest blog post however showed the importance of the distribution of $T_{Infectious}$ to simulate the SEIR model and how it impacted the spread of the disease and the peak of infectious individuals. Even with lower $R_0$ values, the infectious peak were higher when using Gamma or Weibull distributions for $T_{Infectious}$.

This 5th installment examines this discrepancy further.

#collapse_hide
!pip install plotly==4.6.0
import pandas as pd
import numpy as np
import math
import plotly.graph_objects as go
import plotly.express as px
from scipy.stats import expon
from scipy.stats import gamma
from scipy.stats import weibull_min
from numpy.random import default_rng
rng = default_rng()

# Let's build a numerical solution
def seir_model(init, parms, days):
    S_0, E_0, I_0, R_0 = init
    Epd, Ipd, Rpd = [0], [0], [0]
    S, E, I, R = [S_0], [E_0], [I_0], [R_0]
    dt=0.1
    t = np.linspace(0,days,int(days/dt))
    sigma, beta, gam = parms
    for _ in t[1:]:
        next_S = S[-1] - beta*S[-1]*I[-1]*dt
        Epd.append(beta*S[-1]*I[-1]*dt)
        next_E = E[-1] + (beta*S[-1]*I[-1] - sigma*E[-1])*dt
        Ipd.append(sigma*E[-1]*dt)
        next_I = I[-1] + (sigma*E[-1] - gam*I[-1])*dt
        Rpd.append(gam*I[-1]*dt)
        next_R = R[-1] + (gam*I[-1])*dt
        S.append(next_S)
        E.append(next_E)
        I.append(next_I)
        R.append(next_R)
    return np.stack([S, E, I, R, Epd, Ipd, Rpd]).T
Requirement already satisfied: plotly==4.6.0 in /usr/local/lib/python3.6/dist-packages (4.6.0)
Requirement already satisfied: retrying>=1.3.3 in /usr/local/lib/python3.6/dist-packages (from plotly==4.6.0) (1.3.3)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from plotly==4.6.0) (1.12.0)

#collapse_hide
# Need this new function for model below:
def make_df(p,num_E, num_I, num_R):
  df = pd.DataFrame(np.full((p,1), 'S').T[0], columns=['State'])
  df['Day'] = 0
  tochange=df.loc[rng.choice(p, size=num_E+num_I+num_R, replace=False),'State'].index
  df.loc[tochange[0:num_E],'State'] = 'E'
  df.loc[tochange[num_E:num_I+num_E],'State'] = 'I'
  df.loc[tochange[num_E+num_I:num_E+num_I+num_R],'State'] = 'R'
  return df

#collapse_hide
def seir_model_stoch(beta, p, num_E, num_I, num_R, days, T_Latent, T_Infectious):

    # Initialize population dataframe with data given by user
    df = make_df(p,num_E, num_I, num_R)
    
    # This variable is used to track daily value of beta if it varies over time
    xxbeta=np.array([],dtype=float)

    # Initialize the arrays to return
    # Below are numbers of S, E, I, R total
    S=np.array([],dtype=int)
    E=np.array([],dtype=int)
    I=np.array([],dtype=int)
    R=np.array([],dtype=int)
    # Below are the daily additions in S, E, I, R
    Spd=np.array([],dtype=int)
    Epd=np.array([],dtype=int)
    Ipd=np.array([],dtype=int)
    Rpd=np.array([],dtype=int)

    b=beta
    
    # Stochastic model so use random values to decide on progression
    rand = np.random.random(size=(p,days))

    # Depending if you want exponential or gamma distribution for T_Latent
    if T_Latent == 'expon':
      EtoI = expon.rvs(loc=0,scale=5.2,size=p)
    else:
      EtoI = gamma.rvs(1.8,loc=0.9,scale=(5.2-1.8)/0.9,size=p)

    # Depending if you want exponential, gamma, or Weibull distribution for T_Infectious
    # Uses distributions found on blog part 3
    if T_Infectious == 'expon':
      ItoR = expon.rvs(loc=0,scale=28.85,size=p)
    elif T_Infectious == 'gamma':
      ItoR = gamma.rvs(4,loc=3,scale=4.25,size=p)    
    else:
      ItoR = weibull_min.rvs(2.3, loc=2, scale=20.11, size=p)

    # Iterate over every day the simulation is run
    for j in range(0,days-1):

        # Record daily beta values
        xxbeta=np.append(beta, b)

        # First we get the index of the individuals that will change state today:

        # Random number tells you which 'S' have been exposed on this day 
        StoE_index = df.loc[(df.State == 'S') & (rand[:,j] < b[j]*len(np.where(df.State=='I')[0])/p)].index

        # For each row, if a person has been a certain number of days in E, they will go to I
        # This follows EtoI variable which is either exponential or gamma distributed according to above
        EtoI_index = df.loc[(df.State == 'E') & (j-df.Day >= EtoI)].index
        
        # Similaraly as above
        # For each row, if a person has been a certain number of days in I, they will go to R
        # This follows EtoI variable which is either exponential or gamma distributed according to above
        ItoR_index = df.loc[(df.State == 'I') & (j-df.Day >= ItoR)].index

        # Use indexes collected above to populate per day values
        Epd = np.append(Epd,len(StoE_index))
        Ipd = np.append(Ipd,len(EtoI_index))
        Rpd = np.append(Rpd,len(ItoR_index))

        # Now we use the indexes collected above randomly to change the actual population dataframe to the new states
        df.loc[ItoR_index, 'State'] = 'R'
        df.loc[EtoI_index, 'State'] = 'I'
        df.loc[StoE_index, 'State'] = 'E'
        df.loc[ItoR_index, 'Day'] = j
        df.loc[EtoI_index, 'Day'] = j
        df.loc[StoE_index, 'Day'] = j
        df.loc[ItoR_index, 'DayR'] = j
        df.loc[EtoI_index, 'DayI'] = j
        df.loc[StoE_index, 'DayE'] = j
        
        # Append the S, E, I, and R arrays
        S=np.append(S,len(np.where(df.State=='S')[0]))
        E=np.append(E,len(np.where(df.State=='E')[0]))
        I=np.append(I,len(np.where(df.State=='I')[0]))
        R=np.append(R,len(np.where(df.State=='R')[0]))

        # Code below for control measures to reduce beta values
#        if ((I[-1] > 1000) & (Ipd[-1] > 399)): 
#            b = beta2
#        elif ((I[-1] > 1000) & (Ipd[-1] < 400)): 
#            b = beta3
                
    Epd[0]+=num_E
    Ipd[0]+=num_I
    Rpd[0]+=num_R

    return S,E,I,R, Epd, Ipd, Rpd, xxbeta, df

Peak of infectious individuals

Let's first try to characterize when the peak of infectious indiviuals occurs in the SEIR model.

The peak of infectious individuals occurs when the number of individuals that recover per day ($R_{pd}$) becomes greater than the number of new infectious individuals per day ($I_{pd}$), i.e. when: $$R_{pd} \geq I_{pd}$$

Convolution and daily numbers

Mathematical derivation of $I_{pd}$:

In the stochastic model, the number of new infectious individuals per day is the sum of the daily new exposures per day of the previous days multiplied by the probability that they become infectious after so many days.

We write: $$I_{pd}[j] = \sum_{n_L=0}^{M_L-1}h_L[n_L]~E_{pd}[j-n_L]$$

Mathematically, this means $I_{pd}[j]$ is the result of convolution of $E_{pd}$ and an impulse response $h_L[n]$ where $h_L[n]$ describes the distribution of $T_{Latent}$, and we can write: $$I_{pd}[j] = h_L[j]\circledast E_{pd}[j]$$

From this derivation, we can see the $I_{pd}$ depends on the distribution of $T_{Latent}$ and $E_{pd}$ (the number of new exposures per day) of the previous days.

When initial conditions are the same, this means $T_{Infectious}$ has no impact on $I_{pd}$ and so the rate of new infectious individuals will look the same between the two models (exponential or gamma distributed $T_{Infectious}$).

Mathematical derivation of $R_{pd}$:

We can similarly describe $R_{pd}[j]$ as the result of convolution of $I_{pd}$ and an impulse response $h_I[n]$ where $h_I[n]$ describes the distribution of $T_{Infectious}$.

In other words: $$R_{pd}[j] = \sum_{n_I=0}^{M_I-1}h_I[n_I]~I_{pd}[j-n_I] = h_I[j]\circledast I_{pd}[j]$$

When does the peak occur?

It occurs on day j where the threshold below is reached: $$R_{pd}[j] = I_{pd}[j]$$ $$\leftrightarrow \sum_{n_I=0}^{M_I-1}h_I[n_I]~I_{pd}[j-n_I] = \sum_{n_L=0}^{M_L-1}h_L[n_L]~E_{pd}[j-n_L]$$

While this is complicated to solve analytically, we can look at the distribution of $T_{Infectious}$ and get some clues as to what might happen with different distributions.

$I_{pd}$ and $R_{pd}$ for $T_{Infectious} \sim Exp(\lambda)$ vs. $T_{Infectious} \sim Weibull(\lambda, k,\gamma)$

$T_{Infectious} \sim Exp(28.85)$ vs. $T_{Infectious} \sim Weibull(2.3, 20.11, 2)$

Let's first have another look at the difference between these two distributions:

#collapse_hide
locw=2
wk = 2.3
wl = (20-locw)/(math.log(2)**(1/wk))

loce=0
scalee=28.85-loce

p=10000

df = pd.DataFrame({
    'Exponential': expon.rvs(loc=loce, scale=scalee,size=p),
    'Weibull': weibull_min.rvs(wk, loc=locw, scale=wl,size=p)
    })

fig = px.histogram(df.stack().reset_index().rename(columns={"level_1": "Distribution"}), x=0, color="Distribution", marginal='box')
fig.update_layout(
    title={
        'text':'Exponential vs. Weibull distributions',
        'x':0.5,
        'xanchor':'center'
    },
    barmode='overlay',
    xaxis_title='Days',
    yaxis_title='Count',
    legend=dict(
        x=1,
        y=0,
        traceorder="normal",
    )
)
fig.show()

Impact on $I_{pd}$ $R_{pd}$:

Let's see how the two distributions of $T_{Infectious}$ result in different curbs for $I_{pd}$ and $R_{pd}$.

#collapse_hide
# Define parameters for stochastic model
days = 200
p = 10000
num_E = 1
num_I = 0
num_R = 0
beta_stoch = 0.5*np.ones(days)

# Run 2 stochastic simulations, 1 with exponential gamma, 1 with weibull gamma
results_stoch0 = seir_model_stoch(beta_stoch, p, num_E, num_I, num_R, days, 1, 'expon')
results_stoch1 = seir_model_stoch(beta_stoch, p, num_E, num_I, num_R, days, 1, 1)

#collapse_hide
fig = go.Figure(data=[    
    go.Scatter(name='Ipd_Exp', x=np.arange(len(results_stoch0[0])), y=results_stoch0[5], line={'dash':'dashdot', 'color':'red'}, legendgroup="stoch4"),
    go.Scatter(name='Rpd_Exp', x=np.arange(len(results_stoch0[0])), y=results_stoch0[6], line={'dash':'dashdot', 'color':'green'}, legendgroup="stoch4"),
    go.Scatter(name='Ipd_Weibull', x=np.arange(len(results_stoch1[0])), y=results_stoch1[5], line={'dash':'solid', 'color':'red'}, legendgroup="stoch3"),
    go.Scatter(name='Rpd_Weibull', x=np.arange(len(results_stoch1[0])), y=results_stoch1[6], line={'dash':'solid', 'color':'green'}, legendgroup="stoch3")
])

fig.update_layout(
    xaxis_title = 'Day',
    yaxis_title = 'Count',
    title={
        'text':r'$\text{Effect of Exponential vs. Weibull distributed } T_{Infectious} \text{ on } I_{pd} \text{ and } R_{pd}$',
        'x':0.5,
        'xanchor':'center'
    }
)

fig.show()

Analysis:

As discussed above, we can see here $I_{pd}$ are very similar between the two models.

However, there are two key takeaways from these graphs:

  1. Rise of total infectious individuals:

The Exponential distribution leads to more recoveries early on as the $R_{pd}$ for the Exp distribution leads the $R_{pd}$ of the Weibull distribution by ~6 days in the earl recovery period.

Intuitively, we can thus understand that total infectious people will increase slower with the Eponential distribution than the Weibull distribution (since more people recover faster in the Exp model), and so the peak of infectious individuals will be earlier with the Weibull distribution.

  1. Recoveries:

While the peak appears faster with a Weibull distribution, we also see that after the initial delay, the Weibull distribution leads to larger number of recoveries and so the total number of infectious individuals will tend to 0 much quicker than with the Exp distribution.

Discussion

While $R_0$ seemed to be a good measure of the spread of disease and a good predictor of the peak of infectious individuals in a population, we find here that the actual distribution of $T_{Infectious}$ plays a crucial role in detemining the dynamics of the SEIR model.

That is to say that while estimating $R_0$ is important in times of epidemics or pandemics, finding the actual distributions of $T_{Latent}$ and $T_{Infetious}$ are equally important in predicting the impact of the disease.