Motivation for write-up

The real-world motivation for this write-up can be found in the Story time section, but I first want to give a bit of theoretical background here.

The importance of testing has been widely discussed over the last few weeks and months of the COVID-19 pandemic, with numerous articles underlining it. The point most often emphasized is that early testing allows sick individuals to be isolated quickly and their potential contacts traced, thus limiting the potential for spread.

The tests used for this are called virologic tests; they test directly for the presence of the virus in an individual (active infection). This is done with Nucleic Acid Tests, or NAT, usually after amplification of the very small amount of genetic material present via Polymerase Chain Reaction. Results are available within hours or days, and the tests require diagnostic machinery and specialists.

Knowing who has been infected is also important, as it could allow recovered patients (who are thought to gain immunity to COVID-19) to return safely to work and live more or less normally. Tests that check for past infections exist and are called serology or antibody tests. They check for specific antibodies matching those developed during an immune response against SARS-CoV-2.

This is all good in theory, but with a disease that can cause conditions as serious as COVID-19 can, we need to be sure that a positive test really means the person is now immune. Otherwise we risk allowing individuals with false positives to return to normal life when they should not, and so to continue the damaging spread of the disease.

The aim of this short write-up is to clear up some misconceptions around testing protocols and to discuss false positives, false negatives, and their importance in guiding public health policies. The idea is basically to answer the following questions:

  • How many tests should return positive for a person to be, say, 95% or 99% sure they are now immune?
  • What if a different test is negative?

Specificity, Sensitivity, False positives, False negatives?

As briefly explained above, neither virological nor serological tests are infallible. False positives (healthy individuals with a positive test) and false negatives (infected individuals with a negative test) can and do happen.

There are numerous reasons why this can happen, but they are not the point of this write-up. Here, we acknowledge that imperfect tests are a reality and look at how to establish a testing protocol that deals with that fact.

Thankfully, before shipping their tests, laboratories validate them. They are able to characterize them rather precisely and give an indication of how useful they may be with two important values:

  • Specificity
  • Sensitivity

Specificity

Specificity is the true negative rate - i.e. the percentage of healthy people correctly identified as such (for antibody testing, it is the percentage of people not having antibodies correctly identified as such).

In other words, if a test were used on 100 people who do not have antibodies, the number of people correctly identified as not having antibodies would be the specificity (as a percentage).

A perfect test with 100% specificity produces no false positives. This has major implications in the current context of the COVID-19 pandemic: an antibody test with 100% specificity would allow people who test positive to know for certain that they have antibodies (and therefore that they are immune, as long as research shows that antibodies confer immunity).

Mathematically, we pose specificity as follows:

$Specificity = \frac{True\ negatives}{True\ negatives + False\ positives}$

Sensitivity

Sensitivity is the true positive rate - i.e. the percentage of infected people correctly identified as such (for antibody tests, it is the percentage of people having antibodies correctly identified as such).

In other words, if an antibody test were used on 100 people with antibodies, the number of people correctly identified as having antibodies would be the sensitivity (as a percentage).

A perfect test with 100% sensitivity produces no false negatives.

Mathematically, we pose sensitivity as follows:

$Sensitivity = \frac{True\ positives}{True\ positives + False\ negatives}$

Prevalence

Prevalence is simply the proportion of a population that has a certain characteristic. In the current context of antibody testing, the prevalence is defined as the proportion of people who have antibodies conferring immunity to COVID-19 (i.e. the proportion that has had the disease).

$Prevalence = \frac{\#\ People\ with\ antibodies}{Total\ number\ of\ people}$

Where $Total\ number\ of\ people$ is simply $\#\ People\ with\ antibodies + \# People\ without\ antibodies$
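To make the three definitions above concrete, here is a minimal sketch that computes them from raw counts. The counts are invented for illustration (they do not come from any real test validation), and the function names are my own.

```python
# Hypothetical validation counts for an antibody test (illustrative only).
true_positives = 90    # people with antibodies, test positive
false_negatives = 10   # people with antibodies, test negative
true_negatives = 855   # people without antibodies, test negative
false_positives = 45   # people without antibodies, test positive


def sensitivity(tp, fn):
    """True positive rate: share of people with antibodies who test positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of people without antibodies who test negative."""
    return tn / (tn + fp)


def prevalence(tp, fn, tn, fp):
    """Share of the whole tested group that actually has antibodies."""
    return (tp + fn) / (tp + fn + tn + fp)


sn = sensitivity(true_positives, false_negatives)
sp = specificity(true_negatives, false_positives)
prev = prevalence(true_positives, false_negatives, true_negatives, false_positives)

print(f"Sensitivity: {sn:.1%}")   # 90.0%
print(f"Specificity: {sp:.1%}")   # 95.0%
print(f"Prevalence:  {prev:.1%}")  # 10.0%
```

Note that sensitivity and specificity only use one half of the group each (people with antibodies and people without, respectively), whereas prevalence uses everyone.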

Story time - Part 1

Specificity, sensitivity, prevalence, false negatives, false positives... This is all good, but it can be a bit abstract outside of a specific testing context.

Let's use the current COVID-19 pandemic as an example.

Antibody tests are finally becoming available to the general population, and you want to know if you've had the disease (and developed antibodies against it).

  • Now let's say you had influenza-like symptoms back in January or February. Would you expect a positive or a negative result on the test?
  • What if you haven't been sick but want to check out of curiosity, what result would you expect?
  • If it does come back positive, how certain would you be that you actually have those antibodies and it wasn't a false positive?
  • You decide to use a second test to make sure, and again it comes back positive. Now how certain are you that you have antibodies?
  • Out of extreme precaution you decide to try a test from another laboratory (different specificity and sensitivity), and this time the test comes back negative. It's become a bit more complex to evaluate your situation now.
  • So how about another test from this second laboratory? Again, negative. Two positives, two negatives: what can you make of this information?

However far-fetched this scenario may seem, it is exactly what happened to a Florida physician, Dr. Antevy.

There are two questions that come out of this story:

  • After those 4 tests, what is the probability that Dr. Antevy has those antibodies - or more generally, can we calculate the probability of someone having antibodies given their test results?
  • What should the threshold for such a probability be in order to minimize the risk of someone without antibodies going back out into the world believing they have them? (Obviously, if someone has 10 positive tests in a row, it seems sure enough that this person has antibodies.) This pushes for the need for a rigorous testing protocol.

Calculating probabilities given test results

Clearly, our objective is to calculate the probability that a person has antibodies, or:

$P(seropositive)$

Conditional probabilities

Bayes' theorem describes how a probability should be updated when new evidence is available.

Say a person had some COVID-19 symptoms (dry cough, fever, loss of smell) a few weeks ago. He might say there is a 75% chance that he contracted COVID-19, and a 25% chance that it was another disease. In this case:

$P(seropositive) = 0.75$

Now this person goes to get an antibody test. What is the probability he is seropositive given a positive or negative result? Bayes' theorem allows us to write it as follows:

$P(seropositive\ |\ positive\ test) = \frac{P(positive\ test\ |\ seropositive)\ *\ P(seropositive)}{P(positive\ test)}$

and

$P(seropositive\ |\ negative\ test) = \frac{P(negative\ test\ |\ seropositive)\ *\ P(seropositive)}{P(negative\ test)}$

Note:

$P(seropositive)$ is called the prior.

$P(seropositive\ |\ positive\ test)$ and $P(seropositive\ |\ negative\ test)$ are called the posteriors.

$P(Positive\ test)$

Let's have a look at the probability of getting a positive test - there are 2 ways to get a positive result :

  • A false positive
  • A true positive

$P(False\ positive) = P(Positive\ test\ |\ seronegative)*P(seronegative)$

And

$P(True\ positive) = P(Positive\ test\ |\ seropositive)*P(seropositive)$

So:

$P(Positive\ test) = P(Positive\ test\ |\ seropositive)*P(seropositive) + P(Positive\ test\ |\ seronegative)*P(seronegative)$
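As a quick numerical check of the sum above, here is a small sketch. The 75% prior is the one from the example earlier in this section; the two conditional probabilities (0.9 and 0.1) are assumed values chosen only for illustration, and the function name is hypothetical.

```python
def p_positive_test(p_seropositive, p_pos_given_seropositive, p_pos_given_seronegative):
    """Total probability of a positive test: true positives plus false positives."""
    p_seronegative = 1 - p_seropositive
    p_true_positive = p_pos_given_seropositive * p_seropositive
    p_false_positive = p_pos_given_seronegative * p_seronegative
    return p_true_positive + p_false_positive


# Assumed values: 75% prior, P(positive | seropositive) = 0.9,
# P(positive | seronegative) = 0.1.
p_pos = p_positive_test(0.75, 0.9, 0.1)
print(f"P(Positive test) = {p_pos:.3f}")  # 0.675 + 0.025 = 0.700
```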

Sensitivity and Specificity revisited

Earlier we saw:

$Sensitivity = \frac{True\ positives}{True\ positives + False\ negatives}$

And that

$Specificity = \frac{True\ negatives}{True\ negatives + False\ positives}$

But we can rewrite these equations as follows:

$Sensitivity = P(Positive\ test\ |\ seropositive)$

And

$Specificity = P(Negative\ test\ |\ seronegative) = 1-P(Positive\ test\ |\ seronegative)$

Re-writing the posterior probability

Using Bayes' rule and the calculations above, we can re-write the posterior equations as follows:

$P(seropositive\ |\ Positive\ test) = \frac{Sensitivity*P(seropositive)}{Sensitivity*P(seropositive)+ (1-Specificity)*(1-P(seropositive))}$

And:

$P(seronegative\ |\ Negative\ test) = \frac{Specificity*(1-P(seropositive))}{Specificity*(1-P(seropositive))+(1-Sensitivity)*P(seropositive)}$
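As a worked example, take the 75% prior from the person with symptoms above and assume, purely for illustration, a test with 90% sensitivity and 90% specificity:

$P(seropositive\ |\ Positive\ test) = \frac{0.9*0.75}{0.9*0.75+(1-0.9)*(1-0.75)} = \frac{0.675}{0.7} \approx 0.964$

$P(seronegative\ |\ Negative\ test) = \frac{0.9*(1-0.75)}{0.9*(1-0.75)+(1-0.9)*0.75} = \frac{0.225}{0.3} = 0.75$

Under these assumed values, a positive result raises the probability of being seropositive from 75% to about 96%, while a negative result lowers it to $1-0.75 = 0.25$, i.e. 25%.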

The role of prevalence in these calculations

The equations above describe the probability for an individual given a test result and their prior probability. This prior probability can be estimated from the presence or absence of symptoms, contact with other infected individuals, location, other diagnostics, etc.

However, at the population level, if we test a randomly chosen individual, the prior becomes the prevalence and the equations become:

$P(seropositive\ |\ Positive\ test) = \frac{Sensitivity*Prevalence}{Sensitivity*Prevalence+(1-Specificity)*(1-Prevalence)}$

And:

$P(seronegative\ |\ Negative\ test) = \frac{Specificity*(1-Prevalence)}{Specificity*(1-Prevalence)+(1-Sensitivity)*Prevalence}$
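The two population-level equations above are usually known as the positive and negative predictive values. The sketch below evaluates them for a few prevalence levels; the 90% sensitivity and specificity and the list of prevalences are assumptions chosen for illustration, not measured values.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(seropositive | positive test) for a randomly chosen individual."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sensitivity, specificity, prevalence):
    """P(seronegative | negative test) for a randomly chosen individual."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Assumed test characteristics and prevalence levels (illustrative only).
SN, SP = 0.9, 0.9
for prev in (0.01, 0.05, 0.20, 0.50):
    ppv = positive_predictive_value(SN, SP, prev)
    npv = negative_predictive_value(SN, SP, prev)
    print(f"prevalence={prev:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

With these assumed numbers, at 5% prevalence only about 32% of positive results come from people who actually have antibodies, even though the test is "90% accurate" in both directions.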

Serology testing simulation

Let's see what these equations look like in practice.

#collapse_hide
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

#collapse_hide
# Posterior probability of being seropositive given the prior, the test result
# (True = positive, False = negative) and the test's sensitivity (Sn) and specificity (Sp).
def Pposterior(Pprior, test_res, Sn, Sp):
  # Probability of a positive test: true positives + false positives
  Ppositive = Sn * Pprior + (1-Sp) * (1-Pprior)
  if test_res:
    return (Sn * Pprior) / Ppositive
  else:
    # P(seropositive | negative) = 1 - P(seronegative | negative)
    return 1 - (Sp * (1-Pprior)) / (1 - Ppositive)
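The same update can in principle be chained over several tests by feeding each posterior back in as the next prior. The sketch below does this with a scalar re-implementation of the function above. It rests on a strong assumption that is mine, not established in the text: that successive test results are independent of each other given the person's true serostatus. The names `update_posterior` and `update_over_tests` are hypothetical, and the 90%/90% test is again an assumed example.

```python
def update_posterior(prior, test_positive, sensitivity, specificity):
    """One Bayesian update of P(seropositive) after a single test result."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    if test_positive:
        return sensitivity * prior / p_positive
    # Negative result: only false negatives keep the person seropositive.
    return (1 - sensitivity) * prior / (1 - p_positive)


def update_over_tests(prior, results, sensitivity, specificity):
    """Chain updates over a sequence of results (assumes independent tests)."""
    p = prior
    for test_positive in results:
        p = update_posterior(p, test_positive, sensitivity, specificity)
    return p


# Assumed example: 75% prior, test with 90% sensitivity and 90% specificity.
p_two_pos = update_over_tests(0.75, [True, True], 0.9, 0.9)
p_pos_neg = update_over_tests(0.75, [True, False], 0.9, 0.9)
print(f"Two positives:          {p_two_pos:.4f}")  # 0.9959
print(f"Positive then negative: {p_pos_neg:.4f}")  # 0.7500
```

With equal sensitivity and specificity, one positive and one negative result cancel out exactly and the probability returns to the prior. That would not hold for two tests with different characteristics.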

Say we have an antibody test with 90% sensitivity and 90% specificity, meaning that 90% of people with antibodies test positive and 90% of people without antibodies test negative. We obtain the graph below:

#collapse_hide

# Below is the prior probability of being infected:
num=10000
Pprior = np.linspace((1/num),(num-1)/num,num=num)

# Graph the results
fig = go.Figure(data=[
    go.Scatter(name='Test negative', x=100*Pprior, y=100*Pposterior(Pprior, False, 0.9, 0.9), line_color="green"),
    go.Scatter(name='Test positive', x=100*Pprior, y=100*Pposterior(Pprior, True, 0.9, 0.9), line_color="red"),
    go.Scatter(name='No test', x=100*Pprior, y=100*Pprior, line_color="blue")
])

fig.update_layout(
    xaxis_title = 'Prior probability of being infected',
    yaxis_title = 'Posterior probability of being infected given test result<br>Specificity=90.0<br>Sensitivity=90.0'
)

fig.show()