Predictions of COVID-19 development with a focus on India

I have been tracking the COVID-19 pandemic, as most of us have. This is the first pandemic our generation has seen. And it is a pandemic that is taking place in a world that is dramatically different from the one the Spanish Flu, plague and cholera epidemics took place in.

In our data-rich world, Johns Hopkins University has been putting together a very comprehensive global dataset for COVID-19, which is updated every day. I pulled the dataset up to 27-Mar-2020 and chose to look at the countries that are relevant to me. This doesn’t imply that other countries are not important, but analysing data for a whole lot of countries all at once is rarely insightful. There are too many differences between countries and cultures for one to keep track of, or to have sufficient information about, all at once. I also chose these countries because each represents an archetype of the different ways in which the COVID-19 pandemic was, or is being, handled.

Here is my list:

  1. Germany – a country that is managing the pandemic well, keeping its mortality very low, and performing more than 0.5 million tests every week
  2. Spain and Italy – countries that managed the pandemic badly, to the point that their current response has to be drastic
  3. US – a country that had a half-hearted response to the pandemic, but will probably have the resources to combat it
  4. India – a very large country that has instituted very drastic measures from the very beginning. This is also the only country on my list that is at the beginning of the curve and has not experienced exponential growth yet. It is also my native country.
  5. South Korea – a country that has passed the exponential phase and has managed to successfully control the pandemic.

I did not include China, because the available numbers may be conflicting. China has also largely handled its problem in a way that democratic countries are unlikely to emulate. Most countries will not be able to build new hospitals, impose draconian restrictions, or accept deaths in the way that reports from China suggest.

At this point I should put out an important disclaimer. I have described my workflow and sources. Still, this is an exercise in curiosity, carried out over a Saturday morning. It is not an epidemiological trial, and the conclusions, figures and numbers presented here should not be treated as the result of rigorous scientific study.

All right, now to business. I simply plotted the cumulative number of infections in the countries I chose. It is obvious that all countries except South Korea are still in the “growing phase”, while India is at the very beginning, with fewer than 1000 infected individuals.

Next, I plotted all these curves on one graph. Again, we see that South Korea has managed to “flatten the curve” and India has not even begun the exponential phase.

The circles indicate data points for every country, for every day from 22-JAN-2020 right up to 27-MAR-2020.
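For readers who want to rebuild these curves, here is a minimal Python sketch of how the cumulative counts per country could be pulled out of the raw data. It assumes the wide layout of the time-series files in the repository listed under Data Sources (four leading columns, Province/State, Country/Region, Lat and Long, followed by one column per date). The sample rows and the function name are made up for illustration, and this is not necessarily the workflow that produced the plots here.

```python
import csv
import io

# Made-up sample in the assumed wide layout; the real files have one
# column per day starting at 1/22/20.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Germany,51.0,9.0,0,0,1
New South Wales,Australia,-33.9,151.2,10,20,30
Victoria,Australia,-37.8,145.0,1,2,3
"""

def cumulative_by_country(csv_text, countries):
    """Return {country: [cumulative count per day]}, summing province rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_days = len(header) - 4              # the first four columns are not dates
    totals = {c: [0] * n_days for c in countries}
    for row in reader:
        if not row:                       # skip blank lines
            continue
        country = row[1]
        if country in totals:
            for i, value in enumerate(row[4:]):
                totals[country][i] += int(value)
    return totals

series = cumulative_by_country(SAMPLE, ["Germany", "Australia"])
print(series)
```

Countries that are reported province by province are summed into a single series, which is what a per-country plot needs.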

Fitting the Data

With the data in hand, it was time to look for an equation to fit it.

The idea is that if one can find a mathematical expression that accurately represents the data for the 66 days from 22-JAN-2020 to 27-MAR-2020, one can use that expression to calculate what will happen in the future.

I chose a particular equation called the Hill function. This equation is very commonly used by biochemists and, so far, I am aware of its use only in the field of molecular biology.

Still, at its core, the equation represents a process in which something happening increases the chance of that thing happening even more. This is what happens in an epidemic as well: the more people are infected, the greater the chance that even more people will be infected. The Hill equation is very good at capturing this process.

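The modified and generalized form of the equation, written with N(t) for the cumulative number of infected people on day t, is as follows:

\[ N(t) = \mathrm{START} + (\mathrm{END} - \mathrm{START}) \, \frac{t^{n}}{k^{n} + t^{n}} \]

START and END represent the number of people who are infected at the beginning and the end of the epidemic.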

The first parameter is k. This indicates how fast the disease is spreading. In our case, the value of k is the number of days it takes for half of the people who will eventually be infected to catch the disease. The higher the value of k, the slower the epidemic progresses.

The other, more interesting parameter is n. This is called the “co-operativity” or the “Hill coefficient”. It indicates how bad the situation is in terms of already infected people passing on the infection to even more people. If n = 1, every person catches the disease only randomly, and people who already have it do not worsen the situation. An example of this is diabetes: having people with diabetes in the population does not increase the chance of other people getting it. This is, of course, not the case for our epidemic, which is caused by a virus that spreads. Contagious diseases will have n > 1, which means that people are transmitting the infection. The higher the value of n, the greater the transmissibility of the disease in a population.
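To make the roles of k and n concrete, here is a small Python sketch of the generalized Hill function, assuming the form START + (END - START) * t^n / (k^n + t^n) with t counted in days. The function name and the example numbers are illustrative and are not fitted values.

```python
def hill(t, start, end, k, n):
    """Generalized Hill function: cumulative count expected on day t."""
    return start + (end - start) * t**n / (k**n + t**n)

# Illustrative numbers only: 1000 final infections, half-way point at day 50.
END, K = 1000.0, 50.0

for n in (1, 5, 15):
    early = hill(25, 0.0, END, K, n)    # half of the half-way day
    half = hill(K, 0.0, END, K, n)      # always END / 2, whatever n is
    late = hill(100, 0.0, END, K, n)    # twice the half-way day
    print(f"n={n:2d}: day 25 -> {early:7.1f}, day 50 -> {half:5.1f}, day 100 -> {late:6.1f}")
```

Whatever the value of n, the curve passes through half of END on day k. A larger n only sharpens the S-shape: fewer cases before day k and a faster approach to END after it.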

Here is an example of what the curves of the Hill equation look like:

What happens when we use this equation to fit our data from the 6 countries for the number of infected individuals?

The graph below shows the fit of the Hill equation to the infected individuals. The dots indicate the actual number reported, while the line indicates what the Hill equation says should happen. As you can see, the Hill equation is very accurate in correctly calculating the number of infected individuals.
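For readers without a fitting package at hand, the following Python sketch shows one simple way to estimate END, k and n. It is not the procedure used for the fits in this post. It assumes START = 0 and relies on the fact that, for a given END, the Hill function becomes a straight line after the transformation log(y / (END - y)) = n*log(t) - n*log(k). It then tries a grid of candidate END values and keeps the one with the smallest residual sum of squares. The data are synthetic, generated from round numbers chosen for illustration, and all names are hypothetical. A nonlinear least-squares routine (for example scipy.optimize.curve_fit) would be the more usual choice.

```python
import math

def hill(t, end, k, n):
    """Hill function with START fixed at 0."""
    return end * t**n / (k**n + t**n)

def fit_k_n_given_end(days, cases, end):
    """Straight-line least squares on log(y / (end - y)) versus log(t)."""
    xs = [math.log(t) for t in days]
    ys = [math.log(y / (end - y)) for y in cases]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope of the line
    k = math.exp(-(my - n * mx) / n)   # intercept equals -n * log(k)
    return k, n

def fit_hill(days, cases, end_candidates):
    """Return (rss, end, k, n) for the candidate END with the smallest RSS."""
    best = None
    for end in end_candidates:
        if end <= max(cases):          # END must lie above every data point
            continue
        k, n = fit_k_n_given_end(days, cases, end)
        rss = sum((y - hill(t, end, k, n)) ** 2 for t, y in zip(days, cases))
        if best is None or rss < best[0]:
            best = (rss, end, k, n)
    return best

# Synthetic, noise-free data: END = 9000, k = 40 days, n = 11 (made up).
days = list(range(1, 67))
cases = [hill(t, 9000.0, 40.0, 11.0) for t in days]

rss, end, k, n = fit_hill(days, cases, range(8000, 10001, 50))
print(f"END = {end}, k = {k:.3f}, n = {n:.3f}, RSS = {rss:.3g}")
```

With real, noisy counts the straight-line step gives extra weight to the early, small numbers, so the result is best treated as a starting guess rather than a final fit.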

| Dataset | Residual Sum of Squares | R-Square (COD) | Fit Status |
|---|---|---|---|
| Germany | 1.50433E7 | 0.99819 | Succeeded(100) |
| India | 8811.01615 | 0.99611 | Succeeded(100) |
| Korea, South | 1844879.03185 | 0.99803 | Succeeded(100) |
| Spain | 1.29669E7 | 0.99902 | Succeeded(100) |
| US | 1.61134E7 | 0.99937 | Succeeded(100) |
| Italy | 1.2506E7 | 0.99963 | Succeeded(100) |
| Global | 5.84833E7 | 0.99933 | Succeeded(100) |

Global fit only: Number of Points 396, Degrees of Freedom 378, Reduced Chi-Sqr 154717.78322, Adj. R-Square 0.9993.

This table shows how well the Hill equation captures the data. The value of R-square is above 0.99 for every country, which means that the equation accounts for more than 99% of the variation in the cumulative number of infected individuals over these 66 days. That’s great!
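The numbers in the table can be checked with a little arithmetic. The Python sketch below assumes an unweighted fit, so that the reduced chi-square is simply the residual sum of squares divided by the degrees of freedom, and it reads the 18 missing degrees of freedom (396 - 378) as three free parameters (END, k and n) for each of the six countries, with START held at 0. Both readings are inferences from the table. The short series passed to r_squared is made up.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - RSS / total sum of squares."""
    mean = sum(observed) / len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - rss / tss

def reduced_chi_sqr(rss, n_points, n_free_params):
    """Residual sum of squares per degree of freedom (unweighted fit)."""
    return rss / (n_points - n_free_params)

# Values from the table: 6 countries x 66 days = 396 points, 18 free parameters.
global_rss = 5.84833e7
chi = reduced_chi_sqr(global_rss, 396, 18)
print(chi)   # close to the reported 154717.78

# The six per-country RSS values should add up to the global one.
country_rss = [1.50433e7, 8811.01615, 1844879.03185, 1.29669e7, 1.61134e7, 1.2506e7]
print(sum(country_rss))

# Tiny made-up series, just to exercise r_squared().
r2 = r_squared([1, 2, 4, 8], [1.1, 2.1, 3.8, 8.2])
print(r2)
```

Because the counts are cumulative and rise smoothly, a high R-square is relatively easy to reach. The standard errors of the fitted parameters say more about how far the curve can be trusted.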

Predictions

Now that we have captured the rise in the number of infections in a mathematical equation, we can use it to calculate what happens next. Here are the fit results for the different countries, calculated for a total of 200 days. The original data, the fit and how the curve develops over time are shown.

| Country | START | START Std. Error | END (infections) | END Std. Error | k (days) | k Std. Error | n | n Std. Error |
|---|---|---|---|---|---|---|---|---|
| Germany | 0 | 0 | 109587.55071 | 7504.80604 | 66.98877 | 0.62685 | 13.00089 | 0.30875 |
| India | 0 | 0 | 625859.1801 | 1.33457E9 | 123.18645 | 25123.61701 | 10.51675 | 16.41077 |
| Korea, South | 0 | 0 | 8856.12544 | 126.49043 | 41.26855 | 0.2563 | 11.49088 | 0.71672 |
| Spain | 0 | 0 | 294368.42324 | 42164.10098 | 73.17473 | 1.23983 | 12.04181 | 0.23792 |
| US | 0 | 0 | 274574.14203 | 14541.97769 | 67.93991 | 0.32002 | 18.63325 | 0.26068 |
| Italy | 0 | 0 | 146147.21166 | 3080.04576 | 63.62672 | 0.26983 | 9.91061 | 0.11151 |

The first important number is the END value. This tells us how many people will have been infected by the very end of the epidemic. The value of k tells us on which day the half-way point of the epidemic is reached, and therefore whether we have passed it yet. These numbers agree with what we already know. The value of n tells us how badly the disease is spreading.
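To see how these three numbers turn into a forecast, the fitted values can be plugged back into the Hill function. The Python sketch below copies END, k and n from the table above (END rounded to one decimal) and takes START as 0 for every country. The helper name and the choice of printing days 66 and 200 are illustrative.

```python
# Fitted parameters copied from the table above: (END, k, n).
FITS = {
    "Germany":      (109587.6, 66.98877, 13.00089),
    "India":        (625859.2, 123.18645, 10.51675),
    "Korea, South": (8856.1, 41.26855, 11.49088),
    "Spain":        (294368.4, 73.17473, 12.04181),
    "US":           (274574.1, 67.93991, 18.63325),
    "Italy":        (146147.2, 63.62672, 9.91061),
}

def hill(t, end, k, n, start=0.0):
    """Generalized Hill function: cumulative infections expected on day t."""
    return start + (end - start) * t**n / (k**n + t**n)

for country, (end, k, n) in FITS.items():
    today = hill(66, end, k, n)     # day 66 = 27-MAR-2020, the last data point
    later = hill(200, end, k, n)    # last day of the 200-day projection
    print(f"{country:13s} day 66: {today:9.0f} ({today / end:5.1%} of END)  "
          f"day 200: {later:9.0f}")
```

The percentage in brackets shows how far along its own curve each country is on the last day of data.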

For example, South Korea will have the fewest infections, at only 8856 individuals, since they have already flattened the curve and passed the half-way point at day 41 (they were one of the earliest infected countries). In the next 2 weeks, the epidemic will be over in S. Korea.

We also know that Germany has controlled the epidemic relatively well so far. They will end with 109,587 infections, give or take 7,500, and they are exactly at the half-way point at day 67 (28-MAR-2020). This means that the epidemic in Germany will continue for 2 more months, at which point the curve will be flat.

Clearly, Spain and Italy did a worse job of controlling the epidemic. Spain will end up with almost 300,000 infections and will flatten the curve almost a week later than Germany. Italy has now responded so drastically that they will be able to flatten the curve in a couple of months, along with Germany, but with almost 150,000 infections and a very high price in mortality for ignoring the epidemic in the beginning.

Now the more alarming news. The US will also have more than 250,000 infections, but their measures will allow them to catch up. The high number of infections is understandable: it is a country with about 4 times the population of Germany, and more than that compared with Spain, Italy or S. Korea. But they have a co-operativity value of 18! This is much higher than in the other countries, indicating that the US has not taken quarantine and lockdowns seriously so far.

And what about India? India has not suffered the brunt of the disease yet. The epidemic is just beginning in India and things are still uncertain. The prediction is that there will be about 625,000 infections in India, but the standard error on that number is about 1.3 billion, which means that it could go up to 1.3 billion (in the extremely unlikely worst case). The lockdown and quarantine are working, and the disease is spreading quite slowly in India (n = 10.5), slower than it spread in Germany, Spain, the US and S. Korea, and at a rate comparable to Italy (n = 9.9). But Indians should know that it is still early days, and the half-way point for us is still 57 days, or 8 weeks, away. We expect to flatten the curve 16 weeks from now if things go as planned.
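The day numbers and uncertainties quoted above can be reproduced with a few lines of Python. The sketch assumes that 22-JAN-2020 is counted as day 1 (so that day 66 is 27-MAR-2020, the last data point), rounds k to the nearest whole day, and copies the values and standard errors from the fit table. The function and variable names are hypothetical.

```python
from datetime import date, timedelta

DAY_ONE = date(2020, 1, 22)   # first day in the dataset, counted as day 1

def day_to_date(day_number):
    """Calendar date of a (possibly fractional) day number, rounded to a day."""
    return DAY_ONE + timedelta(days=round(day_number) - 1)

# Half-way points (k) copied from the fit table.
K_VALUES = {"Korea, South": 41.26855, "Germany": 66.98877, "India": 123.18645}
for country, k in K_VALUES.items():
    print(f"{country}: half-way point around {day_to_date(k):%d-%b-%Y}")

# END with its standard error, copied from the fit table. A relative standard
# error far above 1 means the data do not yet pin the parameter down.
END_FITS = {
    "Germany": (109587.55071, 7504.80604),
    "India": (625859.1801, 1.33457e9),
}
for country, (value, stderr) in END_FITS.items():
    print(f"{country}: END = {value:.0f}, relative standard error = {stderr / value:.2f}")
```

For Germany the standard error is a small fraction of END, while for India it is thousands of times larger than the estimate itself, which is why the Indian numbers should be read as very provisional.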

I plan to see in the coming weeks how valid this analysis turns out to be. As the epidemic proceeds and more data points become available, the uncertainty should reduce significantly.

Comments are welcome.

Data Sources

CSSEGISandData COVID-19: https://github.com/CSSEGISandData/COVID-19

This data can be visualized on an ArcGIS Dashboard.
