I have been tracking the COVID19 pandemic, as most of us have. This is the first pandemic our generation has seen. It is also taking place in a world that is dramatically different from the one in which the Spanish Flu, plague and cholera epidemics took place.
In our data-rich world, Johns Hopkins University has been putting together a very comprehensive global dataset for COVID19, which is updated every day. I pulled the dataset up to 27MAR2020 and chose to look at the countries that are relevant to me. This doesn’t imply that other countries are not important, but analysing data for a large number of countries all at once is rarely insightful. There are too many differences between countries and cultures for one to keep track of, or to have sufficient information about, at once. I also chose these countries because each represents an archetype of the different ways in which the COVID19 pandemic was or is being handled.
Here is my list:
 Germany – a country that is managing the pandemic well, keeping its mortality very low, and performing in excess of 0.5 million tests every week
 Spain and Italy – countries that managed the pandemic badly, to the point that their current response has to be drastic
 US – a country that had a half-hearted response to the pandemic, but will probably have the resources to combat it
 India – a very large country that has instituted very drastic measures from the very beginning. This is also the only country on this list that is at the beginning of the curve and has not experienced exponential growth yet. It is also my native country.
 South Korea – a country that has passed the exponential phase and has managed to successfully control the pandemic.
I did not include China, because the numbers available may be conflicting. China has also largely handled its problem in a way that democratic countries are unlikely to emulate. Most countries will not be able to build hospitals, impose draconian restrictions or allow deaths in the way that reports from China suggest.
At this point I should put out an important disclaimer. I have described my workflow and sources. Still, this is an exercise in curiosity carried out over a Saturday morning. It is not an epidemiological trial, and the conclusions, figures and numbers presented here should not be treated as the result of rigorous scientific study.
All right, now to business. I simply plotted the cumulative number of infections in the countries I chose. It is obvious that all countries except South Korea are still in the “growing phase”, while India is at the very beginning, with fewer than 1000 infected individuals.
Next, I plotted all these curves on one graph. Again, we see that South Korea has managed to “flatten the curve” and India has not even begun the exponential phase.
The circles indicate data points for every country, for every day from 22JAN2020 up to 27MAR2020.
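For readers who want to reproduce the per-country curves, here is a minimal sketch of how the cumulative series might be extracted. It assumes the wide CSV layout used by the Johns Hopkins time-series files (Province/State, Country/Region, Lat, Long, then one column of cumulative counts per date). The function name and the sample rows are made up for illustration and are not real data.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the assumed wide layout: four metadata columns,
# then one cumulative-count column per date.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Countryland,0.0,0.0,1,2,4
North,Splitland,0.0,0.0,0,1,1
South,Splitland,0.0,0.0,2,2,3
"""


def cumulative_by_country(csv_file, countries):
    """Sum the per-province rows into one cumulative series per country."""
    reader = csv.reader(csv_file)
    header = next(reader)
    dates = header[4:]  # everything after the four metadata columns
    totals = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        country = row[1]
        if country in countries:
            for i, value in enumerate(row[4:]):
                totals[country][i] += int(value)
    return dates, dict(totals)


dates, series = cumulative_by_country(
    io.StringIO(SAMPLE_CSV), {"Countryland", "Splitland"}
)
for country, counts in sorted(series.items()):
    print(country, counts)
```

Countries that are reported province by province end up as a single series, which is the form needed for one curve per country.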
Fitting the Data
With the data in hand, it was time to look for an equation to fit it.
The idea is that if one can find a mathematical expression that accurately represents the data for the period of 66 days from 22JAN2020 to 27MAR2020, we can use that expression to calculate what will happen in the future.
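The day numbers used in the rest of this post can be checked with a short sketch. It assumes that 22JAN2020 is counted as day 1, which is consistent with the 66-day period mentioned above; the helper name is hypothetical.

```python
from datetime import date

FIRST_DAY = date(2020, 1, 22)  # first date in the dataset, assumed to be day 1


def day_number(d):
    """Number of the given date when FIRST_DAY is counted as day 1."""
    return (d - FIRST_DAY).days + 1


print(day_number(date(2020, 3, 27)))  # 66, the last day of data used here
print(day_number(date(2020, 3, 28)))  # 67
```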
I chose a particular equation called the Hill Function. This equation is very commonly used by biochemists and, so far, I am aware of its use only in the field of molecular biology.
Still, at its core, the equation represents a process in which something happening increases the chance of that thing happening even more. This is what happens in an epidemic as well. The more people are infected, the greater the chance that an even larger number of people will be infected. The Hill equation is very good at capturing this process.
The modified and generalized form of the equation is as follows:
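A four-parameter Hill form that matches the parameters named in this post (START, END, k and n) can be written as below. The symbols x (days since the first data point) and y (cumulative number of infections) are notation assumed here, and the exact layout is a sketch rather than a quotation:

```latex
y(x) = \mathrm{START} + \left(\mathrm{END} - \mathrm{START}\right)\,\frac{x^{n}}{k^{n} + x^{n}}
```

Setting x = k makes the fraction equal to 1/2, so y sits exactly halfway between START and END on day k.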
START and END represent the number of people who are infected at the beginning and the end of the epidemic.
The first parameter is k.
This indicates how fast the disease is spreading. In our case, the value of k is the number of days it takes for half of the people who will eventually have the disease to become infected. The higher the value of k, the slower the epidemic progresses.
The other, more interesting parameter is n. This is called the “cooperativity” or the “Hill Coefficient”. It indicates how bad the situation is in terms of already infected people passing on the infection to even more people. If n = 1, it means every person only randomly catches the disease, and pre-existing diseased people do not worsen the situation. An example of this is diabetes: having people with diabetes in the population does not increase the chance of other people getting it. This is, of course, not the case for our epidemic, which is caused by a virus that spreads. Contagious diseases will have n > 1. If n is greater than 1, it means that people are transmitting the infection. The higher the value of n, the greater the transmissibility of the disease in a population.
Here is an example of what the curves of the Hill equation look like:
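The shape can also be explored numerically. The sketch below assumes the common four-parameter Hill form, start + (end - start) * x^n / (k^n + x^n), and uses hypothetical parameter values chosen only to show how n changes the steepness of the curve.

```python
def hill(x, start, end, k, n):
    """Four-parameter Hill curve (assumed form):
    start + (end - start) * x^n / (k^n + x^n)."""
    ratio = (x / k) ** n
    return start + (end - start) * ratio / (1.0 + ratio)


# Hypothetical parameters, chosen only to illustrate the shape.
START, END, K = 0.0, 1000.0, 50.0
N_VALUES = (1, 5, 15)

print("day " + "".join(f"  n={n:<6}" for n in N_VALUES))
for day in (10, 30, 50, 70, 90):
    values = [hill(day, START, END, K, n) for n in N_VALUES]
    print(f"{day:<4}" + "".join(f"{v:9.1f}" for v in values))
```

Every curve passes through the halfway value on day k, whatever n is. A larger n keeps the curve low for longer and then makes it rise much more sharply around day k.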
What happens when we use this equation to fit our data from the 6 countries for the number of infected individuals?
The graph below shows the fit of the Hill equation to the number of infected individuals. The dots indicate the actual numbers reported, while the line indicates what the Hill equation says should happen. As you can see, the Hill equation reproduces the reported number of infected individuals very closely.
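To give a feel for what such a fit involves, here is a dependency-free sketch. It is not the fitting routine that produced the results in this post. It swaps in a simpler technique: a grid search over candidate END values, with n and k obtained for each candidate from a straight-line fit of log(y / (END - y)) against log(x), and START assumed to be 0. The function names and the synthetic data are hypothetical.

```python
import math


def hill(x, end, k, n):
    """Hill curve with START fixed at 0 (assumed form)."""
    ratio = (x / k) ** n
    return end * ratio / (1.0 + ratio)


def fit_hill(days, cases, end_candidates):
    """Grid search over END. For each END, n and k come from a straight-line
    fit of log(y / (END - y)) against log(x).
    Returns the best (rss, end, k, n), or None if nothing could be fitted."""
    best = None
    for end in end_candidates:
        # Only points with x > 0 and 0 < y < END can be log-transformed.
        pts = [(math.log(x), math.log(y / (end - y)))
               for x, y in zip(days, cases) if x > 0 and 0 < y < end]
        if len(pts) < 2:
            continue
        mean_u = sum(u for u, _ in pts) / len(pts)
        mean_z = sum(z for _, z in pts) / len(pts)
        sxx = sum((u - mean_u) ** 2 for u, _ in pts)
        sxz = sum((u - mean_u) * (z - mean_z) for u, z in pts)
        if sxx == 0 or sxz <= 0:
            continue  # no usable upward trend for this END
        n = sxz / sxx                      # slope of the line
        k = math.exp(mean_u - mean_z / n)  # intercept equals -n * log(k)
        # Judge each candidate by its residuals on the original scale.
        rss = sum((y - hill(x, end, k, n)) ** 2 for x, y in zip(days, cases))
        if best is None or rss < best[0]:
            best = (rss, end, k, n)
    return best


# Synthetic, noise-free data generated from known (made-up) parameters.
days = list(range(1, 67))
synthetic = [hill(x, 1000.0, 40.0, 10.0) for x in days]
rss, end, k, n = fit_hill(days, synthetic, [900.0, 1000.0, 1100.0, 2000.0])
print(f"END={end:.0f}, k={k:.2f}, n={n:.2f}, RSS={rss:.3g}")
```

On real, noisy counts a proper nonlinear least-squares routine is the better tool, because the log transform gives a lot of weight to the earliest and smallest counts.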
Country        Residual Sum of Squares   RSquare (COD)   Fit Status
Germany        1.50433E7                 0.99819         Succeeded(100)
India          8811.01615                0.99611         Succeeded(100)
Korea, South   1844879.03185             0.99803         Succeeded(100)
Spain          1.29669E7                 0.99902         Succeeded(100)
US             1.61134E7                 0.99937         Succeeded(100)
Italy          1.2506E7                  0.99963         Succeeded(100)
Global         5.84833E7                 0.99933         Succeeded(100)

Global only: Number of Points 396, Degrees of Freedom 378, Reduced ChiSqr 154717.78322, Adj. RSquare 0.9993
This table shows how well the Hill equation captures the data. The value of RSquare is above 0.99 for every country, which means that the equation accounts for more than 99% of the variation in the cumulative number of infected individuals over the 66 days, for every country. That’s great!
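The global figures in the table can be cross-checked against each other with a few lines of arithmetic. The sketch below assumes that each of the 6 countries contributes 3 free parameters (END, k and n), with START held fixed at 0; the function and variable names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - RSS / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - rss / tss


# Global figures as reported in the table above.
n_points = 396         # 6 countries x 66 days
n_free_params = 6 * 3  # END, k and n per country (assumption: START fixed at 0)
rss_global = 5.84833e7
r2_global = 0.99933

dof = n_points - n_free_params
reduced_chi_sq = rss_global / dof
adj_r2 = 1.0 - (1.0 - r2_global) * (n_points - 1) / dof

print(dof)                       # 378, matching the reported degrees of freedom
print(round(reduced_chi_sq, 1))  # close to the reported 154717.78
print(round(adj_r2, 4))          # 0.9993, matching the reported Adj. RSquare
```

The small difference in the reduced chi-square comes from the residual sum of squares being reported to only six significant figures.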
Predictions
Now that we have captured the rise in the number of infections in a mathematical equation, we can use it to calculate what happens next. Here are the fit results for the different countries, calculated for a total of 200 days. The original data, the fit and how the curve develops over time are shown.
               START                    END                            k                            n
Country        Value  Standard Error    Value         Standard Error   Value      Standard Error    Value     Standard Error
Germany        0      0                 109587.55071  7504.80604       66.98877   0.62685           13.00089  0.30875
India          0      0                 625859.1801   1.33457E9        123.18645  25123.61701       10.51675  16.41077
Korea, South   0      0                 8856.12544    126.49043        41.26855   0.2563            11.49088  0.71672
Spain          0      0                 294368.42324  42164.10098      73.17473   1.23983           12.04181  0.23792
US             0      0                 274574.14203  14541.97769      67.93991   0.32002           18.63325  0.26068
Italy          0      0                 146147.21166  3080.04576       63.62672   0.26983           9.91061   0.11151
The first important number is the END value. This tells us how many people will have been infected by the very end of the epidemic. The value of k is the day on which the halfway point is reached, so comparing it with the current day tells us whether a country has passed the halfway point of its epidemic. These numbers agree with what we already know. The value of n tells us how badly the disease is spreading.
For example, South Korea will have the fewest infections, at only 8856 individuals, since they have already flattened the curve and passed the halfway point at day 41 (they were one of the earlier infected countries). In the next 2 weeks, the epidemic will be over in S. Korea.
We also know that Germany has controlled the epidemic relatively well so far. They will end with 109,587 infections, give or take 7,500, and they are exactly at the halfway point at Day 67 (28MAR2020). This means that the epidemic in Germany will continue for 2 more months, at which point the curve will be flat.
Clearly, Spain and Italy did a worse job of controlling the epidemic. Spain will end up with almost 300,000 infections and will be able to flatten the curve almost a week later than Germany. Italy has now responded so drastically that they will be able to flatten the curve in a couple of months, along with Germany, with almost 150,000 infections and a very high price in mortality because of ignoring the epidemic in the beginning.
Now the more alarming news. The US will also have more than 250,000 infections, but their measures will allow them to catch up. The high number of infections is understandable: it is a country with nearly 4 times the population of Germany, and more than that compared with Spain, Italy or S. Korea. But they have a cooperativity value of 18! This is much higher than in the other countries, indicating that the US has not taken quarantine and lockdowns seriously so far.
And what about India? India has not suffered the brunt of the disease yet. The epidemic is just beginning in India and things are still uncertain. The prediction is that there will be 625,000 infections in India, but the uncertainty means that it could go up to 1.3 billion (in the extremely unlikely worst case). The lockdown and quarantine are working, and the disease is spreading quite slowly in India (n = 10), comparable to Italy and even slower than it spread in Germany, Spain, the US and S. Korea. But Indians should know that it is still early days, and the halfway point for us is still 57 days, or 8 weeks, away. We expect to flatten the curve 16 weeks from now if things go as planned.
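The “halfway” figures quoted above follow directly from the fitted k values and the last day of data. Here is a small sketch of that arithmetic, assuming that 27MAR2020 is day 66 (with 22JAN2020 as day 1); the helper name is hypothetical.

```python
LAST_DATA_DAY = 66  # 27MAR2020, counting 22JAN2020 as day 1 (assumption)

# Fitted k values (day of the halfway point) from the parameter table.
K_BY_COUNTRY = {
    "Germany": 66.98877,
    "India": 123.18645,
    "Korea, South": 41.26855,
    "Spain": 73.17473,
    "US": 67.93991,
    "Italy": 63.62672,
}


def days_to_halfway(k, today=LAST_DATA_DAY):
    """Days from `today` until the fitted halfway point; negative if passed."""
    return k - today


for country, k in K_BY_COUNTRY.items():
    remaining = days_to_halfway(k)
    status = "passed" if remaining < 0 else "ahead"
    print(f"{country:<13} halfway {abs(remaining):5.1f} days {status}")
```

For India this gives about 57 days, or roughly 8 weeks, while for South Korea the halfway point lies about 25 days in the past.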
I plan to see in the coming weeks how valid this analysis turns out to be. As the epidemic proceeds and more data points become available, the uncertainty should reduce significantly.
Comments are welcome.
Data Sources
CSSEGISandData COVID19: https://github.com/CSSEGISandData/COVID19
This data can be visualized on an ArcGIS Dashboard.
 World Health Organization (WHO): https://www.who.int/
 DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.
 BNO News: https://bnonews.com/index.php/2020/02/thelatestcoronaviruscases/
 National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
 China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
 Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html
 Macau Government: https://www.ssm.gov.mo/portal/
 Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0
 US CDC: https://www.cdc.gov/coronavirus/2019ncov/index.html
 Government of Canada: https://www.canada.ca/en/publichealth/services/diseases/coronavirus.html
 Australia Government Department of Health: https://www.health.gov.au/news/coronavirusupdateataglance
 European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographicaldistribution2019ncovcases
 Ministry of Health Singapore (MOH): https://www.moh.gov.sg/covid19
 Italy Ministry of Health: http://www.salute.gov.it/nuovocoronavirus
 1Point3Acres: https://coronavirus.1point3acres.com/en
 Worldometers: https://www.worldometers.info/coronavirus/