- NL Sena
‘All models are wrong, some are useful.’
Let the title not put you off. It is really quite simple to understand how epidemics like the coronavirus outbreak are tracked, studied, understood, and modelled.
On March 5, India reported a cumulative total of 28 cases of coronavirus infection. On May 31, there were 1,82,142 cases. This growth in the number of people infected (quite a few have recovered and over 5,000 have died so far) is what the mathematics of epidemics tries to study and model.
Relative to India’s 1.3 billion population, these numbers are small compared to infections and deaths in less populous countries such as Italy, France, Britain, and Iran. Today, India is 7th globally in overall reported cases. But the important point is that the epidemic in India is still growing.
How do we measure this growth? And can this measure of growth tell us what will happen in the near future? These are the questions I shall deal with.
Epidemics grow exponentially, not linearly. If you understand simple and compound interest, then you will get exponential growth. Exponential growth is when the number of new cases each day increases over time and is proportional to the total cases as of that date. In linear growth, the number added each day is the same and has no relationship to the cumulative total of cases.
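To make the contrast concrete, here is a minimal Python sketch. The starting total of 1,000 cases, the 50 cases a day, and the 5 percent daily rate are made-up illustrative numbers, not Indian data, and the function names are my own.

```python
def linear_growth(start, daily_increase, days):
    """Cumulative totals when the same number of cases is added every day."""
    return [start + daily_increase * d for d in range(days + 1)]


def exponential_growth(start, daily_rate, days):
    """Cumulative totals when each day's new cases are a fixed fraction of the total so far."""
    totals = [float(start)]
    for _ in range(days):
        totals.append(totals[-1] * (1 + daily_rate))
    return totals


# Hypothetical inputs: 1,000 cases to begin with, followed for 30 days.
linear = linear_growth(1000, 50, 30)
exponential = exponential_growth(1000, 0.05, 30)

# New cases per day: constant in the linear series, rising in the exponential one.
linear_new = [b - a for a, b in zip(linear, linear[1:])]
exponential_new = [b - a for a, b in zip(exponential, exponential[1:])]

print("Linear, first and last daily additions:", linear_new[0], linear_new[-1])
print("Exponential, first and last daily additions:",
      round(exponential_new[0]), round(exponential_new[-1]))
```

Both series add 50 cases on the first day. Only the exponential one adds more each day after that, because each day's addition is 5 percent of an ever larger total.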
Consider this chart. It is based on real data published by the Indian government.
I have deliberately chosen a start date of April 23 when there were already over 21,000 cases. The number of cases being reported each day has steadily grown. There are a few days when it seems to drop but the trend is steadily upward. And there is a relationship between the daily cases and the cumulative number of cases. So, it meets the two criteria for exponential growth.
The one single number that describes this exponential growth is the compound daily growth rate, or CDGR. It is analogous to the compound annual growth rate or CAGR in finance theory. Quite simply, the CAGR is the constant annual rate of return that would take an investment from its initial value to its final value if the profits were to be reinvested at the end of each year. The formula for CAGR is:
$$\mathrm{CAGR} = \left(\frac{\text{final value}}{\text{initial value}}\right)^{1/n} - 1$$
where n is the number of years.
Applying this to the Covid-19 situation in India, we have the time not in years but in days, a beginning value of 21,700 on April 23, and a final value of 1,82,142 on May 31, 38 days later. So, the CDGR becomes:
$$\mathrm{CDGR} = \left(\frac{182142}{21700}\right)^{1/38} - 1$$
This need not be daunting. Plug the numbers into a standard calculator or into a spreadsheet and you will get the answer 0.0576 or, in percentage terms, 5.76 percent.
Of course, this assumes a constant daily rate of growth; that is, it is an average over these 38 days. It may have been higher earlier on and lower more recently. It is equally simple to calculate the CDGR over the most recent five-day period. Over the five-day period between May 26 (1,45,380 cases) and May 31 (1,82,142 cases), the CDGR works out at 4.61 percent.
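If you would rather check these two figures in code than on a calculator, a short Python sketch will do. The function name is my own; the case counts and day counts are the ones quoted above.

```python
def compound_daily_growth_rate(initial, final, days):
    """Constant daily rate that takes `initial` to `final` in `days` days."""
    return (final / initial) ** (1 / days) - 1


# April 23 (21,700 cases) to May 31 (1,82,142 cases): 38 days.
overall = compound_daily_growth_rate(21_700, 182_142, 38)

# May 26 (1,45,380 cases) to May 31 (1,82,142 cases): 5 days.
recent = compound_daily_growth_rate(145_380, 182_142, 5)

print(f"38-day CDGR: {overall:.2%}")  # 5.76%
print(f"5-day CDGR:  {recent:.2%}")   # 4.61%
```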
The doubling time is the time it would take for a quantity to increase to twice its starting value assuming that it grows at a constant compound growth rate.
It is analogous to the time it would take for your investment to double in value for a given constant annual rate of return, only here we measure it in days.
The formula is:
$$T_{\text{double}} = \frac{\log 2}{\log\left(1 + r/100\right)}$$
where 'r' is the steady rate of growth expressed as a percentage.
This requires a bit more calculator work but is not at all difficult. If your investment grows at a steady 10 percent per year, then you would double it in a little over seven years.
If you don’t want to mess around with logarithms and calculators, there is a simple rule of thumb, the rule of 70, which works with remarkable accuracy:
$$T_{\text{double}} \approx \frac{70}{r}$$
where 'r' is the growth rate as a percentage.
Let’s apply this to the reported Covid-19 data from India.
If the latest growth rate is 4.61 percent per day, then I work out the doubling time to be 15.38 days by the complicated formula and 15.2 days by the simpler rule of 70. Not a bad approximation!
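The comparison between the two methods can be sketched in a few lines of Python. The function names are my own, and I am assuming the logarithm formula takes the usual form log 2 divided by log(1 + r/100), with 'r' as a percentage.

```python
import math


def doubling_time(rate_percent):
    """Exact doubling time for a constant compound growth rate given in percent."""
    return math.log(2) / math.log(1 + rate_percent / 100)


def rule_of_70(rate_percent):
    """Rule-of-thumb doubling time: 70 divided by the percentage growth rate."""
    return 70 / rate_percent


# Latest five-day CDGR for India, in percent per day.
print(f"Exact:      {doubling_time(4.61):.2f} days")  # 15.38 days
print(f"Rule of 70: {rule_of_70(4.61):.1f} days")     # 15.2 days

# The investment example: 10 percent a year doubles in a little over seven years.
print(f"10% a year: {doubling_time(10):.2f} years")   # 7.27 years
```

The shortcut does best at small growth rates. At 10 percent it gives 7 years against the exact 7.27.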
Predicting the future
“Modelling” is a big word that statisticians and consultants like to throw around but what we have just done is enough to do a simple, but not in the least inaccurate, bit of modelling.
If the latest growth rate is 4.61 percent and the doubling time is approximately 15 days, and the number of cases at the end of May is about 1,80,000, then we can predict that, unless the characteristics of the epidemic change, we will have:
3,60,000 cases by 15 June.
7,20,000 cases by 30 June.
1.44 million cases by mid-July.
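The same projection can be written as a tiny Python sketch. It uses the rounded figures from the text (1,80,000 cases at the end of May and a doubling time of 15 days) and assumes the growth rate stays constant. The function name is my own.

```python
def projected_cases(current, doubling_days, days_ahead):
    """Project the cumulative total forward, assuming a fixed doubling time."""
    return current * 2 ** (days_ahead / doubling_days)


# Rounded inputs from the text: 1,80,000 cases on May 31, doubling every 15 days.
for days_ahead in (15, 30, 45):
    cases = projected_cases(180_000, 15, days_ahead)
    print(f"{days_ahead} days after May 31: {cases:,.0f} cases")
```

Counting 15, 30, and 45 days from May 31 brings us to June 15, June 30, and July 15, which match the three figures above. Changing the doubling time in the call shows how sensitive the projection is to that one assumption.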
Modelling is simple in concept. A model is a simplified statistical representation of a real world situation used to understand it better and make predictions about how the situation will develop in the future.
The key ingredients are data and assumptions. The data that we used are the daily reported number of newly diagnosed cases. This is by no means complete; there may well be many more cases (and deaths) that we remain unaware of. But it is the best we have.
The key assumption we made is that the compound daily growth rate of the last few days will continue to prevail in the near future. The growth rate may well decline, or it may shoot up, especially since the lockdown is about to be lifted substantially. Time will tell.
Modelling is not a once-and-for-all exercise. As more data become available and the assumptions are modified, the model will change, as will our predictions. It’s an iterative process that is at best imprecise, at worst dangerously misleading.
As the famous aphorism, commonly attributed to the British statistician George Box, has it, “All models are wrong, some are useful.”
Jammi N Rao is an independent public health physician. He has worked with the ICMR, Hyderabad, and the UK National Health Service in England. He also served as a senior civil servant in the UK health department. He has particular skills in data science, medical research ethics, evaluation, and critical appraisal of clinical evidence.