The rise of data as an antivirus | Latest News India - Hindustan Times

The rise of data as an antivirus

Hindustan Times, New Delhi | By
Dec 31, 2020 05:33 AM IST

Covid-19 triggered the biggest public data-gathering exercise in history, enabling scientists and data experts to break down, analyse and study how the infection behaved, and to save lives

No one could have known back in December 2019, when the first batch of infections of Covid-19 were identified in Wuhan, China, that what was happening would change the world as we know it. Events throughout 2020 would be drastically dictated, and reshaped, by the discovery of that virus.

What transpired through the year was a global pandemic of a scale not seen since the 1918 Spanish Flu epidemic. (Getty Images)

What transpired through the year was a global pandemic of a scale not seen since the 1918 Spanish Flu epidemic. But in the 100-plus years since then, much has changed. Technology and globalisation now allow people to travel far faster, which ended up spreading the disease a lot quicker. But this advancement in technology also allowed scientists to track, in near real-time, how the virus was spreading; discover more about the virus with almost every passing day; understand which treatments were working and which weren’t; measure the impact of the outbreak on the economy; and most recently, monitor new mutations in the virus strain.


Within weeks, government-funded as well as independent data scientists kicked off large-scale tracking exercises that generated troves of real-time data, all posted online and shared for the entire world to use. This data, coupled with recent advances in computing power and new analytical study methods, would go on to play a massive role as the year progressed. High-quality, transparent and disaggregated data and statistics became the pivot around which governments across the world drafted strategies to fight the outbreak.

All this led to the biggest collaborative public data-gathering exercise in history, enabling thousands of people (scientists and amateur data experts alike) to break down, analyse and study how the virus behaved, and to find ways in which a pandemic could be stopped.


As 2020 ends, here’s a look at what a year of data told us about a new virus, and what it says about the future.


Among the several lessons from the 1918 influenza was that such massive viral outbreaks come in distinct waves, which generally tend to get more severe in each instance. The 1918 influenza rolled out in three distinct waves – the first peaked in the spring of 1918, the second in the autumn of the same year, while the third peaked in the winter of 1918 and early 1919. The second wave caused significantly more deaths than the first, while the third was marginally less fatal than the second. Scientists believe that the second wave was caused by a more virulent mutant virus, somewhat akin to the mutant strain of Covid-19 discovered in the UK earlier this month. The second wave was also exacerbated by troop movement for World War I.

Such clear waves have been visible in the coronavirus outbreak as well – although the severity has altered to a degree. Let’s take the case of the United States (by far the worst-hit in the world with nearly 20 million cases and nearly 350,000 deaths). It is the only country to have witnessed three clear waves with each reporting more cases than the one preceding it. But deaths during the second wave didn’t rise as much, while the third wave, despite being more fatal than the second, was proportionately better than the first wave (see chart 1). Most European nations are currently grappling with the second wave – which, again, was more severe than the first in terms of cases, but better in terms of deaths.

The drop in deaths in the second wave of Covid can primarily be explained by two factors. First, thanks to advances in technology, doctors and scientists were able to work at a far more frenetic pace and identify treatments far more effectively than a century ago. Within a few months of the emergence of the virus, they were able to learn a good deal about the treatments that work, and just as importantly, those that don’t. Second, thanks to advances in IT, medical experts the world over are now able to instantly share any breakthrough treatment to better protect people.


One of the most crucial data aspects of Covid-19 is the positivity rate, or the percentage of samples tested that return positive for Covid-19. This is a key metric in determining how widespread the virus is in a community, with some saying it is a clearer indicator of the intensity of an outbreak than case figures. A common thread in the outbreak across the world was that the positivity rate of a region started rising a few weeks before cases did and, conversely, started dropping before cases came under control. India’s case and positivity rate trajectories serve as a good example (see chart 2).

An HT analysis on September 28 showed that states and Union territories in India that had a lower positivity rate, and thus a well-calibrated testing strategy, saw a lower proportion of their population succumbing to the virus. This means that adequate testing, especially when the positivity rate in a region starts rising, is crucial to saving lives.

According to the World Health Organization, a positivity rate below 5% for at least two weeks in a region means the outbreak is under control. In India, the positivity rate has been below this threshold for a month-and-a-half now.
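The positivity calculation and the two-week threshold check described above can be sketched in a few lines of Python. This is only an illustration: the daily figures are invented and the function names are hypothetical. The 5% threshold and the two-week window are the only values taken from the text.

```python
def positivity_rate(positives, tests):
    """Percentage of tested samples that returned positive."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    return 100.0 * positives / tests


def under_control(daily_positives, daily_tests, threshold=5.0, window=14):
    """True if each of the most recent `window` days had positivity below `threshold`.

    The 5% / 14-day defaults mirror the criterion described in the text.
    """
    if len(daily_positives) != len(daily_tests):
        raise ValueError("both series must have the same length")
    if len(daily_tests) < window:
        return False  # not enough history to judge
    recent = zip(daily_positives[-window:], daily_tests[-window:])
    return all(positivity_rate(p, t) < threshold for p, t in recent)


# Invented data: 14 days with 1,000 tests and 30 positives each day.
tests = [1000] * 14
positives = [30] * 14
print(positivity_rate(positives[-1], tests[-1]))  # 3.0
print(under_control(positives, tests))            # True
```

A real dashboard would usually smooth the daily series (for example with a seven-day average) before applying such a rule, since daily testing volumes fluctuate.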


The number of secondary cases generated from every single primary case is one of the most crucial factors that determine how fast an epidemic grows, and subsequently how dangerous it is. One of the key characteristics of Covid-19 that epidemiologists noticed early on during its spread was how certain people ended up becoming superspreaders – they were responsible for infecting a lot more people. While instances of superspreading have been reported for more than a century, dating back to 1918, a vast volume of research data was generated this year regarding Covid-19 superspreaders.

A study published in October in the journal Science, on transmission patterns of Sars-CoV-2, showed that there was a wide variance in the number of people to whom infected people passed on the virus: while many were not infecting anyone, a handful of superspreaders were responsible for a majority of new infections. It showed that only 8% of all infected patients accounted for 60% of new infections (and thus became superspreaders), while 70% of infected patients did not pass the disease to anyone (see chart 3). The contact-tracing study, conducted by a group of researchers led by Ramanan Laxminarayan, director of the Washington-based Centre for Disease Dynamics, Economics and Policy (CDDEP), looked at disease transmission patterns in at least 575,000 people who were exposed to nearly 85,000 Covid infections in Andhra Pradesh and Tamil Nadu.
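To see how such a concentration figure is computed from contact-tracing counts, consider a minimal Python sketch. The cohort below is invented and deliberately constructed to echo the proportions quoted above. It is not the study's data, and `share_from_top` is a hypothetical helper name.

```python
def share_from_top(secondary_cases, top_fraction):
    """Fraction of all onward infections caused by the top `top_fraction` of spreaders."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(secondary_cases, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / total


# Invented cohort of 100 infected people (each number = people they infected):
# 70 infect nobody, 22 infect one or two people, 8 infect seven or eight.
cohort = [0] * 70 + [1] * 4 + [2] * 18 + [7] * 4 + [8] * 4

print(share_from_top(cohort, 0.08))     # 0.6 -> top 8% cause 60% of infections
print(cohort.count(0) / len(cohort))    # 0.7 -> 70% infect nobody
```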

Another study by researchers at Massachusetts Institute of Technology (MIT), published in early November, analysed 60 superspreaders and superspreading events to find that these have a much larger role in transmission than earlier believed, and that such examples were a lot more common than earlier estimated. The study’s lead author, Felix Wong, a post-doctoral researcher at MIT, wrote that extreme events that deviate significantly from the mean are a lot more frequent than what one would predict. “Most people generate zero or one cases, but it’s the people generating hundreds of cases that we really should be worried about,” Wong wrote.

Experts say superspreaders are the real engine behind the growth of Covid. To test how superspreading could be contained, the MIT researchers ran mathematical simulations in which gatherings were limited to 10 people, i.e. each infected person was assumed to have had only 10 contacts (and so could infect at most 10 people even in the most extreme case). This, they found, curbed the impact of the small superspreader group, bringing down infection numbers quickly.
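The intuition behind capping contacts can be illustrated with a toy simulation. The sketch below is not the MIT researchers' model: it simply assumes that secondary cases follow a heavy-tailed (Pareto-like) distribution, with an arbitrarily chosen shape parameter, and compares the average number of onward infections with and without a limit of 10. All names and parameters are illustrative assumptions.

```python
import random


def draw_secondary_cases(n, alpha=1.2, seed=0):
    """Draw n heavy-tailed secondary-case counts (assumed Pareto shape, illustrative only)."""
    rng = random.Random(seed)
    # paretovariate returns values >= 1; truncating and subtracting 1
    # makes 0 the smallest possible count, with a long tail of rare large values.
    return [int(rng.paretovariate(alpha)) - 1 for _ in range(n)]


def cap_contacts(cases, limit=10):
    """Limit every person's onward infections to at most `limit`."""
    return [min(c, limit) for c in cases]


def mean(values):
    return sum(values) / len(values)


uncapped = draw_secondary_cases(10_000)
capped = cap_contacts(uncapped, limit=10)

print("largest single spreader, uncapped:", max(uncapped))
print("mean secondary cases, uncapped:", round(mean(uncapped), 2))
print("mean secondary cases, capped at 10:", round(mean(capped), 2))
```

Because the cap only touches the rare extreme values, most people's counts are unchanged, yet the average falls whenever any draw exceeds the limit. That is the mechanism the text describes, in a much simplified form.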


One of the earliest lessons picked up by doctors and scientists about the nature of Covid-19 was how it was disproportionately fatal for those who are older, even though it infects people in the younger age groups far more. In other words, the risk of severe illness, and subsequently death, increases with age; but the risk of infection is higher among younger people.

This is very clearly visible in the death trends in the US, where eight out of 10 people who have died were above the age of 65 years, according to data maintained by the Centers for Disease Control and Prevention (CDC). The pattern is very similar in India, where nearly nine in every 10 people who lost their lives to the disease till December 16 were above the age of 45, even though only four out of every 10 reported infections fell in that age group.

The role of age also explains why countries such as India performed better at saving lives than the West. Countries with a lower median age ended up seeing a better case fatality rate (CFR) – the proportion of infected patients that die. In India, for instance, where the median age is around 28 years (the youngest among the world’s worst-hit countries), around 1.4% of those infected have died – the lowest among the countries worst-hit by Covid. In contrast, Italy, which has the oldest population among the world’s worst-hit Covid nations with a median age of 46 years, has seen the largest proportion of deaths – 3.5%. Similarly, the UK (median age: 41) and the US (median age: 38) have both fared much worse than India in CFR – 3.1% and 1.7% respectively.
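The case fatality rate is simple arithmetic: reported deaths divided by reported infections. As a rough check, the Python sketch below applies it to the rounded US figures quoted earlier in this article ("nearly 20 million cases and nearly 350,000 deaths"). Because those inputs are rounded, the result only approximates the 1.7% cited above. The function name is a hypothetical choice.

```python
def case_fatality_rate(deaths, cases):
    """CFR as a percentage: reported deaths divided by reported infections."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


# Rounded US figures from the article; the exact counts were slightly lower.
us_cfr = case_fatality_rate(350_000, 20_000_000)
print(round(us_cfr, 2))  # 1.75
```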


The final goal when battling an outbreak is to save as many lives as possible – an area where doctors and governments are improving every day. When Covid-19 surfaced, in the initial weeks, very little was understood about how the disease progressed, and a relatively larger proportion of those infected were dying.

In the first few weeks of the outbreak in China, around 17% of all people getting infected were dying (for cases reported in Wuhan between January 1 and 10). By the end of January, this number had settled at around 3% in China – the CFR number that was then widely regarded as the global average of the disease. However, when the disease gripped other parts of the world, the CFR showed massive variances. In France, for instance, one in every five people (19.7%) getting infected was dying by early May. Around the same time in Italy, around 14% of all infected people were dying. While this may be because of the relatively older population in these countries (as stated above), this high CFR was not limited to just these countries. Around the same time, over 7% of all people who had been reported infected had died the world over.

But from there, things started getting better. As scientists and doctors studied the disease more, they figured out better treatments, while the lockdowns imposed roughly from May to August in the West gave some reprieve to the health care systems in these countries. By early September, the global CFR dropped to less than half the peak level – 3.4%. In fact, if we only look at the cases reported after September across the world, this number has again halved and stands at 1.6%. The overall global CFR of the disease is 2.2% at the moment.
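These three figures fit together if the overall CFR is read as a case-weighted average of the earlier and later periods. The Python sketch below works backwards from the percentages above to the share of all cases that must have been reported in the later period. It assumes the 3.4% and 1.6% figures cover two non-overlapping sets of cases that together make up the 2.2% total, which is a simplification of how the article describes the cut-off.

```python
def later_period_share(cfr_early, cfr_late, cfr_overall):
    """Share of all cases in the later period, given period CFRs and the overall CFR.

    Solves: cfr_overall = w * cfr_late + (1 - w) * cfr_early  for w.
    """
    if cfr_early == cfr_late:
        raise ValueError("period CFRs must differ to solve for the share")
    return (cfr_early - cfr_overall) / (cfr_early - cfr_late)


share = later_period_share(cfr_early=3.4, cfr_late=1.6, cfr_overall=2.2)
print(round(share, 2))  # 0.67
```

Under this assumption, roughly two-thirds of all reported infections came in the later, less deadly period, which is why the overall figure sits closer to 1.6% than to 3.4%.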


Never before has the importance of rapidly processed, clear, open and disaggregated data and statistics been as clearly underlined as it has been this year. There has never been a stronger argument for accurate, open and readily available data at all levels – local, district, state, national, even global.

Governments across the world must push to make all Covid-19 data (testing, geographical spread, availability of hospital beds, results of clinical trials, age/gender break-ups) public. In this global crisis, data collaboration between governments, doctors, epidemiologists, researchers, academia, think tanks, civil society, as well as the private sector, has been pivotal in saving lives. Independent researchers, journalists and data scientists have set up open-sourced Covid-19 dashboards to track infection data from nearly every country in the world. Data nerds across the world (this writer included) have relied heavily on dashboards such as the Johns Hopkins University Covid-19 database, Worldometers, Our World in Data, and others, to analyse trajectories and trends.

Sharing information openly and transparently on caseloads and deaths, gaps in the health care system, vaccine development, and scientific and medical developments, is the foundation on which scientists and governments will build the machinery to mitigate this pandemic.


    Jamie Mullick works as a chief content producer at Hindustan Times. He uses data and graphics to tell his stories.
