Preserve the integrity of India’s data ecosystem
India’s data ecosystem needs saving, but first, we need to convince enough people that it is worth saving, that the questions it answers are vital, and the people it counts deserve to be heard
Data journalism is still new enough in India for a data journalist to be an object of curiosity, and I get a lot of questions when I tell people what my job is. People also have strong views about data, many of these quite far from the truth. Over time, I have become accustomed to three fundamental things that people get wrong about India’s data landscape.
The first thing that many assume is that there is very little data available in India. There is good reason for casual observers to feel this way — important household surveys such as the Census and the National Sample Surveys are conducted with significant time gaps; opinion polling around non-election issues is still nascent, opaque and unreliable; and on several key issues (as was exposed during the pandemic), simply no data is available.
But the fact is that India’s official statistical architecture is impressive, pedigreed and wide-ranging. And newer private household surveys such as the Indian Human Development Survey conducted by the National Council for Applied Economic Research and the University of Maryland, the Lok Surveys conducted by the Lok Foundation, University of Oxford and the Centre for Monitoring Indian Economy (CMIE), as well as CMIE’s Consumer Pyramid Household Survey have filled some of the gaps in our knowledge. This data allowed me to answer questions both big and small in my new book, Whole Numbers And Half Truths: What Data Can and Cannot Tell Us About Modern India — from how much money India’s middle class really makes, to how much fish Bengalis really eat.
Privately conducted surveys, in particular, come with potential sampling biases, but a robust public debate around these issues is already on, and this will only make the surveys better. Indian data is not always easy to access or interpret, but for many important questions about the way India operates, we do have data that can help us make sense of what we are seeing, and not have to rely on pre-fabricated narratives.
In the case of Covid-19 mortality, for instance, official data on Covid deaths was not reliable, and estimates of deaths from all causes were not made public. But journalists were able to extract this data state by state to present a more accurate picture.
The second misconception is the theory that data — official data in particular — does not capture the “real India”. This is not a purely academic discussion. In February 2019, for instance, Prime Minister (PM) Narendra Modi, just a month away from launching his re-election campaign, stood before Parliament to talk about jobs. Just a week earlier, Business Standard had leaked a bombshell report that the government had been suppressing data — India’s unemployment had hit an all-time record high, the data showed. In his speech, the PM claimed, instead, that until now, employment had been estimated by capturing jobs in only seven to eight sectors, but the world had changed. App-based taxi aggregators had sprung up, he said — were they driverless cars? Millions of young people had taken government loans to start new enterprises, but they are not captured by job data, he added. There were loud cheers from the treasury benches.
But this isn’t actually true. By looking closely at the National Statistical Office’s employment data, I show that these jobs were indeed captured. Which isn’t to say that official statistics are faultless. For instance, the fact that India’s consumption expenditure data diverges significantly from Gross Domestic Product (GDP) estimates has potential explanations, but isn’t taken seriously enough by many on the Left, leaving Right-leaning economists the space to argue that consumption data be thrown out altogether. Both sides need to speak to each other more, or we risk official data losing some credibility.
The third misapprehension among people is that Indian data must be “fudged” and manipulated. Once again, it is not my case that Indian data is perfect or unfalsifiable. However, conversations with insiders have assured me that it is difficult to manipulate official data and the instances where data has raised eyebrows have been rare enough to force big public debates.
This is not to say that it can never happen — in fact, both the internal integrity and the external credibility of India’s statistical architecture are being systematically undermined, and two dangerous processes have begun.
The first is the seeding of the pernicious narrative that official surveys are outdated and administrative data, which is limited and more closely controlled, provides better answers instead. The second is more ham-handed — simply delaying, and in one history-making episode, suppressing, data that is inconvenient by belatedly calling it flawed.
There are mechanisms that could give Indian statistics greater institutional independence, but this needs strong democratic pressure. India’s data ecosystem needs saving, but first, we need to convince enough people that it is worth saving, that the questions it answers are vital, and the people it counts deserve to be heard.
Rukmini S is a Chennai-based data journalist. Her book, Whole Numbers And Half Truths: What Data Can and Cannot Tell Us About Modern India, will be published on December 6
The views expressed are personal