From missing data to unreliable numbers, India’s statistical ecosystem needs an overhaul
The data that does exist is sometimes unreliable but is used anyway because there is no alternative. Several important data sets are released with a huge time lag. Others are missing granular district-level estimates.
In July, in front of a roomful of policy wonks, government officials and journalists, Union health secretary CK Mishra made an honest acknowledgement — there are serious problems with India’s public health statistics.
For one, he said, data from the latest round of the National Family Health Survey (NFHS-4) — the major source for detailed health statistics in India, conducted under the aegis of the ministry of health and family welfare (MoHFW) itself — is unreliable for certain states.
On top of that, the Health Management Information System (HMIS), which Mishra called “a data mine”, is not effectively used. “We use very little of it in the planning process” due to lack of expertise to read and understand the data, he said.
The health secretary’s statement raises concerns: how can the country formulate evidence-based policy or plan wisely for the future without credible data? And Mishra, a 34-year veteran of the Indian Administrative Service who was appointed to head the MoHFW last year, is not alone. A recent paper by the Health Team of the National Institute of Public Finance and Policy, New Delhi, found that the country’s health data was unreliable, irregularly published, and failed to cover a broad-enough population.
PROBLEMS GALORE
And such problems are not restricted to the health sector alone. The entire Indian data ecosystem needs improvement. Former RBI governor Duvvuri Subbarao has stated that monetary policy decisions often go astray because of erroneous data provided by the government. The debate on the reliability of India’s macroeconomic data, GDP and IIP numbers, for instance, remains unsettled. At a time when unemployment — or rather, underemployment — is a key socio-economic concern, economists cannot measure the problem’s magnitude because they do not have credible figures and surveys. India’s agricultural statistics have also come under the scanner. Talk about crime, and all you have is aggregated data from FIRs — no official crime victimisation surveys have been instituted yet.
To be sure, every data set comes with caveats that must be considered when making interpretations. But some failings appear to be a standard characteristic of Indian data sets.
To begin with, there isn’t enough data. The data that does exist is sometimes unreliable but is used anyway because there is no alternative. Several important data sets are released with a huge time lag. Others are missing granular district-level estimates. If such estimates are present, they are not always used for policy making or governance. And even when data sets are good and people want to use them, there may be too few who understand how to work with them, as Mishra said about HMIS.
Taken together, these shortcomings amount to an Indian statistical ecosystem that falls short of the needs of the world’s largest democracy.
MODES OF DATA COLLECTION
There are two major modes of data collection: administrative, which refers to data collected as a result of an organisation’s daily operations (think of patient registrations at a hospital or new accounts opened at a bank); and surveys, which are based on how a part of a population (what statisticians call a ‘sample’) responds to a set of questions.
PC Mahalanobis, the statistician credited with laying the foundations of the data systems of independent India, “focused on creating credible data sets from representative sample surveys,” says a Mint essay which traced the history of the Indian statistical system.
But Mahalanobis’s preference for surveys came at the expense of data collection at the administrative level, the essay argued, and may have undermined the government’s ability to collect regular, reliable data.
“Instead of being sparingly used for purposes where there was no alternative to sampling, sampling became the first choice of technique for collecting data.”
Sometimes, surveys are the only way to capture data. Economic statistics, for example, cannot be collected at the administrative level because of the huge size of the Indian economy’s informal sector, which employs around 90% of the country’s workforce, says Pronab Sen, former chief statistician of India.
Yet India faces challenges to conducting good surveys that simply do not exist in much of the rest of the world, says Sen: a population of more than a billion people, relatively high rates of illiteracy, and dependence on the informal economy.
VACANCY ISSUES
The government also employs too few people to carry out regular and robust surveys. In the National Sample Survey Office’s (NSSO) field operations division, which is responsible for collecting primary socio-economic data, around 24% of the posts of junior and senior statistical officers are vacant.
The NSSO’s critics do not realise how hard it is to undertake actual data collection on the ground, Sonalde Desai, professor of sociology at the University of Maryland who also conducts the India Human Development Survey (IHDS), said in an email. Without adequate internal staff, the agency must contract with outside agencies.
“This is what both IHDS and NFHS do, and only we know how difficult it is to maintain quality. Some of the agencies we work with are fantastic, and some are struggling themselves. This requires enormous supervision, and if one slips there, the data can be highly questionable,” Desai said.
“This hit-and-miss approach is not acceptable for data that form the core of our policy-building process.”
Experts say that technology can be leveraged to improve data collection systems. Private data collection agencies are already making use of apps and tools to conduct surveys electronically, rather than on paper. But that comes with its own challenges. Richa Verma, who leads the research and analysis team at Social Cops, a data intelligence company, says that better design is key to making it easier for people to adopt technology.
While working with the government and various non-profits, Verma found that many of their trainees had never used a smartphone. Data collection technology must be made simple, and appropriate training must be provided, so that anyone can learn to use it.