Towards AI-ready data for Bharat

Authored by - Saurabh Garg, secretary, Ministry of Statistics and Programme Implementation, and Shalini Kapoor, chief strategist, Data & AI, EkStep Foundation.

Published on: May 13, 2026, 11:55:45 IST

India's digital transformation has made government data more accessible than ever. The next frontier is making this data work seamlessly with AI systems that citizens and businesses increasingly rely on for information and guidance.

Critical information for policymaking and for improving citizens' ease of living often resides within government databases, regulatory portals, certification systems, and official statistical repositories. This information is precise, structured, and authoritative, yet it remains largely inaccessible to analytical and modern AI tools. AI systems can generate fluent answers from scraped internet content, but by design they are not connected to government databases.

As a result, when citizens, policymakers, or entrepreneurs use AI tools to seek guidance on regulatory and scheme-related requirements, such as government certification or eligibility for government schemes, the AI produces informed approximations rather than definitive answers. This is not because the knowledge is absent, but because the AI cannot reach the systems where that knowledge resides.

Citizens and enterprises alike are increasingly turning to AI to explore such information, and this usage is expected to grow as awareness of AI's capabilities expands. The need for AI-ready data is therefore more urgent than ever.

India generates enormous volumes of data across agriculture, commerce, health, education, and other citizen services.

The National Statistics Office's eSankhyiki portal alone holds 135 million records spanning socio-economic indicators such as GDP, industrial production, consumer prices, and labour force surveys. Through its Digital India initiatives, the Government has already taken significant steps to digitise public services and improve access to schemes and regulatory information through portals such as UMANG and MyScheme, generating information at a massive scale.

While these efforts have made information more accessible to citizens and businesses, the data remains dispersed across multiple platforms and departments. The next step is to extend this remarkable digital transformation to AI systems by making our information portals interoperable for them, enabling a simple natural-language query to retrieve accurate, official answers directly from these government sources and provide citizens and businesses with precise guidance in real time.

Data exists across multiple specialised systems and portals, each designed to serve specific purposes and audiences. A wealth of resources is available across departments, documents, and datasets, each offering unique insights, though often in different formats and contexts. The need is to make them AI-ready. Connecting these resources effectively could empower citizens and businesses with a more complete and actionable view of the information available. AI has already proven it can operate at this scale, processing vast datasets, converting unstructured documents into usable formats, and answering complex queries in natural language.

The opportunity lies in applying this technology to government data. By bridging the gap between AI tools and the rich information already available, we can unlock economic value that compounds across millions of users, including enterprises.

Reducing information friction benefits citizens and businesses alike, but can be especially transformative for small businesses and those at the fringe, for whom access to timely intelligence and regulatory guidance can open opportunities that were previously harder to reach. One of the best use cases can be in the MSME sector.

When these enterprises can efficiently access market intelligence, compliance requirements, and scheme eligibility, they compete more effectively in global markets, optimise their operations, and scale faster. Manufacturing competitiveness strengthens, employment expands across cities and towns, and supply chains become more resilient. The multiplier effect of better information access across this vast enterprise base compounds into significant economic acceleration. With such enablement, the MSME sector can truly act as a champion as envisaged in this year's budget.

All this is possible only once we have AI-ready data; AI cannot compensate for weak data foundations. NSO India, the nodal agency for official statistics for the Government of India, has taken several steps towards data harmonisation, such as prescribing the National Meta Data Structure and the Statistical Quality Assurance Framework, and compiling the unique identifiers, codes, and classifications to be used while creating the schematic data layer for interoperability.

Data access is provided through a digital bouquet for data dissemination consisting of a website, the eSankhyiki portal, a mobile app, a microdata portal, and a metadata portal. These applications provide government data in an interoperable manner through APIs, for consumption by both humans and machines. Grounded in these standards, NSO India's data is AI-ready.

On February 6, 2026, the National Statistics Office, India took a significant step forward by launching a Model Context Protocol server that places an open technology layer over the eSankhyiki repository. This infrastructure allows AI systems to connect directly to datasets on eSankhyiki and query them programmatically. Anyone can now ask an AI assistant about pricing trends, employment patterns, or industrial output and receive answers synthesized from official statistics in real time.
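To make the idea of querying "programmatically" concrete, here is a minimal Python sketch of the kind of message an AI client might exchange with a Model Context Protocol server. MCP messages follow JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. The tool name `get_indicator` and its arguments are hypothetical, chosen only for illustration; the article does not describe the actual tools exposed by the eSankhyiki server, and the sketch does not contact any server.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialise an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


def extract_text(response_json: str) -> str:
    """Join the text parts of a tool result; raise if an error was reported."""
    response = json.loads(response_json)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return "\n".join(
        part["text"]
        for part in result.get("content", [])
        if part.get("type") == "text"
    )


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call("get_indicator", {"dataset": "CPI", "year": 2025})
```

In practice an AI assistant would send such a request over the transport the server supports and pass the returned text to the model, so that the answer is grounded in the official figures rather than in scraped content.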

Market analysis that previously demanded days of specialised expertise becomes accessible through natural language queries. Use cases ranging from dashboard creation to deep-dive data analysis began pouring in from users immediately. This represents a fundamental shift in how government data serves the economy.

In an AI-powered world, data needs to be more than published and organised. It needs to be trusted, interoperable, contextual, and machine-usable. Users need to know where data comes from and whether they can rely on it. Privacy protections need to be built into the architecture, not added as an afterthought. Governance frameworks need to define legitimate use explicitly. Quality assurance needs to be continuous, flagging inconsistencies before they propagate through AI systems.

These elements determine whether AI generates reliable outputs and builds public confidence. The NSO implementation incorporates these safeguards by design. The MCP server provides access to verified official statistics through controlled protocols while data remains with its authoritative source. When datasets are standardised, consented, and discoverable through common APIs, they become an AI-ready foundation that any model or application can safely build on.

The initial deployment covers seven datasets: Periodic Labour Force Survey, Consumer Price Index, Annual Survey of Industries, Index of Industrial Production, National Accounts Statistics, Wholesale Price Index, and Environmental Statistics, with more to follow. Each can be queried individually or in combination.

Consider what this makes possible. An MSME exploring new markets can analyse employment trends and industrial capacity in target regions, all through a single conversation with an AI assistant. A researcher studying regional economic patterns can combine industrial output data with labour force surveys without manually reconciling different formats and time periods. The infrastructure scales through progressive integration. When more official data sources join the network, queries can integrate them with other datasets. Each new connection expands and multiplies the analytical possibilities across all existing sources, enabling deeper insights and more comprehensive decision-making.
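The researcher's example rests on datasets sharing common codes and time periods, so that records can be matched mechanically. A minimal Python sketch of that matching step follows. The column names (`region`, `year`, `output_index`, `participation_rate`) and every value are made-up placeholders, not official statistics, and the sketch assumes each key combination appears at most once in the second dataset.

```python
from typing import Iterable

Record = dict[str, object]


def join_on_keys(
    left: Iterable[Record], right: Iterable[Record], keys: tuple[str, ...]
) -> list[Record]:
    """Inner-join two record sets on the given shared key columns."""
    # Index the right-hand records by their key values for direct lookup.
    index: dict[tuple, Record] = {}
    for row in right:
        index[tuple(row[k] for k in keys)] = row

    joined: list[Record] = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            # Merge the two records; key columns are identical on both sides.
            joined.append({**row, **match})
    return joined


# Placeholder rows for illustration only; not real statistics.
industrial_output = [
    {"region": "Region A", "year": 2024, "output_index": 100.0},
    {"region": "Region B", "year": 2024, "output_index": 110.0},
]
labour_force = [
    {"region": "Region A", "year": 2024, "participation_rate": 50.0},
]

combined = join_on_keys(industrial_output, labour_force, ("region", "year"))
```

Only rows present in both inputs survive the join, which is why harmonised identifiers and classifications matter: without a shared key, there is nothing to match on.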

MoSPI's data harmonisation and dataset-linking initiatives are building the foundations for a data-ready future. Government datasets can become accessible foundations that any developer, researcher, or enterprise can build on. Small businesses gain market intelligence previously affordable only to large corporations. Policymakers receive real-time feedback. Researchers combine datasets that were previously incompatible.

As more datasets join the network, economic returns will compound with adoption. When data becomes a horizontal capability that cuts across sectors, innovations stop being trapped in silos.

This approach recognises a fundamental principle: those who generate data should benefit from it. India's competitive advantage in the AI era will come from mobilising diverse, contextual, high-quality data responsibly.

The question is no longer whether data can create economic value. The question is whether we design systems where that value is shared widely enough to power both prosperity and public good. AI-ready data infrastructure, built with trust and safeguards as its foundation, transforms scattered information into accessible intelligence.

This is how India builds the data foundation that powers AI for economic growth at population scale en route to Viksit Bharat.

(The views expressed are personal)
