India’s AI strategy should be to create a data infrastructure
Assembling the right kinds of data will be challenging, and requires careful planning and foresight
Hardly a week passes without yet another story about the Artificial Intelligence race between China and the US. China has allocated massive funding to AI to enable new capabilities in transportation, surveillance and weapons. China is “all in” on AI, with Chinese private funding for AI growing seven times faster than US funding over the last decade.

India has largely ignored AI. No Indian university or think-tank figures as a serious AI research entity, whether judged by citations or by other measures of innovation such as victories in prestigious AI contests, where algorithms compete on standard problems. China now increasingly features such winners and boasts two AI institutions in the top 10 globally that work closely with the government. In contrast, India lags behind in terms of vision, infrastructure and the funding required to become a major player in AI innovation. Is it too late for India to play catch-up in AI?
Interestingly, the answer is that it isn’t too late. Indeed, there could be advantages in letting others do the heavy lifting of creating the algorithms and focusing on vision and data instead. Why?
The truth is that algorithms make their way into the public domain almost instantly. For example, the various deep learning algorithms that have revolutionised image recognition and language processing are commonly available. In contrast, data, which power the algorithms, are very difficult to obtain and almost never shared. Tesla and Google don’t share their autonomous vehicle data for good reason. Knowledge springs from a combination of unique data and general-purpose algorithms, and increasingly so when cloud power can be pooled and storage is cheap. Chinese companies and the government are focused on collecting their own data, leveraging the algorithms that were developed almost entirely in North America.
In reality, data are the bottleneck: assembling them usually takes up the majority of the work in AI projects. Indeed, the surge of novel machine learning algorithms in image and language processing was driven by the elimination of the data acquisition bottleneck. The fact that raw data from autonomous vehicles or smartphones can be streamed directly into algorithms without any human input has been a huge win for machine learning and AI. The availability of gobs of clean training data enables machines to improve their decision making with each passing day, without human intervention.
If data are the bottleneck and the source of advantage, India’s focus should be on creating a data infrastructure in multiple areas. This will provide the grist for machine learning algorithms that will pay dividends in the years to come. The real win for India would seem to be in creating a data infrastructure that will improve the quality of life at an everyday level, specifically in air, transportation, water, food, governance, and education. For example, Indian urban centres are a mess: polluted and chaotic. Better air quality and transportation would mitigate many basic health and quality of life issues. Imagine the impact on health and productivity if air quality and commute times improved by an average of 10% or 20% over the next decade through more intelligent sensory and administrative systems driven by data and coupled with sensible incentive systems.
Efficiency in governance is an equally pressing challenge in terms of the capacity of the State to provide basic services such as law and order, transportation and power. Surveillance technology offers huge promise in helping deploy scarce human resources “on demand” without sacrificing privacy or freedom at the individual level. If we can define sensible metrics such as crime reduction, problem resolution times, etc., progress will be well defined and measurable.
The good news is that India boasts the world’s biggest success in identity and real-time authentication through the Aadhaar platform. It would be fruitful for India to replicate this infrastructure success by creating similar data platforms that improve the lives of its people.
China was quick to copy the algorithms developed in North America and apply them to its own data in military and commercial applications. It has a draconian policy towards data ownership that ignores privacy and human rights concerns. That model will not work in India, but nor is it necessary to make progress towards what matters to India. Assembling the right kinds of data will be challenging, and requires careful planning and foresight. That is the place to start, and there is little time to waste.
Vasant Dhar is Professor at the NYU Center for Data Science and the Stern School of Business. His research focuses on when we should trust data-driven learning machines with decisions.
The views expressed are personal
