2026-02-17

Deploying AI for the people must cut through the hype so that “dashboard success” does not become “welfare failure,” the head of one of the world’s foremost economic research bodies has said, warning that scaling unproven artificial intelligence solutions can cause active harm, not just waste money.

As policymakers, researchers and tech leaders gather for “AI for Social Good” discussions on the second day of the India AI Impact Summit in New Delhi on Tuesday, Iqbal Dhaliwal, global executive director of the Abdul Latif Jameel Poverty Action Lab (J-PAL), said the current obsession with app downloads and chat logs threatens to obscure the ultimate goal: improving human lives. Worse, it risks entrenching inequality, eroding public trust in technology, and displacing critical human judgment.

“Engagement is important ... but those are necessary conditions — they are not sufficient conditions,” Dhaliwal told HT on Monday. “The sufficient condition has not changed ... [it is] impact.”

J-PAL on Tuesday launched the AI Evidence Playbook — a manual to guide government officials on deploying AI responsibly. The compendium distils lessons from global research to align AI adoption with rigorous evidence.

The “engagement trap”

Dhaliwal’s caution is rooted in what he calls the “engagement trap” — where high usage metrics mask failure to deliver real-world benefits. He cited a WhatsApp-based AI chatbot for Kenyan entrepreneurs that saw massive engagement, with 85% of users interacting with it.

“Great engagement ... but profits or revenue did not seem to go up,” Dhaliwal noted. “Knowing people are chatting with a bot ... doesn’t demonstrate impacts on real-world outcomes.”

Beyond measuring whether AI works, Dhaliwal emphasised assessing cost-effectiveness and distributional impacts. “Is it effective? Number two, is it cost-effective?” he asked. “Ultimately everybody’s offering these things for free in the beginning, but data and queries will cost money and have environmental implications.”

Equally important is asking who benefits. “Is it benefiting everybody or is the average impact being determined by some people?” he said, noting that results might be driven by businesses that were already performing well rather than by those most in need.

Development sector déjà vu

Dhaliwal framed the current AI hype within a familiar pattern of development sector disappointments. “This has been the trap in the development sector all along,” he said, citing microfinance and “One Laptop Per Child”.

“Microfinance was supposed to solve all our problems — empower the woman, increase her livelihood, reduce health shocks,” he recalled. “We found that microfinance does a few things good. Maybe it increases income, but it’s not automatically going to lead to other things.”

Similarly, One Laptop Per Child was meant to improve education and help children access social benefit programmes. “It just doesn’t work because the backward and forward linkages of how it is going to happen in the field are not going to happen,” he said.

The danger is not just wasted money but active harm. Scaling ineffective solutions can entrench inequality or displace critical human judgment. He singled out mental health apps as particularly concerning, warning against replacing counsellors with algorithms simply because counsellors are in short supply.

“Instead of the doctor, now it’s a one-on-one relationship between AI and that patient. This could go south very quickly,” he said.

While a top doctor at AIIMS might filter out “hallucinations” from an AI tool, he said, a nurse at a primary health centre in Bihar or Uttar Pradesh might feel compelled to follow “advice given by the department”, potentially causing harm if the AI is wrong.

The risks extend to eroding public trust in technology. Dhaliwal illustrated this with agricultural extension workers, traditionally scarce but valuable advisers who would visit farms, recommend treatments and, critically, return to check the results.

“The ag extension worker would go back after a week and say ‘kuch farak pada?’ (did it make a difference?),” Dhaliwal explained. “If the farmer says ‘ye to aur kharab ho gaya’ (this got even worse), he would quickly amend it.”

Now imagine an AI app giving wrong advice through photo-based diagnosis. “What happens if you give them wrong advice [and] something bad happens? What about trust in the technology? Will they ever come back to you or not?”

India as evidence capital

Despite the caution, Dhaliwal sees India as uniquely positioned to solve this global puzzle. While the West struggles with fragmented legacy systems, India’s Digital Public Infrastructure (DPI), including Aadhaar and UPI, has, he said, made it the “AI application capital of the world”.

“The difference of coming to India and seeing the DPI stack at work versus going to another low and middle-income country is phenomenal,” Dhaliwal said. “The digitisation and data available in India is off the chart.”

This digital backbone allows India to run rapid, low-cost randomised controlled trials (RCTs) to test whether AI tools actually work.

Speed vs rigour

Dhaliwal acknowledged that the tension between rigorous evaluation and political timelines is not new. “In my 16 years at J-PAL now, what has been the common question? ‘This is going to take too long. We need to implement it right now,’” he said.

The pressure comes from multiple sources: upcoming elections, bureaucratic transfers (officials wanting results before their 18-month postings end), and budgetary cycles that require funds to be spent by March 31. “And underlying all of this is also a desire to do well,” he noted.

His response: “Slow it down not because we don’t care about the poor, but precisely because we care about the poor and actually want them to have an outcome.”

But he also pushed back on the premise that evaluation has to be slow. Thanks to India’s digital infrastructure, evaluations are now “much faster”, he said. “We can capture data about health admissions, educational outcomes, farmers’ productivity much more now digitally,” he said, which means preliminary results arrive sooner.

Augmentation, not replacement

Dhaliwal argued that the imperative in the Global South is to deploy AI to save humans from drudgery, not to replace them.

He pointed to “Letrus”, an AI grading tool in Brazil that handles mechanical work such as checking spelling and grammar for standardised tests, freeing teachers to focus on mentorship. “It makes the role of a human grader redundant ... [freeing] the teacher to sit down with you ... and think about the analytical way in which you approached this problem,” he explained.

Similarly, AI can correct human bias in tax collection. In Senegal, in neighbourhoods comparable to Delhi’s Vasant Kunj, human tax assessors often “under-assess” ultra-luxury homes because they cannot conceive of the cost of high-end finishes.

The bureaucrat’s checklist

Dhaliwal, a former IAS officer, offered a pragmatic “checklist” for district magistrates and secretaries at the summit. He said he would demand answers to three questions before signing any AI contract.

First, the theory of change: “Is our theory of change that the teacher is redundant, or... that the teacher is doing things which are useless and can be automated for them to focus on more analytical side of teaching?”

Second, the training data: “Did they download training data from the US or is the training data coming from India? Because that will determine completely, especially in health outcomes ... We have suffered for many years because a lot of these trials happened in the West.”

Finally, field robustness: “How will this work in the field? I already told you examples of machines which will fall down ... internet and electricity which keeps going away.”

“I would ask all of these questions very rigorously,” Dhaliwal concluded. “And after that I would say ... ‘Let’s pilot it for three, four, five months. Let’s test the hell out of it ... and if yes, then let’s go for it.’”

Dhaliwal clarified that rigorous randomised trials are not needed for every application. For apps that handle routine queries such as billing questions, simple A/B testing, using post-interaction ratings or tracking whether users call back, is enough.