Building trust in self-governing AI

This article is authored by Sudeepta Veerapaneni, partner and chief innovation officer, Deloitte India.

Published on: Feb 10, 2026 12:05 PM IST

Five years into India’s AI surge, leaders have learned that scaling AI is anything but effortless - it’s complex, gritty, and unforgiving when controls are weak. Behind the headlines of breakthrough models lies a hard truth: success depends less on algorithms and more on governance, trust, and operational discipline. While agentic AI systems promise speed and scalability, they also introduce new risks: opaque decision-making, ethical blind spots, and accountability gaps. The question isn’t whether businesses should embrace agentic AI - it’s how they can do so responsibly, embedding trust at every layer.


A recent analysis found that when global AI models were tested with India-specific prompts, they sometimes reflected unintended cultural or contextual biases. Similarly, evaluations using the Indian-BhED dataset have shown that large language models (LLMs) may sometimes produce responses that reflect common patterns or assumptions present in local data. Both these examples underscore the importance of rigorous local auditing and customisation to ensure that AI systems operate fairly and accurately across diverse environments.

The rapid rise of agentic AI is also ushering in a suite of new threats that businesses cannot overlook.

First, model drift and unintended behaviour are becoming more common as autonomous systems update themselves from live data streams. A June 2024 incident at a major Indian payments platform saw its self-optimising fraud-detection engine begin flagging legitimate transactions from small merchants, causing a temporary freeze of ₹2 billion in sales.

Second, privacy-by-design is being tested in real time. During the early-2024 public beta of a new generative-AI model, the system inadvertently cached snippets of users’ health queries, prompting an investigation by the European Data Protection Board and a swift rollout of an automated “audit-log” feature.

Third, bias is surfacing in sectors beyond language. A 2024 audit of a regional insurance-tech startup revealed that its AI-driven claim-approval workflow rejected about 68% of policies from Tier-2 and Tier-3 districts, a pattern traced to training data that over-represented urban loss ratios.

Fourth, the weaponisation of generative tools is moving from experimental labs to everyday attacks. In early 2024, a deep-fake scam targeted a Hong Kong-based multinational, which lost roughly US$25.6 million after fraudsters used AI-generated voice and video to impersonate senior executives and trick employees into transferring funds.

Finally, regulatory pressure is tightening. The RBI’s Digital Lending Directions now require an explicit human-in-the-loop checkpoint for any AI-driven credit decision, and the US SEC has begun flagging AI-generated disclosures that lack traceability.

Together, these examples illustrate that autonomy without transparent guardrails can quickly translate into operational, reputational, and compliance fallout, making a “glass-box” approach to AI governance essential for sustainable growth.

Designing trustworthy, agentic AI hinges on three sharp principles that are already being turned into practice by leading firms.

First, transparent, auditable AI begins with a decision-by-decision log that captures the input data, model version, and a concise business rationale for every inference. Hallucinations - instances where the model confidently generates false information - remain the biggest obstacle to moving from flashy demos to dependable corporate tools. To overcome this, engineers now employ a three-layer defence: grounding the model with retrieval-augmented generation, guiding its reasoning through structured prompts and temperature control, and governing output with guardrails and dedicated “critic” agents.

* Grounding: Retrieval-augmented generation (RAG): The most effective way to stop hallucinations is to stop the AI from relying on its memory. Instead of letting the AI “guess” based on its training data, you give it an “open-book” exam with a strict rule: when a user asks a question, the system first searches your private, verified documents (PDFs, wikis, databases) and requires the model to answer only from what it retrieves. It forces the AI to act as a librarian rather than a storyteller.
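The grounding step can be sketched in a few lines of Python. The document store, the keyword-overlap scoring, and the prompt wording below are all illustrative assumptions; a production RAG system would use a vector database and an embedding model, but the “answer only from the context” contract is the same.

```python
# Minimal RAG sketch: retrieve verified text first, then constrain the
# model to answer only from that context. All names here are illustrative.

def retrieve(query: str, documents: dict[str, str], top_k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        documents.items(),
        key=lambda kv: len(q_terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:top_k]]

def build_grounded_prompt(query: str, documents: dict[str, str]) -> str:
    """Force a grounded answer: the model may use only the retrieved context."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = {
    "policy.pdf": "Refunds are processed within 7 business days.",
    "faq.md": "Support is available Monday to Friday.",
}
prompt = build_grounded_prompt("How long do refunds take?", docs)
```

The prompt that reaches the model now contains the verified passage and an explicit refusal instruction, so a hallucinated answer has nowhere to hide.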

* Guiding: Reasoning frameworks: Sometimes AI hallucinates because it jumps to an answer too fast. Specific prompting techniques can slow it down:

* Chain-of-thought (CoT): You force the AI to explain its step-by-step logic before giving the final answer. If the logic is flawed, the error is easier to spot.

* Temperature control: In technical settings, you set the “temperature” to 0. This makes the AI less “creative” and more likely to give the same factual response every time. One example is Deloitte’s Tax Pragya, an AI-led search and summarisation platform for India’s tax professionals built as a “glass-box”: it is trained on more than 1.2 million tax cases and over 5,000 Deloitte technical papers, solutions, and proprietary expert insights to deliver near-zero hallucinations.
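The two guiding levers above, a chain-of-thought template and deterministic decoding, can be sketched as plain Python. The request dict merely mirrors the general shape of common chat-completion APIs; the field names and threshold values are assumptions, not any specific vendor’s SDK.

```python
# Sketch of the "guiding" layer: a chain-of-thought prompt template plus
# deterministic decoding settings. Field names are illustrative only.

def cot_prompt(question: str) -> str:
    """Ask the model to show step-by-step reasoning before answering."""
    return (
        f"{question}\n\n"
        "Think through this step by step, numbering each step. "
        "Only after listing the steps, state the final answer on a line "
        "beginning with 'Answer:'."
    )

def build_request(question: str) -> dict:
    """Deterministic settings: temperature 0 suppresses sampling 'creativity'."""
    return {
        "messages": [{"role": "user", "content": cot_prompt(question)}],
        "temperature": 0,  # always prefer the most likely continuation
        "top_p": 1,
    }

request = build_request("Is this expense deductible under the cited section?")
```

Because the reasoning steps are emitted before the answer, a reviewer (human or critic agent) can spot a flawed step instead of only seeing a wrong conclusion.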

* Governing: Guardrails and “guardian agents”: Modern systems use a second AI to “police” the first one.

* NeMo Guardrails / Llama Guard: These are separate, smaller models that sit between the AI and the user. They scan the AI’s output for toxic language, policy violations, or “hallucination patterns” before the user ever sees the text.

* The “critic” agent: In agentic workflows, you often pair two agents: one AI generates the response (the generator), while another fact-checks it (the critic) against trusted data sources.
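The generator/critic pattern can be sketched as follows. Both “agents” here are stand-in functions with hard-coded behaviour (in a real system each would be an LLM call), and the trusted-facts store is an illustrative assumption.

```python
# Generator/critic sketch: a second agent verifies the first agent's claim
# against a trusted source before anything reaches the user. The facts
# store and both agent functions are illustrative stand-ins.

TRUSTED_FACTS = {"gst_rate_restaurants": "5%"}  # hypothetical knowledge base

def generator(question: str) -> dict:
    """Stand-in for the answering model: returns an answer and the claim it rests on."""
    return {
        "answer": "The GST rate for restaurants is 5%.",
        "claim": ("gst_rate_restaurants", "5%"),
    }

def critic(draft: dict) -> dict:
    """Fact-check the draft's claim against the trusted store."""
    key, value = draft["claim"]
    return {"answer": draft["answer"],
            "verified": TRUSTED_FACTS.get(key) == value}

def answer(question: str) -> str:
    """Only verified answers pass; unverified ones are escalated, not shown."""
    checked = critic(generator(question))
    if not checked["verified"]:
        return "Escalated to a human reviewer: claim could not be verified."
    return checked["answer"]
```

The key design choice is that the critic sits on the output path: an unverified claim is escalated rather than silently delivered.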

Second, human-in-the-loop safeguards act as a critical safety net for high-stakes decisions. When an AI’s confidence score drops below a set threshold, the process pauses and routes the decision to an expert. For example, RBI-compliant fintechs send low-confidence credit scores to senior loan officers, while health-tech tools require a radiologist’s sign-off before critical imaging results reach patients. Each intervention is logged with the AI output, creating a full audit trail that ensures compliance without sacrificing speed or scalability.
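A minimal sketch of this confidence-threshold routing, including the audit trail the paragraph describes. The threshold value, field names, and case identifiers are illustrative assumptions.

```python
# Human-in-the-loop routing sketch: decisions below a confidence threshold
# pause and queue for an expert; every outcome is logged with the model
# output so the audit trail is complete. All values are illustrative.

from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []
REVIEW_QUEUE: list[dict] = []
CONFIDENCE_THRESHOLD = 0.85  # assumed policy threshold

def route_decision(case_id: str, model_output: str, confidence: float) -> str:
    """Auto-approve confident decisions; route the rest to a human expert."""
    status = ("auto_approved" if confidence >= CONFIDENCE_THRESHOLD
              else "pending_review")
    entry = {
        "case_id": case_id,
        "model_output": model_output,
        "confidence": confidence,
        "status": status,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    AUDIT_LOG.append(entry)            # every decision is logged, both paths
    if status == "pending_review":
        REVIEW_QUEUE.append(entry)     # paused until an expert signs off
    return status

status = route_decision("loan-001", "recommend approval", 0.62)
```

Because both branches write to the same log, regulators can trace every automated and escalated decision from one audit trail.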

Third, continuous risk monitoring and drift control serve as an early warning system for autonomous AI. Real-time dashboards track key metrics against approved baselines, while automated alerts and rollbacks kick in when thresholds are breached - preventing large-scale errors. Each alert includes root-cause data, helping analysts quickly identify issues like feature changes or shifts in user behaviour and restore stability.

An e-commerce giant’s fraud-detection engine spotted an abrupt rise in legitimate orders being flagged as fraud, triggered an alert within minutes, and automatically rolled the model back to the previous stable version. Analysts then identified a misaligned feature and corrected it, restoring normal transaction flow and protecting merchants’ sales.
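The alert-and-rollback loop in this example can be sketched as a small monitor class. The baseline metric, deviation threshold, and version labels are illustrative assumptions, not any particular platform’s figures.

```python
# Drift-monitoring sketch: compare a live metric (false-positive rate)
# against an approved baseline and roll back to the last stable model
# version when the breach threshold is exceeded. Numbers are illustrative.

class DriftMonitor:
    def __init__(self, baseline: float, max_deviation: float, stable_version: str):
        self.baseline = baseline            # approved false-positive rate
        self.max_deviation = max_deviation  # allowed absolute drift
        self.active_version = stable_version
        self.stable_version = stable_version
        self.alerts: list[str] = []

    def deploy(self, version: str) -> None:
        """Put a new model version live (the previous stable one is kept)."""
        self.active_version = version

    def observe(self, false_positive_rate: float) -> None:
        """Alert and auto-rollback when the metric breaches its threshold."""
        if abs(false_positive_rate - self.baseline) > self.max_deviation:
            self.alerts.append(
                f"FPR {false_positive_rate:.2f} breached baseline "
                f"{self.baseline:.2f}; rolling back {self.active_version} "
                f"-> {self.stable_version}"
            )
            self.active_version = self.stable_version
        else:
            self.stable_version = self.active_version  # promote healthy version

monitor = DriftMonitor(baseline=0.02, max_deviation=0.03, stable_version="v1")
monitor.deploy("v2")
monitor.observe(0.18)  # fraud flags spike on legitimate orders
```

The alert message carries the breached metric and version pair, giving analysts the root-cause starting point the paragraph describes.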

Overseeing all of these safeguards is an AI ethics board: a body that reviews every autonomous system before it is fielded, sets clear policy standards, runs regular bias and risk audits, and updates the governance framework as models evolve. By coupling transparent logging, human-in-the-loop checks, and a dedicated ethics committee, organisations can ensure robust oversight.

One such exemplary model is that of the Australian Defence Force, where the ethics board actively collaborates with technical and operational teams, conducts rigorous pre-deployment reviews, and maintains continuous monitoring to ensure AI systems remain transparent, accountable, and aligned with organisational values.

Turning autonomous AI from a dazzling demo into a reliable corporate tool requires more than clever prompts or larger models. It demands a disciplined, layered defence: grounding outputs in verified data, guiding reasoning with structured prompts, and governing behaviour through guardrails and independent critics. When these technical measures are paired with business-focused practices - audit-ready logs, human checkpoints, and real-time drift detection - organisations can reap the efficiency gains of agentic AI without exposing themselves to ethical, legal, or reputational risk. For Indian enterprises standing at the crossroads of innovation and regulation, the path forward is clear: adopt a glass-box mindset.
