AI Summit 2026: ‘Training on copyrighted data amounts to stealing’, says Pleias founder Stasenko

Speaking to HT ahead of the India AI Impact Summit 2026 in New Delhi, Stasenko argued for “sovereign”, locally run systems over the cloud-dependent models built by US tech giants

Published on: Feb 16, 2026, 05:02:08 IST

By Sejal Sharma, New Delhi


Anastasia Stasenko, founder of European AI lab Pleias, has bluntly characterised the way large AI companies train their models on copyrighted data as “stealing”. Speaking to HT ahead of the India AI Impact Summit 2026 in New Delhi, Stasenko argued for “sovereign”, locally run systems over the cloud-dependent models built by US tech giants.

“It’s stealing, okay? It’s not an opinion, it’s just a fact,” Stasenko said, referring to recent copyright disputes involving companies such as OpenAI and Anthropic. Her comments reflect a growing global debate over whether scraping books, articles and websites to train AI systems constitutes “fair use” or copyright infringement.

Stasenko is among the international founders invited to the New Delhi summit, which takes place at Bharat Mandapam from February 16 to 20. While many attendees represent heavily funded frontier labs racing to build larger models, such as Google DeepMind, OpenAI, Meta, Microsoft and Anthropic, Pleias has taken a different route. The company builds smaller systems trained only on open or public-domain material, designed to run locally rather than through expensive cloud APIs.

Stasenko’s critique comes as Indian policymakers are actively examining these issues. The Department for Promotion of Industry and Internal Trade (DPIIT), under the Ministry of Commerce and Industry, has constituted an expert committee to determine if copyright law needs changes to address AI training. In a working paper released in December, the panel proposed a “hybrid system” under which AI firms could obtain a mandatory blanket licence to train on lawfully accessed content. The proposal includes royalties paid at the commercial use stage through a centralised mechanism.

‘Rebellious’ approach

Stasenko said Pleias deliberately chose to test whether large datasets scraped from the internet were truly necessary. “We were a little bit in a rebellious mood because Sam Altman at some point said it’s not possible to train good AI without basically stealing other people’s intellectual property, and we’re like, let’s try,” she said.

The company trains “compact language models” with fewer than 3 billion parameters, rather than the multi-billion-dollar systems pursued by leading US labs. Pleias also released the Common Corpus, which Stasenko describes as the world’s largest open dataset for LLM training, with over 2 trillion tokens. She added that it has been used by several “sovereign” or nationally developed language models.

Focus on the Global South

Unlike cloud-dependent chatbots, Pleias’s systems are designed to operate offline. “What it actually means in production… is that a model can work on device without internet connection,” Stasenko said.

The company has piloted deployments in Senegal and the Democratic Republic of the Congo, building health and education AI assistants that run entirely offline on low-cost Android phones and basic hardware such as Raspberry Pi devices. Instead of sending queries to distant data centres, the system retrieves answers from locally stored documents, reducing costs and connectivity requirements.

Stasenko argued this approach is practical for countries like India, where public service delivery and internet access remain concerns. “I think that India has actually experienced firsthand how unequal the distribution of power in AI actually is,” she said. “We need to decentralise… not only in terms of their economic concentration, but also in terms of technological skills and infrastructure.”

Stasenko hopes to explore partnerships with Indian non-profits and research groups, such as Wadhwani AI, during the summit.

Expectations from New Delhi

Looking ahead to the summit, Stasenko expects discussions to move beyond declarations. She contrasted her expectations for New Delhi with last year’s summit in Paris, which she felt was “very much business oriented”.

“For the Indian summit… it’s really more about impact for public good,” she said. Her broader hope is for “more diversity for AI, more decentralisation, which would lead to democratisation,” allowing countries to build locally without depending entirely on a few global providers.
