Date: 2025-08-11

In recent days, a battle has been brewing that may redraw the contours of web standards as they relate to artificial intelligence (AI), the idea of an open web, and how AI companies collect data. Internet infrastructure giant Cloudflare fired the first shots, alleging that Perplexity uses stealth tactics to access and collect data from websites whose owners have specifically indicated they do not want it to. The AI company counters with something bordering on the philosophical, asking whether, with the rise of AI-powered assistants and user-driven agents, the boundary between what counts as “just a bot” and what serves the immediate needs of real people has become increasingly blurred.

Expect this conversation to continue as publishers deal with what Cloudflare CEO Matthew Prince calls an existential threat from AI. Perplexity says “companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots”, while Cloudflare maintains that it is “giving content creators and publishers more control over how their content is accessed.” That effort includes the company’s Content Independence Day, announced last month, which Prince says marks “the default to block AI crawlers unless they pay creators for their content”.

In this specific instance, Cloudflare says it has observed Perplexity accessing websites in ways that evade site owners’ preferences, specifically the rules in a website’s robots.txt file that disallow crawling. This file is a long-standing web convention: a plain-text file at the root of a site that owners use to tell web crawlers (such as those run by search engines) which parts of the site they may access and index. Honouring it is voluntary; the file states the owner’s preferences but does not technically prevent access.
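As a rough illustration of how such rules are written and checked, the sketch parses a small robots.txt with Python’s standard-library `urllib.robotparser` module. The file contents, the example URL and the helper name `may_fetch` are hypothetical; “PerplexityBot” appears only as an example user-agent token, not as a claim about any site’s actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of one named AI crawler,
# leave the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/latest"
    for agent in ("PerplexityBot", "Googlebot"):
        print(agent, may_fetch(agent, url))
    # PerplexityBot False
    # Googlebot True
```

The parser only answers the question a well-behaved crawler asks itself before fetching a page. Nothing in the file stops a client that never checks it, or one that identifies itself under a different user-agent name.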

“We are observing stealth crawling behaviour from Perplexity, an AI-powered answer engine. Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website’s preferences,” Cloudflare details in a technical post, before adding, “We received complaints from customers who had both disallowed Perplexity crawling activity in their robots.txt files. These customers told us that Perplexity was still able to access their content even when they saw its bots successfully blocked.”
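To see why a block tied to a crawler’s declared identity can be sidestepped, consider a toy sketch that assumes a site rejects requests purely by matching the `User-Agent` header against a list of known crawler names. The token list, the helper `is_blocked` and both user-agent strings are hypothetical; real bot-management systems typically combine many more signals than this single header.

```python
# Hypothetical declared crawler tokens a site owner might block (lowercase).
BLOCKED_TOKENS = ("perplexitybot", "perplexity-user")


def is_blocked(user_agent: str) -> bool:
    """Naive network block: reject requests whose User-Agent names a blocked crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


if __name__ == "__main__":
    # A client announcing itself under a declared crawler name is caught...
    declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
    # ...but the same client presenting a generic browser string is not.
    generic = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )
    print(is_blocked(declared))  # True
    print(is_blocked(generic))   # False
```

Because the header is supplied by the client, a check like this only works against crawlers that choose to identify themselves honestly.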

Cloudflare says its own tests replicated the obfuscation its customers had complained about. It also ran comparable tests as a point of contrast: OpenAI’s ChatGPT crawler, it says, stopped when disallowed and did not retry with other user agents after being blocked.

Perplexity’s argument: An AI assistant is a human’s assistant

Perplexity’s response does not directly address the allegation that it obscured its identity to keep accessing websites after being told not to; instead, the AI company raises broader points. For one, it argues that modern AI assistants work fundamentally differently from the traditional web crawling that search engines have relied on for years. It describes its AI search tools as “user-driven” agents that, in its view, need not follow the rules of the web as we have known them.

“When you ask Perplexity a question that requires current information—say, “What are the latest reviews for that new restaurant?”—the AI doesn’t already have that information sitting in a database somewhere. Instead, it goes to the relevant websites, reads the content, and brings back a summary tailored to your specific question. This is fundamentally different from traditional web crawling, in which crawlers systematically visit millions of pages to build massive databases, whether anyone asked for that specific information or not,” the company responds.

Perplexity insists that its user-driven agents neither store the information they fetch nor use it for training. “When Google’s search engine crawls to build its index, that’s different from when it fetches a webpage because you asked for a preview. When Perplexity fetches a web page, it’s because you asked a specific question requiring current information,” Perplexity’s statement says.

The AI company also contends that Cloudflare misidentified the traffic it was classifying. “It appears Cloudflare confused Perplexity with 3-6M daily requests of unrelated traffic from BrowserBase, a third-party cloud browser service that Perplexity only occasionally uses for highly specialised tasks (less than 45,000 daily requests),” they add.

Is there a way forward as AI changes web and search?

In the long term, these are not minor semantic differences. The internet industry has not yet adapted to a shift in which AI chatbots are increasingly becoming people’s default search tools in place of traditional search engines such as Google Search and Microsoft Bing. Google itself is layering more AI into Search, with features such as AI Overviews appearing above the list of relevant website links.

Cloudflare notes that more than 2.5 million sites have opted to block AI training since July, and it has been promoting the idea of “pay per crawl”. Perplexity, for its part, insists that it does not use these fetches to train its models, but publishers may still want consent, control, or payment for any automated access to their content.

The controversy between Cloudflare and Perplexity is more than a technical dispute between two companies. It may be an early flashpoint in a broader reckoning: as AI rapidly changes how information on the internet is accessed and served, these systems will have to address site owners’ stated preferences. That raises a fundamental question. Is the largely collaborative, trust-based model that has governed the web for decades at risk, and can it survive the aggressive data-collection needs of modern AI systems? Things may not change immediately, but Cloudflare has turned the spotlight on robots.txt, and a conversation has begun.