
As OpenAI’s Sora ups the ante with text-to-video, what’s next for AI detection tests?

By Anesha George
Feb 16, 2024 09:42 PM IST

Algorithms were already struggling to identify AI-generated text, voice, video, imagery. Could a new tech alliance get the tech world to enforce metadata?

How did they get the dragon to move as it does? Or get the crowd to look so diverse?

A still from a video generated by Sora, in response to the text prompt: Chinese Lunar New Year celebration with Chinese Dragon.

Questions flooded social-media platforms as OpenAI launched its AI-driven text-to-video generator, Sora, this week. None of those questions involved the key one: Is this supposed to be real?

Because there is nothing whatever to indicate that the videos aren’t.

What does this mean for AI verification tests, and for companies such as Sentinel AI and Sensity AI, which were already struggling to make good on the promise of AI detection?

AI alterations to video had so far been the easiest to detect, since successive frames could be scrutinised for inconsistencies. Sora’s video-generation capabilities look set to change that.

A still from a Sora video, created by the AI program in response to the prompt: Golden retriever puppies playing in the snow.

OpenAI has already announced that it is working to “build tools… such as a detection classifier that can tell when a video was generated by Sora”, with a statement also pledging that “We plan to include C2PA metadata in the future if we deploy the model in an OpenAI product.”

C2PA is the Coalition for Content Provenance and Authenticity, made up of companies such as Adobe, Intel, Microsoft, BBC and Sony, which have pledged to work together to create a digital nutrition label (a term first suggested by Adobe last year) that would display information clearly on all AI-generated media. This information will likely include the name of the AI tool used, the date of generation, and details of edits made to an original work.
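To make that concrete, here is a minimal sketch of the kind of provenance record such a label might carry. It is an illustration only: the real C2PA standard defines a richer, cryptographically signed manifest, and the field names below are simplified placeholders drawn from the details described above.

```python
import json
from datetime import datetime, timezone

# Simplified, illustrative record only -- NOT the real C2PA manifest format,
# which is a cryptographically signed structure. Field names are placeholders
# for the kinds of details the coalition wants surfaced.
provenance_label = {
    "generator": "example-text-to-video-model",              # name of the AI tool used
    "generated_on": datetime.now(timezone.utc).isoformat(),  # date of generation
    "edits": [                                                # edits made to the original work
        {"action": "created", "tool": "example-text-to-video-model"},
        {"action": "colour-graded", "tool": "example-video-editor"},
    ],
}

print(json.dumps(provenance_label, indent=2))
```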

OpenAI, though not a member of C2PA, had announced even before the Sora launch that it would integrate support for these standards into its algorithms. Meta has pledged to uphold them on Facebook, Instagram and Threads.

And Google announced last week that it is joining C2PA too. It certainly has reason to.

In September, the top Google search result for “tank man” was an image of a man taking a selfie in front of a tank at Beijing’s Tiananmen Square, instead of the iconic 1989 news photograph of a lone man facing down a convoy.

Two months later, an AI-led reimagining of an Edward Hopper painting went viral. In place of the American artist’s iconic oil-on-canvas, Nighthawks — an evocative artwork showing four people in a lonely diner, a deserted street outside — was a remastering by X user @soncharm that mimicked the aesthetic, somewhat, but moved everyone outdoors, into a sunny day.

Above, X user @soncharm’s reimagining of Edward Hopper’s Nighthawks, and the original painting. Top right, a man takes a selfie against tanks at Tiananmen Square, in an AI-led take on the iconic news photograph from 1989.

Vermeer’s Girl with a Pearl Earring has appeared with make-up filters, and tiny lit lanterns in her ears. Van Gogh’s The Starry Night has been turned into AI-generated video.

These reimaginings, mystifying as they are, are at least easy to spot. It is the ones designed to pass for real that are, of course, the real worry. A digital nutrition label would help, but would be hard to enforce, at least at first.

Meanwhile, video-generation platforms in particular are evolving fast. Ones to watch this year, aside from Sora, include Runway, Pika and Google’s Lumiere. On the detection side, new methods are being explored, but they are far from foolproof.

With video, for instance, Sentinel AI is deploying advanced neural networks to analyse facial expressions and blinking, breathing and speech patterns for inconsistencies. With images, platforms such as Sensity AI are training algorithms to look for repetitive textures, irregular colour gradients, and anomalies in pixel distribution.
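As a rough illustration of the image-side idea (not Sentinel’s or Sensity’s actual pipeline, whose details are not public), one simple signal researchers have probed is an image’s frequency spectrum, where unusually repetitive textures can show up as isolated peaks. The cut-off value below is an arbitrary assumption, chosen only to make the sketch runnable.

```python
import sys

import numpy as np
from PIL import Image


def high_frequency_peak_ratio(path: str) -> float:
    """Toy heuristic: look for isolated spikes in the high-frequency spectrum.

    Repetitive, grid-like textures that some generators leave behind can show
    up as sharp peaks away from the spectrum's low-frequency centre. This is
    an illustrative sketch, not any vendor's actual detection method.
    """
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img)))

    # Zero out the low-frequency centre, where natural images keep most energy.
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    spectrum[cy - h // 8 : cy + h // 8, cx - w // 8 : cx + w // 8] = 0.0

    # Ratio of the strongest remaining bin to the average bin energy:
    # a very large ratio hints at a strong periodic pattern in the pixels.
    return float(spectrum.max() / (spectrum.mean() + 1e-12))


if __name__ == "__main__":
    score = high_frequency_peak_ratio(sys.argv[1])
    # The cut-off is an arbitrary illustrative value, not a calibrated one.
    verdict = "worth a closer look" if score > 500 else "unremarkable"
    print(f"peak ratio = {score:.1f} ({verdict})")
```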

Audio is particularly difficult. Identifying AI interference in this format has, in fact, been announced as a core focus of Project Mockingbird, a detection effort launched last month by security solutions company McAfee.

There is an obvious problem posed by each new set of parameters. Every time AI-generated content is rejected as AI-generated, the reasons listed tell the generative program what to avoid, going forward.

And so detection becomes something of an ouroboros, with AI programs trying to outsmart AI programs, while learning and drawing from each other.

Detecting AI-generated text remains perhaps the toughest challenge. The likelihood of false flagging is high because a person’s style of writing can closely match a style that the AI model was trained on, says technology analyst Kashyap Kompella. “One could also use the same text prompt for three different AI models and then blend the results to create a piece that would be close to impossible to trace,” adds HT’s technology editor Vishal Mathur.

Which brings the issue, full circle, to metadata.

“Strict standards here could potentially be used for digital rights management and to enforce copyright protection,” Kompella says. Admittedly, these would remain easy to evade, at least in the short term. Several operating systems already offer the option of erasing metadata before sharing content.
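How little effort evasion takes is easy to demonstrate. The sketch below uses the Pillow imaging library purely by way of illustration (one possible route, not the operating-system features mentioned above): it re-saves an image’s pixels into a fresh file, leaving any embedded provenance metadata behind. The filenames are hypothetical.

```python
from PIL import Image


def strip_metadata(src: str, dst: str) -> None:
    """Re-save an image with none of its embedded metadata (EXIF, XMP, etc.).

    Illustrative only: copying pixel data into a fresh image drops whatever
    provenance information the original file carried.
    """
    original = Image.open(src).convert("RGB")   # convert keeps the sketch simple
    clean = Image.new("RGB", original.size)
    clean.putdata(list(original.getdata()))
    clean.save(dst)                             # written without the original's metadata


if __name__ == "__main__":
    # Hypothetical filenames, for illustration.
    strip_metadata("labelled_photo.jpg", "scrubbed_photo.jpg")
```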

It will eventually boil down to raised technology standards, compulsory metadata and stringent penalties for tampering, Kompella says. “User education should be a priority too.”

On that front, academic projects are seeking to pitch in. Detect Fakes, hosted at Northwestern University, was set up in 2019 by researchers from its Kellogg School of Management, initially at the MIT Media Lab. It offers updated lists of signs to look for in images and videos. These include details of blink rates, skin textures, shadows around eyes and eyebrows, hair transformations, and the physics of lighting.

What’s really worrying is the socio-political impact of failing at this, Kompella says. “Deepfakes heighten uncertainty and mistrust, which means that once people are told that a certain set of visuals contains both real and AI-generated images, the likelihood of them disbelieving real images also increases,” he says. “User education can only do so much. In a society where the fact-finding mechanism is broken, people will likely claim that real images that don’t support their views are deepfakes too.”

And that is an ouroboros on an entirely different scale.
