OpenAI’s ChatGPT looks to stylise reality

Mar 27, 2025 04:19 PM IST

This is not the first time a generative artificial intelligence chatbot has proved adept at creating images based on prompts.

The tool proved so popular that OpenAI had to revoke access for free users within a day of release. The company’s image generation addition, released earlier this week (which CEO Sam Altman describes as “incredible technology”), is a significant step forward for ChatGPT, underpinned by the GPT-4o model, as it faces growing competition.

Representational image. (AFP)

This is not the first time a generative artificial intelligence (AI) chatbot has proved adept at creating images based on prompts. That’s something xAI’s Grok and Google Gemini do too. But this update may boost realism like never before, thanks to a change in the method of training.

“GPT-4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o’s inherent knowledge base and chat context—including transforming uploaded images or using them as visual inspiration,” OpenAI says, detailing the update.

Its uniqueness resides in granularity, including styles such as “Studio Ghibli”, which are proving incredibly popular on social media. GPT-4o can handle prompts that demand up to 20 different objects to be detailed in a generation. OpenAI’s claim is “other systems struggle with around 5-8 objects.”

The reason behind this improvement in understanding is a group of humans who painstakingly labelled training data. That, OpenAI hopes, will boost accuracy and understanding. “The tighter binding of objects to their traits and relations allows for better control,” the company adds.

Studio Ghibli, a Japanese animation studio founded in 1985 by Hayao Miyazaki, Isao Takahata, and Toshio Suzuki, is known for its hand-drawn animation, distinguished by soft colour palettes, detailed natural settings, and lush backgrounds. Its films include Spirited Away, My Neighbor Totoro, and Howl’s Moving Castle.

The way this works is that a user uploads an image, or describes a scene, using text prompts. An example of this could be “Visualise this photo into a Studio Ghibli-style anime illustration with soft textures, warm colours, and whimsical details.” In a few seconds, GPT-4o generates an image.

The model’s ability to mimic Ghibli’s aesthetic elements stems from its training on massive data sets of images and text, though OpenAI doesn’t disclose specifics. ChatGPT has 400 million active users, of which 2 million are paying enterprise subscribers. The company hasn’t shared the latest numbers for paying individual subscribers.

The trend went so viral that quick commerce platforms Zomato and Swiggy joined in too, with posts of ‘Ghibli-fied’ images of delivery partners and products.

This isn’t the first time the ability to transform images into different styles has caught the attention of social media users.

In 2016, the Prisma app quickly gained popularity, using neural networks and AI to render photos in the styles of famous artists, including Pablo Picasso and Norwegian painter Edvard Munch. Initially launched on iOS for Apple iPhones, it was downloaded 7.5 million times in its debut week. The Android app, released later, clocked 1.7 million downloads on day one.

Those were early days, long before AI was in vogue.

OpenAI says the new image generation models have been trained on a joint distribution of online images and text, which enabled them to learn not just how images relate to language, but how they relate to each other.

A ‘reinforcement learning’ method that uses human feedback for improvement, alongside “aggressive post-training” that is believed to give AI models better visual fluency with generations, underpins the improvements claimed for consistency and contextual awareness.

The company is aware that this level of realism may lead to offensive creations too. All image generations using ChatGPT will adhere to C2PA (Coalition for Content Provenance and Authenticity) metadata guidelines, confirms Jackie Shannon, who is ChatGPT Multimodal Product Lead. This will allow viewers to distinguish between generated images and real ones.

OpenAI must also actively monitor for prompts intended to generate images of violence, child sexual abuse material and sexual deepfakes, for instance.

“What we’d like to aim for is that the tool doesn’t create offensive stuff unless you want it to, in which case within reason it does,” Altman says.

“As we talk about in our model spec, we think putting this intellectual freedom and control in the hands of users is the right thing to do, but we will observe how it goes and listen to society,” he adds.

OpenAI, taking cognisance of its earlier licensing and consent troubles with artists and creators, says there are policies in place for visual generations within ChatGPT.

“We’re respecting of the artists’ rights in terms of how we do the output, and we have policies in place that prevent us from generating images that directly mimic any living artists’ work,” says Brad Lightcap, COO of OpenAI.

Another reason the latest ChatGPT update is a big deal is that it represents a significant transition from text-only or externally dependent image generation tools (such as previous ChatGPT versions with DALL-E) to fully integrated multimodal systems based on models such as GPT-4o.

In fact, it is also representative of the significant progress AI has made in the past few months, including Chinese company DeepSeek’s supposedly frugal approach to building AI models, and the rise of agentic AI tools that aim to replace functions within an enterprise.

Google’s Imagen 3 model underpins the Gemini chatbot’s image generation capabilities across Gemini on the web and the smartphone apps. Some of the image generation functionalities are available for free, but the more detailed options are part of the AI Premium plan (₹1,950 per month).

After generating an image, Gemini prompts users to try adding more details to their prompt.

xAI’s Grok 3, which rightly got the spotlight following an impressive update to its chatbot capabilities a few weeks ago, has also had image generation since early 2025, and it’s available free for all Grok users. There can of course be subjectivity about generation detailing and style preferences.

OpenAI’s intent was to make it available across subscription tiers, but Altman confirms that the “rollout to our free tier is unfortunately going to be delayed for a while.” For now, ChatGPT Plus (₹1,999 per month) and ChatGPT Pro (₹19,900 per month) subscribers will continue to have access to the new native image-generation capabilities.

Other AI companies will have to catch up, and fast. One is Anthropic, whose Claude can process images but doesn’t yet generate them natively without external tools. Anthropic has, however, suggested that future updates would change that. Microsoft Copilot also generates images, but isn’t a fully independent system and relies on OpenAI’s DALL-E 3 model.

Apple too has released Image Playground as part of its Apple Intelligence suite, work on which is ongoing, with regular updates. This is available on iPhone, iPad and Mac, with close integration with Apple’s own apps including Messages and Notes.
