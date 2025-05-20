Mountain View, California: If ever you wanted a crisp summary of what artificial intelligence (AI) has achieved in a rather short window of time, Sundar Pichai, chief executive officer of Google and Alphabet, put it eloquently. “More intelligence is available, for everyone, everywhere. And the world is responding, adopting AI faster than ever before…What all this progress means is that we’re in a new phase of the AI platform shift. Where decades of research are now becoming a reality for people, businesses and communities all over the world,” he said.

There are very few certainties in the world. There’s day after night. And there’s Google’s annual I/O developer conference, which sets the ball rolling for the company’s widening portfolio of apps and services. This year, Google is bringing significant upgrades to the Gemini 2.5 models, the new generative AI models Veo 3 and Imagen 4, the AI filmmaking tool Flow, a more proactive and personal pitch for Gemini, greater relevance for AI in Search, a new subscription for those willing to pay more for Google’s AI services, and a vision for building a universal AI assistant, something Google is closer to achieving than many may immediately realise.

Days ahead of the I/O keynote, HT had detailed the big changes lined up for Android this year, alongside extensive measures being built to fight spammers and scammers. Little surprise, then, that attention rightly shifts to the AI conversation.

Gemini as a universal AI agent

Google is, of course, not alone in this conversation. AI agents remain a continuing theme, one that OpenAI, IBM, Anthropic and, more recently, Microsoft have also made a case for. Some call them “AI agents” or “agentic AI”; Google calls this a universal AI agent. Key to this will be AI’s ability to draw on world knowledge, to reason, and to simulate natural environments, much as a human brain does.

“Our recent updates to Gemini are critical steps towards unlocking our vision for a universal AI assistant, one that's helpful in your everyday life, that's intelligent and understands the context you're in, and that can plan and take actions on your behalf across any device. This is our ultimate goal for the Gemini app, an AI that's personal, proactive and powerful,” noted Demis Hassabis, CEO of Google DeepMind, in a session of which HT was a part.

Hassabis explained this as an “AI that’s intelligent, understands the context you are in, and that can plan and take action on your behalf, across any device.” Gemini models would provide the foundation.

This will be a culmination of Project Mariner, which “explores the future of human-agent interaction, starting with browsers”, as well as Project Astra, for video understanding, screen sharing and memory. The vision now includes a system of agents that can complete up to ten different tasks at a time. These tasks can include looking up information, making bookings, buying things, and researching a topic, all in parallel.

At its Build conference this week, Microsoft detailed native support for the Model Context Protocol (MCP) in Windows, and launched Windows AI Foundry as the foundation for a future of AI agents.

Anthropic introduced MCP, an open-source standard, last year. Often described as the “USB-C port of AI”, it owes its appeal to simplicity and widespread support: app developers can use MCP to let their apps or agents talk to other apps and services.

“We added native SDK support for Model Context Protocol (MCP) definitions in the Gemini API for easier integration with open-source tools. We’re also exploring ways to deploy MCP servers and other hosted tools making it easier for you to build agentic applications,” said Tulsee Doshi, Senior Director, Product Management at Google.

Model updates, with long-term vision

Google is rolling out significant upgrades for the Gemini 2.5 Flash and Gemini 2.5 Pro models. Incoming improvements to Gemini 2.5 Pro add new reasoning capabilities with a Deep Think mode, whose specific focus on complex math and coding tasks will be relevant to Gemini’s march towards an ‘agentic AI’ vision.

The lighter Gemini 2.5 Flash gets improvements in reasoning, multimodality, code and long context. For now, the updated 2.5 Flash is available as ‘experimental’ in Google AI Studio for developers, in Vertex AI for enterprises, and in the Gemini app for everyone, with its final release pegged for early June.

“Because we're defining the frontier with 2.5 Pro DeepThink, we're taking extra time to conduct more frontier safety evaluations and get further input from safety experts. As part of that, we’re going to make it available to trusted testers via the Gemini API to get their feedback before making it widely available,” explained Koray Kavukcuoglu, chief technology officer of Google DeepMind.

New creative generative AI models

Google’s newest generative media models, the video generation model Veo 3 and the image generation model Imagen 4, are arriving now with new capabilities. The previous-generation Veo 2 model also gets updates, including camera controls for making precise adjustments to elements of a generated video, such as camera movement or zoom, as well as better referencing from images of scenes, characters and objects.

“We're also expanding access to Lyria 2, giving musicians more tools to create music. Finally, we’re inviting visual storytellers to try Flow, our new AI filmmaking tool. Using Google DeepMind’s most advanced models, Flow lets you weave cinematic films with more sophisticated control of characters, scenes and styles, to bring your story to life,” said Eli Collins, vice president, Product Management, Google DeepMind.

Veo 3 can now generate videos with audio, such as traffic noise in the background of a city street scene or even dialogue between characters. It also promises more faithful replication of real-world physics, better lip-syncing and a better understanding of prompts.

Imagen 4, meanwhile, arrives with the promise of finer detail.

“Imagen 4 has remarkable clarity in fine details like intricate fabrics, water droplets, and animal fur, and excels in both photorealistic and abstract styles. Imagen 4 can create images in a range of aspect ratios and up to 2k resolution - even better for printing or presentations. It is also significantly better at spelling and typography, making it easier to create your own greeting cards, posters and even comics,” explained Collins.

Imagen 4 is now available in the Gemini app, Whisk and Vertex AI, and across Slides, Vids and Docs in Workspace. For now, Veo 3 is limited to Ultra subscribers in the US in the Gemini app, and is also available in Flow.