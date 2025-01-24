OpenAI unveils new AI tool ‘Operator’ for independent web tasks | All you need to know
Operator uses a new model called Computer-Using Agent (CUA), which combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning.
OpenAI on Thursday launched ‘Operator’, a new AI tool designed to perform tasks on the web independently. The company explained that Operator can handle a variety of repetitive browser tasks, such as filling out forms, ordering groceries, and even creating memes.
By using the same interfaces and tools that humans interact with daily, Operator enhances AI’s utility, helping people save time on routine tasks and providing new opportunities for business engagement.
“Today we’re releasing Operator, an agent that can go to the web to perform tasks for you. Using its own browser, it can look at a webpage and interact with it by typing, clicking, and scrolling. It is currently a research preview, meaning it has limitations and will evolve based on user feedback. Operator is one of our first agents, which are AIs capable of doing work for you independently: you give it a task and it will execute it,” OpenAI said on Thursday.
Currently, Operator is available to Pro users in the US via operator.chatgpt.com. This research preview allows OpenAI to gather insights from users and the wider ecosystem to refine the tool. The company plans to expand access to Plus, Team, and Enterprise users, and eventually integrate these features into ChatGPT.
All you need to know about Operator
- Operator is powered by a new model called Computer-Using Agent (CUA), which combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. It is designed to interact with graphical user interfaces (GUIs): the buttons, menus, and text fields that appear on a screen.
- Operator can "see" through screenshots and "interact" using the actions a mouse and keyboard allow, enabling it to perform web tasks without needing custom API integrations.
- If Operator faces challenges or makes mistakes, it uses its reasoning abilities to self-correct. If it gets stuck and needs assistance, it hands control back to the user, ensuring a smooth and collaborative experience.
- While CUA is still in early stages and has some limitations, it has achieved state-of-the-art results in WebArena and WebVoyager, two significant browser benchmarks. Additional details about evaluations and the research behind Operator are available in the research blog post.
- To get started, users simply describe the task they want done, and Operator handles the rest. Users can take over control of the remote browser at any point, and Operator will ask the user to take over for tasks involving logins, payment details, or CAPTCHAs.
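
For readers who want a concrete picture of the loop described above, here is a minimal Python sketch. It is an illustration only, not OpenAI's implementation or API: every name in it (`Screenshot`, `Action`, `run_agent`, `ToyBrowser`, `toy_policy`) is hypothetical, the "screenshot" is a page label rather than pixels, and the decision step is a hard-coded stand-in for the CUA model. It shows the screenshot, decide, act cycle, plus the handoff to the user on sensitive pages or when the agent runs out of steps.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical page labels that trigger a handoff to the user,
# mirroring the article's "logins, payment details, or CAPTCHAs".
SENSITIVE = {"login", "payment", "captcha"}


@dataclass
class Screenshot:
    page: str  # stand-in for pixel data: a label for the visible page


@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    target: str = ""   # hypothetical UI element name
    text: str = ""     # text to type, if any


def run_agent(
    observe: Callable[[], Screenshot],
    decide: Callable[[Screenshot], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> List[str]:
    """Run a screenshot -> decide -> act loop and return a trace of steps."""
    log: List[str] = []
    for _ in range(max_steps):
        shot = observe()
        if shot.page in SENSITIVE:
            # Sensitive step: stop and hand control back to the user.
            log.append(f"handoff:{shot.page}")
            return log
        action = decide(shot)
        if action.kind == "done":
            log.append("done")
            return log
        act(action)
        log.append(f"{action.kind}:{action.target}")
    # Step budget exhausted: treat the agent as stuck and hand off.
    log.append("handoff:stuck")
    return log


class ToyBrowser:
    """A three-page fake site: search -> cart -> payment."""

    def __init__(self) -> None:
        self.page = "search"

    def screenshot(self) -> Screenshot:
        return Screenshot(self.page)

    def apply(self, action: Action) -> None:
        transitions = {("search", "type"): "cart", ("cart", "click"): "payment"}
        self.page = transitions.get((self.page, action.kind), self.page)


def toy_policy(shot: Screenshot) -> Action:
    """Hard-coded stand-in for the model's decision step."""
    if shot.page == "search":
        return Action("type", "search_box", "milk")
    if shot.page == "cart":
        return Action("click", "checkout")
    return Action("done")


browser = ToyBrowser()
trace = run_agent(browser.screenshot, toy_policy, browser.apply)
print(trace)  # ['type:search_box', 'click:checkout', 'handoff:payment']
```

In this toy run the agent types a search, clicks through to checkout, and then stops at the payment page so the user can take over, which is the behavior the last bullet describes.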
