OpenAI is set to release a new AI agent called Operator, which can perform web tasks autonomously. The research preview will launch on January 23, 2025, for subscribers of OpenAI’s $200-per-month ChatGPT Pro tier in the United States.
- OpenAI releases AI agent called Operator
- Operator uses browser for web interactions
- Combines vision capabilities with reasoning
- Self-corrects and hands control back to the user when stuck
- Collaborates with companies for real-world needs
- Future integration into ChatGPT planned
Operator is designed to automate tasks a user would otherwise carry out by hand on the web. It is powered by a “Computer-Using Agent” (CUA) model that combines the vision capabilities of GPT-4o with advanced reasoning trained through reinforcement learning. This allows Operator to interact with graphical user interfaces (GUIs) by typing, clicking, and scrolling in its own browser.
Key features of Operator include:
- Ability to “see” through screenshots and interact using standard mouse and keyboard actions.
- Self-correcting reasoning capabilities, with the option to hand control back to the user when it gets stuck.
- Designed to refuse harmful requests and block disallowed content.
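The features above amount to a loop: look at a screenshot, pick a mouse or keyboard action, apply it, and either finish, refuse, or hand control back to the user. The following is a minimal sketch of that loop, not OpenAI's implementation or API. Every name in it (`Action`, `FakeBrowser`, `run_agent`, `scripted_policy`, `BLOCKED_TERMS`) is hypothetical, the "screenshot" is a text stand-in for real pixels, and the refusal check is a placeholder for whatever safety system Operator actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# All names here are hypothetical illustrations, not part of any OpenAI API.
@dataclass
class Action:
    kind: str         # "click", "type", "scroll", "done", or "handoff"
    target: str = ""  # element label to click, or text to type


@dataclass
class FakeBrowser:
    """Stand-in for the agent's own browser; it only records applied actions."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would receive pixels; a text summary keeps this runnable.
        return f"page after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        self.log.append(action)


BLOCKED_TERMS = ("malware",)  # placeholder policy, purely illustrative


def run_agent(
    task: str,
    browser: FakeBrowser,
    policy: Callable[[str, str], Action],
    max_steps: int = 10,
) -> str:
    """Run the observe-act loop and return "done", "refused", or "handoff"."""
    if any(term in task.lower() for term in BLOCKED_TERMS):
        return "refused"  # refuse before touching the browser
    for _ in range(max_steps):
        action = policy(task, browser.screenshot())
        if action.kind in ("done", "handoff"):
            return action.kind
        browser.apply(action)
    return "handoff"  # step budget exhausted: give control back to the user


def scripted_policy(task: str, screenshot: str) -> Action:
    """Toy policy: a fixed three-step script keyed on the action count."""
    steps = [
        Action("click", "search box"),
        Action("type", task),
        Action("click", "submit"),
    ]
    done_so_far = int(screenshot.split()[2])
    return steps[done_so_far] if done_so_far < len(steps) else Action("done")


if __name__ == "__main__":
    browser = FakeBrowser()
    print(run_agent("order pizza", browser, scripted_policy))
    print([a.kind for a in browser.log])
```

In this sketch the hand-off is triggered either by the policy itself or by running out of steps. A real agent would presumably decide this from its own reasoning about the page, which the toy policy does not attempt to model.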
OpenAI is collaborating with companies such as DoorDash and Uber to ensure Operator meets real-world needs while adhering to established norms. However, the company notes that Operator may struggle with complex interfaces, for example when creating slideshows or managing calendars. Future plans include expanding access to Plus, Team, and Enterprise users and integrating these capabilities into ChatGPT.
In summary, OpenAI’s Operator AI agent represents a significant step towards automating web interactions. With its advanced capabilities and planned expansions, it aims to address user needs effectively while maintaining safety and compliance.