From Computer Use to Stargate

Anshul Kumar   |   15 min read 

Stargate is a major artificial intelligence (AI) infrastructure initiative announced this week. The joint venture brings together OpenAI, SoftBank, Oracle, and the investment firm MGX, with the goal of investing up to $500 billion over the next four years to build advanced AI infrastructure across the United States (see "Announcing The Stargate Project" on the OpenAI site).

How does this change the dynamics between these organizations? I am unsure. Microsoft has its own significant commitment: roughly $80 billion this fiscal year on AI data centers for Azure, which hosts OpenAI models. "I am good for my $80 billion," says Microsoft CEO Satya Nadella.

Microsoft also published a corporate blog post on the same day Stargate was announced: "Microsoft and OpenAI evolve partnership to drive the next phase of AI" (The Official Microsoft Blog).

Let us shift the focus slightly and discuss the recent developments in computer use that have occurred this week.

OpenAI has announced Operator, an agent that can go to the web to perform tasks for you. It can automate tasks such as filling out forms, booking travel, or even creating memes by remotely interacting with a web browser much as a person would, via mouse clicks, scrolling, and typing. Let's learn a bit about this new development.

The video below shows a demo of Operator using the Instacart website to add the ingredients of a recipe to a grocery cart.

Operator is powered by a new model called Computer-Using Agent (CUA). Combining GPT-4o's vision capabilities with advanced reasoning through reinforcement learning, CUA is trained to interact with graphical user interfaces (GUIs)—the buttons, menus, and text fields people see on a screen.

Operator can “see” (through screenshots) and “interact” (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations.

If it encounters challenges or makes mistakes, Operator can leverage its reasoning capabilities to self-correct. When it gets stuck and needs assistance, it simply hands control back to the user, ensuring a smooth and collaborative experience.
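The see, act, and hand-back cycle described above can be illustrated with a small sketch. This is not OpenAI's implementation and does not call any real API: `Action`, `FakeBrowser`, `scripted_policy`, and `run_agent` are hypothetical names invented for illustration, and a text string stands in for the screenshot a real model would receive.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step the agent wants to take (hypothetical structure)."""
    kind: str          # "click", "type", "scroll", "done", or "handoff"
    payload: str = ""  # e.g. the text to type or a description of the target


class FakeBrowser:
    """Stand-in for a real browser: records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> str:
        # A real agent would receive pixels; a text summary stands in here.
        return f"screen after {len(self.log)} actions"

    def perform(self, action: Action) -> None:
        self.log.append(action)


def scripted_policy(observation: str) -> Action:
    """Toy replacement for the model: maps each observation to a fixed action."""
    steps = {
        "screen after 0 actions": Action("click", "search box"),
        "screen after 1 actions": Action("type", "tomatoes"),
        "screen after 2 actions": Action("click", "add to cart"),
    }
    return steps.get(observation, Action("done"))


def run_agent(
    browser: FakeBrowser,
    policy: Callable[[str], Action],
    max_steps: int = 10,
) -> str:
    """Observe, decide, act; return "done" or "handoff"."""
    for _ in range(max_steps):
        observation = browser.screenshot()  # "see"
        action = policy(observation)        # decide on the next step
        if action.kind in ("done", "handoff"):
            return action.kind
        browser.perform(action)             # "interact"
    # Step budget exhausted: treat the agent as stuck and return control.
    return "handoff"


if __name__ == "__main__":
    browser = FakeBrowser()
    print(run_agent(browser, scripted_policy), len(browser.log))
```

The step budget is one simple way to model "getting stuck": when the loop runs out of steps without finishing, control goes back to the user rather than continuing indefinitely.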

What's next

Currently, Operator is available only to Pro users in the US. OpenAI plans to expand availability to Plus, Team, and Enterprise users in the near future and to integrate its capabilities into ChatGPT.

CUA in the API: OpenAI intends to make the model behind Operator, CUA, available in the API soon, enabling developers to create their own computer-using agents.

Enhanced Capabilities: OpenAI seeks to enhance Operator's proficiency in managing extended and intricate workflows.

Here is a quick comparison among Operator, Project Mariner and Claude computer use.

[Image: comparison of Operator, Project Mariner, and Claude computer use]

The evolving landscape of computer use AI emphasizes creating intelligent systems that augment human capabilities. Expect groundbreaking advancements in agent-based automation, where AI will increasingly take on complex tasks that were once solely in the domain of humans. The potential for increased efficiency and innovation in this area is vast, and the pace of progress shows no signs of slowing. Stay tuned to this exciting frontier as it continues to redefine what’s possible in the realm of AI and automation.

About the author

Anshul Kumar

As a Generative AI Technology Evangelist, Anshul Kumar plays a pivotal role in integrating advanced AI technologies into the strategic framework of organizations. The transformative projects led by Anshul demonstrate a dedication to leveraging Generative AI for impactful solutions in insurance and the software development lifecycle, reflecting a deep understanding of AI and ML applications. His work not only fosters innovation but also strengthens customer satisfaction through adaptable, future-ready architectures.