If a team of human engineers built a web browser that only half-worked, it wouldn’t get people talking. But when Michael Truell, CEO of coding startup Cursor, posted on X last week that a swarm of AI agents had built a browser that, he wrote, “kind of works” while running uninterrupted for a week without any human intervention, it went viral across the tech world, with over 6 million views.
Why the buzz? Two big reasons. For one thing, AI’s attention span has historically been short. In the early days of ChatGPT, models could stay on task for only a few seconds. That horizon stretched to minutes for better models, then to hours. The Cursor project claims to be one of the first times an AI system has sustained a complex, open-ended software project for an entire week without human steering.
In addition, single AI agents are limited to focused, small tasks. But getting hundreds of agents to coordinate on a big project has still seemed futuristic. That’s why Cursor wanted to see how far it could push autonomous coding, on a project that would take months for a human team, by having an “orchestra” of AI agents working as a team. Could an AI system be persistent enough, and work together well enough, to explore code, break work into parts, debug itself, and keep moving forward for days without drifting away from the task at hand?
An AI agent ‘orchestra’
The researchers found that the answer was mostly yes. Cursor’s experiment orchestrated hundreds of agents into something like a software team. It had “planners,” “workers,” and “judges” coordinating across millions of lines of code. This hints at what both Cursor and OpenAI say is a near future in which AI doesn’t just assist workers, but takes on entire projects. That could fundamentally reshape how complex work gets done, first in software development, but then in other professions.
There have been AI swarm experiments for a couple of years now. But today, Cursor says, models are smarter and can stay coherent for much longer. The models can be run at a far larger scale, with a custom layer that orchestrates hundreds of agents and keeps them from descending into chaos.
Jonas Nelle, an engineer at Cursor working on long-running AI agents, told Fortune that as AI models keep getting better, engineers and researchers need to revisit their assumptions every few months about what the AI models can do. While he admitted he “wouldn’t download it and delete Chrome today,” the browser project was “certainly better than anything models previously would have been able to do.”
These long-running agents are an important frontier, added Bill Chen, an OpenAI engineer who stress-tests and evaluates the real-world behavior of the company’s models. The length of a task, and the fact that an AI system can accomplish the task autonomously and coherently, is a “very good indicator of how intelligent and how general a system is,” he said. The Cursor project, which was powered by OpenAI’s GPT-5.2, is “a direct result of us really continuously pushing forward the boundaries of model capabilities.” In the future, he said, there will be even longer-horizon tests.
AI agent swarms are not ready for enterprise use
Still, these are not production-ready systems. Besides being buggy and incomplete, a project running swarms of agents for days or weeks is expensive. While prices have fallen steeply over the past year, long-running jobs with hundreds of AI agents can still rack up costs.
There are also security issues. An autonomous system raises worries about vulnerabilities, data leaks, and much more, and requires many new layers of control and auditability.
But Chen said he foresees a near future where something like this could be ready “for broad consumption and at a not prohibitive cost.” Progress has been continuous so far, he explained, and there have been important unlocks every step of the way. For now, he said, the excitement is driven by the fact that this is a real, practical example of model capability, “versus how this model performs on academic and public evaluations and benchmarks.”
The shift has surprised even longtime AI observers. In a recent post, independent researcher Simon Willison predicted that by 2029, someone would build a full web browser mostly using AI, and that it wouldn’t even be surprising. “Rolling a new web browser is one of the most complicated software projects I can imagine,” he wrote. Cursor may have accelerated that timeline. “I may have been off by three years,” Willison said. “I have to admit I’m very surprised to see something this capable emerge so quickly.”
This speaks to what OpenAI and others have talked about as a “capabilities overhang”: the idea that the most sophisticated AI models can do much more than what’s publicly deployed, but the right combination of tools, product design, and drops in cost can suddenly make them usable at scale. So while tools like the Cursor browser aren’t quite ready for primetime, the trajectory is clear.
