Summer Yue may work on safety and alignment on Meta’s superintelligence team, but even she admits she isn’t immune to overconfidence when it comes to autonomous AI agents.
In a post on X Monday, Yue described how her OpenClaw autonomous AI agent, built to run locally on a Mac mini computer, deleted her entire inbox, ignoring instructions to pause and ask for confirmation first.
“I had to RUN to my Mac Mini like I was defusing a bomb,” she said. It was, she added, a “rookie mistake.” The workflow had been running in a test inbox she used to safely trial the agent for weeks, she explained, but in the real inbox the agent lost her original instruction.
Yue’s experience stands in stark contrast to viral posts such as The Lobster Revolution: Why 24/7 AI Agents Just Changed Everything, in which Peter Diamandis portrays always-on AI as all but frictionless.
“Let me tell you what it feels like to use this,” Diamandis wrote. “You wake up in the morning and your agent—mine is named Skippy, cheerfully sarcastic and absurdly capable—has done eight hours of work while you slept. It read a thousand pages of markdown. It organized your files. It drafted three project plans. It booked your travel. It researched that question you had at 11 PM and forgot about.
“When my Mac mini went offline for six hours, I felt withdrawal,” he added. “Like my best friend disappeared.”
Together, these dueling accounts of the power of AI agents capture the tension at the heart of today’s push toward “always-on” AI. As tools like OpenClaw and Claude Code make it technically possible for agents to run for long stretches, excitement is growing around the idea of AI that works while you sleep. But in practice, early users say that autonomy remains fragile, unpredictable, and labor-intensive to manage. Rather than replacing human work, today’s agents often require constant monitoring, guardrails, and intervention, especially when the stakes rise beyond low-risk experiments.
AI agents work best when tasks are simple and low-stakes
Shyamal Anadkat, who previously worked as an applied AI engineer at OpenAI, said most of today’s successful agents still require frequent human check-ins or are limited to tightly bounded, well-defined tasks, though he emphasized that this will change as measurement and evaluation techniques improve.
“A system that’s 95% accurate on individual steps becomes chaotic over a 20-step autonomous workflow,” Anadkat said. “Long-horizon planning is still weak.” As a result, he explained, agents may perform well on short task chains but tend to fall apart when asked to manage complex, multiday projects. Memory is another major limitation: “In many agents, memory is either nonexistent or fragile. You need systems that can maintain a coherent model of your work context, priorities, and constraints.”
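Anadkat’s 95% figure is a matter of simple compounding: if each step succeeds independently with probability 0.95, the odds that a 20-step workflow completes cleanly end to end fall to roughly a third. A minimal sketch, assuming independent steps:

```python
# Compounding error: a 95% per-step success rate decays quickly
# over a multi-step autonomous workflow (assuming independent steps).
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in the chain succeeds."""
    return per_step ** steps

for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps: {chain_success(0.95, steps):.0%}")
# 20 steps: 36%
```

Real workflows aren’t perfectly independent, but the direction of the effect is the same: small per-step error rates compound fast over long chains.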
That doesn’t mean the promise of AI agents is all smoke and mirrors, according to Yoav Shoham, a former principal scientist at Google, a professor emeritus at Stanford, and cofounder of AI21 Labs. But it does mean there’s a danger of people getting ahead of themselves. Today’s AI agents, he explained, work best when the task is low-risk, loosely defined, and cheap to get wrong.
“Developers like toys, and you have this toy that can do wonderful things,” he told Fortune. “As long as what they’re doing is fairly simple and fairly low-stakes with high tolerance for error, that’s fine.” For example, you might have your agent read 10,000 websites overnight and do something interesting with the results, surfacing tidbits of information that could be useful.
But for mission-critical business workflows, the bar is far higher. Companies need systems that are verifiable, repeatable, and cost-effective, requirements that quickly erode the set-it-and-forget-it promise of fully autonomous, always-on agents. In highly structured domains like coding or math, deeper automation is already possible. But for most real-world business processes, Shoham says, the work required to make agents reliable often outweighs the benefit.
Bret Greenstein, chief AI officer at consulting firm West Monroe, noted that tools like OpenClaw feel like a tipping point similar to what happened with generative AI when ChatGPT launched in 2022: for the first time, the idea of AI agents has become accessible. Still, it’s not a 24/7 “magic solution.”
The ability to delegate to an AI agent feels powerful
Still, there’s little doubt that the ability to delegate real-world tasks to an AI agent is deeply compelling for users, Greenstein emphasized. He pointed to his own experience handing an AI agent the mundane task of getting his clothes picked up for dry cleaning, then watching it quietly complete the job end to end.
“OpenClaw is set up so it shouldn’t feel safe for most people,” Greenstein said. “It doesn’t feel mature enough to be a trusted part of our lives yet.” For AI to be welcomed into everyday life or business operations, he added, it has to earn trust over time, much the way trust is established socially.
Even so, demand is already evident. Greenstein pointed to meetups and early industry gatherings devoted to OpenClaw, a rapid emergence he described as unusual for such a young tool. “It shows the hunger people have for AI that’s actually useful,” he said: systems that move beyond answering questions and start taking action.
Aaron Levie, CEO of cloud-based content management and collaboration company Box, called what is happening now with AI agents “little glimmers” of what might happen in the future.
“Some glimmers end up not manifesting, some glimmers just become the standard,” he explained, pointing to two years ago, when AI company Cognition launched an early agent called Devin that could integrate with Slack for task delegation, bug fixes, data analysis, and code review. At the time, it was still seen as futuristic, but today, “no one is confused that this is a standard practice,” he said. “You can just Slack Claude Code to go work on stuff—what seemed like a totally crazy idea is now basically the standard of any modern engineering team.”
But while AI agents are becoming very good at automating specific, discrete tasks, they remain poor at handling the broader, context-heavy work that makes up most jobs, Levie emphasized. AI agents may fully automate a handful of tasks but struggle with the rest, including navigating relationships and participating in meetings.
“When you hear an AI lab say we’re going to automate all knowledge work in 24 months, that’s usually a very narrow definition of jobs,” he said. “The definition of what an agent can do is not the same definition of what the job is that gets hired in the economy.”
The trust factor matters when things can go wrong
Avinash Vootkuri, a staff data scientist at a top Fortune 500 retailer, said that most enterprise AI agents “absolutely require a babysitter” and, for now, can operate only in business settings with tightly bounded autonomy and extensive guardrails. “The stakes are massive,” he explained.
For example, he described building an agentic system for enterprise cybersecurity in which AI agents don’t merely trigger alerts and wait for human review but actively investigate them. Instead of flooding analysts with thousands of warnings, the agents gather evidence in real time, querying threat-intelligence databases, analyzing behavioral patterns, and filtering out false positives, before deciding whether a situation warrants escalation.
The system relies on tightly bounded autonomy and extensive guardrails, reducing human workload without removing oversight.
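The triage loop Vootkuri describes could be sketched roughly as follows. Everything here, the class, the score threshold, and the decision rules, is an illustrative assumption, not a detail of his actual system; the key property is that the agent only filters and escalates, and never blocks on its own.

```python
# Illustrative sketch of bounded-autonomy alert triage. All names,
# thresholds, and rules are hypothetical, not Vootkuri's system.
from dataclasses import dataclass

@dataclass
class Alert:
    source_ip: str
    behavior_score: float   # 0.0 (benign) .. 1.0 (malicious), assumed
    on_threat_list: bool    # hit in a threat-intelligence lookup

ESCALATE_THRESHOLD = 0.8    # assumed guardrail setting

def triage(alert: Alert) -> str:
    """Weigh the gathered evidence and decide whether to escalate to
    a human analyst. The agent itself never blocks traffic."""
    if alert.on_threat_list:
        return "escalate"   # known-bad source: always a human call
    if alert.behavior_score >= ESCALATE_THRESHOLD:
        return "escalate"   # suspicious behavioral pattern
    return "dismiss"        # likely false positive, filtered out

alerts = [
    Alert("10.0.0.5", 0.2, False),
    Alert("203.0.113.9", 0.9, False),
    Alert("198.51.100.7", 0.1, True),
]
print([triage(a) for a in alerts])  # ['dismiss', 'escalate', 'escalate']
```

The design choice worth noticing is the asymmetry: dismissals reduce analyst load, but any action with real consequences still routes to a person.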
In cybersecurity, he explained, if the agent gets it wrong, the consequences are immediate and severe. “The AI either blocks legitimate customers (causing massive revenue loss) or it lets a sophisticated threat actor into the network,” he said. “It absolutely matters if things go wrong.”
According to Breeanna Whitehead, who runs an AI operations consultancy where she builds AI-powered systems for executives and founders, the industry is in a “trust calibration phase.”
AI agents can do more than most people let them, but less than the hype suggests.
“The real skill isn’t building the agent—it’s designing the handoff,” she explained. “Most people either over-trust agents and end up cleaning up messes, or they micromanage every output and wonder why AI feels like more work instead of less.” The idea, she said, is to design clear handoff points: one task can be fully delegated, another might get a quick review, while another remains a job only for humans.
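Whitehead’s tiered handoff could be encoded as a simple policy table. The tier names and example task assignments below are illustrative assumptions, not her framework; the point is only that the delegation decision is made explicitly, per task, before the agent runs.

```python
# Hypothetical sketch of explicit "handoff points" as a policy map.
from enum import Enum

class Handoff(Enum):
    FULL_DELEGATION = "agent acts, no review"
    QUICK_REVIEW = "agent drafts, human skims before it ships"
    HUMAN_ONLY = "never delegated"

# Example task-to-tier assignments (assumed for illustration).
POLICY = {
    "summarize_overnight_research": Handoff.FULL_DELEGATION,
    "draft_client_email": Handoff.QUICK_REVIEW,
    "delete_inbox_messages": Handoff.HUMAN_ONLY,
}

def handoff_for(task: str) -> Handoff:
    # Default unknown tasks to the most conservative tier.
    return POLICY.get(task, Handoff.HUMAN_ONLY)

print(handoff_for("draft_client_email").name)  # QUICK_REVIEW
print(handoff_for("wire_money").name)          # HUMAN_ONLY (unlisted)
```

Defaulting unlisted tasks to `HUMAN_ONLY` mirrors the over-trust failure mode she describes: the cheap mistake is reviewing too much, the expensive one is a deleted inbox.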
For now, sleep may be elusive when working with AI agents
For now, working with AI agents may have less to do with sleeping while they work than with staying half-awake while they do. Tools like OpenClaw can run for hours at a time, but for many early users, that autonomy comes with a new kind of vigilance: checking logs, reviewing outputs, and stepping in before things go wrong.
That dynamic was captured in a recent viral post titled Token Anxiety, in which investor Nikunj Kothari described a friend leaving a party early, not because he was tired, but because he wanted to get back to his agents. “Nobody questions it anymore,” Kothari wrote. “Half the room is thinking the same thing. The other half are probably checking the progress of their agents. At a party.”
The dream of AI that works while you sleep may be real. But for now, it’s still keeping a lot of people awake.
