AI influencer Matt Shumer penned a viral blog post on X about AI's potential to disrupt, and ultimately automate, nearly all knowledge work. It has racked up more than 55 million views in the past 24 hours.
Shumer's 5,000-word essay clearly hit a nerve. Written in a breathless tone, the post is framed as a warning to friends and family about how their jobs are about to be radically upended. (Fortune also ran an adapted version of Shumer's post as a commentary piece.)
“On February 5th, two major AI labs released new models on the same day: GPT-5.3 Codex from OpenAI, and Opus 4.6 from Anthropic,” he writes. “And something clicked. Not like a light switch…more like the moment you realize the water has been rising around you and is now at your chest.”
Shumer says coders are the canary in the coal mine for every other profession. “The experience that tech workers have had over the past year, of watching AI go from ‘helpful tool’ to ‘does my job better than I do,’ is the experience everyone else is about to have,” he writes. “Law, finance, medicine, accounting, consulting, writing, design, analysis, customer service. Not in ten years. The people building these systems say one to five years. Some say less. And given what I’ve seen in just the last couple of months, I think ‘less’ is more likely.”
But despite its virality, Shumer's assertion that what's happened with coding is a prequel for what will happen in other fields, and, critically, that this will happen within just a few years, seems wrong to me. And I write this as someone who wrote a book (Mastering AI: A Survival Guide to Our Superpowered Future) predicting that AI would massively transform knowledge work by 2029, something I still believe. I just don't think the full automation of processes that we're starting to see with coding is coming to other fields as quickly as Shumer contends. He may be directionally right, but the dire tone of his missive strikes me as fear-mongering, and based largely on faulty assumptions.
Not all knowledge work is like software development
Shumer says the reason code has been the area where autonomous agentic capabilities have had the biggest impact so far is that AI companies have devoted so much attention to it. They have done so, Shumer says, because these frontier model companies see autonomous software development as key to their own businesses, enabling AI models to help build the next generation of AI models. On this, the AI companies' bet seems to be paying off: the pace at which they are churning out better models has picked up markedly in the past year. And both OpenAI and Anthropic have said that the code behind their most recent AI models was largely written by AI itself.
Shumer says that while coding is a leading indicator, the same performance gains seen in coding will arrive in other domains, though generally about a year later than the uplift in coding. (Shumer doesn't offer a cogent explanation for why this lag might exist, though he implies it's simply because the AI model companies optimize for coding first and only later get around to improving the models in other areas.)
But what Shumer doesn't say is that there is another reason progress in automating software development has been more rapid than in other areas: coding has quantitative metrics of quality that simply don't exist in other domains. In programming, if the code is really bad it simply won't compile at all. Inadequate code can also fail the various unit tests that an AI coding agent can run. (Shumer doesn't mention that today's coding agents sometimes lie about completing unit tests, which is one of many reasons automated software development isn't foolproof.)
Many developers say the code that AI writes is often decent enough to pass these basic tests but is still not very good: it's inefficient, inelegant, and, most important, insecure, opening any organization that uses it to cybersecurity risks. But in coding there are still ways to build autonomous AI agents that address some of these issues. The model can spin up sub-agents that check the code it has written for cybersecurity vulnerabilities or critique how efficient it is. Because software code can be tested in digital environments, there are many ways to automate the process of reinforcement learning (where an agent learns from experience to maximize some reward, such as points in a game) that AI companies use to shape the behavior of AI models after their initial training. That means the refinement of coding agents can be done in an automated way, at scale.
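To make the contrast concrete, here is a minimal, hypothetical sketch of why code lends itself to automated grading. The `add` function and `grade` harness below are invented for illustration (they are not anything from OpenAI or Anthropic): a candidate program either compiles or it doesn't, and it either passes unit tests or it doesn't, producing an objective score that can double as a reinforcement-learning reward.

```python
# Minimal sketch: grading AI-generated code automatically.
# The candidate source stands in for code an AI agent might produce.

candidate_source = """
def add(a, b):
    return a + b
"""

def grade(source: str, tests: list) -> float:
    """Compile the candidate, run each unit test, return the fraction passed."""
    namespace = {}
    try:
        # The "compiler" check: really bad code simply won't compile at all.
        exec(compile(source, "<candidate>", "exec"), namespace)
    except SyntaxError:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if namespace["add"](*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime failure counts as a failed test
    return passed / len(tests)

unit_tests = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]
reward = grade(candidate_source, unit_tests)  # 1.0: all tests pass
```

No equivalent harness exists for a legal brief or a treatment plan, which is the crux of the asymmetry the article describes.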
Assessing quality in many other domains of knowledge work is far harder. There are no compilers for law, no unit tests for a medical treatment plan, no definitive metric for how good a marketing campaign is before it's tested on consumers. It is much harder in other domains to gather sufficient data from professional experts about what "good" looks like. AI companies realize they have a problem gathering this kind of data. That's why they are now paying millions to companies like Mercor, which in turn are shelling out big bucks to recruit accountants, finance professionals, lawyers, and doctors to provide feedback on AI outputs so the AI companies can train their models better.
It's true that there are benchmarks showing the latest AI models making rapid progress on professional tasks outside of coding. One of the best of these is OpenAI's GDPVal benchmark. It shows that frontier models can achieve parity with human experts across a wide range of professional tasks, from complex legal work to manufacturing to healthcare. So far, the results aren't in for the models OpenAI and Anthropic released last week. But their predecessors, Claude Opus 4.5 and GPT-5.2, achieved parity with human experts across a diverse range of tasks, and beat human experts in many domains.
So wouldn't this suggest that Shumer is right? Well, not so fast. It turns out that in many professions what "good" looks like is highly subjective. Human experts agreed with one another in their assessment of the AI outputs only about 71% of the time. The automated grading system used by OpenAI for GDPVal has even more variance, agreeing with the human assessments only 66% of the time. So those headline numbers about how good AI is at professional tasks may have a wide margin of error.
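The arithmetic behind that margin of error is worth spelling out. In the back-of-the-envelope sketch below, only the 66% agreement figure comes from the article; the assumption that grader disagreements are symmetric, and the example win rates, are illustrative inventions:

```python
# How an unreliable grader blurs a benchmark score (illustrative model).
agreement = 0.66          # grader matches the expert verdict this often
flip = 1 - agreement      # probability a verdict is recorded incorrectly

def observed_win_rate(true_rate: float) -> float:
    """Win rate the benchmark reports, given the model's true win rate,
    assuming verdicts are flipped symmetrically with probability `flip`."""
    return true_rate * agreement + (1 - true_rate) * flip

# A model that truly beats experts on 70% of tasks would be reported
# at about 56%, while one that truly wins only 30% would read as ~44%.
blurred_strong = observed_win_rate(0.70)
blurred_weak = observed_win_rate(0.30)
```

Under these assumptions the noisy grader compresses every score toward 50%, which is exactly why headline "parity with experts" numbers deserve skepticism.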
Enterprises need reliability, governance, and auditability
This variance is one of the things holding enterprises back from deploying fully automated workflows. It's not just that the output of the AI model itself might be faulty. It's that, as the GDPVal benchmark suggests, the equivalent of an automated unit test in many professional contexts might produce an erroneous result a third of the time. Most companies can't tolerate the risk of poor-quality work being shipped in a third of cases. The stakes are simply too high. Sometimes the risk might be merely reputational. In other cases, it could mean immediate lost revenue. But in many professional tasks, the consequences of a wrong decision can be far more severe: professional sanction, lawsuits, the loss of licenses, the loss of insurance coverage, and even the risk of physical harm and death, sometimes to large numbers of people.
What's more, trying to keep a human in the loop to review automated outputs is problematic. Today's AI models are genuinely getting better. Hallucinations occur less frequently. But that only makes the problem worse. As AI-generated errors become less frequent, human reviewers grow complacent, and AI errors become harder to spot. AI is excellent at being confidently wrong and at presenting results that are impeccable in form but lack substance. That bypasses some of the proxy criteria humans use to calibrate their level of vigilance. AI models often fail in ways that are alien to how humans fail at the same tasks, which makes guarding against AI-generated errors even more of a challenge.
For all these reasons, until the equivalent of software development's automated unit tests is developed for more professional fields, deploying automated AI workflows in many knowledge-work contexts will be too risky for most enterprises. AI will remain an assistant or copilot to human knowledge workers in many cases, rather than fully automating their work.
There are also other reasons that the kind of automation software developers have observed is unlikely for other categories of knowledge work. In many cases, enterprises can't give AI agents access to the kinds of tools and data systems they need to perform automated workflows. It's notable that the most enthusiastic boosters of AI automation so far have been developers who work either on their own or for AI-native startups. These coders are often unencumbered by legacy systems and tech debt, and often don't have a lot of governance and compliance systems to navigate.
Large organizations, by contrast, often lack ways to link data sources and software tools together. In other cases, concerns about security risks and governance mean large enterprises, especially in regulated sectors such as banking, finance, law, and healthcare, are unwilling to automate without ironclad guarantees that the results will be reliable and that there is a process for monitoring, governing, and auditing the outputs. The systems for doing this are currently primitive. Until they become far more mature and robust, don't expect enterprises to fully automate the production of business-critical or regulated outputs.
Critics say Shumer isn't honest about LLM failings
I'm not the only one who found Shumer's analysis faulty. Gary Marcus, the emeritus professor of cognitive science at New York University who has become one of the leading skeptics of today's large language models, told me Shumer's X post was "weaponized hype." And he pointed to problems even with Shumer's arguments about automated software development.
“He gives no actual data to support this claim that the latest coding systems can write whole complex apps without making errors,” Marcus said.
He points out that Shumer mischaracterizes a well-known benchmark from the AI research group METR that tries to measure AI models' autonomous coding capabilities and suggests AI's abilities are doubling every seven months. Marcus notes that Shumer fails to mention that the benchmark has two thresholds for accuracy, 50% and 80%. But most businesses aren't interested in a system that fails half the time, or even one that fails one out of every five attempts.
“No AI system can reliably do every five-hour long task humans can do without error, or even close, but you wouldn’t know that reading Shumer’s blog, which largely ignores all the hallucination and boneheaded errors that are so common in every day experience,” Marcus says.
He also noted that Shumer didn't cite recent research from Caltech and Stanford that chronicled a range of reasoning errors in advanced AI models. And he pointed out that Shumer has previously been caught making exaggerated claims about the abilities of an AI model he trained. “He likes to sell big. That doesn’t mean we should take him seriously,” Marcus said.
Other critics of Shumer's blog point out that his economic analysis is ahistorical. Every previous technological revolution has, in the long run, created more jobs than it eliminated. Connor Boyack, president of the Libertas Institute, a policy think tank in Utah, wrote an entire counter-blog post making this argument.
So, yes, AI may be poised to transform work. But the kind of full job automation that some software developers have begun to experience for some tasks? For most knowledge workers, especially those embedded in large organizations, that's going to take far longer than Shumer implies.
