Is Anthropic 'nerfing' Claude? Users increasingly report performance degradation as leaders push back

A growing number of developers and AI power users are taking to social media to accuse Anthropic of degrading the performance of Claude Opus 4.6 and Claude Code, whether deliberately or as a consequence of compute limits. They argue that the company’s flagship coding model feels less capable, less reliable and more wasteful with tokens than it did just weeks ago.

The complaints have spread quickly on GitHub, X and Reddit over the past several weeks, with several high-reach posts alleging that Claude has become worse at sustained reasoning, more likely to abandon tasks midway through, and more prone to hallucinations or contradictions.

Some users have framed the issue as “AI shrinkflation”: the idea that customers are paying the same price for a weaker product.

Others have gone further, suggesting Anthropic may be throttling or otherwise tuning Claude downward during periods of heavy demand.

These claims remain unproven, and Anthropic employees have publicly denied that the company degrades models to manage capacity. At the same time, Anthropic has acknowledged real changes to usage limits and reasoning defaults in recent weeks, which has made the broader debate more combustible.

VentureBeat has reached out to Anthropic for further clarification on the recent accusations, including whether any recent changes to reasoning defaults, context handling, throttling behavior, inference parameters or benchmark methodology might help explain the spike in complaints.

We’ve also asked how Anthropic explains the recent benchmark-related claims and whether it plans to publish additional data that could reassure customers. As of publication time, we are awaiting a response.

Viral user complaints, including from an AMD Senior Director, argue Claude has become less capable

One of the most detailed public complaints originated as a GitHub issue filed on April 2, 2026, by Stella Laurenzo, whose LinkedIn profile identifies her as Senior Director in AMD’s AI group.

In that post, Laurenzo wrote that Claude Code had regressed to the point that it could not be trusted for complex engineering work, then backed that claim with a sprawling analysis of 6,852 Claude Code session files, 17,871 thinking blocks and 234,760 tool calls.

The complaint argued that, starting in February, Claude’s estimated reasoning depth fell sharply while signs of poorer performance rose alongside it, including more premature stopping, more “simplest fix” behavior, more reasoning loops, and a measurable shift from research-first behavior to edit-first behavior.

The post’s broader point was that for advanced engineering workflows, extended reasoning is not a luxury but part of what makes the model usable in the first place.

That GitHub thread then escaped into the broader social media conversation, with X users including @Hesamation posting screenshots of Laurenzo’s GitHub post on April 11 and turning it into an even more viral talking point.

That amplification mattered because it gave the wider “Claude is getting worse” narrative something more concrete than anecdotal frustration: a long, data-heavy post from a senior AI leader at a major chip company arguing that the regression was visible in logs, tool-use patterns and user corrections, not just gut feeling.

Anthropic’s public response focused on separating perceived changes from actual model degradation. In a pinned follow-up on the same GitHub issue posted a week ago, Claude Code lead Boris Cherny thanked Laurenzo for the care and depth of the analysis but disputed its main conclusion.

Cherny said the “redact-thinking-2026-02-12” header cited in the complaint is a UI-only change that hides thinking from the interface and reduces latency, but “does not impact thinking itself,” “thinking budgets,” or how extended reasoning works under the hood.

He also said two other product changes likely affected what users were seeing: Opus 4.6’s move to adaptive thinking by default on Feb. 9, and a March 3 shift to medium effort, or effort level 85, as the default for Opus 4.6, which he said Anthropic viewed as the best balance across intelligence, latency and cost for most users.

Cherny added that users who want more extended reasoning can manually switch effort higher by typing /effort high in Claude Code terminal sessions.

That exchange gets at the core of the controversy. Critics like Laurenzo argue that Claude’s behavior in demanding coding workflows has plainly worsened and point to logs and usage patterns as evidence.

Anthropic, by contrast, is not saying nothing changed. It is saying the biggest recent changes were product and interface choices that affect what users see and how much effort the system expends by default, not a secret downgrade of the underlying model. That distinction may be technically important, but for power users who feel the product is delivering worse results, it is not necessarily a satisfying one.

External coverage from TechRadar and PC Gamer further amplified Laurenzo’s post and the larger wave of agreement from some power users.

Another viral post on X from developer Om Patel on April 7 made the same argument in even more direct terms, claiming that someone had “actually measured” how much “dumber” Claude had gotten and summarizing the result as a 67% drop.

That post helped popularize the “AI shrinkflation” label and pushed the controversy beyond hard-core Claude Code users into the broader AI discourse on X.

These claims have resonated because they map closely onto what many frustrated users say they are seeing in practice: more unfinished tasks, more backtracking, more token burn and a stronger sense that Claude is less willing to reason deeply through complicated coding jobs than it was earlier this year.

Benchmark posts turned anecdotal frustration into a public controversy

The loudest benchmark-based claim came from BridgeMind, which runs the BridgeBench hallucination benchmark. On April 12, the account posted that Claude Opus 4.6 had fallen from 83.3% accuracy and a No. 2 ranking in an earlier result to 68.3% accuracy and No. 10 in a new retest, calling that proof that “Claude Opus 4.6 is nerfed.”

That post spread widely and became one of the main anchors for the broader public case that Anthropic had degraded the model.

Other users also circulated benchmark-related or test-based posts suggesting that Opus 4.6 was underperforming versus Opus 4.5 in practical coding tasks.

Still other posts pointed to TerminalBench-related results as supposed evidence that the model’s behavior had changed in certain harnesses or product contexts.

The effect was cumulative: benchmark screenshots, side-by-side tests and anecdotal frustration all began reinforcing one another in public.

That matters because benchmark claims tend to travel farther than more subjective complaints. A developer saying a model “feels worse” is one thing. A screenshot showing a ranking drop from No. 2 to No. 10, or a dramatic percentage swing in accuracy, gives the appearance of hard evidence, even when the underlying comparison may be more complicated.

Critics of the benchmark claims say the evidence is weaker than it looks

The most important rebuttal to the BridgeBench claim did not come from Anthropic. It came from Paul Calcraft, an outside software and AI researcher on X, who argued that the viral comparison was misleading because the earlier Opus 4.6 result was based on only six tasks while the later one was based on 30.

In his words, it was a “DIFFERENT BENCHMARK.” He also said that on the six tasks the two runs had in common, Claude’s score moved only modestly, from 87.6% previously to 85.4% in the later run, and that the bigger swing appeared to come mostly from a single fabrication result without repeats. He characterized that as something that could easily fall within ordinary statistical noise.

That outside rebuttal matters because it undercuts one of the cleanest and most viral claims in circulation. It does not prove users are wrong to think something has changed. But it does suggest that at least some of the benchmark evidence now driving the story may be overstated, poorly normalized or not directly comparable.

Even the BridgeBench post itself drew a community note to similar effect. The note said the two benchmark runs covered different scopes (six tasks in one case and 30 in the other) and that the common-task subset showed only a minor change. That does not make the later result meaningless, but it weakens the strongest version of the “BridgeBench proved it” argument.
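The like-for-like check that Calcraft and the community note describe can be sketched in a few lines of Python. This is only an illustration of the method: the task names and scores below are hypothetical placeholders, not BridgeBench data, and the function name `compare_runs` is invented for this example.

```python
from statistics import mean


def compare_runs(earlier: dict, later: dict) -> dict:
    """Compare two benchmark runs both overall and on the tasks they share.

    Each argument maps a task name to a score (0-100). The overall means are
    only comparable if both runs cover the same tasks; the shared-subset means
    are the like-for-like figure.
    """
    shared = sorted(earlier.keys() & later.keys())
    if not shared:
        raise ValueError("runs share no tasks, so they cannot be compared directly")
    return {
        "earlier_all": mean(earlier.values()),
        "later_all": mean(later.values()),
        "earlier_shared": mean(earlier[t] for t in shared),
        "later_shared": mean(later[t] for t in shared),
        "n_shared": len(shared),
    }


# Hypothetical scores: the later run adds two harder tasks the earlier run never saw.
earlier_run = {"task_a": 90.0, "task_b": 80.0}
later_run = {"task_a": 88.0, "task_b": 80.0, "task_c": 50.0, "task_d": 60.0}

result = compare_runs(earlier_run, later_run)
print(result)
```

In this made-up case the headline averages fall from 85.0 to 69.5, while the shared-task averages move only from 85.0 to 84.0. Most of the apparent drop comes from the added tasks, which is the shape of the objection raised against the viral comparison.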

This is now a key feature of the controversy: the claims are not all equally strong. Some are grounded in first-hand user experience. Some point to real product changes. Some rely on benchmark comparisons that may not be apples-to-apples. And some rely on inferences about hidden system behavior that users outside Anthropic cannot directly verify.

Earlier capacity limits gave users a reason to suspect more changes under the hood

The current backlash also lands in the shadow of a real, confirmed Anthropic policy change from late March. On March 26, Anthropic technical staffer Thariq Shihipar posted that, “To manage growing demand for Claude,” the company was adjusting how 5-hour session limits work for Free, Pro and Max subscribers during peak hours, while keeping weekly limits unchanged.

He added that on weekdays from 5 a.m. to 11 a.m. Pacific time, users would move through their 5-hour session limits faster than before. In follow-up posts, he said Anthropic had landed efficiency wins to offset some of the impact, but that roughly 7% of users would hit session limits they would not have hit before, particularly on Pro tiers.

In an email on March 27, 2026, Anthropic told VentureBeat that Team and Enterprise customers were not affected by these changes, and that the shift was not dynamically optimized per user but instead applied to the peak-hour window the company had publicly described. Anthropic also said it was continuing to invest in scaling capacity.

Those comments were about session limits, not model downgrades. But they are important context, because they establish two things that users now keep connecting in public: first, Anthropic has been dealing with surging demand; second, it has already changed how usage is rationed during busy periods. That does not prove Anthropic reduced model quality. It does help explain why so many users are primed to believe something else may also have changed.

Prompt caching and TTL

A separate, more recent GitHub issue broadens the dispute beyond model quality and into pricing and quota behavior. In issue #46829, user seanGSISG argued that Claude Code’s prompt-cache time-to-live, or TTL, appeared to shift from a one-hour setting back to a five-minute setting in early March, based on analysis of nearly 120,000 API calls drawn from Claude Code session logs across two machines.

The complaint argues that this change drove meaningful increases in cache-creation costs and quota burn, especially for long-running coding sessions where cached context expires quickly and must be rebuilt. The author claims that this helps explain why some subscription users began hitting usage limits they had not previously encountered.

What makes this issue notable is that Anthropic did not flatly deny that something changed. In a reply on the thread, Jarred Sumner said the March 6 change was real and intentional, but rejected the framing that it was a regression. He said Claude Code uses different cache durations for different request types, and that a one-hour cache is not always cheaper because one-hour writes cost more up front and only save money when the same cached context is reused enough times to justify it.
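The tradeoff Sumner describes can be made concrete with a toy cost model. This is a rough sketch under stated assumptions, not Claude Code's actual accounting: the multipliers (a five-minute cache write at 1.25x the base input price, a one-hour write at 2x, a cache read at 0.1x) are illustrative values chosen for this example, the function name is hypothetical, and the model assumes every cache hit refreshes the TTL.

```python
# Assumed, illustrative multipliers relative to the base input-token price.
WRITE_5M = 1.25  # cost of writing a prefix into a five-minute cache
WRITE_1H = 2.0   # cost of writing a prefix into a one-hour cache
READ = 0.1       # cost of reading a cached prefix


def relative_prefix_cost(gaps_min, ttl_min, write_mult, read_mult=READ):
    """Relative cost of sending the same prompt prefix over one session.

    gaps_min lists the idle minutes between consecutive requests. The first
    request always pays for a cache write. Each later request pays for a read
    if it arrives within the TTL of the previous request (assumption: hits
    refresh the TTL), otherwise it pays for a fresh write.
    """
    cost = write_mult
    for gap in gaps_min:
        cost += read_mult if gap <= ttl_min else write_mult
    return cost


busy_session = [2, 3, 1]       # rapid back-and-forth, never idle for 5 minutes
bursty_session = [10, 20, 15]  # long pauses between requests

cost_5m_busy = relative_prefix_cost(busy_session, 5, WRITE_5M)
cost_1h_busy = relative_prefix_cost(busy_session, 60, WRITE_1H)
cost_5m_bursty = relative_prefix_cost(bursty_session, 5, WRITE_5M)
cost_1h_bursty = relative_prefix_cost(bursty_session, 60, WRITE_1H)

print(cost_5m_busy, cost_1h_busy, cost_5m_bursty, cost_1h_bursty)
```

Under these assumed numbers, the five-minute cache is cheaper for the rapid session (roughly 1.55 versus 2.3), because the cache never expires and the one-hour write premium is never recovered. The one-hour cache wins for the bursty session (roughly 2.3 versus 5.0), because every long pause forces a new five-minute write. Which setting is cheaper therefore depends on how a given request type is actually reused.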

In his telling, the change was part of ongoing cache optimization work, not a silent downgrade, and the pre–March 6 behavior described in the issue “wasn’t the intended steady state.”

The thread later drew a more detailed response from Anthropic’s Cherny, who described one-hour caching as “nuanced” and said the company has been testing heuristics to improve cache hit rates, token usage and latency for subscribers. Cherny said Anthropic keeps a five-minute cache for many queries, including subagents that are rarely resumed, and said turning off telemetry also disables experiment gates, which can cause Claude Code to fall back to a five-minute default in some cases.

He added that Anthropic plans to expose environment variables that let users force one-hour or five-minute cache behavior directly. Together, those replies do not validate the issue author’s claim that Anthropic silently made Claude Code more expensive overall, but they do confirm that Anthropic has been actively experimenting with cache behavior behind the scenes during the same period users began complaining more loudly about quota burn and changing product behavior.

Anthropic says user-facing changes, not secret degradation, explain much of the uproar

Anthropic-affiliated employees have publicly pushed back on the broadest accusations. In one widely circulated reply on X, Cherny responded to claims that Anthropic had secretly nerfed Claude Code by writing, “This is false.”

He said Claude Code had been defaulted to medium effort in response to user feedback that Claude was consuming too many tokens, and that the change had been disclosed both in the changelog and in a dialog shown to users when they opened Claude Code.

That response is notable because it concedes a meaningful product change while rejecting the more conspiratorial interpretation of it. Anthropic is not saying nothing changed. It is saying that what changed was disclosed and was aimed at balancing token use, not secretly reducing model quality.

Public documentation also supports the fact that effort defaults have been in motion. Claude Code’s changelog says that on April 7, Anthropic changed the default effort level from medium to high for API-key users as well as Bedrock, Vertex, Foundry, Team and Enterprise users.

That means Anthropic has actively been tuning these settings across different segments, which could plausibly affect user perceptions even if the core model weights are unchanged.

Shihipar has also directly denied the broader demand-management accusation. In a reply on X posted April 11, he said Anthropic does not “degrade” its models to better serve demand. He also said that changes to thinking summaries affected how some users were measuring Claude’s “thinking,” and that the company had not found evidence backing the strongest qualitative claims now spreading online.

The real issue may be trust as much as model quality

What is clear is that a trust gap has opened between Anthropic and some of its most demanding users.

For developers who rely on Claude Code all day, subtle shifts in visible thinking output, effort defaults, token burn, latency tradeoffs or usage caps can feel indistinguishable from a weaker model.

That is true whether the root cause is a product setting, a UI change, an inference-policy tweak, capacity pressure or a genuine quality regression.

It also means both sides of the fight may be talking past each other. Users are describing what they experience: more friction, more failures and less confidence. Anthropic is responding in product terms: effort defaults, hidden thinking summaries, changelog disclosures, and denials that demand pressure is causing secret model degradation.

Those are not necessarily incompatible descriptions. A model can feel worse to users even if the company believes it has not “nerfed” the underlying model in the way critics allege. But the dispute comes at a time when Anthropic’s chief rival OpenAI has recently pivoted and put more resources behind Codex, its competing product focused on enterprise and vibe coding, even offering a new, more mid-range ChatGPT subscription in an effort to boost usage of the tool. It is certainly not the kind of publicity that stands to benefit Anthropic or its customer retention.

At the same time, the public evidence remains mixed. Some of the most viral claims have come from developers with detailed logs and strong opinions based on repeated use. Some of the benchmark evidence has been challenged by outside observers on methodological grounds. And Anthropic’s own recent changes to limits and settings ensure that this debate is happening against a backdrop of real adjustments, not pure rumor.
