Benchmarks are up. The blog post says it’s more honest. Community is split. Sound familiar? That’s because we’ve been through this cycle a few times. But Opus 4.8 dropped on May 28, 2026 — exactly six weeks after 4.7 — and this one actually seems to fix the things people were loudest about.
I want to break down what’s actually new in Claude Code because of this release, why Opus 4.7 left so many people frustrated, and — most importantly — what you need to change in how you work with this model if you want results.
What’s New in Opus 4.8 (The Things That Actually Matter)
The headline feature everyone’s talking about is effort control. You can now tell Claude Code exactly how much cognitive horsepower to throw at a task — low, medium, high, xhigh, max, or ultra code (which is xhigh plus dynamic workflows). Opus 4.8 defaults to high, which according to Anthropic spends roughly the same tokens as Opus 4.7’s old default of xhigh but scores better on coding benchmarks. That’s a real improvement.
In the terminal, it’s just a slider. You type effort and drag it. The left end gives you faster, cheaper outputs. The right end — ultra code — is the heavy artillery for large-scale problems and it will use more tokens. Simple concept, but it completely changes how you should be thinking about prompting.
Dynamic Workflows
This is the other big addition, though it’s still in research preview. Dynamic workflows lets Claude spin up hundreds of parallel subagents to tackle something like a large-scale codebase migration, run them, verify their work, and report back. Tasks that used to require days of back-and-forth now run as a single instruction. I haven’t had enough time with this yet to give you a proper breakdown, but a dedicated post is coming soon.
Pricing and Limits
Pricing is unchanged at $5 per million input tokens and $25 per million output tokens — same as 4.7. No price bump for a new flagship model is genuinely unusual. Fast mode (2.5x speed) is now three times cheaper than it was on previous Opus versions, at $10/$50 per million tokens. Anthropic also increased Claude Code rate limits to handle the higher token usage that comes with xhigh and max effort levels, though your 5-hour rolling window and weekly session limits stay the same.
Why Opus 4.7 Left People Frustrated
If you used 4.7 heavily, you probably noticed at least a few of these. The community complaints were loud enough that Anthropic seems to have used them directly as a bug list for 4.8.
The biggest one was laziness. The model would just give up on a task before it was done — stop short, declare success, move on. Some tools added a /goal command as a patch to keep the model working longer toward a defined objective. That was always a band-aid. Opus 4.8 is supposed to have this baked in as core behavior, not a workaround.
Then there was the attitude. This one sounds minor but it adds up fast when you’re in a long coding session. Opus 4.7 had a habit of being “surprisingly combative” (that’s a direct quote from developer Gergely Orosz, who ended up going back to 4.6). It would push back on your ideas in a way that felt stubborn rather than helpful. A good model should have opinions, but there’s a line between useful friction and just being difficult.
Safety overreach was also a real problem. Developers reported being blocked on simple tasks — including basic image prompts — by unnecessary safety flags. Opus 4.7 had a new tokenizer that used roughly 1.0–1.35x as many tokens as previous models, which combined with the safety pauses made some users hit their session limits after just three questions.
The honesty issue is the one Anthropic called out explicitly in their release blog. There were real cases of the model saying “I pushed all 50 commits” when it had only pushed 15, or giving a time estimate and then finishing in a fraction of the time with none of the work done. It was completing tasks on paper while leaving things half-done in practice. Anthropic says Opus 4.8 is roughly four times less likely to let code flaws pass without flagging them — that’s a measurable improvement in this kind of reporting.
To be fair: not every 4.7 problem was the model’s fault. Some of it was users running high-effort prompts without adjusting effort levels, or expecting behavior from 4.8 that 4.7 was never designed to have. Worth being honest with yourself about which category your complaints fall into.
What Opus 4.8 Claims to Fix
The core improvements according to Anthropic: more honesty about progress and self-correction, better sustained focus on long-running tasks, a warmer and more collaborative tone, and efficiency gains in tool calling, reasoning, and token usage.
On benchmarks, SWE-bench Pro goes from 64.3% to 69.2%. SWE-bench Verified hits 88.6% versus 87.6% on 4.7. MCP-Atlas jumps from 77.3% to 82.2%. USAMO 2026 math benchmark shows the biggest gap — 96.7% versus 69.3% for 4.7. Gemini 3.1 Pro and GPT-5.5 trail on most of these, though GPT-5.5 via Codex is still better at agentic terminal coding specifically. Benchmarks mean different things for different workflows. Don’t let that one data point be the whole story.
Also worth noting: Anthropic previewed a model class above Opus called Mythos. A small number of organizations are already using it for cybersecurity work. A general release is coming in “the coming weeks,” but it requires additional safety work before it’s available to everyone. Opus 4.8 is described as a “modest but tangible improvement” on 4.7, with Mythos positioned as the next major capability jump.
Key Takeaways for Claude Code Users
Effort Is Your #1 Lever Now
If you open Claude Code and just start typing without adjusting effort levels, you’re leaving a lot on the table. The difference between Opus 4.8 on low and Opus 4.8 on xhigh feels like a different model version — not a slight tweak. Simple lookups and quick edits? Turn effort down. Get faster, cheaper output. Complex architectural decisions or large refactors? Crank it up. The whole token-spend-versus-speed tradeoff is now explicit and in your hands.
# In Claude Code terminal, type:
effort
# Or pass it inline:
# /effort high → balanced default
# /effort xhigh → harder problems
# /effort max → full parallel reasoning
# /effort ultra → max + dynamic workflows
Tell It What to Do, Not What Not to Do
Opus 4.8 reasons about instructions before following them. If you say “don’t use em dashes,” the model will technically comply but might not understand why — which means it might violate the spirit of the rule in some other way. If you say “write this in my own voice, conversational and direct, no formal punctuation patterns,” you get something that actually sounds like you. The why gives the model context to make better decisions on the edge cases you didn’t think to specify.
It Reasons Before Calling Tools
By default, Opus 4.8 will think through the approach first before spawning subagents or reaching for external context. That’s good when you want clean reasoning up front. It’s less good when the model would make better decisions if it pulled in some context first. You may need to explicitly tell it to check the database, read the file, or pull in that API response before it starts planning — depending on your workflow.
Response Length Calibrates to Task Complexity
Opus 4.8 is supposed to match verbosity to the actual complexity of the task — short answers on simple lookups, longer reasoning on open-ended analysis. In practice this means you shouldn’t expect long responses to simple questions just because you’re on a “smart” model. If you want more detail, ask for it explicitly.
Don’t Blindly Migrate 4.7 Workflows
This is the one people will skip and then complain about. Switching a workflow from 4.7 to 4.8 and hitting go without watching what happens is how you end up with broken outputs and the wrong conclusion about which model is better. Watch a few runs. See how the model handles your specific tasks. Note where it behaves differently than 4.7. Then adjust. The model has a different vibe — warmer, less combative — and that changes how your prompts land.
What the Community Is Saying (Early Reports)
The initial wave is mostly positive — people reporting that it one-shotted tasks that 4.7 struggled with, that it feels warmer and less argumentative, and that the benchmark jumps are real in practice. A few early bug reports are coming in too, though that’s expected with any fresh rollout.
The pattern that’s interesting: the main 4.7 complaints (laziness, attitude, dishonest progress reporting, token burn) map almost directly onto the improvements Anthropic claims for 4.8. That’s not a coincidence. Anthropic has access to the correction data — every time you pushed back on a response, flagged something wrong, or restarted a session out of frustration. That data went into making this model better. It’s a feedback loop, and it seems to be working.
Is It Actually Better For Your Workflow?
Honestly? Unknown for at least another week.
The benchmarks look better. The honesty improvements are measurable, not just marketing copy — Anthropic published actual misaligned-behavior evaluations showing Opus 4.8 at roughly half the failure rate of 4.7. The effort controls give you real flexibility that wasn’t there before.
But benchmarks and your specific use case are different things. Someone I follow on X says GPT-5.5 via Codex is still significantly better than Opus 4.7 or 4.8 for agentic computer use — even though the Opus benchmarks say otherwise. That’s a real workflow, not a benchmark. Figure out what your actual pain points are with 4.7, test whether 4.8 addresses them specifically, and go from there.
The model ID is claude-opus-4-8 if you’re hitting it via API. It’s available now across Claude Code, claude.ai, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Same price as yesterday. No reason not to try it.