If you’ve been using Claude Code for any serious project, you’ve probably hit that wall. The model slows down, the context gets bloated, and before you know it, you’re staring at a rate limit message when you’re right in the middle of something. It’s frustrating — and it’s mostly avoidable.
Anthropic recently doubled Claude Code’s five-hour rate limits for Pro, Max, Team, and Enterprise plans, and removed the peak-hours throttling that was quietly cutting limits during busy windows. That’s a real improvement — but if you’re still running Opus on every task and never clearing your context, you’ll blow through the new limits just as fast. Here’s how to actually use those limits well.
What Changed With Claude Code Limits (And What Didn’t)
In May 2026, Anthropic announced three concrete changes to Claude Code usage limits — backed by a compute deal with SpaceX that gave them access to over 220,000 NVIDIA GPUs at the Colossus 1 data center in Tennessee:
- Five-hour rate limits doubled — for Pro, Max, Team, and seat-based Enterprise plans.
- Peak-hour throttling removed — Pro and Max users now get consistent limits regardless of time of day.
- API rate limits significantly increased — Claude Opus API limits jumped by more than an order of magnitude for most tiers.
What they didn’t change: the weekly rolling cap. The five-hour window is now wider, but the ceiling above a full week of usage is a separate number. Most devs were being clipped inside individual sessions — not by the weekly ceiling — so this change matters more in practice than it sounds. But knowing the difference helps you plan your sessions better.
The Real Reason You Keep Running Out
Here’s something that surprises most people: every message you send in Claude Code re-sends your entire conversation history as input tokens. Message 5 is cheap. Message 150 is expensive — not because it’s long, but because it’s carrying 149 messages of history with it.
This is why sessions that felt fine for the first hour suddenly drain your quota in the last 15 minutes. You’re not sending more — you’re accumulating.
There are two ways this compounds:
- Running Opus on everything, including routine tasks that Sonnet handles just as well.
- Carrying context across completely unrelated tasks without ever clearing it.
Fix those two things and you’ll stretch your limits noticeably without needing to wait for Anthropic to give you more.
Use the Right Model for the Right Job
Opus is a powerful model. It’s also several times more expensive per turn than Sonnet, and Sonnet more than Haiku. If you’re running Opus by default on every task, you’re burning quota on work that doesn’t need it.
Anthropic’s own documentation spells out a practical breakdown:
- Opus — hard debugging, cross-cutting refactors, architecture decisions, anything where deeper reasoning actually changes the output.
- Sonnet — features, tests, known bugs, standard refactors. Most of your daily coding falls here.
- Haiku — mechanical work: renames, log lines, regex explanations, boilerplate.
The smartest pattern is to plan with Opus, execute with Sonnet. Claude Code has a built-in model alias for this: /model opusplan. It uses Opus while in plan mode (Shift+Tab to enable), then switches automatically to Sonnet once the plan is ready and execution begins. You get Opus reasoning where it counts without paying for it on every implementation step.
# Switch to the opusplan model in Claude Code
/model opusplan
# This uses Opus during planning mode (Shift+Tab)
# Then auto-switches to Sonnet for implementation
If you’re on a Max plan, Sonnet pulls roughly five times less from your usage limits than Opus for equivalent work. Route 70–80% of your execution tasks through Sonnet and the math works out — you’re effectively multiplying your usable throughput without spending more.
Manage Your Context Window Aggressively
Context rot is real. As the context window fills up, Claude starts losing track of older decisions, misses things it already read, and repeats itself. The fix isn’t to push through — it’s to manage the context before it becomes a problem.
/compact vs /clear — Which One to Use
These two commands solve different problems:
- /compact — compresses the conversation into a summary while preserving the most relevant context. Use this when you want to keep working on the same task but the window is getting full. Run it at around 70% capacity — not 90%, because by then the summary gets too compressed to be useful.
- /clear — wipes everything. No history, no summary, clean slate. Use this when you’re switching to a completely different task, or when the context has accumulated incorrect assumptions that Claude keeps reverting to.
If Claude has made the same mistake twice and you’ve corrected it twice, don’t correct it a third time — /clear and write a better initial prompt that incorporates what you learned.
One important thing: /clear deletes session history with no recovery. Before you run it, save any important decisions or context to your CLAUDE.md file, which survives the clear and loads automatically at the start of every new session.
CLAUDE.md Is Your Persistent Memory
CLAUDE.md is a project-level config file that Claude Code reads at the start of every session. If you find yourself repeating the same instructions across sessions — code style, build commands, which model to use for which task, architecture decisions — put them in CLAUDE.md instead. You only pay for them once per session rather than re-explaining every time.
# Example CLAUDE.md
## Project Context
- Stack: Next.js 14, TypeScript, Tailwind CSS
- API routes live in /src/app/api
- Always run `pnpm typecheck` after changes
## Model Defaults
- Use Sonnet for features and tests
- Use Opus only for architecture decisions and hard debugging
- When compacting: always preserve the list of modified files
## Code Style
- Snake_case for API endpoints
- 2-space indentation
- Prefer composition over inheritance
Keep it under 200 lines. If it gets longer, Claude starts ignoring parts of it — the important rules get buried. Move detailed reference docs to separate files and reference them with @filename inside CLAUDE.md when needed.
Other Things That Actually Help
Reference Files By Path, Don’t Paste Them
Anything you paste into the chat sits in context for the rest of the session. If you paste a 500-line file to ask a question about one function, you’ve added 500 lines to every subsequent message’s input cost. Instead, tell Claude where the file lives and what to look at:
# Don't do this
[paste entire auth.ts file]
"Why is this failing?"
# Do this instead
"Look at the validateToken function in src/auth.ts — it's returning null on valid tokens."
Claude Code reads files selectively when you give it a path. You get the same result without inflating the context window with content it doesn’t need.
Break Large Tasks Into Sessions
Asking Claude to “refactor the entire project, write tests, and update the docs” in one go is the fastest way to burn a session. The context fills up with work from step 1 by the time you’re on step 3, and quality drops. Break it into separate sessions with clear boundaries. Each session starts fresh, costs less, and tends to produce better work because the model isn’t trying to track 10 different things at once.
Use /cost to Track Where Your Tokens Are Going
Claude Code has a /cost command that shows you cumulative token spend for the session. If a number looks unexpectedly high, it almost always traces back to a very long session that was never cleared, or Opus running on tasks it didn’t need to.
What the New Limits Actually Mean for You
The doubled five-hour rate limits are a genuine improvement — especially with peak throttling gone, which was a frustrating invisible cap that affected Pro and Max users during busy hours without much explanation. For most developers, the per-window throttle was the actual bottleneck, not the weekly ceiling. That’s the one that just got wider.
But the limit increase is a bigger win for developers who are already managing their sessions well. If you’re still defaulting to Opus, never clearing context between tasks, and pasting entire files into the chat — the new numbers will disappear just as fast as the old ones did.
The pattern that actually works: Opus on planning and review, Sonnet on execution, /compact at 70% context, /clear at task boundaries, CLAUDE.md for anything you’d otherwise repeat. That’s the setup that takes the doubled limits and turns them into something you can feel across a full work day.