To split or not to split: Managing context in Claude Code
I’m always wondering whether the task at hand could be better solved by splitting it into smaller, more manageable subtasks that fit the context window well. But token usage is difficult to predict, since it depends on the codebase and the complexity of the task, so I find it hard to determine the optimal split points. Because context saturation degrades performance, manually planning and decomposing complex tasks is essential for maintaining code quality.
Levi Stringer, Developer Advocate at Anthropic, suggests manual decomposition to manage context effectively. His experiment compared a one-shot attempt against an atomic breakdown approach on identical bug fixes. The one-shot attempt cost $1.82 and produced an overcomplicated custom hashing algorithm, while the atomic approach cost $0.77 and produced five clean lines of code. Therefore, if you can’t articulate the task in one sentence with one measurable outcome, split it.
But where to split?
60% threshold
Blake Crosley published a measurement study of 50 Claude Code sessions in which he tracked context utilization against output quality. His finding: quality degrades at approximately 60% context utilization, far earlier than the 75-92% threshold where auto-compaction kicks in.
So now I split if Claude needs to read more than 5-6 files and run several commands to complete a task. It’s an imperfect heuristic, but it maps more reliably to actual token consumption than trying to estimate complexity.
Running /compact every 25-30 minutes or after completing each subtask helps. You can even guide what gets preserved: "/compact Focus on the API changes we made, summarize the debugging session" tells Claude what context actually matters going forward.
Subagents
I was thinking about task decomposition as binary: either I break it down manually, or I give Claude the whole thing. But subagents give you a third option: give Claude the whole task, but tell it to use subagents for research and exploration. The main context stays clean for the actual implementation work.
Three built-in subagent types exist: Explore (runs on Haiku, fast and cheap, read-only), Plan (inherits the parent model, handles research during Plan Mode), and a general-purpose agent for complex multi-step tasks. You can also define custom subagents as Markdown files in .claude/agents/ with specific tool permissions, models, and scoping.
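As a sketch, a custom subagent file in .claude/agents/ might look like this (the frontmatter fields follow the documented format; the agent name, tool list, and prompt are my own illustration, not from any source above):

```markdown
---
name: repo-explorer
description: Read-only codebase exploration. Use when a task needs background on unfamiliar modules.
tools: Read, Grep, Glob
model: haiku
---

You are a read-only research agent. Locate the files relevant to the
question, summarize how they interact, and report back file paths and
key symbols. Never modify anything.
```

Scoping the tools to read-only operations is what keeps an exploration agent from accidentally doing implementation work in its own context.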
One architectural constraint worth knowing: subagents cannot spawn other subagents. The delegation is exactly one level deep. This means your decomposition still needs to happen at the top level; Claude can’t recursively break down work into arbitrarily nested subtasks.
There’s also a nuance with Claude Opus 4.6 specifically: Anthropic’s own prompting guide notes it has a strong tendency to spawn subagents even when a simpler direct approach would suffice. If you’re on Opus, it’s worth adding explicit guidance in your CLAUDE.md about when subagents are and aren’t warranted.
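The guidance itself can be short. A hypothetical CLAUDE.md section along these lines (the wording is mine, a sketch of the kind of rule the prompting guide suggests):

```markdown
## Subagent policy

- Use subagents only for research and exploration that would otherwise
  pull many files into the main context.
- For changes touching one or two known files, work directly; do not
  delegate.
- Never spawn a subagent to run a single command or read a single file.
```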
The plan-to-filesystem pattern
I’ve seen good results from Boris Tane’s pipeline. His workflow starts with a deep-read research phase that produces a research.md, followed by a planning phase that produces a plan.md with code snippets and trade-offs.
What makes this different from just using Plan Mode is the annotation cycle. He opens the plan in his editor and adds inline corrections, domain knowledge, and constraints across 1-6 rounds of iteration. The plan becomes a shared artifact between you and the agent: something you can read, edit, disagree with, and refine before any code gets written. He then generates a granular todo list and issues a single implementation prompt telling Claude to work through every task and mark each as completed.
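To picture what lands on disk, a plan.md after a couple of annotation rounds might look like this (the filenames come from Tane's workflow; the section layout and content are my own sketch):

```markdown
## Approach
Replace the polling loop with a webhook receiver.

## Trade-offs
Webhook adds a public endpoint but removes the 30s poll delay.
<!-- annotation: keep polling as a fallback, ops requirement -->

## Todos
- [ ] Add webhook endpoint and signature verification
- [ ] Remove poll scheduler behind a feature flag
- [ ] Update integration tests
```

The inline comments are the human half of the artifact: corrections the agent reads back before implementation starts.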
Multi-session parallelism for large tasks
For tasks that obviously exceed a single context window, I resort to multi-session parallelism.
Boris Cherny runs 10-15 concurrent sessions, each in its own git worktree for filesystem isolation. Each session gets one bounded task with its own fresh context window. The key constraint for parallel work is non-overlapping file sets: Agent 1 handles the backend API, Agent 2 handles the frontend, Agent 3 handles database migrations. If multiple agents touch the same file, you get conflicts.
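A minimal sketch of the worktree setup (repository, branch, and directory names are mine, not from Cherny's writeup):

```shell
# One worktree per bounded task, so parallel sessions never write
# to the same checkout.
cd "$(mktemp -d)"
git init -q app
cd app
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "init"

# Non-overlapping file sets: backend, frontend, migrations.
git worktree add -b agent/backend    ../app-backend
git worktree add -b agent/frontend   ../app-frontend
git worktree add -b agent/migrations ../app-migrations

git worktree list   # three isolated checkouts, one shared object store
```

Each Claude Code session is then started inside its own worktree directory, so its edits stay on its own branch until you merge.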
Anthropic’s engineering blog on long-running agents recommends a structured handoff pattern for work spanning multiple context windows: an initializer agent sets up the environment and creates a progress file, then subsequent coding agents read that progress file and git history to understand state.
Blake Crosley formalized this into session handoff documents with a consistent format: Status, Files Modified, Key Decisions, Blockers, and Next Steps. Across 49 handoff docs in his study, this pattern maintained continuity without relying on context that would be compacted away.
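The five section names are Crosley's; the filled-in content below is a hypothetical example of what one handoff might contain:

```markdown
## Status
Auth refactor ~70% done; tokens issued, refresh flow untested.

## Files Modified
src/auth/token.ts, src/middleware/session.ts

## Key Decisions
Moved to short-lived JWTs; kept the session table for revocation.

## Blockers
Staging environment is missing the JWT secret.

## Next Steps
Wire up the refresh endpoint; delete the legacy cookie path.
```

Because the document lives on disk, the next session can read it directly instead of depending on whatever survived compaction.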
My current workflow
Can I describe the task and expected diff in one sentence? Skip the plan, just prompt directly.
Does the task require reading more than 5-6 files or involve architectural uncertainty? Use Plan Mode. Have Claude explore first, produce a plan, iterate on it, then implement.
Does the task span multiple components with clear boundaries? Decompose manually into subtasks, write them to a plan.md or todo.md, and run each as a focused session or let Claude use subagents for the parallel pieces.
Is the task obviously too large for one context window? Split into parallel sessions in separate git worktrees, connected by a shared progress file on disk.