To split or not to split: Managing context in Claude Code
I’m always wondering whether the task at hand could be better solved by splitting it into smaller, more manageable subtasks that fit the context window well. But token usage is difficult to predict, since it depends on the codebase and the complexity of the task, so I find it hard to determine the optimal split points. Because context saturation degrades performance, manually planning and decomposing complex tasks is essential for maintaining code quality.
Levi Stringer, Developer Advocate at Anthropic, suggests manual decomposition to manage context effectively. His experiment compared a one-shot attempt against an atomic breakdown approach on identical bug fixes. The one-shot attempt cost $1.82 and produced an overcomplicated custom hashing algorithm, while the atomic approach cost $0.77 and produced five clean lines of code. Therefore, if you can’t articulate the task in one sentence with one measurable outcome, split it.
But where to split?
60% threshold
Blake Crosley published a measurement study of 50 Claude Code sessions in which he tracked context utilization against output quality. His finding: quality degrades at approximately 60% context utilization, far earlier than the 75-92% threshold where auto-compaction kicks in.
So now I split if Claude needs to read more than 5-6 files and run several commands to complete a task. It’s an imperfect heuristic, but it maps more reliably to actual token consumption than trying to estimate complexity.
Running /compact every 25-30 minutes or after completing each subtask helps. You can even guide what gets preserved: "/compact Focus on the API changes we made, summarize the debugging session" tells Claude what context actually matters going forward.
Subagents
I was thinking about task decomposition as binary: either I break it down manually, or I give Claude the whole thing. But subagents give you a third option: give Claude the whole task, but tell it to use subagents for research and exploration. The main context stays clean for the actual implementation work.
Three built-in subagent types exist: Explore (runs on Haiku, fast and cheap, read-only), Plan (inherits the parent model, handles research during Plan Mode), and a general-purpose agent for complex multi-step tasks. You can also define custom subagents as Markdown files in .claude/agents/ with specific tool permissions, models, and scoping.
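As a sketch, a custom subagent file in .claude/agents/ might look like this (the frontmatter fields follow the documented format; the agent name, tool list, and prompt are my own illustration, not from any source above):

```markdown
---
name: repo-explorer
description: Read-only codebase exploration. Use when a task needs background on unfamiliar modules.
tools: Read, Grep, Glob
model: haiku
---

You are a read-only research agent. Locate the files relevant to the
question, summarize how they interact, and report back file paths and
key symbols. Never modify anything.
```

Scoping the tools to read-only operations is what keeps an exploration agent from accidentally doing implementation work in its own context.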
One architectural constraint worth knowing: subagents cannot spawn other subagents. The delegation is exactly one level deep. This means your decomposition still needs to happen at the top level; Claude can’t recursively break down work into arbitrarily nested subtasks.
There’s also a nuance with Claude Opus 4.6 specifically: Anthropic’s own prompting guide notes it has a strong tendency to spawn subagents even when a simpler direct approach would suffice. If you’re on Opus, it’s worth adding explicit guidance in your CLAUDE.md about when subagents are and aren’t warranted.
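The guidance itself can be short. A hypothetical CLAUDE.md section along these lines (the wording is mine, a sketch of the kind of rule the prompting guide suggests):

```markdown
## Subagent policy

- Use subagents only for research and exploration that would otherwise
  pull many files into the main context.
- For changes touching one or two known files, work directly; do not
  delegate.
- Never spawn a subagent to run a single command or read a single file.
```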
The plan-to-filesystem pattern
I’ve seen good results from Boris Tane’s pipeline. His workflow starts with a deep-read research phase that produces a research.md, followed by a planning phase that produces a plan.md with code snippets and trade-offs.
What makes this different from just using Plan Mode is the annotation cycle. He opens the plan in his editor and adds inline corrections, domain knowledge, and constraints across 1-6 rounds of iteration. The plan becomes a shared artifact between you and the agent: something you can read, edit, disagree with, and refine before any code gets written. He then generates a granular todo list and issues a single implementation prompt telling Claude to work through every task and mark each as completed.
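To picture what lands on disk, a plan.md after a couple of annotation rounds might look like this (the filenames come from Tane's workflow; the section layout and content are my own sketch):

```markdown
## Approach
Replace the polling loop with a webhook receiver.

## Trade-offs
Webhook adds a public endpoint but removes the 30s poll delay.
<!-- annotation: keep polling as a fallback, ops requirement -->

## Todos
- [ ] Add webhook endpoint and signature verification
- [ ] Remove poll scheduler behind a feature flag
- [ ] Update integration tests
```

The inline comments are the human half of the artifact: corrections the agent reads back before implementation starts.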
Multi-session parallelism for large tasks
For tasks that obviously exceed a single context window, I resort to multi-session parallelism.
Boris Cherny runs 10-15 concurrent sessions, each in its own git worktree for filesystem isolation. Each session gets one bounded task with its own fresh context window. The key constraint for parallel work is non-overlapping file sets: Agent 1 handles the backend API, Agent 2 handles the frontend, Agent 3 handles database migrations. If multiple agents touch the same file, you get conflicts.
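A minimal sketch of the worktree setup (repository, branch, and directory names are mine, not from Cherny's writeup):

```shell
# One worktree per bounded task, so parallel sessions never write
# to the same checkout.
cd "$(mktemp -d)"
git init -q app
cd app
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "init"

# Non-overlapping file sets: backend, frontend, migrations.
git worktree add -b agent/backend    ../app-backend
git worktree add -b agent/frontend   ../app-frontend
git worktree add -b agent/migrations ../app-migrations

git worktree list   # three isolated checkouts, one shared object store
```

Each Claude Code session is then started inside its own worktree directory, so its edits stay on its own branch until you merge.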
Anthropic’s engineering blog on long-running agents recommends a structured handoff pattern for work spanning multiple context windows: an initializer agent sets up the environment and creates a progress file, then subsequent coding agents read that progress file and git history to understand state.
Blake Crosley formalized this into session handoff documents with a consistent format: Status, Files Modified, Key Decisions, Blockers, and Next Steps. Across 49 handoff docs in his study, this pattern maintained continuity without relying on context that would be compacted away.
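The five section names are Crosley's; the filled-in content below is a hypothetical example of what one handoff might contain:

```markdown
## Status
Auth refactor ~70% done; tokens issued, refresh flow untested.

## Files Modified
src/auth/token.ts, src/middleware/session.ts

## Key Decisions
Moved to short-lived JWTs; kept the session table for revocation.

## Blockers
Staging environment is missing the JWT secret.

## Next Steps
Wire up the refresh endpoint; delete the legacy cookie path.
```

Because the document lives on disk, the next session can read it directly instead of depending on whatever survived compaction.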
My current workflow
Can I describe the task and expected diff in one sentence? Skip the plan, just prompt directly.
Does the task require reading more than 5-6 files or involve architectural uncertainty? Use Plan Mode. Have Claude explore first, produce a plan, iterate on it, then implement.
Does the task span multiple components with clear boundaries? Decompose manually into subtasks, write them to a plan.md or todo.md, and run each as a focused session or let Claude use subagents for the parallel pieces.
Is the task obviously too large for one context window? Split into parallel sessions in separate git worktrees, connected by a shared progress file on disk.