How I Do Spec-Driven Development (SDD) with Claude Code

Here’s my full spec-driven development setup after months of daily Claude Code use: a four-phase workflow, a CLAUDE.md structure that actually gets followed, context management rules that prevent the cliff-edge degradation problem, and the subagent delegation pattern.

I use Claude Code for almost everything: features, refactors, bug fixes, migrations. The workflow I’m about to describe works across all of them.

Why specs

When you let Claude jump straight to code, you’re asking it to figure out what to build, how to build it, and actually build it, all in one context window. That’s three jobs. It does a mediocre version of all three instead of a good version of any one.

Spec-driven development means: write a spec first, save it as a file, then use that file as the single source of truth for implementation. Claude reads the spec. Claude implements the spec. If the spec is wrong, you fix the spec, not the chat history.

Phase 1: Explore

Before Claude writes a single line, it reads. I’m explicit about this:

Read the auth module, user model, and existing middleware.
Understand the patterns. Do NOT write code.

That last line matters. Without it, Claude will start “helping” by drafting implementations mid-research. You don’t want that yet.

For larger features, I spin up parallel subagents: one to research the data layer, one for the API surface, one for existing patterns in the codebase. Each returns focused findings. The main session’s context stays clean.
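A sketch of what that exploration prompt looks like (the module names here are illustrative, not from a real project):

```text
Spin up three parallel subagents to research, not implement:
1. Data layer: read db/schema.ts and db/migrations/. Summarize
   the tables and relations relevant to auth.
2. API surface: read app/api/. Summarize route conventions and
   error handling.
3. Patterns: read the middleware in lib/. Summarize naming and
   validation conventions.
Each subagent returns a short summary of findings only.
Do NOT write code.
```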

Phase 2: Write the spec

Toggle Plan Mode with Shift+Tab. Have Claude draft a specification, not code.

My specs have three layers:

Requirements: the what and why. Problem statement, goals, acceptance criteria, constraints. And what's explicitly out of scope: deciding what not to build is as much a part of the spec as deciding what to build.

Design: the how. Architecture decisions, data model changes, API contracts with actual request/response examples, edge cases, error handling. I include code snippets here when the interface matters:

// API contract for the spec, not the implementation
POST /api/auth/login
Request:  { email: string, password: string }
Response: { token: string, expiresAt: number }
Error:    { code: "INVALID_CREDENTIALS" | "ACCOUNT_LOCKED", message: string }

Tasks: the implementation plan. Small, atomic steps. Each task touches 3 files or fewer. Dependencies are explicit. Every task has a definition of done.

- [ ] Task 1: Add `sessions` table migration (depends on: nothing)
      Files: db/migrations/003_sessions.sql, db/schema.ts
      Done: migration runs, schema types updated

- [ ] Task 2: Implement session store (depends on: Task 1)
      Files: lib/auth/session-store.ts, lib/auth/session-store.test.ts
      Done: unit tests pass

- [ ] Task 3: Add login endpoint (depends on: Task 2)
      Files: app/api/auth/login/route.ts, lib/auth/session-store.ts
      Done: integration test passes, error cases covered

Save the spec as docs/spec-<feature>.md in your repo. This file is your recovery point. Context degrades, you /clear. You come back a week later: point a fresh session at the spec and you’re back to full speed.

A technique I got from Anthropic's team: instead of writing the full spec yourself, give Claude a rough outline and have it interview you using the AskUserQuestion tool. It asks probing questions:

What should happen if the session expires mid-request?

Do you need refresh tokens or just short-lived JWTs?

Questions like these surface requirements you hadn't thought through. I use this for every spec now.
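The kickoff prompt for this can be short; here's one possible wording, not a canonical template:

```text
Here's a rough outline of the feature: <paste outline>.
Before drafting the spec, interview me using the AskUserQuestion
tool. Ask about edge cases, failure modes, and anything ambiguous.
Stop when you have enough to write the Requirements section.
```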

Phase 3: Implement with subagents

This is the part that made the biggest difference. Don’t tell Claude “implement the spec” in the main session. Delegate through subagents.

The prompt:

Implement docs/spec-auth.md. Use the task tool — each task
handled by a subagent so context stays clean. Commit after
each task.

What happens: Claude reads the spec and spins up a subagent for Task 1. That subagent gets a fresh context window scoped to just its slice of work, implements, commits, and hands off to the next subagent for Task 2. The main session stays clean because it's only tracking progress and handing off context.

This is context isolation through delegation. The orchestrator never fills up with implementation details. Each subagent works in a tight scope and produces one commit. If Task 4 breaks something Task 2 built, the git history tells you exactly where to look.

Critical: each subagent should return a summary, not its full context. A well-designed subagent prompt includes an output format: what it built, what it changed, any decisions it made that deviated from the spec.
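A minimal output format to include in the subagent prompt, sketched as a template (the section names are my own, not a standard):

```markdown
## Task N report
- Built: one sentence on what was implemented
- Files changed: list of paths
- Tests: which tests were added or updated, and whether they pass
- Deviations: anywhere the implementation differs from the spec, and why
- Follow-ups: anything the next task needs to know
```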

Phase 4: Verify

Every task compiles and passes tests before the next one starts. Tests are items in the task list, not a separate phase tacked on at the end.

For new features, I include test tasks inline:

- [ ] Task 2: Implement session store
- [ ] Task 2a: Write session store unit tests
      Done when: create, read, delete, expiry cases pass

For bugs, the loop is strict TDD: write a failing test that reproduces the bug, confirm it fails, fix the code, confirm it passes. This prevents Claude from “fixing” a bug by quietly changing behavior somewhere else. The test pins the expected behavior before any code changes.

I also use a PostToolUse hook to run the TypeScript compiler after every edit:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Write|Edit",
      "hooks": [{ "type": "command", "command": "npx tsc --noEmit --skipLibCheck || true" }]
    }]
  }
}

This way type errors surface immediately, not at commit time.

CLAUDE.md: less is more

CLAUDE.md loads into context at session start. Every line competes for Claude’s attention. Claude can follow roughly 150-200 instructions reliably, and its system prompt already burns about 50 of those. You have maybe 100-150 slots. Spend them wisely.

I keep mine under 100 lines. Here’s the actual structure I use:

# Project: my-app
Next.js 14 App Router, TypeScript strict, Drizzle ORM, PostgreSQL

## Commands
- `npm run dev` — dev server on :3000
- `npm run test` — Jest with coverage
- `npm run lint` — ESLint + Prettier check
- `npm run db:migrate` — run Drizzle migrations

## Code style
- TypeScript strict mode, no `any`
- Named exports only, no default exports
- ES modules, not CommonJS
- Colocate tests next to source: `foo.ts` → `foo.test.ts`

## Architecture
- `/app` — Next.js App Router pages and API routes
- `/components/ui` — reusable UI primitives
- `/lib` — business logic, utilities, DB queries
- `/db` — schema, migrations, seed data

## Workflow
- Use Plan Mode for new features before writing code
- Break tasks touching >3 files into smaller tasks first
- Derive tasks from spec files — don't invent new ones
- Write tests alongside implementation, not after

What I keep out of CLAUDE.md:

- Long docs: link to them instead. "For auth patterns, see docs/auth-architecture.md" lets Claude load them on demand without bloating every session.
- Formatting rules: that's a linter's job. Enforce them with a PostToolUse hook running Prettier.
- Obvious patterns: don't burn instruction slots telling Claude how to write a function.
- Feature-specific details: those live in the feature spec, not global config.
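The Prettier hook follows the same shape as the tsc hook from Phase 4. A sketch, to be adapted to your formatter setup; formatting the whole tree on every edit is the blunt version, and scoping it to the edited file means parsing the hook's stdin payload, which I'm leaving out here:

```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Write|Edit",
      "hooks": [{ "type": "command", "command": "npx prettier --write . || true" }]
    }]
  }
}
```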

One rule I enforce: when Claude makes a mistake I’ve corrected before, I tell it to add a prevention rule to CLAUDE.md. The file self-corrects over time.

Context management

This is how you get the most out of Claude.

Clear between tasks. /clear after every distinct unit of work. Past 80k tokens or 40% of context used, clear. One feature’s history polluting another feature’s implementation causes subtle bugs.

Don’t use /compact. Automatic compaction is opaque. You don’t control what gets dropped. The better pattern: have Claude write its current progress and plan to a markdown file, /clear, start a new session pointing at that file. You decide what carries forward.

Write your current progress, remaining tasks, and any
decisions made to docs/progress-auth.md. Include enough
detail that a fresh session can continue without re-reading
all source files.

Then in the new session:

Read docs/spec-auth.md and docs/progress-auth.md.
Continue implementation from where the previous session left off.

One task per session. Context degradation isn’t gradual; it’s a cliff. Claude is sharp for the first 40-50k tokens, then drops off fast. Keep sessions focused on a single task and clear before starting the next one.

Watch your MCP token budget. If you're running MCP servers, each one loads its tool definitions into context before you type a word. I've seen setups with ~14 MCP servers enabled where the usable context window shrank from 200k to under 70k. I usually keep 3-4 active per project.

Git worktrees for parallel work

This isn’t strictly SDD, but it’s the force multiplier that makes the workflow scale. Instead of branches:

git worktree add ../my-app-feature-auth -b feature/auth
cd ../my-app-feature-auth && claude

Each worktree is a fully isolated working directory with its own Claude session. I run 2-3 features in parallel this way, one per terminal tab. No branch switching, no stash juggling, no context contamination between features.
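When a feature merges, the worktree gets cleaned up too. Here's the full lifecycle sketched in a throwaway repo (the paths and branch name are illustrative):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q my-app && cd my-app
git -c user.email=a@b -c user.name=me commit -q --allow-empty -m "init"
# Create an isolated working directory on a new branch
git worktree add ../my-app-feature-auth -b feature/auth
git worktree list                          # both checkouts show up
# After merging, remove the worktree and delete the branch
git worktree remove ../my-app-feature-auth
git branch -d feature/auth
```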

What I actually run daily

The full loop for a new feature:

  1. Open Claude Code. Paste a one-paragraph feature description.
  2. Shift+Tab into Plan Mode. Have Claude explore the codebase and draft a spec.
  3. Review the spec. Have Claude interview me for gaps. Iterate until the spec is tight.
  4. Save to docs/spec-<feature>.md.
  5. Tell Claude to implement via subagents, one commit per task.
  6. Review commits as they land. Fix spec if something’s wrong. /clear and resume from spec if context degrades.
  7. Final review pass. Merge.

Most features take 2-3 sessions. Large refactors might take 5-6, each starting from the spec file. The spec is always the source of truth, not the chat, not my memory, not Claude’s context window.

Three things to start with if you’re trying this for the first time:

  1. Plan Mode before code. Every time. Toggle it with Shift+Tab and don’t let Claude edit files until the spec exists.
  2. Subagents for implementation. Context isolation is the whole game.
  3. Commit per task. Granular history gives you granular rollback.

A spec file, Plan Mode, and disciplined /clear usage covers 80% of the value. Hooks, custom slash commands, and multi-agent orchestration are optimizations you add once the core workflow is muscle memory.
