I rely on Claude Code, but it has a structural limitation: it defaults to implementation-first. It writes the "happy path" and ignores edge cases. When I try to force TDD in a single context window, the implementation "bleeds" into the test logic (context pollution). This article documents a multi-agent system using Claude's Skills and Hooks that enforces a strict Red-Green-Refactor cycle.
Framework Agnostic
While this article uses Vue as an example, the TDD principles and Claude Code workflow apply to any technology. Whether you're working with React, Angular, Svelte, or even backend languages like Python, Go, or Rust, the Red-Green-Refactor cycle and subagent orchestration work the same way.
The Problem with AI-Assisted TDD
When I ask Claude to "implement feature X," it writes the implementation first. Every time. TDD flips this: you write the test first, watch it fail, then write minimal code to make it pass.
I needed a way to:
- Force test-first: no implementation before a failing test exists
- Keep phases focused: the test writer shouldn't think about implementation details
- Ensure refactoring happens: it's easy to skip when the feature already works
Skills + Subagents
Claude Code supports two features I hadn't explored until recently:
- Skills (`.claude/skills/`): high-level workflows that orchestrate complex tasks
- Agents (`.claude/agents/`): specialized workers that handle specific jobs
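Concretely, the pieces described in the rest of this article end up in a layout like this (the `vue-integration-testing` and `vue-composables` skill paths are my assumption; the article references those skills but not their exact locations):

```
.claude/
├── skills/
│   ├── tdd-integration/
│   │   └── skill.md
│   ├── vue-integration-testing/   # referenced by tdd-test-writer (assumed path)
│   └── vue-composables/           # referenced by tdd-refactorer (assumed path)
├── agents/
│   ├── tdd-test-writer.md
│   ├── tdd-implementer.md
│   └── tdd-refactorer.md
├── hooks/
│   └── user-prompt-skill-eval.ts
└── settings.json
```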
You might wonder: why use subagents at all? Skills alone could handle the TDD workflow. But there's a catch: context pollution.
The Context Pollution Problem
Why Single-Context LLMs Cheat at TDD
When everything runs in one context window, the LLM cannot truly follow TDD. The test writer's detailed analysis bleeds into the implementer's thinking. The implementer's code exploration pollutes the refactorer's evaluation. Each phase drags along baggage from the others.

This isn't just messy; it fundamentally breaks TDD. The whole point of writing the test first is that you don't know the implementation yet. But if the same context sees both phases, the LLM subconsciously designs tests around the implementation it's already planning. It "cheats" without meaning to.
Subagents solve this architectural limitation. Each phase runs in complete isolation:
- The test writer focuses purely on test design; it has no idea how the feature will be implemented
- The implementer sees only the failing test; it can't be influenced by test-writing decisions
- The refactorer evaluates the finished implementation; it starts fresh, without baggage from the earlier phases

Each agent starts with exactly the context it needs and nothing more. This isn't just organization; it's the only way to get genuine test-first development from an LLM. Combining skills with subagents gave me exactly what I needed.
The TDD Skill
The orchestrating skill lives at .claude/skills/tdd-integration/skill.md:
```markdown
---
name: tdd-integration
description: Enforce Test-Driven Development with strict Red-Green-Refactor cycle using integration tests. Auto-triggers when implementing new features or functionality. Trigger phrases include "implement", "add feature", "build", "create functionality", or any request to add new behavior. Does NOT trigger for bug fixes, documentation, or configuration changes.
---

# TDD Integration Testing

Enforce strict Test-Driven Development using the Red-Green-Refactor cycle with dedicated subagents.

## Mandatory Workflow

Every new feature MUST follow this strict 3-phase cycle. Do NOT skip phases.

### Phase 1: RED - Write Failing Test

🔴 RED PHASE: Delegating to tdd-test-writer...

Invoke the `tdd-test-writer` subagent with:
- Feature requirement from user request
- Expected behavior to test

The subagent returns:
- Test file path
- Failure output confirming test fails
- Summary of what the test verifies

**Do NOT proceed to Green phase until test failure is confirmed.**

### Phase 2: GREEN - Make It Pass

🟢 GREEN PHASE: Delegating to tdd-implementer...

Invoke the `tdd-implementer` subagent with:
- Test file path from RED phase
- Feature requirement context

The subagent returns:
- Files modified
- Success output confirming test passes
- Implementation summary

**Do NOT proceed to Refactor phase until test passes.**

### Phase 3: REFACTOR - Improve

🔵 REFACTOR PHASE: Delegating to tdd-refactorer...

Invoke the `tdd-refactorer` subagent with:
- Test file path
- Implementation files from GREEN phase

The subagent returns either:
- Changes made + test success output, OR
- "No refactoring needed" with reasoning

**Cycle complete when refactor phase returns.**

## Multiple Features

Complete the full cycle for EACH feature before starting the next:

Feature 1: 🔴 → 🟢 → 🔵 ✓
Feature 2: 🔴 → 🟢 → 🔵 ✓
Feature 3: 🔴 → 🟢 → 🔵 ✓

## Phase Violations

Never:
- Write implementation before the test
- Proceed to Green without seeing Red fail
- Skip Refactor evaluation
- Start a new feature before completing the current cycle
```
The description field contains trigger phrases so Claude activates this skill automatically when I ask to implement something. Each phase has explicit "Do NOT proceed until..." gates; Claude needs clear boundaries. The 🔴🟢🔵 emojis make tracking progress easy in the output.
The Test Writer Agent (RED Phase)
At .claude/agents/tdd-test-writer.md:
````markdown
---
name: tdd-test-writer
description: Write failing integration tests for TDD RED phase. Use when implementing new features with TDD. Returns only after verifying test FAILS.
tools: Read, Glob, Grep, Write, Edit, Bash
skills: vue-integration-testing
---

# TDD Test Writer (RED Phase)

Write a failing integration test that verifies the requested feature behavior.

## Process

1. Understand the feature requirement from the prompt
2. Write an integration test in `src/__tests__/integration/`
3. Run `pnpm test:unit <test-file>` to verify it fails
4. Return the test file path and failure output

## Test Structure

```typescript
import { afterEach, describe, expect, it } from 'vitest'
import { createTestApp } from '../helpers/createTestApp'
import { resetWorkout } from '@/composables/useWorkout'
import { resetDatabase } from '../setup'

describe('Feature Name', () => {
  afterEach(async () => {
    resetWorkout()
    await resetDatabase()
    document.body.innerHTML = ''
  })

  it('describes the user journey', async () => {
    const app = await createTestApp()

    // Act: User interactions
    await app.user.click(app.getByRole('button', { name: /action/i }))

    // Assert: Verify outcomes
    expect(app.router.currentRoute.value.path).toBe('/expected')

    app.cleanup()
  })
})
```

## Requirements

- Test must describe user behavior, not implementation details
- Use `createTestApp()` for full app integration
- Use Testing Library queries (`getByRole`, `getByText`)
- Test MUST fail when run - verify before returning

## Return Format

Return:
- Test file path
- Failure output showing the test fails
- Brief summary of what the test verifies
````
I limited the tools to only what's needed for writing and running tests. The skills field pulls in my vue-integration-testing skill for project-specific context. And the explicit return format ensures clean handoffs between phases.
About the vue-integration-testing skill
This skill defines how I want tests written: using jsdom with Vue Test Utils, writing BDD-style tests that describe user behavior, and avoiding mocks wherever possible. I don't see much value in unit tests that mock everything; they often just verify implementation details rather than actual functionality. Integration tests that exercise real code paths catch more bugs.
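The article doesn't include this skill's file. A minimal sketch of what it might contain, using the same frontmatter format as the TDD skill (contents are illustrative, not the actual file):

```markdown
---
name: vue-integration-testing
description: Project conventions for writing integration tests. Use when writing or reviewing tests.
---

# Vue Integration Testing

- Render the full app with `createTestApp()` (jsdom + Vue Test Utils)
- Write BDD-style tests that describe user behavior, not component internals
- Prefer Testing Library queries (`getByRole`, `getByText`) over CSS selectors
- Avoid mocks; exercise real code paths end to end
```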
The Implementer Agent (GREEN Phase)
At .claude/agents/tdd-implementer.md:
```markdown
---
name: tdd-implementer
description: Implement minimal code to pass failing tests for TDD GREEN phase. Write only what the test requires. Returns only after verifying test PASSES.
tools: Read, Glob, Grep, Write, Edit, Bash
---

# TDD Implementer (GREEN Phase)

Implement the minimal code needed to make the failing test pass.

## Process

1. Read the failing test to understand what behavior it expects
2. Identify the files that need changes
3. Write the minimal implementation to pass the test
4. Run `pnpm test:unit <test-file>` to verify it passes
5. Return implementation summary and success output

## Principles

- **Minimal**: Write only what the test requires
- **No extras**: No additional features, no "nice to haves"
- **Test-driven**: If the test passes, the implementation is complete
- **Fix implementation, not tests**: If the test fails, fix your code

## Return Format

Return:
- Files modified with brief description of changes
- Test success output
- Summary of the implementation
```
The Refactorer Agent (REFACTOR Phase)
At .claude/agents/tdd-refactorer.md:
```markdown
---
name: tdd-refactorer
description: Evaluate and refactor code after TDD GREEN phase. Improve code quality while keeping tests passing. Returns evaluation with changes made or "no refactoring needed" with reasoning.
tools: Read, Glob, Grep, Write, Edit, Bash
skills: vue-composables
---

# TDD Refactorer (REFACTOR Phase)

Evaluate the implementation for refactoring opportunities and apply improvements while keeping tests green.

## Process

1. Read the implementation and test files
2. Evaluate against refactoring checklist
3. Apply improvements if beneficial
4. Run `pnpm test:unit <test-file>` to verify tests still pass
5. Return summary of changes or "no refactoring needed"

## Refactoring Checklist

Evaluate these opportunities:

- **Extract composable**: Reusable logic that could benefit other components
- **Simplify conditionals**: Complex if/else chains that could be clearer
- **Improve naming**: Variables or functions with unclear names
- **Remove duplication**: Repeated code patterns
- **Thin components**: Business logic that should move to composables

## Decision Criteria

Refactor when:
- Code has clear duplication
- Logic is reusable elsewhere
- Naming obscures intent
- Component contains business logic

Skip refactoring when:
- Code is already clean and simple
- Changes would be over-engineering
- Implementation is minimal and focused

## Return Format

If changes made:
- Files modified with brief description
- Test success output confirming tests pass
- Summary of improvements

If no changes:
- "No refactoring needed"
- Brief reasoning (e.g., "Implementation is minimal and focused")
```
This agent has a decision framework for whether to refactor. Sometimes "no refactoring needed" is the right answer. The skills field references my vue-composables skill so it knows my project's patterns for extracting reusable logic.
Real Example: Adding Workout Detail View
Here's what this looks like in practice. My request:

"When a user is on the Workouts page, they should be able to click on a past workout and see a detail view of what exercises and sets they completed."
The workflow executes like this:
🔴 RED Phase
The tdd-test-writer produced:
```typescript
// src/__tests__/integration/workout-detail.spec.ts
import { afterEach, describe, expect, it } from 'vitest'
import { createTestApp } from '../helpers/createTestApp'
import { db, generateId } from '@/db'

describe('Workout History Detail View', () => {
  afterEach(async () => {
    // cleanup
  })

  it('navigates to detail view when clicking a completed workout', async () => {
    // Arrange: Create a completed workout
    const completedWorkout = {
      id: generateId(),
      name: 'Push Day',
      exercises: [{
        name: 'Bench Press',
        sets: [{ kg: '100', reps: '10' }]
      }],
      completedAt: Date.now(),
    }
    await db.workouts.add(completedWorkout)

    // Act: Navigate and click
    const app = await createTestApp()
    await app.user.click(app.getByRole('button', { name: /workouts/i }))
    await app.user.click(await app.findByText('Push Day'))

    // Assert
    expect(app.router.currentRoute.value.path)
      .toBe(`/workouts/${completedWorkout.id}`)
    expect(app.getByText('Bench Press')).toBeDefined()
    expect(app.getByText('100')).toBeDefined()

    app.cleanup()
  })
})
```
The test failed:

```
AssertionError: expected '/workouts' to be '/workouts/d747077d-...'
```

Exactly what I wanted: a clear failure pointing to missing functionality.
🟢 GREEN Phase
The tdd-implementer created:
- Created `WorkoutDetailView.vue`: new detail view component
- Modified `TheWorkoutsView.vue`: added click handlers
- Modified `router/index.ts`: added the `/workouts/:id` route
Test passed. Minimal implementation, just enough to satisfy the assertions.
🔵 REFACTOR Phase
The tdd-refactorer evaluated the code and made improvements:
- Extracted a `useWorkoutDetail` composable: reusable data fetching with discriminated union states
- Created shared formatters: pulled `formatDuration` and `formatDate` into `lib/formatters.ts`
- Added accessibility: keyboard navigation for clickable cards
All tests still passed. The cycle completed.
The Test Helper
A crucial piece making all this work is my createTestApp() helper:
```typescript
// src/__tests__/helpers/createTestApp.ts
import { createPinia } from 'pinia'
import { createMemoryHistory, createRouter } from 'vue-router'
import type { Router } from 'vue-router'
import { render, screen, waitFor } from '@testing-library/vue'
import userEvent from '@testing-library/user-event'
// Project-specific imports (paths assumed):
import App from '@/App.vue'
import { routes } from '@/router'

interface TestApp {
  router: Router
  user: ReturnType<typeof userEvent.setup>
  getByRole: typeof screen.getByRole
  getByText: typeof screen.getByText
  findByText: typeof screen.findByText
  waitForRoute: (pattern: RegExp) => Promise<void>
  cleanup: () => void
}

export async function createTestApp(): Promise<TestApp> {
  const pinia = createPinia()
  const router = createRouter({
    history: createMemoryHistory(),
    routes,
  })

  render(App, {
    global: { plugins: [router, pinia] },
  })
  await router.isReady()

  return {
    router,
    user: userEvent.setup(),
    getByRole: screen.getByRole,
    getByText: screen.getByText,
    findByText: screen.findByText,
    waitForRoute: (pattern) => waitFor(() => {
      if (!pattern.test(router.currentRoute.value.path)) {
        throw new Error('Route mismatch')
      }
    }),
    cleanup: () => { document.body.innerHTML = '' },
  }
}
```
This gives agents a consistent API for rendering the full app and simulating user interactions. They don't need to figure out how to set up Vue, Pinia, and Vue Router each time; they just call `createTestApp()` and start writing assertions.
Hooks for Consistent Skill Activation
Even with well-written skills, Claude sometimes skipped evaluation and jumped straight to implementation. I tracked this informally; skill activation happened maybe 20% of the time.
I found a great solution in Scott Spence's post on making skills activate reliably. He tested 200+ prompts across different hook configurations and found that a "forced eval" approach, making Claude explicitly evaluate each skill before proceeding, jumped activation from ~20% to ~84%.
The fix: hooks. Claude Code runs hooks at specific lifecycle points, and I used UserPromptSubmit to inject a reminder before every response.
In .claude/settings.json:
```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "npx tsx \"$CLAUDE_PROJECT_DIR/.claude/hooks/user-prompt-skill-eval.ts\"",
            "timeout": 5
          }
        ]
      }
    ]
  }
}
```
The hook script at .claude/hooks/user-prompt-skill-eval.ts:
```typescript
#!/usr/bin/env npx tsx
import { readFileSync } from 'node:fs'
import { stdout } from 'node:process'

function main(): void {
  readFileSync(0, 'utf-8') // consume stdin

  const instruction = `
INSTRUCTION: MANDATORY SKILL ACTIVATION SEQUENCE

Step 1 - EVALUATE:
For each skill in <available_skills>, state: [skill-name] - YES/NO - [reason]

Step 2 - ACTIVATE:
IF any skills are YES → Use Skill(skill-name) tool for EACH relevant skill NOW
IF no skills are YES → State "No skills needed" and proceed

Step 3 - IMPLEMENT:
Only after Step 2 is complete, proceed with implementation.

CRITICAL: You MUST call Skill() tool in Step 2. Do NOT skip to implementation.
`
  stdout.write(instruction.trim())
}

main()
```
Results
With this hook, skill activation jumped from ~20% to ~84%. Now when I say "implement the workout detail view," the TDD skill triggers automatically.
Conclusion
Claude Code's default behavior produces implementation-first code with minimal test coverage. Without constraints, it optimizes for "working code" rather than "tested code."
The system described here addresses this through architectural separation:
- Hooks inject evaluation logic before every prompt, increasing skill activation from ~20% to ~84%
- Skills define explicit phase gates that block progression until each TDD step completes
- Subagents enforce context isolation: the test writer cannot see implementation plans, so tests reflect actual requirements rather than anticipated code structure
The setup cost is ~2 hours of configuration. After that, each feature request automatically follows the Red-Green-Refactor cycle without manual enforcement.