
How to do AFK Coding


TLDR

  • 1–3 story-point tickets fit in one Opus 4.7 or GPT 5.5 shot. 5+ pointers don't. The context fills, the model auto-compacts, and the refactor never happens.
  • AFK coding is HITL at the edges, AFK in the middle. You align on the spec, you review the PR. Everything between runs without you.
  • The pipeline: spec → vertical-slice sub-tickets → Ralph loop per slice with TDD red-green-refactor → dedicated refactor pass → agent-browser QA → PR.
  • You're a manager now: review, QA, and several pipelines in parallel.

The Pipeline#

The workflow I use to ship a 5-point ticket while I’m away from the keyboard:

The AFK coding pipeline: human judgment at the edges, agent execution in the middle

Six phases. The first and last are human, the middle four run on their own.

1. Align on spec        ── HITL ──  business + dev + AI interviewer
2. Slice the ticket     ── AFK  ──  vertical slices, one per sub-ticket
3. Ralph loop per slice ── AFK  ──  fresh context, TDD red-green-refactor
4. Refactor pass        ── AFK  ──  the step LLMs always skip
5. Agentic QA           ── AFK  ──  agent-browser drives the real UI
6. Review               ── HITL ──  dev reviews, business does UAT

Two questions first: when does this pipeline pay off, and what does “AFK” look like in practice?

Two Kinds of Tickets#

1–3 pointers fit one prompt. 5+ pointers overflow context, skip tests, and ship messy PRs.

Most tickets fit in one Claude Code session. Paste it in, watch it land, ship the PR. Opus 4.7 or GPT 5.5 closes out a 1–3 pointer with maybe one nudge. No pipeline needed.

The interesting tickets are the 5-pointers. Frontend and backend, a multi-step wizard with three API calls, server-side validation, optimistic UI, and a test plan that catches the failure modes you care about. Paste that into a fresh chat and one of three things happens:

  1. Context runs out. The model hits the window halfway through and auto-compacts. The summary drops the bit you needed. As HumanLayer’s 12 Factor Agents puts it: “Even as models support longer and longer context windows, you’ll ALWAYS get better results with a small, focused prompt and context.”
  2. No refactoring. LLMs default to append, not restructure. Ship five of these in a row and your codebase looks like the inside of a hoarder’s garage.
  3. Silent skip. When a test fails on round four, the model deletes the test rather than fixing the bug. Opus does this often. Michael Arnaldi caught it taking the same shortcut in his Effect workshop, Vibe Engineering Effect Apps: Just Clone the Repo: “sometimes a feature, sometimes a bug — takes shortcuts, sleeps, removes failing tests rather than fixing them.”

This is a workflow problem, not a model problem. Big tickets need a pipeline, not a prompt. The one diagrammed above.

What AFK Coding Looks Like#

Human aligns the spec, agents do the work, human reviews the PR.

AFK = away from keyboard. HITL = human in the loop. Sounds opposed; isn’t. The good workflow is HITL at the edges, AFK in the middle.

You don’t write less code. You run three pipelines in parallel while reading a book. You manage instead of coding. I wrote about the broader shift in The Software Factory; this post zooms into the pipeline that powers a single ticket inside it.

Phase 1: Align on the Spec (HITL)#

Don’t skip this. Every problem downstream starts here. Dex Horthy puts the cost in stark terms in Advanced Context Engineering for Coding Agents:

“A flawed specification cascades through hundreds of lines; flawed research generates thousands of bad lines.”

A 5-pointer touches enough surface area that “what we want” needs to be explicit. Sit with the business, talk through the wizard, draw the state machine on a napkin. Then write a real PRD. A Jira description doesn’t count.

Use the AI here too, but in interview mode, not generation mode. Open Claude Code with a spec template and let it ask you the questions you’d otherwise forget:

Prompt

i want to build a multi-step booking wizard. interview me with one question at a time until you have enough to write a PRD. cover: goals, user, happy path, edge cases, validation, error states, and out-of-scope. write the PRD to docs/prd-booking-wizard.md when done.

The PRD is the single artifact the rest of the pipeline reads. Skimp here and the agent will ship the wrong thing five times in parallel. Spec-Driven Development with Claude Code in Action covers the prompt patterns I lean on; GitHub’s Spec Kit ships a four-phase /specify → /plan → /tasks → /implement workflow that encodes the same idea if you’d rather not roll your own.
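The “spec template” mentioned above can be as simple as a seeded skeleton the interview fills in. A minimal sketch; the filename comes from the prompt, while the headings mirror its checklist and are otherwise my assumption:

```shell
# Seed an empty PRD skeleton for the interview to fill in.
# Path matches the prompt above; headings mirror its checklist.
mkdir -p docs
cat > docs/prd-booking-wizard.md <<'EOF'
# PRD: Multi-Step Booking Wizard

## Goals
## User
## Happy path
## Edge cases
## Validation
## Error states
## Out of scope
EOF
```

Starting from a skeleton also gives the interview a natural stopping condition: it is done when every section has content.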

Phase 2: Slice Into Vertical Tickets (AFK)#

The PRD is too big for one Ralph loop. Slice it.

The right unit isn’t “frontend task” + “backend task” + “tests task.” That’s a horizontal slice, and horizontal slices kill agents. Every slice needs every other slice to be useful, so nothing works until everything works. Compaction guaranteed.

A vertical slice is a thin end-to-end strip of behaviour: one API endpoint, the one form step that consumes it, and the one test that proves it works. Each vertical slice is independently shippable. Each one fits in a Ralph loop. Each one survives if its sibling slice fails.

Horizontal slices block on each other. Vertical slices each ship on their own.

This is Factor 10 of 12 Factor Agents applied to ticket scoping: “Small, focused agents — keep agents to 3–20 steps; avoid monolithic designs.” A vertical slice is the ticket-shaped equivalent. Steve Yegge frames the alternative bluntly in 2026: The Year The IDE Died:

“Nature builds ant swarms and Claude Code built this huge muscular ant that’s just going to bite you in half and take all your resources.”

For our wizard:

Slice 1: Step 1 — guest info       (form + POST /booking/draft + e2e test)
Slice 2: Step 2 — room selection   (form + GET /rooms + e2e test)
Slice 3: Step 3 — payment          (form + POST /booking/confirm + e2e test)
Slice 4: Wizard state machine      (cross-step navigation + e2e test)

Four sub-tickets. Each is a 1–2 pointer. Each is something Opus 4.7 can ship in one go.

Have the agent do the slicing for you:

Prompt

read docs/prd-booking-wizard.md. break it into vertical slices. each slice must be end-to-end (UI + API + test) and independently shippable. write the slices to docs/tickets/01-.md, 02-.md, …
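Each slice ticket is just a markdown checklist. A hypothetical docs/tickets/01-guest-info.md, written in the checkbox format a Ralph loop can use as its stopping condition:

```shell
# Write a hypothetical slice ticket. The "- [ ]" checkboxes matter:
# an unchecked box means the slice isn't done yet.
mkdir -p docs/tickets
cat > docs/tickets/01-guest-info.md <<'EOF'
# Slice 1: Step 1 — guest info

End-to-end: form + POST /booking/draft + e2e test. Independently shippable.

- [ ] red: failing e2e test submits valid guest info
- [ ] green: guest-info form + POST /booking/draft handler
- [ ] refactor: clean up, tests stay green
EOF

# Three unchecked boxes remain, so a loop over this file keeps going:
grep -c '^- \[ \]' docs/tickets/01-guest-info.md  # prints 3
```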

Phase 3: Ralph Loop Per Slice (AFK)#

Each iteration: read PROMPT.md, pick an unchecked task, implement, test, commit, tick the box.

For each slice, spin up a fresh agent in a Ralph loop.

Ralph is Geoffrey Huntley’s technique: a while loop that pipes a prompt file into a fresh claude invocation, over and over, until a stopping condition trips. Every iteration starts with an empty context, reads the prompt file, picks the highest-priority unfinished task, ships it, commits, and exits.

while :; do
  cat PROMPT.md | claude --dangerously-skip-permissions
done

Huntley’s framing of why it works:

“That’s the beauty of Ralph — the technique is deterministically bad in an undeterministic world.”

You aren’t aiming for one perfect run. You’re aiming for many cheap runs that converge, with the prompt file as the ratchet. He shipped six repos overnight at a YC hackathon and replaced a $50K outsourcing contract with $297 of API spend.

The reason the loop works is the one Dex Horthy chronicles in A Brief History of Ralph: context windows degrade as they fill. Resetting between iterations keeps every task in the model’s “smart zone.” Factory’s write-up on The Context Window Problem makes the same case from the engineering side. Bigger windows aren’t the answer; smaller scopes are.
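The fresh-context reset plus a stopping condition fits in a small wrapper around the canonical loop. A sketch; the function shape and the iteration cap are my additions as a safety net, not part of Huntley’s technique:

```shell
# Bounded Ralph loop: rerun an agent command against a prompt file until no
# unchecked "- [ ]" boxes remain, or until a hard cap trips (a safety net so
# a confused agent can't spin all night; the cap is an assumption, not canon).
ralph() {
  prompt_file=$1
  agent_cmd=$2
  max=${3:-20}
  i=0
  while [ "$i" -lt "$max" ]; do
    # each iteration gets the prompt on stdin, like `cat PROMPT.md | claude ...`
    sh -c "$agent_cmd" < "$prompt_file"
    # stop once every checkbox in the prompt file is ticked
    grep -q '^- \[ \]' "$prompt_file" || break
    i=$((i + 1))
  done
}

# Real usage would be:
#   ralph PROMPT.md "claude --dangerously-skip-permissions"
```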

One long session compacts and confuses. Many short sessions stay sharp.

A small driver script gives each slice its own Ralph instance, in its own git worktree, with its own log file:

for ticket in docs/tickets/*.md; do
  name=$(basename "$ticket" .md)
  git worktree add "../wt-$name" -b "feat/$name"
  (
    cd "../wt-$name"
    cp "$ticket" PROMPT.md
    while :; do
      cat PROMPT.md | claude --dangerously-skip-permissions
      grep -q "^- \[ \]" PROMPT.md || break  # exit when no unchecked items
    done
  ) &
done
wait

Four slices, four worktrees, four agents, in parallel. Matt Pocock’s Sandcastle (covered here) is the productionised version of this, with sandboxing on top so the agents can’t rm -rf your home directory. The cautionary tale is real. See Claude Code is Amazing… Until It DELETES Production. YOLO mode without a sandbox is how you wake up to an empty home folder.

TDD Inside the Loop#

Red-green-refactor is the agent's permission slip to commit. Tests, types, and lint are the backpressure that keeps it honest.

The Ralph prompt for each slice forces red-green-refactor inside every iteration:

Prompt

you are working on ONE slice. read PROMPT.md.

for each unchecked task:

  1. red: write a failing e2e test that asserts the behaviour
  2. green: write the minimum code to pass the test
  3. refactor: clean up; tests must stay green
  4. commit
  5. tick the box in PROMPT.md

exit when all boxes are ticked.

Why TDD matters more for AFK agents than for humans:

Moss Ebeling calls this back pressure: automated feedback that lets the agent self-correct without you in the loop. If your tests, types, and lint don’t break when something is wrong, the agent doesn’t know it’s wrong. Every minute you spend tightening backpressure is a minute you don’t spend reviewing trivial mistakes by hand.
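Mechanically, backpressure is just a gate of checks that must all pass before the agent may commit. A minimal sketch; the gate function is illustrative, and in a real project the checks would be your typecheck, lint, and test commands:

```shell
# Minimal backpressure gate: run each check in order, stop at the first red.
# A nonzero exit tells the agent "not done yet"; only green permits a commit.
gate() {
  for check in "$@"; do
    if ! sh -c "$check"; then
      echo "RED: $check" >&2
      return 1
    fi
  done
  echo "GREEN: all checks passed"
}

# In a real project, something like:
#   gate "npm run typecheck" "npm run lint" "npm run test"
gate "true" "true"  # prints: GREEN: all checks passed
```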

Phase 4: Refactor Pass (AFK)#

A dedicated /simplify pass turns `any`s, duplication, dead code, and inconsistent naming into typed, small, shared, clean code.

LLMs cheat on refactoring. Even with red-green-refactor in the loop, the “refactor” step usually shrinks to “rename a variable.” The deeper work (extracting a composable, collapsing duplication across the four slices, killing the dead branches) never happens. Leave a single any lying around and Opus will copy it everywhere (Arnaldi’s observation).

This is why I built /simplify: a Claude Code skill that runs an Opus subagent over the recently changed files with one job. Clean it up without changing behaviour. Tests stay green, naming improves, ternaries unfold, abstractions collapse. It’s the dedicated refactor pass the implementation loop won’t do on its own. It ships as a Claude Code skill, but the pattern is portable — if you’re on Codex or another agent, you can wire up the same thing yourself.

Run it as a separate agent against the merged branch with a narrower brief:

Prompt

run on the current branch. tests must stay green throughout.

for each of: duplication, long files, primitive obsession, dead code, and inconsistent naming — find one offender, fix it, run tests, commit. repeat until you can’t find more.

do not change behaviour. do not add features.

Then add lint-as-backpressure rules so the next slice doesn’t undo the cleanup. My Opinionated ESLint Setup for Vue Projects covers the rules I lean on; Michael Arnaldi’s “clone the repo” workflow from Vibe Engineering Effect Apps takes the same idea further. Every shortcut you catch the agent taking, ban it with a rule.
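For example, a few rules aimed at exactly the shortcuts described above. An illustrative flat-config excerpt; it assumes the typescript-eslint plugin is already registered, and the thresholds are my own:

```javascript
// eslint.config.js excerpt: ban the shortcuts the agent keeps taking.
// Assumes the typescript-eslint plugin is already wired up.
export default [
  {
    rules: {
      // no stray `any` for the next agent to copy everywhere
      '@typescript-eslint/no-explicit-any': 'error',
      // dead code fails CI instead of accumulating
      'no-unused-vars': 'error',
      // long files are a smell the refactor pass should have caught
      'max-lines': ['error', { max: 300 }],
    },
  },
];
```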

Phase 5: Agentic QA (AFK)#

A QA agent drives the real browser through happy and negative paths, dropping screenshots and a markdown report.

Tests prove the unit works. They don’t prove the user can complete the wizard. For that, drive a real browser.

agent-browser is a Rust CLI that exposes the browser’s accessibility tree as a snapshot with stable element refs (@e1, @e2, …). The agent reads the snapshot, picks a ref, and clicks. No CSS selectors, no Playwright API to learn. Shell commands the agent already knows how to run.

A QA agent runs the user journey end-to-end:

Prompt

you are a QA engineer. the booking wizard is at localhost:3000/book.

run the happy path: complete all three steps with valid data. then run two negative paths: invalid email at step 1, declined card at step 3.

use agent-browser. take a screenshot at every state. write a markdown report to qa/booking-wizard.md with one section per test: steps taken, expected, actual, pass/fail, and the screenshot.

You wake up to a markdown report with screenshots. If anything is red, the report tells you which step. If everything is green, the PR is ready for human eyes.

Phase 6: Human Review (HITL)#

The agent opens the PR. Now you do the job.

The 30 minutes on the spec and the 30 minutes on the review are 80% of the value. The four hours of coding in between are the part you outsourced. As Linear frames it in their agent SDK approach: “An agent cannot be held accountable.” Issues get delegated to agents but assigned to humans. Review is the part of the job that doesn’t have an AFK version.

Why This Pipeline Beats “Just Use a Bigger Model”#

You could wait for the next model with a 10M-token window and pretend the problem goes away. It won’t. Bigger windows degrade: attention thins, costs balloon, and the model still won’t refactor on its own.

The pipeline is mostly the work humans always did, automated. What changed: the typing happens AFK. You only show up where judgment lives.

When to skip the pipeline

For 1–3 pointers, skip all of this. One Claude Code session, one prompt, one PR. The pipeline is overhead for small work; it pays back at 5 points and up.

The Job Changes#

A developer’s day in this model: you’re a manager of a small fleet of agents. Each ticket is a project; each Ralph loop is a contractor. The skill is your specs, your reviews, and your taste, not your typing speed.

This isn’t theoretical. Boris Cherny, creator of Claude Code, runs 15+ parallel Claude sessions on a normal day: five in numbered terminal tabs and another five to ten on claude.ai/code, with iTerm notifications when one needs input. Anthropic’s headcount tripled and productivity per engineer climbed 70%. Roughly 80–90% of Claude Code is now written by Claude Code itself.

Lee Robinson, after building three production projects without writing code by hand, puts it bluntly:

“Writing code was never really the bottleneck, especially for larger projects.”

Boris reaches the same conclusion from inside Anthropic:

“We’re going to start to see the title of software engineer go away. It’s just going to be ‘builder’ or ‘product manager.’”

The pipeline isn’t novel. Stripe’s Minions, Steve Yegge’s Gas Town, Pocock’s Sandcastle all encode variants of it. The novel part: the building blocks (Claude Code in headless mode, agent-browser, Ralph, git worktrees) are open and free. You can run this pipeline tonight on your own machine.

Parallelism is limited by review capacity. Past that, every extra pipeline is another fire.

⚠️ Don't run twenty pipelines because you can

Simon Willison flagged the dark side of all this in AI Doesn’t Reduce Work — It Intensifies It: the low cost of “just one more prompt” turns into compulsive task-stacking and burnout. The point of AFK coding is to be away from the keyboard, not to triple-book your attention. Run as many pipelines as your review capacity can absorb. Not more.

Conclusion#

Big tickets don’t need bigger models; they need a pipeline. Align on the spec, slice vertically, Ralph-loop each slice, refactor on purpose, QA in a real browser, then review like your name is on the PR. Judgment at the edges, agents in the middle.

