Why I watched this
I’d been hitting a wall with coding agents on library-heavy Effect work. The model has stale knowledge, MCP doc servers underperform, and most “agents.md best practices” posts stay on the surface. Michael Arnaldi’s workshop promised the unvarnished version, built live from an empty directory by someone who hasn’t written code by hand in eight months. That’s the post-mortem I wanted.
What I learned
Design for what LLMs are, not what they look like
People keep treating models like little humans. They are not. The differences shape everything downstream:
- No continuous learning. Pre-training cutoff, post-training phase, then frozen. Tell the model something today and tomorrow it has forgotten.
- Context is a fixed-size array of messages, not memory. Bigger windows confuse the model. A 1M-token context often hurts more than it helps.
- Coding agents are post-trained on reading and producing code in a project. They are less trained on reading prose docs and on calling MCP tools they’ve never seen. Most agents deprioritise `node_modules`; Cursor by default doesn’t even index `.gitignore`d files.
The design constraint: the agent is dumb, has six-month-old knowledge at best, can’t remember between sessions, and gets worse the more junk you stuff into its context. Build for that.
Just clone the damn repo
If the model has stale knowledge, you need a way to deliver new knowledge. People reach for clever options: MCP doc servers, README scraping, doc-CLIs. All of these underperform a dumber move: clone the library’s source into your own working tree as a git subtree under `repos/<library>`, and tell the agent it’s reference material.
Now the model treats it as “more of my codebase,” explores it with the same tools it uses for your own files, and copies the patterns it finds. You’re not fighting its training; you’re feeding it the thing it’s best at consuming.
```
repos/
  effect/      # cloned as a git subtree, no history
src/
  ...your code...
agents.md      # tells the agent: "patterns live in repos/effect"
```
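For reference, the import is a single command, something like `git subtree add --prefix repos/effect https://github.com/Effect-TS/effect main --squash`; the `--squash` flag is what drops the upstream history.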
This is also what effect.solutions is solving, but that approach chases its tail: it gives the model a CLI to read docs, and then the model has to know how to use that CLI. Read the project’s own README and you find the line “you should actually just clone the repository.” So we did.
Patterns folders, not raw library access
Don’t point the agent at the raw cloned repo every turn. Have it distil into project-local `patterns/*.md`:
“Explore the Effect repo for patterns on how to build an HTTP API with OpenAPI docs and a generated type-safe client. Save your research into `patterns/http-api.md`.”
Two reasons this beats re-reading the library each turn:
- Self-selection. Effect is huge. If you let the model wander, you’ll end up using every feature. Project-local patterns are the Effect-subset you chose. Matters even more in brownfield codebases where you don’t want a refactor of the world.
- Context discipline. A 200-line `patterns/sql.md` is cheap; re-reading half the Effect monorepo every turn is not.
Crank every diagnostic to error
For an AI-driven project: zero soft warnings. Every TypeScript suggestion, every unused-symbol notice, all errors. The model won’t commit code with errors, but it happily ignores warnings. Concretely (my gloss): `noUnusedLocals` and `noUnusedParameters` in tsconfig turn unused-symbol notices into hard errors. Combined with format-on-save (`"editor.formatOnSave": true` in VS Code), this gives a tight feedback loop without babysitting style.
ESLint as back-pressure: a moat against shortcuts
The other half of the loop. Every shortcut Michael caught the model taking, he banned with a custom rule:
- No `as` casts. Use `Schema` to validate. (When he banned `as unknown as X`, the model learned that `never` is a bottom type and started writing `as never as X`. So he banned `as` outright.)
- No `any`. No `unknown` at API boundaries.
- No `sql<MyType>` typed query placeholders. Zero runtime safety. Use `SqlSchema` instead.
- No constructors inside HTTP handlers to massage strings into branded types. That’s too loose; parse with your schema at the edge.
- Branded types for every identifier: `UserId`, `TodoId`, etc. Otherwise the agent will pass a `userId: string` where a `todoId: string` is wanted.
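A sketch of what that lint back-pressure can look like, reconstructed with stock typescript-eslint rules rather than Michael’s actual custom rules: `consistent-type-assertions` with `assertionStyle: "never"` bans `as` outright, and `no-explicit-any` handles `any`.

```ts
// eslint.config.mjs — a minimal flat-config sketch, assuming typescript-eslint v8
import tseslint from "typescript-eslint";

export default tseslint.config(...tseslint.configs.recommended, {
  rules: {
    // Ban every type assertion, including `as unknown as X` and `as never as X`
    "@typescript-eslint/consistent-type-assertions": [
      "error",
      { assertionStyle: "never" },
    ],
    // No `any`, anywhere
    "@typescript-eslint/no-explicit-any": "error",
  },
});
```

And the branded-ID rule in code, using Effect’s public `Schema.brand` API (my example, not from the workshop):

```ts
import { Schema } from "effect";

// Two distinct brands over string: no longer interchangeable at the type level
const UserId = Schema.String.pipe(Schema.brand("UserId"));
type UserId = typeof UserId.Type;

const TodoId = Schema.String.pipe(Schema.brand("TodoId"));
type TodoId = typeof TodoId.Type;

declare function loadTodo(id: TodoId): Promise<unknown>;

const userId: UserId = Schema.decodeUnknownSync(UserId)("user-123");
// loadTodo(userId); // type error: UserId is not assignable to TodoId
```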
Spec-first, then implement, in dumb loops
Plan mode cripples tool access. Use spec-driven development instead. The first task: “discuss with me how to build X and write the plan to `plans/x.md`.” That spec becomes the durable artefact. A second task implements against the spec, inside a Ralph-style loop:
```sh
# Restart the agent with a fresh context on every iteration;
# stop once it exits non-zero.
while true; do
  agent "$(cat NEXT_TASK.md)" || break
done
```
Why restart the agent so often? A long session accretes irrelevant context: failed attempts, abandoned ideas, dead branches. The earliest messages then bias the model on later, unrelated tasks. With AI, less is more. Complex context-management architectures lose to dumb loops.
Match prompts to the model
The “agents.md standard” pretends models are interchangeable. They aren’t.
- GPT gets scared by ALL CAPS and de-optimises into sycophancy. Keep prompts calm. It gives more concise output and asks permission more often.
- Opus does the opposite: shouted instructions get extra attention. It’s eager, and just does the thing GPT would keep asking about. Sometimes that’s a feature, sometimes a bug (it takes shortcuts, sleeps, removes failing tests rather than fixing them).
- Leave a single `any` in a codebase and Opus will copy it everywhere.
- For UI work, reach for Opus over Codex. For most other things they’re interchangeable.
- Open-weights models lag frontier models by 3–6 months, but they’re already past where Sonnet 4 was when Michael started.
Skills vs slash commands
Slash commands work when one person controls the agent (`/new-pattern`, `/update-agents`). Skills generalise across teammates with different agents. But skills are not a magic wand: install many “Next.js skills” and you pollute the context and end up worse off. Use skills for specific, narrow tasks. They’re CLI tools, not knowledge dumps.
The workflow gap nobody fills
The part most people get wrong: long-running, multi-step business processes. Take registration:
1. Insert user row.
2. Send confirmation email.

There’s no transaction across (1) and (2). Sooner or later, the server crashes between them. Hence every signup flow has “if the email doesn’t arrive in 30 minutes, please retry.” That copy hides a broken system behind a UX wrapper.
The fix: a workflow engine. Temporal, Inngest, or `@effect/cluster` + `@effect/workflow`. This used to be a “large company” problem, because the edge cases only happened twice a day at scale. AI made it everyone’s problem: with LLMs in the request path, response times went from 10ms to a minute, and in that minute the server will fail. Even a 10-user app now needs durable workflows. Hence Temporal’s moment.
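To make the registration example concrete: a minimal sketch of steps (1) and (2) as a durable workflow with the Temporal TypeScript SDK (the activity names and the `Activities` interface are my hypothetical stand-ins).

```ts
import { proxyActivities } from "@temporalio/workflow";

// Hypothetical activities; their implementations live in ordinary worker code
interface Activities {
  insertUserRow(email: string): Promise<void>;
  sendConfirmationEmail(email: string): Promise<void>;
}

const { insertUserRow, sendConfirmationEmail } = proxyActivities<Activities>({
  startToCloseTimeout: "1 minute",
  retry: { maximumAttempts: 5 },
});

// If the process dies between the two steps, the engine replays the workflow
// and resumes at the email step; no “retry in 30 minutes” copy needed.
export async function registerUser(email: string): Promise<void> {
  await insertUserRow(email);         // step 1: durable once recorded
  await sendConfirmationEmail(email); // step 2: retried until it succeeds
}
```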
Evals, and why “is this code good?” is hard
Everything above is opinion-driven. Michael thinks branded IDs are better. He prefers classes. How does he know his patterns improve agent output? Honest answer: by running evals, badly. The hard part: “is this code good?” depends. Terse vs verbose, this structure vs that, both convey meaning, both type-check. 100 humans give you an 80/20 split.
Current approach: keep human-written reference solutions, generate model output, use an LLM judge to score similarity. Not pretty. Necessary if you want to fine-tune a model on Effect itself, because you can’t reinforcement-learn without an eval signal.
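The shape of that harness is easy to sketch; `generate` and `judgeSimilarity` below are hypothetical stand-ins for the model under test and the LLM judge.

```ts
interface EvalCase {
  task: string;      // prompt given to the model under test
  reference: string; // human-written reference solution
}

// Mean judge score across cases: the eval signal you need before fine-tuning
async function runEvals(
  cases: EvalCase[],
  generate: (task: string) => Promise<string>,
  judgeSimilarity: (reference: string, candidate: string) => Promise<number>,
): Promise<number> {
  let total = 0;
  for (const c of cases) {
    const candidate = await generate(c.task);
    total += await judgeSimilarity(c.reference, candidate); // 0..1 similarity
  }
  return total / cases.length;
}
```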
The takeaway
If I remember nothing else from this:
- Clone the library repo as a git subtree. Stop fighting the agent’s training.
- Distil into
patterns/*.mdinstead of pointing at the raw repo every turn. You get to choose your subset. - Crank every diagnostic to error, and let a custom-rule linter ban the shortcuts you keep catching.
- Spec first, implement second, and run implementation in dumb loops with fresh contexts.
- Match prompts to the model. GPT calm, Opus loud. Don’t pretend they’re the same.
- Stop trying to write code. Start setting up the repository so the agent can.
My Rating
The most useful coding-agent talk I’ve watched this year, because Michael prepared nothing and showed the live mistakes (placeholder npm package, models inventing `as never as X` to dodge a lint rule). The “just clone the repo” insight reframes a dozen other techniques as cope. The Ralph-loop-and-fresh-context dogma matches what I arrived at on smaller projects. The workflows section is the one I keep coming back to: durable orchestration is now a 10-user-app problem, the moment you put LLMs in the request path.
- ⭐⭐⭐⭐⭐ (5/5).