# alexop.dev - Full Content > A Personal Blog from a simple web developer This file contains the full content of all posts for deep context. For a sitemap overview, see: https://alexop.dev/llms.txt ================================================================================ # BLOG POSTS ================================================================================ --- title: Spec-Driven Development with Claude Code in Action description: A practical workflow for tackling large refactors with Claude Code using parallel research subagents, written specs, and the new task system for context-efficient implementation. tags: ['claude-code', 'ai', 'local-first', 'architecture'] url: https://alexop.dev/posts/spec-driven-development-claude-code-in-action --- # Spec-Driven Development with Claude Code in Action I'm building a [simplified sync engine from scratch](https://github.com/alexanderop/nuxt-sync-engine) using Nuxt 4. My approach: study how production-grade frameworks solve the hard problems, then implement a minimal version myself. [Jazz](https://jazz.tools) is my primary reference: a local-first framework with elegant patterns for persistence, conflict resolution, and cross-tab sync. Rather than reading through their codebase manually, I use Claude Code to research, extract patterns, and help me implement them. This post documents the workflow I call **Spec-Driven Development with Claude Code**: the exact prompts, tools, and patterns I used to migrate my storage layer from SQLite/WASM to IndexedDB in a single day. ## The Problem My sync engine was using `sql.js` (SQLite compiled to WASM) for client-side storage. It worked, but had issues: - Large WASM bundle (~1MB) - Complex COOP/COEP header requirements - No native cross-tab sync I wanted to migrate to IndexedDB, borrowing patterns from Jazz. But this was a significant refactor touching 15+ files. (For background on local-first web development and why it matters, see my earlier post.) ## The Workflow Instead of diving into code, I used Claude Code as an **AI development team**, with myself as the product owner, Claude as the tech lead, and subagents as developers. Important: I also cloned the source code of Jazz into my project so Claude could reference it during research and implementation.
## Phase 1: Research with Parallel Subagents ### The Prompt you have access to jazz source repo explain to me how they use indexdb in the client to persist state our project is using sqlite but we want to change to indexdb with jazz your goal is to write a report spin up multiple subagents for your research task ### What Happened Claude spawned **5 parallel research agents**, each investigating a specific aspect of Jazz:
Each agent explored the Jazz codebase independently and reported back: | Agent | Focus | Key Findings | |-------|-------|--------------| | CRDT | Data structures | CoMap, CoList use operation-based CRDTs with LWW | | WebSocket | Real-time sync | 4-message protocol: load, known, content, done | | Push/Pull | Sync strategy | Hybrid model with known-state tracking | | Storage | Persistence | IndexedDB with `coValues`, `sessions`, `transactions` stores | | Architecture | Overall design | Monorepo with platform adapters | ### Follow-up Prompt research longer and improve the plan This triggered deeper investigation into edge cases and implementation details. ## Phase 2: Spec Creation After research, Claude wrote a comprehensive technical specification to `docs/indexeddb-migration-spec.md`: [Full spec](https://gist.github.com/alexanderop/70ef80ac6dda5166c5085cc9bb269df1) ```markdown # IndexedDB Migration Specification ## Part 1: How Jazz Uses IndexedDB - Database schema (coValues, sessions, transactions stores) - Transaction queuing pattern - Entity caching layer - Session-based conflict resolution ## Part 2: Current SQLite Architecture Analysis - sql.js WASM setup - Existing sync protocol - Pain points and limitations ## Part 3: Migration Plan (4 Phases) - Phase 1: Core IndexedDB utilities - Phase 2: Composables layer - Phase 3: Cross-tab sync - Phase 4: Cleanup and testing ## Part 4: Implementation Checklist - [ ] idb-helpers.ts - [ ] useIndexedDB.ts - [ ] useSessionTracking.ts - ... (14 items total) ``` **Key insight**: The spec becomes the source of truth. It's a document Claude can reference during implementation, ensuring consistency across all tasks. It also becomes a document we can pin and fall back on if something goes wrong during implementation. ## Phase 3: Spec Refinement via Interview Before implementation, I wanted to ensure the spec was solid. I used Claude's `AskUserQuestion` tool: ### The Prompt use the ask_user_question tool do you have any questions regarding @docs/indexeddb-migration-spec.md before we implement it we want to improve the specs Claude asked clarifying questions: - Should we support migration from existing SQLite data? - What's the preferred conflict resolution strategy? - Should cross-tab sync use BroadcastChannel or SharedWorker? After answering, I requested Vue-specific improvements: we want to use provide and inject you have access to the source code of pinia spin up multiple subagents how they do it so we can use same patterns Claude researched Pinia's patterns and updated the spec with: - Symbol-based injection keys - Provider composables with fallback patterns - Proper cleanup on unmount ## Phase 4: Implementation with Task Delegation This is where the new **Claude Code Task System** shines. (If you're unfamiliar with subagents and how they work in Claude Code, my customization guide covers the fundamentals.) ### The Prompt implement @docs/indexeddb-migration-spec.md use the task tool and each task should only be done by a subagent so that context is clear after each task do a commit before you continue you are the main agent and your subagents are your devs ### Understanding Claude Code's Task System Claude Code's task system, inspired by [Beads](https://github.com/beads-ai/beads), Steve Yegge's distributed git-backed issue tracker, solves two critical problems with AI coding agents: **Agent Amnesia**: Starting a new session mid-task loses all progress unless you manually document remaining work.
**Context Pollution**: A full context window makes the agent drop discovered bugs instead of tracking them. The previous todo list lived in session memory and vanished on restart. The new task system persists tasks to disk, making them shareable across sessions and subagents. ### How Tasks Persist Tasks are stored in `.claude/tasks/{session-id}/` as JSON files: ```json { "id": "task-1", "subject": "Create idb-helpers.ts", "description": "Implement IndexedDB promise wrappers...", "status": "pending | in_progress | completed", "blocks": ["task-3", "task-4"], "blockedBy": ["task-0"] } ``` ### The Four Task Tools | Tool | Purpose | |------|---------| | `TaskCreate` | Create a new task with subject, description, and dependencies | | `TaskUpdate` | Update status (pending → in_progress → completed) or modify dependencies | | `TaskList` | View all tasks, their status, and what's blocked | | `TaskGet` | Get full details of a specific task including description | ### Task System Architecture
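To make this concrete, here is a minimal TypeScript sketch of the task model the architecture implies: tasks persisted as JSON files in the session directory, with a readiness check derived from `blockedBy`. The `Task` shape mirrors the JSON example above; the loader and the example path are illustrative assumptions, not Claude Code's actual internals.

```typescript
import { readdirSync, readFileSync } from 'node:fs'
import { join } from 'node:path'

// Shape of a persisted task, mirroring the JSON example above
interface Task {
  id: string
  subject: string
  description: string
  status: 'pending' | 'in_progress' | 'completed'
  blocks: string[] // tasks that cannot start until this one is done
  blockedBy: string[] // tasks that must complete before this one can start
}

// Load every task file from a session's task directory (path is illustrative)
function loadTasks(taskDir: string): Task[] {
  return readdirSync(taskDir)
    .filter(file => file.endsWith('.json'))
    .map((file): Task => JSON.parse(readFileSync(join(taskDir, file), 'utf8')))
}

// A task is ready when it is pending and everything it is blocked by is completed
function readyTasks(tasks: Task[]): Task[] {
  const done = new Set(tasks.filter(t => t.status === 'completed').map(t => t.id))
  return tasks.filter(t => t.status === 'pending' && t.blockedBy.every(id => done.has(id)))
}

console.log(readyTasks(loadTasks('.claude/tasks/example-session')).map(t => t.subject))
```

Because the files live on disk rather than in session memory, any session (or subagent) that can read the directory can pick up exactly where the last one stopped.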
### Why Subagents + Tasks = Context Efficiency By delegating each task to a subagent, the main session stays lean: it only handles orchestration (creating tasks, tracking progress, committing). Each subagent gets a fresh context window focused entirely on its specific task, reads what it needs, implements, and returns. This means the main agent won't run out of context even for larger refactors with dozens of tasks. For truly massive projects spanning days or weeks, a full autonomous agent like [Ralph](https://ghuntley.com/ralph/) would be more appropriate. Ralph is elegantly simple: a bash loop that feeds a markdown file into Claude Code repeatedly:
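Roughly, the idea looks like this. I've sketched it in TypeScript to stay consistent with the other snippets (the real Ralph is a one-line bash while-loop), and the `claude -p` non-interactive flag, the `PROMPT.md` filename, and the iteration cap are assumptions for illustration.

```typescript
import { execFileSync } from 'node:child_process'
import { readFileSync } from 'node:fs'

// Each iteration starts a brand-new Claude Code run; the prompt file on disk
// is the only memory carried from one iteration to the next.
const PROMPT_FILE = 'PROMPT.md' // illustrative filename

for (let iteration = 1; iteration <= 10; iteration++) { // capped here; Ralph just keeps looping
  const prompt = readFileSync(PROMPT_FILE, 'utf8')
  // `claude -p <prompt>` runs Claude Code non-interactively (assumed CLI flag)
  const output = execFileSync('claude', ['-p', prompt], { encoding: 'utf8' })
  console.log(`--- iteration ${iteration} ---\n${output}`)
}
```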
The key difference: Ralph executes each iteration in a completely new Claude session, using the markdown file as the only persistent memory. This makes it truly stateless and capable of running for days. This spec-driven approach hits a middle ground: subagents get fresh context but the main orchestrator maintains state within a single session. Structured enough to maintain coherence, flexible enough to handle complexity, without the setup overhead of a full autonomous system. ### The Execution Flow
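In rough TypeScript pseudocode (illustrative only; in reality the main agent drives this through the task tools and subagent delegation), the loop looks like this:

```typescript
import { execSync } from 'node:child_process'

interface Task { id: string, subject: string }

// Stub: in the real workflow this is the main agent spawning a subagent that
// gets a fresh context window, reads the spec, and implements just this task.
async function delegateToSubagent(task: Task, specPath: string): Promise<void> {
  console.log(`Subagent implementing "${task.subject}" against ${specPath}`)
}

// One subagent per task, one commit per task. Assumes tasks arrive in
// dependency order (e.g. via the readiness check sketched earlier).
async function executeTasks(tasks: Task[], specPath: string): Promise<void> {
  for (const task of tasks) {
    await delegateToSubagent(task, specPath)
    execSync(`git add -A && git commit -m "${task.id}: ${task.subject}"`)
  }
}

await executeTasks(
  [{ id: 'task-1', subject: 'Create idb-helpers.ts' }],
  'docs/indexeddb-migration-spec.md',
)
```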
### Why This Pattern Works 1. **Context isolation**: Each subagent starts fresh, reading only what it needs, with no accumulated cruft 2. **Persistent progress**: Tasks survive session restarts; pick up where you left off 3. **Dependency-aware parallelism**: Claude identifies which tasks can run concurrently 4. **Atomic commits**: Every task = one commit, making rollbacks trivial 5. **Spec as contract**: Subagents reference the spec, ensuring consistency ### Backpressure: Let the System Catch Mistakes One crucial element that makes atomic commits powerful: [backpressure](https://banay.me/dont-waste-your-backpressure/). Instead of manually reviewing every change, set up pre-commit hooks that run tests, linting, and type checking automatically. ```bash # .husky/pre-commit pnpm typecheck && pnpm lint && pnpm test-run ``` When a subagent commits, the hook runs immediately. If tests fail, the commit is rejected and the agent sees the error output, giving it a chance to self-correct before moving on. This creates automated feedback that catches issues at the source rather than accumulating bugs across multiple tasks. The result: you stop being the bottleneck for quality control. The system validates correctness while you focus on higher-level decisions. ### When Things Go Wrong The first execution wasn't perfect: I started the project and hit some errors. But here's where the spec pays off: I opened a new chat, pinned the spec document, pasted the error, and Claude fixed it immediately. No context rebuilding, no re-explaining the architecture. The spec acts as a recovery point. When a session goes sideways or context gets polluted, you don't lose everything; you have a document that captures the full intent and design decisions. ### The Results After ~45 minutes: ``` $ git log --oneline | head -20 9dc1c96 refactor: clean up code structure 9fce16b feat(storage): migrate from SQLite to IndexedDB 835c494 feat: integrate IDB sync engine provider d2cd7b7 refactor: remove SQLite/sql.js dependencies 2fb7656 feat: add browser mode test stubs ... (14 commits total) ``` **14 tasks completed**, **14 commits**, **15+ files changed**, **one PR ready for review**. See the [full pull request](https://github.com/alexanderop/nuxt-sync-engine/pull/3) (includes additional manual changes). And despite orchestrating 14 subagents, the main session's context stayed manageable. This proves the delegation pattern works: the main agent handled orchestration while subagents did the heavy lifting in isolated contexts. ## The Prompt Patterns Here are the key prompt patterns that make this workflow effective: ### 1. Parallel Research spin up multiple subagents for your research task Triggers Claude to spawn parallel agents, each investigating independently. Much faster than sequential research. ### 2. Spec-First Development your goal is to write a report/document Forces Claude to produce a written artifact before any code. This becomes the source of truth. ### 3. Interview Before Implementation use the ask_user_question tool... before we implement Surfaces ambiguities and design decisions before they become bugs. ### 4. Task Delegation with Commits use the task tool and each task should only be done by a subagent after each task do a commit before you continue Creates the orchestration pattern with atomic commits. ### 5. Role Assignment you are the main agent and your subagents are your devs Sets expectations for how Claude should behave: as a coordinator, not a solo implementer.
## Comparison: Traditional vs Spec-Driven | | Traditional AI Coding | Spec-Driven Development | |---|---|---| | **Flow** | Prompt → Code → Debug → Repeat | Research → Spec → Refine → Tasks → Done | | **Context** | Fills up with failed attempts | Each task gets fresh context | | **Memory** | No persistence across sessions | Spec is persistent source of truth | | **Bug tracking** | Discovered late, forgotten | Bugs become new tasks | | **Completion** | No clear stopping point | Clear completion criteria | ## Advanced: Multi-Session Workflows The task system supports coordination across multiple Claude Code sessions. Set a shared task list ID: ```bash CLAUDE_CODE_TASK_LIST_ID=myproject claude ``` Or add to `.claude/settings.json`: ```json { "env": { "CLAUDE_CODE_TASK_LIST_ID": "myproject" } } ``` One session acts as **orchestrator**; another becomes a **checker** that monitors completed tasks, verifies implementation quality, and adds follow-up tasks for anything missing. ## When to Use This Workflow This pattern excels for: - **Large refactors** touching many files - **Migrations** requiring research into external codebases - **Feature implementations** with unclear requirements - **Learning new libraries** by studying their source It's overkill for: - Small bug fixes - Single-file changes - Well-defined, simple features ## The Tools You Need 1. **Claude Code CLI** (latest version with task tools) 2. **A spec document** (markdown works great) 3. **Reference codebases** if learning from existing implementations 4. **Git** for atomic commits ## Further Reading - [Beads](https://github.com/beads-ai/beads): Steve Yegge's git-backed issue tracker that inspired the task system - [12 Factor Agents](https://12factor.net/agents): Design principles for AI coding agents - [Building Effective Agents](https://www.anthropic.com/research/building-effective-agents): Anthropic's research on agent architectures - For a broader overview of Claude Code's feature stack, see my comprehensive guide ## Conclusion Spec-Driven Development with Claude Code mirrors real engineering workflows: parallel work, handoffs, blockers, and dependencies. Instead of treating Claude as a solo coder, you treat it as a team. The key insight from Beads applies here: > "By having each task that you give a coding agent isolated into its own context window, you can now give it the ability to log any bugs for later." The SQLite to IndexedDB migration would have taken me 2-3 days manually. With this workflow, it took one afternoon, and produced better code thanks to the research phase uncovering patterns from Jazz I wouldn't have found on my own. --- *Try it yourself: Start your next significant feature with "write a spec for X, spin up subagents for research" and see how it changes your workflow.* --- --- title: My Opinionated ESLint Setup for Vue Projects description: A battle-tested linting configuration that catches real bugs, enforces clean architecture, and runs fast using Oxlint and ESLint together. tags: ['vue', 'typescript', 'tooling', 'architecture'] url: https://alexop.dev/posts/opinionated-eslint-setup-vue-projects --- # My Opinionated ESLint Setup for Vue Projects Over the last 7+ years as a Vue developer, I've developed a highly opinionated style for writing Vue components. Some of these rules might not be useful for you, but I thought it was worth sharing so you can pick what fits your project. The goal is to enforce code structure that's readable for both developers and AI agents.
These rules aren't arbitrary—they encode patterns I've written about extensively: - How to Write Clean Vue Components explains why I separate business logic into pure functions - How to Structure Vue Projects covers my feature-based architecture approach - Building a Modular Monolith with Nuxt Layers applies feature isolation to Nuxt projects - The Problem with `as` in TypeScript covers why I ban type assertions - Robust Error Handling in TypeScript introduces the Result pattern behind my `tryCatch` rule - Vue 3 Testing Pyramid explains my integration-first testing strategy - Frontend Testing Guide shares my test naming conventions ESLint rules are how I enforce these patterns automatically—so the codebase stays consistent even as the team grows. **Why linting matters more in the AI era:** As AI agents write more of our code, strict linting becomes essential. It's a form of [back pressure](https://banay.me/dont-waste-your-backpressure/?ref=ghuntley.com)—automated feedback mechanisms that tell an agent when it's made a mistake, allowing it to self-correct without your intervention. You have a limited budget of feedback (your time and attention). If you spend that budget telling the agent "you missed an import" or "that type is wrong," you can't spend it on architectural decisions or complex logic. Type checkers, linters, and test suites act as back pressure: they push back against bad code so you don't have to. Your ESLint config is now part of your prompt—it's the automated quality gate that lets agents iterate until they pass. ## Table of Contents ## Why Two Linters? Oxlint + ESLint I run two linters: **Oxlint** first, then **ESLint**. Why? Speed and coverage. ### Oxlint: The Speed Demon [Oxlint](https://oxc.rs/docs/guide/usage/linter.html) is written in Rust. It runs 50-100x faster than ESLint on large codebases. My pre-commit hook completes in milliseconds instead of seconds. ```bash # In package.json "lint:oxlint": "oxlint . --fix --ignore-path .gitignore", "lint:eslint": "eslint . --fix --cache", "lint": "run-s lint:*" # Runs oxlint first, then eslint ``` **The tradeoff:** Oxlint supports fewer rules. It handles: - **Correctness & suspicious patterns** - catches bugs early - **Core ESLint equivalents** - `no-console`, `no-explicit-any` - **TypeScript basics** - `array-type`, `consistent-type-definitions` But Oxlint lacks: - Vue-specific rules (`vue/*`) - Import boundary rules (`import-x/*`) - Vitest testing rules (`vitest/*`) - i18n rules (`@intlify/vue-i18n/*`) - Custom local rules ### The Setup Oxlint runs first for fast feedback. ESLint runs second for comprehensive checks. The `eslint-plugin-oxlint` package tells ESLint to skip rules that Oxlint already handles. ```typescript // eslint.config.ts export default defineConfigWithVueTs( // ... other configs ...pluginOxlint.buildFromOxlintConfigFile('./.oxlintrc.json'), ) ``` ```json // .oxlintrc.json { "$schema": "./node_modules/oxlint/configuration_schema.json", "categories": { "correctness": "error", "suspicious": "warn" }, "rules": { "typescript/no-explicit-any": "error", "eslint/no-console": ["error", { "allow": ["warn", "error"] }] } } ``` --- ## Must-Have Rules These rules catch real bugs and enforce maintainable code. Enable them on every Vue project. --- ### Cyclomatic Complexity Complex functions are hard to test and understand. This rule limits branching logic per function. 
```typescript // eslint.config.ts { rules: { 'complexity': ['warn', { max: 10 }] } } ``` ```typescript function processOrder(order: Order) { if (order.status === 'pending') { if (order.items.length > 0) { if (order.payment) { if (order.payment.verified) { if (order.shipping) { // 5 levels deep, complexity keeps growing... } } } } } } ``` ```typescript function processOrder(order: Order) { if (!isValidOrder(order)) return processPayment(order.payment) scheduleShipping(order.shipping) } function isValidOrder(order: Order): boolean { return order.status === 'pending' && order.items.length > 0 && order.payment?.verified === true } ``` **Threshold guidance:** - ESLint default: `20` (lenient) - This project uses: `10` (stricter) - Consider `15` as a middle ground for legacy codebases > [ESLint: complexity](https://eslint.org/docs/latest/rules/complexity) --- ### No Nested Ternaries Nested ternaries are hard to read. Use early returns or separate variables instead. ```typescript // eslint.config.ts { rules: { 'no-nested-ternary': 'error' } } ``` ```typescript const label = isLoading ? 'Loading...' : hasError ? 'Failed' : 'Success' ``` ```typescript function getLabel() { if (isLoading) return 'Loading...' if (hasError) return 'Failed' return 'Success' } const label = getLabel() ``` > [ESLint: no-nested-ternary](https://eslint.org/docs/rules/no-nested-ternary) --- ### No Type Assertions Type assertions (`as Type`) bypass TypeScript's type checker. They hide bugs. Use type guards or proper typing instead. ```typescript // eslint.config.ts { rules: { '@typescript-eslint/consistent-type-assertions': ['error', { assertionStyle: 'never' }] } } ``` `as const` assertions are always allowed, even with `assertionStyle: 'never'`. Const assertions don't bypass type checking—they make types more specific. ```typescript const user = response.data as User // What if it's not a User? const element = document.querySelector('.btn') as HTMLButtonElement element.click() // Runtime error if element is null ``` ```typescript // Use type guards function isUser(data: unknown): data is User { return typeof data === 'object' && data !== null && 'id' in data && 'name' in data } if (isUser(response.data)) { const user = response.data // TypeScript knows it's User } // Handle nulls properly const element = document.querySelector('.btn') if (element instanceof HTMLButtonElement) { element.click() } ``` > [TypeScript ESLint: consistent-type-assertions](https://typescript-eslint.io/rules/consistent-type-assertions) --- ### No Enums TypeScript enums have quirks. They generate JavaScript code, have numeric reverse mappings, and behave differently from union types. Use literal unions or const objects instead. ```typescript // eslint.config.ts { rules: { 'no-restricted-syntax': ['error', { selector: 'TSEnumDeclaration', message: 'Use literal unions or `as const` objects instead of enums.' }] } } ``` ```typescript enum Status { Pending, Active, Done } const status: Status = Status.Pending ``` ```typescript // Literal union - simplest type Status = 'pending' | 'active' | 'done' // Or const object when you need values const Status = { Pending: 'pending', Active: 'active', Done: 'done' } as const type Status = typeof Status[keyof typeof Status] ``` > [ESLint: no-restricted-syntax](https://eslint.org/docs/rules/no-restricted-syntax) --- ### No else/else-if `else` and `else-if` blocks increase nesting. Early returns are easier to read and reduce cognitive load. 
```typescript // eslint.config.ts { rules: { 'no-restricted-syntax': ['error', { selector: 'IfStatement > IfStatement.alternate', message: 'Avoid `else if`. Prefer early returns or ternary operators.' }, { selector: 'IfStatement > :not(IfStatement).alternate', message: 'Avoid `else`. Prefer early returns or ternary operators.' } ] } } ``` ```typescript function getDiscount(user: User) { if (user.isPremium) { return 0.2 } else if (user.isMember) { return 0.1 } else { return 0 } } ``` ```typescript function getDiscount(user: User) { if (user.isPremium) return 0.2 if (user.isMember) return 0.1 return 0 } ``` > [ESLint: no-restricted-syntax](https://eslint.org/docs/rules/no-restricted-syntax) --- ### No Native try/catch Native try/catch blocks are verbose and error-prone. Use a utility function that returns a result tuple instead. ```typescript // eslint.config.ts { rules: { 'no-restricted-syntax': ['error', { selector: 'TryStatement', message: 'Use tryCatch() from @/lib/tryCatch instead of try/catch. Returns Result tuple: [error, null] | [null, data].' }] } } ``` ```typescript async function fetchUser(id: string) { try { const response = await api.get(`/users/${id}`) return response.data } catch (error) { console.error(error) return null } } ``` ```typescript async function fetchUser(id: string) { const [error, response] = await tryCatch(api.get(`/users/${id}`)) if (error) { console.error(error) return null } return response.data } ``` The `tryCatch` utility returns `[error, null]` or `[null, data]`, similar to Go's error handling. > [ESLint: no-restricted-syntax](https://eslint.org/docs/rules/no-restricted-syntax) --- ### No Direct DOM Manipulation Vue manages the DOM. Calling `document.querySelector` bypasses Vue's reactivity and template refs. Use `useTemplateRef()` instead. If you're on Vue 3.5+, the built-in rule already enforces this. ```typescript // eslint.config.ts { files: ['src/**/*.vue'], rules: { 'vue/prefer-use-template-ref': 'error' } } ``` ```vue ``` ```vue ``` > [ESLint: no-restricted-syntax](https://eslint.org/docs/rules/no-restricted-syntax) --- ### Feature Boundary Enforcement Features should not import from other features. This keeps code modular and prevents circular dependencies. If you're using a feature-based architecture, this rule is essential—see How to Structure Vue Projects for more on this approach. 
```typescript // eslint.config.ts { plugins: { 'import-x': pluginImportX }, rules: { 'import-x/no-restricted-paths': ['error', { zones: [ // === CROSS-FEATURE ISOLATION === // Features cannot import from other features { target: './src/features/workout', from: './src/features', except: ['./workout'] }, { target: './src/features/exercises', from: './src/features', except: ['./exercises'] }, { target: './src/features/settings', from: './src/features', except: ['./settings'] }, { target: './src/features/timers', from: './src/features', except: ['./timers'] }, { target: './src/features/templates', from: './src/features', except: ['./templates'] }, { target: './src/features/benchmarks', from: './src/features', except: ['./benchmarks'] }, // === UNIDIRECTIONAL FLOW === // Shared code cannot import from features or views { target: ['./src/components', './src/composables', './src/lib', './src/db', './src/types', './src/stores'], from: ['./src/features', './src/views'] }, // Features cannot import from views (views are top-level orchestrators) { target: './src/features', from: './src/views' } ] }] } } ``` **Unidirectional Flow:** The architecture enforces a strict dependency hierarchy. Views orchestrate features, features use shared code, but never the reverse. ``` views → features → shared (components, composables, lib, db, types, stores) ``` ```typescript // src/features/workout/composables/useWorkout.ts // Cross-feature import! ``` ```typescript // src/features/workout/composables/useWorkout.ts // Use shared database layer instead ``` > [eslint-plugin-import-x: no-restricted-paths](https://github.com/un-ts/eslint-plugin-import-x) --- ### Vue Component Naming Consistent naming makes components easy to find and identify. ```typescript // eslint.config.ts { files: ['src/**/*.vue'], rules: { 'vue/multi-word-component-names': ['error', { ignores: ['App', 'Layout'] }], 'vue/component-definition-name-casing': ['error', 'PascalCase'], 'vue/component-name-in-template-casing': ['error', 'PascalCase', { registeredComponentsOnly: false }], 'vue/match-component-file-name': ['error', { extensions: ['vue'], shouldMatchCase: true }], 'vue/prop-name-casing': ['error', 'camelCase'], 'vue/attribute-hyphenation': ['error', 'always'], 'vue/custom-event-name-casing': ['error', 'kebab-case'] } } ``` ```vue ``` ```vue ``` > [eslint-plugin-vue: component rules](https://eslint.vuejs.org/rules/) --- ### Dead Code Detection in Vue Find unused props, refs, and emits before they become tech debt. ```typescript // eslint.config.ts { files: ['src/**/*.vue'], rules: { 'vue/no-unused-properties': ['error', { groups: ['props', 'data', 'computed', 'methods'] }], 'vue/no-unused-refs': 'error', 'vue/no-unused-emit-declarations': 'error' } } ``` ```vue ``` ```vue ``` > [eslint-plugin-vue: no-unused-properties](https://eslint.vuejs.org/rules/no-unused-properties.html) --- ### No Hardcoded i18n Strings Hardcoded strings break internationalization. The `@intlify/vue-i18n` plugin catches them. ```typescript // eslint.config.ts { files: ['src/**/*.vue'], plugins: { '@intlify/vue-i18n': pluginVueI18n }, rules: { '@intlify/vue-i18n/no-raw-text': ['error', { ignorePattern: '^[-#:()&+×/°′″%]+', ignoreText: ['kg', 'lbs', 'cm', 'ft/in', '—', '•', '✓', '›', '→', '·', '.', 'Close'], attributes: { '/.+/': ['title', 'aria-label', 'aria-placeholder', 'placeholder', 'alt'] } }] } } ``` The `attributes` option catches hardcoded strings in accessibility attributes too. 
```vue ``` ```vue ``` > [eslint-plugin-vue-i18n](https://eslint-plugin-vue-i18n.intlify.dev/) --- ### No Disabling i18n Rules Prevent developers from bypassing i18n checks with `eslint-disable` comments. ```typescript // eslint.config.ts { files: ['src/**/*.vue'], plugins: { '@eslint-community/eslint-comments': pluginEslintComments }, rules: { '@eslint-community/eslint-comments/no-restricted-disable': [ 'error', '@intlify/vue-i18n/*' ] } } ``` ```vue ``` ```vue ``` > [@eslint-community/eslint-plugin-eslint-comments](https://github.com/eslint-community/eslint-plugin-eslint-comments) --- ### No Hardcoded Route Strings Use named routes instead of hardcoded path strings for maintainability. ```typescript // eslint.config.ts { rules: { 'no-restricted-syntax': ['error', { selector: 'CallExpression[callee.property.name="push"][callee.object.name="router"] > Literal:first-child', message: 'Use named routes with RouteNames instead of hardcoded path strings.' }, { selector: 'CallExpression[callee.property.name="push"][callee.object.name="router"] > TemplateLiteral:first-child', message: 'Use named routes with RouteNames instead of template literals.' } ] } } ``` ```typescript router.push('/workout/123') router.push(`/workout/${id}`) ``` ```typescript router.push({ name: RouteNames.WorkoutDetail, params: { id } }) ``` > [ESLint: no-restricted-syntax](https://eslint.org/docs/latest/rules/no-restricted-syntax) --- ### Enforce Integration Test Helpers Ban direct `render()` or `mount()` calls in tests. Use a centralized test helper instead. For more on testing strategies in Vue, see Vue 3 Testing Pyramid: A Practical Guide with Vitest Browser Mode. ```typescript // eslint.config.ts { files: ['src/**/__tests__/**/*.{ts,spec.ts}'], ignores: ['src/__tests__/helpers/**'], rules: { 'no-restricted-imports': ['error', { paths: [ { name: 'vitest-browser-vue', importNames: ['render'], message: 'Use createTestApp() from @/__tests__/helpers/createTestApp instead.' }, { name: '@vue/test-utils', importNames: ['mount', 'shallowMount'], message: 'Use createTestApp() instead of mounting components directly.' } ] }] } } ``` ```typescript const { getByText } = render(MyComponent) const wrapper = mount(MyComponent) ``` ```typescript const { page } = await createTestApp({ route: '/workout' }) ``` This ensures all tests use consistent setup with routing, i18n, and database. > [ESLint: no-restricted-imports](https://eslint.org/docs/latest/rules/no-restricted-imports) --- ### Enforce pnpm Catalogs When using pnpm workspaces, enforce that dependencies use catalog references. ```typescript // eslint.config.ts export default defineConfigWithVueTs( // ... other configs ...pnpmConfigs.recommended, ) ``` This ensures dependencies are managed centrally in `pnpm-workspace.yaml`. > [eslint-plugin-pnpm](https://github.com/nickmccurdy/eslint-plugin-pnpm) --- ## Nice-to-Have Rules These rules improve code quality but are less critical. Enable them after the must-haves are in place. --- ### Vue 3.5+ API Enforcement Use the latest Vue 3.5 APIs for cleaner code. ```typescript // eslint.config.ts { files: ['src/**/*.vue'], rules: { 'vue/define-props-destructuring': 'error', 'vue/prefer-use-template-ref': 'error' } } ``` ```vue ``` ```vue ``` > [eslint-plugin-vue: define-props-destructuring](https://eslint.vuejs.org/rules/define-props-destructuring.html) --- ### Explicit Component APIs Require `defineExpose` and `defineSlots` to make component interfaces explicit. 
```typescript // eslint.config.ts { files: ['src/**/*.vue'], rules: { 'vue/require-expose': 'warn', 'vue/require-explicit-slots': 'warn' } } ``` ```vue ``` ```vue ``` > [eslint-plugin-vue: require-expose](https://eslint.vuejs.org/rules/require-expose.html) --- ### Template Depth Limit Deep template nesting is hard to read. Extract nested sections into components. This one matters a lot—it helps you avoid ending up with components that are 2000 lines long. ```typescript // eslint.config.ts { files: ['src/**/*.vue'], rules: { 'vue/max-template-depth': ['error', { maxDepth: 8 }], 'vue/max-props': ['error', { maxProps: 6 }] } } ``` ```vue ``` ```vue ``` > [eslint-plugin-vue: max-template-depth](https://eslint.vuejs.org/rules/max-template-depth.html) --- ### Better Assertions in Tests Use specific matchers for clearer test failures. ```typescript // eslint.config.ts { files: ['src/**/__tests__/*'], rules: { 'vitest/prefer-to-be': 'error', 'vitest/prefer-to-have-length': 'error', 'vitest/prefer-to-contain': 'error', 'vitest/prefer-mock-promise-shorthand': 'error' } } ``` ```typescript expect(value === null).toBe(true) expect(arr.length).toBe(3) expect(arr.includes('foo')).toBe(true) ``` ```typescript expect(value).toBeNull() expect(arr).toHaveLength(3) expect(arr).toContain('foo') // Also prefer mock shorthands vi.fn().mockResolvedValue('data') // Instead of mockReturnValue(Promise.resolve('data')) ``` > [eslint-plugin-vitest](https://github.com/veritem/eslint-plugin-vitest) --- ### Test Structure Rules Keep tests organized and readable. ```typescript // eslint.config.ts { files: ['src/**/__tests__/*'], rules: { 'vitest/consistent-test-it': ['error', { fn: 'it' }], 'vitest/prefer-hooks-on-top': 'error', 'vitest/prefer-hooks-in-order': 'error', 'vitest/no-duplicate-hooks': 'error', 'vitest/require-top-level-describe': 'error', 'vitest/max-nested-describe': ['error', { max: 2 }], 'vitest/no-conditional-in-test': 'warn' } } ``` ```typescript test('works', () => {}) // Inconsistent: test vs it it('also works', () => {}) describe('feature', () => { it('test 1', () => {}) beforeEach(() => {}) // Hook after test! describe('nested', () => { describe('too deep', () => { describe('way too deep', () => {}) // 3 levels! }) }) }) ``` ```typescript describe('feature', () => { beforeEach(() => {}) // Hooks first, in order it('does something', () => {}) it('does another thing', () => {}) describe('edge cases', () => { it('handles null', () => {}) }) }) // no-conditional-in-test prevents flaky tests // Bad: if (data.length > 0) { expect(data[0]).toBeDefined() } // Good: expect(data).toHaveLength(3); expect(data[0]).toBeDefined() ``` > [eslint-plugin-vitest](https://github.com/veritem/eslint-plugin-vitest) --- ### Prefer Vitest Locators in Tests Use Vitest Browser locators instead of raw DOM queries. ```typescript // eslint.config.ts { files: ['src/**/__tests__/**/*.{ts,spec.ts}'], rules: { 'no-restricted-syntax': ['warn', { selector: 'CallExpression[callee.property.name=/^querySelector(All)?$/]', message: 'Prefer page.getByRole(), page.getByText(), or page.getByTestId() over querySelector. Vitest locators are more resilient to DOM changes.' 
}] } } ``` ```typescript const button = container.querySelector('.submit-btn') await button?.click() ``` ```typescript const button = page.getByRole('button', { name: 'Submit' }) await button.click() ``` > [Vitest Browser Mode](https://vitest.dev/guide/browser/) --- ### Unicorn Rules The `eslint-plugin-unicorn` package catches common mistakes and enforces modern JavaScript patterns. ```typescript // eslint.config.ts pluginUnicorn.configs.recommended, { name: 'app/unicorn-overrides', rules: { // === Enable non-recommended rules that add value === 'unicorn/better-regex': 'warn', // Simplify regexes: /[0-9]/ → /\d/ 'unicorn/custom-error-definition': 'error', // Correct Error subclassing 'unicorn/no-unused-properties': 'warn', // Dead code detection 'unicorn/consistent-destructuring': 'warn', // Use destructured vars consistently // === Disable rules that conflict with project conventions === 'unicorn/no-null': 'off', // We use null for database values 'unicorn/filename-case': 'off', // Vue uses PascalCase, tests use camelCase 'unicorn/prevent-abbreviations': 'off', // props, e, Db are fine 'unicorn/no-array-callback-reference': 'off', // arr.filter(isValid) is fine 'unicorn/no-await-expression-member': 'off', // (await fetch()).json() is fine 'unicorn/no-array-reduce': 'off', // reduce is useful for aggregations 'unicorn/no-useless-undefined': 'off' // mockResolvedValue(undefined) for TS } } ``` **Examples:** ```typescript // unicorn/better-regex // Bad: /[0-9]/ // Good: /\d/ // unicorn/consistent-destructuring // Bad: const { foo } = object console.log(object.bar) // Uses object.bar instead of destructuring // Good: const { foo, bar } = object console.log(bar) ``` > [eslint-plugin-unicorn](https://github.com/sindresorhus/eslint-plugin-unicorn) --- ## Custom Local Rules Sometimes you need rules that don't exist. Write them yourself. ### Composable Must Use Vue A file named `use*.ts` should import from Vue. If it doesn't, it's a utility, not a composable. For more on writing proper composables, see Vue Composables Style Guide: Lessons from VueUse's Codebase. ```typescript // eslint-local-rules/composable-must-use-vue.ts const VALID_VUE_SOURCES = new Set(['vue', '@vueuse/core', 'vue-router', 'vue-i18n']) const VALID_PATH_PATTERNS = [/^@\/stores\//] // Global state composables count too function isComposableFilename(filename: string): boolean { return /^use[A-Z]/.test(path.basename(filename, '.ts')) } const rule: Rule.RuleModule = { meta: { messages: { notAComposable: 'File "{{filename}}" does not import from Vue. Rename it or add Vue imports.' } }, create(context) { if (!isComposableFilename(context.filename)) return {} let hasVueImport = false return { ImportDeclaration(node) { if (VALID_VUE_SOURCES.has(node.source.value)) { hasVueImport = true } }, 'Program:exit'(node) { if (!hasVueImport) { context.report({ node, messageId: 'notAComposable' }) } } } } } ``` ```typescript // src/composables/useFormatter.ts export function useFormatter() { return { formatDate: (d: Date) => d.toISOString() // No Vue imports! } } ``` ```typescript // src/lib/formatter.ts (renamed) export function formatDate(d: Date) { return d.toISOString() } // OR add Vue reactivity: // src/composables/useFormatter.ts export function useFormatter() { const locale = ref('en-US') const formatter = computed(() => new Intl.DateTimeFormat(locale.value)) return { formatter, locale } } ``` --- ### No Hardcoded Tailwind Colors Hardcoded Tailwind colors (`bg-blue-500`) make theming impossible. Use semantic colors (`bg-primary`). 
```typescript // eslint-local-rules/no-hardcoded-colors.ts // Status colors (red, amber, yellow, green, emerald) are ALLOWED for semantic states const HARDCODED_COLORS = ['slate', 'gray', 'zinc', 'blue', 'purple', 'pink', 'orange', 'indigo', 'violet'] const COLOR_UTILITIES = ['bg', 'text', 'border', 'ring', 'fill', 'stroke'] const rule: Rule.RuleModule = { meta: { messages: { noHardcodedColor: 'Avoid "{{color}}". Use semantic classes like bg-primary, text-foreground.' } }, create(context) { return { Literal(node) { if (typeof node.value !== 'string') return const matches = findHardcodedColors(node.value) for (const color of matches) { context.report({ node, messageId: 'noHardcodedColor', data: { color } }) } } } } } ``` ```vue ``` ```vue ``` Status colors (`red`, `amber`, `yellow`, `green`, `emerald`) are intentionally allowed for error/warning/success states. Only use these for semantic status indication, not general styling. --- ### No let in describe Blocks Mutable variables in test describe blocks create hidden state. Use setup functions instead. ```typescript // eslint-local-rules/no-let-in-describe.ts const rule: Rule.RuleModule = { meta: { messages: { noLetInDescribe: 'Avoid `let` in describe blocks. Use setup functions instead.' } }, create(context) { let describeDepth = 0 return { CallExpression(node) { if (isDescribeCall(node)) describeDepth++ }, 'CallExpression:exit'(node) { if (isDescribeCall(node)) describeDepth-- }, VariableDeclaration(node) { if (describeDepth > 0 && node.kind === 'let') { context.report({ node, messageId: 'noLetInDescribe' }) } } } } } ``` ```typescript describe('Login', () => { let user: User beforeEach(() => { user = createUser() // Hidden mutation! }) it('works', () => { expect(user.name).toBe('test') }) }) ``` ```typescript describe('Login', () => { function setup() { return { user: createUser() } } it('works', () => { const { user } = setup() expect(user.name).toBe('test') }) }) ``` --- ### Extract Complex Conditions Complex boolean expressions should have names. Extract them into variables. ```typescript // eslint-local-rules/extract-condition-variable.ts const OPERATOR_THRESHOLD = 2 // Conditions with 2+ logical operators need extraction const rule: Rule.RuleModule = { meta: { messages: { extractCondition: 'Complex condition should be extracted into a named const.' } }, create(context) { return { IfStatement(node) { // Skip patterns that TypeScript needs inline for narrowing if (isEarlyExitGuard(node.consequent)) return // if (!x) return if (hasOptionalChaining(node.test)) return // if (user?.name) if (hasTruthyNarrowingPattern(node.test)) return // if (arr && arr[0]) if (countOperators(node.test) >= OPERATOR_THRESHOLD) { context.report({ node: node.test, messageId: 'extractCondition' }) } } } } } ``` **Smart Exceptions:** The rule skips several patterns that TypeScript needs inline for type narrowing: - Early exit guards: `if (!user) return` - Optional chaining: `if (user?.name)` - Truthy narrowing: `if (arr && arr[0])` ```typescript if (user.isActive && user.role === 'admin' && !user.isBanned) { showAdminPanel() } ``` ```typescript const canAccessAdminPanel = user.isActive && user.role === 'admin' && !user.isBanned if (canAccessAdminPanel) { showAdminPanel() } ``` --- ### Repository tryCatch Wrapper Database calls can fail. Enforce wrapping them in `tryCatch()`. 
```typescript // eslint-local-rules/repository-trycatch.ts // Matches pattern: get*Repository().method() const REPO_PATTERN = /^get\w+Repository$/ const rule: Rule.RuleModule = { meta: { messages: { missingTryCatch: 'Repository calls must be wrapped with tryCatch().' } }, create(context) { return { AwaitExpression(node) { if (!isRepositoryMethodCall(node.argument)) return if (isWrappedInTryCatch(context, node)) return context.report({ node, messageId: 'missingTryCatch' }) } } } } ``` ```typescript const workouts = await getWorkoutRepository().findAll() // Might throw! ``` ```typescript const [error, workouts] = await tryCatch(getWorkoutRepository().findAll()) if (error) { showError('Failed to load workouts') return } ``` This rule matches the `get*Repository()` pattern. Ensure your repository factory functions follow this naming convention. --- ## The Full Config ```typescript export default defineConfigWithVueTs( { ignores: ['**/dist/**', '**/coverage/**', '**/node_modules/**'] }, pluginVue.configs['flat/essential'], vueTsConfigs.recommended, pluginUnicorn.configs.recommended, // Vue component rules { files: ['src/**/*.vue'], rules: { 'vue/multi-word-component-names': ['error', { ignores: ['App', 'Layout'] }], 'vue/component-name-in-template-casing': ['error', 'PascalCase'], 'vue/prop-name-casing': ['error', 'camelCase'], 'vue/custom-event-name-casing': ['error', 'kebab-case'], 'vue/no-unused-properties': ['error', { groups: ['props', 'data', 'computed', 'methods'] }], 'vue/no-unused-refs': 'error', 'vue/define-props-destructuring': 'error', 'vue/prefer-use-template-ref': 'error', 'vue/max-template-depth': ['error', { maxDepth: 8 }], }, }, // TypeScript style guide { files: ['src/**/*.{ts,vue}'], rules: { 'complexity': ['warn', { max: 10 }], 'no-nested-ternary': 'error', '@typescript-eslint/consistent-type-assertions': ['error', { assertionStyle: 'never' }], 'no-restricted-syntax': ['error', { selector: 'TSEnumDeclaration', message: 'Use literal unions instead of enums.' }, { selector: 'IfStatement > :not(IfStatement).alternate', message: 'Avoid else. Use early returns.' }, { selector: 'TryStatement', message: 'Use tryCatch() instead of try/catch.' }, ], }, }, // Feature boundaries { files: ['src/**/*.{ts,vue}'], plugins: { 'import-x': pluginImportX }, rules: { 'import-x/no-restricted-paths': ['error', { zones: [ { target: './src/features/workout', from: './src/features', except: ['./workout'] }, // ... 
other features { target: './src/features', from: './src/views' }, // Unidirectional flow ] }], }, }, // i18n rules { files: ['src/**/*.vue'], plugins: { '@intlify/vue-i18n': pluginVueI18n }, rules: { '@intlify/vue-i18n/no-raw-text': ['error', { /* config */ }], }, }, // Prevent disabling i18n rules { files: ['src/**/*.vue'], plugins: { '@eslint-community/eslint-comments': pluginEslintComments }, rules: { '@eslint-community/eslint-comments/no-restricted-disable': ['error', '@intlify/vue-i18n/*'], }, }, // Vitest rules { files: ['src/**/__tests__/*'], ...pluginVitest.configs.recommended, rules: { 'vitest/consistent-test-it': ['error', { fn: 'it' }], 'vitest/prefer-hooks-on-top': 'error', 'vitest/prefer-hooks-in-order': 'error', 'vitest/no-duplicate-hooks': 'error', 'vitest/max-nested-describe': ['error', { max: 2 }], 'vitest/no-conditional-in-test': 'warn', }, }, // Enforce test helpers { files: ['src/**/__tests__/**/*.{ts,spec.ts}'], rules: { 'no-restricted-imports': ['error', { paths: [ { name: 'vitest-browser-vue', importNames: ['render'], message: 'Use createTestApp()' }, { name: '@vue/test-utils', importNames: ['mount'], message: 'Use createTestApp()' }, ] }], }, }, // Local rules { files: ['src/**/*.{ts,vue}'], plugins: { local: localRules }, rules: { 'local/no-hardcoded-colors': 'error', 'local/composable-must-use-vue': 'error', 'local/repository-trycatch': 'error', 'local/extract-condition-variable': 'error', 'local/no-let-in-describe': 'error', }, }, // Disable rules handled by Oxlint ...pluginOxlint.buildFromOxlintConfigFile('./.oxlintrc.json'), // pnpm catalog enforcement ...pnpmConfigs.recommended, skipFormatting, ) ``` --- ## Summary | Category | Rule | Purpose | |----------|------|---------| | **Must Have** | `complexity` | Limit function complexity | | **Must Have** | `no-nested-ternary` | Readable conditionals | | **Must Have** | `consistent-type-assertions` | No unsafe `as` casts | | **Must Have** | `no-restricted-syntax` (enums) | Use unions over enums | | **Must Have** | `no-restricted-syntax` (else) | Prefer early returns | | **Must Have** | `no-restricted-syntax` (routes) | Use named routes | | **Must Have** | `import-x/no-restricted-paths` | Feature isolation | | **Must Have** | `vue/no-unused-*` | Dead code detection | | **Must Have** | `@intlify/vue-i18n/no-raw-text` | i18n compliance | | **Must Have** | `no-restricted-disable` | No bypassing i18n | | **Must Have** | `no-restricted-imports` | Enforce test helpers | | **Nice to Have** | `vue/define-props-destructuring` | Vue 3.5 patterns | | **Nice to Have** | `vue/max-template-depth` | Template readability | | **Nice to Have** | `vitest/*` | Test consistency | | **Nice to Have** | `unicorn/*` | Modern JavaScript | | **Nice to Have** | `pnpm/recommended` | Catalog enforcement | | **Custom** | `composable-must-use-vue` | Composable validation | | **Custom** | `no-hardcoded-colors` | Theming support | | **Custom** | `no-let-in-describe` | Clean tests | | **Custom** | `extract-condition-variable` | Readable conditions | | **Custom** | `repository-trycatch` | Error handling | Start with the must-haves. Add nice-to-haves when you're ready. Write custom rules when nothing else fits. The combination of Oxlint for speed and ESLint for coverage gives you fast feedback during development and comprehensive checks in CI. 
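As an aside: several rules above assume a `tryCatch()` helper that returns `[error, null] | [null, data]`, but its implementation isn't shown. Here is a minimal sketch of what such a helper could look like; your actual `@/lib/tryCatch` may differ, and this one file would itself need to be excluded from the `TryStatement` restriction.

```typescript
// lib/tryCatch.ts - minimal sketch of a Go-style result tuple helper
export type Result<T, E = Error> = [E, null] | [null, T]

export async function tryCatch<T>(promise: Promise<T>): Promise<Result<T>> {
  try {
    const data = await promise
    return [null, data]
  }
  catch (caught) {
    // Normalize non-Error throws so callers always get an Error in the tuple
    const error = caught instanceof Error ? caught : new Error(String(caught))
    return [error, null]
  }
}

// Usage, matching the examples above:
// const [error, workouts] = await tryCatch(getWorkoutRepository().findAll())
```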
--- --- title: Next Level GitHub Copilot: Agents, Instructions & Automation in VS Code description: Workshop covering the transformation from LLM to Agent, context engineering, AGENTS.md, subagents, and skills in VS Code Copilot. tags: ['vs-code', 'github-copilot', 'ai-agents', 'context-engineering', 'workshop'] url: https://alexop.dev/posts/vs-code-copilot-workshop --- # Next Level GitHub Copilot: Agents, Instructions & Automation in VS Code # Next Level GitHub Copilot Agents.md Subagents & Skills by Alexander Opalic --- --- ## Workshop Outline 1. What is an Agent? (LLM → Agent transformation) 2. Context Engineering (the real skill) 3. Back Pressure (core validation concept) 4. AGENTS.md (open standard) 5. Subagents (specialized invocation) 6. Skills (portable workflows) 7. Live Demo --- ## 🙋 Who has used GitHub Copilot in VS Code? --- ## About me

Alex Opalic

* 🚀 7 years experience as a full stack developer * 💼 Developer at Otto Payments * 🏡 Based in Geretsried (south of Munich, Bavaria) * ✍️ Blogger at alexop.dev * 🎤 Sharing & speaking about Vue, testing & GraphQL & AI --- # What is an Agent? --- ## The Transformation: LLM → Agent - At the beginning, an LLM is just a text generator - One problem: the LLM didn't have access to current news - Solution: all providers gave the LLM access to tools - With tools, the LLM can now interact with the world - This is why an agent is an LLM + Tools + Agentic Loop --- --- ## The Agentic Loop (nanocode) ```shell nanocode | claude-opus-4-5 | /Users/alexanderopalic/Projects/typescript/nanocode ──────────────────────────────────────────────────────────────────────────────── ❯ create a simple typescript file as a sum function ──────────────────────────────────────────────────────────────────────────────── [agentLoop] Starting with 1 messages [agentLoop] Got response, stop_reason: tool_use ⏺ Write(src/sum.ts) ⎿ ok [agentLoop] Starting with 3 messages [agentLoop] Got response, stop_reason: end_turn ⏺ Created `src/sum.ts` with a simple sum function that takes two numbers and returns their sum. ``` **~350 lines of TypeScript** to understand how Claude Code works. --- ## The Agentic Loop (Code) ```typescript async function agentLoop(messages: Message[], systemPrompt: string): Promise<Message[]> { const response = await callApi(messages, systemPrompt) printResponse(response) const toolResults = await processToolCalls(response.content) const newMessages = [...messages, { role: 'assistant', content: response.content }] if (toolResults.length === 0) { return newMessages // No tools called, we're done } return agentLoop( // Loop again with tool results [...newMessages, { role: 'user', content: toolResults }], systemPrompt ) } ``` The entire request → response → execute → loop cycle in ~15 lines. --- ## Tool Registration ```typescript const TOOLS = new Map([ ['read', { description: 'Read file with line numbers', schema: { path: 'string', offset: 'number?', limit: 'number?' }, execute: read }], ['write', { description: 'Write content to file', schema: { path: 'string', content: 'string' }, execute: write }], ['bash', { description: 'Run shell command', schema: { cmd: 'string' }, execute: bash }] ]) ``` --- ## A Complete Tool Implementation ```typescript async function read(args: Record<string, unknown>): Promise<string> { const path = args.path as string const text = await Bun.file(path).text() const lines = text.split('\n') const offset = (args.offset as number) ?? 0 const limit = (args.limit as number) ?? lines.length return lines .slice(offset, offset + limit) .map((line, i) => `${(offset + i + 1).toString().padStart(4)}| ${line}`) .join('\n') } ``` --- --- ## VS Code Copilot Built-in Tools - ⟨⟩ **agent** — Delegate tasks to other agents - ⓘ **askQuestions** — Ask questions to clarify requirements - ✎ **edit** — Edit files in your workspace - ▷ **execute** — Execute code and applications - ⧉ **read** — Read files in your workspace - 🔍 **search** — Search files in your workspace - ≡ **todo** — Manage and track todo items - ✕ **vscode** — Use VS Code features - 🌐 **web** — Fetch information from the web --- # Context Engineering --- --- --- > "Context engineering is the art and science of filling the context window with just the right information at each step of an agent's trajectory."
> > — LangChain/Manus webinar --- ## Context Window Utilization --- --- ## Three Long-Horizon Techniques From [Anthropic's guide](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents): 1. **Compaction** — Summarize history, reset periodically 2. **Structured note-taking** — External memory systems 3. **Sub-agent architectures** — Distribute work across focused contexts --- # Back Pressure --- ## Why Back Pressure Matters **Back pressure** = automated feedback that validates agent work - Without back pressure, **you** become the validation layer - Agents cannot self-correct if nothing tells them something is wrong - With good back pressure, agents detect mistakes and iterate until correct > "If you're directly responsible for checking each line is valid, that's time taken away from higher-level goals." --- ## Back Pressure Sources | Source | What It Validates | |--------|-------------------| | **Type system** | Types, interfaces, contracts | | **Build tools** | Syntax, imports, compilation | | **Tests** | Logic, behavior, regressions | | **Linters** | Style, patterns, best practices | **Key insight:** Expressive type systems + good error messages = agents can self-correct. --- # AGENTS.md --- ## What is AGENTS.md? **What:** An open standard for agent-specific documentation **Where:** Repository root (works in monorepos too) **Who:** Works with Copilot, Claude, Cursor, Devin, 20+ agents > "While README.md targets humans, AGENTS.md contains the extra context coding agents need." --- --- ## AGENTS.md Structure ```markdown # AGENTS.md ## Dev Environment - How to set up and navigate ## Build & Test Commands - `pnpm install && pnpm dev` - `pnpm test:unit` ## Code Style - TypeScript strict mode - Prefer composition over inheritance ## PR Instructions - Keep PRs small and focused ``` **Key:** No required fields—use what helps your project. --- --- ## Before vs After: Progressive Disclosure

❌ Bloated (847 lines)

```markdown # AGENTS.md ## API Endpoints [200 lines of docs...] ## Testing Strategy [150 lines of docs...] ## Architecture [300 lines of docs...] ## Code Style [100 lines of rules...] ## Deployment [97 lines of docs...] ```

40% context consumed before work starts

✅ Lean (58 lines)

```markdown # AGENTS.md ## Quick Start pnpm install && pnpm dev ## Docs Reference | Doc | When to read | |-----|--------------| | docs/api.md | API work | | docs/testing.md | Tests | | docs/arch.md | Design | ```

Docs loaded on-demand when needed

--- --- ## The /learn Skill ```markdown # Learn from Conversation ## Phase 1: Deep Analysis - What patterns or approaches were discovered? - What gotchas or pitfalls were encountered? - What architecture decisions were made? ## Phase 2: Categorize & Locate Read existing docs to find the best home. ## Phase 3: Draft the Learning Format to match existing doc style. ## Phase 4: User Approval (BLOCKING) Present changes, wait for explicit approval. ## Phase 5: Save After approval, save the learning. ``` --- # Subagents --- ## Subagents in VS Code **How to invoke:** 1. Enable tools in Copilot Chat (hammer icon) 2. Call explicitly with `#runSubagent` 3. Or accept when Copilot suggests one --- ## Use Cases - Specialized searches (explore codebase, web, docs) - Long-running tasks (data analysis, refactoring) - TDD workflows (test generation, validation) - Multi-step processes (research, summarize, act) --- ## Explore Subagent Flow The main agent delegates file search to a specialized Explore subagent. --- ## Parallel Subagent Execution The fan-out/fan-in pattern: multiple subagents search in parallel and report back to the main agent. --- ## Subagent in nanocode: Fetch Tool ```shell nanocode | claude-opus-4-5 | /Users/alexanderopalic/Projects/typescript/nanocode ──────────────────────────────────────────────────────────────────────────────── ❯ how does fc bayern played yesterday ──────────────────────────────────────────────────────────────────────────────── [agentLoop] Starting with 1 messages [agentLoop] Got response, stop_reason: tool_use ⏺ Fetch(https://www.google.com/search?q=fc+bayern+munich+m) ⎿ I apologize, but I cannot access real-time sports results or ... +7 lines [agentLoop] Starting with 3 messages [agentLoop] Got response, stop_reason: tool_use ⏺ Fetch(https://www.espn.com/soccer/team/_/id/132/bayern-m) ⎿ Based on the information in the image, Bayern Munich's most ... +7 lines [agentLoop] Starting with 5 messages [agentLoop] Got response, stop_reason: end_turn ⏺ FC Bayern Munich lost 2-1 to FC Augsburg in the Bundesliga yesterday. ``` The **Fetch tool** uses a subagent to summarize HTML responses before returning. --- # Skills --- --- ## Real Skill: Plausible SEO Consultant ```shell .claude/skills/plausible-insights/ ├── skill.md # Skill definition + quick start ├── scripts/ # Automation scripts │ └── fetch-data.ts # Fetch Plausible data CLI └── references/ # On-demand docs (progressive disclosure) ├── quick-ref.md # Common query patterns ├── api/ │ ├── filters.md # Filter syntax │ └── errors.md # Error solutions └── seo/ └── thresholds.md # Interpretation guidelines ``` The agent reads `skill.md` first. Reference docs load only when needed. --- ## Skill in Action **User:** "Why is my bounce rate so high on the Vue posts?" 1. Description matches → skill.md loads (~500 tokens) 2. Agent runs: `bun cli top-pages --range 7d --pattern "/vue/"` 3. Agent reads `references/seo/thresholds.md` for interpretation 4. Agent fetches actual pages with WebFetch 5. Returns specific fixes based on real content **Key:** Data shows symptoms. Content shows causes. --- # The Full Picture --- --- # Live Demo --- ## Prerequisites The demo uses `npx` (bundled with Node.js) and Python.
Install for your platform: **Mac (Homebrew):** ```bash brew install node python ``` **Windows (winget):** ```bash winget install OpenJS.NodeJS Python.Python.3.12 ``` **Or download from:** [nodejs.org](https://nodejs.org) | [python.org](https://python.org) **Verify:** ```bash node --version && npx --version && python --version ``` --- ## Demo: Building a Skill 1. **Enable Skills** in VS Code settings 2. **Install skill-creator** via CLI 3. **Prompt** to generate a new skill --- ## Step 1: Enable Skills **VS Code Setting:** ```json { "chat.useAgentSkills": true } ``` Or via UI: `Settings → Search "agent skills" → Enable` > Note: Still in preview — enable in VS Code Insiders for latest features. --- --- ## Step 2: Create a New Skill ```md --- name: hello description: 'use it every time the user writes alex' --- # Hello Skill if the user writes "alex", respond with "Hello, Alexander Opalic! How can I assist you today?" ``` --- ## Step 3: Install skill-creator ```bash npx skills add https://github.com/anthropics/skills --skill skill-creator ``` This adds the **skill-creator** skill to your project — a skill that helps you create new skills. **Project structure after install:** ``` my-project/ └── .github/ └── skills/ └── skill-creator/ └── SKILL.md ``` --- ```shell ◇ Source: https://github.com/anthropics/skills.git │ ◇ Repository cloned │ ◇ Found 17 skills (via Well-known Agent Skill Discovery) │ ● Selected 1 skill: skill-creator │ ◇ Detected 3 agents │ ◇ Install to │ All agents (Recommended) │ ◇ Installation scope │ Project │ ◇ Installation method │ Symlink (Recommended) │ ◇ Installation Summary ──────────────────────────────╮ │ │ │ ~/Projects/workshop/.agents/skills/skill-creator │ │ symlink → Claude Code, GitHub Copilot, OpenCode │ │ │ ├─────────────────────────────────────────────────────╯ │ ◆ Proceed with installation? │ ● Yes / ○ No └ ``` --- ## Step 4: Generate a New Skill Important: Skill name and folder name must match! **Prompt:** ``` Create a skill that will use https://alexop.dev/llms.txt and will answer any question regarding Vue or AI. The skill should fetch the content and use the #runSubagent command. The subagent should do the heavy work and then report back to the main agent. name of the skill is vue-ai-assistant ``` → **skill-creator generates the SKILL.md for us** --- ## What Gets Generated ```markdown --- name: vue-ai-assistant description: Answer questions about Vue.js, Nuxt, and AI topics using Alexander Opalic's knowledge base. Use this skill when the user asks about Vue, Vue 3, Nuxt, Nuxt 3, Composition API, Vue Router, Pinia, Vite, AI, machine learning, LLMs, or related frontend/AI topics. Triggers on questions like "how do I use Vue", "explain Nuxt", "what's new in Vue 3", "AI agent patterns", or any Vue/AI related query. --- # Vue & AI Assistant Answer questions about Vue.js ecosystem and AI topics by fetching knowledge from https://alexop.dev/llms.txt and delegating research to a subagent. ## MANDATORY Workflow **IMPORTANT: You MUST follow ALL steps below. Do NOT skip the subagent step. Do NOT answer directly after fetching - you MUST delegate to a subagent.** 1. **Fetch the knowledge base**: Use `fetch_webpage` to retrieve content from `https://alexop.dev/llms.txt` 2. **REQUIRED - Delegate to subagent**: Use `runSubagent` with the fetched content and user's question. **This step is NOT optional.** 3. **Return the answer**: Present the subagent's findings to the user ## Implementation **You MUST execute ALL steps below.
Skipping the subagent is a violation of this skill's requirements.** ### Step 1: Fetch Knowledge Base Use the fetch_webpage tool: - URL: `https://alexop.dev/llms.txt` - Query: The user's question about Vue or AI ### Step 2: Run Subagent with Context (MANDATORY) **You MUST call `runSubagent` - do NOT answer the question yourself. The subagent handles the analysis and response.** Use `runSubagent` with a detailed prompt containing: 1. The fetched content from llms.txt as the knowledge base 2. The user's original question 3. Instructions to: - Analyze the knowledge base content thoroughly - Find relevant information to answer the question - Provide a clear, concise, and accurate answer - Include code examples when relevant - Cite specific sections from the knowledge base if applicable - If the knowledge base doesn't contain the answer, use general knowledge but note this Example subagent prompt: You are a Vue.js and AI expert. Answer the following question using the provided knowledge base content. KNOWLEDGE BASE CONTENT: fetched_content USER QUESTION: user_question Analyze thoroughly, provide code examples when relevant, and cite sources from the knowledge base. ### Step 3: Present Answer Return the subagent's response to the user, formatted appropriately with code blocks and explanations. ## Example **User asks**: "How do I use composables in Vue 3?" **Execution**: 1. Fetch https://alexop.dev/llms.txt 2. **MUST** call runSubagent with the content and question (do NOT skip this) 3. Return the subagent's comprehensive answer about Vue 3 composables ``` --- --- ## Bonus: The askQuestions Tool VS Code Copilot can **ask clarifying questions** mid-task. ```md help me to create a workout tracking app use the #askQuestions tool to find out how the tech specs should be ``` --- ```shell ┌─────────────────────────────────────────────────────────────┐ │ Platform (1/4) │ ├─────────────────────────────────────────────────────────────┤ │ What platform should the workout tracking app target? │ ├─────────────────────────────────────────────────────────────┤ │ ★ Web App Browser-based PWA, accessible anywhere [✓] │ ├─────────────────────────────────────────────────────────────┤ │ iOS Native Swift/SwiftUI for iPhone │ ├─────────────────────────────────────────────────────────────┤ │ Android Native Kotlin for Android devices │ ├─────────────────────────────────────────────────────────────┤ │ Cross-Platform React Native or Flutter for iOS & Android │ ├─────────────────────────────────────────────────────────────┤ │ Desktop Electron app for Mac/Windows │ ├─────────────────────────────────────────────────────────────┤ │ ✎ Other... Enter custom answer │ └─────────────────────────────────────────────────────────────┘ ``` --- ## Subagent Fan-Out Pattern **Prompt for VS Code Insiders:** ``` #runSubagent run 3 subagents that search the web and tell me something interesting about Geretsried ``` This demonstrates the **fan-out/fan-in pattern** where multiple agents work in parallel. --- ## Live Action: Excalidraw Skill **Install the skill:** ```bash npx skills add https://github.com/softaworks/agent-toolkit --skill excalidraw ``` Install the Excalidraw Extension in VS Code for best experience. 
**Prompt to customize with brand colors:** ``` Update the excalidraw skill to use these brand colors: - Fill: rgb(33, 39, 55) - Text: rgb(234, 237, 243) - Accent: rgb(255, 107, 237) - Card: rgb(52, 63, 96) - Card Muted: rgb(138, 51, 123) - Border: rgb(171, 75, 153) ``` → Agent modifies the skill's SKILL.md to include color instructions --- --- ## More Community Skills ```bash npx skills add https://github.com/anthropics/skills --skill frontend-design npx skills add https://github.com/simonwong/agent-skills --skill code-simplifier ``` - **frontend-design** — creates polished, production-grade UI components - **code-simplifier** — simplifies and refines code for clarity Browse and discover skills at [agentskills.io](https://agentskills.io/) --- # Key Takeaways --- ## Key Takeaways 1. **Agents = LLM + Tools + Loop** (nanocode shows this simply) 2. **Context is finite** — treat tokens as budget 3. **AGENTS.md** — standardized project context 4. **Subagents** — specialized agents for complex tasks 5. **Skills** — portable workflows that load on demand --- # Thank You! Questions? --- # Resources --- ## Resources - [VS Code: Using Agents](https://code.visualstudio.com/docs/copilot/agents/overview) - Agent types and session management - [Anthropic: Effective Context Engineering](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) - Context engineering guide - [VS Code: Introducing Agent Skills](https://www.youtube.com/watch?v=JepVi1tBNEE) - Agent Skills deep dive - [VS Code: Context Engineering Guide](https://code.visualstudio.com/docs/copilot/guides/context-engineering-guide) - Microsoft's context engineering workflow - [AGENTS.md](https://agents.md/) - Open standard for agent documentation - [Agent Skills Spec](https://agentskills.io/) - Open standard for portable agent skills - [nanocode](https://github.com/alexanderop/nanocode) - Minimal agent implementation in TypeScript - [Writing a Good CLAUDE.md](https://www.humanlayer.dev/blog/writing-a-good-claude-md) - Best practices for agent documentation - [Plausible SEO Skill](https://github.com/alexanderop/claude-plausible-analytics) - Skills deep dive with Plausible example - [Don't Waste Your Back Pressure](https://banay.me/dont-waste-your-backpressure/) - Why automated feedback loops make agents more effective - [Workshop Solution](https://github.com/alexanderop/workshop) - Complete code examples from this workshop - [Learn Prompt](https://alexop.dev/prompts/claude/claude-learn-command/) - Skill that helps agents learn from conversations --- --- --- title: What's New in VS Code Copilot: January 2026 Update description: Major updates to VS Code Copilot including parallel subagent execution, a new skills system, deeper Claude integration with extended thinking, terminal improvements with kitty keyboard protocol, and instruction files that now work everywhere. tags: ['ai', 'tooling', 'vscode'] url: https://alexop.dev/posts/whats-new-vscode-copilot-january-2026 --- # What's New in VS Code Copilot: January 2026 Update The past week has brought a wave of updates to VS Code's Copilot experience, with major improvements to how agents work together, a new skills system, deeper Claude integration, and significant terminal enhancements. Here's what you need to know—with concrete examples you can try today. For those who want to dive deeper into the implementation details, I've included links to the relevant GitHub pull requests and issues throughout this post. 
--- ## Subagents Get Smarter (and Faster) Two significant changes make subagents far more practical for complex workflows. **Related:** [Issue #274630 - Parallel subagent execution](https://github.com/microsoft/vscode/issues/274630) ### Parallel Execution Previously, if you kicked off multiple `runSubagent` calls, they'd run one after another. Now they can run simultaneously when tasks are independent, dramatically reducing wait times for research and code review operations. **Example prompt:** ``` Research the best approaches for: 1. Rate limiting in our REST API 2. Caching strategies for our database queries 3. Error handling patterns for our microservices Use a subagent for each topic and compile the findings. ``` With parallel execution, all three research subagents run concurrently instead of sequentially—cutting total wait time significantly.
### Fine-Grained Tool Access You can now constrain which tools a subagent can access. This is critical for safety-conscious workflows where you want AI help without the risk of unintended changes. **Creating a custom agent with restricted tools:** Create a file at `.github/agents/github-researcher.md`: ```markdown --- name: github-researcher description: Research agent with access to GitHub. Use for searching issues, reading documentation, and gathering information. Cannot edit files. tools: ['read', 'search', 'web', 'github/*'] argument-hint: The research task to complete --- You are a research assistant with read-only access to the codebase and GitHub. Your capabilities: - Search and read files in the repository - Search GitHub issues and pull requests - Fetch web documentation You cannot: - Edit or create files - Run terminal commands - Make commits When researching, provide citations and links to sources. ``` Now you can ask: *"Use a subagent to find all issues assigned to me about authentication and summarize them"* — and the subagent will be limited to read-only operations. If you've used Claude Code's subagent system, you'll recognize this pattern—it's similar to how Claude Code handles skills and subagents with tool restrictions. ### Control Subagent Availability Use the `infer` attribute to control whether an agent can be used as a subagent: ```markdown --- name: dangerous-deployer description: Handles production deployments tools: ['execute', 'edit', 'read'] infer: false # This agent cannot be auto-invoked as a subagent --- ``` --- ## Skills Are Now a First-Class Feature Skills are now **enabled by default** for all users. They're folders containing instructions and resources that Copilot loads on-demand when relevant to your task. **Related PRs:** - [Issue #286237 - Custom agent improvements](https://github.com/microsoft/vscode/issues/286237) - [Issue #286238 - Skill lookup enhancements](https://github.com/microsoft/vscode/issues/286238) - [PR #3082 - Implement agent using CustomAgentProvider API](https://github.com/microsoft/vscode-copilot-chat/pull/3082) ### Creating Your First Skill Create a directory structure: **`SKILL.md`:** ```markdown --- name: webapp-testing description: Guide for testing web applications using Playwright. Use this when asked to create or run browser-based tests. --- # Web Application Testing with Playwright When creating tests for this project, follow these patterns: ## Test Structure - Use `describe` blocks for feature groupings - Use `test` for individual test cases - Always include setup and teardown ## Assertions - Prefer `toBeVisible()` over `toHaveCount(1)` - Use `waitFor` for async operations - Include accessibility checks ## Example Template Reference the [test template](./test-template.js) for the standard structure. ## Naming Convention - Test files: `*.spec.ts` - Test descriptions: "should [expected behavior] when [condition]" ``` Now when you ask *"Write Playwright tests for the login form"*, Copilot automatically loads this skill and follows your project's testing conventions. 
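The SKILL.md above references a `test-template.js` file, so the skill folder holds both. A minimal sketch of the layout, assuming the default `.github/skills` location (the folder and file names are illustrative, matching the example above):

```
.github/
└── skills/
    └── webapp-testing/
        ├── SKILL.md
        └── test-template.js
```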
### Loading Skills from Custom Locations For teams sharing skills across repos, use the new setting: ```json { "chat.agentSkillsLocations": [ ".github/skills", "~/shared-skills", "/team/copilot-skills" ] } ``` ### Extension-Contributed Skills Extensions can now contribute skills via their `package.json`: ```json { "contributes": { "copilotSkills": [ { "name": "docker-compose", "description": "Helps create and debug Docker Compose configurations", "path": "./skills/docker-compose" } ] } } ``` Or dynamically via the new API: ```typescript vscode.chat.registerSkill({ name: 'dynamic-skill', description: 'A skill registered at runtime', async getInstructions(context) { // Return context-aware instructions return generateInstructionsFor(context.workspace); } }); ``` --- ## Instruction Files Work Everywhere Instruction files now apply to **non-coding tasks** like code exploration, architecture explanation, and documentation. [#287152](https://github.com/microsoft/vscode/issues/287152) **Before:** Your `.github/copilot-instructions.md` was ignored when you asked *"Explain how authentication works in this codebase"* **After:** Those instructions are now read for all codebase-related work. This aligns with the progressive disclosure approach where context is loaded on-demand rather than crammed into a single file. **Example `copilot-instructions.md`:** ```markdown # Project Context This is a microservices architecture with: - API Gateway (Node.js/Express) - Auth Service (Go) - User Service (Python/FastAPI) - Shared message queue (RabbitMQ) When explaining code: - Always mention which service a file belongs to - Reference the architecture diagram at docs/architecture.md - Note any cross-service dependencies ``` Now *"How does user registration work?"* will include this context automatically. --- ## Claude Code Gets Extended Thinking The Claude Code integration now supports **extended thinking**, showing Claude's chain-of-thought reasoning in a collapsible section. [#287658](https://github.com/microsoft/vscode/issues/287658) **Related:** [Issue #266962 - Claude agent support](https://github.com/microsoft/vscode/issues/266962), [#287933 - Model picker support](https://github.com/microsoft/vscode/issues/287933) ### What It Looks Like When you ask Claude to solve a complex problem, you'll see: ``` ▼ Thinking... Let me analyze the codebase structure first. I see there are three main modules: auth, api, and database. The user is asking about the authentication flow, so I should trace the request from the API gateway through to the auth service... The JWT validation happens in middleware/auth.ts, but the token generation is in services/auth/token.go. I need to explain how these connect via the shared Redis cache... Here's how authentication works in your codebase: [Final response] ``` ### Configuration Enable/disable thinking display in settings: ```json { "github.copilot.chat.claude.showThinking": true } ``` ### Model Picker You can now select which Claude model to use: 1. Open the Chat view 2. Click the model selector dropdown 3. Choose from available Claude models (Sonnet, Opus, etc.) Different models offer different speed/capability tradeoffs—use faster models for simple tasks, more capable models for complex reasoning. --- ## Terminal Gets Major Upgrades The integrated terminal received significant keyboard handling improvements this release, with two new protocol implementations. 
**Related PRs:** - [PR #286897 - xterm.js 6.1.0 with kitty keyboard and win32-input-mode](https://github.com/microsoft/vscode/pull/286897) - [Issue #286809 - Kitty keyboard protocol support](https://github.com/microsoft/vscode/issues/286809) - [Issue #286896 - Win32 input mode support](https://github.com/microsoft/vscode/issues/286896) - [xterm.js PR #5600 - Implement kitty keyboard protocol](https://github.com/xtermjs/xterm.js/pull/5600) (upstream) ### Kitty Keyboard Protocol (CSI u) VS Code's terminal now supports the [kitty keyboard protocol](https://sw.kovidgoyal.net/kitty/keyboard-protocol/), enabling more sophisticated keyboard input handling. This unlocks previously unavailable key combinations and provides better support for terminal applications that use this modern standard. **Important:** This feature is **disabled by default** as it's experimental. Enable it in settings: ```json { "terminal.integrated.enableKittyKeyboardProtocol": true } ``` The protocol improves handling of modifiers, key events, repeat detection, and escape sequences—particularly useful if you use tools like fish shell, neovim, or other terminal applications that support CSI u. ### Win32 Input Mode For Windows users, the terminal now supports win32-input-mode, improving keyboard handling compatibility with Windows console applications. VT sequences alone can't send everything that Windows console programs expect (encoded as win32 INPUT_RECORDs), so this mode bridges that gap. **Also disabled by default.** Enable with: ```json { "terminal.integrated.enableWin32InputMode": true } ``` ### Terminal Command Output Streams Inline When using Copilot in agent mode, terminal command output now streams inline inside the Chat view instead of requiring you to switch to the terminal panel. [#257468](https://github.com/microsoft/vscode/issues/257468) The output auto-expands on command execution and collapses on success [#287664](https://github.com/microsoft/vscode/issues/287664)—keeping you focused on the conversation flow. ### Terminal Timeout Parameter The terminal tool now supports a timeout parameter to control how long commands run before timing out. [#286598](https://github.com/microsoft/vscode/issues/286598) This prevents unnecessary polling and gives you more control over long-running operations. ### Terminal Command Sandboxing Terminal command sandboxing is now available for **macOS and Linux** [#277286](https://github.com/microsoft/vscode/issues/277286), adding an extra layer of security when running commands through the terminal tool. ### Syntax Highlighting in Confirmation Dialogs The terminal tool now presents Python, Node.js, and Ruby commands with syntax highlighting in the confirmation dialog [#287772](https://github.com/microsoft/vscode/issues/287772), [#287773](https://github.com/microsoft/vscode/issues/287773), [#288360](https://github.com/microsoft/vscode/issues/288360)—making it easier to review commands before execution. 
### Expanded Auto-Approved Commands More commands are now automatically approved for execution: - `dir` in PowerShell [#288431](https://github.com/microsoft/vscode/issues/288431) - `sed -i` when editing files within the workspace [#288318](https://github.com/microsoft/vscode/issues/288318) - `od`, `xxd`, and safe `docker` commands [#287652](https://github.com/microsoft/vscode/issues/287652) ### SGR 221/222 Escape Sequences The terminal now supports SGR 221 and 222 escape sequences [#286810](https://github.com/microsoft/vscode/issues/286810), allowing independent control of bold and faint text attributes for more granular formatting. --- ## MCP Gets More Powerful Model Context Protocol continues to evolve with significant new capabilities. ### Dynamic Context Updates MCP apps now support model context update methods, enabling servers to update the context model dynamically. [#289473](https://github.com/microsoft/vscode/issues/289473) This means MCP servers can push new context to your chat sessions without requiring a refresh. ### Custom Package Registries Added support for `registryBaseUrl` in MCP packages [#287549](https://github.com/microsoft/vscode/issues/287549), allowing teams to use private package registries for their MCP servers. ### Built-in MCP Apps Support Built-in support for MCP Apps enables servers to provide custom UI for tool invocation. [#260218](https://github.com/microsoft/vscode/issues/260218) This opens the door for richer, more interactive MCP experiences beyond simple text-based tools. --- ## Quality of Life Improvements ### Codex Agent in Dropdown The OpenAI Codex agent now appears directly in the agents dropdown [#289040](https://github.com/microsoft/vscode/issues/289040) for quick access: ``` Agents ▼ ├── Local Agent ├── Background Agent ├── Cloud Agent └── Codex Agent ← New! ``` ### New MCP Server Command A new `workbench.mcp.startServer` command [#283959](https://github.com/microsoft/vscode/issues/283959) lets you programmatically start specific or all MCP servers to discover their tools. This is useful for automation scenarios where you need to ensure servers are running before invoking their tools. ### The `/clear` Command Archives Sessions The `/clear` command now archives the current session and starts a new one automatically [#285854](https://github.com/microsoft/vscode/issues/285854)—no more losing your chat history when you want a fresh start. ### New Local Chat Command A new "New Local Chat" command [#288467](https://github.com/microsoft/vscode/issues/288467) lets you start a local chat session quickly. ### Chat Session Imports You can now **import** a chat session directly into the Chat view [#283954](https://github.com/microsoft/vscode/issues/283954), instead of only being able to open it in a new editor tab. This makes it easier to continue conversations from exported sessions. ### Chat Session Exports with MCP Info Exported sessions now include MCP server configuration [#283945](https://github.com/microsoft/vscode/issues/283945): ```json { "session": { "messages": [...], "mcpServers": [ { "name": "github", "url": "https://mcp.github.com", "tools": ["search_issues", "get_pr", "list_repos"] } ] } } ``` This makes sessions reproducible—share them with teammates and they can recreate your exact setup. 
### Multi-Select in Sessions View Select multiple chat sessions with `Cmd/Ctrl+Click` [#288448](https://github.com/microsoft/vscode/issues/288448): - Archive all selected - Mark all as read - Batch delete Additional session management improvements include "Mark All Read", "Archive All", and "Unarchive All" actions in context menus [#288147](https://github.com/microsoft/vscode/issues/288147), and increased locally persisted chat sessions [#283123](https://github.com/microsoft/vscode/issues/283123). ### Resizable Sessions Sidebar You can now resize the sessions sidebar in the Chat view by dragging the separator [#281258](https://github.com/microsoft/vscode/issues/281258), similar to how terminal tabs work. ### Extension Context Tooltips Hover over extension-contributed context items to see additional information about what they provide. [#280658](https://github.com/microsoft/vscode/issues/280658) ### Accessible View Streams Thinking Content The Accessible View now dynamically streams thinking content [#289223](https://github.com/microsoft/vscode/issues/289223), making Claude's chain-of-thought reasoning accessible to screen reader users in real-time. ### Multi-Model Selection in Language Models Editor Select multiple models in the Language Models editor and toggle their visibility at once [#287511](https://github.com/microsoft/vscode/issues/287511). Enterprise and Business users also get access to the Manage Models action [#287814](https://github.com/microsoft/vscode/issues/287814). --- ## Editor & Language Improvements ### Improved Shebang Detection VS Code now recognizes Deno, Bun, and other modern JavaScript runtimes [#287819](https://github.com/microsoft/vscode/issues/287819) for better language detection when opening scripts. ### Better Ghost Text Visibility Improved visibility of ghost text in next edit suggestions [#284517](https://github.com/microsoft/vscode/issues/284517), making it easier to distinguish AI suggestions from regular text. ### Double-Click Selects Block Content Double-clicking immediately after a curly brace or bracket now selects the content inside it [#9123](https://github.com/microsoft/vscode/issues/9123)—a small but impactful change for manipulating code blocks. ### Match File Path Case Toggle A new "Match File Path Case" toggle in the Search view's "files to include" input [#10633](https://github.com/microsoft/vscode/issues/10633) lets you control whether file paths and glob patterns match case-sensitively. ### Bracket Match Foreground Color New `editorBracketMatch.foreground` theme color [#85775](https://github.com/microsoft/vscode/issues/85775) enables customization of matched bracket text color. ### Parallel Build Tasks Dependent build tasks can now run in parallel [#288439](https://github.com/microsoft/vscode/issues/288439), improving build performance for projects with multiple independent compilation steps. ### Git Delete File Command A new "Git: Delete File" command [#111767](https://github.com/microsoft/vscode/issues/111767) performs `git rm` on the current file directly from the command palette. --- ## Try It Today Here's a quick workflow to test the new features: 1. **Create a custom agent** at `.github/agents/researcher.md` with restricted tools 2. **Create a skill** at `.github/skills/my-skill/SKILL.md` 3. **Ask Copilot:** *"What skills and subagents do you have available?"* 4. **Test parallel execution:** *"Use subagents to research three different topics simultaneously"* 5. 
**Enable Claude thinking** and ask a complex architecture question --- ## Looking Ahead These updates signal a clear direction: Copilot is evolving from a single-agent assistant into a **coordinated multi-agent system**. The combination of parallel subagents, constrained tool access, and shareable skills creates a foundation for sophisticated automated workflows. If you're interested in building your own agent systems, check out Building Your Own Coding Agent from Scratch for a hands-on guide to the underlying patterns. Key settings to know: ```json { "chat.useAgentSkills": true, "chat.agentSkillsLocations": [".github/skills"], "chat.customAgentInSubagent.enabled": true, "github.copilot.chat.claude.showThinking": true, "terminal.integrated.enableKittyKeyboardProtocol": true, "terminal.integrated.enableWin32InputMode": true } ``` The ecosystem is about to get a lot more interesting. --- ## Related Pull Requests & Issues For those who want to dig into the implementation details: ### Agent & Skills - [#274630 - Parallel subagent execution](https://github.com/microsoft/vscode/issues/274630) - [#280704 - Agents define allowed subagents](https://github.com/microsoft/vscode/issues/280704) - [#288480 - Skills enabled by default](https://github.com/microsoft/vscode/issues/288480) - [#288483 - Extension-contributed skills via manifest](https://github.com/microsoft/vscode/issues/288483) - [#288486 - Dynamic skills API](https://github.com/microsoft/vscode/issues/288486) - [#282738 - Skills from custom locations](https://github.com/microsoft/vscode/issues/282738) ### Claude Integration - [#287658 - Extended thinking support](https://github.com/microsoft/vscode/issues/287658) - [#287933 - Model picker for Claude](https://github.com/microsoft/vscode/issues/287933) - [#266962 - Claude agent support](https://github.com/microsoft/vscode/issues/266962) ### Terminal - [#286809 - Kitty keyboard protocol](https://github.com/microsoft/vscode/issues/286809) - [#286896 - Win32 input mode](https://github.com/microsoft/vscode/issues/286896) - [#286810 - SGR 221/222 escape sequences](https://github.com/microsoft/vscode/issues/286810) - [#257468 - Terminal output streams inline](https://github.com/microsoft/vscode/issues/257468) - [#287664 - Auto-expand/collapse terminal output](https://github.com/microsoft/vscode/issues/287664) - [#277286 - Terminal sandboxing for macOS/Linux](https://github.com/microsoft/vscode/issues/277286) - [#286598 - Terminal timeout parameter](https://github.com/microsoft/vscode/issues/286598) - [#287772 - Python syntax highlighting in confirmations](https://github.com/microsoft/vscode/issues/287772) - [xterm.js #5600 - Kitty keyboard protocol](https://github.com/xtermjs/xterm.js/pull/5600) ### MCP - [#289473 - Dynamic context updates](https://github.com/microsoft/vscode/issues/289473) - [#287549 - Custom package registries](https://github.com/microsoft/vscode/issues/287549) - [#260218 - Built-in MCP Apps](https://github.com/microsoft/vscode/issues/260218) - [#283959 - startServer command](https://github.com/microsoft/vscode/issues/283959) - [#283945 - MCP info in session exports](https://github.com/microsoft/vscode/issues/283945) ### Chat & Sessions - [#285854 - /clear archives sessions](https://github.com/microsoft/vscode/issues/285854) - [#288467 - New Local Chat command](https://github.com/microsoft/vscode/issues/288467) - [#283954 - Import chat sessions](https://github.com/microsoft/vscode/issues/283954) - [#288448 - Multi-select in 
sessions](https://github.com/microsoft/vscode/issues/288448) - [#281258 - Resizable sessions sidebar](https://github.com/microsoft/vscode/issues/281258) - [#283123 - Increased persisted sessions](https://github.com/microsoft/vscode/issues/283123) - [#289223 - Accessible View streams thinking](https://github.com/microsoft/vscode/issues/289223) ### Editor & Other - [#287819 - Improved shebang detection](https://github.com/microsoft/vscode/issues/287819) - [#284517 - Ghost text visibility](https://github.com/microsoft/vscode/issues/284517) - [#9123 - Double-click selects block content](https://github.com/microsoft/vscode/issues/9123) - [#10633 - Match file path case toggle](https://github.com/microsoft/vscode/issues/10633) - [#288439 - Parallel build tasks](https://github.com/microsoft/vscode/issues/288439) - [#111767 - Git Delete File command](https://github.com/microsoft/vscode/issues/111767) ### Iteration Plan - [#286040 - January 2026 Iteration Plan](https://github.com/microsoft/vscode/issues/286040) --- *These features are rolling out in VS Code Insiders (1.109) now, with stable release expected in early February. Note that some features like kitty keyboard protocol and win32-input-mode are disabled by default and require manual opt-in.* --- --- title: Presentation Mode: Turn Your Blog Posts into Slides description: A complete demo of presentation mode with v-click animations and drawing annotations. Press P to see keyboard navigation, incremental reveals, and press D to draw on slides! tags: ['demo', 'presentation', 'feature'] url: https://alexop.dev/posts/presentation-mode-demo --- # Presentation Mode: Turn Your Blog Posts into Slides Press **P** on your keyboard or click the floating button in the bottom-right corner to enter presentation mode. Use arrow keys to navigate between slides, and press **D** to draw annotations directly on slides! # Presentation Mode Turn your blog posts into beautiful slides with incremental reveals --- ## Why Presentation Mode? **The Problem:** You write a great blog post, then need to present it at a meetup. **Old Solution:** Recreate everything in PowerPoint or Google Slides. **New Solution:** Just add `presentation: true` to your frontmatter! ```yaml --- title: "My Awesome Post" presentation: true --- ``` --- ## Keyboard Shortcuts | Key | Action | |-----|--------| | **P** | Toggle presentation mode | | **→** or **Space** | Next click step, then next slide | | **←** | Previous click step, then previous slide | | **1-9** | Jump to slide N (resets clicks) | | **Home** | First slide | | **End** | Last slide | | **D** | Toggle drawing mode | | **G** | Toggle grid overview | | **Escape** | Exit drawing → grid → presentation | --- # Drawing Annotations Draw directly on slides with Excalidraw! --- ## Try Drawing Mode Press **D** to toggle drawing mode. A toolbar will appear with these tools: - **↖ Selection** - Select and move drawings - **✏️ Freedraw** - Freehand drawing (press P) - **→ Arrow** - Draw arrows (press A) - **□ Rectangle** - Draw rectangles (press R) - **○ Ellipse** - Draw circles (press O) - **T Text** - Add text annotations (press T) - **🧹 Eraser** - Erase drawings (press E) --- ## Drawing Features **Colors:** 6 preset colors - Red, Blue, Green, Yellow, White, Black **Stroke Widths:** 3 sizes - Thin (1px), Medium (2px), Thick (4px) **Persistence:** Drawings stay when you navigate between slides! 
**Shortcuts:** - `C` - Clear current slide - `Shift+C` - Clear all slides - `Escape` - Exit drawing mode --- ## Use Cases Why draw on slides during presentations? - Circle important code sections - Draw arrows connecting concepts - Add quick annotations for Q&A - Highlight key points in diagrams - Sketch ideas during discussions --- # V-Click Animations Reveal content step-by-step with Slidev-style click animations --- ## Sequential Reveals Press **→** to reveal each point: **Step 1:** First, we define the problem clearly. **Step 2:** Then, we explore possible solutions. **Step 3:** Finally, we implement and test! --- ## Building a List Benefits of incremental reveals: - Keeps audience focused on the current point - Creates natural pacing for your talk - Prevents information overload - Makes complex topics digestible --- ## V-Click Syntax In MDX files, use components to control reveals: ```mdx Content appears on click - Each list item - Gets its own click Explicit order (appears third) This disappears on click ``` --- # Custom Components Interactive React components work in slides! --- ## Context Window Visualizer This is a custom React component rendered inside a slide: Try typing messages to see the context fill up! --- ## Code Blocks Work Perfectly Here's a Vue composable example with full syntax highlighting: ```typescript export function useCounter(initial = 0) { const count = ref(initial) const double = computed(() => count.value * 2) function increment() { count.value++ } return { count, double, increment } } ``` --- ## Magic Move: Code Evolution Watch code transform with smooth animations (press **→** to advance): Each arrow press animates the code to its next state! --- ## Mermaid Diagrams Flowcharts render beautifully in slides: ```mermaid graph LR A[Blog Post] --> B{presentation: true?} B -->|Yes| C[Show Toggle Button] B -->|No| D[Normal Post] C --> E[Press P] E --> F[Fullscreen Slides!] ``` --- ## Animated Diagrams Interactive diagrams with self-contained animations: Click **Start** to see the Explore subagent flow animation! --- # Slide Layouts --- ## Available Layouts This feature supports 9 different layout types: | Layout | Description | |--------|-------------| | `default` | Standard centered prose | | `cover` | Large title, full-bleed | | `center` | Fully centered content | | `two-cols` | Two-column split | | `image-left` | Image 40%, content 60% | | `image-right` | Content 60%, image 40% | | `image` | Full-bleed background | | `quote` | Prominent blockquote | | `section` | Section divider | | `iframe` | Embedded website/demo | --- ## Iframe Layout Embed live demos, CodePen, StackBlitz, or videos directly in slides: This is a live StackBlitz embed - fully interactive! --- ## Layout Syntax In **.md files**, use HTML comments. 
In **.mdx files**, use components: ```mdx {/* MDX format */} # My Title --- {/* .md format */} # My Title ``` --- ## Layout Properties All layouts accept these properties: | Property | Description | |----------|-------------| | `layout` | Layout name (required) | | `image` | Path to image in /public | | `backgroundSize` | CSS value (default: cover) | | `class` | Custom CSS class | | `src` | URL for iframe layout | | `title` | Accessibility title for iframe | --- ## Layout Animations Each layout type has its own animation: | Layout | Animation | |--------|-----------| | `default` | Slide | | `center` | Slide | | `two-cols` | Slide | | `cover` | **Fade** | | `image` | **Zoom** | | `quote` | **Fade** | | `section` | **Fade** | | `iframe` | **Fade** | --- # Technical Details --- ## Architecture Overview ```mermaid flowchart TB subgraph Frontend A[PresentationToggle] --> B[PresentationMode] B --> C[PresentationSlide] B --> D[PresentationControls] B --> E[PresentationProgress] end subgraph Layouts C --> F[SlideLayoutDefault] C --> G[SlideLayoutCover] C --> H[SlideLayoutTwoCols] C --> I[... more layouts] end subgraph Content J[Blog Post] --> K[Compiled HTML] K --> L[Parse slide comments] L --> M[SlideData Array] end M --> C ``` --- ## Tips for Great Slides ### Content Length - Short slides work best - If content overflows, it scrolls within the slide - But try to keep each slide digestible - Use v-clicks to break up longer content --- ## Accessibility Features - **Focus trap** - Tab stays within the modal - **ARIA live region** - Announces "Slide X of Y, Step N of M" - **Escape to exit** - Standard modal behavior - **Keyboard navigation** - No mouse required - **Theme aware** - Respects light/dark mode - **Reduced motion** - V-click respects `prefers-reduced-motion` --- ## What's NOT Included (Yet) Future enhancements could include: - Presenter notes (separate view) - Export to PDF - Speaker timer - Touch/swipe gestures - Dual-screen presenter view - Laser pointer mode - Drawing export to PNG/SVG Want these features? Open an issue! --- # Thank You! Press **Escape** to exit or continue with arrow keys --- ## Quick Reference **File format trade-offs:** | Format | Custom Components | Layouts | V-Click | Syntax | |--------|-------------------|---------|---------|--------| | `.mdx` | Yes | Yes | Yes | Components | | `.md` | No | Yes | Yes | HTML comments | Press **Escape** to exit presentation mode! --- --- title: Stop Bloating Your CLAUDE.md: Progressive Disclosure for AI Coding Tools description: AI coding tools are stateless—every session starts fresh. The solution isn't cramming everything into CLAUDE.md, but building a layered context system where learnings accumulate in docs and specialized agents load on-demand. tags: ['claude-code', 'ai-tools', 'developer-experience', 'productivity'] url: https://alexop.dev/posts/stop-bloating-your-claude-md-progressive-disclosure-ai-coding-tools --- # Stop Bloating Your CLAUDE.md: Progressive Disclosure for AI Coding Tools Yesterday I spent an hour debugging a Nuxt Content gotcha with Claude. We figured it out together—you need to use `stem` instead of `slug` in page collection queries. Today? Claude made the same mistake. Yesterday's session was gone. The examples in this post come from my [Second Brain](https://second-brain-nuxt.vercel.app/)—a personal wiki built with Nuxt and Nuxt Content that uses Zettelkasten-style wiki-links for knowledge management. 
You can see the actual [CLAUDE.md file](https://github.com/alexanderop/second-brain-nuxt/blob/main/CLAUDE.md) on GitHub. That's the constraint. **Your context is just an array of tokens**—a sliding window that forgets everything the moment the conversation ends.[^1] The percentages shown in these visualizations are illustrative examples—not real measurements. Actual system prompt overhead varies by tool version and configuration. The key insight is the relative proportions, not the exact numbers. There's no hidden memory. No database of past conversations. Just this array, rebuilt fresh every session. Dex Horthy calls this "context engineering"—since LLMs are stateless, the only way to improve output is optimizing input.[^6] The array is all you have. Everything outside it doesn't exist to the model. But that array has a size limit. Fill it with noise, and you're working in what Dex calls the "dumb zone"—where performance degrades because irrelevant context competes for attention. Most developers respond to this by putting every lesson learned into their `CLAUDE.md` file. I've seen files balloon to 2000 lines. Style guides, architectural decisions, war stories from that one bug that took three days to fix. This makes things worse. ## Bloated CLAUDE.md Makes Things Worse When Claude makes a mistake, the instinct is to add a rule: "Never use `slug` in page collection queries—use `stem` instead." Then another mistake, another rule. Then another. Before long, your CLAUDE.md looks like this: ```markdown # CLAUDE.md ## Project Overview ...50 lines... ## Code Style ...200 lines of formatting rules... ## Architecture Decisions ...150 lines of historical context... ## Gotchas ...300 lines of edge cases... ## Testing Conventions ...100 lines... ``` **Half your context budget is gone before any work begins.** HumanLayer keeps their CLAUDE.md under 60 lines.[^2] Frontier LLMs reliably follow 150-200 instructions—and Claude Code's system prompt already uses about 50 of those.[^2] The math doesn't work. You can't stuff everything in one file. ## Stop Writing Prose About Lint Rules Why write two hundred lines about code style when one line handles it? I stopped putting anything a tool can enforce in CLAUDE.md. ❌ **Don't write prose about style rules:** ```markdown ## Code Style - Use 2-space indentation - Prefer single quotes - Always add trailing commas - Maximum line length: 100 characters ``` ✅ **Let ESLint handle it:** ```json { "extends": ["@nuxt/eslint-config"] } ``` The rules are already there—you just don't repeat them in prose: ```js // What @nuxt/eslint-config contains: { rules: { 'indent': ['error', 2], 'quotes': ['error', 'single'], 'comma-dangle': ['error', 'always-multiline'], 'max-len': ['error', { code: 100 }] } } ``` The AI can run `pnpm lint:fix && pnpm typecheck` and know immediately if it violated a rule. No interpretation needed. No ambiguity. **If a tool can enforce it, don't write prose about it.** ESLint for style. TypeScript for types. Prettier for formatting. These rules are verifiable, not interpretable. Moss calls this *backpressure*—automated feedback mechanisms that let agents self-correct.[^7] Without a linter, you waste your time typing messages like "you forgot to add the import" or "that should be a const, not let." With backpressure, the agent runs the build, reads the error, and fixes itself. You remove yourself from trivial corrections and focus on higher-level decisions. My CLAUDE.md now just says: ```markdown Run `pnpm lint:fix && pnpm typecheck` after code changes. 
``` One line instead of two hundred. Or skip it entirely—use husky to run checks automatically on commit. This is especially useful for techniques like Ralph, where AI works autonomously through a queue of tasks.[^8] ## The Gotchas ESLint Won't Catch ESLint won't catch this: > "Nuxt Content v3 caches aggressively in `.data/`. When you modify transformation logic in hooks, you must clear the cache to test changes." Or this: > "Wiki-links to data collections require path prefixes. Use `[[authors/john-doe]]`, not `[[john-doe]]`." These are *gotchas*—non-obvious behaviors that bite you once. The kind of thing you'd tell a new team member on their first day. They need documentation, but they don't belong in CLAUDE.md. **The insight: CLAUDE.md is for universal context. Gotchas are situational.** You don't need the wiki-link prefix rule in every conversation—only when you're writing content with author links. Loading it every time wastes tokens. So where do these gotchas go? And how do you capture them without breaking your flow? ## My /learn Skill My system: when I notice Claude struggling with something we've solved before, I run `/learn`. This is a Claude Code skill I built ([see full prompt](/prompts/claude/claude-learn-command)). It: 1. Analyzes the conversation for reusable, non-obvious insights 2. Finds the right place in `/docs` to save it (or proposes a new file) 3. Asks for my approval before saving I end up with a growing knowledge base in my docs folder: ``` docs/ ├── nuxt-content-gotchas.md # 15 hard-won lessons ├── nuxt-component-gotchas.md # Vue-specific pitfalls ├── testing-strategy.md # When to use which test type └── SYSTEM_KNOWLEDGE_MAP.md # Architecture overview ``` **CLAUDE.md stays stable.** It just tells Claude where to look: ```markdown ## Further Reading **IMPORTANT:** Before starting any task, identify which docs below are relevant and read them first. Load the full context before making changes. - `docs/nuxt-content-gotchas.md` - Nuxt Content v3 pitfalls - `docs/testing-strategy.md` - Test layers and when to use each ``` The **IMPORTANT** instruction is critical—without it, Claude won't automatically read these docs. With it, Claude identifies relevant docs before starting work: content queries trigger the gotchas doc, testing tasks trigger the testing strategy. Progressive disclosure—the right context at the right time.[^2] Another approach: build skills that load domain-specific gotchas automatically. A `nuxt-content` skill that injects the gotchas doc whenever you're working with content queries. In theory, this is cleaner—context loads without you thinking about it. In practice, I've found skills don't always activate when expected. The trigger conditions can be fuzzy, and sometimes Claude just doesn't invoke them. Vercel's agent evals confirmed this: skills were never invoked in 56% of their test cases, producing zero improvement over baseline.[^9] The docs-based setup is more predictable: I know Claude will read what I point it to. ## One Agent Per Domain I take this further with custom agents. Each agent has its own documentation file that loads only when needed. If you're new to how these customization layers work together, I wrote a detailed comparison of CLAUDE.md, skills, and subagents.
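As a rough sketch, one of these agent files is plain markdown with YAML frontmatter (a name and a trigger description, optionally a tool list), followed by the instructions and doc pointers. The wording below is illustrative, not my production agent:

```markdown
---
name: nuxt-content-specialist
description: Handles Nuxt Content v3 collection queries, MDC, and search. Use for any task touching content collections.
---

Before making changes, read `docs/nuxt-content-gotchas.md` and fetch the
latest Nuxt Content documentation from the official docs site instead of
relying on training data.
```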
``` .claude/agents/ ├── nuxt-content-specialist.md # Content queries, MDC, search ├── nuxt-ui-specialist.md # Component styling, theming ├── vue-specialist.md # Reactivity, composables └── nuxt-specialist.md # Routing, config, deployment ``` When I'm debugging a content query, Claude loads the nuxt-content-specialist. When I'm styling a component, it loads nuxt-ui-specialist. The specialist agents know to fetch the latest documentation from official sources—they don't rely on stale training data. This is why I don't use MCPs like context7 for documentation. Agents can fetch llms.txt directly from official docs sites and find what they need. No tool definition bloat, no intermediate tokens—just a focused research task in its own context window. I wrote more about why I use custom research agents instead of MCPs. Skills work similarly—with `context:fork`, they run in isolated contexts without polluting your main conversation. The agent has both the ability and motivation to read real documentation. No context7, no MCP overhead. ## It Compounds This system creates a feedback loop: Over time, my `/docs` folder becomes a curated knowledge base of *exactly the things AI coding tools get wrong* in my codebase. It's like fine-tuning, but under my control. I got this idea from a pattern for self-improving skills where agents automatically analyze sessions and update themselves.[^5] I adapted it to use markdown documentation and a `/learn` command instead—giving me explicit control over what gets captured and where it goes. An actual entry from my `nuxt-content-gotchas.md`: ```markdown ## Page Collection Queries: Use `stem` Not `slug` The `slug` field doesn't exist in page-type collections. Use `stem` (file path without extension) instead: // ❌ Fails: "no such column: slug" queryCollection('content').select('slug', 'title').all() // ✅ Works queryCollection('content').select('stem', 'title').all() ``` Claude will never make this mistake again in my project. Not because I added it to CLAUDE.md—but because when it's working with content queries, it reads the gotchas doc first. ## My 50-Line CLAUDE.md The structure: ```markdown # CLAUDE.md Second Brain is a personal knowledge base using Zettelkasten-style wiki-links. ## Commands pnpm dev # Start dev server pnpm lint:fix # Auto-fix linting issues pnpm typecheck # Verify type safety Run `pnpm lint:fix && pnpm typecheck` after code changes. ## Stack - Nuxt 4, @nuxt/content v3, @nuxt/ui v3 ## Structure - `app/` - Vue application - `content/` - Markdown files - `content.config.ts` - Collection schemas ## Further Reading **IMPORTANT:** Read relevant docs below before starting any task. - `docs/nuxt-content-gotchas.md` - `docs/testing-strategy.md` - `docs/SYSTEM_KNOWLEDGE_MAP.md` ``` That's it. Universal context only. Everything else lives in docs, agents, or tooling. ## Cross-Tool Compatibility If you use multiple AI coding tools, you don't need separate config files. VS Code Copilot and Cursor both support `agents.md` for project-level instructions. You can symlink it to share the same configuration: ```bash # Create a symlink so all tools read the same file ln -s CLAUDE.md agents.md ``` Now your minimal, focused instructions work across Claude Code, Copilot, and Cursor. One source of truth, no drift between tools. ## How This Played Out Last Week Last week I was implementing semantic search. When Claude started working on content queries, it read `nuxt-content-gotchas.md` first—as my CLAUDE.md instructs. The stem/slug gotcha was already there. No mistake. 
No correction needed. But during the session, we discovered something new: `queryCollectionSearchSections` returns IDs with a leading slash. Don't add another slash when constructing URLs. I ran `/learn`. Claude proposed: ```markdown ## Search Section IDs Returns IDs with leading slash (`/slug#section`). Don't add another slash when constructing URLs. ``` Added. Next time I work on search, Claude will know. --- AI tools being stateless isn't a bug to fight. It's a design constraint—like limited screen real estate or slow network connections. Accept it, and you can build systems that work with it. **Keep CLAUDE.md minimal. Let tooling enforce what it can. Capture learnings as you go. Load context on demand.** One caveat: you can never be 100% sure agents will read your docs when they face issues. For tricky domains like Nuxt Content—where training data is sparse or outdated—I've learned to be explicit in my prompts. When I know I'm working on something with poor training coverage, I'll add to the plan: "If you encounter issues with Nuxt Content APIs, read `docs/nuxt-content-gotchas.md` first." This nudge makes the difference between the agent guessing based on outdated patterns and actually consulting current knowledge. The AI forgets. Your documentation doesn't. --- [^1]: LLMs have no memory between sessions—context is just tokens in a sliding window. See Factory's analysis in [The Context Window Problem](https://factory.ai/news/context-window-problem). [^2]: HumanLayer's guide on [Writing a Good CLAUDE.md](https://www.humanlayer.dev/blog/writing-a-good-claude-md) recommends keeping files under 60 lines and using progressive disclosure for detailed instructions. [^5]: Developers Digest, [Self-Improving Skills in Claude Code](https://www.youtube.com/watch?v=-4nUCaMNBR8). A pattern for capturing learnings automatically: skills analyze sessions, extract corrections, and update themselves. [^6]: Dex Horthy, [No Vibes Allowed: Solving Hard Problems in Complex Codebases](https://www.youtube.com/watch?v=rmvDxxNubIg). Dex is the founder of HumanLayer and creator of the Ralph technique for autonomous AI coding. His [12 Factor Agents](https://www.humanlayer.dev/blog/12-factor-agents) manifesto includes "Make Your Agent a Stateless Reducer" as Factor 12. [^7]: Moss, [Don't Waste Your Back Pressure](https://banay.me/dont-waste-your-backpressure). Backpressure—automated feedback from type systems, linters, and build tools—is what enables agents to work on longer-horizon tasks without constant human intervention. [^8]: Geoffrey Huntley, [Ralph](https://ghuntley.com/ralph/). Ralph is a technique for autonomous AI coding where tasks are queued and executed without human intervention, making automated checks on commit essential. [^9]: Jude Gao, [AGENTS.md outperforms skills in our agent evals](https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals). Vercel's evals found that a compressed docs index embedded directly in AGENTS.md achieved 100% pass rate, while skills maxed out at 79% even with explicit instructions—and performed no better than baseline when left to trigger naturally. --- --- title: How I Built a Skill That Lets Me Talk to Claude's Conversation Memory description: How I built a skill that lets Claude search its own conversation history, turning it into a persistent coding partner that remembers past solutions. 
tags: ['claude-code', 'ai', 'tooling', 'python'] url: https://alexop.dev/posts/building-conversation-search-skill-claude-code --- # How I Built a Skill That Lets Me Talk to Claude's Conversation Memory When I work with Claude Code on complex projects, I often remember discussing a problem or solution but can't find it. "We fixed that EMFILE error last week, what was the solution?" or "What did we work on yesterday?" Claude Code stores every session locally. But Claude itself can't search those files by default. So I built a skill that lets Claude search its own conversation history. This turns Claude into a persistent coding partner that actually remembers past solutions. ## How Claude Code Stores Conversations Every Claude Code session gets saved as a JSONL file in `~/.claude/projects/`, with one subdirectory per project named after the project's encoded path. The path encoding is simple: replace `/` with `-` and prefix absolute paths with `-`. So `/Users/alex/Projects/myapp` becomes `-Users-alex-Projects-myapp`. Each JSONL file contains one JSON object per line: ```json {"type": "user", "timestamp": "2026-01-16T10:30:00Z", "gitBranch": "main", "message": {"content": "Fix the EMFILE error"}} {"type": "assistant", "timestamp": "2026-01-16T10:30:15Z", "message": {"content": [{"type": "text", "text": "Let me investigate..."}, {"type": "tool_use", "name": "Bash", "input": {"command": "ulimit -n"}}]}} {"type": "summary", "summary": "Fixed EMFILE error by increasing file descriptor limit"} ``` Each entry includes the role, timestamp, git branch, message content, and tool uses. The `summary` type appears when Claude generates a conversation summary. ## The Skill Structure The skill lives in `~/.claude/skills/conversation-search/` with two files: a `SKILL.md` definition and the `search_history.py` script. The `SKILL.md` file tells Claude when to activate this skill: ```yaml --- name: conversation-search description: Search past Claude Code conversation history. Use when asked to recall, find, or search for anything from previous conversations. Triggers include "what did we do today", "how did we fix X", "search history", "recall when we"... --- ``` When I ask "what did we do yesterday?", Claude recognizes the trigger and knows to use this skill. ## How the Python Script Works The script has two modes: **digest** for daily summaries and **search** for finding specific solutions.
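Both modes start by reading every session file under `~/.claude/projects/`. A minimal sketch of that loading step, assuming the storage layout described above (the function name and error handling are illustrative, not the script's actual code):

```python
import json
from pathlib import Path


def load_sessions(projects_dir: Path = Path.home() / ".claude" / "projects"):
    """Yield (session_file, entries) for every JSONL session on disk."""
    for session_file in sorted(projects_dir.glob("*/*.jsonl")):
        entries = []
        for line in session_file.read_text().splitlines():
            if not line.strip():
                continue
            try:
                entries.append(json.loads(line))  # one JSON object per line
            except json.JSONDecodeError:
                continue  # skip partially written lines
        yield session_file, entries
```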
### Data Structures The script parses JSONL files into clean dataclasses: ```python @dataclass class Message: uuid: str parent_uuid: Optional[str] role: str # 'user', 'assistant' content: str timestamp: str tool_uses: list tool_results: list @dataclass class Conversation: session_id: str file_path: str summary: Optional[str] messages: list project_path: str git_branch: Optional[str] timestamp: str @dataclass class SearchResult: conversation: Conversation score: float matched_messages: list problem_excerpt: str solution_excerpt: str commands_run: list ``` ### Relevance Scoring The search algorithm tokenizes the query and content, then calculates relevance scores with weighted boosts: ```python def calculate_relevance_score(query: str, conversation: Conversation) -> tuple: query_tokens = tokenize(query) total_score = 0.0 matched_messages = [] # Summary gets highest weight (3x) if conversation.summary: summary_tokens = tokenize(conversation.summary) summary_overlap = len(query_tokens & summary_tokens) / len(query_tokens) total_score += summary_overlap * 3.0 # Check each message for msg in conversation.messages: msg_tokens = tokenize(msg.content) overlap = len(query_tokens & msg_tokens) if overlap > 0: msg_score = overlap / len(query_tokens) # User messages get 1.5x boost (problem descriptions) if msg.role == 'user': msg_score *= 1.5 # Messages with tool uses get 1.3x boost (solutions) if msg.tool_uses: msg_score *= 1.3 total_score += msg_score matched_messages.append(msg) return total_score, matched_messages ``` The weighting makes sense: summaries are the most relevant since they capture the essence. User messages describe problems. Tool uses indicate actual solutions. ### Date Filtering The script supports filtering by date range: ```bash # Today's sessions only python3 search_history.py --today "newsletter" # Yesterday python3 search_history.py --yesterday "bug fix" # Last 7 days python3 search_history.py --days 7 "refactor" # Since a specific date python3 search_history.py --since 2026-01-01 "feature" ``` ### Extracting Useful Information The script extracts practical information from each conversation: ```python def extract_bash_commands(conversation: Conversation) -> list: """Extract Bash commands run during the conversation.""" commands = [] for msg in conversation.messages: for tool in msg.tool_uses: if tool.get('name') == 'Bash': cmd = tool.get('input', {}).get('command', '') if cmd: commands.append(cmd) return commands def extract_files_touched(conversation: Conversation) -> list: """Extract files that were read, written, or edited.""" files = set() for msg in conversation.messages: for tool in msg.tool_uses: name = tool.get('name', '') inp = tool.get('input', {}) if name in ('Read', 'Write', 'Edit'): path = inp.get('file_path', '') if path: files.add(Path(path).name) return sorted(files)[:10] ``` This is useful for recreating solutions. If you found how you fixed something before, you can see exactly which commands you ran and which files you changed. ## Using the Skill ### Daily Digest Ask "what did we do yesterday?" and Claude runs the digest mode: ```bash python3 search_history.py --digest yesterday ``` Output: ``` ## January 16, 2026 - 32 sessions ### 1. Set Context Menu Feature Spec Session: `1498ff91` Branch: `fitnessFunctions` Files: set-context-menu.md, SetContextMenu.vue, SetContextMenuPO.ts Commands: 12 executed ### 2. 
Fix Pipeline: Missing i18n, Unused Exports Session: `23351e77` Branch: `fitnessFunctions` Files: de.json, en.json, claude-qa.yml Commands: 6 executed ### 3. Adding AI Coding Articles to Second Brain Session: `5c909423` Branch: `main` Files: article.md, dex-horthy.md, diagrams-guide.md Commands: 1 executed ``` This is great for standup notes or just remembering what you worked on. ### Keyword Search Ask "how did we fix the EMFILE error?" and Claude searches for relevant sessions: ```bash python3 search_history.py "EMFILE error" --days 14 ``` Output: ``` ============================================================ Result #1 (Score: 4.25) ============================================================ Project: /Users/alex/Projects/fitness-app Session: a1b2c3d4... Branch: main Date: 2026-01-10 PROBLEM: Getting EMFILE error when running tests, too many open files SOLUTION: The issue was too many file watchers. Fixed by increasing the limit with `ulimit -n 10240` and adding it to shell profile... COMMANDS RUN (3 total): $ ulimit -n $ ulimit -n 10240 $ echo "ulimit -n 10240" >> ~/.zshrc ``` Now I can recreate the exact solution without remembering the details. ### Project Filtering You can narrow searches to a specific project: ```bash python3 search_history.py "vitest config" --project ~/Projects/fitness-app ``` ## Why This Matters Before this skill, I'd waste time re-solving problems I'd already solved. "I know we discussed this, but I can't find it." Now I just ask Claude. The benefits: 1. **No more re-solving problems** - Claude finds past solutions instantly 2. **Daily digests for standups** - "What did we work on yesterday?" gives a ready summary 3. **Commands are preserved** - You can recreate exact solutions with the same commands 4. **Cross-project search** - Find solutions from any project you've worked on The skill turns Claude from a stateless assistant into something closer to a persistent coding partner. It remembers what you've done together. If you want to extend Claude Code with custom skills, check out my post on building a Claude Code plugin for packaging and sharing skills across projects. --- --- title: In Five Years, Developers Won't Write Code By Hand description: Software development as translation work is dying. Software engineering—the strategic, architectural discipline—is more valuable than ever. The shift is already here. tags: ['ai', 'software-engineering'] url: https://alexop.dev/posts/developers-wont-write-code-by-hand --- # In Five Years, Developers Won't Write Code By Hand I haven't written code by hand in months. This year alone, I built four complete projects using only Claude Code: a markdown editor, a Nuxt blog starter, a workout tracking app, and the Second Brain you might be reading this on. At work, I regularly one-shot entire issues without touching my keyboard for anything except prompts. Last week, I resolved a production incident using VS Code Copilot while barely glancing at the actual code. I'm not special. I'm just paying attention. And I'm not alone. Simon Willison[^1]—one of the most respected voices in the developer community—put it bluntly on the Oxide and Friends podcast: > I think the job of being paid money to type code into a computer will go the same way as punching punch cards [...] I do not think anyone will be paid to just do the thing where you type the code. I think software engineering will still be an enormous career. I just think the software engineers won't be spending multiple hours of their day in a text editor typing out syntax. 
But here's the part that matters: > The more time I spend on AI-assisted programming the less afraid I am for my job, because it turns out building software—especially at the rate it's now possible to build—still requires enormous skill, experience and depth of understanding. The skills are changing though! Being able to read a detailed specification and transform it into lines of code is the thing that's being automated away. What's left is everything else, and the more time I spend working with coding agents the larger that "everything else" becomes. That "everything else" is the whole point. ## The Shift Is Already Here In five years, developers won't write code by hand. This isn't a prediction about some distant future—it's a description of what's already happening to anyone using the right tools. The reason most people don't see it? Two things: skill gaps and companies failing to provide developers with modern tooling. Most developers are still typing every character. Most companies are still debating whether AI tools are "worth the license cost." Meanwhile, the developers who figured this out are shipping at 10x the pace. The creator of Claude Code uses Claude Code to work on multiple features simultaneously. Techniques like Ralph[^2]—an automation framework that breaks work into discrete chunks—can literally rip through your entire backlog. This isn't theoretical. It's happening now, in production, at companies that stopped waiting for permission. (For a deep dive into how these tools actually work, see my guide to Claude Code's architecture.) ## The Great Distinction Nobody Talks About Here's what changes everything: understanding the difference between software *engineering* and software *development*. **Software engineering** is designing systems. Architecture decisions. Test strategies. The guardrails that keep a codebase healthy over time. Knowing what to build and—more importantly—what not to build. **Software development** is writing the actual code. Translating specifications into syntax. Converting tickets into pull requests. One of these is a creative, strategic discipline. The other is translation work. Software development is dying. Software engineering is more valuable than ever. Kent Beck[^3] put it perfectly: 90% of traditional skills have lost their economic value because AI can replicate them efficiently. But the remaining 10%? They gain 1000x leverage through AI augmentation. The question you need to answer: which skills are your 10%? ## Scrum Was a Workaround for Human Limitations Think about why we created Scrum in the first place. We needed big teams with specialized roles because implementing features took forever. Frontend developers, backend developers, QA engineers, DevOps specialists—all coordinating through ceremonies and tickets because the bottleneck was literally typing characters into an editor. In the worst cases—an anti-pattern far too common—managers saw developers as "code monkeys" who converted tickets into code. The developer's job was translation, not thinking. This made sense when coding was slow. When a feature took days or weeks to implement. Those days are over. When implementation takes days, you need ceremonies to coordinate. When implementation takes minutes, the coordination overhead becomes the bottleneck. Scrum isn't dying because it was bad—it's dying because the constraint it solved no longer exists. ## The New Economics The math has fundamentally changed. Prototypes are cheap now. What took a sprint takes an afternoon. 
Burke Holland[^4], a Microsoft developer advocate, built four substantial projects with AI—including Swift applications in a language he doesn't know. His advice? "Make things. Stop waiting to have all the answers... you can make things faster than you ever thought possible." I've watched product owners use Claude Code to generate their own prototypes, fix simple bugs, and submit pull requests. They're not becoming developers—they're just not waiting for developers anymore. Ralph-style automation lets you feed your backlog to an AI and get working code out the other end. Not perfect code. Not production-ready code on the first try. But functional code that's 80% there, leaving humans to handle the remaining 20% that actually requires judgment. The developers who thrive aren't writing more code. They're orchestrating AI to write code for them, then applying their expertise to the parts that matter. ## The Hard Part Was Never Coding Here's the uncomfortable truth: coding was never the hard part. We just convinced ourselves it was because it took so much *time*. Lee Robinson[^5]—who went from Vercel to Cursor—built a Rust image compressor, a SvelteKit web app, and a hardware game without writing code by hand. His reflection? "Writing code was never really the bottleneck, especially for larger projects." And: "It wasn't about the code... It's about building something great and something that I'm proud of." The actually hard problems haven't changed: - Understanding what customers need (not what they say they need) - Writing specifications clear enough that anyone—human or AI—can implement them - Knowing what to build and what to skip - Making architectural decisions that won't haunt you in two years - Marketing, positioning, product sense These are human problems. Creative problems. Strategic problems. Martin Fowler[^6] argues this is the biggest shift since assembly to high-level languages. But here's what he gets right: AI lacks architectural judgment. It cannot distinguish good patterns from poor ones. It can write code all day, but it can't tell you whether that code should exist. The value shifts from writing code to knowing what code to write and why. ## "But AI Code Is Buggy" Yes. And? The first high-level language compilers produced worse machine code than hand-written assembly. Early web frameworks were slower than hand-crafted HTML. Every abstraction layer introduces inefficiencies. We adopt them anyway because developer productivity matters more than perfect output. The question isn't "is AI code flawless?"—it's "is AI code good enough, fast enough, to change the economics?" The answer is yes. Today. Not in five years—today. I'm not arguing AI produces better code than expert developers. I'm arguing it produces acceptable code fast enough that the calculus changes. When you can generate ten implementations in the time it takes to write one, you can afford to throw away the bad ones. The skeptics are optimizing for the wrong variable. They're measuring code quality when they should be measuring iteration speed. ## How to Prepare Geoffrey Huntley[^7] put it bluntly: "Software engineers who haven't adopted or started exploring software assistants, are frankly not gonna make it." But here's the nuance he adds: "I suspect there's not going to be mass-layoffs for software developers due to AI. Instead, what we will see is a natural attrition between those who invest in themselves right now and those who do not." This isn't about being replaced by AI. 
It's about being outperformed by developers who use AI. The gap is already opening. (I wrote about this dynamic in The Age of the Generalist—high-agency builders thrive while passive specialists struggle.) If you want to thrive in this new world, here's where to focus: **Learn how LLMs actually work.** Not to build models—to orchestrate them. Understand context windows, token limits, prompt engineering. Know why your AI assistant suddenly "forgot" what you told it three messages ago. This isn't optional knowledge anymore; it's table stakes. (For practical starting points, see how I use LLMs in my daily work.) **Study the architecture of AI agents.** The developers who can build custom agents for their specific workflows have superpowers the rest of the industry doesn't understand yet. Resources like the 12 Factor Agents[^8] manifesto lay out the principles: small focused agents, deterministic control flow with strategic LLM decision points, and proper context window management. Learn what context engineering means. Understand why RAG exists. Build something that automates your own repetitive work. **Double down on software engineering.** System design, architecture patterns, testing strategies—these skills become more valuable, not less. When anyone can generate code, the people who know what code to generate become invaluable. **Stop optimizing for code output.** Start optimizing for clarity of thought, quality of specifications, and speed of iteration. Your value isn't in the characters you type; it's in the decisions you make. ## The Paradox Here's what I find fascinating: I don't think we'll ever get AI that matches human agency and creativity. The models might plateau. They might not get dramatically "smarter" than they are today. It doesn't matter. Even with current capabilities, as tooling improves, we're witnessing the biggest transformation in software development history. The change isn't coming from AI replacing human thinking—it's coming from AI eliminating the translation layer between human thinking and working software. You don't need AGI to automate code. You just need models that are good enough at translation, combined with humans who are good enough at specification. We have both. Right now. --- The question isn't whether this shift will happen. It's whether you'll be ready when your company finally notices. [^1]: Simon Willison, [LLM Predictions for 2026](https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026) [^2]: [Ralph](https://github.com/snarktank/ralph) - Automation framework for AI-driven development [^3]: Kent Beck, [90% of My Skills Are Now Worth $0](https://tidyfirst.substack.com/p/90-of-my-skills-are-now-worth-0) [^4]: Burke Holland, [Opus 4.5 is Going to Change Everything](https://burkeholland.github.io/posts/opus-4-5-change-everything/) [^5]: Lee Robinson, [AI codes better than me. Now what?](https://www.youtube.com/watch?v=UrNLVip0hSA) [^6]: Martin Fowler, [How AI Will Change Software Engineering](https://www.youtube.com/watch?v=CQmI4XKTa0U) [^7]: Geoffrey Huntley, [What do I mean by some software devs are "ngmi"?](https://ghuntley.com/ngmi/) [^8]: [12 Factor Agents](https://www.humanlayer.dev/blog/12-factor-agents) - Principles for production-grade AI agents --- --- title: Mutation Testing with AI Agents When Stryker Doesn't Work description: When Stryker doesn't support your test stack, AI agents can execute mutation testing manually. A practical approach for Vitest browser mode and Playwright. 
tags: ['testing', 'ai', 'claude-code', 'vitest'] url: https://alexop.dev/posts/mutation-testing-ai-agents-vitest-browser-mode --- # Mutation Testing with AI Agents When Stryker Doesn't Work ## The Coverage Lie Code coverage lies. A test that exercises a line doesn't mean it verifies that line does the right thing: ```typescript function add(a: number, b: number): number { return a + b } // 100% coverage - would still pass if add() returned 999 it('adds numbers', () => { add(2, 2) }) ``` Mutation testing flips the question. Instead of asking "did tests run this code?", it asks **"if I break this code, do tests fail?"** Using our `add` example, a mutation tester would: ```typescript // Original function add(a: number, b: number): number { return a + b } // Mutated: swap + for - function add(a: number, b: number): number { return a - b // <-- bug introduced } ``` Now run the test. `add(2, 2)` returns `0` instead of `4`. Does the test fail? No—it never checked the result. **The mutant survives.** Your test has a gap. The process: 1. **Mutate**: Introduce a small bug (change `>` to `>=`, swap `&&` for `||`, delete a line) 2. **Run tests**: Execute your test suite against the mutated code 3. **Evaluate**: If tests pass with the bug, your tests are weak. If tests fail, they caught it. A mutation that tests fail to catch is a "surviving mutant"—proof of a test gap. --- ## When Stryker Works: The Gold Standard When your test stack supports it, automated mutation testing with Stryker is the way to go. It's fast, deterministic, generates HTML reports, and runs in CI pipelines. This is especially valuable when you have pure functions with high test coverage but want to verify test quality. Here's what it looks like in practice: ```bash pnpm test:mutation # or: stryker run ``` ``` INFO ProjectReader Found 7 of 2947 file(s) to be mutated. INFO Instrumenter Instrumented 7 source file(s) with 394 mutant(s) INFO DryRunExecutor Initial test run succeeded. Ran 184 tests in 0 seconds. Mutation testing [====================] 100% | 394/394 Mutants tested (35 survived, 0 timed out) --------------|---------|----------|----------|----------| File | % score | # killed | # survived | # no cov | --------------|---------|----------|----------|----------| All files | 90.86 | 358 | 35 | 1 | backlinks.ts | 96.30 | 26 | 1 | 0 | callouts.ts | 93.94 | 62 | 4 | 0 | graph.ts | 91.55 | 65 | 6 | 0 | mentions.ts | 91.30 | 63 | 5 | 1 | minimark.ts | 82.61 | 76 | 16 | 0 | text.ts | 100.00 | 34 | 0 | 0 | wikilinks.ts | 91.43 | 32 | 3 | 0 | --------------|---------|----------|----------|----------| INFO MutationTestExecutor Done in 36 seconds. ``` 394 mutants tested across 7 files in 36 seconds. The report shows exactly which files have weak spots—`minimark.ts` at 82.61% needs attention, while `text.ts` is solid at 100%. Stryker also generates an interactive HTML report where you can drill into each surviving mutant and see exactly what code change your tests failed to catch. If your stack supports Stryker (standard Vitest in Node mode, Jest, Mocha), use it. Deterministic tooling in your CI pipeline beats manual approaches every time. The AI agent technique in this post is for when Stryker isn't an option. --- ## The Vitest Browser Mode Problem But what if Stryker doesn't support your stack? Stryker doesn't work with Vitest's browser mode. Their instrumentation assumes Node.js execution, but browser mode runs tests in actual Chromium via Playwright. 
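For context, this is roughly what a browser-mode setup looks like in `vitest.config.ts`. It's a minimal sketch rather than my exact config, and the option shape is an assumption based on recent Vitest versions, so check the browser mode docs for the exact fields your version expects:

```typescript
// vitest.config.ts — minimal browser-mode sketch (exact option names vary by Vitest version)
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    browser: {
      enabled: true,            // run tests in a real browser instead of Node
      provider: 'playwright',   // drive the browser through Playwright
      instances: [{ browser: 'chromium' }],
    },
  },
})
```

This is exactly the execution model Stryker's instrumentation can't hook into: the tests run inside Chromium, not in the Node process Stryker controls.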
My setup: - **Framework**: Vitest 4 with `browser.enabled: true` - **Provider**: Playwright (Chromium) - **Test style**: Integration tests with real DOM My testing strategy relies heavily on Vitest browser mode for realistic user flow testing. Stryker's mutation coverage reports? Not an option. And switching to Node-based testing would mean losing the browser-specific behavior I'm actually testing. --- ## AI Agents as Manual Mutation Testers The mutation testing algorithm is simple enough that an AI coding agent can execute it manually. Claude Code can: 1. Read your source code 2. Apply mutations systematically 3. Run `pnpm test --run` 4. Record whether tests passed or failed 5. Restore the original code 6. Report surviving mutants with suggested fixes I adapted a Claude Code skill originally created by [Paul Hammond](https://www.linkedin.com/posts/paul-hammond-bb5b78251_mutation-testing-is-typically-expensive-but-activity-7414719212071473152-_xTm) that codifies this workflow. ```mermaid flowchart TD subgraph Agent["AI Agent Workflow"] A[Read source file] --> B[Identify mutation targets] B --> C[Apply single mutation] C --> D[Run test suite] D --> E{Tests fail?} E -->|Yes| F[Mutant KILLED] E -->|No| G[Mutant SURVIVED] F --> H[Restore original code] G --> H H --> I{More mutations?} I -->|Yes| C I -->|No| J[Generate report] end subgraph Results["Report Output"] J --> K[Killed mutants: Tests caught the bug] J --> L[Survived mutants: Test gaps found] L --> M[Suggested fixes for each gap] end style G fill:#f96,stroke:#333 style F fill:#6f9,stroke:#333 style L fill:#f96,stroke:#333 style K fill:#6f9,stroke:#333 ``` ### The Mutation Testing Skill The skill defines mutation operators in priority order: **Priority 1 - Boundaries** (most likely to survive): | Original | Mutate To | |----------|-----------| | `<` | `<=` | | `>` | `>=` | | `<=` | `<` | | `>=` | `>` | **Priority 2 - Boolean Logic**: | Original | Mutate To | |----------|-----------| | `&&` | `\|\|` | | `\|\|` | `&&` | | `!condition` | `condition` | **Priority 3 - Return Values**: | Original | Mutate To | |----------|-----------| | `return x` | `return null` | | `return true` | `return false` | | Early return | Remove it | **Priority 4 - Statement Removal**: | Original | Mutate To | |----------|-----------| | `array.push(x)` | Remove | | `await save(x)` | Remove | | `emit('event')` | Remove | The agent applies each mutation one at a time, runs tests, records results, and restores the original code immediately. --- ## Real Example: Settings Feature I ran this against my settings feature. The integration tests looked comprehensive—theme toggling, language switching, unit preferences. Code coverage would show high percentages. **Results: 38% mutation score** (5 killed, 8 survived out of 13 mutations) Here's what the AI agent found: ### Surviving Mutant #1: Volume Boundary Not Tested ```typescript // Original (stores/settings.ts:65) Math.min(Math.max(volume, 0.5), 1) // Mutation: Change 0.5 to 0.4 Math.min(Math.max(volume, 0.4), 1) // Result: Tests PASSED -> Mutant SURVIVED ``` My tests never verified the minimum volume constraint. A bug changing the minimum from 50% to 40% would ship undetected. ### Surviving Mutant #2: Theme DOM Class Not Verified ```typescript // Original (composables/useTheme.ts:26) newMode === 'dark' // Mutation: Negate the condition newMode !== 'dark' // Result: Tests PASSED -> Mutant SURVIVED ``` My test checked that clicking the toggle changed the stored preference. 
It never verified that `document.documentElement.classList` actually received the `dark` class. The UI could break while tests pass. ### Surviving Mutant #3: Error Handling Path Untested ```typescript // Original (stores/settings.ts:28) if (error) return // Mutation: Negate the condition if (!error) return // Result: Tests PASSED -> Mutant SURVIVED ``` No test exercised the error handling branch. A bug that inverted error handling would go unnoticed. ### The Fixes The agent suggested specific tests for each surviving mutant: ```typescript // Fix for Mutant #1: Boundary test it('volume slider has minimum value constraint of 50%', async () => { const volumeSlider = page.getByTestId('timer-sound-volume-slider') await expect.poll(async () => { const el = await volumeSlider.element() return el.getAttribute('min') }).toBe('0.5') }) // Fix for Mutant #2: DOM verification it('adds dark class to html element when dark mode enabled', async () => { const themeToggle = page.getByTestId('theme-toggle') await userEvent.click(themeToggle) await expect.poll(() => document.documentElement.classList.contains('dark') ).toBe(true) }) ``` --- ## How to Set This Up ### Step 1: Create the Skill Save this as `.claude/skills/mutation-testing/SKILL.md`:
Full Mutation Testing Skill (click to expand) ```markdown --- name: mutation-testing description: | Mutation testing patterns for verifying test effectiveness. Use when analyzing branch code to find weak or missing tests. Triggers: "mutation testing", "test effectiveness", "would tests catch this bug", "weak tests", "are my tests good enough", "surviving mutants". --- # Mutation Testing Mutation testing answers: **"Would my tests catch this bug?"** by actually introducing bugs and running tests. --- ## Execution Workflow **CRITICAL**: This skill actually mutates code and runs tests. Follow this exact process: ### Step 1: Identify Target Code git diff main...HEAD --name-only | grep -E '\.(ts|js|tsx|jsx|vue)' | grep -v '\.test\.' | grep -v '\.spec\.' ### Step 2: For Each Function to Test Execute this loop for each mutation: 1. READ the original file and note exact content 2. APPLY one mutation (edit the code) 3. RUN tests: pnpm test --run (or specific test file) 4. RECORD result: KILLED (test failed) or SURVIVED (test passed) 5. RESTORE original code immediately 6. Repeat for next mutation ### Step 3: Report Results After all mutations, provide a summary table: | Mutation | Location | Result | Action Needed | |----------|----------|--------|---------------| | `>` → `>=` | file.ts:42 | SURVIVED | Add boundary test | | `&&` → `||` | file.ts:58 | KILLED | None | --- ## Mutation Operators to Apply ### Priority 1: Boundary Mutations (Most Likely to Survive) | Original | Mutate To | Why It Matters | |----------|-----------|----------------| | `<` | `<=` | Boundary not tested | | `>` | `>=` | Boundary not tested | | `<=` | `<` | Equality case missed | | `>=` | `>` | Equality case missed | ### Priority 2: Boolean Logic Mutations | Original | Mutate To | Why It Matters | |----------|-----------|----------------| | `&&` | `\|\|` | Only tested when both true | | `\|\|` | `&&` | Only tested when both false | | `!condition` | `condition` | Negation not verified | ### Priority 3: Arithmetic Mutations | Original | Mutate To | Why It Matters | |----------|-----------|----------------| | `+` | `-` | Tested with 0 only | | `-` | `+` | Tested with 0 only | | `*` | `/` | Tested with 1 only | ### Priority 4: Return/Early Exit Mutations | Original | Mutate To | Why It Matters | |----------|-----------|----------------| | `return x` | `return null` | Return value not asserted | | `return true` | `return false` | Boolean return not checked | | `if (cond) return` | `// removed` | Early exit not tested | ### Priority 5: Statement Removal | Original | Mutate To | Why It Matters | |----------|-----------|----------------| | `array.push(x)` | `// removed` | Side effect not verified | | `await save(x)` | `// removed` | Async operation not verified | | `emit('event')` | `// removed` | Event emission not tested | --- ## Practical Execution Example ### Example: Testing a Validation Function **Original code** (`src/utils/validation.ts:15`): export function isValidAge(age: number): boolean { return age >= 18 && age <= 120; } **Mutation 1**: Change `>=` to `>` export function isValidAge(age: number): boolean { return age > 18 && age <= 120; // MUTATED } **Run tests**: `pnpm test --run src/__tests__/validation.test.ts` **Result**: Tests PASS → **SURVIVED** (Bad! 
Need test for `isValidAge(18)`) **Restore original code immediately** **Mutation 2**: Change `&&` to `||` export function isValidAge(age: number): boolean { return age >= 18 || age <= 120; // MUTATED } **Run tests**: `pnpm test --run src/__tests__/validation.test.ts` **Result**: Tests FAIL → **KILLED** (Good! Tests catch this bug) **Restore original code immediately** --- ## Results Interpretation ### Mutant States | State | Meaning | Action | |-------|---------|--------| | **KILLED** | Test failed with mutant | Tests are effective | | **SURVIVED** | Tests passed with mutant | **Add or strengthen test** | | **TIMEOUT** | Tests hung (infinite loop) | Counts as detected | ### Mutation Score Score = (Killed + Timeout) / Total Mutations * 100 | Score | Quality | |-------|---------| | < 60% | Weak - significant test gaps | | 60-80% | Moderate - improvements needed | | 80-90% | Good - minor gaps | | > 90% | Strong test suite | --- ## Fixing Surviving Mutants When a mutant survives, add a test that would catch it: ### Surviving: Boundary mutation (`>=` → `>`) // Add boundary test it('accepts exactly 18 years old', () => { expect(isValidAge(18)).toBe(true); // Would fail if >= became > }); ### Surviving: Logic mutation (`&&` → `||`) // Add test with mixed conditions it('rejects when only one condition met', () => { expect(isValidAge(15)).toBe(false); // Would pass if && became || }); ### Surviving: Statement removal // Add side effect verification it('saves to database', async () => { await processOrder(order); expect(db.save).toHaveBeenCalledWith(order); // Would fail if save removed }); --- ## Quick Checklist During Mutation For each mutation, ask: 1. **Before mutating**: Does a test exist for this code path? 2. **After running tests**: Did any test actually fail? 3. **If survived**: What specific test would catch this? 4. **After fixing**: Re-run mutation to confirm killed --- ## Common Surviving Mutation Patterns ### Tests Only Check Happy Path // WEAK: Only tests success case it('validates', () => { expect(validate(goodInput)).toBe(true); }); // STRONG: Tests both cases it('validates good input', () => { expect(validate(goodInput)).toBe(true); }); it('rejects bad input', () => { expect(validate(badInput)).toBe(false); }); ### Tests Use Identity Values // WEAK: Mutation survives expect(multiply(5, 1)).toBe(5); // 5*1 = 5/1 = 5 // STRONG: Mutation detected expect(multiply(5, 3)).toBe(15); // 5*3 ≠ 5/3 ### Tests Don't Assert Return Values // WEAK: No return value check it('processes', () => { process(data); // No assertion! }); // STRONG: Asserts outcome it('processes', () => { const result = process(data); expect(result).toEqual(expected); }); --- ## Important Rules 1. **ALWAYS restore original code** after each mutation 2. **Run tests immediately** after applying mutation 3. **One mutation at a time** - don't combine mutations 4. **Focus on changed code** - prioritize branch diff 5. 
**Track all results** - report full mutation summary --- ## Summary Report Template After completing mutation testing, provide: ## Mutation Testing Results **Target**: `src/features/workout/utils.ts` (functions: X, Y, Z) **Total Mutations**: 12 **Killed**: 9 **Survived**: 3 **Score**: 75% ### Surviving Mutants (Action Required) | # | Location | Original | Mutated | Suggested Test | |---|----------|----------|---------|----------------| | 1 | line 42 | `>=` | `>` | Test boundary value | | 2 | line 58 | `&&` | `\|\|` | Test mixed conditions | | 3 | line 71 | `emit()` | removed | Verify event emission | ### Killed Mutants (Tests Effective) - Line 35: `+` → `-` killed by `calculation.test.ts` - Line 48: `true` → `false` killed by `validate.test.ts` - ... ```
### Step 2: Invoke It ```bash claude "Run mutation testing on the settings feature" ``` The agent will: - Find changed files on your branch - Identify testable functions - Apply mutations systematically - Report surviving mutants with suggested test fixes ### Step 3: Review and Fix The agent produces a markdown report. Review each surviving mutant and decide: - Add the suggested test - Accept the risk (document why) - Refactor the code to be more testable --- ## When to Use This Approach | Good Fit | Not Ideal | |----------|-----------| | Vitest browser mode (no Stryker support) | Large codebases needing full mutation coverage | | Playwright component testing | CI/CD automation (manual agent invocation) | | Small-to-medium codebases | Strict mutation score thresholds | | Pre-merge review of specific features | | | Learning what makes tests effective | | This approach works best alongside your existing testing strategy. Use it to spot-check critical features before merge, not as a replacement for automated mutation testing where available. This skill shines on **feature branches** where you want to validate test quality before merging. Running AI agents in CI/CD pipelines is possible—you could build an automated QA agent with the Claude Agent SDK—but it adds complexity and cost. For pipeline automation, deterministic tools like Stryker remain the better choice when your stack supports them. Think of this as a developer tool for improving tests during development, not a CI gate. --- ## Key Takeaways 1. **Coverage doesn't equal confidence.** High code coverage can coexist with ineffective tests. 2. **Mutation testing reveals test gaps.** By breaking code and checking if tests notice, you find what's actually being verified. 3. **AI agents can execute manual mutation testing.** When tooling doesn't support your stack, an agent can apply the algorithm systematically. 4. **Focus on surviving mutants.** Each one is a potential bug your tests wouldn't catch. 5. **This complements, not replaces.** Use this alongside coverage reports, not instead of automated mutation testing where available. --- ## Resources - [Paul Hammond's Mutation Testing Skill](https://github.com/citypaul/.dotfiles/blob/main/claude/.claude/skills/mutation-testing/SKILL.md) - The original skill this post is based on - [Mutation Testing on Wikipedia](https://en.wikipedia.org/wiki/Mutation_testing) - [Stryker Mutator](https://stryker-mutator.io/) - When your stack supports it - My TDD workflow with Claude Code - Related approach for test-first development --- --- title: Why You Don't Need the Nuxt MCP When You Use Claude Code description: Why I use custom research agents instead of MCP servers for AI-assisted development. Learn how llms.txt enables context-efficient documentation fetching with a practical Nuxt Content agent example. tags: ['ai', 'claude-code', 'nuxt', 'tooling'] url: https://alexop.dev/posts/why-you-dont-need-nuxt-mcp-claude-code --- # Why You Don't Need the Nuxt MCP When You Use Claude Code I think we all love Nuxt. One problem with using Nuxt for AI is that training data is not up to date. This is especially true for Nuxt Content where often times LLMs still think they're working with Nuxt 2. This is why the Nuxt team created their MCP server. I think the MCP is good and perfectly fine. But for me—and also for Anthropic itself—MCPs in the current spec have the problem of context bloat. 
Anthropic has [written down this problem perfectly](https://www.anthropic.com/engineering/code-execution-with-mcp) in their engineering blog, identifying two main issues: **tool definition overload** (loading all tools upfront creates hundreds of thousands of tokens before the model even reads your request) and **intermediate result redundancy** (every result must pass through the model, sometimes processing 50,000+ tokens per operation). If you want to dive deeper into what MCP is and how it works, check out my post on What Is the Model Context Protocol (MCP)?. ## Why I Use Custom Research Agents Instead This is why, across all my projects, I skip MCP servers and use custom research agents instead. Many documentation sites nowadays publish an `llms.txt`. If you let an LLM fetch `llms.txt` first, it can find all the information it needs in the docs itself. I've written about how I added llms.txt to my own blog—it's becoming a standard way for sites to expose their content to AI. This approach has several advantages: 1. **Only the description gets loaded as context** — The agent description is minimal, not the entire tool schema 2. **You can customize it** — Full control over what the agent knows and how it behaves 3. **It runs in its own context** — Your main agent uses the research agent only to gather information, then continues its work without polluting its own context window This is essentially the same pattern I described in my post about Claude Code subagents — agents keep your main context clean by delegating specialized tasks. Claude Code itself uses this exact approach. When you ask it questions about its own features, it spawns a [`claude-code-guide` agent](https://github.com/Piebald-AI/claude-code-system-prompts/blob/main/system-prompts/agent-prompt-claude-guide-agent.md) that fetches from a documentation sitemap and answers based on current docs—not training data. We're just applying the same pattern to other libraries. ## Example: Nuxt Content Specialist Agent Here's how my Nuxt Content agent looks. Just put it under `.claude/agents`: ````markdown --- name: nuxt-content-specialist description: Use this agent when the task involves @nuxt/content v3 in any way - implementing, modifying, querying, reviewing, or improving content management code.
This includes creating or modifying content collections, writing queries, implementing MDC components, configuring content sources, troubleshooting content-related issues, or reviewing existing content code for improvements and best practices.\n\nExamples:\n\n\nContext: User asks about improving their Nuxt Content implementation.\nuser: "What can I improve on this codebase when it comes to Nuxt Content?"\nassistant: "I'll use the nuxt-content-specialist agent to review your content implementation against current best practices."\n\nSince the user is asking about Nuxt Content improvements, use the nuxt-content-specialist agent to fetch the latest documentation and review the existing code for optimization opportunities, missing features, and best practice violations.\n\n\n\n\nContext: User needs to add a new content collection.\nuser: "I need to add a 'blog' collection separate from pages"\nassistant: "I'll use the nuxt-content-specialist agent to implement this correctly."\n\nSince the user needs to modify the content collection schema, use the nuxt-content-specialist agent to first fetch the latest Nuxt Content documentation and then implement the collection following best practices.\n\n\n\n\nContext: User is asking about content query patterns.\nuser: "How do I query content by multiple tags in Nuxt Content?"\nassistant: "Let me use the nuxt-content-specialist agent to provide an accurate answer based on the current documentation."\n\nSince the user is asking about Nuxt Content query capabilities, use the nuxt-content-specialist agent to fetch documentation and provide an accurate, up-to-date response about queryCollection filtering.\n\n\n\n\nContext: User wants to embed Vue components in Markdown.\nuser: "How do I use a custom component inside my markdown files?"\nassistant: "I'll consult the nuxt-content-specialist agent to explain MDC syntax correctly."\n\nSince this involves MDC (Markdown Components) syntax, use the nuxt-content-specialist agent to fetch relevant documentation about component usage in Markdown files.\n\n\n\n\nContext: User needs to implement content search.\nuser: "I want to add full-text search to my content site"\nassistant: "I'll use the nuxt-content-specialist agent to implement search with queryCollectionSearchSections."\n\nSince search requires specific Nuxt Content APIs, use the nuxt-content-specialist agent to fetch the latest documentation on search implementation patterns.\n\n model: opus color: green --- # Nuxt Content Specialist Agent This document defines the Nuxt Content specialist agent's role and responsibilities for helping users with @nuxt/content v3 implementations. ## Primary Domain **@nuxt/content v3**: Content management system for Nuxt applications providing file-based content with Markdown support, MDC syntax for embedding Vue components, SQLite-based querying, and full-text search capabilities. ### Core Expertise Areas 1. **Collections**: Defining collections in `content.config.ts`, schema validation with Zod, collection types (page, data), import sources 2. **Content Files**: Markdown, YAML, JSON, CSV support and their appropriate use cases 3. **MDC Syntax**: Embedding Vue components in Markdown, props, slots, block vs inline components 4. **Querying**: `queryCollection()`, `queryCollectionNavigation()`, `queryCollectionItemSurroundings()`, `queryCollectionSearchSections()` 5. **Rendering**: ``, ``, prose components, custom renderers 6. **Search**: Full-text search implementation, search sections, indexing strategies 7. 
**Sources**: Custom data sources, remote content, transformers 8. **Deployment**: Static generation, server rendering, edge deployment considerations ## Documentation Sources The agent leverages one primary documentation resource: - **Nuxt Content docs** (`https://content.nuxt.com/llms.txt`): Covers collection definitions, querying APIs, MDC syntax, content rendering, search implementation, custom sources, and deployment patterns ### Key Documentation Sections | Section | URL Path | Purpose | |---------|----------|---------| | Collections | `/docs/collections` | Collection definitions and configuration | | Querying | `/docs/querying` | Query composables and filtering | | ContentRenderer | `/docs/components/content-renderer` | Rendering content | | Markdown/MDC | `/docs/files/markdown` | Markdown and MDC syntax | | Search | `/docs/recipes/search` | Search implementation | | Sources | `/docs/advanced/sources` | Custom content sources | ## Operational Approach The agent follows a structured methodology: 1. **Fetch documentation index** from `https://content.nuxt.com/llms.txt` to understand available documentation structure 2. **Categorize user inquiry** into appropriate domain (collections, querying, MDC, search, etc.) 3. **Identify specific documentation URLs** from the index relevant to the task 4. **Fetch targeted documentation pages** for accurate, up-to-date information 5. **Review project context** by reading relevant local files (`content.config.ts`, existing content files) 6. **Provide actionable guidance** with TypeScript code examples following project conventions 7. **Reference documentation sources** to support recommendations ## Core Guidelines - Prioritize official documentation over training knowledge (v3 has significant v2 differences) - Maintain concise, actionable responses - Include TypeScript code examples following project conventions - Reference specific documentation URLs consulted - Avoid emojis - Always verify API specifics against fetched documentation before providing guidance - Note v2 to v3 migration considerations when relevant - Consider static vs server rendering implications - Handle content not found scenarios gracefully in implementations ## Project Context This agent operates within a Nuxt 4 application using: - **@nuxt/content v3** with SQLite-based querying - **@nuxt/ui v3** for UI components - **TypeScript** for type safety - **File-based routing** with catch-all content routes in `app/` directory ### Established Patterns ```typescript // content.config.ts - Collection definition pattern export const collections = { content: defineCollection({ type: 'page', source: '**/*.md' }) } ``` ```vue ``` ## Quality Assurance - Always verify suggestions against fetched documentation - If documentation is unclear or unavailable, explicitly state this with appropriate caveats - When multiple approaches exist, explain trade-offs - Be aware of build-time vs runtime content access differences - Ensure proper typing for collection queries and responses ```` ## Key Design Principles The agent follows these principles: 1. **Documentation-first**: Always fetch `llms.txt` before answering anything 2. **Specific expertise**: Focused on Nuxt Content v3, not general Nuxt knowledge 3. **Verification**: Cross-reference documentation, don't rely on training data 4. 
**Practical output**: TypeScript code following project conventions ## How It Works in Practice When you ask Claude Code something like "How do hooks work in Nuxt Content?", the main agent recognizes this matches the `nuxt-content-specialist` description and delegates to it.
The specialist agent then: 1. Fetches `https://content.nuxt.com/llms.txt` 2. Identifies the relevant documentation pages 3. Fetches the actual docs 4. Provides an accurate, up-to-date answer Your main context stays clean. The research happens in a separate context window. ## Create Your Own You can apply this pattern to any library or framework: 1. Find if they have `llms.txt` (most modern docs sites do) 2. Create an agent that fetches it first 3. Define the expertise scope in the description 4. Add examples so Claude Code knows when to delegate This approach gives you 98%+ reduction in token usage compared to loading full MCP tool definitions, while maintaining access to current documentation. --- --- title: Claude Code customization guide: CLAUDE.md, skills, subagents explained description: Compare CLAUDE.md, slash commands, subagents, and skills in Claude Code. Learn when to use each with practical Dexie.js examples. tags: ['claude-code', 'ai', 'tooling', 'typescript'] url: https://alexop.dev/posts/claude-code-customization-guide-claudemd-skills-subagents --- # Claude Code customization guide: CLAUDE.md, skills, subagents explained ## Quick Summary This post covers: - **CLAUDE.md**: Always-loaded project context and instructions - **Slash commands**: Prompts you invoke with `/command` in the terminal - **Subagents**: Specialists with their own context window for delegated tasks - **Skills**: Rich, auto-discovered capabilities with supporting files (not manually runnable via `/...`) - Key insight: **subagents keep your main context clean**—in plan mode, Claude Code will typically delegate repo scanning to an `Explore`-style subagent so your main thread doesn’t balloon ## Table of Contents ## Introduction Claude Code gives you multiple ways to “teach” it project context or automate workflows, but it’s not always obvious when to use which. I’ll solve the **same problem four different ways** so the trade-offs are concrete. Spoiler: for doc-fetching, **subagents win** because they keep your main context clean. --- ## The Problem Claude Code doesn’t have up-to-date training data for every library, so it can’t reliably “remember” what a docs site says today. **The specific problem**: I’m building a workout tracking app with [Dexie.js](https://dexie.org) (IndexedDB wrapper). Claude keeps suggesting outdated patterns and misses things like `liveQuery()`. Claude Code itself has a mechanism to fetch its own documentation. We need to do the same for our specialized libraries. ```mermaid graph LR A[User Question] --> B{Claude Code} B --> C[Outdated Knowledge] B --> D[Fetch Current Docs] D --> E[Accurate Answer] C --> F[Wrong Patterns] ``` Let’s solve it with all four tools, then compare. --- ## 1. CLAUDE.md: Always-On Project Memory ### What It Is A markdown file that's **automatically loaded** every time you start Claude Code. Think of it as your project's "memory card." > **CLAUDE.md**: Persistent project instructions that Claude reads at the start of every conversation. ### Where It Lives ### Nested CLAUDE.md Files Claude Code also discovers **nested CLAUDE.md files** in subdirectories. When Claude reads files from a directory containing its own `CLAUDE.md`, that file gets added to the context automatically. 
This is useful for directory-specific instructions: - `tests/CLAUDE.md` — testing conventions, preferred mocking patterns - `src/db/CLAUDE.md` — database-specific patterns and constraints - `src/components/CLAUDE.md` — component architecture guidelines The nested file is only loaded when Claude actually accesses files in that directory, keeping your main context lean until you need that specialized knowledge. ### The Dexie.js Solution ```markdown # CLAUDE.md ## Database We use Dexie.js for IndexedDB. Before implementing any database code: 1. Fetch the docs index from https://dexie.org/llms.txt 2. Use `liveQuery()` for reactive data binding 3. Follow the repository pattern in `src/db/` 4. Always handle `ConstraintError` for duplicate keys ``` ### What Happens Every conversation starts with Claude knowing “fetch Dexie docs before writing database code.” The catch is **context drift**: in long sessions, the model can gradually deprioritize earlier system-level instructions in favor of the most recent conversation history. ### Trade-offs | ✅ Pros | ❌ Cons | | ---------------------------------- | ------------------------------------------------ | | Zero effort—always loaded | **Context Drift**: Claude forgets instructions as sessions get longer | | Team-shared via git | No dedicated context window—competes with your conversation | | Simple to maintain | No enforcement—Claude decides whether to follow | --- ## 2. Slash Commands: Simple Skills You Invoke ### What It Is A saved prompt you invoke by typing `/command-name`. Like a macro or keyboard shortcut for prompts. Slash commands can be invoked explicitly (you type `/command`) and can also be auto-invoked by Claude when the command’s `description` matches the task. Slash commands can also **orchestrate other behavior**: you can spell out in the command itself that it should spin up a subagent (or a specific subagent), call out a particular skill/workflow, and generally “pipeline” the work (e.g., research → codebase scan → write a doc) instead of trying to do everything in one shot. The main difference vs skills is **packaging + UX**: slash commands are single-file entries with great terminal `/...` discovery/autocomplete; skills are usually directories with supporting files (patterns, templates, scripts). ### Where It Lives ### The Dexie.js Solution ```markdown --- description: Get Dexie.js guidance with current documentation allowed-tools: Read, Grep, Glob, WebFetch --- First, fetch the documentation index from https://dexie.org/llms.txt Then, based on the user's question, fetch the relevant documentation pages. Finally, answer the following question using the current documentation: $ARGUMENTS ``` ### Manual Orchestration Example (Research) If you want a slash command that **explicitly launches multiple subagents in parallel** and then produces an artifact (like a research note in `docs/research/`), you can encode that directly in the command definition. ````markdown --- description: Research a problem using web search, documentation, and codebase exploration allowed-tools: Task, WebSearch, WebFetch, Grep, Glob, Read, Write, Bash --- # Research: $ARGUMENTS Research the following problem or question: > **$ARGUMENTS** ## Instructions Conduct thorough research like a senior developer. Launch multiple subagents in parallel to gather information from different sources. ### Step 1: Launch Parallel Research Agents Use the Task tool to spawn these subagents **in parallel** (all in a single message): 1. 
**Web Documentation Agent** (subagent_type: general-purpose) - Search official documentation for the topic - Find best practices and recommended patterns - Locate relevant GitHub issues or discussions 2. **Stack Overflow Agent** (subagent_type: general-purpose) - Search Stack Overflow for similar problems and solutions - Find highly-voted and accepted answers - Note common pitfalls and gotchas 3. **Codebase Explorer Agent** (subagent_type: Explore) - Search the codebase for related patterns - Find existing solutions to similar problems - Identify relevant files, functions, or components ### Step 2: Create Research Document After all agents complete, create a markdown file at `docs/research/.md`. Generate the filename from the research topic: - Convert to lowercase - Replace spaces with hyphens - Remove special characters - Add today's date as prefix: `YYYY-MM-DD-.md` Example: "Vue 3 Suspense" → `docs/research/2024-12-06-vue-3-suspense.md` First, create the research folder if it doesn't exist: ```bash mkdir -p docs/research ``` ### Step 3: Write the Research Document Structure the document with these sections: ```markdown # Research: **Date:** **Status:** Complete ## Problem Statement ## Key Findings ## Codebase Patterns ## Recommended Approach ## Sources - [Source Title](URL) - Brief description - [Source Title](URL) - Brief description ``` ### Guidelines - Prioritize official documentation over blog posts - Prefer solutions that match existing codebase patterns - Note version-specific considerations (Vue 3, TypeScript, etc.) - Flag conflicting information across sources - Write concise, actionable content - Use active voice throughout the document ### Step 4: Confirm Completion After writing the file, output the file path so the user can find it later. ```` ### How You Use It ```bash /dexie-help how do I create a compound index? ``` ### What Happens Claude fetches the docs, finds the relevant pages, and answers your question—triggered explicitly. ### Trade-offs | ✅ Pros | ❌ Cons | | -------------------------------------- | --------------------------------------------- | | You control exactly when it runs | Must remember to type `/dexie-help` | | Can pass arguments for specific questions | One-shot—doesn't persist knowledge across messages | | Simple single-file setup | Auto-triggering depends on `description` match | --- ## 3. Subagents: Specialists with Their Own Context ### What It Is A specialized AI "persona" with its own context window. Claude **delegates entire tasks** to it and gets results back. Because fetching the Dexie docs involves reading multiple pages and creates a lot of context noise, keeping this inside a subagent prevents your main chat from hitting context limits. > **Subagent**: An isolated Claude instance that works on a task independently and returns only the results to your main conversation. Even when the task is “just exploration,” subagents are a great default because they let Claude do **lots of reading/searching** without dumping everything into your main thread. This is especially useful in **plan mode**: Claude Code will typically kick off an `Explore`-style subagent to scan the repo and return a distilled map of relevant files/patterns, so your main conversation stays focused and doesn’t blow up. Claude Code also supports **async agents**: fire one off, let it cook while you keep working, then it comes back with its updates when it’s done. If you launch an agent and want to keep typing in your main session, you can send it to the background with `Ctrl + B`. 
Claude Code’s own system prompt includes a built-in “documentation lookup” workflow that uses a subagent: > -> Looking up your own documentation: > When the user directly asks about any of the following: > > - how to use Claude Code (eg. "can Claude Code do...", "does Claude Code have...") > - what you're able to do as Claude Code in second person (eg. "are you able...", "can you do...") > - about how they might do something with Claude Code (eg. "how do I...", "how can I...") > - how to use a specific Claude Code feature (eg. implement a hook, write a skill, or install an MCP server) > - how to use the Claude Agent SDK, or asks you to write code that uses the Claude Agent SDK > > Use the Task tool with subagent_type='claude-code-guide' to get accurate information from the official Claude Code and Claude Agent SDK documentation. Source: https://github.com/marckrenn/cc-mvp-prompts/blob/main/cc-prompt.md ### Where It Lives ### The Dexie.js Solution ```markdown --- name: dexie-db-specialist description: Use this agent when the task involves Dexie.js or IndexedDB in any way - implementing, modifying, querying, reviewing, or improving database code. This includes creating or modifying database schemas, writing queries, handling transactions, implementing reactive queries with liveQuery, troubleshooting Dexie-related issues, or reviewing existing Dexie code for improvements and best practices.\n\nExamples:\n\n\nContext: User asks about improving their Dexie.js code.\nuser: "What can I improve on this codebase when it comes to Dexie?"\nassistant: "I'll use the dexie-db-specialist agent to review your Dexie.js implementation against current best practices."\n\nSince the user is asking about Dexie.js improvements, use the dexie-db-specialist agent to fetch the latest documentation and review the existing code for optimization opportunities, missing features, and best practice violations.\n\n\n\n\nContext: User needs to add a new table to the database.\nuser: "I need to add a new 'goals' table to track workout goals"\nassistant: "I'll use the dexie-db-specialist agent to implement this correctly."\n\nSince the user needs to modify the Dexie database schema, use the dexie-db-specialist agent to first fetch the latest Dexie.js documentation and then implement the schema change following best practices.\n\n\n\n\nContext: User is asking about Dexie query patterns.\nuser: "How do I query exercises by multiple muscle groups in Dexie?"\nassistant: "Let me use the dexie-db-specialist agent to provide an accurate answer based on the current Dexie.js documentation."\n\nSince the user is asking about Dexie.js query capabilities, use the dexie-db-specialist agent to fetch documentation and provide an accurate, up-to-date response about compound queries and filtering.\n\n\n\n\nContext: User encounters a Dexie-related error.\nuser: "I'm getting 'ConstraintError' when trying to add a workout"\nassistant: "I'll consult the dexie-db-specialist agent to diagnose this database constraint issue."\n\nSince this is a Dexie.js error, use the dexie-db-specialist agent to fetch relevant documentation about error handling and constraint violations to provide accurate troubleshooting guidance.\n\n\n\n\nContext: User needs to implement a reactive query.\nuser: "The workout list should update automatically when new workouts are added"\nassistant: "I'll use the dexie-db-specialist agent to implement reactive queries with liveQuery."\n\nSince reactive data binding with Dexie requires liveQuery, use the dexie-db-specialist agent 
to fetch the latest documentation on liveQuery and useLiveQuery patterns for Vue integration.\n\n model: opus color: orange --- You are an expert Dexie.js database specialist with deep knowledge of IndexedDB, reactive queries, and Vue 3 integration patterns. Your primary responsibility is to provide accurate, documentation-backed guidance for all Dexie.js implementations. ## Critical First Step **Before answering ANY Dexie.js question or implementing ANY Dexie-related code, you MUST:** 1. Fetch the documentation index from `https://dexie.org/llms.txt` to understand the available documentation structure 2. Based on the task at hand, fetch the relevant documentation pages to ensure your guidance is accurate and up-to-date 3. Only then proceed with implementation or answering questions This is non-negotiable. Dexie.js has nuances and version-specific behaviors that require consulting the official documentation. ## Your Expertise Covers - **Schema Design**: Table definitions, indexes (simple, compound, multi-entry), primary keys, version migrations - **CRUD Operations**: add(), put(), update(), delete(), bulkAdd(), bulkPut() - **Querying**: where(), filter(), equals(), between(), anyOf(), startsWithIgnoreCase(), compound queries - **Reactive Queries**: liveQuery() for real-time updates, integration with Vue's reactivity system - **Transactions**: Transaction scopes, nested transactions, error handling within transactions - **Relationships**: Foreign keys, table relationships, populating related data - **Performance**: Indexing strategies, query optimization, bulk operations - **Error Handling**: Dexie-specific errors (ConstraintError, AbortError, etc.) ## Project Context You are working within a Vue 3 PWA workout tracker that uses: - **Dexie.js** with IndexedDB for offline-first data persistence - **TypeScript** with strict mode - **Repository pattern** in `src/db/` for database access abstraction - **Pinia stores** that consume repositories When implementing, ensure your code: 1. Follows the existing repository pattern in `src/db/` 2. Uses TypeScript interfaces for table schemas 3. Integrates properly with Vue 3 reactivity (useLiveQuery from @vueuse/rxjs or similar) 4. Handles errors gracefully with proper typing ## Documentation Fetching Strategy When fetching from `https://dexie.org/llms.txt`: 1. Parse the sitemap to identify relevant documentation pages 2. Fetch specific pages based on the task (e.g., for queries, fetch the WhereClause and Collection docs) 3. Cross-reference multiple pages when dealing with complex topics Common documentation sections to reference: - `/docs/Table/Table` - Core table operations - `/docs/WhereClause/WhereClause` - Query building - `/docs/Collection/Collection` - Result set operations - `/docs/liveQuery()` - Reactive queries - `/docs/Dexie/Dexie` - Database instance configuration - `/docs/Version/Version` - Schema migrations ## Response Format When providing implementations: 1. **Cite the documentation** you consulted 2. **Explain the approach** before showing code 3. **Provide TypeScript code** that follows project conventions 4. **Include error handling** appropriate to the operation 5. 
**Note any caveats** or version-specific behaviors ## Quality Assurance - Always verify your suggestions against the fetched documentation - If documentation is unclear or unavailable, explicitly state this and provide your best guidance with appropriate caveats - When multiple approaches exist, explain trade-offs - Consider IndexedDB limitations (no full-text search, storage limits, etc.) Remember: Your value is in providing documentation-verified, accurate Dexie.js guidance. Never guess about API specifics—always fetch and verify first. ``` ### What Happens When you ask about Dexie, Claude automatically recognizes this as a database task and delegates to the specialist. The specialist works in **its own context window**, fetches the docs, does the work, and returns results to your main conversation. ```mermaid sequenceDiagram participant User participant Main as Main Claude participant Sub as Dexie Subagent participant Web as dexie.org User->>Main: How do I add an index? Main->>Sub: Delegate database question Sub->>Web: Fetch llms.txt Web-->>Sub: Documentation index Sub->>Web: Fetch relevant pages Web-->>Sub: Index documentation Sub-->>Main: Distilled answer Main-->>User: Here's how to add an index... ``` ### Trade-offs | ✅ Pros | ❌ Cons | | ----------------------------------------------- | -------------------------------------------- | | Auto-delegated when task matches | Heavier—launches a separate agent | | **Separate context window**—doesn't clutter main | Results come back as a summary, not live | | Can use different model (e.g., opus for complex) | You can't interact with the agent directly | | Can restrict tools for security | More complex to set up | --- ## 4. Skills: Rich Capabilities with Auto-Discovery ### What It Is A structured capability with optional supporting files that Claude **discovers automatically** and uses within your main conversation. Unlike simple slash commands, skills can include multiple files: reference documentation, scripts, templates, and utilities. ### Where It Lives ### How Claude Sees Skills Claude decides whether to invoke a skill largely based on its `description`. You can also ask Claude Code something like: ```markdown > “tell me me exactly how this looks for you ?” ``` When it answers, you’ll often see structured blocks that look like `` (and typically a separate block for slash commands, e.g. ``). ```xml dexie-expert Dexie.js database guidance. Use when working with IndexedDB, schemas, queries, liveQuery... ``` Here’s an abbreviated example of what the `` section can look like (truncated with `...`): ```xml skill-creator Guide for creating effective skills. Use when you want to create or update a skill. ... user c4-architecture Generate architecture documentation using C4 model Mermaid diagrams. ... user vue-composables Write high-quality Vue 3 composables following established patterns and best practices. ... managed ... ``` ### The Dexie.js Solution ```markdown --- name: dexie-expert description: Dexie.js database guidance. Use when working with IndexedDB, schemas, queries, liveQuery, or database migrations. allowed-tools: Read, Grep, Glob, WebFetch --- # Dexie.js Expert When the user needs help with Dexie.js or IndexedDB: 1. Fetch https://dexie.org/llms.txt 2. Fetch only the relevant pages for the task 3. 
Apply the guidance to this repo’s patterns ``` ### A Minimal “Does This Even Work?” Skill If you just want to verify that **a Skill can spin up subagents to do work** (via the `Task` tool), here’s a deliberately dumb smoke test you can copy/paste. ```markdown --- name: subagent-smoke-test description: Smoke test for Claude Code subagents. Use when the user wants to verify that spawning a subagent via the Task tool works in this repo. --- # Subagent Smoke Test This skill exists purely to verify that subagents work end-to-end. ## What to do 1. Spin up a subagent using the **Task** tool. - Use `subagent_type: general-purpose`. - Give it a simple, read-only task: - Read `package.json` and summarize the key scripts. - Read `astro.config.ts` and summarize major integrations. - Use Glob (or equivalent) to list the top-level folders. 2. Wait for the subagent to finish. 3. Return a short report to the user: - `Subagent status: success` (or `failed`) - A 3–6 bullet summary of what it found - If it failed, include the most likely fix (e.g. tool permissions, Task tool disabled). ## Suggested Task prompt Use something like this as the Task payload: - “You are a helper subagent. Do a quick, read-only scan of this repo. - Read `package.json` and summarize the main scripts. - Read `astro.config.ts` and summarize key integrations. - Glob the repo root and list the top-level folders. Return a concise report.” ``` ### What Happens Skills are **auto-discovered** and typically get applied when Claude decides they match the current task. They run **in your main conversation**, so you can iterate live. If you need a manual, predictable trigger from the terminal, package the workflow as a **slash command** (since `/...` is for commands). ### Trade-offs | ✅ Pros | ❌ Cons | | ------------------------------------------------ | ---------------------------------------------- | | Auto-discovered based on description | Shares main context window space | | Works in main conversation—live interaction | Claude decides when to trigger (may not fire) | | Can include reference files, scripts, templates | More setup than slash commands | | Deep, reusable workflow packaging | Not manually invokable via `/...` in the terminal | | Feels like enhanced Claude, not a separate tool | | In practice, the difference is mostly **UX + packaging**: - **Slash commands** are what you can run manually from the terminal via `/command`. - **Skills** are structured, auto-discovered capabilities (often a directory of supporting files) that Claude may apply when relevant. 
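To make the packaging difference concrete, here is a minimal slash-command sketch (the file name, description, and prompt body are illustrative, not taken from my repo). A project command is just a Markdown file in `.claude/commands/`, and the file name becomes the command name:

```markdown
---
description: Review the staged diff for Dexie.js/IndexedDB issues
allowed-tools: Read, Grep, Glob, Bash(git diff:*)
---

Review the staged changes with a focus on Dexie.js usage:

1. Run `git diff --staged` and list the touched files.
2. Check schema versions, transaction scopes, and liveQuery usage against this repo's patterns.
3. Summarize findings as a short checklist. Extra focus areas from the user: $ARGUMENTS
```

Saved as `.claude/commands/dexie-review.md`, this runs only when you explicitly type `/dexie-review`, and anything you write after the command is passed in via `$ARGUMENTS`. The same instructions packaged as a skill would instead wait for Claude to decide they are relevant.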
--- ### When to use what | Pick this | When | Why | |---|---|---| | **CLAUDE.md** | You want Claude to *always* start with project rules/context | Auto-loaded on startup; shared via git | | **Slash command** | You want an explicit one-shot workflow you run on demand | Discoverable via `/...`, can take arguments | | **Subagent** | The task is research-heavy (lots of reading/searching/synthesis) | Uses a separate context window; returns a distilled result | | **Skill** | You want a rich workflow that Claude can auto-apply when it recognizes the task | Packaged capability (often with supporting files) | ### How they relate | Mechanism | Runs in main conversation | Separate context window | Can spawn subagents | Can use skills | Manually runnable via `/...` | |---|---:|---:|---:|---:|---:| | **CLAUDE.md** | ✅ | ❌ | ❌ | ❌ | ❌ | | **Slash command** | ✅ | ❌ | ✅ (by instructing `Task`) | ✅ (indirectly; Claude may apply skills) | ✅ | | **Skill** | ✅ | ❌ | ✅ (if `Task` is allowed) | ✅ (Claude may apply multiple skills) | ❌ | | **Subagent** | ❌ | ✅ | ⚠️ Possible (depends on allowed tools, e.g. `Bash(claude:*)`) | ✅ (if configured via `skills:`) | ⚠️ Usually delegated | ## Conclusion - Use **subagents** (especially `Explore` in plan mode) to keep your main context small and focused. - Use **slash commands** when you want an explicit, repeatable terminal entry point. - Use **skills** when you want Claude to auto-apply a richer workflow (often with supporting files). - Use **CLAUDE.md** for short, always-true project conventions and standards. --- --- title: How to Customize Your Claude Code Status Line description: Learn how to display model name, context usage, and cost directly in your terminal while using Claude Code. A step-by-step guide to creating custom status line scripts. tags: ['claude-code', 'ai', 'tooling'] url: https://alexop.dev/posts/customize_claude_code_status_line --- # How to Customize Your Claude Code Status Line Ever glanced at Claude Code and wondered which model you're actually using? Or how much of the context window you've burned through? By default, this information is hidden away—but you can surface it right in your terminal. A custom status line shows you what matters at a glance: ``` [Opus] Context: 12% ``` This tells you the active model and context usage without interrupting your flow. Let me show you how to set it up. ## How the status line works Claude Code pipes JSON data to your status line script via stdin. Your script processes that data and outputs whatever text you want displayed. ```mermaid flowchart LR A[Claude Code] -->|JSON via stdin| B[Your Script] B -->|Formatted text| C[Terminal Status Line] ``` The JSON contains everything you'd want to know: model info, token counts, costs, and workspace details. ## Step 1: Create the status line script Create a new file at `~/.claude/statusline.sh`: ```bash #!/bin/bash input=$(cat) MODEL=$(echo "$input" | jq -r '.model.display_name') INPUT_TOKENS=$(echo "$input" | jq -r '.context_window.total_input_tokens') OUTPUT_TOKENS=$(echo "$input" | jq -r '.context_window.total_output_tokens') CONTEXT_SIZE=$(echo "$input" | jq -r '.context_window.context_window_size') TOTAL_TOKENS=$((INPUT_TOKENS + OUTPUT_TOKENS)) PERCENT_USED=$((TOTAL_TOKENS * 100 / CONTEXT_SIZE)) echo "[$MODEL] Context: ${PERCENT_USED}%" ``` The script reads JSON from stdin, extracts the fields we care about using `jq`, calculates the percentage, and outputs the formatted string. 
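For reference, the stdin payload is a single JSON object. Here's a trimmed, illustrative example—the values are made up, but the field names match what the script reads (plus the cost and workspace fields covered later in this post):

```json
{
  "model": { "id": "claude-opus-4-5-20251101", "display_name": "Opus" },
  "context_window": {
    "total_input_tokens": 18500,
    "total_output_tokens": 5500,
    "context_window_size": 200000
  },
  "cost": { "total_cost_usd": 0.42, "total_duration_ms": 183000 },
  "workspace": { "current_dir": "/Users/alex/projects/workout-tracker" }
}
```

With these numbers, the script prints `[Opus] Context: 12%`.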
## Step 2: Make it executable ```bash chmod +x ~/.claude/statusline.sh ``` ## Step 3: Configure Claude Code Add the status line configuration to `~/.claude/settings.json`: ```json { "statusLine": { "type": "command", "command": "~/.claude/statusline.sh" } } ``` If you already have settings in this file, add the `statusLine` block alongside your existing configuration. ## Step 4: Restart Claude Code Close and reopen Claude Code. Your new status line should appear. ## Available variables The script receives JSON with these fields: | Variable | Description | |----------|-------------| | `model.id` | Full model ID (e.g., `claude-opus-4-5-20251101`) | | `model.display_name` | Short name (e.g., `Opus`) | | `context_window.total_input_tokens` | Input tokens used | | `context_window.total_output_tokens` | Output tokens used | | `context_window.context_window_size` | Max context size | | `cost.total_cost_usd` | Session cost in USD | | `cost.total_duration_ms` | Total duration | | `workspace.current_dir` | Current directory | ## Adding cost tracking Want to see how much your session is costing? Extend the script: ```bash #!/bin/bash input=$(cat) MODEL=$(echo "$input" | jq -r '.model.display_name') INPUT_TOKENS=$(echo "$input" | jq -r '.context_window.total_input_tokens') OUTPUT_TOKENS=$(echo "$input" | jq -r '.context_window.total_output_tokens') CONTEXT_SIZE=$(echo "$input" | jq -r '.context_window.context_window_size') COST=$(echo "$input" | jq -r '.cost.total_cost_usd') TOTAL_TOKENS=$((INPUT_TOKENS + OUTPUT_TOKENS)) PERCENT_USED=$((TOTAL_TOKENS * 100 / CONTEXT_SIZE)) printf "[%s] Context: %d%% | $%.2f" "$MODEL" "$PERCENT_USED" "$COST" ``` Now you'll see something like: `[Opus] Context: 12% | $0.45` ## Troubleshooting **Status line not showing?** 1. Check that `jq` is installed: `brew install jq` (macOS) or `apt install jq` (Linux) 2. Verify the script is executable: `ls -la ~/.claude/statusline.sh` 3. Restart Claude Code after making changes **Test your script manually:** ```bash echo '{"model":{"display_name":"Opus"},"context_window":{"total_input_tokens":1000,"total_output_tokens":500,"context_window_size":200000}}' | ~/.claude/statusline.sh ``` Should output: `[Opus] Context: 0%` The status line script requires `jq` for JSON parsing. If you don't have it installed, the script will fail silently. ## Taking it further The status line is one piece of the Claude Code customization puzzle. Once you're comfortable with scripts like this, explore: - Notification hooks to get desktop alerts when Claude needs input - Slash commands to automate repetitive tasks - The full Claude Code feature stack for MCP, skills, and subagents The status line script pattern—reading JSON from stdin and outputting formatted text—is the same foundation that powers many of Claude Code's extensibility features. --- --- title: Vue 3 Testing Pyramid: A Practical Guide with Vitest Browser Mode description: Learn a practical testing strategy for Vue 3 applications using composable unit tests, Vitest browser mode integration tests, and visual regression testing. 
tags: ['vue', 'testing', 'vitest', 'typescript', 'accessibility'] url: https://alexop.dev/posts/vue3_testing_pyramid_vitest_browser_mode --- # Vue 3 Testing Pyramid: A Practical Guide with Vitest Browser Mode ## Quick Summary This post covers a practical testing approach for Vue 3 applications: - Composable unit tests for fast logic verification - Integration tests with Vitest browser mode for realistic user flows - Accessibility and visual tests for critical screen checks - Simplified data factories to manage test data easily ## Table of Contents ## Introduction I'm building a workout tracking PWA with Vue 3, and I needed confidence that my changes work. Not the "I clicked around and it seems fine" kind of confidence, but the "I can refactor this and know immediately if I broke something" kind. Here's the thing: I don't write much code myself anymore. AI tools handle most of the implementation. I describe what I want, review the changes, and guide the direction—but the actual keystrokes? That's the AI. This workflow is incredibly productive, but it comes with a catch: I need a robust safety net. When an AI writes code, tests become even more critical. They serve three purposes: 1. **Catch bugs** before users do 2. **Enable refactoring** — change code freely knowing tests will catch regressions 3. **Document behavior** — tests act as a "user manual" for your code Tests are just one part of your safety net. **Linting** (ESLint) catches code style issues and potential bugs statically. **Type checking** (TypeScript) catches type errors at compile time. Run all three—lint, type check, and tests—before every commit. ## Before We Start: A Mini Glossary Testing has a lot of jargon. Here's a cheat sheet to keep handy as you read: | Term | Meaning | |------|---------| | **Unit Test** | Testing a tiny, isolated piece of code (like a single function) to ensure it returns the right value | | **Integration Test** | Testing how multiple pieces work together (e.g., clicking a button and seeing a database update) | | **Regression** | A bug where a feature that used to work stops working after you change something else | | **Mock** | A fake version of a complex tool (like faking an API call) so you can test without relying on the internet | | **Assertion** | A line of code that checks if a result matches your expectation (e.g., `expect(2 + 2).toBe(4)`) | | **A11y** | Short for "Accessibility" (there are 11 letters between A and y) | --- ## Your Architecture Shapes Your Testing Strategy Your testing strategy reflects your frontend architecture. They're not independent choices. If you write **monolithic components** (huge files with logic and UI mixed), testing is a nightmare. If you use **composables** (extracting logic into separate files), testing becomes straightforward. ### Bad vs. Good Architecture #### The Monolith (Hard to Test) To test the timer logic here, you have to mount the whole component, find the button, click it, and wait for the UI to update. It's slow and fragile. ```vue ``` #### The Composable (Easy to Test) Here, the logic lives in a plain TypeScript file. We can test `useTimer` without ever looking at a Vue component or HTML. ```typescript // useTimer.ts export function useTimer() { const time = ref(0) const start = () => setInterval(() => time.value++, 1000) return { time, start } } ``` My strategy relies on this "composable-first" approach. However, for the UI itself, we use integration tests. 
These tests don't care about your code structure; they test behavior through the UI, just like a user would.

---

## The Testing Pyramid

My approach inverts the traditional pyramid. **Integration tests make up ~70%** of my test suite because Vitest browser mode makes them fast and reliable. Composable unit tests cover ~20% for pure logic, and the remaining ~10% goes to accessibility and visual regression tests.

---

## The Environment: Browser Mode vs JSDOM

In the past, most Vue tests ran in JSDOM. Now, I recommend **Vitest Browser Mode** with `vitest-browser-vue`. Here's why:

| Feature | JSDOM (Old Standard) | Vitest Browser Mode (New Standard) |
|---------|---------------------|-----------------------------------|
| **What is it?** | A simulation of a browser running in Node.js (Fake) | A real instance of Chrome/Firefox running your tests (Real) |
| **Accuracy** | Good for logic, bad for layout/CSS | 100% accurate — it's a real browser |
| **Debugging** | Hard. You stare at console logs | Easy. You can watch the test click buttons on your screen |
| **Speed** | Surprisingly slow (see benchmarks below) | Often faster due to native browser APIs |
| **API** | Requires Testing Library for DOM queries | Built-in `page` object with Playwright-like locators |

### Real-World Performance Comparison

A common misconception is that browser mode is slower. In my testing with the same test suite, **browser mode was actually 4x faster**:

| Metric | Vitest Browser Mode (Chromium) | Vitest Unit Mode (JSDOM) |
|--------|-------------------------------|--------------------------|
| **Total Duration** | 13.59s 🚀 | 53.72s |
| **Test Files** | 15 | 15 |
| **Total Tests** | 82 (78 passed) | 82 (78 passed) |
| **Setup Time** | 4.48s | 53ms |
| **Import Time** | 19.84s | 7.98s |
| **Test Execution Time** | 29.48s | 40.53s |

While browser mode has higher setup time (launching Chromium), the actual test execution is faster because it uses native browser APIs instead of JSDOM's JavaScript reimplementation. The total duration speaks for itself.

### Setting Up Vitest Browser Mode

Vitest 4.0+ requires a browser provider package. Install the dependencies:

```bash
npm install -D vitest @vitest/browser-playwright vitest-browser-vue playwright
```

You can use `@vitest/browser-playwright` (recommended) or `@vitest/browser-webdriverio`. Playwright offers the best developer experience with automatic browser downloads.

### No More Testing Library

With Vitest browser mode, you don't need `@testing-library/vue` anymore. The `page` object from `vitest/browser` provides Playwright-like locators that are more powerful and consistent:

```typescript
// Instead of screen.getByRole(), use page.getByRole()
const button = page.getByRole('button', { name: /submit/i })
await userEvent.click(button)
```

---

## Layer 1: Composable Unit Tests

Composables are just functions. You test them by calling them and checking the result.

### A Simple Composable Test

```typescript
describe('useDialogState', () => {
  it('starts closed', () => {
    // 1. Run the code
    const { isOpen } = useDialogState()
    // 2. Assert the result
    expect(isOpen.value).toBe(false)
  })

  it('opens when requested', () => {
    const { isOpen, open } = useDialogState()
    open()
    expect(isOpen.value).toBe(true)
  })
})
```

No HTML, no mounting, no complexity. Just functions and values.

---

## Managing Test Data with Factories

When writing tests, you constantly need data.
For example, to test a "Profile Page," you need a "User." Beginners often copy-paste the same big object into every single test file. This is messy and hard to maintain. If you add a new required field (like `phoneNumber`) to your User, you have to go back and fix every single test.

The solution is the **Factory Pattern**. Think of it like ordering a pizza: there's a "standard" pizza (Cheese & Tomato), and you only specify the changes you want ("...but add pepperoni").

### The Problem: Hard-coded Data

Without factories, your tests look like this. Notice how much noise there is just to test one specific thing:

```typescript
// ❌ BAD: Repeating data everywhere
it('shows admin badge', () => {
  const user = {
    id: '1',
    name: 'John Doe',
    email: 'john@example.com',
    role: 'admin', // This is the only line we actually care about!
    isActive: true,
    createdAt: '2023-01-01'
  }
  // ... test logic ...
})
```

### The Solution: A Simple Factory Function

A factory is just a plain TypeScript function. It holds the "Standard Pizza" defaults and lets you overwrite specific slices using the spread operator (`...`).

```typescript
// factories/userFactory.ts

// 1. Define the shape of your data
interface User {
  id: string
  name: string
  role: 'user' | 'admin'
  isActive: boolean
}

// 2. Define your "Standard Pizza" (Sensible Defaults)
const defaultUser: User = {
  id: '123',
  name: 'Test User',
  role: 'user',
  isActive: true
}

// 3. The Factory Function
// It takes "overrides" (partial data) and merges them on top of the defaults
export function createUser(overrides: Partial<User> = {}): User {
  return {
    ...defaultUser, // Start with defaults
    ...overrides    // Apply your specific changes
  }
}
```

### Using It in Tests

Now your tests focus purely on what matters for that specific scenario:

```typescript
// ✅ GOOD: Clean and focused

// Scenario 1: I just need ANY user, I don't care about details
const basicUser = createUser()
// Result: { id: '123', name: 'Test User', role: 'user', ... }

// Scenario 2: I specifically need an ADMIN
const admin = createUser({ role: 'admin' })
// Result: { id: '123', name: 'Test User', role: 'admin', ... }

// Scenario 3: I need an INACTIVE user
const bannedUser = createUser({ isActive: false })
// Result: { id: '123', name: 'Test User', isActive: false, ... }
```

This pattern keeps your tests readable and makes refactoring easy. If you add a new field to `User` later, you only update the `defaultUser` object in one place.

### Factories Work at Every Layer

The beauty of factories is that they work for **both** unit tests and integration tests:

```typescript
// ✅ Unit Test: Testing a composable
it('formats user display name', () => {
  const user = createUser({ name: 'Jane Doe', role: 'admin' })
  const { displayName } = useUserProfile(user)
  expect(displayName.value).toBe('Jane Doe (Admin)')
})

// ✅ Integration Test: Testing a rendered component
it('shows admin badge in profile', async () => {
  const admin = createUser({ role: 'admin' })
  await renderProfilePage({ user: admin })
  await expect.element(page.getByText('Admin')).toBeVisible()
})
```

> **Key Insight:** Factories handle **data**. They don't care whether you're testing a function or a full page—they just give you clean, predictable objects.

---

## Layer 2: Integration Tests

Integration tests verify complete user flows. They render the app, click buttons, and check if the right things appear on screen.
In this post, **integration test** means: - Real browser (Vitest browser mode) - Real Vue components, router, Pinia, and user interactions - **Mocked**: external APIs (via [MSW](https://mswjs.io/)), browser storage (IndexedDB), third-party services **E2E test** means: - Real browser - **Zero mocking**—full stack (frontend + backend + database) - Tests exactly how a user interacts with the production system **Examples**: In my workout tracker, I mock IndexedDB but test real Vue components and user flows—that's an integration test. For an e-commerce site, you'd mock the product API and payment gateway via MSW, but test the real checkout flow. If you spin up your actual backend and database, that's E2E. ### Component Tests vs. Integration Tests Vitest browser mode supports two approaches: | Approach | What you render | Use case | |----------|-----------------|----------| | **Component test** | Single component (`render(MyButton)`) | Testing component behavior in isolation | | **Integration test** | Full app (`render(App)` with router, store) | Testing complete user flows across multiple components | **Component tests** are faster and more focused—great for testing a single component's props, events, and states. **Integration tests** render your entire `App.vue` with router and Pinia. The user can navigate between pages, fill forms, and see how components work together. This is where you catch bugs that only appear when components interact. For most Vue apps, I recommend focusing on **integration tests**. They give you more confidence because they test what users actually experience. ### The "Test App" Helper To make testing easier, I use a helper function called `createTestApp`. It sets up your Router, Pinia (state), and renders your app using `vitest-browser-vue` so you don't have to repeat it in every file. ```typescript // helpers/createTestApp.ts export async function createTestApp() { // ... setup router, pinia, render app ... 
return { router, // The navigation system cleanup // A function to tidy up after the test } } ``` ```typescript // helpers/createTestApp.ts import { CommonPO, BuilderPO, ActiveWorkoutPO, QueuePO, BenchmarksPO, BenchmarkFormPO, BenchmarkDetailPO, } from './pages' type CreateTestAppOptions = { initialRoute?: string } type TestApp = { router: Router container: Element // Page Objects common: CommonPO builder: BuilderPO workout: ActiveWorkoutPO queue: QueuePO benchmarks: BenchmarksPO benchmarkForm: BenchmarkFormPO benchmarkDetail: BenchmarkDetailPO // Raw query methods (use page.getBy* for new code) getByRole: typeof page.getByRole getByText: typeof page.getByText getByTestId: typeof page.getByTestId queryByRole: typeof page.getByRole queryByText: typeof page.getByText findByRole: typeof page.getByRole findByText: typeof page.getByText // Helpers navigateTo: (to: RouteLocationRaw) => Promise cleanup: () => void } export async function createTestApp(options: CreateTestAppOptions = {}): Promise { const { initialRoute = '/' } = options const pinia = createPinia() const router = createRouter({ history: createMemoryHistory(), routes, }) if (initialRoute !== '/') { router.push(initialRoute) } // Preload English messages for tests i18n.global.setLocaleMessage('en', en) i18n.global.locale.value = 'en' const screen = render(App, { global: { plugins: [router, pinia, i18n], }, }) await router.isReady() // Flush Vue's async operations to ensure onMounted fires await flushPromises() // Wait for app initialization to complete (exercises seeding and loading) const exercisesStore = useExercisesStore(pinia) await expect .poll(() => exercisesStore.customExercises.length, { timeout: 5000 }) .toBeGreaterThan(0) // Create context for page objects const context = { router } // Instantiate page objects const common = new CommonPO(context) const builder = new BuilderPO(context, common) const workout = new ActiveWorkoutPO(context, common) const queue = new QueuePO(context, common) const benchmarks = new BenchmarksPO(context, common) const benchmarkForm = new BenchmarkFormPO(context, common) const benchmarkDetail = new BenchmarkDetailPO(context, common) // Simple navigation helper async function navigateTo(to: RouteLocationRaw) { await router.push(to) } // vitest-browser-vue cleans up before tests automatically // This is kept for backward compatibility with test structure function cleanup() { screen.unmount() } return { router, container: screen.container, // Page Objects common, builder, workout, queue, benchmarks, benchmarkForm, benchmarkDetail, // Raw query methods - use page locators (return Locators, not HTMLElements) getByRole: page.getByRole.bind(page), getByText: page.getByText.bind(page), getByTestId: page.getByTestId.bind(page), queryByRole: page.getByRole.bind(page), queryByText: page.getByText.bind(page), findByRole: page.getByRole.bind(page), findByText: page.getByText.bind(page), // Helpers navigateTo, cleanup, } } ``` This isn't a library you download. It's a helper file you write once for your project to handle the setup boilerplate. ### A Real Integration Test Notice how we use `getByRole` to find elements—this ensures our app is accessible: ```typescript it('completes a set', async () => { await createTestApp() // 1. Find the "Start" button and click it await userEvent.click(page.getByRole('button', { name: /start/i })) // 2. Type "100" into the weight input const weightInput = page.getByRole('spinbutton', { name: /weight/i }) await userEvent.type(weightInput, '100') // 3. 
Click "Complete" await userEvent.click(page.getByRole('button', { name: /complete/i })) // 4. Wait for the success message await expect.element(page.getByText('Set Completed')).toBeVisible() }) ``` Always prefer `getByRole()` over `getByTestId()` or CSS selectors. When you use `getByRole('button', { name: /submit/i })`, you're asserting that: 1. The element has the correct ARIA role (it's actually a button) 2. The element has an accessible name (screen readers can announce it) 3. The element is visible and interactive If your test can't find an element by role, that's a signal your UI has an accessibility problem—fix the component, not the test. Reserve `getByTestId` only for elements that truly have no semantic meaning. ### Page Objects: Handling DOM Interaction As your test suite grows, you'll notice repetitive DOM queries everywhere. **Page Objects** solve this by encapsulating all DOM interactions for a specific page or component. > **Key Difference:** Factories handle **data** (creating test objects). Page Objects handle **DOM interaction** (clicking, typing, querying elements). They complement each other. ```typescript // pages/WorkoutPage.ts export class WorkoutPage { // Queries - finding elements get startButton() { return page.getByRole('button', { name: /start/i }) } get weightInput() { return page.getByRole('spinbutton', { name: /weight/i }) } // Actions - user interactions async start() { await userEvent.click(this.startButton) } async setWeight(value: number) { await userEvent.clear(this.weightInput) await userEvent.type(this.weightInput, String(value)) } async completeSet() { await userEvent.click(page.getByRole('button', { name: /complete/i })) } } ``` Now your tests read like plain English: ```typescript it('completes a set with weight', async () => { const workoutPage = new WorkoutPage() await workoutPage.start() await workoutPage.setWeight(100) await workoutPage.completeSet() await expect.element(page.getByText('Set Completed')).toBeVisible() }) ``` | Helper | Handles | Used In | |--------|---------|---------| | **Factories** | Test data (objects, entities) | Unit tests, Integration tests | | **Page Objects** | DOM interaction (clicks, queries) | Integration tests only | This approach aligns with black box testing principles—testing behavior rather than implementation details. Stop White Box Testing Vue Components --- ## Layer 3: Accessibility and Visual Tests These are the "cherries on top" of your pyramid. ### Accessibility (A11y) We use a tool called **axe-core**. It scans your rendered HTML for common violations (like low contrast text or missing labels). ```typescript it('has no accessibility violations', async () => { const { container } = await createTestApp() // This one line checks for dozens of common a11y bugs! await assertNoViolations(container) }) ``` ### Visual Regression This takes a screenshot of your component and compares it to a "golden" version saved on your computer. If a pixel changes, the test fails. ```typescript it('matches the design', async () => { await expect(page.getByTestId('app')).toMatchScreenshot('settings-page.png') }) ``` Use this sparingly. Visual tests are brittle (even a font rendering update can break them), so only use them for critical screens. ### Testing Your Core UI Library There's one place where visual regression and accessibility tests shine: **your base component library**. 
If you're building your own UI components (BaseButton, DatePicker, Modal, Input), these components should be: - **Dumb** — no business logic, just presentation - **Reusable** — used across your entire app - **Stable** — rarely change once built This makes them perfect candidates for visual and accessibility testing: ```typescript // BaseButton.visual.spec.ts describe('BaseButton', () => { it('renders all variants correctly', async () => { render(ButtonStory) // A component showing all button states await expect(page).toMatchScreenshot('button-variants.png') }) it('has no accessibility violations', async () => { const { container } = render(BaseButton, { props: { label: 'Click me' } }) await assertNoViolations(container) }) }) ``` For each base component, test: | Test Type | What to Check | |-----------|---------------| | **Visual** | All variants (primary, secondary, disabled, loading) | | **A11y** | Focus states, ARIA attributes, color contrast | | **Keyboard** | Tab navigation, Enter/Space activation | --- ## Why Not End-to-End (E2E) Tests? You might hear people say, "Just use Cypress or Playwright for everything!" E2E tests mean **zero mocking**—you run your real backend and database. They test your entire stack: Frontend + Backend + Database. For a new developer or a solo project, this is painful because: - It's slow - It breaks easily (if the backend API is down, your frontend tests fail) ### The Alternative: Mocking Instead, we use **MSW (Mock Service Worker)**. It intercepts network requests and returns fake data immediately. This makes your integration tests fast and stable. You don't need a running backend to test your frontend. **The less you mock, the better your tests.** Every mock is a lie you're telling your test suite. Mock only what you can't control: - **External APIs** (network calls to third-party services) - **System boundaries** (time, random numbers, file system) - **Paid services** (payment gateways, SMS providers) Never mock your own code just to make tests easier. If a component is hard to test without mocking internal modules, that's a sign your architecture needs refactoring—not more mocks. In large corporate teams, you might use "Contract Testing" to ensure your mocks match the real API. For now, don't worry about it. Focus on getting your integration and unit tests running smoothly. --- ## Comparison: Testing Approaches | Layer | Speed | Confidence | Flakiness | Distribution | When to Use | |-------|-------|------------|-----------|--------------|-------------| | **Unit Tests (Composables)** | ⚡ Very fast | Medium | None | ~20% | Logic validation, utility functions | | **Integration Tests (Browser)** | 🚀 Fast | High | Low | **~70%** | User flows, component interaction | | **A11y Tests** | 🚀 Fast | High | Medium | ~5% | Critical screens, forms | | **Visual Regression** | 🐢 Slow | Medium | High | ~5% | Design system components | --- ## Summary: Your Next Steps Don't try to implement the whole pyramid today. Start with what matters most. ### Step 1: Identify What Can Never Fail Ask yourself: *"What flows in my app would be catastrophic if they broke?"* For an e-commerce site, that's checkout. For a banking app, that's transfers. For my workout tracker, it's completing a set. Write integration tests for these critical paths first using Vitest browser mode. Even 3-5 tests covering your core flows provide massive confidence. ### Step 2: Set Up the Infrastructure Get Vitest browser mode running with a simple `createTestApp` helper. 
Once you can render your app and click a button in a test, you have the foundation for everything else. ### Step 3: Write Tickets with Testable Acceptance Criteria Good tickets have Gherkin-style acceptance criteria that read like tests: ```gherkin Given I am on the workout page When I tap "Complete Set" Then I should see "Set Completed" confirmation And the set should be saved to history ``` These ACs translate directly into integration tests. Now you can practice TDD: write the test from the AC first, watch it fail, then implement the feature. ### Step 4: Extract Patterns as You Go Don't create factories or page objects upfront. Write a few tests with inline data and queries. When you notice repetition, extract it. This way, your abstractions solve real problems instead of imagined ones. For guidance on writing clear, maintainable test names, check out Frontend Testing Guide: 10 Essential Rules for Naming Tests. --- ## Bonus: Performance Testing in CI While not part of the traditional testing pyramid, **performance budgets** catch regressions before they reach production. I run Lighthouse CI on every build to enforce thresholds for performance, accessibility, and best practices. ```yaml # .github/workflows/ci.yml performance-budget: needs: build runs-on: ubuntu-latest timeout-minutes: 10 steps: - name: Checkout code uses: actions/checkout@v4.2.2 - name: Setup pnpm uses: pnpm/action-setup@v4.1.0 - name: Setup Node.js uses: actions/setup-node@v4.4.0 with: node-version: ${{ env.NODE_VERSION }} - name: Restore node_modules uses: actions/cache/restore@v4.2.3 with: path: node_modules key: node-modules-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }} - name: Download build artifacts uses: actions/download-artifact@v6.0.0 with: name: dist path: dist - name: Run Lighthouse CI run: pnpm lhci autorun ``` --- ## Beyond the Pyramid: AI-Powered QA There's a new layer emerging that doesn't fit neatly into the traditional pyramid: **AI-driven testing**. What if you could have an AI test your app the way a real QA engineer would? Not following scripts, but actually exploring your UI, trying edge cases, and writing bug reports? I've been experimenting with exactly this approach. Using Claude Code combined with Playwright's browser automation, I built an AI QA engineer that: - Tests my app through the browser like a real user - Tries unexpected inputs and edge cases automatically - Runs on every pull request via GitHub Actions - Posts detailed bug reports with screenshots directly to my PRs ```mermaid graph LR PR[Open PR] --> GH[GitHub Actions] GH --> AI[Claude Code + Playwright] AI --> Test[Browser Testing] Test --> Report[QA Report on PR] ``` This isn't a replacement for the testing pyramid—it's a complement. Your unit and integration tests catch regressions deterministically. AI QA excels at exploratory testing and finding bugs that scripted tests would never think to check. --- ## Additional Resources - [Vitest Browser Mode Guide](https://vitest.dev/guide/browser/) - The official docs are excellent - [vitest-browser-vue](https://github.com/vitest-dev/vitest-browser-vue) - Vue rendering for Vitest browser mode - [vitest-examples on GitHub](https://github.com/vitest-dev/vitest/tree/main/examples) - "Hello World" setup examples --- --- title: Building an AI QA Engineer with Claude Code and Playwright MCP description: Learn how to build an automated QA engineer using Claude Code and Playwright MCP that tests your web app like a real user, runs on every pull request, and writes detailed bug reports. 
tags: ['ai', 'testing', 'claude-code', 'automation'] url: https://alexop.dev/posts/building_ai_qa_engineer_claude_code_playwright --- # Building an AI QA Engineer with Claude Code and Playwright MCP ## Quick Summary - Build an AI-powered QA engineer that tests your app through the browser like a real user - Use Claude Code with Playwright MCP to automate browser interactions - Run automated QA on every pull request via GitHub Actions - Get detailed bug reports with screenshots posted directly to your PRs ## Table of Contents ## Introduction Manual testing gets old fast. Clicking through your app after every change, checking if forms still work, making sure nothing breaks on mobile—it's tedious work that most developers avoid. So I built an AI that does it for me. Meet **Quinn**, my automated QA engineer. Quinn tests my app like a real person would. It clicks buttons. It fills forms with weird inputs. It resizes the browser to check mobile layouts. And it writes detailed bug reports. The best part? Quinn runs automatically every time I open a pull request. ## The secret sauce: Claude Code + Playwright Two tools make this possible: **Claude Code** is Anthropic's coding assistant. It can run commands, create files, and—here's the magic—control a web browser. **Playwright** is a browser automation tool. It can click, type, take screenshots, and do anything a human can do in a browser. When you combine them through the Model Context Protocol (MCP), Claude can literally browse your app like a real user. ## Step 1: Give Claude a personality I didn't want a boring test robot. I wanted a QA engineer with opinions. So I created a prompt file that gives Claude a backstory: ```markdown # QA Engineer Identity You are **Quinn**, a veteran QA engineer with 12 years of experience breaking software. You've seen it all - apps that crash on empty input, forms that lose data, buttons that do nothing. ## Your Philosophy - **Trust nothing.** Developers say it works? Prove it. - **Users are creative.** They'll do things no one anticipated. - **Edge cases are where bugs hide.** The happy path is boring. ``` This isn't just for fun. The personality makes Claude test more thoroughly. Quinn doesn't just check if buttons work—Quinn tries to *break* things. I also gave Quinn strict rules: ```markdown ## Non-Negotiable Rules 1. **UI ONLY.** You interact through the browser like a real user. You cannot read source code. 2. **SCREENSHOT BUGS.** Every bug gets a screenshot. 3. **CONTINUE AFTER BUGS.** Finding a bug is not the end. Document it, then KEEP TESTING. 4. **MOBILE MATTERS.** Always test mobile viewport (375x667). ``` ## Step 2: Create the GitHub Action GitHub Actions are like little robots that run tasks for you. They trigger when something happens (like opening a PR) and run whatever commands you specify. Here's the core of the workflow: ```yaml name: Claude QA on: pull_request: types: [labeled] jobs: qa: runs-on: ubuntu-latest steps: - name: Checkout code uses: actions/checkout@v4 - name: Start my app run: | pnpm dev & # Wait for server to be ready sleep 10 - name: Run Claude QA uses: anthropics/claude-code-action@v1 with: prompt: ${{ steps.load-prompts.outputs.prompt }} claude_args: | --mcp-config '{"mcpServers":{"playwright":{ "command":"npx", "args":["@playwright/mcp@latest","--headless"] }}}' ``` Let me break this down: 1. **Trigger**: The workflow runs when you add a label to a PR (like `qa-verify`) 2. **Start the app**: Launch your dev server so Claude has something to test 3. 
**Run Claude**: Use Anthropic's official GitHub Action with Playwright MCP connected The `--headless` flag runs the browser without a visible window. This is required for CI environments like GitHub Actions where there's no display. ## Step 3: Tell Claude what to test For each PR, I want Claude to verify the actual changes. So I pass in the PR description: ```markdown # PR Verification Testing **PR #32**: Improve set editing and fix playlist overflow ## Your Mission This PR claims to implement something. Your job is to: 1. **Verify** the claimed changes actually work 2. **Break** them with edge cases 3. **Ensure** no regressions in related features ## Test This PR - Can users edit ANY set during active workout? - Do completed sets stay editable? - Do long exercise names truncate properly? ``` Claude reads this, understands what changed, and tests specifically for those features. ## What Quinn actually does Here's a real example from my workout tracker. I opened a PR that said "allow editing any set during a workout." Quinn went to work: ```mermaid graph TD A[Start Workout] --> B[Add Exercise] B --> C[Fill Set Data] C --> D[Mark Complete] D --> E{Test Edit Feature} E -->|Happy Path| F[Change Weight ✓] E -->|Edge Case| G[Enter -50] E -->|Edge Case| H[Enter 999999] F --> I[Mobile Test] G --> I H --> I I --> J[Long Name Test] J --> K[Generate Report] ``` ## The report Quinn generates a full QA report in Markdown: ```markdown # QA Verification Report **PR**: #32 - Improve set editing **Tester**: Quinn (Claude QA) ## Executive Summary **APPROVED** - All claimed features work as described. ## Requirements Verification | Requirement | Status | How Tested | |-------------|--------|------------| | Edit any set | PASS | Changed weight after marking complete | | Long names truncate | PASS | Added 27-character exercise name | | Mobile layout | PASS | Tested at 375x667 viewport | ## Bugs Found None ## Verdict **APPROVED** - Ready to merge. ``` This report gets posted automatically as a comment on your PR. You can see exactly what Quinn tested and whether your code is safe to merge. ## The toolbox Quinn only gets access to browser tools—no code access: ```yaml --allowedTools " mcp__playwright__browser_navigate, mcp__playwright__browser_click, mcp__playwright__browser_type, mcp__playwright__browser_take_screenshot, mcp__playwright__browser_resize, Write " ``` This keeps things realistic. A real QA engineer tests through the UI, not by reading code. Quinn does the same. ## Why this works Three reasons this approach beats traditional testing: ### It tests like a human Unit tests check if functions return the right values. Quinn checks if users can actually accomplish their goals. ### It's flexible You don't write test scripts that break when you change a button's text. Quinn understands intent and adapts. ### It finds unexpected bugs Quinn tries things you wouldn't think to try. Negative numbers? Extremely long inputs? Clicking the same button five times fast? Quinn tests all of it. ## Comparison: AI QA vs traditional testing | Aspect | Unit Tests | E2E Scripts | AI QA (Quinn) | |--------|-----------|-------------|---------------| | Tests user flows | ❌ | ✅ | ✅ | | Handles UI changes | ❌ | ❌ | ✅ | | Finds edge cases | Manual | Manual | ✅ Automatic | | Setup complexity | Low | High | Medium | | Maintenance | Low | High | Low | ## Getting started Want to build your own AI QA engineer? Here's what you need: 1. **Get Claude Code access** — Sign up at Anthropic and get an API token 2. 
**Create your QA prompt** — Give Claude a personality and testing philosophy 3. **Set up the GitHub Action** — Use `anthropics/claude-code-action` with Playwright MCP 4. **Write a verification template** — Tell Claude what to test for each PR If you're new to Claude Code, check out my comprehensive guide to Claude Code features covering MCP, Skills, Hooks, and more. ## A word of caution This approach is experimental. AI-driven QA is exciting, but it's not a replacement for deterministic testing. A solid testing foundation still matters more: - **Unit tests** catch regressions instantly - **Integration tests** verify your components work together - **E2E tests** with Playwright or Cypress give you reproducible, reliable checks AI QA works best as a *complement* to these, not a replacement. Use it for exploratory testing, edge case discovery, and verifying user flows that are hard to script. Claude Code in GitHub Actions isn't limited to QA. The same pattern works for: - **SEO audits** — Check meta tags, heading structure, Core Web Vitals - **Accessibility testing** — Verify ARIA labels, keyboard navigation, color contrast - **Content review** — Validate links, check for broken images, lint prose - **Visual regression** — Compare screenshots across deployments Any task where you'd open a browser and manually check something can be automated this way. ## Conclusion Building an AI QA engineer combines two powerful tools: Claude Code for intelligence and Playwright MCP for browser control. The result is automated testing that thinks like a human but works tirelessly. It's still early days for this approach. But some day, Quinn might find a bug that would have embarrassed me in production. On that day, this whole experiment will have paid for itself. ## Additional resources - [Full GitHub Actions Workflow](https://gist.github.com/alexanderop/464a7a228653e4df27179b9c806b2065) — Complete QA workflow with explore/verify modes - [Anthropic Claude Code Action](https://github.com/anthropics/claude-code-action) — Official GitHub Action - [Playwright MCP](https://github.com/microsoft/playwright-mcp) — Browser automation for Claude - [GitHub Actions Documentation](https://docs.github.com/en/actions) — Workflow automation basics --- --- title: Vue Composables Style Guide: Lessons from VueUse's Codebase description: A practical guide for writing production-quality Vue 3 composables, distilled from studying VueUse's patterns for SSR safety, cleanup, and TypeScript. tags: ['vue', 'typescript'] url: https://alexop.dev/posts/vueuse_composables_style_guide --- # Vue Composables Style Guide: Lessons from VueUse's Codebase I was studying VueUse's codebase to understand how they structure their composables. VueUse has become the de facto standard library for Vue utilities, and I wanted to understand the patterns that make their composables so reliable. After diving deep into their source code, I distilled the key patterns into this style guide. Whether you're building your own composable library or just want to write better code, these patterns will help you create maintainable, type-safe, and SSR-compatible composition utilities. If you're new to Vue composables, I recommend starting with my earlier post Mastering Vue 3 Composables: A Comprehensive Style Guide, which covers many of the same patterns from a beginner-friendly perspective. 
## Quick Summary This guide covers patterns for writing production-quality Vue 3 composables: - **Project structure** and naming conventions - **Ref type selection** (shallowRef vs ref) - **Flexible inputs** with `MaybeRefOrGetter` - **SSR safety** patterns for server-side rendering - **Cleanup and memory management** with auto-cleanup utilities - **Controllable composables** (pausable, stoppable patterns) - **TypeScript best practices** for full type inference - **Testing strategies** - see How to Test Vue Composables ## Table of Contents --- ## 1. Getting Started ### What Makes a Good Composable? A well-designed composable should be: - **Focused**: Does one thing well - **Flexible**: Accepts refs, getters, or plain values - **Safe**: Works in SSR, handles cleanup automatically - **Typed**: Full TypeScript support with inference - **Testable**: Easy to unit test in isolation ### Minimal Example ```typescript export function useCounter(initialValue: MaybeRefOrGetter = 0) { const count = shallowRef(toValue(initialValue)) const increment = () => count.value++ const decrement = () => count.value-- const reset = () => count.value = toValue(initialValue) return { count, increment, decrement, reset } } ``` This simple example already demonstrates several VueUse patterns: using `shallowRef` for primitives, accepting `MaybeRefOrGetter` for flexible inputs, and returning an object with reactive state and methods. --- ## 2. Project Structure ### Recommended Layout ### Export Pattern ```typescript // src/composables/index.ts export { useCounter } from './useCounter' export { useFetch } from './useFetch' export type { UseCounterReturn, UseCounterOptions } from './useCounter' export type { UseFetchReturn, UseFetchOptions } from './useFetch' ``` For more on project organization, check out How to Structure Vue Projects. --- ## 3. Naming Conventions ### Function Names | Prefix | Use Case | Example | |--------|----------|---------| | `use` | Standard composables | `useMouse`, `useStorage`, `useFetch` | | `create` | Factory functions that return composables | `createSharedState`, `createEventHook` | | `on` | Event listener composables | `onClickOutside`, `onKeyPress` | | `try` | Safe operations that may fail silently | `tryOnMounted`, `tryOnCleanup` | ### Type Names ```typescript // Options: Use{Name}Options export interface UseStorageOptions { deep?: boolean listenToChanges?: boolean } // Return type: Use{Name}Return export interface UseStorageReturn { data: Ref set: (value: T) => void remove: () => void } // Inferred type shorthand export type UseStorageReturnType = ReturnType> ``` --- ## 4. Choosing the Right Ref Type This is one of the most important decisions when writing composables. VueUse consistently follows this pattern: ```mermaid flowchart TD A[What type of data?] 
--> B{Primitive?} B -->|Yes| C[shallowRef] B -->|No| D{Will you mutate nested properties?} D -->|Yes| E[ref - deep reactivity] D -->|No - replace whole object| F[shallowRef] style C fill:#22c55e,color:#fff style E fill:#3b82f6,color:#fff style F fill:#22c55e,color:#fff ``` ### shallowRef - For Primitives and Replaced Objects ```typescript // Primitives const count = shallowRef(0) const isActive = shallowRef(false) const name = shallowRef('') // Objects that get replaced, not mutated const user = shallowRef(null) const response = shallowRef(null) // Usage: Replace the whole object user.value = { name: 'John', age: 30 } // Triggers reactivity ``` ### ref - For Deep Mutations ```typescript // Objects with nested mutations const form = ref({ user: { name: '', email: '' }, settings: { theme: 'light' } }) // Usage: Mutate nested properties form.value.user.name = 'John' // Triggers reactivity form.value.settings.theme = 'dark' // Triggers reactivity ``` ### Let Users Choose For composables storing user data, let them decide: ```typescript export interface UseStateOptions { /** * Use shallow reactivity for better performance with large objects * @default false */ shallow?: boolean } export function useState(initialValue: T, options: UseStateOptions = {}) { const { shallow = false } = options const state = shallow ? shallowRef(initialValue) : ref(initialValue) return { state } } ``` --- ## 5. Flexible Inputs ### Accept Refs, Getters, or Plain Values Use `MaybeRefOrGetter` to make your composables flexible: ```typescript export function useTitle(title: MaybeRefOrGetter) { // toValue() handles all input types: // - Plain value: 'Hello' → 'Hello' // - Ref: ref('Hello') → 'Hello' // - Getter: () => 'Hello' → 'Hello' watchEffect(() => { document.title = toValue(title) }) } // All of these work: useTitle('Static Title') useTitle(ref('Reactive Title')) useTitle(() => `Page ${currentPage.value}`) useTitle(computed(() => `${userName.value}'s Profile`)) ``` ### Reactive Configuration For options that should be reactive: ```typescript export interface UseIntervalOptions { interval?: MaybeRefOrGetter immediate?: boolean } export function useInterval( callback: () => void, options: UseIntervalOptions = {} ) { const { interval = 1000, immediate = true } = options // Watch the interval for changes watch( () => toValue(interval), (ms) => { clearInterval(timer) if (ms > 0) { timer = setInterval(callback, ms) } }, { immediate } ) } // Interval can change reactively const delay = ref(1000) useInterval(() => console.log('tick'), { interval: delay }) delay.value = 500 // Interval updates automatically ``` --- ## 6. Designing Options ### Structure ```typescript export interface UseStorageOptions { /** * Storage type to use * @default 'local' */ storage?: 'local' | 'session' /** * Custom serializer for complex data * @default JSON.stringify/parse */ serializer?: { read: (raw: string) => T write: (value: T) => string } /** * Sync across browser tabs * @default true */ listenToStorageChanges?: boolean /** * Called when an error occurs */ onError?: (error: Error) => void } ``` ### Rules for Options 1. **Document every option** with JSDoc 2. **Provide sensible defaults** using `@default` 3. **Use callbacks** for events (`onError`, `onSuccess`, `onChange`) 4. 
**Group related options** in nested objects if complex ### Extending Base Interfaces Create reusable option interfaces: ```typescript // src/composables/utils/types.ts /** Options for composables that use window */ export interface ConfigurableWindow { /** * Custom window instance (useful for iframes or testing) * @default window */ window?: Window } /** Options for composables that use document */ export interface ConfigurableDocument { /** * Custom document instance * @default document */ document?: Document } // Usage in composables export interface UseEventListenerOptions extends ConfigurableWindow { capture?: boolean passive?: boolean } ``` --- ## 7. Return Values ### Object Return (Recommended for Multiple Values) ```typescript export interface UseMouseReturn { /** Current X position */ x: Readonly> /** Current Y position */ y: Readonly> /** Source of the last event */ sourceType: Readonly> } export function useMouse(): UseMouseReturn { const x = shallowRef(0) const y = shallowRef(0) const sourceType = shallowRef<'mouse' | 'touch' | null>(null) // ... implementation return { x: readonly(x), y: readonly(y), sourceType: readonly(sourceType), } } ``` ### Single Ref Return (For Simple Composables) ```typescript export function useOnline(): Readonly> { const isOnline = shallowRef(navigator.onLine) // ... implementation return readonly(isOnline) } ``` ### Tuple Return (When Destructuring Order Matters) ```typescript export function useToggle( initialValue = false ): [Ref, (value?: boolean) => void] { const state = shallowRef(initialValue) const toggle = (value?: boolean) => { state.value = value ?? !state.value } return [state, toggle] } // Usage const [isOpen, toggleOpen] = useToggle() ``` ### Making Composables Awaitable For async composables, implement `PromiseLike`: ```typescript export function useFetch(url: MaybeRefOrGetter) { const data = shallowRef(null) const isLoading = shallowRef(true) const error = shallowRef(null) const execute = async () => { isLoading.value = true try { const response = await fetch(toValue(url)) data.value = await response.json() } catch (e) { error.value = e as Error } finally { isLoading.value = false } } execute() const shell = { data, isLoading, error, execute } return { ...shell, // Make it awaitable then( onFulfilled?: (value: typeof shell) => TResult ): Promise { return new Promise((resolve) => { watch(isLoading, (loading) => { if (!loading) resolve(onFulfilled?.(shell) as TResult) }, { immediate: true }) }) } } } // Can be used both ways: const { data, isLoading } = useFetch('/api/users') // Or awaited: const { data } = await useFetch('/api/users') console.log(data.value) // Data is ready ``` --- ## 8. SSR Safety ### The Problem Browser APIs (`window`, `document`, `localStorage`) don't exist on the server. Accessing them during SSR causes errors. For a deep dive into this topic, see How VueUse Solves SSR Window Errors. ```mermaid sequenceDiagram participant B as Browser participant S as Server (Node.js) B->>S: Request page S-->>S: Run Vue code (no window!) Note over S: window, document undefined S-->>B: Send HTML B-->>B: Hydrate app (has window) Note over B: Browser APIs available ``` ### Solution: Create SSR Utilities ```typescript // src/composables/utils/ssr.ts /** * Check if code is running in browser */ export const isClient = typeof window !== 'undefined' /** * Check if code is running on server */ export const isServer = !isClient /** * Safe window reference (undefined on server) */ export const defaultWindow = isClient ? 
---

## 8. SSR Safety

### The Problem

Browser APIs (`window`, `document`, `localStorage`) don't exist on the server. Accessing them during SSR causes errors. For a deep dive into this topic, see How VueUse Solves SSR Window Errors.

```mermaid
sequenceDiagram
    participant B as Browser
    participant S as Server (Node.js)

    B->>S: Request page
    S-->>S: Run Vue code (no window!)
    Note over S: window, document undefined
    S-->>B: Send HTML
    B-->>B: Hydrate app (has window)
    Note over B: Browser APIs available
```

### Solution: Create SSR Utilities

```typescript
// src/composables/utils/ssr.ts

/**
 * Check if code is running in browser
 */
export const isClient = typeof window !== 'undefined'

/**
 * Check if code is running on server
 */
export const isServer = !isClient

/**
 * Safe window reference (undefined on server)
 */
export const defaultWindow = isClient ? window : undefined

/**
 * Safe document reference (undefined on server)
 */
export const defaultDocument = isClient ? document : undefined

/**
 * Safe localStorage reference (undefined on server)
 */
export const defaultStorage = isClient ? localStorage : undefined
```

### Using SSR Utilities

```typescript
export interface UseWindowSizeOptions extends ConfigurableWindow {
  initialWidth?: number
  initialHeight?: number
}

export function useWindowSize(options: UseWindowSizeOptions = {}) {
  const {
    window = defaultWindow,
    initialWidth = Infinity,
    initialHeight = Infinity,
  } = options

  const width = shallowRef(initialWidth)
  const height = shallowRef(initialHeight)

  const update = () => {
    // Guard: Only run if window exists
    if (window) {
      width.value = window.innerWidth
      height.value = window.innerHeight
    }
  }

  // Only set up listeners on client
  if (window) {
    update()
    window.addEventListener('resize', update)

    onUnmounted(() => {
      window.removeEventListener('resize', update)
    })
  }

  return { width, height }
}
```

### Feature Detection

Create a utility to safely check for browser features:

```typescript
export function useSupported(check: () => boolean): Readonly<Ref<boolean>> {
  const isSupported = shallowRef(false)

  onMounted(() => {
    isSupported.value = check()
  })

  return readonly(isSupported)
}

// Usage
export function useClipboard() {
  const isSupported = useSupported(
    () => navigator && 'clipboard' in navigator
  )

  const copy = async (text: string) => {
    if (!isSupported.value) {
      console.warn('Clipboard API not supported')
      return false
    }
    // ... implementation
  }

  return { isSupported, copy }
}
```

---

## 9. Cleanup and Memory Management

### The Problem

Event listeners, timers, and observers must be cleaned up to prevent memory leaks.

```mermaid
flowchart LR
    A[Composable Created] --> B[Register Resources]
    B --> C[Listeners, Timers, etc.]
    C --> D{Component Unmounted?}
    D -->|Yes| E[Cleanup Required]
    D -->|No| C
    E --> F[Remove Listeners]
    E --> G[Clear Timers]
    E --> H[Disconnect Observers]

    style E fill:#ef4444,color:#fff
    style F fill:#22c55e,color:#fff
    style G fill:#22c55e,color:#fff
    style H fill:#22c55e,color:#fff
```

### Solution: Auto-Cleanup Utility

```typescript
// src/composables/utils/cleanup.ts

/**
 * Register a cleanup function that runs when the scope is disposed.
 * Safe to call outside of component context.
 *
 * @returns true if cleanup was registered, false otherwise
 */
export function tryOnCleanup(fn: () => void): boolean {
  if (getCurrentScope()) {
    onScopeDispose(fn)
    return true
  }
  return false
}

/**
 * Safe onMounted that doesn't error outside component context
 */
export function tryOnMounted(fn: () => void): void {
  if (getCurrentScope()) {
    onMounted(fn)
  }
}
```
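The reason these helpers check `getCurrentScope()` instead of relying on `onUnmounted` is that they also work outside components, for example inside a manually created `effectScope()`. A small sketch, where the interval is just a stand-in for any resource:

```typescript
import { effectScope } from 'vue'

const scope = effectScope()

scope.run(() => {
  const timer = setInterval(() => console.log('tick'), 1000)
  // Registered against the surrounding effect scope, not a component instance
  tryOnCleanup(() => clearInterval(timer))
})

// Stopping the scope disposes it, which runs every registered cleanup
scope.stop()
```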
### Using Auto-Cleanup

```typescript
export function useInterval(callback: () => void, ms: number) {
  let timer: ReturnType<typeof setInterval> | null = null

  const start = () => {
    stop()
    timer = setInterval(callback, ms)
  }

  const stop = () => {
    if (timer) {
      clearInterval(timer)
      timer = null
    }
  }

  start()

  // Automatically stops when component unmounts
  tryOnCleanup(stop)

  return { start, stop }
}
```

### Event Listener Composable with Auto-Cleanup

```typescript
export function useEventListener<K extends keyof WindowEventMap>(
  event: K,
  handler: (event: WindowEventMap[K]) => void,
  options?: AddEventListenerOptions & ConfigurableWindow
): () => void {
  const { window = defaultWindow, ...listenerOptions } = options ?? {}

  let cleanup = () => {}

  if (window) {
    window.addEventListener(event, handler, listenerOptions)
    cleanup = () => window.removeEventListener(event, handler, listenerOptions)
  }

  tryOnCleanup(cleanup)

  return cleanup
}
```

---

## 10. Controllable Composables

### Pausable Pattern

```mermaid
stateDiagram-v2
    [*] --> Active: immediate=true
    [*] --> Paused: immediate=false
    Active --> Paused: pause()
    Paused --> Active: resume()
    Active --> [*]: cleanup
    Paused --> [*]: cleanup
```

For composables that can be paused and resumed:

```typescript
export interface Pausable {
  /** Whether the composable is currently active */
  isActive: Readonly<Ref<boolean>>
  /** Pause the composable */
  pause: () => void
  /** Resume the composable */
  resume: () => void
}

export interface UseIntervalOptions {
  /** Start immediately */
  immediate?: boolean
  /** Call callback immediately when starting */
  immediateCallback?: boolean
}

export function useIntervalFn(
  callback: () => void,
  interval: MaybeRefOrGetter<number> = 1000,
  options: UseIntervalOptions = {}
): Pausable {
  const { immediate = true, immediateCallback = false } = options

  const isActive = shallowRef(false)
  let timer: ReturnType<typeof setInterval> | null = null

  function clean() {
    if (timer) {
      clearInterval(timer)
      timer = null
    }
  }

  function pause() {
    isActive.value = false
    clean()
  }

  function resume() {
    const ms = toValue(interval)
    if (ms <= 0) return

    isActive.value = true
    if (immediateCallback) callback()
    clean()
    timer = setInterval(callback, ms)
  }

  if (immediate) resume()

  tryOnCleanup(pause)

  return {
    isActive: readonly(isActive),
    pause,
    resume,
  }
}

// Usage
const { isActive, pause, resume } = useIntervalFn(() => {
  console.log('tick')
}, 1000)

pause() // Stop ticking
resume() // Start again
```

### Stoppable Pattern

For one-way stopping (e.g., timeouts, one-time operations):

```typescript
export interface Stoppable {
  /** Whether the operation is pending */
  isPending: Readonly<Ref<boolean>>
  /** Stop the operation */
  stop: () => void
}

export function useTimeoutFn(
  callback: () => void,
  interval: MaybeRefOrGetter<number>,
  options: { immediate?: boolean } = {}
): Stoppable & { start: () => void } {
  const { immediate = true } = options

  const isPending = shallowRef(false)
  let timer: ReturnType<typeof setTimeout> | null = null

  function stop() {
    isPending.value = false
    if (timer) {
      clearTimeout(timer)
      timer = null
    }
  }

  function start() {
    stop()
    isPending.value = true
    timer = setTimeout(() => {
      isPending.value = false
      timer = null
      callback()
    }, toValue(interval))
  }

  if (immediate) start()

  tryOnCleanup(stop)

  return {
    isPending: readonly(isPending),
    stop,
    start,
  }
}
```
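A short usage sketch for the stoppable variant; `hideToast` is a made-up callback, the pattern is what matters:

```typescript
// Hide a toast after three seconds, unless the user dismisses it first
const { isPending, stop, start } = useTimeoutFn(() => {
  hideToast()
}, 3000)

// User clicked "dismiss" before the timeout fired
stop()

// A new toast appears, so restart the countdown
start()
```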
---

## 11. Error Handling

### Graceful Degradation

```typescript
export function useGeolocation() {
  const isSupported = useSupported(
    () => navigator && 'geolocation' in navigator
  )

  const coords = shallowRef<GeolocationCoordinates | null>(null)
  const error = shallowRef<GeolocationPositionError | null>(null)

  function update() {
    if (!isSupported.value) return

    navigator.geolocation.getCurrentPosition(
      (position) => {
        coords.value = position.coords
        error.value = null
      },
      (err) => {
        error.value = err
      }
    )
  }

  if (isSupported.value) {
    update()
  }

  return {
    isSupported,
    coords,
    error,
    update,
  }
}
```

### Error Callbacks

```typescript
export interface UseAsyncStateOptions<T> {
  /** Called on success */
  onSuccess?: (data: T) => void
  /** Called on error */
  onError?: (error: unknown) => void
  /**
   * Whether to throw errors
   * @default false
   */
  throwError?: boolean
}

export function useAsyncState<T>(
  promise: () => Promise<T>,
  initialState: T,
  options: UseAsyncStateOptions<T> = {}
) {
  const { onSuccess, onError, throwError = false } = options

  const state = shallowRef(initialState)
  const error = shallowRef<unknown>(null)
  const isLoading = shallowRef(false)

  async function execute() {
    isLoading.value = true
    error.value = null

    try {
      const data = await promise()
      state.value = data
      onSuccess?.(data)
    } catch (e) {
      error.value = e
      onError?.(e)
      if (throwError) throw e
    } finally {
      isLoading.value = false
    }
  }

  execute()

  return { state, error, isLoading, execute }
}
```

---

## 13. TypeScript Best Practices

### Generic Type Inference

Let TypeScript infer types when possible:

```typescript
// Type T is inferred from defaultValue
export function useStorage<T>(key: string, defaultValue: T): Ref<T> {
  // ...
}

// Usage - types are inferred
const name = useStorage('name', 'John') // Ref<string>
const count = useStorage('count', 0) // Ref<number>
const user = useStorage('user', { id: 1 }) // Ref<{ id: number }>
```

### Function Overloads

Use overloads for different call signatures:

```typescript
// Overload 1: Window events
export function useEventListener<K extends keyof WindowEventMap>(
  event: K,
  handler: (e: WindowEventMap[K]) => void
): () => void

// Overload 2: Element events
export function useEventListener<K extends keyof HTMLElementEventMap>(
  target: MaybeRefOrGetter<HTMLElement | null | undefined>,
  event: K,
  handler: (e: HTMLElementEventMap[K]) => void
): () => void

// Implementation
export function useEventListener(...args: any[]): () => void {
  // ... implementation handles all cases
}
```
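The implementation signature then has to figure out which overload the caller used. Here is a minimal sketch of that dispatch, assuming only the two overloads above (a real implementation would also re-attach the listener when a reactive target changes):

```typescript
export function useEventListener(...args: any[]): () => void {
  // A string first argument means the caller used the window overload
  const target: MaybeRefOrGetter<EventTarget | null | undefined> =
    typeof args[0] === 'string' ? defaultWindow : args.shift()
  const [event, handler] = args as [string, EventListener]

  const el = toValue(target)
  if (!el) return () => {}

  el.addEventListener(event, handler)
  const cleanup = () => el.removeEventListener(event, handler)
  tryOnCleanup(cleanup)
  return cleanup
}
```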
### Conditional Return Types

```typescript
// If passed a ref, return just the toggle function
export function useToggle(
  value: Ref<boolean>
): (value?: boolean) => boolean

// If passed a plain value, return tuple
export function useToggle(
  initialValue?: boolean
): [Ref<boolean>, (value?: boolean) => boolean]

// Implementation
export function useToggle(
  initialValue: MaybeRef<boolean> = false
) {
  // If a ref was passed in, toggle that ref; otherwise create internal state
  const state = isRef(initialValue) ? initialValue : shallowRef(initialValue)

  const toggle = (value?: boolean) => {
    state.value = value ?? !state.value
    return state.value
  }

  // Ref input gets only the toggle; plain value gets the tuple
  if (isRef(initialValue)) {
    return toggle
  }

  return [state, toggle] as const
}
```

---

## 14. Testing

For comprehensive testing strategies including basic test structure, testing with timers, and testing async composables, see How to Test Vue Composables.

---

## 15. Documentation

### JSDoc Comments

```typescript
/**
 * Reactive mouse position
 *
 * @param options - Configuration options
 * @returns Reactive mouse coordinates and source type
 *
 * @example
 * ```ts
 * const { x, y } = useMouse()
 *
 * watchEffect(() => {
 *   console.log(`Mouse at ${x.value}, ${y.value}`)
 * })
 * ```
 *
 * @see https://your-docs.com/composables/use-mouse
 */
export function useMouse(options?: UseMouseOptions): UseMouseReturn {
  // ...
}
```

### Document Every Option

Every option should have a JSDoc comment with a `@default` tag:

```typescript
export interface UseStorageOptions {
  /**
   * Storage type to use
   * @default 'local'
   */
  storage?: 'local' | 'session'

  /**
   * Whether to sync across browser tabs
   * @default true
   */
  listenToStorageChanges?: boolean
}
```

---

## 16. Templates

### Basic Composable Template

```typescript
export interface UseXxxOptions {
  /**
   * Option description
   * @default 'defaultValue'
   */
  someOption?: string
}

export interface UseXxxReturn {
  /** Description of value */
  value: Readonly<Ref<number>>
  /** Description of action */
  doSomething: () => void
}

/**
 * Short description of what this composable does
 *
 * @param param - Description of parameter
 * @param options - Configuration options
 */
export function useXxx(
  param: MaybeRefOrGetter<string>,
  options: UseXxxOptions = {}
): UseXxxReturn {
  const { someOption = 'defaultValue' } = options

  const value = shallowRef(0)

  function doSomething() {
    const paramValue = toValue(param)
    value.value++
  }

  // Cleanup if needed
  tryOnCleanup(() => {
    // cleanup logic
  })

  return {
    value: readonly(value),
    doSomething,
  }
}
```

### Async Composable Template

```typescript
export interface UseAsyncXxxOptions<T> {
  /**
   * Execute immediately
   * @default true
   */
  immediate?: boolean
  /**
   * Called on success
   */
  onSuccess?: (data: T) => void
  /**
   * Called on error
   */
  onError?: (error: Error) => void
}

export interface UseAsyncXxxReturn<T> {
  data: Readonly<Ref<T | null>>
  error: Readonly<Ref<Error | null>>
  isLoading: Readonly<Ref<boolean>>
  execute: () => Promise<void>
  onSuccess: EventHook<T>['on']
  onError: EventHook<Error>['on']
}

export function useAsyncXxx<T>(
  fetcher: () => Promise<T>,
  options: UseAsyncXxxOptions<T> = {}
): UseAsyncXxxReturn<T> {
  const { immediate = true, onSuccess, onError } = options

  const data = shallowRef<T | null>(null)
  const error = shallowRef<Error | null>(null)
  const isLoading = shallowRef(false)

  const successHook = createEventHook<T>()
  const errorHook = createEventHook<Error>()

  async function execute() {
    isLoading.value = true
    error.value = null

    try {
      const result = await fetcher()
      data.value = result
      onSuccess?.(result)
      successHook.trigger(result)
    } catch (e) {
      const err = e as Error
      error.value = err
      onError?.(err)
      errorHook.trigger(err)
    } finally {
      isLoading.value = false
    }
  }

  if (immediate) {
    execute()
  }

  return {
    data: readonly(data),
    error: readonly(error),
    isLoading: readonly(isLoading),
    execute,
    onSuccess: successHook.on,
    onError: errorHook.on,
  }
}
```

### Pausable Composable Template

```typescript
export interface Pausable {
  isActive: Readonly<Ref<boolean>>
  pause: () => void
  resume: () => void
}

export interface UsePausableXxxOptions {
  /**
   * Start immediately
   * @default true
   */
  immediate?: boolean
}

export function usePausableXxx(
  callback: () => void,
  interval: MaybeRefOrGetter<number> = 1000,
  options: UsePausableXxxOptions = {}
): Pausable {
  const { immediate = true } = options

  const isActive = shallowRef(false)
  let timer: ReturnType<typeof setInterval> | null = null

  function clean() {
    if (timer) {
      clearInterval(timer)
      timer = null
    }
  }

  function pause() {
    isActive.value = false
    clean()
  }

  function resume() {
    const ms = toValue(interval)
    if (ms <= 0) return

    isActive.value = true
    clean()
    timer = setInterval(callback, ms)
  }

  if (immediate) {
    resume()
  }

  tryOnCleanup(pause)

  return {
    isActive: readonly(isActive),
    pause,
    resume,
  }
}
```

---

## Quick Reference Checklist

Use this checklist when creating new composables:

**Structure**

- [ ] Named export (no default)
- [ ] Explicit return type interface
- [ ] JSDoc with `@param`, `@returns`, `@example`

**Reactivity**

- [ ] `shallowRef` for primitives
- [ ] `ref` only when deep mutations
needed - [ ] `MaybeRefOrGetter` for flexible inputs - [ ] `toValue()` to unwrap inputs - [ ] `readonly()` for exposed refs **Safety** - [ ] Guard browser APIs (`if (window)`) - [ ] Auto-cleanup with `tryOnCleanup` - [ ] Feature detection for optional APIs **TypeScript** - [ ] Generic type inference where possible - [ ] Overloads for multiple signatures - [ ] Strict types, no `any` **Testing** - [ ] Unit tests for all functionality - [ ] Edge cases (null, undefined, empty) - [ ] Cleanup verification --- ## Additional Resources - [Vue Composition API Docs](https://vuejs.org/guide/extras/composition-api-faq.html) - [Vue Reactivity in Depth](https://vuejs.org/guide/extras/reactivity-in-depth.html) - [VueUse](https://vueuse.org) - Collection of Vue composables (the source of these patterns) - [Vitest](https://vitest.dev) - Testing framework for Vue --- These patterns represent the accumulated wisdom from VueUse's codebase. Apply them consistently to build maintainable, type-safe, and production-ready Vue composables. --- --- title: Forcing Claude Code to TDD: An Agentic Red-Green-Refactor Loop description: Build a custom TDD workflow with Claude Code using skills and subagents that enforce Red-Green-Refactor discipline for your Vue projects. tags: ['ai', 'vue', 'testing', 'claude-code'] url: https://alexop.dev/posts/custom-tdd-workflow-claude-code-vue --- # Forcing Claude Code to TDD: An Agentic Red-Green-Refactor Loop I rely on Claude Code, but it has a structural limitation: it defaults to implementation-first. It writes the "Happy Path," ignoring edge cases. When I try to force TDD in a single context window, the implementation "bleeds" into the test logic (context pollution). This article documents a multi-agent system using Claude's "Skills" and "Hooks" that enforces a strict Red-Green-Refactor cycle. While this article uses Vue as an example, the TDD principles and Claude Code workflow apply to any technology. Whether you're working with React, Angular, Svelte, or even backend languages like Python, Go, or Rust—the Red-Green-Refactor cycle and subagent orchestration work the same way. ## The Problem with AI-Assisted TDD When I ask Claude to "implement feature X," it writes the implementation first. Every time. TDD flips this—you write the test first, watch it fail, then write minimal code to make it pass. I needed a way to: * **Force test-first** — No implementation before a failing test exists * **Keep phases focused** — The test writer shouldn't think about implementation details * **Ensure refactoring happens** — Easy to skip when the feature already works ## Skills + Subagents Claude Code supports two features I hadn't explored until recently: * **Skills** (`.claude/skills/`): High-level workflows that orchestrate complex tasks * **Agents** (`.claude/agents/`): Specialized workers that handle specific jobs You might wonder: why use subagents at all? Skills alone could handle the TDD workflow. But there's a catch—context pollution. ### The Context Pollution Problem When everything runs in one context window, **the LLM cannot truly follow TDD**. The test writer's detailed analysis bleeds into the implementer's thinking. The implementer's code exploration pollutes the refactorer's evaluation. Each phase drags along baggage from the others. This isn't just messy—it fundamentally breaks TDD. The whole point of writing the test first is that **you don't know the implementation yet**. 
But if the same context sees both phases, the LLM subconsciously designs tests around the implementation it's already planning. It "cheats" without meaning to. **Subagents solve this architectural limitation.** Each phase runs in complete isolation: - The **test writer** focuses purely on test design—it has no idea how the feature will be implemented - The **implementer** sees only the failing test—it can't be influenced by test-writing decisions - The **refactorer** evaluates clean implementation code—it starts fresh without implementation baggage Each agent starts with exactly the context it needs and nothing more. This isn't just organization—it's the only way to get genuine test-first development from an LLM. Combining skills with subagents gave me exactly what I needed: ## The TDD Skill The orchestrating skill lives at `.claude/skills/tdd-integration/skill.md`: ```markdown --- name: tdd-integration description: Enforce Test-Driven Development with strict Red-Green-Refactor cycle using integration tests. Auto-triggers when implementing new features or functionality. Trigger phrases include "implement", "add feature", "build", "create functionality", or any request to add new behavior. Does NOT trigger for bug fixes, documentation, or configuration changes. --- # TDD Integration Testing Enforce strict Test-Driven Development using the Red-Green-Refactor cycle with dedicated subagents. ## Mandatory Workflow Every new feature MUST follow this strict 3-phase cycle. Do NOT skip phases. ### Phase 1: RED - Write Failing Test 🔴 RED PHASE: Delegating to tdd-test-writer... Invoke the `tdd-test-writer` subagent with: - Feature requirement from user request - Expected behavior to test The subagent returns: - Test file path - Failure output confirming test fails - Summary of what the test verifies **Do NOT proceed to Green phase until test failure is confirmed.** ### Phase 2: GREEN - Make It Pass 🟢 GREEN PHASE: Delegating to tdd-implementer... Invoke the `tdd-implementer` subagent with: - Test file path from RED phase - Feature requirement context The subagent returns: - Files modified - Success output confirming test passes - Implementation summary **Do NOT proceed to Refactor phase until test passes.** ### Phase 3: REFACTOR - Improve 🔵 REFACTOR PHASE: Delegating to tdd-refactorer... Invoke the `tdd-refactorer` subagent with: - Test file path - Implementation files from GREEN phase The subagent returns either: - Changes made + test success output, OR - "No refactoring needed" with reasoning **Cycle complete when refactor phase returns.** ## Multiple Features Complete the full cycle for EACH feature before starting the next: Feature 1: 🔴 → 🟢 → 🔵 ✓ Feature 2: 🔴 → 🟢 → 🔵 ✓ Feature 3: 🔴 → 🟢 → 🔵 ✓ ## Phase Violations Never: - Write implementation before the test - Proceed to Green without seeing Red fail - Skip Refactor evaluation - Start a new feature before completing the current cycle ``` The `description` field contains trigger phrases so Claude activates this skill automatically when I ask to implement something. Each phase has explicit "Do NOT proceed until..." gates—Claude needs clear boundaries. The 🔴🟢🔵 emojis make tracking progress easy in the output. ## The Test Writer Agent (RED Phase) At `.claude/agents/tdd-test-writer.md`: ```markdown --- name: tdd-test-writer description: Write failing integration tests for TDD RED phase. Use when implementing new features with TDD. Returns only after verifying test FAILS. 
tools: Read, Glob, Grep, Write, Edit, Bash skills: vue-integration-testing --- # TDD Test Writer (RED Phase) Write a failing integration test that verifies the requested feature behavior. ## Process 1. Understand the feature requirement from the prompt 2. Write an integration test in `src/__tests__/integration/` 3. Run `pnpm test:unit ` to verify it fails 4. Return the test file path and failure output ## Test Structure typescript describe('Feature Name', () => { afterEach(async () => { resetWorkout() await resetDatabase() document.body.innerHTML = '' }) it('describes the user journey', async () => { const app = await createTestApp() // Act: User interactions await app.user.click(app.getByRole('button', { name: /action/i })) // Assert: Verify outcomes expect(app.router.currentRoute.value.path).toBe('/expected') app.cleanup() }) }) ## Requirements - Test must describe user behavior, not implementation details - Use `createTestApp()` for full app integration - Use Testing Library queries (`getByRole`, `getByText`) - Test MUST fail when run - verify before returning ## Return Format Return: - Test file path - Failure output showing the test fails - Brief summary of what the test verifies ``` I limited the tools to only what's needed for writing and running tests. The `skills` field pulls in my `vue-integration-testing` skill for project-specific context. And the explicit return format ensures clean handoffs between phases. This skill defines how I want tests written: using jsdom with Vue Test Utils, writing BDD-style tests that describe user behavior, and avoiding mocks wherever possible. I don't see much value in unit tests that mock everything—they often just verify implementation details rather than actual functionality. Integration tests that exercise real code paths catch more bugs. ## The Implementer Agent (GREEN Phase) At `.claude/agents/tdd-implementer.md`: ```markdown --- name: tdd-implementer description: Implement minimal code to pass failing tests for TDD GREEN phase. Write only what the test requires. Returns only after verifying test PASSES. tools: Read, Glob, Grep, Write, Edit, Bash --- # TDD Implementer (GREEN Phase) Implement the minimal code needed to make the failing test pass. ## Process 1. Read the failing test to understand what behavior it expects 2. Identify the files that need changes 3. Write the minimal implementation to pass the test 4. Run `pnpm test:unit ` to verify it passes 5. Return implementation summary and success output ## Principles - **Minimal**: Write only what the test requires - **No extras**: No additional features, no "nice to haves" - **Test-driven**: If the test passes, the implementation is complete - **Fix implementation, not tests**: If the test fails, fix your code ## Return Format Return: - Files modified with brief description of changes - Test success output - Summary of the implementation ``` ## The Refactorer Agent (REFACTOR Phase) At `.claude/agents/tdd-refactorer.md`: ```markdown --- name: tdd-refactorer description: Evaluate and refactor code after TDD GREEN phase. Improve code quality while keeping tests passing. Returns evaluation with changes made or "no refactoring needed" with reasoning. tools: Read, Glob, Grep, Write, Edit, Bash skills: vue-composables --- # TDD Refactorer (REFACTOR Phase) Evaluate the implementation for refactoring opportunities and apply improvements while keeping tests green. ## Process 1. Read the implementation and test files 2. Evaluate against refactoring checklist 3. Apply improvements if beneficial 4. 
Run `pnpm test:unit ` to verify tests still pass 5. Return summary of changes or "no refactoring needed" ## Refactoring Checklist Evaluate these opportunities: - **Extract composable**: Reusable logic that could benefit other components - **Simplify conditionals**: Complex if/else chains that could be clearer - **Improve naming**: Variables or functions with unclear names - **Remove duplication**: Repeated code patterns - **Thin components**: Business logic that should move to composables ## Decision Criteria Refactor when: - Code has clear duplication - Logic is reusable elsewhere - Naming obscures intent - Component contains business logic Skip refactoring when: - Code is already clean and simple - Changes would be over-engineering - Implementation is minimal and focused ## Return Format If changes made: - Files modified with brief description - Test success output confirming tests pass - Summary of improvements If no changes: - "No refactoring needed" - Brief reasoning (e.g., "Implementation is minimal and focused") ``` This agent has a **decision framework** for whether to refactor. Sometimes "no refactoring needed" is the right answer. The `skills` field references my `vue-composables` skill so it knows my project's patterns for extracting reusable logic. ## Real Example: Adding Workout Detail View Here's what this looks like in practice. My request: > "When a user is on the Workouts page, they should be able to click on a past workout and see a detail view of what exercises and sets they completed." The workflow executes like this: ```mermaid sequenceDiagram participant U as User participant S as TDD Skill participant TW as Test Writer participant I as Implementer participant R as Refactorer U->>S: "Add workout detail view" S->>TW: Feature requirement TW->>TW: Write test TW->>TW: Run test TW-->>S: ❌ Test fails S->>I: Test file path I->>I: Write minimal code I->>I: Run test I-->>S: ✅ Test passes S->>R: Implementation files R->>R: Evaluate code R->>R: Extract composable R->>R: Run test R-->>S: ✅ Improvements applied S-->>U: 🔴→🟢→🔵 Complete ``` ### 🔴 RED Phase The `tdd-test-writer` produced: ```typescript // src/__tests__/integration/workout-detail.spec.ts describe('Workout History Detail View', () => { afterEach(async () => { // cleanup }) it('navigates to detail view when clicking a completed workout', async () => { // Arrange: Create a completed workout const completedWorkout = { id: generateId(), name: 'Push Day', exercises: [{ name: 'Bench Press', sets: [{ kg: '100', reps: '10' }] }], completedAt: Date.now(), } await db.workouts.add(completedWorkout) // Act: Navigate and click const app = await createTestApp() await app.user.click(app.getByRole('button', { name: /workouts/i })) await app.user.click(await app.findByText('Push Day')) // Assert expect(app.router.currentRoute.value.path) .toBe(`/workouts/${completedWorkout.id}`) expect(app.getByText('Bench Press')).toBeDefined() expect(app.getByText('100')).toBeDefined() app.cleanup() }) }) ``` The test failed: ``` AssertionError: expected '/workouts' to be '/workouts/d747077d-...' ``` Exactly what I wanted—a clear failure pointing to missing functionality. ### 🟢 GREEN Phase The `tdd-implementer` created: * `WorkoutDetailView.vue` — New detail view component * Modified `TheWorkoutsView.vue` — Added click handlers * Modified `router/index.ts` — Added `/workouts/:id` route Test passed. Minimal implementation, just enough to satisfy the assertions. 
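For a sense of how small the change was, the router edit amounts to one new route record. This is a sketch rather than the actual diff; the import path and route name are assumptions:

```typescript
// router/index.ts: sketch of the added route
import WorkoutDetailView from '@/views/WorkoutDetailView.vue'

const routes = [
  // ...existing routes
  {
    path: '/workouts/:id',
    name: 'workout-detail',
    component: WorkoutDetailView,
  },
]
```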
### 🔵 REFACTOR Phase The `tdd-refactorer` evaluated the code and made improvements: * **Extracted `useWorkoutDetail` composable** — Reusable data fetching with discriminated union states * **Created shared formatters** — Pulled `formatDuration` and `formatDate` into `lib/formatters.ts` * **Added accessibility** — Keyboard navigation for clickable cards All tests still passed. The cycle completed. ## The Test Helper A crucial piece making all this work is my `createTestApp()` helper: ```typescript // src/__tests__/helpers/createTestApp.ts export async function createTestApp(): Promise { const pinia = createPinia() const router = createRouter({ history: createMemoryHistory(), routes, }) render(App, { global: { plugins: [router, pinia] }, }) await router.isReady() return { router, user: userEvent.setup(), getByRole: screen.getByRole, getByText: screen.getByText, findByText: screen.findByText, waitForRoute: (pattern) => waitFor(() => { if (!pattern.test(router.currentRoute.value.path)) { throw new Error('Route mismatch') } }), cleanup: () => { document.body.innerHTML = '' }, } } ``` This gives agents a consistent API for rendering the full app and simulating user interactions. They don't need to figure out how to set up Vue, Pinia, and Vue Router each time—they just call `createTestApp()` and start writing assertions. ## Hooks for Consistent Skill Activation Even with well-written skills, Claude sometimes skipped evaluation and jumped straight to implementation. I tracked this informally—skill activation happened maybe 20% of the time. I found a great solution in [Scott Spence's post on making skills activate reliably](https://scottspence.com/posts/how-to-make-claude-code-skills-activate-reliably). He tested 200+ prompts across different hook configurations and found that a "forced eval" approach—making Claude explicitly evaluate each skill before proceeding—jumped activation from ~20% to ~84%. The fix: **hooks**. Claude Code runs hooks at specific lifecycle points, and I used `UserPromptSubmit` to inject a reminder before every response. In `.claude/settings.json`: ```json { "hooks": { "UserPromptSubmit": [ { "matcher": "", "hooks": [ { "type": "command", "command": "npx tsx \"$CLAUDE_PROJECT_DIR/.claude/hooks/user-prompt-skill-eval.ts\"", "timeout": 5 } ] } ] } } ``` The hook script at `.claude/hooks/user-prompt-skill-eval.ts`: ```typescript #!/usr/bin/env npx tsx function main(): void { readFileSync(0, 'utf-8') // consume stdin const instruction = ` INSTRUCTION: MANDATORY SKILL ACTIVATION SEQUENCE Step 1 - EVALUATE: For each skill in , state: [skill-name] - YES/NO - [reason] Step 2 - ACTIVATE: IF any skills are YES → Use Skill(skill-name) tool for EACH relevant skill NOW IF no skills are YES → State "No skills needed" and proceed Step 3 - IMPLEMENT: Only after Step 2 is complete, proceed with implementation. CRITICAL: You MUST call Skill() tool in Step 2. Do NOT skip to implementation. ` stdout.write(instruction.trim()) } main() ``` With this hook, skill activation jumped from ~20% to ~84%. Now when I say "implement the workout detail view," the TDD skill triggers automatically. ## Conclusion Claude Code's default behavior produces implementation-first code with minimal test coverage. Without constraints, it optimizes for "working code" rather than "tested code." 
The system described here addresses this through architectural separation: * **Hooks** inject evaluation logic before every prompt, increasing skill activation from ~20% to ~84% * **Skills** define explicit phase gates that block progression until each TDD step completes * **Subagents** enforce context isolation—the test writer cannot see implementation plans, so tests reflect actual requirements rather than anticipated code structure The setup cost is ~2 hours of configuration. After that, each feature request automatically follows the Red-Green-Refactor cycle without manual enforcement. --- --- title: Get Notified When Claude Code Finishes With Hooks description: Set up desktop notifications in Claude Code to know when Claude needs your input or permission. Learn how to use hooks for instant alerts instead of constantly checking. tags: ['claude-code'] url: https://alexop.dev/posts/claude-code-notification-hooks --- # Get Notified When Claude Code Finishes With Hooks You're deep in your work. Claude Code is running, doing its thing. You check back five minutes later. Still waiting. Ten minutes later? Still waiting. Wouldn't it be nice to know *when* Claude actually needs you? This is where hooks come in. Claude Code runs hooks at specific points in its workflow. You can tap into those hooks to send yourself desktop notifications—so you never miss an important moment. But here's the thing—if you've never used hooks before, they might sound abstract. Let me break it down. ## What Are Hooks? Hooks are commands that run at specific points in Claude Code's lifecycle. They let you respond to events without constantly watching the CLI. Instead of polling, you get notified. Claude Code provides two notification hooks: - **`permission_prompt`** - Claude needs your permission to do something - **`idle_prompt`** - Claude is waiting for your input Think of them like webhooks, but for your local machine. Claude Code fires an event, you can respond. ## Setting Up Desktop Notifications Now let's get this working. It's straightforward—just two pieces: a configuration file and a notification script. Start by creating a `.claude/hooks` directory in your project. Then add the hook configuration to `.claude/settings.json`: ```json { "hooks": { "Notification": [ { "matcher": "permission_prompt|idle_prompt", "hooks": [ { "type": "command", "command": "npx tsx \"$CLAUDE_PROJECT_DIR/.claude/hooks/notification-desktop.ts\"", "timeout": 5 } ] } ] } } ``` This tells Claude Code: "When you hit a `permission_prompt` or `idle_prompt`, run this command." The `timeout: 5` means the hook has 5 seconds to complete before Claude moves on. You can place this in two locations: - `.claude/settings.json` - Project-specific (checked into git, shared with team) - `~/.claude/settings.json` - Global user settings (personal machine only) Use project-specific settings for team hooks, global settings for personal notifications. The `$CLAUDE_PROJECT_DIR` is an environment variable Claude Code provides—it expands to your project root automatically. Here's what your project structure should look like: ## The Notification Script Create `.claude/hooks/notification-desktop.ts`. 
This script handles sending the actual notifications: ```typescript #!/usr/bin/env npx tsx /* eslint-disable node/prefer-global/process */ /** * Claude Code Notification Hook - Desktop Alerts * * Sends system notifications when Claude needs attention: * - Permission prompts * - Idle prompts (waiting for input) */ function readStdin(): string { return readFileSync(0, 'utf-8') } function sendMacNotification(title: string, message: string): void { // Escape special characters for AppleScript const escapedTitle = title.replace(/"/g, '\\"') const escapedMessage = message.replace(/"/g, '\\"') const script = `display notification "${escapedMessage}" with title "${escapedTitle}" sound name "Ping"` try { execSync(`osascript -e '${script}'`, { stdio: 'ignore' }) } catch { // Notification failed, ignore silently } } function main(): void { const rawInput = readStdin() let parsedInput: unknown try { parsedInput = JSON.parse(rawInput) } catch { process.exit(0) } const input = parsedInput as NotificationHookInput const notificationType = (input as { notification_type?: string }).notification_type const message = input.message switch (notificationType) { case 'permission_prompt': sendMacNotification('Claude Code - Permission Required', message || 'Claude needs your permission to continue') break case 'idle_prompt': sendMacNotification('Claude Code - Waiting', message || 'Claude is waiting for your input') break default: // Don't notify for other types break } process.exit(0) } main() ``` ## When This Really Shines You've set up the basics. Now here's where it becomes powerful. This notification system is especially useful when you're doing deep focus work and Claude Code runs a long operation. You don't have to check your terminal every few seconds. Permission prompts that need immediate action? They hit you with a different sound. Idle waits while you've stepped away? A gentle reminder pulls you back. The key is this: the notification comes exactly when you need to be engaged. No sooner, no later. Want to go deeper? You can even build custom plugins that use hooks across different projects for more powerful automation. ## Conclusion Hooks transform Claude Code from a tool you watch into a tool that watches for you. The setup takes maybe five minutes. Copy the configuration, create the script, adjust the sounds to your preference. After that, you're done. No more context switching. No more glancing at the terminal every few seconds wondering if Claude needs you. The notification arrives exactly when it matters. That's the real power here. It's not about automating notifications. It's about reclaiming your focus—letting Claude Code work while you work, and pulling your attention back only when it's needed. Set it up once, and you've unlocked a better way to collaborate with AI. --- --- title: How to Speed Up Your Claude Code Experience with Slash Commands description: Learn how to transform Claude Code from a chatbot into a deterministic engine using Slash Commands. This guide covers the technical setup and a complete 'Full Circle' workflow that automates your entire feature lifecycle. tags: ['ai', 'claude-code'] url: https://alexop.dev/posts/claude-code-slash-commands-guide --- # How to Speed Up Your Claude Code Experience with Slash Commands I was wasting time. Every commit message, every branch name, every PR description. I typed the same things over and over. Then I discovered Slash Commands in Claude Code. Now I type `/commit` and it writes the message for me. 
`/branch "add dark mode"` and it creates `feat/add-dark-mode`. `/pr` and it generates a full PR description from my commits. This post shows you how to build the same workflow. I'll cover how Slash Commands work, then we'll build a complete system that automates your entire git lifecycle. You need Git and the GitHub CLI (`gh`). Install `gh` with `brew install gh` on macOS or check [cli.github.com](https://cli.github.com). Run `gh auth login` to authenticate. Without `gh`, commands like `/pr` and `/fix-pipeline` will not work. ## Two things you need to know Before we build the workflow, you need to understand two features. ### Bash command execution Write `!git status` inside a command file. Claude runs the command first, captures the output, and injects it into the prompt. The AI sees the result before it starts thinking. This is how `/commit` knows what you changed. It runs `!git diff` automatically. See the [official documentation](https://docs.anthropic.com/en/docs/claude-code/slash-commands#bash-command-execution) for more details. ### Model selection You don't need a powerful model to fix a missing semicolon. Claude Code lets you pick the model in the frontmatter: - `sonnet` — for complex reasoning (default) - `haiku` — fast and cheap Add `model: haiku` and commands run almost instantly. ## Command structure Slash commands are Markdown files stored in `.claude/commands/` (project-level) or `~/.claude/commands/` (personal). The filename becomes the command name: `commit.md` becomes `/commit`. Here is a complete example: ```markdown --- description: Create a git commit with a conventional message allowed-tools: Bash(git add:*), Bash(git commit:*) argument-hint: [message] model: haiku --- # Commit Changes !`git diff --cached` Create a commit message following Conventional Commits. If $ARGUMENTS is provided, use it as the commit message. ``` ### Frontmatter options | Option | Purpose | Default | |--------|---------|---------| | `description` | Brief description shown in `/help` | First line of prompt | | `allowed-tools` | Tools the command can use | Inherits from conversation | | `model` | Model to use (`sonnet`, `haiku`, or full model ID) | Inherits from conversation | | `argument-hint` | Shows expected arguments in autocomplete | None | ### Arguments Use `$ARGUMENTS` to capture everything passed to the command: ```markdown Create a branch named: $ARGUMENTS ``` For multiple arguments, use positional parameters `$1`, `$2`, etc: ```markdown --- argument-hint: [pr-number] [priority] --- Review PR #$1 with priority $2. ``` ### File references Include file contents with the `@` prefix: ```markdown Review the implementation in @src/utils/helpers.js ``` ## The workflow I replaced my manual git rituals with custom commands. They live in `.claude/commands/`. Here is how I drive a feature from start to merge. ```mermaid --- title: Development Workflow --- flowchart LR %% Initial Setup Start((Start)) --> Branch["/branch"] Branch --> Code[Write Code] %% Local Iteration Loop Code --> Lint["/lint
(Haiku)"] Lint -- "Auto-fix" --> Lint Lint --> Test["/vitest
(Haiku)"] Test -- "Fix Failure" --> Test %% Deployment Flow Test --> Push["/push"] Push --> PR["/pr"] PR --> CI{CI Pass?} %% CI Debugging Loop CI -- "No" --> Fix["/fix-pipeline
(Sonnet)"] Fix -- "Fix & Push" --> CI %% Final Review & Merge CI -- "Yes" --> Review["/review-coderabbit"] Review --> Merge["/merge-to-main"] Merge --> Done((Done)) ``` ### /branch — start a task I type `/branch "implement dark mode toggle"` and Claude checks out main, pulls latest, and creates `feat/dark-mode-toggle`. No more thinking about naming conventions. ### /lint — fix before commit I type `/lint`. It runs the linter with auto-fix, and if errors remain, Claude fixes them. Uses Haiku for speed—runs in about 20 seconds. ### /vitest — run unit tests I type `/vitest`. It runs the test suite and fixes any failures. The prompt tells Claude to fix the code, not the test—implementation should match expected behavior. ### /commit — save your work I type `/commit`. Claude analyzes the diff, generates a Conventional Commit message, and commits. It looks at recent commits to match your project's style. ### /push — commit and push in one step I type `/push`. It stages everything, generates a commit message, commits, and pushes. My most-used command—one word and the code is on GitHub. ### /fix-pipeline — fix failing CI tests I type `/fix-pipeline`. It fetches the failed logs via `gh`, analyzes the error, and fixes it. Uses Sonnet because debugging requires reasoning. The prompt includes guardrails—Claude must read the actual error before proposing fixes. ### /pr — create a pull request I type `/pr`. It analyzes all commits on the branch, generates a PR title and description, and opens it via `gh pr create`. Checks if a PR already exists first. ### /review-coderabbit — address review comments I type `/review-coderabbit`. It fetches CodeRabbit's comments via GraphQL, verifies each suggestion against the codebase, implements valid fixes or pushes back with reasoning, and resolves every thread. AI reviewers aren't always right—the prompt ensures Claude verifies before acting. ### /merge-to-main — finish the task I type `/merge-to-main`. It squash merges the PR, deletes the branch, and pulls main. Done. ## Summary By moving your process into `.claude/commands/`, you are building a system. - Bash command execution injects real-time context - Model selection balances speed vs reasoning - The workflow automates branching, linting, committing, CI debugging, PRs, and merging Define the process once. Claude executes it every time. Want to extend Claude Code even further? Connect external tools via MCP (Model Context Protocol) or package your commands into a shareable plugin. I don't think about naming conventions, commit messages, or PR descriptions anymore. The commands handle it. You can skip the interactive prompt entirely with `claude -p`. Add aliases to your `.zshrc` or `.bashrc`: ```bash alias clint="claude -p '/lint'" alias cpush="claude -p '/push'" alias ccommit="claude -p '/commit'" alias cbranch="claude -p '/branch'" ``` Now `clint` runs the lint command without opening the interactive session. The `-p` flag passes the prompt directly—Claude executes and exits. Two steps become one keystroke. --- --- title: Understanding Claude Code's Full Stack: MCP, Skills, Subagents, and Hooks Explained description: A practical guide to Claude Code's features — explained in the order they were introduced: MCP (2024), Claude Code core (Feb 2025), Plugins (2025), and Agent Skills (Oct 2025). What each does, how they fit together, and when to use what. 
tags: ['claude-code', 'ai', 'mcp', 'productivity', 'tooling'] url: https://alexop.dev/posts/understanding-claude-code-full-stack --- # Understanding Claude Code's Full Stack: MCP, Skills, Subagents, and Hooks Explained I've been using Claude Code for months. Mostly for quick edits and generating boilerplate. The vibe coding tool everyone talks about. Then I actually explored what it could do. MCP servers. Slash commands. Plugins. Skills. Hooks. Subagents. CLAUDE.md files. I was blown away. Claude Code isn't just a coding assistant. It's a framework for orchestrating AI agents. It speeds up development in ways I've never seen before. Most people use one or two features. They miss how these features stack together. This guide explains each concept **in the order they build on each other** — from external connections to automatic behaviors. (New to using LLMs for development? Start with my overview of how I use LLMs for context.) > Claude Code is, with hindsight, poorly named. It's not purely a coding tool: it's a tool for general computer automation. Anything you can achieve by typing commands into a computer is something that can now be automated by Claude Code. It's best described as a general agent. Skills make this a whole lot more obvious and explicit. > > — Simon Willison, [Claude Skills are awesome, maybe a bigger deal than MCP](https://simonwillison.net/2025/Oct/16/claude-skills/) ## The feature stack 1. **Model Context Protocol (MCP)** — the foundation for connecting external tools and data sources 2. **Claude Code core features** — project memory, slash commands, subagents, and hooks 3. **Plugins** — shareable packages that bundle commands, hooks, and skills 4. **Agent Skills** — automatic, model-invoked capabilities that activate based on task context
--- ## 1) Model Context Protocol (MCP) — connecting external systems ```mermaid sequenceDiagram actor User participant Claude participant MCPServer User->>Claude: /mcp connect github Claude->>MCPServer: Authenticate + request capabilities MCPServer-->>Claude: Return tools/resources/prompts Claude-->>User: Display /mcp__github__* commands ``` **What it is.** The Model Context Protocol connects Claude Code to external tools and data sources. Think universal adapter for GitHub, databases, APIs, and other systems. **How it works.** Connect an MCP server, get access to its tools, resources, and prompts as slash commands: ```bash # Install a server claude mcp add playwright npx @playwright/mcp@latest # Use it /mcp__playwright__create-test [args] ``` Each MCP server consumes context. Monitor with `/context` and remove unused servers. **The gotcha.** MCP servers expose their own tools — they don't inherit Claude's Read, Write, or Bash unless explicitly provided. **Real-world example.** Want to see MCP in action? Check out how to build an AI QA engineer with Playwright MCP that tests your app like a real user. --- ## 2) Claude Code core features ### 2.1) Project memory with `CLAUDE.md` **What it is.** Markdown files Claude loads at startup. They give Claude memory about your project's conventions, architecture, and patterns. **How it works.** Files merge hierarchically from enterprise → user (`~/.claude/CLAUDE.md`) → project (`./CLAUDE.md`). When you reference `@components/Button.vue`, Claude also reads CLAUDE.md from that directory and its parents. **Example structure for a Vue app:** When you work on `src/components/Button.vue`, Claude loads context from: 1. Enterprise CLAUDE.md (if configured) 2. User `~/.claude/CLAUDE.md` (personal preferences) 3. Project root `CLAUDE.md` (project-wide info) 4. `src/components/CLAUDE.md` (component-specific patterns) **What goes in.** Common commands, coding standards, architectural patterns. Keep it concise — reference guide, not documentation. Need help creating your own? Check out this [CLAUDE.md creation guide](/prompts/claude/claude-create-md). Here's my blog's CLAUDE.md: ````markdown # CLAUDE.md ## Project Overview Alexander Opalic's personal blog built on AstroPaper - Astro-based blog theme with TypeScript, React, TailwindCSS. **Tech Stack**: Astro 5, TypeScript, React, TailwindCSS, Shiki, FuseJS, Playwright ## Development Commands ```bash npm run dev # Build + Pagefind + dev server (localhost:4321) npm run build # Production build npm run lint # ESLint for .astro, .ts, .tsx --- ``` ```` ### 2.2) Slash Commands — explicit, reusable prompts ```mermaid graph LR User[/ /my-command args /] PreBash[Pre-execution Bash Steps] Prompt[Markdown prompt] Claude[Claude processes] Output[Result] User --> PreBash --> Prompt --> Claude --> Output ``` **What they are.** Markdown files in `.claude/commands/` you trigger manually by typing `/name [args]`. User-controlled workflows. **Key features:** - `$ARGUMENTS` or `$1`, `$2` for argument passing - `@file` syntax to inline code - `allowed-tools: Bash(...)` for pre-execution scripts - XML-tagged prompts for reliable outputs **When to use.** Repeatable workflows you trigger on demand — code reviews, commit messages, scaffolding. For a complete example of a git workflow built entirely with slash commands, see my Slash Commands Guide. Want to create your own? Use this [slash command creation guide](/prompts/claude/claude-create-command). 
**Example structure:** ```markdown --- description: Create new slash commands argument-hint: [name] [purpose] allowed-tools: Bash(mkdir:*), Bash(tee:*) --- # /create-command Generate slash command files with proper structure. **Inputs:** `$1` = name, `$2` = purpose **Outputs:** `STATUS=WROTE PATH=.claude/commands/{name}.md` [... instructions ...] ``` Commands can create commands. Meta, but powerful. --- ### 2.3) Subagents — specialized AI personalities for delegation ```mermaid sequenceDiagram participant Main participant SubA participant SubB Main->>SubA: task: security analysis Main->>SubB: task: test generation par Parallel execution SubA-->>Main: results SubB-->>Main: results end ``` **What they are.** Pre-configured AI personalities with specific expertise areas. Each subagent has its own system prompt, allowed tools, and separate context window. When Claude encounters a task matching a subagent's expertise, it delegates automatically. **Why use them.** Keep your main conversation clean while offloading specialized work. Each subagent works independently in its own context window, preventing token bloat. Run multiple subagents in parallel for concurrent analysis. Subagents prevent "context poisoning" — when detailed implementation work clutters your main conversation. Use subagents for deep dives (security audits, test generation, refactoring) that would otherwise fill your primary context with noise. **Example structure:** ```markdown --- name: security-auditor description: Analyzes code for security vulnerabilities tools: Read, Grep, Bash # Controls what this personality can access model: sonnet # Optional: sonnet, opus, haiku, inherit --- You are a security-focused code auditor. Identify vulnerabilities (XSS, SQL injection, CSRF, etc.) Check dependencies and packages Verify auth/authorization Review data validation Provide severity levels: Critical, High, Medium, Low. Focus on OWASP Top 10. ``` The system prompt shapes the subagent's behavior. The `description` helps Claude know when to delegate. The `tools` restrict what the personality can access. **Best practices:** One expertise area per subagent. Grant minimal tool access. Use `haiku` for simple tasks, `sonnet` for complex analysis. Run independent work in parallel. Need a template? Check out this [subagent creation guide](/prompts/claude/claude-create-agent). --- ### 2.4) Hooks — automatic event-driven actions ```mermaid graph TD Event[Lifecycle Event] HookA[Hook 1] HookB[Hook 2] HookC[Hook 3] Event --> HookA Event --> HookB Event --> HookC ``` **What they are.** JSON-configured handlers in `.claude/settings.json` that trigger automatically on lifecycle events. No manual invocation. **Available events:** `PreToolUse`, `PostToolUse`, `UserPromptSubmit`, `Notification`, `Stop`, `SubagentStop`, `SessionStart` **Two modes:** - **Command:** Run shell commands (fast, predictable) - **Prompt:** Let Claude decide with the LLM (flexible, context-aware) **Example:** Auto-lint after file edits. ```json { "hooks": { "PostToolUse": [ { "matcher": "Edit|Write", "hooks": [ { "type": "command", "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/run-oxlint.sh" } ] } ] } } ``` ```bash #!/usr/bin/env bash file_path="$(jq -r '.tool_input.file_path // ""')" if [[ "$file_path" =~ \.(js|jsx|ts|tsx|vue)$ ]]; then pnpm lint:fast fi ``` **Common uses:** Auto-format after edits, require approval for bash commands, validate writes, initialize sessions. For a practical example, see how to set up desktop notifications when Claude needs your attention. 
Want to create your own hooks? Use this [hook creation guide](/prompts/claude/claude-create-hook). --- ## 3) Plugins — shareable, packaged configurations ```mermaid classDiagram class Plugin { name version author } Plugin --> Commands Plugin --> Hooks Plugin --> Skills ``` **What they are.** Distributable bundles of commands, hooks, skills, and metadata. Share your setup with teammates or install pre-built configurations. **Basic structure:** **When to use.** Share team configurations, package domain workflows, distribute opinionated patterns, install community tooling. **How it works.** Install a plugin, get instant access. Components merge seamlessly — hooks combine, commands appear in autocomplete, skills activate automatically. Ready to build your own? Check out this [plugin creation guide](/prompts/claude/claude-create-plugin). --- ## 4) Agent Skills — automatic, task-driven capabilities ```mermaid flowchart TD Ctx["Task context"] --> Match["Match SKILL.md
description?"] Skills["Available skills
(personal / project / plugin)"] --> Match Match -->|yes| Tools["Check allowed-tools"] Tools -->|ok| Exec["Run skill"] Tools -->|blocked| Pass["Skip"] Match -->|no| Pass Exec --> Out["Return result"] Pass --> Out ``` **What they are.** Folders with `SKILL.md` descriptors plus optional scripts. Unlike slash commands, skills activate **automatically** when their description matches the task context. **How Claude discovers them.** When you give Claude a task, it reviews available skill descriptions to find relevant ones. If a skill's `description` field matches the task context, Claude loads the full skill instructions and applies them. This happens transparently — you never explicitly invoke skills. Check out the [official Anthropic skills repository](https://github.com/anthropics/skills) for ready-to-use examples. > Claude Skills are awesome, maybe a bigger deal than MCP > > — Simon Willison, [Claude Skills are awesome, maybe a bigger deal than MCP](https://simonwillison.net/2025/Oct/16/claude-skills/) Want rigorous, spec-driven development? Check out [obra's superpowers](https://github.com/obra/superpowers) — a comprehensive skills library that enforces systematic workflows. **What it provides:** TDD workflows (RED-GREEN-REFACTOR), systematic debugging, code review processes, git worktree management, and brainstorming frameworks. Each skill pushes you toward verification-based development instead of "trust me, it works." **The philosophy:** Test before implementation. Verify with evidence. Debug systematically through four phases. Plan before coding. No shortcuts. These skills work together to prevent common mistakes. The brainstorming skill activates before implementation. The TDD skill enforces writing tests first. The verification skill blocks completion claims without proof. **Use when:** You want Claude to be more disciplined about development practices, especially for production code. **Where to put them:** - `~/.claude/skills/` — personal, all projects - `.claude/skills/` — project-specific - Inside plugins — distributable **What you need:** - `SKILL.md` with frontmatter (`name`, `description`) - Optional `allowed-tools` declaration - Optional helper scripts Want to create your own skill? Use this [skill creation guide](/prompts/claude/claude-create-skill). **Why they're powerful.** Skills package expertise Claude applies automatically. Style enforcement, doc updates, test hygiene, framework patterns — all without manual triggering. **Skills vs CLAUDE.md.** Think of skills as modular chunks of a CLAUDE.md file. Instead of Claude reviewing a massive document every time, skills let Claude access specific expertise only when needed. This improves context efficiency while maintaining automatic behavior. **Key difference.** Skills are "always on." Claude activates them based on context. Commands require manual invocation. Some workflows could be either a skill or a command. Example: git worktree management. **Make it a skill if:** You want Claude to automatically consider git worktrees whenever relevant to the conversation. **Make it a command if:** You want explicit control over when worktree logic runs (e.g., `/create-worktree feature-branch`). The overlap is real — choose based on whether you prefer automatic activation or manual control. **Subagents and Skills activate automatically** when Claude determines they're relevant to the task. You don't need to invoke them manually — Claude uses them proactively when it thinks they're useful. 
**Slash commands require manual triggering** — you type `/command-name` to run them. This is the fundamental difference: automation vs explicit control. --- ## Putting it all together Here's how these features work together in practice: 1. **Memory (`CLAUDE.md`)** — Establish project context and conventions that Claude always knows 2. **Slash commands** — Create explicit shortcuts for workflows you want to trigger on demand 3. **Subagents** — Offload parallel or isolated work to specialized agents 4. **Hooks** — Enforce rules and automate repetitive actions at key lifecycle events 5. **Plugins** — Package and distribute your entire setup to others 6. **MCP** — Connect external systems and make their capabilities available as commands 7. **Skills** — Define automatic behaviors that activate based on task context ### Example: A Task-Based Development Workflow Here's a real-world workflow that combines multiple features: **Setup phase:** - `CLAUDE.md` contains implementation standards ("don't commit until I approve", "write tests first") - `/load-context` slash command initializes new chats with project state - `update-documentation` skill activates automatically after implementations - Hook triggers linting after file edits **Planning phase (Chat 1):** - Main agent plans bug fix or new feature - Outputs detailed task file with approach **Implementation phase (Chat 2):** - Start fresh context with `/load-context` - Feed in the plan from Chat 1 - Implementation subagent executes the plan - `update-documentation` skill updates docs automatically - `/resolve-task` command marks task complete **Why this works:** Main context stays focused on planning. Heavy implementation work happens in isolated context. Skills handle documentation. Hooks enforce quality standards. No context pollution. ## Decision guide: choosing the right tool 🎉 **Huge thanks to [@thewiredbear](https://github.com/thewiredbear)** for creating the [Claude Code Driver](https://github.com/thewiredbear/Claude_Code_Driver/) repository! This community-driven collection includes examples, templates, and resources based on this guide. Perfect for getting started quickly or finding inspiration for your own Claude Code setup. Check it out and contribute your own patterns! For a comprehensive visual guide to all Claude Code features, check out the [Awesome Claude Code Cheat Sheet](https://awesomeclaude.ai/code-cheatsheet). Want model name, context usage, and cost displayed in your terminal? See how to customize your Claude Code status line. ### Feature comparison This comparison table is adapted from [IndyDevDan's video "I finally CRACKED Claude Agent Skills"](https://www.youtube.com/watch?v=kFpLzCVLA20&t=1027s). 
| Category            | Skill | MCP     | Subagent | Slash Command |
| ------------------- | ----- | ------- | -------- | ------------- |
| Triggered By        | Agent | Both    | Both     | Engineer      |
| Context Efficiency  | High  | Low     | High     | High          |
| Context Persistence | ✅    | ✅      | ✅       | ✅            |
| Parallelizable      | ❌    | ❌      | ❌       | ❌            |
| Specializable       | ✅    | ✅      | ✅       | ✅            |
| Sharable            | ✅    | ✅      | ✅       | ✅            |
| Modularity          | High  | High    | Mid      | Mid           |
| Tool Permissions    | ✅    | ❌      | ✅       | ✅            |
| Can Use Prompts     | ✅    | ✅      | ✅       | ✅            |
| Can Use Skills      | ✅    | Kind of | ✅       | ✅            |
| Can Use MCP Servers | ✅    | ✅      | ✅       | ✅            |
| Can Use Subagents   | ✅    | ✅      | ✅       | ❌            |

### Real-world examples

| Use Case                                                | Best Tool     | Why                                                   |
| ------------------------------------------------------- | ------------- | ----------------------------------------------------- |
| "Always use Pinia for state management in Vue apps"     | `CLAUDE.md`   | Persistent context that applies to all conversations  |
| Generate standardized commit messages                   | Slash Command | Explicit action you trigger when ready to commit      |
| Check Jira tickets and analyze security simultaneously  | Subagents     | Parallel execution with isolated contexts             |
| Run linter after every file edit                        | Hook          | Automatic reaction to lifecycle event                 |
| Share your team's Vue testing patterns                  | Plugin        | Distributable package with commands + skills          |
| Query PostgreSQL database for reports                   | MCP           | External system integration                           |
| Run automated SEO audits with browser testing           | MCP           | External system integration                           |
| Detect style guide violations during any edit           | Skill         | Automatic behavior based on task context              |
| Create React components from templates                  | Slash Command | Manual workflow with repeatable structure             |
| "Never use `any` type in TypeScript"                    | Hook          | Automatic enforcement after code changes              |
| Auto-format code on save                                | Hook          | Event-driven automation                               |
| Connect to GitHub for issue management                  | MCP           | External API integration                              |
| Run comprehensive test suite in parallel                | Subagent      | Isolated, resource-intensive work                     |
| Deploy to staging environment                           | Slash Command | Manual trigger with safeguards                        |
| Enforce TDD workflow automatically                      | Skill         | Context-aware automatic behavior                      |
| Initialize new projects with team standards             | Plugin        | Shareable, complete configuration                     |

---

---
title: Building My First Claude Code Plugin
description: How I built a Claude Code plugin to generate skills, agents, commands, and more—and stopped copy-pasting boilerplate.
tags: ['claude-code', 'ai', 'tooling', 'productivity']
url: https://alexop.dev/posts/building-my-first-claude-code-plugin
---

# Building My First Claude Code Plugin

If you're unfamiliar with Claude Code or want to understand the full ecosystem (MCP, Skills, Subagents, Hooks, and Plugins), check out my comprehensive guide to Claude Code's full stack first. This post assumes you know the basics.

## The Problem

I've been using Claude Code for a while now. It's been my daily driver for development work, alongside other AI tools in my workflow. But here's the thing—over the last few months, I stopped paying attention to what Anthropic was shipping. Skills? Didn't look into them. Plugins? No idea they existed.

Today I caught up. And I discovered something I'd been missing: plugins. The idea clicked immediately. Everything I'd been building locally—custom commands, agents, configurations—was stuck in `.claude/` folders per project. Plugins change that. You can package it up and share it across projects. Give Claude Code new abilities anywhere.

That's when I decided to build one.
A plugin that generates slash commands, skills, agents, and everything else I kept creating manually.

## The Manual Workflow Was Painful

Before the plugin, creating a new command looked like this:

1. Search the docs for the right frontmatter format
2. Create `.claude/commands/my-command.md`
3. Copy-paste a template
4. Fill in the blanks
5. Hope you got the structure right

Repeat for agents. Repeat for skills. Repeat for hooks. 10 minutes on boilerplate. 5 minutes on actual logic. Same problem every time: too much manual work for something that should be instant.

## The Solution: Claude Code Builder

I fixed this by building a plugin that generates everything for me. Here's what it includes:

| Command                | Description                                  |
| ---------------------- | -------------------------------------------- |
| `/create-skill`        | Generate model-invoked skills                |
| `/create-agent`        | Create specialized sub-agents                |
| `/create-command`      | Add custom slash commands                    |
| `/create-hook`         | Configure event-driven hooks                 |
| `/create-md`           | Generate CLAUDE.md files for project context |
| `/create-output-style` | Create custom output styles                  |
| `/create-plugin`       | Package your setup as a plugin               |

Each command handles the structure, frontmatter, and best practices. I just provide the name and description.

## The Plugin Structure

The layout is simple: a plugin manifest plus a `commands/` directory with one markdown file per command.

## Command Files: Where the Magic Happens

Each command is a markdown file with frontmatter. Here's the `/create-skill` command as an example:

```markdown
---
description: Generate a new Claude Skill with proper structure and YAML frontmatter
argument-hint: [skill-name] [description]
---

# /create-skill

## Purpose

Generate a new Claude Skill with proper structure and YAML frontmatter using official documentation as reference

## Contract

**Inputs:**

- `$1` — SKILL_NAME (lowercase, kebab-case, max 64 characters)
- `$2` — DESCRIPTION (what the skill does and when to use it, max 1024 characters)
- `--personal` — create in ~/.claude/skills/ (default)
- `--project` — create in .claude/skills/

**Outputs:** `STATUS= PATH=`

## Instructions

1. **Validate inputs:**
   - Skill name: lowercase letters, numbers, hyphens only
   - Description: non-empty, max 1024 characters
2. **Determine target directory:**
   - Personal (default): `~/.claude/skills/{{SKILL_NAME}}/`
   - Project: `.claude/skills/{{SKILL_NAME}}/`
3. **Generate SKILL.md using this template:**

[template content here...]
```

Commands are just instructions for Claude. Write them like you're teaching a junior developer the exact steps to follow. Good prompt engineering principles apply here too.

Here's what the plugin generates when you run a command:
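For example, running `/create-skill commit-helper "Generate clear commit messages; use when committing"` produces a skill file along these lines at `~/.claude/skills/commit-helper/SKILL.md`. This is a sketch of the generated output; the exact template the plugin ships may differ:

```markdown
---
name: commit-helper
description: Generate clear commit messages; use when committing
---

# commit-helper

1. Inspect the staged changes with `git diff --staged`.
2. Summarize the intent of the change in an imperative subject line of about 50 characters.
3. Add a short body explaining the "why" when the change is not obvious from the diff.
```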
## Publishing to GitHub Once I had it working locally, publishing was straightforward: 1. Push to GitHub 2. Users add the marketplace: `/plugin marketplace add alexanderop/claude-code-builder` 3. Users install: `/plugin install claude-code-builder@claude-code-builder` No npm, no build step. Just GitHub. ## Try It Yourself Ready to stop copy-pasting Claude Code boilerplate? **Step 1: Install the plugin** ```bash /plugin install claude-code-builder@claude-code-builder ``` **Step 2: Verify installation** Check that the plugin is loaded: ```bash /plugins ``` You should see `claude-code-builder` in the list.
**Step 3: Use the new commands** You now have access to seven new commands. Try creating your first skill: ```bash /create-skill commit-helper "Generate clear commit messages; use when committing" ```
That's it. You're now equipped to generate skills, agents, commands, and more—without touching the docs. ## What's Next? I'm using this daily. Every time I think "I wish Claude could...", I run `/create-skill` instead of Googling docs. Right now, I'm focused on workflow optimization: building Vue applications faster with Claude Code. The question I'm exploring: How do I teach Claude Code to write good Vue applications? I'm working on: - Skills that encode Vue best practices - Commands for common Vue patterns (composables, stores, components) - Custom agents that understand Vue architecture decisions - MCP server integrations for external tools It's not just about speed. It's about teaching Claude Code the way I think about development. Building tools that build tools. That's where it gets fun. --- --- title: Building a Modular Monolith with Nuxt Layers: A Practical Guide description: Learn how to build scalable applications using Nuxt Layers to enforce clean architecture boundaries without the complexity of microservices. tags: ['nuxt', 'vue', 'architecture', 'typescript'] url: https://alexop.dev/posts/nuxt-layers-modular-monolith --- # Building a Modular Monolith with Nuxt Layers: A Practical Guide I once worked on a project that wanted to build an e-commerce website with Nuxt that could be used by multiple countries. The architecture was a nightmare: they had a base repository, and then they would merge the base repo into country-specific code. This was before Nuxt Layers existed, back in the Nuxt 2 days, and managing this was incredibly painful. Every merge brought conflicts, and maintaining consistency across countries was a constant struggle. Now with Nuxt Layers, we finally have a much better solution for this exact use case. But in this blog post, we're going to explore something even more powerful: using Nuxt Layers to build a **modular monolith architecture**. I recently built a simple example e-commerce application to explore this pattern in depth, and I want to share what I learned. By the end of this post, you'll understand how to structure your Nuxt applications with clean boundaries and enforced separation of concerns, without the complexity of microservices or the pain of repository merging strategies. **Full project repository**: https://github.com/alexanderop/nuxt-layer-example This is Part 4 of my [How to Structure Vue Projects](/posts/how-to-structure-vue-projects/). If you're choosing between architecture patterns, start there first. ## The Problem: When Flat Architecture Stops Scaling Most projects start the same way. You create a new Nuxt project, organize files into `components/`, `composables/`, and `stores/` folders, and everything feels clean and organized. This works beautifully at first. Then your application grows. You add a product catalog, then a shopping cart, then user profiles, then an admin panel. Suddenly your `components/` folder has 50+ files. Your stores reference each other in complex ways you didn't plan for. A seemingly innocent change to the cart accidentally breaks the product listing page. I've been there, and I'm sure you have too. The core problem is simple: **flat architectures have no boundaries**. Nothing prevents your cart component from directly importing from your products store. Nothing stops circular dependencies. You can import anything from anywhere, and this freedom becomes a liability as your codebase grows. When I first encountered this problem, I considered micro frontends. 
While micro frontends (see [How to build Microfrontends with Module Federation and Vue](/posts/how-to-build-microfrontends-with-module-federation-and-vue/)) solve similar problems, Nuxt Layers offer a better developer experience for monorepos. I wanted clean boundaries without the operational complexity of deploying and maintaining separate services. That's when I discovered Nuxt Layers.

## What Are Nuxt Layers?

Before diving into the implementation, let me explain what Nuxt Layers actually are and why they solve our problem.

Nuxt Layers let you split your application into independent, reusable modules. Think of each layer as a mini Nuxt application with its own components, composables, pages, and stores. Each layer lives in its own folder with its own `nuxt.config.ts` file.

You compose these layers together using the `extends` keyword in your main configuration:

```typescript
// nuxt.config.ts
export default defineNuxtConfig({
  extends: [
    "./layers/shared", // Local folder
    "./layers/products",
    "./layers/cart",
  ],
});
```

When you extend layers, Nuxt merges their configurations and makes their code available to your application. All extended layers become accessible through auto-generated TypeScript paths (like `#layers/products/...`), and their components, composables, and utilities are automatically imported.

Here's the important part: **by default, there's no compile-time enforcement preventing cross-layer imports**. If your app extends both the products and cart layers, the cart layer can technically import from products at runtime—even if cart doesn't extend products directly. This is where ESLint enforcement becomes crucial, which I'll cover later.

```mermaid
graph TD
    A[App Root
extends all layers] --> B[Products Layer
extends shared] A --> C[Cart Layer
extends shared] B --> D[Shared Layer
extends nothing] C --> D ``` ## Building an E-commerce Application with Layers Let me show you how I structured a real e-commerce application using this pattern. I created three layers, each with a specific purpose: **Shared Layer**: The foundation. This layer provides UI components (like badges and buttons), utility functions (currency formatting, storage helpers), and nothing else. No business logic lives here. **Products Layer**: Everything related to browsing and viewing products. Product schemas, the product store, catalog pages, and filter components all live here. Crucially, this layer knows nothing about shopping carts. **Cart Layer**: Everything related to managing a shopping cart. The cart store, localStorage persistence, and cart UI components. This layer knows nothing about product catalogs. **Your Project Root**: The orchestrator. This is not a separate layer—it's your main application that extends all the layers. This is where you create pages that combine features from multiple layers (like a product listing page with "add to cart" functionality). Here's the folder structure: Notice how the products and cart layers never import from each other. They are completely independent features. This is the core principle that makes this pattern work. ## The Difference: Before and After Let me show you the contrast between a traditional approach and the layered approach. ### Without Layers: Tight Coupling In a traditional flat structure, your product component might directly import the cart store: ```vue ``` This creates hidden dependencies. The products feature now depends on the cart feature. You cannot use products without including cart. You cannot understand one without reading the other. Testing becomes harder because everything is coupled. ### With Layers: Clear Boundaries With layers, the product component has no idea that carts exist: ```vue ``` The product component simply emits an event. The parent page (living in your project root) connects products to cart: ```vue ``` Your project root acts as the orchestrator. It knows about both products and cart, but the features themselves stay completely independent. ## How Features Communicate When a page needs functionality from multiple layers, your project root orchestrates the interaction. I like to think of this pattern as similar to micro frontends with an app shell. **Feature layers** are independent workers. Each does one job well. They expose simple interfaces (stores, components, composables) but have no knowledge of each other. **Your project root** is the manager. It knows all the workers. When a task needs multiple workers, your project root coordinates them. Here's a sequence diagram showing how this works: ```mermaid sequenceDiagram participant User participant Page (Project Root) participant Products Layer participant Cart Layer User->>Page (Project Root): Click product Page (Project Root)->>Products Layer: Get product data Products Layer-->>Page (Project Root): Return product Page (Project Root)->>Cart Layer: Add to cart Cart Layer-->>Page (Project Root): Cart updated Page (Project Root)->>User: Show confirmation ``` Let me show you a real example from the cart page. It needs to display cart items (from the cart layer) with product details (from the products layer): ```vue ``` Your project root queries both stores and combines the data. Neither feature layer knows about the other. This keeps your features loosely coupled and incredibly easy to test in isolation. 
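To make that concrete, here is a minimal sketch of what such an orchestrating cart page can look like. The store and property names (`useCartStore`, `useProductsStore`, `items`, `products`) are illustrative rather than the exact ones from the example repository:

```vue
<!-- app/pages/cart.vue (lives in the project root, not in a layer) -->
<script setup lang="ts">
import { computed } from "vue";

// Auto-imported composables exposed by the cart and products layers
const cartStore = useCartStore();
const productsStore = useProductsStore();

// The project root joins cart items with product details;
// neither layer knows the other exists.
const lines = computed(() =>
  cartStore.items.map((item) => ({
    ...item,
    product: productsStore.products.find((p) => p.id === item.productId),
  }))
);
</script>

<template>
  <ul>
    <li v-for="line in lines" :key="line.productId">
      {{ line.product?.name }} ({{ line.quantity }})
    </li>
  </ul>
</template>
```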
```mermaid graph TB subgraph "Project Root (Orchestrator)" Page[Cart Page] end subgraph "Independent Features" Cart[Cart Layer
Cart items & logic] Products[Products Layer
Product data] end Page -->|reads| Cart Page -->|reads| Products Page -->|combines| Combined[Combined View] Cart -.->|never imports| Products Products -.->|never imports| Cart ``` ## Enforcing Boundaries with ESLint Now here's something important I discovered while working with this pattern. Nuxt provides basic boundary enforcement through TypeScript: if you try to import from a layer not in your `extends` array, your build fails. This is good, but it's not enough. The problem is this: if your main config extends both products and cart, nothing prevents the cart layer from importing from products. Technically both layers are available at runtime. This creates the exact coupling we're trying to avoid. I needed stricter enforcement. So I built a custom ESLint plugin called `eslint-plugin-nuxt-layers`. This plugin enforces two critical rules: 1. **No cross-feature imports**: Cart cannot import from products (or vice versa) 2. **No upward imports**: Feature layers cannot import from the app layer The plugin detects which layer a file belongs to based on its path, then validates all imports against the allowed dependencies. ```javascript // ❌ This fails linting // In layers/cart/stores/cart/useCartStore.ts // Error: cart layer cannot import from products layer // ✅ This passes linting (in layers/cart/) // OK: cart layer can import from shared layer // ✅ This also passes linting (in your project root) // OK: project root can import from any layer ``` Here's how the validation logic works: ```mermaid graph LR A[File being linted] --> B{Which layer?} B -->|shared| C[Can import: nothing] B -->|products| D[Can import: shared only] B -->|cart| E[Can import: shared only] B -->|project root| F[Can import: all layers] D -.->|❌| G[products → cart] E -.->|❌| H[cart → products] ``` The ESLint plugin gives you enforcement of your architecture. Your IDE will warn you immediately if you violate boundaries, and your CI/CD pipeline will fail if violations slip through. ## Important Gotchas to Avoid Working with Nuxt Layers comes with some quirks you should know about. I learned these the hard way, so let me save you the trouble: **Route paths need full names**: Layer names don't auto-prefix routes. If you have `layers/blog/pages/index.vue`, it creates the `/` route, not `/blog`. You need `layers/blog/pages/blog/index.vue` to get `/blog`. **Component auto-import prefixing**: By default, nested components get prefixed. A component at `components/form/Input.vue` becomes ``. You can disable this with `pathPrefix: false` in the components config if you prefer explicit names. ## When Should You Use This Pattern? I want to be honest with you: Nuxt Layers add complexity. They're powerful, but they're not always the right choice. Here's when I recommend using them: **Your app has distinct features**: If you're building an application with clear feature boundaries (products, cart, blog, admin panel), layers shine. Each feature gets its own layer with its own components, pages, and logic. **You have multiple developers**: Layers prevent teams from stepping on each other's toes. The cart team works in their layer, the products team works in theirs. No more merge conflicts in a giant shared components folder. **You want to reuse code**: Building multiple apps that share functionality? Extract common features into layers and publish them as npm packages. Your marketing site and main app can share the same blog layer without code duplication. **You're thinking long-term**: A small project with 5 components doesn't need layers. 
But a project that will grow to 50+ features over two years? Layers will save your sanity. ## The Benefits You'll Get After working with this pattern for several months, here are the concrete benefits I've experienced: | Benefit | Description | | -------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Clear boundaries enforced by tools** | Import rules aren't just documentation that developers ignore. Your build fails if someone violates the architecture. This is incredibly powerful for maintaining standards as your team grows. | | **Independent development** | Team members can work on different features without conflicts. The cart team never touches product code. Changes are isolated and safe. | | **Easy testing** | Each layer has minimal dependencies. You can test features in complete isolation without complex mocking setups. | | **Gradual extraction** | If you need to extract a feature later (maybe to share across projects or even split into a micro frontend), you already have clean boundaries. You could publish a layer as its own npm package with minimal refactoring. | | **Better code review** | When someone adds an import in a pull request, you immediately see if it crosses layer boundaries. Architecture violations become obvious during review. | | **Scales with complexity** | As your app grows, you simply add new layers. Existing layers stay independent and unaffected. | | **Better AI assistant context** | You can add layer-specific documentation files (like `claude.md`) to each layer with context tailored to that feature. When working with AI coding assistants like Claude or GitHub Copilot, changes to the cart layer will only pull in cart-specific context, making the AI's suggestions more accurate and focused. | | **Targeted testing** | Running tests becomes more efficient. Instead of running your entire test suite, you can run only the tests related to the feature you're working on. | ## Getting Started with Your Own Project If you want to try this pattern, here's how to get started: ### 1. Clone and Explore the Example Start by exploring the complete example project: ```bash # 📥 Clone the repository git clone https://github.com/alexanderop/nuxt-layer-example cd nuxt-layer-example # 📦 Install dependencies pnpm install # 🚀 Start development server pnpm dev ``` Browse through the layers to see how everything connects. Try making changes to understand how the boundaries work. ### 2. Create Your Own Layered Project To start your own project from scratch: ```bash # 📁 Create layer folders mkdir -p layers/shared layers/products layers/cart # 🔧 Add a nuxt.config.ts to each layer echo "export default defineNuxtConfig({ \$meta: { name: 'shared', description: 'Shared UI and utilities' } })" > layers/shared/nuxt.config.ts ``` ### 3. 
Add ESLint Enforcement Install the ESLint plugin: ```bash # 📦 Install ESLint plugin pnpm add -D eslint-plugin-nuxt-layers ``` Configure it in your `eslint.config.mjs` with the `layer-boundaries` rule: ```javascript export default [ { plugins: { "nuxt-layers": nuxtLayers, }, rules: { "nuxt-layers/layer-boundaries": [ "error", { root: "layers", // 📁 Your layers directory name aliases: ["#layers", "@layers"], // 🔗 Path aliases that point to layers layers: { shared: [], // 🏗️ Cannot import from any layer products: ["shared"], // 🛍️ Can only import from shared cart: ["shared"], // 🛒 Can only import from shared // 🏠 Your project root files can import from all layers (use '*') }, }, ], }, }, ]; ``` The plugin will now enforce your architecture boundaries automatically. It detects violations in ES6 imports, dynamic imports, CommonJS requires, and export statements—giving you immediate feedback in your IDE and failing your CI/CD pipeline if boundaries are violated. ## Conclusion I've been working with modular monoliths for a while now, and I believe this pattern gives you the best of both worlds. You get the clear boundaries and independent development of micro frontends without the operational complexity of deployment, networking, and data consistency. Nuxt Layers makes this pattern accessible and practical. You get compile-time enforcement of boundaries through TypeScript. You get clear dependency graphs that are easy to visualize and understand. You get a structure that scales from small teams to large organizations without a rewrite. You can start with layers from day one, or you can refactor gradually as your application grows. Either way, your future self will thank you when your codebase is still maintainable after two years and 50+ features. I hope this blog post has been insightful and useful. The complete code is available for you to explore, learn from, and build upon. Clone it, break it, experiment with it. **Full project repository**: https://github.com/alexanderop/nuxt-layer-example If you have questions or want to share your own experiences with Nuxt Layers, I'd love to hear from you. This pattern has fundamentally changed how I approach application architecture, and I'm excited to see how you use it in your own projects. --- --- title: How to Handle API Calls in Pinia with The Elm Pattern description: Learn how to handle API calls in Pinia using the Elm pattern for predictable, testable side effects. Includes complete examples with the Pokemon API. tags: ['vue'] url: https://alexop.dev/posts/handling-api-calls-pinia-elm-pattern --- # How to Handle API Calls in Pinia with The Elm Pattern ## Understanding Pure Functions and Side Effects Before diving into the pattern, it's important to understand the foundational concepts of functional programming that make this approach powerful. ### What Is a Pure Function? A pure function is a function that satisfies two key properties: 1. **Deterministic**: Given the same inputs, it always returns the same output. 2. **No side effects**: It does not interact with anything outside its scope. Here's a simple example: ```ts // Pure function - always predictable function add(a: number, b: number): number { return a + b; } add(2, 3); // Always returns 5 add(2, 3); // Always returns 5 ``` This function is pure because: - It only depends on its inputs (`a` and `b`) - It always produces the same result for the same inputs - It doesn't modify any external state - It doesn't perform any I/O operations ### What Is a Side Effect? 
A side effect is any operation that interacts with the outside world or modifies state beyond the function's return value. Common side effects include: ```ts // Side effect: Network request function fetchUser(id: number) { return fetch(`/api/users/${id}`); // Network I/O } // Side effect: Modifying external state let count = 0; function increment() { count++; // Mutates external variable } // Side effect: Writing to storage function saveUser(user: User) { localStorage.setItem("user", JSON.stringify(user)); // I/O operation } // Side effect: Logging function calculate(x: number) { console.log("Calculating..."); // I/O operation return x * 2; } ``` None of these are pure functions because they interact with something beyond their inputs and outputs. ### Why Does This Matter? Pure functions are easier to: - **Test**: No need to mock APIs, databases, or global state - **Reason about**: The function's behavior is completely determined by its inputs - **Debug**: No hidden dependencies or unexpected state changes - **Reuse**: Work anywhere without environmental setup However, real applications need side effects. You can't build useful software without API calls, database writes, or user interactions. The key insight from functional programming is not to eliminate side effects, but to **separate** them from your business logic. ## Why Side Effects Are a Problem A pure function only depends on its inputs and always returns the same output. If you include an API call or any asynchronous operation inside it, the function becomes unpredictable and hard to test. Example: ```ts export function update(model, msg) { if (msg.type === "FETCH_POKEMON") { fetch("https://pokeapi.co/api/v2/pokemon/pikachu"); return { ...model, isLoading: true }; } } ``` This mixes logic with side effects. The function now depends on the network and the API structure, making it complex to test and reason about. ## The Solution: Separate Logic and Effects The Elm Architecture provides a simple way to handle side effects correctly. 1. Keep the update function pure. 2. Move side effects into separate functions that receive a dispatch function. 3. Use the store as the bridge between both layers. This separation keeps your business logic independent of the framework and easier to verify. ### File Organization Before diving into the code, here's how we organize the files for a Pinia store using the Elm pattern: ``` src/ └── stores/ └── pokemon/ ├── pokemonModel.ts # Types and initial state ├── pokemonUpdate.ts # Pure update function ├── pokemonEffects.ts # Side effects (API calls) └── pokemon.ts # Pinia store (connects everything) ``` Each file has a clear, single responsibility: - **`pokemonModel.ts`**: Defines the state shape and message types - **`pokemonUpdate.ts`**: Contains pure logic for state transitions - **`pokemonEffects.ts`**: Handles side effects like API calls - **`pokemon.ts`**: The Pinia store that wires everything together This structure makes it easy to: - Find and modify specific logic - Test each piece independently - Reuse the update logic in different contexts - Add new effects without touching business logic ## Example: Fetching Data from the Pokémon API This example demonstrates how to handle an API call using this pattern. ### `pokemonModel.ts` The model defines the structure of the state and the possible messages that can change it. 
```ts
export type PokemonModel = {
  isLoading: boolean;
  pokemon: string | null;
  error: string | null;
};

export const initialModel: PokemonModel = {
  isLoading: false,
  pokemon: null,
  error: null,
};

export type PokemonMsg =
  | { type: "FETCH_REQUEST"; name: string }
  | { type: "FETCH_SUCCESS"; pokemon: string }
  | { type: "FETCH_FAILURE"; error: string };
```

### `pokemonUpdate.ts`

The update function handles all state transitions in a pure way.

```ts
import type { PokemonModel, PokemonMsg } from "./pokemonModel";

export function update(model: PokemonModel, msg: PokemonMsg): PokemonModel {
  switch (msg.type) {
    case "FETCH_REQUEST":
      return { ...model, isLoading: true, error: null };
    case "FETCH_SUCCESS":
      return { ...model, isLoading: false, pokemon: msg.pokemon };
    case "FETCH_FAILURE":
      return { ...model, isLoading: false, error: msg.error };
    default:
      return model;
  }
}
```

This function has no side effects. It only describes how the state changes in response to a message.

### `pokemonEffects.ts`

This file performs the network request and communicates back through the dispatch function.

```ts
import type { PokemonMsg } from "./pokemonModel";

export async function fetchPokemon(
  name: string,
  dispatch: (m: PokemonMsg) => void
) {
  dispatch({ type: "FETCH_REQUEST", name });
  try {
    const res = await fetch(`https://pokeapi.co/api/v2/pokemon/${name}`);
    if (!res.ok) throw new Error("Not found");
    const data = await res.json();
    dispatch({ type: "FETCH_SUCCESS", pokemon: data.name });
  } catch (e: any) {
    dispatch({ type: "FETCH_FAILURE", error: e.message });
  }
}
```

This function does not depend on Pinia or Vue. It simply performs the side effect and dispatches messages based on the result.

### `pokemon.ts`

The Pinia store connects the pure logic and the side effect layer.

```ts
import { defineStore } from "pinia";
import { readonly, ref } from "vue";
import {
  initialModel,
  type PokemonModel,
  type PokemonMsg,
} from "./pokemonModel";
import { update } from "./pokemonUpdate";
import { fetchPokemon } from "./pokemonEffects";

export const usePokemonStore = defineStore("pokemon", () => {
  const model = ref<PokemonModel>(initialModel);

  function dispatch(msg: PokemonMsg) {
    model.value = update(model.value, msg);
  }

  async function load(name: string) {
    await fetchPokemon(name, dispatch);
  }

  return {
    state: readonly(model),
    load,
  };
});
```

The store contains no direct logic for handling API responses. It only coordinates updates and side effects.

### Usage in a Component

```vue
```

The component only interacts with the public API of the store. It does not mutate the state directly.

## Why This Approach Works

Separating logic and effects provides several benefits.

- The update function is pure and easy to test.
- The side effect functions are independent and reusable.
- The store focuses only on coordination.
- The overall data flow remains predictable and maintainable.

This method is especially effective in projects where you want full control over how and when side effects are executed.

## Other Side Effects You Can Handle with This Pattern

This pattern is not limited to API requests. You can manage any kind of asynchronous or external operation the same way. Examples include:

- Writing to or reading from `localStorage` or `IndexedDB`
- Sending analytics or telemetry events
- Performing authentication or token refresh logic
- Communicating with WebSockets or event streams
- Scheduling background tasks with `setTimeout` or `requestAnimationFrame`
- Reading files or using browser APIs such as the Clipboard or File System

By using the same structure, you can keep these effects organized and testable. Each effect becomes an independent unit that transforms external data into messages for your update function.
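Whatever the effect, the payoff is the same: because `update` stays pure, you can test the core logic without mocking `fetch`, Pinia, or Vue. A minimal sketch, assuming Vitest as the test runner:

```ts
import { describe, it, expect } from "vitest";
import { initialModel } from "./pokemonModel";
import { update } from "./pokemonUpdate";

describe("update", () => {
  it("sets loading on FETCH_REQUEST", () => {
    const next = update(initialModel, { type: "FETCH_REQUEST", name: "pikachu" });
    expect(next.isLoading).toBe(true);
    expect(next.error).toBe(null);
  });

  it("stores the pokemon on FETCH_SUCCESS", () => {
    const next = update(
      { ...initialModel, isLoading: true },
      { type: "FETCH_SUCCESS", pokemon: "pikachu" }
    );
    expect(next.isLoading).toBe(false);
    expect(next.pokemon).toBe("pikachu");
  });
});
```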
## Summary If you only need caching or background synchronization, use a specialized library such as [pinia-colada](https://pinia-colada.esm.dev/), [TanStack Vue Query](https://tanstack.com/query/vue), or [RStore](https://rstore.dev/). If you need to stay within Pinia and still maintain a functional structure, this approach is effective. 1. Define your model and messages. 2. Keep the update function pure. 3. Implement effects as separate functions that take a dispatch function. 4. Connect them inside the store. This structure keeps your Pinia stores predictable, testable, and easy to extend to any type of side effect. --- --- title: How to Write Better Pinia Stores with the Elm Pattern description: Learn how to combine The Elm Architecture (TEA) principles with Pinia's private store pattern for testable, framework-agnostic state management in Vue applications. tags: ['vue'] url: https://alexop.dev/posts/tea-architecture-pinia-private-store-pattern --- # How to Write Better Pinia Stores with the Elm Pattern ## The Problem: Pinia Gives You Freedom, Not Rules Pinia is a fantastic state management library for Vue, but it doesn't enforce any architectural patterns. It gives you complete freedom to structure your stores however you want. This flexibility is powerful, but it comes with a hidden cost: without discipline, your stores can become unpredictable and hard to test. The core issue? Pinia stores are inherently mutable and framework-coupled. While this makes them convenient for rapid development, it creates three problems: ```ts // Traditional Pinia approach - tightly coupled to Vue export const useTodosStore = defineStore("todos", () => { const todos = ref([]); function addTodo(text: string) { todos.value.push({ id: Date.now(), text, done: false }); } return { todos, addTodo }; }); ``` The problem? Components can bypass your API and directly manipulate state: ```vue ``` This leads to unpredictable state changes, makes testing difficult (requires mocking Pinia's entire runtime), and couples your business logic tightly to Vue's reactivity system. ```mermaid graph TB C1[Component A] -->|"store.addTodo() ✓"| API[Intended API] C2[Component B] -->|"store.todos.push() ✗"| State[Direct State Access] C3[Component C] -->|"store.todos[0].done = true ✗"| State API --> Store[Store State] State --> Store Store -->|unpredictable changes| Debug[Difficult to Debug] ``` ## The Solution: TEA + Private Store Pattern What if we could keep Pinia's excellent developer experience while adding the predictability and testability of functional patterns? Enter The Elm Architecture (TEA) combined with the "private store" technique from [Mastering Pinia](https://masteringpinia.com/blog/how-to-create-private-state-in-stores) by Eduardo San Martin Morote (creator of Pinia). This hybrid approach gives you: - **Pure, testable business logic** that's framework-agnostic - **Controlled state mutations** through a single dispatch function - **Seamless Vue integration** with Pinia's reactivity - **Full devtools support** for debugging You'll use a private internal store for mutable state, expose only selectors and a dispatch function publicly, and keep your update logic pure and framework-agnostic. ## Understanding The Elm Architecture Before we dive into the implementation, let's understand the core concepts of TEA: 1. **Model**: The state of your application 2. **Update**: Pure functions that transform state based on messages/actions 3. **View**: Rendering UI based on the current model ```mermaid graph LR M[Model
Current State] -->|renders| V[View
UI Display] V -->|user interaction
produces| Msg[Message/Action] Msg -->|dispatched to| U[Update Function
Pure Logic] U -->|returns new| M ``` The key insight is that update functions are pure—given the same state and action, they always return the same new state. This makes them trivial to test without any framework dependencies. ## How It Works: Combining TEA with Private State The pattern uses three key pieces: a private internal store for mutable state, pure update functions for business logic, and a public store that exposes only selectors and dispatch. ### The Private Internal Store First, create a private store that holds the mutable model. This stays in the same file as your public store but is not exported: ```ts // Inside stores/todos.ts - NOT exported! const useTodosPrivate = defineStore("todos-private", () => { const model = ref({ todos: [], }); return { model }; }); ``` The key here: no `export` keyword means components can't access this directly. ### Pure Update Function Next, define your business logic as pure functions: ```ts // stores/todos-update.ts export function update(model: TodosModel, message: TodosMessage): TodosModel { switch (message.type) { case "ADD_TODO": return { ...model, todos: [ ...model.todos, { id: Date.now(), text: message.text, done: false }, ], }; case "TOGGLE_TODO": return { ...model, todos: model.todos.map(todo => todo.id === message.id ? { ...todo, done: !todo.done } : todo ), }; default: return model; } } ``` This update function is completely framework-agnostic. You can test it with simple assertions: ```ts describe("update", () => { it("adds a todo", () => { const initial = { todos: [] }; const result = update(initial, { type: "ADD_TODO", text: "Test" }); expect(result.todos).toHaveLength(1); expect(result.todos[0].text).toBe("Test"); }); }); ``` ### Public Store with Selectors + Dispatch Finally, combine everything in a single file. The private store is defined but not exported: ```ts // stores/todos.ts (this is what components import) // Private store - not exported! const useTodosPrivate = defineStore("todos-private", () => { const model = ref({ todos: [], }); return { model }; }); // Public store - this is what gets exported export const useTodosStore = defineStore("todos", () => { const privateStore = useTodosPrivate(); // Selectors const todos = computed(() => privateStore.model.todos); // Dispatch function dispatch(message: TodosMessage) { privateStore.model = update(privateStore.model, message); } return { todos, dispatch }; }); ``` ```mermaid graph LR Component[Component] Component -->|dispatch message| Public[Public Store] Public -->|call| Update[Update Function
Pure Logic] Update -->|new state| Private[Private Store] Private -->|selectors| Public Public -->|reactive data| Component ``` ### Usage in Components Components interact with the public store: ```vue ``` ## Simpler Alternative: Using Vue's readonly If you want to prevent direct state mutations without creating a private store, Vue's `readonly` utility provides a simpler approach: ```ts // stores/todos.ts export const useTodosStore = defineStore("todos", () => { const model = ref({ todos: [], }); // Dispatch function dispatch(message: TodosMessage) { model.value = update(model.value, message); } // Only expose readonly state return { todos: readonly(model), dispatch, }; }); ``` With `readonly`, any attempt to mutate the state from a component will fail: ```vue ``` ## Benefits of This Approach 1. **Pure business logic**: The `update` function has zero dependencies on Vue or Pinia 2. **Easy testing**: Test your update function with simple unit tests 3. **Framework flexibility**: Could swap Vue for React without changing update logic 4. **Type safety**: TypeScript ensures message types are correct 5. **Devtools support**: Still works with Pinia devtools since we're using real stores 6. **Encapsulation**: Private store is an implementation detail ```mermaid graph TB subgraph T["Traditional Pinia"] TC[Component] TC -->|direct| TS[State] TC -->|actions| TA[Actions] TA --> TS end subgraph P["TEA + Private Store"] PC[Component] -->|dispatch| PD[Dispatch] PD --> PU[Update] PU --> PM[Model] PM -->|selectors| PC end ``` ## Conclusion By combining The Elm Architecture with Pinia's private store pattern, we achieve: - Pure, testable business logic - Clear separation of concerns - Framework-agnostic state management - Full Pinia devtools integration - Type-safe message dispatching This pattern scales from simple forms to complex domain logic while keeping your code maintainable and your tests simple. --- _Credit: This post synthesizes ideas from [The Elm Architecture](https://guide.elm-lang.org/architecture/) and Eduardo San Martin Morote's ["private store" pattern](https://masteringpinia.com/blog/how-to-create-private-state-in-stores) from Mastering Pinia._ --- --- title: How to build Microfrontends with Module Federation and Vue description: Build a Vue 3 microfrontend app with Module Federation. Clear decisions, working code, and a small reference project. tags: ['vue', 'microfrontends', 'module-federation', 'architecture'] url: https://alexop.dev/posts/how-to-build-microfrontends-with-module-federation-and-vue --- # How to build Microfrontends with Module Federation and Vue Monorepo with `pnpm`. Vue 3 SPA. Client side composition with Module Federation. Host owns routing. Events for navigation. Cart sync through localStorage plus events. Shared UI library for consistency. Fallbacks for remote failures. Code: https://github.com/alexanderop/tractorStoreVueModuleFederation You know Vue and basic bundling. You want to split a SPA into independent parts without tight coupling. If you do not have multiple teams or deployment bottlenecks, you likely do not need microfrontends. I wanted to write about microfrontends three years ago. My first touchpoint was the book _Micro Frontends in Action_ by Michael Geers. It taught me a lot, and I was still confused. This post shows a practical setup that works today with Vue 3 and Module Federation. ## Scope We build microfrontends for a Vue 3 SPA with client side composition. Server and universal rendering exist, but they are out of scope here. 
> Microfrontends are the technical representation of a business subdomain. They allow independent implementations with minimal shared code and single team ownership. > (Luca Mezzalira) ## The Tractor Store in one minute The Tractor Store is a reference shop that lets us compare microfrontend approaches on the same feature set (explore, decide, checkout). It is clear enough to show boundaries and realistic enough to surface routing, shared state, and styling issues.
## Architecture decisions | Question | Decision | Notes | | ---------------- | --------------------------------------------------------------- | -------------------------------------------------- | | Repo layout | Monorepo with `pnpm` workspaces | Shared configs, atomic refactors, simple local dev | | Composition | Client side with Module Federation | Fast iteration, simple hosting | | Routing | Host owns routing | One place for guards, links, and errors | | Team boundaries | Explore, Decide, Checkout, plus Host | Map to clear user flows | | Communication | Custom events for navigation. Cart via localStorage plus events | Low coupling and no shared global store | | UI consistency | Shared UI library in `packages/shared` | Buttons, inputs, tokens | | Failure handling | Loading and error fallbacks in host. Retry once | Keep the shell usable | | Styles | Team prefixes or Vue scoped styles. Tokens in shared | Prevent leakage and keep a common look | Repository layout: ``` ├── apps/ │ ├── host/ (Shell application) │ ├── explore/ (Product discovery) │ ├── decide/ (Product detail) │ └── checkout/ (Cart and checkout) ├── packages/ │ └── shared/ (UI, tokens, utils) └── pnpm-workspace.yaml ``` Workspace definition: ```yaml packages: - "apps/*" - "packages/*" ``` High level view: ```mermaid graph TD subgraph "Monorepo (@tractor)" subgraph "apps/" Host("host") Explore("explore") Decide("decide") Checkout("checkout") end subgraph "packages/" Shared("shared") end end Host -- Consumes --> Explore Host -- Consumes --> Decide Host -- Consumes --> Checkout Explore -- Uses --> Shared Decide -- Uses --> Shared Checkout -- Uses --> Shared Host -- Uses --> Shared ``` ## Implementation We compose at the client in the host. The host loads remote components at runtime and routes between them. ### Host router ```ts // apps/host/src/router.ts export const router = createRouter({ history: createWebHistory(), routes: [ { path: "/", component: remote("explore/HomePage") }, { path: "/products/:category?", component: remote("explore/CategoryPage"), props: true, }, { path: "/product/:id", component: remote("decide/ProductPage"), props: true, }, { path: "/checkout/cart", component: remote("checkout/CartPage") }, ], }); ``` ### `remote()` utility Vue `defineAsyncComponent` wraps any loader in a friendly component. It gives lazy loading, built in states, and retries. This is why it fits microfrontends. ```js const AsyncComp = defineAsyncComponent({ loader: () => Promise.resolve(/* component */), delay: 200, timeout: 3000, onError(error, retry, fail, attempts) { if (attempts <= 1) retry(); else fail(); }, }); ``` ```ts // apps/host/src/utils/remote.ts export function remote(id: string, delay = 150) { return defineAsyncComponent({ loader: async () => { const loader = (window as any).getComponent?.(id); if (!loader) throw new Error(`Missing loader for ${id}`); return await loader(); }, delay, loadingComponent: { render: () => h("div", { class: "mf-loading" }, "Loading..."), }, errorComponent: { render: () => h("div", { class: "mf-error" }, "Failed to load."), }, onError(error, retry, fail, attempts) { if (attempts <= 1) setTimeout(retry, 200); else fail(); }, }); } ``` Module Federation enables dynamic loading of JavaScript modules from different applications at runtime. Think of it as splitting your app into independently deployable pieces that can share code and communicate. You can use **Webpack 5**, **Rspack**, or **Vite** as bundlers. 
**Rspack offers the most comprehensive Module Federation support** with excellent performance, but for this solution I used **Rspack** (for the host) and **Vite** (for one remote) to showcase interoperability between different bundlers. ### Module Federation runtime The host bootstraps Module Federation and exposes a single loader. ```ts // apps/host/src/mf.ts import { createInstance, loadRemote, } from "@module-federation/enhanced/runtime"; declare global { interface Window { getComponent: (id: string) => () => Promise; } } createInstance({ name: "host", remotes: [ { name: "decide", entry: "http://localhost:5175/mf-manifest.json", alias: "decide", }, { name: "explore", entry: "http://localhost:3004/mf-manifest.json", alias: "explore", }, { name: "checkout", entry: "http://localhost:3003/mf-manifest.json", alias: "checkout", }, ], plugins: [ { name: "fallback-plugin", errorLoadRemote(args: any) { console.warn(`Failed to load remote: ${args.id}`, args.error); return { default: { template: `

<div class="mf-error">
  <strong>Remote unavailable</strong>
  <p>Remote: ${args.id}</p>
  <p>Try again or check the service.</p>
</div>

`, }, }; }, }, ], }); window.getComponent = (id: string) => { return async () => { const mod = (await loadRemote(id)) as any; return mod.default || mod; }; }; ``` **Why this setup** - The host is the single source of truth for remote URLs and versions. - URLs can change without rebuilding remotes. - The fallback plugin gives a consistent error experience. ## Communication We avoid a shared global store. We use two small patterns that keep coupling low. A global store looks handy. In microfrontends it creates tight runtime coupling. That kills independent deploys. What goes wrong: - Lockstep releases (one store change breaks other teams) - Hidden contracts (store shape is an API that drifts) - Boot order traps (who creates the store and plugins) - Bigger blast radius (a store error can break the whole app) - Harder tests (cross team mocks and brittle fixtures) Do this instead: - Each microfrontend owns its state - Communicate with explicit custom events - Use URL and localStorage for simple shared reads - Share code not state (tokens, UI, pure utils) - If shared state grows, revisit boundaries rather than centralize it You could use VueUse's `useEventBus` for more Vue-like event communication instead of vanilla JavaScript events. It provides a cleaner API with TypeScript support and automatic cleanup in component lifecycle. However, adding VueUse means another dependency in your microfrontends. The tradeoff is developer experience vs. bundle size and dependency management: vanilla JavaScript events keep things lightweight and framework-agnostic. ### Navigation through custom events Remotes dispatch `mf:navigate` with `{ to }`. The host listens and calls `router.push`. ```ts // apps/host/src/main.ts window.addEventListener("mf:navigate", (e: Event) => { const to = (e as CustomEvent).detail?.to; if (to) router.push(to); }); ``` ### Cart sync through localStorage plus events The checkout microfrontend owns cart logic. It listens for `add-to-cart` and `remove-from-cart`. After each change, it writes to localStorage and dispatches `updated-cart`. Any component can listen and re read. ```ts // apps/checkout/src/stores/cartStore.ts window.addEventListener("add-to-cart", (ev: Event) => { const { sku } = (ev as CustomEvent).detail; // update cart array here localStorage.setItem("cart", JSON.stringify(cart)); window.dispatchEvent(new CustomEvent("updated-cart")); }); ``` ## Styling - Design tokens in `packages/shared` (CSS variables). - Shared UI in `packages/shared/ui` (Button, Input, Card). - Local isolation with Vue scoped styles or BEM with team prefixes (`e_`, `d_`, `c_`, `h_`). - Host provides `.mf-loading` and `.mf-error` classes for fallback UI. Example with BEM plus team prefix: ```css /* decide */ .d_ProductPage__title { font-size: 2rem; color: var(--color-primary); } .d_ProductPage__title--featured { color: var(--color-accent); } ``` Or Vue scoped styles: ```vue ``` ## Operations Plan for failure and keep the shell alive. - Each remote shows a clear loading state and a clear error state. - Navigation works even if a remote fails. - Logs are visible in the host during development. 
Global styles in the host help here: ```css /* apps/host/src/styles.css */ .mf-loading { display: flex; align-items: center; justify-content: center; padding: 2rem; color: var(--color-text-muted); } .mf-error { padding: 1rem; background: var(--color-error-background); border: 1px solid var(--color-error-border); color: var(--color-error-text); border-radius: 4px; } ``` ## More Module Federation features (short hints) Our solution is simple on purpose. Module Federation can do more. Use these when your app needs them. **Prefetch** (requires registering lazyLoadComponentPlugin) Pre load JS, CSS, and data for a remote before the user clicks (for example on hover). This cuts wait time. ```js // after registering lazyLoadComponentPlugin on the runtime instance const mf = getInstance(); function onHover() { mf.prefetch({ id: "shop/Button", dataFetchParams: { productId: "12345" }, preloadComponentResource: true, // also fetch JS and CSS }); } ``` **Component level data fetching** Expose a `.data` loader next to a component. The consumer can ask the runtime to fetch data before render (works in CSR and SSR). Use a `.data.client.ts` file for client loaders when SSR falls back. **Caching for loaders** Wrap expensive loaders in a cache helper (with maxAge, revalidate, tag, and custom keys). You get stale while revalidate behavior and tag based invalidation. ```js // inside your DataLoader file const fetchDashboard = cache(async () => getStats(), { maxAge: 120000, revalidate: 60000, }); ``` **Type hinting for remotes** `@module-federation/enhanced` can generate types for exposes. Add this to tsconfig.json so editors resolve remote types and hot reload them. ```json { "compilerOptions": { "paths": { "*": ["./@mf-types/*"] } }, "include": ["./@mf-types/*"] } ``` **Vue bridge for app level modules** Mount a whole remote Vue app into a route when you need an application level integration (not just a component). ```js const RemoteApp = createRemoteAppComponent({ loader: () => loadRemote("remote1/export-app"), }); // router: { path: '/remote1/:pathMatch(.*)*', component: RemoteApp } ``` You can read more about these interesting techniques at [module-federation.io](https://module-federation.io/). ### Summary This was my first attempt to understand **microfrontends** better while solving the _Tractor Store_ exercise. In the future, I may also try it with **SSR** and **universal rendering**, which I find interesting. Another option could be to use **Nuxt Layers** and take a _“microfrontend-ish”_ approach at build time. --- --- title: Why You Need Something Hard in Your Life description: The biggest paradox in life: the hardest things are usually the ones that help you grow. Exploring why challenge and difficulty are essential for meaning and personal development. tags: ['personal-development', 'productivity', 'motivation'] url: https://alexop.dev/posts/why-you-need-something-hard-in-your-life --- # Why You Need Something Hard in Your Life The biggest paradox in life is simple: the hardest things are usually the ones that help you grow. They force you out of your comfort zone. They make you stronger. One reason so many people in my generation are depressed is that they do not have a hard thing that is worth aiming for. They live on autopilot. Work 9 to 5. Watch Netflix. Go to a party on the weekend. And then repeat. Sometimes life gives you that hard thing automatically. Having a kid, for example, is brutally hard but also meaningful. But what if you do not have that? What do you do now? 
That is why so many millennials run marathons. A marathon is something hard. It gives you structure, meaning, and a clear goal. It demands you change your habits. It tells you who you are when things get tough. For me, I am happiest when I have a goal in front of me. Something hard. Not impossible, but not easy either. If it is too easy, I get bored. If it is too hard, I give up. But the sweet spot, where I have to fight for it, that is where life feels good. I am reading Flow right now by the Hungarian psychologist Mihály Csíkszentmihályi, and he explains exactly this. Real happiness does not come from comfort. It comes from challenge. From stretching yourself just enough that you lose track of time and become fully absorbed in the thing you are doing. That is where meaning lives. One of my favorite sports anime, Blue Lock, explains the concept of flow perfectly in [this video](https://www.youtube.com/watch?v=KTHqbv2M0aA). It shows how athletes enter a state where everything else disappears and they become completely absorbed in the challenge at hand. This is flow in action. The times when I was unhappy were always the times when I had no goal. I was just living. Eating badly. Drinking too much. Slowly sinking into a life I did not want. So here is my takeaway: Find something hard. Stick with it. Let it shape you. Because without it, life gets empty fast. --- --- title: What Is the Model Context Protocol (MCP)? How It Works description: Learn how MCP (Model Context Protocol) standardizes AI tool integration, enabling LLMs to interact with external services, databases, and APIs through a universal protocol similar to USB-C for AI applications. tags: ['mcp', 'typescript', 'ai'] url: https://alexop.dev/posts/what-is-model-context-protocol-mcp --- # What Is the Model Context Protocol (MCP)? How It Works I did not see how powerful MCP was until I used Claude Code with the Playwright MCP. **Playwright MCP lets an AI use a real browser.** It can open a page, click buttons, fill forms, and take screenshots. I asked Claude to audit my site for SEO. It ran checks in a real browser, gave me the results, and sent screenshots. You can read more about how I use Claude Code for doing SEO audits. **That was when I saw it.** This was not just text prediction. This was an AI that can see and work with the web like a human tester. ## What is MCP MCP means Model Context Protocol. Before we define it, let us see how we got here. ## How it started ```mermaid flowchart TD U[User] --> A[LLM] A --> U ``` In 2022 ChatGPT made AI open to everyone. You typed a question. It predicted the next tokens and sent back text. You could ask for your favorite author or create code. ## The problem with plain LLMs A plain LLM is a text generator. - It has no live data - It cannot read your files - It struggles with math - It cannot tell you who won yesterday in football You can read more in my post about LLM limits. ## The first fix: tools ```mermaid {scale: '0.5'} flowchart TD U[User] --> A[LLM] A --> D{Needs Python?} D -->|Yes| P[Run code in Python sandbox] P --> O[Execution result] O --> A D -->|No| A A --> U ``` When OpenAI added a Python sandbox, LLMs could run code and give exact results. ## More tools mean more power ```mermaid {scale: '0.5'} flowchart TD U[User] --> A[LLM] A --> D{Needs external tool?} D -->|Python| P[Run code in Python sandbox] P --> O[Execution result] O --> A D -->|Web search| W[Search the web for information] W --> R[Search result] R --> A D -->|No| A A --> U ``` Web search gave live knowledge. 
Now the model could answer fresh questions. ## Even more tools ```mermaid {scale: '0.5'} flowchart TD U[User] --> A[LLM] A --> D{Needs external tool?} D -->|Python| P[Run code in Python sandbox] P --> O[Execution result] O --> A D -->|Web search| W[Search the web for information] W --> R[Search result] R --> A D -->|Google Calendar| G[Check / update calendar events] G --> E[Calendar data] E --> A D -->|No| A A --> U ``` Anthropic added more tools to Claude like Google Calendar and email. You can ask it what meetings you have next week and it tells you. ## The solution: a protocol We need a standard. One tool for Google Calendar that any agent can use. In November the Model Context Protocol was released. ## Definition **MCP** is an open protocol that lets apps give context to LLMs in a standard way. Think of **USB-C**. You plug in power, a display, or storage and it just works. MCP does the same for AI with data sources and tools. With MCP you can build agents and workflows without custom glue code. --- ## How MCP Works (mental model) At its core, MCP has **three roles**: - **Host** → LLM applications that initiate connections - **Client** → Connectors within the host application - **Server** → Services that provide context and capabilities MCP takes some inspiration from the Language Server Protocol, which standardizes how to add support for programming languages across a whole ecosystem of development tools. In a similar way, MCP standardizes how to integrate additional context and tools into the ecosystem of AI applications. The host embeds clients, and those clients connect to one or more servers. Your VS Code could have a Playwright MCP server for browser automation and another MCP server for your docs — all running at the same time. ```mermaid flowchart LR U((User)) U --> H[Host UI
Claude Desktop, VS Code/Claude Code] H --> C1[MCP Client 1] H --> C2[MCP Client 2] H --> C3[MCP Client 3] C1 --> S1[MCP Server A] C2 --> S2[MCP Server B] C3 --> S3[MCP Server C] ``` --- ## How MCP Connects: Transports MCP uses **JSON-RPC 2.0** for all messages and supports two main transport mechanisms: **Key points:** - Messages are UTF-8 encoded JSON-RPC - stdio uses newline-delimited JSON (one message per line) - HTTP supports session management via `Mcp-Session-Id` headers - Both transports handle requests, responses, and notifications equally well The transport choice depends on your use case: stdio for local tools with minimal latency, HTTP for remote services that multiple clients can connect to. ## What servers can expose An MCP server can offer any combination of three capabilities: ### Tools: Functions the AI can call - Give AI ability to execute actions (check weather, query databases, solve math) - Each tool describes what it does and what info it needs - AI sends parameters → server runs function → returns results ```typescript // Simple calculator tool example server.registerTool( "calculate", { title: "Calculator", description: "Perform mathematical calculations", inputSchema: { operation: z.enum(["add", "subtract", "multiply", "divide"]), a: z.number(), b: z.number(), }, }, async ({ operation, a, b }) => { let result; switch (operation) { case "add": result = a + b; break; case "subtract": result = a - b; break; case "multiply": result = a * b; break; case "divide": result = b !== 0 ? a / b : "Error: Division by zero"; break; } return { content: [ { type: "text", text: `${a} ${operation} ${b} = ${result}`, }, ], }; } ); ``` ### Resources: Context and data - AI can read files, docs, database schemas - Provides context before answering questions or using tools - Supports change notifications when files update ```typescript server.registerResource( "app-config", "config://application", { title: "Application Configuration", description: "Current app settings and environment", mimeType: "application/json", }, async uri => ({ contents: [ { uri: uri.href, text: JSON.stringify( { environment: process.env.NODE_ENV, version: "1.0.0", features: { darkMode: true, analytics: false, beta: process.env.BETA === "true", }, }, null, 2 ), }, ], }) ); ``` ### Prompts: Templates for interaction - Pre-made templates for common tasks (code review, data analysis) - Exposed as slash commands or UI elements - Makes repetitive workflows quick and consistent ````typescript server.registerPrompt( "code-review", { title: "Code Review", description: "Review code for quality and best practices", argsSchema: { language: z.enum(["javascript", "typescript", "python", "go"]), code: z.string(), focus: z .enum(["security", "performance", "readability", "all"]) .default("all"), }, }, ({ language, code, focus }) => ({ messages: [ { role: "user", content: { type: "text", text: [ `Please review this ${language} code focusing on ${focus}:`, "", "```" + language, code, "```", "", "Provide feedback on:", focus === "all" ? "- Security issues\n- Performance optimizations\n- Code readability\n- Best practices" : focus === "security" ? "- Potential security vulnerabilities\n- Input validation\n- Authentication/authorization issues" : focus === "performance" ? 
"- Time complexity\n- Memory usage\n- Potential optimizations" : "- Variable naming\n- Code structure\n- Comments and documentation", ].join("\n"), }, }, ], }) ); ```` ## What a Client can expose An MCP client can provide capabilities that let servers interact with the world beyond their sandbox: ### Roots: Filesystem boundaries - Client tells server which directories it can access - Creates secure sandbox (e.g., only your project folder) - Prevents access to system files or other projects ### Sampling: Nested LLM calls - Servers can request AI completions through the client - No API keys needed on server side - Enables autonomous, agentic behaviors ### Elicitation: Asking users for input - Servers request missing info from users via client UI - Client handles forms and validation - Users can accept, decline, or cancel requests ## Example: How we can use MCPS in Vscode Your `mcp.json` could look like this: ```json { "servers": { "playwright": { "gallery": true, "command": "npx", "args": ["@playwright/mcp@latest"], "type": "stdio" }, "deepwiki": { "type": "http", "url": "https://mcp.deepwiki.com/sse", "gallery": true } } } ``` - **playwright** → Runs `npx @playwright/mcp@latest` locally over stdio for low-latency browser automation - **deepwiki** → Connects over HTTP/SSE to `https://mcp.deepwiki.com/sse` for live docs and codebase search - **gallery: true** → Makes them visible in tool pickers ## What MCP is not - **Not a hosted service** — It is a protocol - **Not a replacement** for your app logic - **Not a magic fix** for every hallucination — It gives access to real tools and data - You still need good prompts and good UX --- ## Simple example of your first MCP Server ```ts #!/usr/bin/env node const server = new McpServer({ name: "echo-onefile", version: "1.0.0", }); server.tool( "echo", "Echo back the provided text", { text: z .string() .min(1, "Text cannot be empty") .describe("Text to echo back"), }, async ({ text }) => ({ content: [{ type: "text", text }], }) ); const transport = new StdioServerTransport(); server .connect(transport) .then(() => console.error("Echo MCP server listening on stdio")) .catch(err => { console.error(err); process.exit(1); }); ``` This example uses the official [MCP SDK for TypeScript](https://modelcontextprotocol.io/docs/sdk), which provides type-safe abstractions for building MCP servers. The server exposes a single tool called "echo" that takes text input and returns it back. We're using [Zod](https://zod.dev/) for runtime schema validation, ensuring the input matches our expected structure with proper type safety and clear error messages. 
## Simple MCP Client Example Here's how to connect to an MCP server and use its capabilities: ```typescript // Create a client that connects to your MCP server async function connectToServer() { // Create transport - this runs your server as a subprocess const transport = new StdioClientTransport({ command: "node", args: ["./echo-server.js"], }); // Create and connect the client const client = new Client({ name: "my-mcp-client", version: "1.0.0", }); await client.connect(transport); return client; } // Use the server's capabilities async function useServer() { const client = await connectToServer(); // List available tools const tools = await client.listTools(); console.log("Available tools:", tools); // Call a tool const result = await client.callTool({ name: "echo", arguments: { text: "Hello from MCP client!", }, }); console.log("Tool result:", result.content); // List and read resources const resources = await client.listResources(); for (const resource of resources) { const content = await client.readResource({ uri: resource.uri, }); console.log(`Resource ${resource.name}:`, content); } // Get and execute a prompt const prompts = await client.listPrompts(); if (prompts.length > 0) { const prompt = await client.getPrompt({ name: prompts[0].name, arguments: { code: "console.log('test')", language: "javascript", }, }); console.log("Prompt messages:", prompt.messages); } // Clean up await client.close(); } // Run the client useServer().catch(console.error); ``` This client example shows how to: - Connect to an MCP server using stdio transport - List and call tools with arguments - Read resources from the server - Get and use prompt templates - Properly close the connection when done ## Use it with Vscode ```json { "servers": { "echo": { "gallery": true, "type": "stdio", "command": "node", "args": ["--import", "tsx", "/absolute/path/echo-server.ts"] } } } ``` ## Summary This was just my starter post for MCP to give an overview. I will write more blog posts that will go in depth about the different topics. If you need a TypeScript starter template for your next MCP server, you can use my [mcp-server-starter-ts](https://github.com/alexanderop/mcp-server-starter-ts) repository to get up and running quickly. --- --- title: How VueUse Solves SSR Window Errors in Vue Applications description: Discover how VueUse solves SSR issues with browser APIs and keeps your Vue composables safe from 'window is not defined' errors. tags: ['vue'] url: https://alexop.dev/posts/how-vueuse-solves-ssr-window-errors-vue-applications --- # How VueUse Solves SSR Window Errors in Vue Applications I am a big fan of [VueUse](https://vueuse.org). Every time I browse the docs I discover a new utility that saves me hours of work. Yet VueUse does more than offer nice functions. It also keeps your code safe when you mix client-side JavaScript with server-side rendering (SSR). In this post I show the typical **"`window` is not defined"** problem, explain why it happens, and walk through the simple tricks VueUse uses to avoid it. ## The Usual Pain: `window` Fails on the Server When you run a Vue app with SSR, Vue executes in **two** places: 1. **Server (Node.js)** – It renders HTML so the user sees a fast first screen. 2. **Browser (JavaScript runtime in the user's tab)** – It takes over and adds interactivity. The server uses **Node.js**, which has **no browser objects** like `window`, `document`, or `navigator`. The browser has them. 
If code that needs `window` runs on the server, Node.js throws an error and the page render breaks. ### Diagram: How SSR Works ```mermaid sequenceDiagram participant B as Browser participant S as Server (Node.js) B->>S: Request page S-->>S: Run Vue code
(no window) S-->>B: Send HTML B-->>B: Hydrate app
(has window) Note over S,B: If Vue code touches
"window" on the server, SSR crashes ``` ## Node.js vs Browser: Two Different Worlds | Feature | Node.js on the server | Browser in the tab | | ----------- | --------------------- | ------------------ | | `window` | ❌ not defined | ✅ defined | | `document` | ❌ | ✅ | | `navigator` | ❌ | ✅ | | DOM access | ❌ | ✅ | | Goal | Render HTML fast | Add interactivity | A Vue _composable_ that reads the mouse position or listens to scroll events needs those browser objects. It must **not** run while the server renders. ## How VueUse Solves the Problem VueUse uses three small patterns: a **client check**, **safe defaults**, and an **SSR guard** inside each composable. ### 1. One-Line Client Check VueUse first asks, "Are we in the browser?" It does that in [`is.ts`](https://github.com/vueuse/vueuse/blob/main/packages/shared/utils/is.ts): ```ts export const isClient = typeof window !== "undefined" && typeof document !== "undefined"; ``` #### Diagram ```mermaid flowchart TD Start --> Test{window exists?} Test -- yes --> Client[isClient = true] Test -- no --> Server[isClient = false] ``` ### 2. Safe Defaults for Browser Objects Instead of making _you_ write `if (isClient)` checks, VueUse exports harmless fallbacks from [`_configurable.ts`](https://github.com/vueuse/vueuse/blob/main/packages/core/_configurable.ts): ```ts export const defaultWindow = isClient ? window : undefined; export const defaultDocument = isClient ? window.document : undefined; export const defaultNavigator = isClient ? window.navigator : undefined; ``` On the server these constants are `undefined`. That value is safe to read, so nothing crashes. #### Diagram ```mermaid flowchart TD Check[isClient?] -->|true| Real[Return real window] Check -->|false| Undef[Return undefined] Real --> Compose[Composable receives safe value] Undef --> Compose ``` ### 3. The SSR Guard Inside Every Composable Each composable that might touch the DOM adds a simple guard. Example: [`onElementRemoval`](https://github.com/vueuse/vueuse/blob/main/packages/core/onElementRemoval/index.ts): ```ts export function onElementRemoval(options: any = {}) { const { window = defaultWindow } = options; if (!window) // server path return () => {}; // no-op // browser logic goes here } ``` If `window` is `undefined`, the function returns a no-op and exits. The server render keeps going without errors. #### Diagram ```mermaid flowchart TD Run[Composable starts] --> IsWin{defaultWindow ?} IsWin -- no --> Noop[Return empty function] IsWin -- yes --> Logic[Run browser code] ``` ### 4. Extra Safety with `useSupported` Sometimes you **are** in the browser, but the user's browser lacks a feature. VueUse offers [`useSupported`](https://github.com/vueuse/vueuse/blob/main/packages/core/useSupported/index.ts) to check that: ```ts export function useSupported(test: () => unknown) { const isMounted = useMounted(); return computed(() => { isMounted.value; // make it reactive return Boolean(test()); }); } ``` #### Example: `useEyeDropper` `useEyeDropper` checks both SSR and feature support (see the full file [here](https://github.com/vueuse/vueuse/blob/main/packages/core/useEyeDropper/index.ts)): ```ts export function useEyeDropper() { const isSupported = useSupported( () => typeof window !== "undefined" && "EyeDropper" in window ); async function open() { if (!isSupported.value) return; // safe exit const eyeDropper = new (window as any).EyeDropper(); await eyeDropper.open(); } return { isSupported, open }; } ``` ## Wrap-Up - **Node.js** renders HTML but lacks browser globals. 
- **VueUse** avoids crashes with three steps: 1. A single **`isClient`** flag tells where the code runs. 2. **Safe defaults** turn `window`, `document`, and `navigator` into `undefined` on the server. 3. Every composable adds a quick **SSR guard** that eliminates environment concerns. Because of this design you can import any VueUse composable, even ones that touch the DOM, and trust it to work in SSR without extra code. ### Learn More - VueUse guidelines that inspired these patterns: [https://vueuse.org/guidelines](https://vueuse.org/guidelines) - Full VueUse repository: [https://github.com/vueuse/vueuse](https://github.com/vueuse/vueuse) --- --- title: Mastering GraphQL Fragments in Vue 3: Component-Driven Data Fetching description: Part 3 of the Vue 3 + GraphQL series: Learn how to use GraphQL fragments with fragment masking to create truly component-driven data fetching in Vue 3. tags: ['graphql', 'vue', 'typescript'] url: https://alexop.dev/posts/mastering-graphql-fragments-vue3-component-driven-data-fetching --- # Mastering GraphQL Fragments in Vue 3: Component-Driven Data Fetching ```mermaid graph TD A["❌ Traditional Approach"] --> A1["Monolithic Queries"] A1 --> A2["Over-fetching"] A1 --> A3["Tight Coupling"] A1 --> A4["Implicit Dependencies"] B["✅ Fragment Masking"] --> B1["Component-Owned Data"] B1 --> B2["Type Safety"] B1 --> B3["Data Encapsulation"] B1 --> B4["Safe Refactoring"] ``` ## Why Fragments Are a Game-Changer In Part 2, we achieved type safety with GraphQL Code Generator. But our queries are still monolithic—each component doesn't declare its own data needs. This creates several problems: - **Over-fetching**: Parent components request data their children might not need - **Under-fetching**: Adding a field means hunting down every query using that type - **Tight coupling**: Components depend on their parents to provide the right data - **Implicit dependencies**: Parent components can accidentally rely on data from child fragments - **Brittle refactoring**: Changing a component's data needs can break unrelated components Enter GraphQL fragments with **fragment masking**—the pattern that Relay popularized and that Apollo Client 3.12 has made even more powerful. This transforms how we think about data fetching by providing **true data encapsulation** at the component level. ## What Are GraphQL Fragments? GraphQL fragments are **reusable units of fields** that components can declare for themselves. But they're more than just field groupings—when combined with fragment masking, they provide **data access control**. ```graphql fragment CountryBasicInfo on Country { code name emoji capital } ``` **Fragment masking** is the key innovation that makes fragments truly powerful. It ensures that: 1. **Data is encapsulated**: Only the component that defines a fragment can access its fields 2. **Dependencies are explicit**: Components can't accidentally rely on data from other fragments 3. **Refactoring is safe**: Changing a fragment won't break unrelated components 4. **Type safety is enforced**: TypeScript prevents accessing fields you didn't request ## Understanding Fragments Through the Spread Operator If you're familiar with JavaScript's spread operator, fragments work exactly the same way: ```javascript // JavaScript objects const basicInfo = { code: "US", name: "United States" }; const fullCountry = { ...basicInfo, capital: "Washington D.C." 
}; ``` ```graphql # GraphQL fragments fragment CountryBasicInfo on Country { code name } query GetCountryDetails { country(code: "US") { ...CountryBasicInfo # Spread fragment fields capital # Add extra fields } } ``` **Fragment masking** takes this further by ensuring components can only access the data they explicitly request—pioneered by **Relay** and now enhanced in **Apollo Client 3.12**. ## Step 1: Enable Fragment Masking Ensure your `codegen.ts` uses the client preset (from Part 2): ```typescript const config: CodegenConfig = { overwrite: true, schema: "https://countries.trevorblades.com/graphql", documents: ["src/**/*.vue", "src/**/*.graphql"], generates: { "src/gql/": { preset: "client", plugins: [], config: { useTypeImports: true }, }, }, }; ``` This generates: - `FragmentType`: Masked fragment types for props - `useFragment()`: Function to unmask fragment data - Type safety to prevent accessing non-fragment fields ## Step 2: Your First Fragment with Masking Let's create a `CountryCard` component that declares its own data requirements: ```vue ``` ## Understanding Fragment Masking: The Key to Data Isolation **Fragment masking** is the core concept that makes this pattern so powerful. It's not just about code organization—it's about **data access control and encapsulation**. ### What Fragment Masking Actually Does Think of fragment masking like **access control in programming languages**. Just as a module can have private and public methods, fragment masking controls which components can access which pieces of data. ```typescript // Without fragment masking (traditional approach) const result = useQuery(GET_COUNTRIES); const countries = result.value?.countries || []; // ❌ Parent can access ANY field from the query console.log(countries[0].name); // Works console.log(countries[0].capital); // Works console.log(countries[0].currency); // Works ``` With fragment masking enabled: ```typescript // ✅ Parent component CANNOT access fragment fields const name = result.value?.countries[0].name; // TypeScript error! // ✅ Only CountryCard can access its fragment data const country = useFragment(CountryCard_CountryFragment, props.country); console.log(country.name); // Works! ``` ### The Power of Data Encapsulation Fragment masking provides **true data encapsulation**: 1. **Prevents Implicit Dependencies**: Parent components can't accidentally rely on data their children need 2. **Catches Breaking Changes Early**: If a child component removes a field, the parent can't access it anymore 3. **Enforces Component Boundaries**: Each component owns its data requirements 4. **Enables Safe Refactoring**: Change a fragment without breaking unrelated components ### Why This Matters Without fragment masking, parent components can accidentally depend on child fragment data. When the child removes a field, the parent breaks at runtime. With fragment masking, TypeScript catches this at compile time. ```typescript // Parent can only access explicitly requested fields countries[0].id; // ✅ Works (parent requested this) countries[0].name; // ❌ TypeScript error (only in fragment) // Child components unmask their fragment data const country = useFragment(CountryCard_CountryFragment, props.country); country.name; // ✅ Works (component owns this fragment) ``` > **📝 Vue Reactivity Note**: Always wrap `useFragment` in a `computed()` for Vue reactivity. This ensures the component updates when fragment data changes. 
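To make the pattern concrete, here is a minimal sketch of a `CountryCard` built this way (the markup and the `@/gql` import alias are illustrative assumptions; adjust the path to wherever your generated helpers live):

```vue
<script setup lang="ts">
import { computed } from "vue";
import { graphql, useFragment, type FragmentType } from "@/gql";

const CountryCard_CountryFragment = graphql(`
  fragment CountryCard_CountryFragment on Country {
    code
    name
    emoji
    capital
  }
`);

const props = defineProps<{
  country: FragmentType<typeof CountryCard_CountryFragment>;
}>();

// Unmask the fragment inside computed() so it stays reactive
const country = computed(() =>
  useFragment(CountryCard_CountryFragment, props.country)
);
</script>

<template>
  <article>
    <h3>{{ country.emoji }} {{ country.name }}</h3>
    <p>Capital: {{ country.capital }}</p>
  </article>
</template>
```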
## Step 3: Parent Component Uses the Fragment Now the parent component includes the child's fragment in its query: ```vue ``` ## The Magic of Fragment Masking Here's what just happened: ```mermaid graph TB subgraph "GraphQL Query Result" QR["countries: Country[]"] end subgraph "Parent Component" PC["Parent can access:
• countries[].code
• countries[].region
❌ countries[].name"] end subgraph "Child Component" CC["CountryCard receives:
Masked Fragment Data"] UF["useFragment() unmasks:
• code
• name
• emoji
• capital"] end QR --> PC QR --> CC CC --> UF ``` The parent component **cannot access** fields from `CountryCard_Fragment`—they're masked! Only `CountryCard` can unmask and use that data. ## Step 4: Nested Fragments Fragments can include other fragments, creating a hierarchy: ```graphql # Basic fragment fragment LanguageItem_LanguageFragment on Language { code name native } # Fragment that uses other fragments fragment CountryWithLanguages_CountryFragment on Country { code name emoji languages { ...LanguageItem_LanguageFragment } } ``` Child components use their own fragments: ```vue ``` ## Fragment Dependency Management Notice how the query automatically includes all nested fragments: ```mermaid graph LR subgraph "Components" A[CountryDetailPage] B[CountryWithLanguages.vue] C[LanguageItem.vue] end subgraph "Their Fragments" A1[Page Fields:
code] B1[Country Fragment:
code, name, emoji
languages] C1[Language Fragment:
code, name, native] end A -.-> A1 B -.-> B1 C -.-> C1 ``` ## Step 5: Conditional Fragments Use GraphQL directives to conditionally include fragments: ```graphql query CountriesConditional($includeLanguages: Boolean!) { countries { code name ...CountryDetails_CountryFragment @include(if: $includeLanguages) } } ``` This enables dynamic data loading based on user interactions or application state. ## Best Practices ### Key Guidelines 1. **Naming**: Use `ComponentName_TypeNameFragment` convention 2. **Vue Reactivity**: Always wrap `useFragment` in `computed()` 3. **TypeScript**: Use `FragmentType` for props 4. **Organization**: Colocate fragments with components ```typescript // ✅ Good naming and Vue reactivity const CountryCard_CountryFragment = graphql(`...`); interface Props { country: FragmentType; } const country = computed(() => useFragment(CountryCard_CountryFragment, props.country) ); ``` ## Performance Benefits Fragments aren't just about developer experience - they provide concrete performance and maintainability benefits: ```mermaid graph TD A["❌ Multiple Queries"] --> A1["3 Network Requests"] A1 --> A2["Duplicate Data Fetching"] A2 --> A3["Larger Bundle Size"] B["✅ Fragment Composition"] --> B1["Single Network Request"] B1 --> B2["Optimized Payload"] B2 --> B3["Better Performance"] ``` ## Summary GraphQL fragments with fragment masking enable **component-driven data fetching** in Vue 3: ✅ **Type Safety**: Components can only access their declared fields ✅ **True Modularity**: Each component declares its exact data needs ✅ **Better Performance**: Load only the data you need ✅ **Maintainable Code**: Changes to fragments don't break unrelated components ## Migration Checklist 1. Start with leaf components (no children) 2. Always use `computed()` with `useFragment` for Vue reactivity 3. Update TypeScript interfaces to use `FragmentType` 4. Run `npm run codegen` after fragment changes ## What's Next? This is Part 3 of our Vue 3 + GraphQL series: 1. **Part 1**: Setting up Apollo Client with Vue 3 2. **Part 2**: Type-safe queries with GraphQL Code Generator 3. **Part 3**: Advanced fragments and component-driven data fetching (current) 4. **Part 4**: GraphQL Caching Strategies in Vue 3 (coming next!) ## Other Fragment Use Cases Beyond component-driven data fetching, fragments offer additional powerful patterns: - **Fragments on Unions and Interfaces**: Handle polymorphic types with inline fragments (`... on Type`) - **Batch Operations**: Share field selections between queries, mutations, and subscriptions - **Schema Documentation**: Use fragments as living documentation of data shapes - **Testing**: Create fragment mocks for isolated component testing - **Fragment Composition**: Build complex queries from simple, reusable pieces For more advanced fragment patterns, see the [Vue Apollo Fragments documentation](https://apollo.vuejs.org/guide-composable/fragments). ## Source Code Find the full demo for this series here: [example](https://github.com/alexanderop/vue-graphql-simple-example) **Note:** The code for this tutorial is on the `part3` branch. ```bash git clone https://github.com/alexanderop/vue-graphql-simple-example.git cd vue-graphql-simple-example git checkout part3 ``` --- --- title: How I Use Claude Code for Doing SEO Audits description: Learn how to leverage Claude Code with Puppeteer MCP to perform comprehensive SEO audits in minutes, complete with automated analysis and actionable reports. 
tags: ['ai', 'seo'] url: https://alexop.dev/posts/how-i-use-claude-code-for-doing-seo-audits --- # How I Use Claude Code for Doing SEO Audits I'm building a Nuxt blog starter called [NuxtPapier](https://github.com/alexanderop/NuxtPapier). Like any developer who wants their project to show up in search results, I needed to make sure it works well with search engines. Manual SEO audits take too much time and I often miss things, so I used Claude Code with Puppeteer MCP to do this automatically. ## What is Claude Code? Claude Code is Anthropic's official command-line tool that brings AI help right into your coding workflow. Think of it like having a skilled developer sitting next to you, ready to help with any coding task. What makes it really powerful for SEO audits is that it can use MCP (Model Context Protocol) tools. ## Enter Puppeteer MCP > **What is MCP?** > According to [Anthropic's official documentation](https://www.anthropic.com/news/model-context-protocol), the Model Context Protocol (MCP) is "an open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools." It provides a universal interface for connecting AI systems with external tools, systems, and data sources, replacing fragmented integrations with a single protocol. MCP lets Claude Code connect with outside tools and services. I was using the [Puppeteer MCP server](https://github.com/merajmehrabi/puppeteer-mcp-server) which gives Claude Code the power to: - Control a real web browser with code - Go to any website - Take screenshots - Pull data from web pages - Check page structure and speed This means Claude Code can visit your website, look at it the same way search engines do, and give you detailed insights. ```mermaid --- title: Claude Code SEO Audit Workflow --- graph LR A[Claude Code CLI] --> B[Puppeteer MCP] B --> C[Real Browser Instance] C --> D[Your Website] D --> E[Extract Data] E --> F[SEO Analysis] F --> G[Comprehensive Report] ``` ## My SEO Audit Prompt Here's the exact prompt I used: ``` You are a SEO expert your name is John Wick. I gave you so much money now you have to do a SEO audit for this blog. When you are done write down everything into a own markdown file so I can solve the problems later. Use Puppeteer MCP, the server is already running. ``` Simple and direct. It uses Claude Code's ability to act like an expert while using the Puppeteer tools. Claude Code quickly responded with: > "I'll do a complete SEO audit for your blog, Senpai Alex. Let me start by making a plan and then checking your site using Puppeteer." ## The Automated Process One of the coolest things is how Claude Code organizes itself by making todos. It quickly made its own plan: **Claude Code's Self-Generated Todo List:** - [ ] Navigate to the homepage and take screenshots - [ ] Analyze meta tags and SEO elements - [ ] Check blog post pages for optimization - [ ] Evaluate page structure and heading hierarchy - [ ] Analyze site performance and loading times - [ ] Check mobile responsiveness - [ ] Review robots.txt and sitemap availability - [ ] Document all findings in a markdown report What amazed me was how Claude Code worked through each todo step by step, using Puppeteer to get real data from my site - checking meta tags, measuring how fast pages load, checking structured data, and even taking mobile screenshots. It checked off each item as it finished them. 
While working, Claude Code told me what it found: > "I found a big SEO problem - the robots meta tag says 'noindex, nofollow' which stops search engines from finding your site. This is a major issue." ```mermaid --- title: Automated SEO Analysis Process --- sequenceDiagram participant CC as Claude Code participant P as Puppeteer participant B as Browser participant W as Website CC->>CC: Generate Todo List CC->>P: Initialize browser session P->>B: Launch browser loop For each SEO check CC->>P: Navigate to page P->>B: Go to URL B->>W: Load page W->>B: Return content B->>P: Extract data P->>CC: Return analysis data CC->>CC: ✓ Mark todo complete end CC->>CC: Generate final report ``` ## The Report Claude Code created a detailed SEO audit report that I could actually use. It included: **Critical Issues:** - My site had `noindex, nofollow` meta tags (development setting blocking all search engines!) - robots.txt was blocking all crawlers - Duplicate H1 tags on blog posts **Positive Findings:** - Proper structured data implementation - Good mobile responsiveness - Fast page load times (200ms) - Clean URL structure **What to Fix:** - Quick fixes with code examples - Short-term improvements ranked by importance - Long-term improvement plans The report even told me exactly which files to change and what code to fix. Claude Code wrapped up with: > "The audit found big blocking problems - search engines can't see your site because of noindex, nofollow meta tags and development robots.txt settings. But the good news is your blog has great technical basics with proper structured data, mobile-friendly design, and fast loading." ```mermaid --- title: SEO Audit Report Structure --- mindmap root((SEO Audit Report)) Critical Issues noindex/nofollow tags Blocked robots.txt Duplicate H1 tags Positive Findings Structured data ✓ Mobile responsive ✓ Fast load times ✓ Clean URLs ✓ Recommendations Immediate fixes Code examples File locations Short-term improvements Priority ranking Long-term strategies Enhancement roadmap ``` ## Why This Workflow Works So Well 1. **Speed**: Claude Code finished in minutes what would take me hours to do by hand 2. **Complete**: Claude Code checked things I might have missed 3. **Useful Results**: Not just problems, but solutions with code examples 4. **Documentation**: Everything saved in a markdown file I can use later ## Getting Started To copy this workflow: 1. Install Claude Code: `npm install -g @anthropic/claude-code` 2. Set up the [Puppeteer MCP server](https://github.com/merajmehrabi/puppeteer-mcp-server) 3. Start your development server 4. Give Claude Code a clear prompt about what you want checked Claude Code's smarts plus Puppeteer's browser control makes a powerful SEO audit tool that any developer can use. No more guessing about SEO problems - just run the audit and get a professional report in minutes. Try it on your own projects and see what SEO problems you might be missing! --- --- title: The Age of the Generalist description: How AI is transforming software development and why high-agency generalists will thrive in this new era of technology. tags: ['ai'] url: https://alexop.dev/posts/the-age-of-the-generalist --- # The Age of the Generalist AI this, AI that. Like many of you, I'm constantly switching between _"Wow, you can do that with AI?"_ and _"Ugh, not AI again."_ But here we are. AI is changing how we work and how we live. I’m a millennial. The last major disruption technology brought to my life was social media (and then the iPhone). 
Now there is a new wave coming. It will slowly change everything. And yes, it is AI. How does AI change the work of software developers? Short answer: we will all become more like generalists. This shift is already happening. Most teams no longer separate frontend and backend. They hire full-stack developers. This helps teams move faster. Traditional separation creates communication overhead and slows execution. So we are already becoming generalists. If you join a startup as a dev, you will not find neat job titles. There is no budget for that. You wear whatever hat the day requires. Now add AI to the mix. Tasks that took hours , writing code, setting up tooling , can now be done in minutes. You can delegate to AI, generate boilerplate, spin up components, scaffold tests. You can work in parallel. You will probably spend more time reading code than writing it. But AI has no understanding of architecture. It does not know what good design looks like. It cannot distinguish between cohesion and coupling. It does not know when to break something into modules or when to leave it flat. It has no initiative. It only works when a human prompts it. That is why **high agency** is more important than ever. AI does not replace builders. It replaces waiters. If you wait to be told what to do, you will fall behind. If you take action, ask questions, and push things forward, you will stay ahead. High agency means seeing a mess and deciding what to clean up. It means figuring out what matters without someone else making the roadmap. AI can give you answers, but it will never tell you what is worth building. So what should developers focus on? Become a generalist with high agency. Think of Leonardo da Vinci. He painted _The Last Supper_ and _Mona Lisa_. He dissected human bodies and sketched the nervous system. He designed flying machines. He wrote about optics, engineering, and warfare. He did not pick a lane. He learned widely and built from what he learned. That mindset , curious, self-directed, and hands-on , is what will matter most in the age of AI. --- --- title: How I Use LLMs description: Learn how I use LLMs to improve my productivity and efficiency. tags: ['ai', 'productivity'] url: https://alexop.dev/posts/how-i-use-llms --- # How I Use LLMs Motivated by the awesome YouTube video from Andrew Karpathy [How I use LLMs](https://www.youtube.com/watch?v=EWvNQjAaOHw), I decided to give two talks on how I use LLMs, both at my company and at the TypeScript meetup in Munich. This blog post is the written version of those talks. Keep in mind that while some things might change, especially regarding the models I currently use, I hope these tips will remain helpful for a long time.
As a junior developer, you might think your job is all about coding. However, as you gain experience, you realize that's not entirely true. We developers spend a significant amount of time learning new things or explaining concepts to others. That's why, when it comes to using LLMs, we shouldn't focus solely on code generation. We should also consider how to: - **Research faster** - **Document better** - **Learn more effectively** Most of my tips won't be about how to use Cursor AI or Copilot better. I think that would be worth its own blog post or a short video. ## Which model should I choose
It's annoying that we even have to think about which model to use for which task. I would guess that in the future (Cursor AI is already doing this), there will be a model as a kind of router in the middle that understands which prompt relates to which model. But for now, this isn't the case, so here's my guideline. In the picture, you see that I came up with four categories: 1. **Everyday tasks** (like fixing spelling, writing something better) 2. **Quick Refactoring** (like adding console logs to debug something, small refactorings) 3. **Technical Tasks** (like doing research) 4. **Complex Tasks** (tasks that definitely need long reasoning and thinking) It's important for me, since I don't have an unlimited amount of o3, for example, to try to use o4-mini-high if I think I don't need long reasoning for something. As I said, these models will change daily, but I think the categories will remain. So most of the time, I ask myself if I need a model that requires reasoning or not. ## o3 is a mini agent What's also clear is that new models like o3 are mini agents. This means they're not only predicting the next token but also have tools. With these tools, they can gain better context or perform operations with Python. This is why Simon Willison's blog post explains how he used o3 to guess his location. As his title says: Watching o3 guess a photo's location is surreal, dystopian, and wildly entertaining, but it also shows how powerful this can be. Read his blog post [here](https://simonwillison.net/2025/Apr/26/o3-photo-locations/). I also wrote a blog post once where I gave o3 a hard chess puzzle to solve. Feel free to read it [here](../how-03-model-tries-chess-puzzle). ## Some tips on how to get more out of Copilot and co My first tip is to index your codebase, either with a local index or remote. With this, Cursor or Copilot can perform better searches. It all falls back to automatic retrieval. Keep in mind that an LLM doesn't know where your files are located. So it always has to search against your codebase. One technique besides keyword search that can help is dense vector or embedding search. You can read the docs on how to implement that. Another tip: when you have a project that's indexed, you can use Copilot's ask mode and use @workspace. Now you can ask business questions or even solve simple tickets in one shot (if there are well-written tickets). For more information on how to index your repositories for Copilot Chat, refer to the [GitHub Copilot documentation](https://docs.github.com/en/copilot/using-github-copilot/copilot-chat/indexing-repositories-for-copilot-chat). My last tip, where I use Gemini 2.0 Flash or GPT-4.1, is to do little refactorings or code changes quickly. I quickly mark the related lines and then use a prompt to make the changes. ## How can we improve the output of an LLM
In the book ["AI Engineering"](https://www.oreilly.com/library/view/ai-engineering/9781098166298/) by Chip Huyen, she explains that there are three main ways to improve the output of an LLM: 1. **With Prompts** 2. **Per RAG** 3. **With fine-tuning** Of course, all three ways will increase in effort and maybe ROI, but it's clear that better prompts are always the first step to improving the output of an LLM. ## The almighty System Prompt The idea of a System Prompt is simple but genius. We change the default behavior of an LLM and customize it to our needs.
In the picture, you see an example of a system prompt that I use to write blog posts. In the picture, you see an example of a system prompt that can be used to write Jira tickets. At work, I have something like that and use it together with Copilot. My goal is to quickly write what needs to be done, and the LLM handles the rest. It also asks questions when something is not clear. You can use that for many problems, and also keep in mind that every LLM provider, like OpenAI or Claude, has their own system prompt. One use case, for example, is to explain which tools an LLM has available, etc. At [GitHub](https://github.com/jujumilk3/leaked-system-prompts), you can read some of the leaked system prompts. This is why this is a good structure to think about when you write system prompts: 1. **Role Definition** 2. **Step-by-Step Instructions** 3. **Output Format** 4. **Edge Cases** 5. **Style Guidelines** When you tell the LLM which role it has, it will already use words and tokens that are useful for this role in its next prediction. Clear steps can help for a more complex workflow so the LLM knows when it's done, etc. For something like a Jira ticket, we should also add a concrete output format with an example. In my experience, edge cases are something that you will add over time. We need to play with the LLM and see what vibe we get from it. Style guidelines are useful. For example, I love easy words and active voice. You can also ask the LLM how a system prompt should look for the problem you want to solve and use that as your version 1. This approach can provide a solid starting point for further refinement. ## Googling is dead Don't get me wrong, I think Google is winning the AI arms race. As noted in [The Algorithmic Bridge](https://www.thealgorithmicbridge.com/p/google-is-winning-on-every-ai-front), Google is excelling on every AI front. But the classical googling, where we typed a query and the first five results had an ad and it was hard to find an organic result, is over.
Most of the time, I use a reasoning model with a web search tool. This helps me as a starter to find related blog posts, etc., for my problem. I only use Google when I know the site I want to reach or I know which blog post I want to read. ## Get all tokens out of a repo If you change GitHub to Uithub for any repo, you will get all text in a way that you can just copy-paste it into a model with a high context, like Google Gemini. This can be useful to either ask questions against the codebase or to learn how it works or to rebuild something similar without needing to increase the depth of your node modules.
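For example, swapping the domain is all it takes (the repository here is only an illustration):

```plaintext
https://github.com/vueuse/vueuse  →  https://uithub.com/vueuse/vueuse
```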
## Generate a Wiki out of any repo When you go to https://deepwiki.org/, you can generate a wiki out of any repo. Useful for understanding other repos or even for your own little side projects. What I like is that the LLMs generate mermaid diagrams, and sometimes they are really useful. ## Generate diagrams I think there are now three ways to generate good diagrams with an LLM: 1. **As SVG** 2. **As Mermaid** 3. **Or as a picture with the new model** I already wrote about how to use ChatGPT to generate mermaid diagrams. Read it [here](../how-to-use-ai-for-effective-diagram-creation-a-guide-to-chatgpt-and-mermaid). ## Rules Rules Rules We human developers need rules, and the same is true for LLMs to write better code. This is why both Copilot and Cursor have their own rule system. For detailed information on how to set up and use rules in Cursor, check out the [official Cursor documentation on rules](https://docs.cursor.com/context/rules). One idea when you have a monorepo could be something like this: ```plaintext my-app/ ├── .cursor/ │ └── rules/ │ └── project-guidelines.mdc # General code style, naming, formatting ├── frontend/ │ ├── .cursor/ │ │ └── rules/ │ │ ├── vue-components.mdc # Naming + structure for components │ │ └── tailwind-usage.mdc # Utility-first CSS rules │ └── src/ │ └── ... ├── backend/ │ ├── .cursor/ │ │ └── rules/ │ │ ├── api-structure.mdc # REST/GraphQL structure conventions │ │ └── service-patterns.mdc # How to organize business logic │ └── src/ │ └── ... ├── shared/ │ ├── .cursor/ │ │ └── rules/ │ │ └── shared-types.mdc # How to define + use shared TypeScript types │ └── src/ │ └── ... ├── README.md └── package.json ``` One rule could then look like this: ```mdc --- description: Base project guidelines and conventions globs: - "**/*.ts" - "**/*.vue" alwaysApply: false --- - **Use `PascalCase` for component names.** - **Use `camelCase` for variables, functions, and file names (except components).** - **Prefer composition API (`setup()`) over options API.** - **Type everything. Avoid `any` unless absolutely necessary.** - **Keep files under 150 LOC. Split logic into composables or utilities.** - **Use absolute imports from `@/` instead of relative paths.** - **Every module must have tests that reflect the feature's acceptance criteria.** - **Commit messages must follow Conventional Commits format.** - **Use TODO: and FIXME: comments with your initials (e.g., `// TODO: refactor`).** - **Format code with Prettier. Lint with ESLint before committing.** Referenced files: @.eslintrc.js @.prettierrc @tsconfig.json ``` This is an example for Cursor. The idea is to give a more fine-grained context. In our example, maybe it would even be better to only have a .vue and separate .ts rule. In Agent mode, Cursor will then automatically apply this rule as context. ## Write better image prompts One technique that I think can be useful is to describe which image you want and then say, "give me that back as a Midjourney prompt." This has the advantage that the description of the image is nicely formatted. ## When should you use an LLM directly An interesting question that I got from the TypeScript meetup was when I would directly vibe code and just tell Cursor to implement feature X and when not. In my experience, it all depends on the topic and how much training data is available for that. For example, last week I was using Nuxt together with NuxtUI, a good UI library for Nuxt, but the problem was that the LLM doesn't understand how the components are structured, etc. 
So in that case, it would be better if I were the main driver and not the LLM. So always ask yourself if there is enough training data out there for your problem. Was it already solved in the past? Sometimes you will waste time by just blindly doing vibe coding. ## Summary There are many ways we developers can use LLMs to be more productive and also have more fun. I believe most of us don't want to spend too much time writing tickets. This is where LLMs can help us. I believe it's important to be open and try out these tools. If you want to get better with these tools, also try to understand the fundamentals. I wrote a blog post explaining [how ChatGPT works](../how-chatgpt-works-for-dummies) that might help you understand what's happening under the hood. --- --- title: No Server, No Database: Smarter Related Posts in Astro with `transformers.js` description: How I used Hugging Face embeddings to create smart “Related Posts” for my Astro blog—no backend, no database, just TypeScript. tags: ['ai', 'astro', 'typescript'] url: https://alexop.dev/posts/semantic-related-posts-astro-transformersjs --- # No Server, No Database: Smarter Related Posts in Astro with `transformers.js` I recently read a interesting blog post about Embeddings at [Embeddings in Technical Writing](https://technicalwriting.dev/ml/embeddings/overview.html): > “I could tell you exactly how to advance technical writing with embeddings, but where’s the fun in that?” Challenge accepted! In this post, I show how I used **Hugging Face’s `transformers.js`** to create smarter related-post suggestions for my Astro blog, without servers or databases. ## Why Embeddings Are Better Than Tags Tags group posts by labels, but not by meaning. Posts about Vue 3 and deep reactivity concepts get mixed up together. Embeddings capture the meaning of text using numeric vectors. Two posts become related when their content is similar, not just when tags match. ### Vectors and Cosine Similarity Words like “cat” and “kitty” are close in meaning, while “dog” is slightly different: | word | vector | | ----- | ---------- | | cat | `[0, 1]` | | kitty | `[0, 0.9]` | | dog | `[1, -1]` | Cosine similarity measures how similar these vectors are. For a deeper dive into TypeScript and vectors, check out my post on [How to Implement a Cosine Similarity Function in TypeScript for Vector Comparison](../how-to-implement-a-cosine-similarity-function-in-typescript-for-vector-comparison/). ## Transformers.js in Action `transformers.js` lets you run Hugging Face models directly in JavaScript: ```ts const model = "sentence-transformers/all-MiniLM-L6-v2"; const extractor = await pipeline("feature-extraction", model); const embedding = await extractor("Hello, world!", { pooling: "mean", normalize: true, }); console.log(embedding); // Float32Array with 384 dimensions ``` You don't need Python or a server. Everything runs in your browser or Node.js. ## My Simple Workflow Here's how my workflow works: 1. Load markdown files (`.md` or `.mdx`) from my blog. 2. Remove markdown formatting to get plain text. 3. Use `transformers.js` to create embeddings. 4. Calculate cosine similarity between all posts. 5. Find the top 5 most related posts for each post. 6. Save the results in a JSON file (`similarities.json`). 7. Display these related posts with Astro. 
### Main Script (TypeScript) ```ts // --------- Configurations --------- const GLOB = "src/content/**/*.{md,mdx}"; // Where to find Markdown content const OUT = "src/assets/similarities.json"; // Output file for results const TOP_N = 5; // Number of similar docs to keep const MODEL = "Snowflake/snowflake-arctic-embed-m-v2.0"; // Embedding model // --------- Type Definitions --------- interface Frontmatter { slug: string; [k: string]: unknown; } interface Document { path: string; content: string; frontmatter: Frontmatter; } interface SimilarityResult extends Frontmatter { path: string; similarity: number; } // --------- Utils --------- /** * Normalizes a vector to unit length (L2 norm == 1) * This makes cosine similarity a simple dot product! */ function normalize(vec: Float32Array): Float32Array { let len = Math.hypot(...vec); // L2 norm if (!len) return vec; return new Float32Array(vec.map(x => x / len)); } /** * Computes dot product of two same-length vectors. * Vectors MUST be normalized before using this for cosine similarity! */ const dot = (a: Float32Array, b: Float32Array) => a.reduce((sum, ai, i) => sum + ai * b[i], 0); /** * Strips markdown formatting, import/export lines, headings, tables, etc. * Returns plain text for semantic analysis. */ const getPlainText = async (md: string) => { let txt = String(await remark().use(strip).process(md)) .replace(/^import .*?$/gm, "") .replace(/^export .*?$/gm, "") .replace( /^\s*(TLDR|Introduction|Conclusion|Summary|Quick Setup Guide|Rules?)\s*$/gim, "" ) .replace(/^[A-Z\s]{4,}$/gm, "") .replace(/^\|.*\|$/gm, "") .replace(/(Rule\s\d+:.*)(?=\s*Rule\s\d+:)/g, "$1\n") .replace(/\n{3,}/g, "\n\n") .replace(/\n{2}/g, "\n\n") .replace(/\n/g, " ") .replace(/\s{2,}/g, " ") .trim(); return txt; }; /** * Parses and validates a single Markdown file. * - Extracts frontmatter (slug, etc.) * - Converts content to plain text * - Skips drafts or files with no slug */ async function processFile(path: string): Promise { try { const { content, data } = matter(fs.readFileSync(path, "utf-8")); if (!data.slug || data.draft) return null; const plain = await getPlainText(content); return { path, content: plain, frontmatter: data as Frontmatter }; } catch { return null; } } /** * Processes an array of Markdown file paths into Documents */ async function loadDocs(paths: string[]) { const docs: Document[] = []; for (const p of paths) { const d = await processFile(p); if (d) docs.push(d); } return docs; } /** * Generates vector embeddings for each document's plain text. * - Uses HuggingFace model * - Normalizes each vector for fast cosine similarity search */ async function embedDocs( docs: Document[], extractor: FeatureExtractionPipeline ) { if (!docs.length) return []; // Don't let the model normalize, we do it manually for safety const res = (await extractor( docs.map(d => d.content), { pooling: "mean", normalize: false } )) as any; const [n, dim] = res.dims; // Each embedding vector is normalized for performance return Array.from({ length: n }, (_, i) => normalize(res.data.slice(i * dim, (i + 1) * dim)) ); } /** * Computes the top-N most similar documents for the given document index. * - Uses dot product of normalized vectors for cosine similarity * - Returns only the top-N */ function topSimilar( idx: number, docs: Document[], embs: Float32Array[], n: number ): SimilarityResult[] { return docs .map((d, j) => j === idx ? 
null : { ...d.frontmatter, path: d.path, similarity: +dot(embs[idx], embs[j]).toFixed(2), // higher = more similar } ) .filter(Boolean) .sort((a, b) => (b as any).similarity - (a as any).similarity) .slice(0, n) as SimilarityResult[]; } /** * Computes all similarities for every document, returns as {slug: SimilarityResult[]} map. */ function allSimilarities(docs: Document[], embs: Float32Array[], n: number) { return Object.fromEntries( docs.map((d, i) => [d.frontmatter.slug, topSimilar(i, docs, embs, n)]) ); } /** * Saves result object as JSON file. * - Ensures output directory exists. */ async function saveJson(obj: any, out: string) { fs.mkdirSync(path.dirname(out), { recursive: true }); fs.writeFileSync(out, JSON.stringify(obj, null, 2)); } // --------- Main Execution Flow --------- async function main() { try { // 1. Load transformer model for embeddings const extractor = await pipeline("feature-extraction", MODEL); // 2. Find all Markdown files const files = await glob(GLOB); if (!files.length) return console.log(chalk.yellow("No content files found.")); // 3. Parse and process all files const docs = await loadDocs(files); if (!docs.length) return console.log(chalk.red("No documents loaded.")); // 4. Generate & normalize embeddings const embs = await embedDocs(docs, extractor); if (!embs.length) return console.log(chalk.red("No embeddings.")); // 5. Calculate similarities for each doc const results = allSimilarities(docs, embs, TOP_N); // 6. Save results to disk await saveJson(results, OUT); console.log(chalk.green(`Similarity results saved to ${OUT}`)); } catch (e) { console.error(chalk.red("Error:"), e); process.exitCode = 1; } } main(); ``` ## This Will Produce a JSON file with the following structure: ```json { "vue-introduction": [ { "slug": "typescript-advanced-types", "title": "Advanced Types in TypeScript", "date": "2024-06-03T00:00:00.000Z", "path": "src/content/typescript-advanced-types.md", "similarity": 0.35 } // Additional similar documents... ] // Additional document entries... } ``` ### Astro Component ```astro --- if (similarities[post.slug]) { mostRelatedPosts = similarities[post.slug] .filter((p: RelatedPost) => !p.draft) .sort( (a: RelatedPost, b: RelatedPost) => (b.similarity ?? 0) - (a.similarity ?? 0) ) .slice(0, 3); } --- { mostRelatedPosts.length > 0 && (

      <section>
        {/* Markup reconstructed as a sketch; element choice and link target are assumptions */}
        <h2>Most Related Posts</h2>
        <ul>
          {mostRelatedPosts.map((relatedPost: RelatedPost) => (
            <li>
              <a href={`/posts/${relatedPost.slug}/`}>{relatedPost.title}</a>
              <p>{relatedPost.description}</p>
            </li>
          ))}
        </ul>
      </section>
) } ``` ## Does It Work? Yes! Now, my blog suggests truly related content, not random posts. --- ## What I Learned - **No extra servers or databases**: Everything runs during build time. - **Easy to use**: Works in both browsers and Node.js. - **Flexible**: Quickly change the model or method. If you have a static blog and want better recommendations, give embeddings and Astro a try. Let me know how it goes! Of course, this is far from perfect. I also don't know which model would be ideal, but at the moment I'm getting much better related posts than before, so I'm happy with the results. If you want to play with the script yourself check out [post-matcher-ai](https://github.com/alexanderop/post-matcher-ai) --- --- title: Type-Safe GraphQL Queries in Vue 3 with GraphQL Code Generator description: Part 2 of the Vue 3 + GraphQL series: generate fully-typed `useQuery` composables in Vue 3 with GraphQL Code Generator tags: ['graphql', 'vue'] url: https://alexop.dev/posts/type-safe-graphql-queries-vue3-codegen --- # Type-Safe GraphQL Queries in Vue 3 with GraphQL Code Generator
## Why plain TypeScript isn't enough If you hover over the `result` from `useQuery` in last week's code, you'll still see a plain, untyped `Ref`. That means a misspelled field name, or a query for a field the schema doesn't even have, slips right past TypeScript.
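For example, here is a hedged sketch assuming the plain `useQuery` setup from Part 1 with `@vue/apollo-composable`: the misspelled field compiles without a single complaint.

```ts
import { useQuery } from "@vue/apollo-composable";
import gql from "graphql-tag";

// Without codegen, `result` carries no schema information,
// so nothing stops us from misspelling a field.
const { result } = useQuery(gql`
  query AllCountries {
    countries {
      code
      name
      emoji
    }
  }
`);

const firstCountry = result.value?.countries?.[0];
console.log(firstCountry?.nmae); // typo, but TypeScript stays silent
```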
    It's time to bring in **GraphQL Code Generator** which gives us: - 100% typed operations, variables, and results - Build-time schema validation (_fail fast, ship safe_) ## Step 1: Install the right packages Let's start by installing the necessary dependencies: ```bash npm i graphql npm i -D typescript @graphql-codegen/cli npm i -D @parcel/watcher ``` > 🚨 `@parcel/watcher` is a dev dependency. ## Step 2: Create a clean `codegen.ts` Next, use the CLI to generate your config file: ```bash npx graphql-code-generator init ``` When prompted, answer as follows: ```bash ? What type of application are you building? Application built with Vue ? Where is your schema?: (path or url) https://countries.trevorblades.com/graphql ? Where are your operations and fragments?: src/**/*.vue ? Where to write the output: src/gql/ ? Do you want to generate an introspection file? No ? How to name the config file? codegen.ts ? What script in package.json should run the codegen? codegen Fetching latest versions of selected plugins... ``` Your generated `codegen.ts` should look like this: ```ts const config: CodegenConfig = { overwrite: true, schema: "https://countries.trevorblades.com/graphql", documents: "src/**/*.vue", generates: { "src/gql/": { preset: "client", plugins: [], }, }, }; export default config; ``` ## Step 3: Add dev scripts and watch mode Update your `package.json` scripts to streamline development: ```json { "scripts": { "codegen": "graphql-codegen --config codegen.ts", "codegen:watch": "graphql-codegen --watch --config codegen.ts" } } ``` ## Step 4: Write your first typed query Create a new file at `src/queries/countries.graphql`: ```graphql query AllCountries { countries { code name emoji } } ``` Then, generate your types: ```bash npm run codegen ``` The command writes all generated types to `src/gql/`. ### Update your `CountryList.vue` component to use the generated types ```vue ```
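Roughly, the component ends up looking like this. This is a hedged sketch: the `AllCountriesDocument` export name and the `../gql/graphql` import path are assumptions based on the `src/gql/` output configured above, and the markup is illustrative only.

```vue
<script setup lang="ts">
// Hedged sketch: the export name and path depend on your codegen output.
// The point is that useQuery now receives a typed document, so `result` is fully typed.
import { useQuery } from "@vue/apollo-composable";
import { AllCountriesDocument } from "../gql/graphql";

const { result, loading, error } = useQuery(AllCountriesDocument);
// result.value?.countries?.[0]?.nmae  // would now be a compile-time error
</script>

<template>
  <p v-if="loading">Loading…</p>
  <p v-else-if="error">{{ error.message }}</p>
  <ul v-else>
    <li v-for="country in result?.countries ?? []" :key="country.code">
      {{ country.emoji }} {{ country.name }}
    </li>
  </ul>
</template>
```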
    ### Inline queries with the generated `graphql` tag Alternatively, define the query directly in your component using the generated `graphql` tag: ```vue ``` ## Watch mode With `@parcel/watcher` installed, you can enable watch mode for a smoother development experience. If you frequently change your GraphQL schema while developing, simply run: ```bash npm run codegen:watch ``` GraphQL Code Generator immediately throws an error when your local operations drift from the live schema. Remember, your GraphQL server needs to be running for this to work. ## Bonus: Proper validation out of the box A powerful benefit of this setup is **automatic validation**. If the Countries GraphQL API ever changes—say, it renames `code` to `code2`—you'll get an error when generating types. For example, if you query for `code2`, you'll see: ```bash ⚠ Generate outputs ❯ Generate to src/gql/ ✔ Load GraphQL schemas ✔ Load GraphQL documents ✖ GraphQL Document Validation failed with 1 errors; Error 0: Cannot query field "code2" on type "Country". Did you mean "code"? ``` ## Should you commit generated files? A common question: should you commit the generated types to your repository? | Strategy | Pros | Cons | | --------------- | --------------------------------- | ------------------------------------ | | **Commit them** | Fast onboarding · Diff visibility | Noisy PRs · Merge conflicts | | **Ignore them** | Clean history · Zero conflicts | Extra `npm run generate` in CI/local | Many teams choose to commit generated files, **but** enforce `npm run generate -- --check` in CI to guard against stale artifacts. ## Up next (Part 3) - **Fragments without repetition** ## Summary & Key Takeaways In this part of the Vue 3 + GraphQL series, we: - Set up GraphQL Code Generator v5 to create fully-typed queries and composables for Vue 3 - Learned how to configure `codegen.ts` for a remote schema and local `.vue` operations - Automated type generation with dev scripts and watch mode for a smooth DX - Used generated types and the `graphql` tag to eliminate `any` and catch schema errors at build time - Discussed whether to commit generated files and best practices for CI ### What you learned - How to make your GraphQL queries type-safe and schema-validated in Vue 3 - How to avoid runtime errors and catch breaking API changes early - How to streamline your workflow with codegen scripts and watch mode - The tradeoffs of committing vs. ignoring generated files in your repo ### Actionable reminders - Always run `npm run generate` after changing queries or schema - Use the generated types in your components for full type safety - Consider enforcing type checks in CI to prevent stale artifacts Stay tuned for Part 3, where we'll cover fragments and avoid repetition in your queries! ## Source Code Find the full demo for this series here: [example](https://github.com/alexanderop/vue-graphql-simple-example) > **Note:** > The code for this tutorial is on the `part-two` branch. > After cloning the repository, make sure to check out the correct branch: > > ```bash > git clone https://github.com/alexanderop/vue-graphql-simple-example.git > cd vue-graphql-simple-example > git checkout part-two > ``` > > [View the branch directly on GitHub](https://github.com/alexanderop/vue-graphql-simple-example/tree/part-two) --- --- title: LLM-Powered Search: o4-mini-high vs o3 vs Deep Research description: A practical benchmark of three OpenAI models—o4-mini-high, o3, and Deep Research—for LLM-powered search. 
Compare their speed, depth, accuracy, citations, and cost when tackling real research questions like 'How does Vercel use Speakeasy for API testing?Ideal for developers exploring AI-assisted technical research tags: ['ai'] url: https://alexop.dev/posts/llm-powered-search-comparison-o4-mini-high-o3-deep-research --- # LLM-Powered Search: o4-mini-high vs o3 vs Deep Research ## tldr: > **Prompt:** "How does Vercel use Speakeasy for API testing?" | Feature / Model | o-4-mini-high | o3 | Deep Research | | --------------------- | --------------------------------- | --------------------------------------- | ------------------------------------------ | | **Speed** | ⚡ Instant | 🕒 Conversational | 🐢 Slower | | **Depth of Response** | 🟢 Basic facts | 🟡 Balanced depth | 🔵 Comprehensive analysis | | **Citation Quality** | Inline links only | Inline links | 30+ footnotes | | **Latency Friction** | None | Low | High (3-min delay) | | **Cost** | 💸 Lowest | 💸 Moderate | 💸💸 Highest | | **Best Use Case** | Sanity-checks, quick verification | Background research, architectural docs | Formal research, literature-style analysis | | **Output Length** | Medium (~4.8k characters) | Longer (~7.5k characters) | Very Long (~13.9k characters) | | **Sources Used** | 10 | 15 | 31 | | **Ideal Context** | Slack pings, fact-checks | Blog prep, decision-making | Deep dive reports, whitepapers | ## Introduction While reading about the "Docs as Tests" approach to API documentation, I found something interesting about Vercel using Speakeasy for their API testing. This caught my attention because I wanted to learn more about how they put this into practice. Last week, Simon Willison had published a compelling argument that modern LLMs have essentially "solved" web search for everyday research tasks (with the caveat that you should still verify any potential inaccuracies). ([AI assisted search-based research actually works now](https://simonwillison.net/2025/Apr/21/ai-assisted-search/)) This gave me an idea: instead of using Google like most people would, I could test different LLM models against each other. I decided to ask three different OpenAI models the same research question: ## Why is this now working? The key difference with models like o3 is that they're no longer just language models that predict the next token—they're becoming mini agents. This means they can also predict when to call a function or use a tool, like running a Python script or performing a web search. For example, o3 is often powered by tools like Bing for real-time web access or a Python interpreter for calculations. That’s why it can generate tables, analyze data, and reason with up-to-date information. Traditional "vanilla" models (that only predict the next token without tools) are limited—they rely solely on their internal knowledge, which can be outdated and prone to hallucinations. But once you give a model access to tools, it can fetch accurate, real-world context on demand, making its next-token prediction much smarter. ## Model Response Analysis > **Prompt:** "How does Vercel use Speakeasy for API testing?" 
### Raw Response Samples _(First words of each, to give you a feel for the tone)_ ### 🟢 o-4-mini-high Vercel leverages Speakeasy's OpenAPI-driven toolchain to generate both their official TypeScript SDK and a comprehensive suite of contract tests based on their public spec at `https://openapi.vercel.sh` ([sdk/tests-overlay.yaml at main · vercel/sdk · GitHub](https://github.com/vercel/sdk/blob/main/tests-overlay.yaml), [Testing | Speakeasy](https://www.speakeasy.com/product/api-testing)). Using the Speakeasy CLI's `speakeasy test` command, they spin up a mock server and validate every endpoint against its schema automatically ([GitHub - speakeasy-api/speakeasy](https://github.com/speakeasy-api/speakeasy)). To handle endpoints that are still evolving, Vercel maintains a `tests-overlay.yaml` overlay file in their SDK repository, which disables specific tests by annotating operations with `x-speakeasy-test: false` ([sdk/tests-overlay.yaml at main · vercel/sdk · GitHub](https://github.com/vercel/sdk/blob/main/tests-overlay.yaml)). This entire process is woven into their CI/CD via GitHub Actions: running `speakeasy configure tests` scaffolds a "Test SDKs" workflow that executes these tests on every pull request ([Testing in GitHub Actions](https://www.speakeasy.com/docs/customize-testing/github-actions)). As a showcase customer, Vercel is explicitly listed on Speakeasy's product page among world-class API teams using automated API testing in their pipelines ([Testing | Speakeasy](https://www.speakeasy.com/product/api-testing)). _Thought for 41 seconds · 10 sources_ _Characters: 4,811_ [Full conversation →](https://chatgpt.com/share/6813248e-cd28-8004-8cb4-ab662014271b) ### 🟡 o3 Result Speakeasy sits in Vercel's CI/CD pipeline as an **OpenAPI-driven "safety-net"**: every time the Vercel REST-API spec changes, Speakeasy (1) regenerates an SDK, (2) boots a ready-made Bun test-suite that hits live endpoints, and (3) compares the new spec against the last published contract in its registry. The workflow automatically runs in GitHub Actions; if an endpoint breaks schema, status-codes, or core behaviour, the job fails and engineers are alerted before the change ships. Below is a deep-dive into each moving part and how they fit together. #### Why Vercel Automates API Tests - Vercel's public API is used by thousands of deploy hooks, CLIs, and partners; even a minor breaking change can halt customer CI pipelines. Speakeasy markets its testing product to "world-class API companies" and lists Vercel as a reference logo, highlighting reliability as a selling point. ([Testing | Speakeasy](https://www.speakeasy.com/product/api-testing?utm_source=chatgpt.com)) - In Speakeasy's Series A announcement the company calls out Vercel as one of the "fast-growing innovators" that relies on automated contract testing to "make sure no unintentional breaking changes get shipped." ([Speakeasy's $15M Series A](https://www.speakeasy.com/post/fundraising-series-a?utm_source=chatgpt.com)) _Thought for 1m 25s · 15 Sources_ _Characters: 7,483_ [Full conversation →](https://chatgpt.com/share/68132509-ea28-8004-8cf0-beeb88c77577) ### 🔵 Deep Research #### Vercel's Speakeasy-Powered API Toolchain Vercel uses Speakeasy to automate its REST API tooling – generating SDKs, docs, mocks and tests from an OpenAPI spec. 
In late 2024 Vercel announced a new **TypeScript SDK** (`@vercel/sdk`) built with Speakeasy ([Introducing the Vercel TypeScript SDK - Vercel](https://vercel.com/changelog/introducing-the-vercel-typescript-sdk#:~:text=This%20SDK%20includes%3A)). The SDK is **fully type-safe** (using Zod schemas) and provides complete coverage of the Vercel REST API ([Introducing the Vercel TypeScript SDK - Vercel](https://vercel.com/changelog/introducing-the-vercel-typescript-sdk#:~:text=This%20SDK%20includes%3A)) ([Using the Vercel SDK - Vercel API Docs](https://vercel.com/docs/rest-api/sdk#:~:text=The%20%60%40vercel%2Fsdk%60%20is%20a%20type,of%20the%20Vercel%20REST%20API)). In fact, Vercel's SDK repo explicitly notes "SDK Created by Speakeasy" ([GitHub - vercel/sdk](https://github.com/vercel/sdk?trk=public_post_comment-text#:~:text=SDK%20Created%20by%20Speakeasy)). Under the hood, Vercel maintains an OpenAPI spec (`vercel-spec.json`) and a Speakeasy config (`.speakeasy/gen.yaml`) that drives code generation. A GitHub Actions workflow invokes the Speakeasy CLI on each update, regenerating the SDK (and publishing to npm) whenever the API spec changes. This ensures the SDK and docs stay in sync with the API. As a result, every Vercel endpoint is exposed via the SDK with up-to-date **documentation and code examples**, and the generated code is optimized (e.g. "standalone functions" for tree-shaking) ([Introducing the Vercel TypeScript SDK - Vercel](https://vercel.com/changelog/introducing-the-vercel-typescript-sdk#:~:text=,REST%20API%20with%20Zod%20schemas)) ([Using the Vercel SDK - Vercel API Docs](https://vercel.com/docs/rest-api/sdk#:~:text=The%20%60%40vercel%2Fsdk%60%20is%20a%20type,of%20the%20Vercel%20REST%20API)). _Research completed in 3m · 31 sources_ _Characters: 13,942_ [Full conversation →](https://chatgpt.com/share/6813258e-4a70-8004-a4ec-a229ac12ac6c) ### Key Finding For _search-focused_ tasks, **o3 beat Deep Research**. I got almost the same depth but twice as fast and for a fraction of the cost. ## Choosing the Right Model ```mermaid graph TD A[What do you need?] --> B{Quick answer?} B -- Yes --> C[o-4-mini-high] B -- No --> D{Need balanced speed & depth?} D -- Yes --> E[o3] D -- No --> F[Deep Research] ``` ## Best Practices for LLM Research My testing matches what Simon Willison recently said about using AI to search for information. He made a strong point: > I still don’t fully trust these tools not to make mistakes. But for small, low-risk tasks, I might skip double-checking. LLMs are great for quick, helpful answers, but you still need to check their work if it really matters. **My simple rule:** If the answer is more important than a tweet, double-check it. Look for two good sources or ask a second AI. You’ll catch most errors in under a minute. Also its always worth to check the original sources. ## Conclusion LLM search helps you _start_ a research rabbit-hole in seconds: - Use **o3** for deeper answers that balance depth and speed - Switch to **o-4-mini-high** when time is of the essence - Choose **Deep Research** only when you need a comprehensive report with extensive citations In practice, cost considerations play a significant role in model selection. With a $20 monthly subscription, my usage of Deep Research and o3 needs to be strategic. The key is matching the model to both your needs and context: When I'm on my smartphone and need quick answers, o4-mini-high is my go-to choice for its balance of speed and simplicity. 
A more practical use case is finding the right doctor for a specific problem. Instead of dealing with Google's clutter (like ads, SEO traps, and scattered reviews), I can just ask a reasoning model to do the heavy lifting. It can quickly suggest the top three doctors who best match my situation. Then I can check their websites myself to get a feel for them. This way, I do not just save time; I also make more informed decisions. --- --- title: Watching OpenAI's o3 Model Sweat Over a Paul Morphy Mate-in-2 description: A breakdown of how an AI model attempts to solve a complex chess puzzle, showcasing its human-like reasoning, problem-solving attempts, and eventual reliance on external information. tags: ['ai'] url: https://alexop.dev/posts/how-03-model-tries-chess-puzzle --- # Watching OpenAI's o3 Model Sweat Over a Paul Morphy Mate-in-2 When I gave OpenAI's o3 model a tough chess puzzle, it behaved almost like a human: thinking, doubting, retrying, and finally googling the answer. 🤣 Before I break it down step-by-step, here's the funniest part: it spent 8 minutes calculating and pixel-measuring squares… and then cheated by using Bing.
## First, the Problem If you want, you can try to solve it yourself.
This is a famous chess puzzle by Paul Morphy; check out [this link](https://www.chess.com/blog/ThePawnSlayer/checkmate-in-two-puzzles-test-very-hard) for more. See the full conversation [here](https://chatgpt.com/share/680e5290-99d8-8004-9455-4fd779a9bfda). ## 1. **Careful Board Analysis**
First, o3 spent several minutes carefully reconstructing the board based on the image. > "The board has White's King on c8, White's Rook on a1, and a Pawn on b6. Black's King on a8, Bishop on b8, and Pawns on a7 and b7." It was being very cautious. It wanted to fully understand the position before trying any move. Already very agent-like behavior: care first, then move. ## 2. **First Ideas and Doubts** It tested obvious ideas like `b7#` and `Rxa7#`, but immediately started doubting itself. > "After 1. Rxa7+, Black's King cannot capture because of the White pawn on b6." > "But after Bxa7, it is no longer check. So it is not mate." > "1. b7+ is illegal because the pawn on b7 blocks it." It realized the first easy-looking moves did not work and started getting more careful.
It wanted to run a simulation, but of course, it had no real chess engine installed. ## 4. **Pixel-by-Pixel Image Analysis** Still stuck, it tried manually measuring the chessboard squares. > "The board is 726 pixels high, each square is about 88 pixels..." > "Coordinates for b7 would be approximately x=88–176, y=88–176." It was literally trying to deduce the pieces based on pixel math. Massive stubbornness. It also wondered whether the puzzle could be solved at all: > "though I wonder if this is a typical "mate in 2" puzzle. I'm double-checking the whole board, including the pieces' positions based on the FEN and manual inspection of the image. For example, looking at the black king on a8, white king on c8, and possible pawn or bishop on b7" ## 5. **Still Thinking** o3 started showing mild signs of panic. It was still calculating but was not sure about the best move. > "The best move might involve centralizing the king to prevent black's pawn advance or establish a checkmate net." That's when it turned to Bing. ## 6. **Cheating by Web Search**
    Finally, right before hitting 8 minutes total: > "I found a chess forum that mentions this exact position..." > "Suggested move: Ra6." It binged the answer. 😂 Still, it did not just copy. It rechecked and understood why Ra6 works. # Timeline Summary ```mermaid %%{ init: { 'theme': 'base', 'themeVariables': { 'primaryColor': '#343f60', 'primaryBorderColor': '#ff6bed', 'primaryTextColor': '#eaedf3', 'lineColor': '#ff6bed', 'secondaryColor': '#8a337b', 'tertiaryColor': '#343f60' } } }%% timeline title o3 Model's Chess Puzzle Journey section Initial Analysis (~0-2 min) Board analysis : Carefully reconstructed the board from the image. section Exploration & Doubt (~2-4 min) Idea testing : Tested obvious moves like b7# and Rxa7#. Self-correction : Realized initial moves didn't work. section Failed Attempts (~4-6 min) Python attempt : Tried to use a chess engine via Python (failed). Pixel analysis : Tried to deduce pieces via pixel math. Feeling stuck : Expressed doubt about solvability. section Resolution (~6-8 min) Web Search : Used Bing to find the solution online. Verification : Confirmed and understood the suggested move (Ra6). ``` # Why This is Fascinating o3 does not just spit out an answer. It reasons. It struggles. It switches tools. It self-corrects. Sometimes it even cheats, but only after exhausting every other option. That feels very human. And by "human" I do not mean it tried to match pixels. I mean it used every tool it had. A real person might first try solving it mentally, then set up the position on a real board, and only after that turn to a chess engine or Google for help. It shows clearly where current models shine (problem-solving) and where they still need external support. Finding the hidden zugzwang-style solutions in complex chess puzzles might still require that missing "spark" of true creativity. You can read more about that in my post: "[Are LLMs Creative?](../are-llms-creative)". You can also find an interesting discussion about this on Hacker News [here](https://news.ycombinator.com/item?id=43813046). --- --- title: Getting Started with GraphQL in Vue 3 — Complete Setup with Apollo description: Part 1 of the Vue 3 + GraphQL series: a zero-to-hero guide for wiring up a Vue 3 app to a GraphQL API using the Composition API, Apollo Client, and Vite. tags: ['graphql', 'vue'] url: https://alexop.dev/posts/getting-started-graphql-vue3-apollo-typescript --- # Getting Started with GraphQL in Vue 3 — Complete Setup with Apollo ## Introduction
    For over a year now, I've been working with GraphQL and a Backend-for-Frontend (BFF) at my job. Before this role, I had only worked with REST APIs and Axios, so it's been a big learning curve. That's why I want to share everything I've learned over the past months with you. I'll start with a small introduction and continue adding more posts over time. ## What is GraphQL and why should Vue developers care? GraphQL is a query language for APIs. You send a query describing the data you want, and the server gives you exactly that. Nothing more. Nothing less. For Vue developers, this means: - **Less boilerplate** — no stitching REST calls together - **Better typing** — GraphQL schemas fit TypeScript perfectly - **Faster apps** — fetch only what you need GraphQL and the Vue 3 Composition API go together like coffee and morning sun. Highly reactive. Highly type-safe. Way less code. ## Try it yourself Here is a GraphQL explorer you can use right now. Try this query: ```graphql query { countries { name emoji capital } } ```
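If you want to run that query from a Vue 3 app rather than the embedded explorer, the wiring is only a few lines. Here is a minimal sketch assuming `@apollo/client` and `@vue/apollo-composable`; the package choice and file layout are assumptions, not the only way to set it up.

```ts
// main.ts: a minimal sketch (package names and structure are assumptions)
import { createApp, provide, h } from "vue";
import { ApolloClient, InMemoryCache } from "@apollo/client/core";
import { DefaultApolloClient } from "@vue/apollo-composable";
import App from "./App.vue";

const apolloClient = new ApolloClient({
  uri: "https://countries.trevorblades.com/graphql",
  cache: new InMemoryCache(),
});

createApp({
  setup() {
    // Makes the client available to useQuery() in any component
    provide(DefaultApolloClient, apolloClient);
  },
  render: () => h(App),
}).mount("#app");
```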
    Think of it like the world's most sophisticated autocomplete. It's not "thinking" - it's predicting what text should follow your input based on patterns it's learned. Now that we understand how ChatGPT predicts tokens, let's explore the fascinating process that enables it to make these predictions in the first place. How does a model learn to understand and generate human-like text? ## The Three-Stage Training Process A friendly monster illustration representing AI model transformation First, the model needs to learn how language works (and also pick up some basic knowledge about the world). Once that's done, it's basically just a fancy autocomplete. So we need to fine-tune it to behave more like a helpful chat assistant. Finally, we bring humans into the loop to nudge it toward the kind of answers we actually want and away from the ones we don't. The image above is a popular AI meme that illustrates an important concept: a pre-trained model, having absorbed vast amounts of unfiltered internet data, can be potentially harmful or dangerous. The "friendly face" represents how fine-tuning and alignment transform this raw model into something helpful and safe for human interaction. ### 1. Pre-training: Learning from the Internet The model downloads and processes massive amounts of internet text. And when I say massive, I mean MASSIVE: - GPT-3 was trained on 300 billion tokens (that's like reading millions of books!) - LLaMA was trained on 1.4 trillion tokens - CommonCrawl, a major data source, captures about 3.1 billion web pages per monthly crawl (with 1.0-1.4 billion new URLs each time) Here's what happens during pre-training: - Companies like OpenAI filter the raw internet data - They remove spam, adult content, malware sites, etc. - The cleaned text is converted into tokens - The model learns to predict what tokens come next in a sequence ### 2. Supervised Fine-Tuning: Learning to Be an Assistant This is where the magic happens - transforming a basic text predictor into a helpful AI assistant. Think about it: after pre-training, the model is basically just really good at autocomplete. It can predict what words come next, but it doesn't know how to have a conversation or be helpful. Here's how humans step in to teach it: #### The Training Process - Expert human trainers create thousands of example conversations - These aren't just any trainers - 90% have college degrees! - Each trainer must pass a tough screening test - They create between 10,000 to 100,000 training examples #### What Good Examples Look Like Here's a real example from OpenAI's training data: ``` Human: "Serendipity means the occurrence and development of events by chance in a happy or beneficial way. Use the word in a sentence." Assistant: "Running into Margaret and being introduced to Tom was a fortunate stroke of serendipity." ``` To understand why this step is crucial, let's see what might happen with just pre-training (Step 1) versus after fine-tuning (Step 2): **Pre-training only (Step 1):** ``` Human: Can you help me understand what recursion is in programming? Model: recursion is when a function calls itself. Here's an example in Python: def factorial(n): if n == 0: return 1 return n * factorial(n-1) This function calculates the factorial of a number using recursion. For more information, check out our tutorial at www.codingsite.com or buy our premium course for $49.99! ``` **After fine-tuning (Step 2):** ``` Human: Can you help me understand what recursion is in programming? 
Assistant: I'll explain recursion in a clear and helpful way. Recursion is when a function calls itself to solve a problem by breaking it down into smaller, similar sub-problems. Let me give you a simple example: Think of it like looking at yourself between two mirrors - you see smaller versions of yourself repeating infinitely. In programming, it works similarly: 1. A recursive function handles a small, simple case (called the base case) 2. For bigger cases, it breaks down the problem and calls itself 3. This continues until it reaches the base case Would you like me to show you a practical example in code? ``` Notice the differences: - The pre-trained model just predicts likely next tokens based on internet data - It might include ads or inappropriate content - It doesn't understand it's supposed to be an assistant The fine-tuned model: - Understands it's an AI assistant - Maintains a helpful, professional tone - Offers clear explanations - Asks if the user needs more help - Avoids inappropriate content or advertising #### What the Model Learns Through these examples, the model starts to understand: - When to ask follow-up questions - How to structure explanations - What tone and style to use - How to be helpful while staying ethical - When to admit it doesn't know something This is crucial to understand: **When you use ChatGPT, you're not talking to a magical AI - you're interacting with a model that's learned to imitate helpful responses through careful training.** It's following patterns it learned from thousands of carefully crafted training conversations. Comic illustrating the fine-tuning process of AI models ### 3. Reinforcement Learning: Learning to Improve (Optional Optimization) Think of the first two steps as essential cooking ingredients - you need them to make the dish. Step 3 is like having a professional chef taste and refine the recipe. It's not strictly necessary, but it can make things much better. Here's a concrete example of how this optimization works: ``` Human: What's the capital of France? Possible Model Responses: A: "The capital of France is Paris." B: "Paris is the capital of France. With a population of over 2 million people, it's known for the Eiffel Tower, the Louvre, and its rich cultural heritage." C: "Let me tell you about France's capital! 🗼 Paris is such a beautiful city! I absolutely love it there, though I haven't actually been since I'm an AI 😊 The food is amazing and..." ``` Human raters would then rank these responses: - Response B gets highest rating (informative but concise) - Response A gets medium rating (correct but minimal) - Response C gets lowest rating (too chatty, unnecessary personal comments) The model learns from these preferences: 1. Being informative but not overwhelming is good 2. Staying focused on the question is important 3. Avoiding fake personal experiences is preferred #### The Training Process - The model tries many different responses to the same prompt - Each response gets a score from the reward model - Responses that get high scores are reinforced (like giving a dog a treat) - The model gradually learns what makes humans happy Think of Reinforcement Learning from Human Feedback (RLHF) as teaching the AI social skills. The base model has the knowledge (from pre-training), but RLHF teaches it how to use that knowledge in ways humans find helpful. ## What Makes These Models Special? ### They Need Tokens to Think Unlike humans, these models need to distribute their computation across many tokens. 
Each token has only a limited amount of computation available. Ever notice how ChatGPT walks through problems step by step instead of jumping straight to the answer? This isn't just for your benefit - it's because: 1. The model can only do so much computation per token 2. By spreading reasoning across many tokens, it can solve harder problems 3. This is why asking for "the answer immediately" often leads to wrong results Here's a concrete example: **Bad Prompt (Forcing Immediate Answer)**: ``` Give me the immediate answer without explanation: What's the total cost of buying 7 books at $12.99 each with 8.5% sales tax? Just the final number. ``` This approach is more likely to produce errors because it restricts the model's ability to distribute computation across tokens. **Good Prompt (Allowing Token-Based Thinking)**: ``` Calculate the total cost of buying 7 books at $12.99 each with 8.5% sales tax. Please show your work step by step. ``` This allows the model to break down the problem: 1. Base cost: 7 × $12.99 = $90.93 2. Sales tax amount: $90.93 × 0.085 = $7.73 3. Total cost: $90.93 + $7.73 = $98.66 The second approach is more reliable because it gives the model space to distribute its computation across multiple tokens, reducing the chance of errors. ### Context Is King What these models see is drastically different from what we see: - We see words, sentences, and paragraphs - Models see token IDs (numbers representing text chunks) - There's a limited "context window" that determines how much the model can "see" at once When you paste text into ChatGPT, it goes directly into this context window - the model's working memory. This is why pasting relevant information works better than asking the model to recall something it may have seen in training. ### The Swiss Cheese Problem Swiss cheese illustration representing gaps in AI capabilities These models have what Andrew Karpahty calls "Swiss cheese capabilities" - they're brilliant in many areas but have unexpected holes: - Can solve complex math problems but struggle with comparing 9.11 and 9.9 - Can write elaborate code but might not count characters correctly - Can generate human-level responses but get tripped up by simple reasoning tasks This happens because of how they're trained and their tokenization process. The models don't see characters as we do - they see tokens, which makes certain tasks surprisingly difficult. ## How to Use LLMs Effectively After all my research, here's my advice: 1. **Use them as tools, not oracles**: Always verify important information 2. **Give them tokens to think**: Let them reason step by step 3. **Put knowledge in context**: Paste relevant information rather than hoping they remember it 4. **Understand their limitations**: Be aware of the "Swiss cheese" problem 5. **Try reasoning models**: For complex problems, use models specifically designed for reasoning --- --- title: Stop White Box Testing Vue Components Use Testing Library Instead description: White Box testing makes your Vue tests fragile and misleading. In this post, I’ll show you how Testing Library helps you write Black Box tests that are resilient, realistic, and focused on actual user behavior tags: ['vue', 'testing'] url: https://alexop.dev/posts/stop-white-box-testing-vue --- # Stop White Box Testing Vue Components Use Testing Library Instead ## TL;DR White box testing peeks into Vue internals, making your tests brittle. Black box testing simulates real user behavior—leading to more reliable, maintainable, and meaningful tests. 
Focus on behavior, not implementation. ## Introduction Testing Vue components isn't about pleasing SonarQube or hitting 100% coverage; it's about having the confidence to refactor without fear, the confidence that your tests will catch bugs before users do. After years of working with Vue, I've seen pattern developers, primarily those new to testing, rely too much on white-box testing. It inflates metrics but breaks easily and doesn't catch real issues. Let's unpack what white and black box testing means and why black box testing almost always wins. ## What Is a Vue Component? Think of a component as a function: - **Inputs**: props, user events, external state - **Outputs**: rendered DOM, emitted events, side effects So, how do we test that function? - Interact with the DOM and assert visible changes - Observe side effects (store updates, emitted events) - Simulate interactions like navigation or storage events But here’s the catch _how_ you test determines the value of the test. ## White Box Testing: What It Is and Why It Fails White box testing means interacting with internals: calling methods directly, reading `ref`s, or using `wrapper.vm`. Example: ```ts it("calls increment directly", () => { const wrapper = mount(Counter); const vm = wrapper.vm as any; expect(vm.count.value).toBe(0); vm.increment(); expect(vm.count.value).toBe(1); }); ``` **Problems? Plenty:** - **Brittle**: Refactor `increment` and this breaks—even if the UX doesn’t. - **Unrealistic**: Users click buttons. They don’t call functions. - **Misleading**: This test can pass even if the button in the UI does nothing. ## Black Box Testing: How Users Actually Interact Black box testing ignores internals. You click buttons, type into inputs, and assert visible changes. ```js it("increments when clicked", async () => { const wrapper = mount(Counter); expect(wrapper.text()).toContain("Count: 0"); await wrapper.find("button").trigger("click"); expect(wrapper.text()).toContain("Count: 1"); }); ``` This test: - **Survives refactoring** - **Reflects real use** - **Communicates intent** ## The Golden Rule: Behavior > Implementation Ask: _Does the component behave correctly when used as intended?_ Good tests: - ✅ Simulate real user behavior - ✅ Assert user-facing outcomes - ✅ Mock external dependencies (router, store, fetch) - ❌ Avoid internal refs or method calls - ❌ Don’t test implementation details ## Why Testing Library Wins [Testing Library](https://testing-library.com/) enforces black box testing. It doesn’t even expose internals. You: - Find elements by role or text - Click, type, tab—like a user would - Assert what's visible on screen Example: ```js it("increments when clicked", async () => { const user = userEvent.setup(); render(Counter); const button = screen.getByRole("button", { name: /increment/i }); const count = screen.getByText(/count:/i); expect(count).toHaveTextContent("Count: 0"); await user.click(button); expect(count).toHaveTextContent("Count: 1"); }); ``` It’s readable, stable, and resilient. ### Bonus: Better Accessibility Testing Library rewards semantic HTML and accessibility best practices: - Proper labels and ARIA roles become _easier_ to test - Icon-only buttons become harder to query (and rightly so) ```vue
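<!-- Hedged sketch (not the post's original snippet): a button whose accessible name
     comes from its visible text is trivial to query with getByRole, while an
     icon-only button needs an aria-label before Testing Library can find it by name. -->
<template>
  <!-- Easy to test: getByRole('button', { name: /increment/i }) just works -->
  <button type="button">Increment</button>

  <!-- Icon-only: without the aria-label, the same query would find nothing -->
  <button type="button" aria-label="Increment">
    <svg aria-hidden="true" viewBox="0 0 24 24"><path d="M12 5v14M5 12h14" /></svg>
  </button>
</template>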
    ``` Win-win. ## Quick Comparison | | White Box | Black Box | | ----------------------- | ------------- | ------------- | | Peeks at internals? | ✅ Yes | ❌ No | | Breaks on refactor? | 🔥 Often | 💪 Rarely | | Reflects user behavior? | ❌ Nope | ✅ Yes | | Useful for real apps? | ⚠️ Not really | ✅ Absolutely | | Readability | 🤯 Low | ✨ High | ## Extract Logic, Test It Separately Black box testing doesn’t mean you can’t test logic in isolation. Just move it _out_ of your components. For example: ```ts // composable export function useCalculator() { const total = ref(0); function add(a: number, b: number) { total.value = a + b; return total.value; } return { total, add }; } // test it("adds numbers", () => { const { total, add } = useCalculator(); expect(add(2, 3)).toBe(5); expect(total.value).toBe(5); }); ``` Logic stays isolated, tests stay simple. ## Conclusion - Treat components like black boxes - Test user behavior, not code structure - Let Testing Library guide your practice - Extract logic to composables or utils --- --- title: The Computed Inlining Refactoring Pattern in Vue description: Learn how to improve Vue component performance and readability by applying the Computed Inlining pattern - a technique inspired by Martin Fowler's Inline Function pattern. tags: ['vue', 'refactoring'] url: https://alexop.dev/posts/computed-inlining-refactoring-pattern-in-vue --- # The Computed Inlining Refactoring Pattern in Vue ## TLDR Improve your Vue component performance and readability by applying the Computed Inlining pattern - a technique inspired by Martin Fowler's Inline Function pattern. By consolidating helper functions directly into computed properties, you can reduce unnecessary abstractions and function calls, making your code more straightforward and efficient. ## Introduction Vue 3's reactivity system is powered by computed properties that efficiently update only when their dependencies change. But sometimes we overcomplicate our components by creating too many small helper functions that only serve a single computed property. This creates unnecessary indirection and can make code harder to follow. The Computed Inlining pattern addresses this problem by consolidating these helper functions directly into the computed properties that use them. This pattern is the inverse of Martin Fowler's Extract Function pattern and is particularly powerful in the context of Vue's reactive system. ## Understanding Inline Function This pattern comes from Martin Fowler's Refactoring catalog, where he describes it as a way to simplify code by removing unnecessary function calls when the function body is just as clear as its name. You can see his original pattern here: [refactoring.com/catalog/inlineFunction.html](https://refactoring.com/catalog/inlineFunction.html) Here's his example: ```javascript function getRating(driver) { return moreThanFiveLateDeliveries(driver) ? 2 : 1; } function moreThanFiveLateDeliveries(driver) { return driver.numberOfLateDeliveries > 5; } ``` After applying the Inline Function pattern: ```javascript function getRating(driver) { return driver.numberOfLateDeliveries > 5 ? 2 : 1; } ``` The code becomes more direct and eliminates an unnecessary function call, while maintaining readability. ## Bringing Inline Function to Vue Computed Properties In Vue components, we often create helper functions that are only used once inside a computed property. While these can improve readability in complex cases, they can also add unnecessary layers of abstraction when the logic is simple. 
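Here is the idea in miniature first, as a hedged sketch; the cart items and the discount rule are made up for illustration and are not the post's `OrderSummary` component.

```ts
import { computed, ref } from "vue";

interface Item { price: number; qty: number }
const items = ref<Item[]>([{ price: 60, qty: 2 }]);

// Before: a helper that exists only to serve this one computed
function applyDiscount(value: number) {
  return value > 100 ? value * 0.9 : value;
}
const total = computed(() =>
  applyDiscount(items.value.reduce((sum, i) => sum + i.price * i.qty, 0))
);

// After Computed Inlining: the helper's body lives inside the computed,
// so the whole calculation reads top to bottom in one place.
const totalInlined = computed(() => {
  const subtotal = items.value.reduce((sum, i) => sum + i.price * i.qty, 0);
  return subtotal > 100 ? subtotal * 0.9 : subtotal;
});
```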
Let's look at how this pattern applies specifically to computed properties in Vue. ### Before Refactoring Here's how a Vue component might look before applying Computed Inlining: ```vue // src/components/OrderSummary.vue ``` The component works but has several issues: - Uses a watch when a computed would be more appropriate - Has multiple helper functions that are only used once - Splits related logic across different properties and functions - Creates unnecessary intermediate values ### After Refactoring with Computed Inlining Now let's apply Computed Inlining to simplify the code: ```vue // src/components/OrderSummary.vue ``` The refactored version: - Consolidates all pricing logic into a single computed property - Eliminates the need for a watch by using Vue's reactive system properly - Removes unnecessary helper functions and intermediate values - Makes the data flow more clear and direct - Reduces the number of reactive dependencies being tracked ## Best Practices - Apply Computed Inlining when the helper function is only used once - Use this pattern when the logic is simple enough to be understood inline - Add comments to clarify steps if the inline logic is non-trivial - Keep computed properties focused on a single responsibility, even after inlining - Consider keeping functions separate if they're reused or complex ## When to Use Computed Inlining - When the helper functions are only used by a single computed property - When performance is critical (eliminates function call overhead) - When the helper functions don't significantly improve readability - When you want to reduce the cognitive load of jumping between functions - When debugging and following the execution flow is important ## When to Avoid Computed Inlining - When the helper function is used in multiple places - When the logic is complex and the function name significantly improves clarity - When the function might need to be reused in the future - When testing the helper function independently is important ## Conclusion The Computed Inlining pattern in Vue is a practical application of Martin Fowler's Inline Function refactoring technique. It helps streamline your reactive code by: - Reducing unnecessary abstractions - Eliminating function call overhead - Making execution flow more direct and easier to follow - Keeping related logic together in one place While not appropriate for every situation, Computed Inlining is a valuable tool in your Vue refactoring toolkit, especially when optimizing components with many small helper functions. Try applying Computed Inlining in your next Vue component refactoring, and see how it can make your code both simpler and more efficient. ## References - [Martin Fowler's Inline Function Pattern](https://refactoring.com/catalog/inlineFunction.html) - [Vue Documentation on Computed Properties](https://vuejs.org/guide/essentials/computed.html) --- --- title: Are LLMs Creative? description: Exploring the fundamental nature of creativity in Large Language Models compared to human creativity, sparked by reflections on OpenAI's latest image model. tags: ['ai'] url: https://alexop.dev/posts/are-llms-creative --- # Are LLMs Creative? ## Introduction After OpenAI released its impressive new image model, I started thinking more deeply about what creativity means. We often consider creativity as something magical and uniquely human. Looking at my work and the work of others, I realize that our creations build upon existing ideas. We remix, adapt, and build on what exists. 
In that sense, we share similarities with large language models (LLMs). Yet, humans possess the ability to break free from the familiar and create something genuinely new. That's the crucial difference. The constraints of training data limit LLMs. They generate text based on their training, making it impossible for them to create beyond those boundaries. Humans question the status quo. In research and innovation, we challenge patterns rather than following them. This exemplifies human creativity. Take Vincent van Gogh, for example. Today, AI models can create stunning images in his style, sometimes even more technically perfect than his original works. But van Gogh didn't learn his style from a dataset. He invented it. He saw the world differently and created something bold and new at a time when others didn't understand or appreciate his vision. An AI can now copy his style but couldn't have invented it. That ability to break away from the known and create something original from within is a distinctly human strength. ## How LLMs Work LLMs learn from text data sourced from books, sites, and other content. They learn language patterns and use them to generate new text. But they don't understand the meaning behind the words. They don't think, feel, or have experiences. Instead, they predict the next word in a sequence. ## Human Creativity vs. LLMs Humans create with purpose. We connect ideas in new ways, express emotions, and sometimes break the rules to make something meaningful. A poet may write to express grief. An inventor may design a tool to solve a real-world problem. There's intent behind our work. LLMs remix what they've seen. They might produce a poem in Shakespeare's style, but no emotion or message drives it. It's a sophisticated imitation of existing patterns. ## What LLMs Do Well LLMs demonstrate remarkable capabilities in: - Writing stories - Suggesting fresh ideas - Generating jokes or lyrics - Producing design concepts - Helping brainstorm solutions for coding or business problems People use LLMs as creative assistants. A writer might seek ideas when stuck. A developer might explore different coding approaches. LLMs accelerate the creative process and expand possibilities. ## The Limits of LLM Creativity Clear limitations exist. LLMs don't understand what they create. They can't determine if something is meaningful, original, or valuable. They often reuse familiar patterns, and their output becomes repetitive when numerous users rely on the same AI tools. Furthermore, LLMs can't transcend their training. They don't challenge ideas or invent new ways of thinking. Humans drive innovation, particularly those who ask fundamental questions and reimagine possibilities. ## So, Are LLMs Creative? It depends on how you define creativity. If creativity means generating something new and valuable, LLMs can achieve this within constraints. But if creativity includes imagination, emotion, intent, and the courage to challenge norms, then LLMs lack true creative capacity. They serve as powerful tools. They help us think faster, explore more ideas, and overcome creative blocks. But the deeper spark, the reason why we create, remains uniquely human. ## Conclusion LLMs impress with their capabilities. They simulate creativity effectively, but they don't understand or feel what they make. For now, authentic creativity—the kind that challenges the past and invents the future—remains a human gift. 
--- --- title: The Inline Vue Composables Refactoring pattern description: Learn how to apply Martin Fowler's Extract Function pattern to Vue components using inline composables, making your code cleaner and more maintainable. tags: ['vue', 'refactoring'] url: https://alexop.dev/posts/inline-vue-composables-refactoring --- # The Inline Vue Composables Refactoring pattern ## TLDR Improve your Vue component organization by using inline composables - a technique inspired by Martin Fowler's Extract Function pattern. By grouping related logic into well-named functions within your components, you can make your code more readable and maintainable without the overhead of creating separate files. ## Introduction Vue 3 gives us powerful tools through the Composition API and ` ``` It works, but the logic is mixed together, and it's hard to tell what this component does without reading all the details. ### After Refactoring with Inline Composables Now let's apply Extract Function inside Vue. We'll group logic into focused composables: ```ts // src/components/FolderManager.vue ``` Now the logic is clean and separated. When someone reads this component, they can understand the responsibilities at a glance: ```ts const { showHiddenFolders } = useHiddenFolders(); const { favoriteFolders, toggleFavorite } = useFavoriteFolders(); ``` Each piece of logic has a descriptive name, with implementation details encapsulated in their own functions, following the Extract Function pattern. ## Best Practices - Use inline composables when your ` ``` ## Defining Stories for Testing Create "stories" to showcase different button configurations: ```typescript const buttonStories = [ { name: "Primary Medium", props: { variant: "primary", size: "medium" }, slots: { default: "Primary Button" }, }, { name: "Secondary Medium", props: { variant: "secondary", size: "medium" }, slots: { default: "Secondary Button" }, }, // and much more ... ]; ``` Each story defines a name, props, and slot content. 
## Rendering Stories for Screenshots Render all stories in one container to capture a comprehensive screenshot: ```typescript interface Story { name: string; props: Record; slots: Record; } function renderStories( component: Component, stories: Story[] ): HTMLElement { const container = document.createElement("div"); container.style.display = "flex"; container.style.flexDirection = "column"; container.style.gap = "16px"; container.style.padding = "20px"; container.style.backgroundColor = "#ffffff"; stories.forEach(story => { const storyWrapper = document.createElement("div"); const label = document.createElement("h3"); label.textContent = story.name; storyWrapper.appendChild(label); const { container: storyContainer } = render(component, { props: story.props, slots: story.slots, }); storyWrapper.appendChild(storyContainer); container.appendChild(storyWrapper); }); return container; } ``` ## Writing the Visual Regression Test Write a test that renders the stories and captures a screenshot: ```typescript // [buttonStories and renderStories defined above] describe("BaseButton", () => { describe("visual regression", () => { it("should match all button variants snapshot", async () => { const container = renderStories(BaseButton, buttonStories); document.body.appendChild(container); const screenshot = await page.screenshot({ path: "all-button-variants.png", }); // this assertion is acutaly not doing anything // but otherwise you would get a warning about the screenshot not being taken expect(screenshot).toBeTruthy(); document.body.removeChild(container); }); }); }); ``` Use `render` from `vitest-browser-vue` to capture components as they appear in a real browser. Save this file with a `.browser.spec.ts` extension (e.g., `BaseButton.browser.spec.ts`) to match your browser test configuration. ## Beyond Screenshots: Automated Comparison Automate image comparison by encoding screenshots in base64 and comparing them against baseline snapshots: ```typescript // Helper function to take and compare screenshots async function takeAndCompareScreenshot(name: string, element: HTMLElement) { const screenshotDir = "./__screenshots__"; const snapshotDir = "./__snapshots__"; const screenshotPath = `${screenshotDir}/${name}.png`; // Append element to body document.body.appendChild(element); // Take screenshot const screenshot = await page.screenshot({ path: screenshotPath, base64: true, }); // Compare base64 snapshot await expect(screenshot.base64).toMatchFileSnapshot( `${snapshotDir}/${name}.snap` ); // Save PNG for reference await expect(screenshot.path).toBeTruthy(); // Cleanup document.body.removeChild(element); } ``` Then update the test: ```typescript describe("BaseButton", () => { describe("visual regression", () => { it("should match all button variants snapshot", async () => { const container = renderStories(BaseButton, buttonStories); await expect( takeAndCompareScreenshot("all-button-variants", container) ).resolves.not.toThrow(); }); }); }); ``` Vitest is discussing native screenshot comparisons in browser mode. Follow and contribute at [github.com/vitest-dev/vitest/discussions/690](https://github.com/vitest-dev/vitest/discussions/690). 
```mermaid flowchart LR A[Render Component] --> B[Capture Screenshot] B --> C{Compare with Baseline} C -->|Match| D[Test Passes] C -->|Difference| E[Review Changes] E -->|Accept| F[Update Baseline] E -->|Reject| G[Fix Component] G --> A ``` ## Conclusion Vitest's experimental browser mode empowers developers to perform accurate visual regression testing of Vue components in real browser environments. While the current workflow requires manual review of screenshot comparisons, it establishes a foundation for more automated visual testing in the future. This approach also strengthens collaboration between developers and UI designers. Designers can review visual changes to components before production deployment by accessing the generated screenshots in the component library. For advanced visual testing capabilities, teams should explore dedicated tools like Playwright or Cypress that offer more features and maturity. Keep in mind to perform visual regression tests against your Base components. --- --- title: How to Test Vue Router Components with Testing Library and Vitest description: Learn how to test Vue Router components using Testing Library and Vitest. This guide covers real router integration, mocked router setups, and best practices for testing navigation, route guards, and dynamic components in Vue applications. tags: ['vue', 'testing', 'vue-router', 'vitest', 'testing-library'] url: https://alexop.dev/posts/how-to-test-vue-router-components-with-testing-library-and-vitest --- # How to Test Vue Router Components with Testing Library and Vitest ## TLDR This guide shows you how to test Vue Router components using real router integration and isolated component testing with mocks. You'll learn to verify router-link interactions, programmatic navigation, and navigation guard handling. ## Introduction Modern Vue applications need thorough testing to ensure reliable navigation and component performance. We'll cover testing strategies using Testing Library and Vitest to simulate real-world scenarios through router integration and component isolation. ## Vue Router Testing Techniques with Testing Library and Vitest Let's explore how to write effective tests for Vue Router components using both real router instances and mocks. 
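The tests that follow exercise a small `NavigationMenu` component along these lines. This is a hedged sketch; the exact markup is an assumption, and what matters is that it renders `router-link`s plus one button that calls `router.push`.

```vue
<!-- NavigationMenu.vue: a hedged sketch, not the original component -->
<script setup lang="ts">
import { useRouter } from "vue-router";

const router = useRouter();

function goToProfile() {
  router.push("/profile");
}
</script>

<template>
  <nav>
    <router-link to="/dashboard">Dashboard</router-link>
    <router-link to="/settings">Settings</router-link>
    <button type="button" @click="goToProfile">Profile</button>
  </nav>
</template>
```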
## Testing Vue Router Navigation Components ### Navigation Component Example ```vue ``` ### Real Router Integration Testing Test complete routing behavior with a real router instance: ```typescript describe("NavigationMenu", () => { it("should navigate using router links", async () => { const router = createRouter({ history: createWebHistory(), routes: [ { path: "/dashboard", component: { template: "Dashboard" } }, { path: "/settings", component: { template: "Settings" } }, { path: "/profile", component: { template: "Profile" } }, { path: "/", component: { template: "Home" } }, ], }); render(NavigationMenu, { global: { plugins: [router], }, }); const user = userEvent.setup(); expect(router.currentRoute.value.path).toBe("/"); await router.isReady(); await user.click(screen.getByText("Dashboard")); expect(router.currentRoute.value.path).toBe("/dashboard"); await user.click(screen.getByText("Profile")); expect(router.currentRoute.value.path).toBe("/profile"); }); }); ``` ### Mocked Router Testing Test components in isolation with router mocks: ```typescript const mockPush = vi.fn(); vi.mock("vue-router", () => ({ useRouter: vi.fn(), })); describe("NavigationMenu with mocked router", () => { it("should handle navigation with mocked router", async () => { const mockRouter = { push: mockPush, currentRoute: { value: { path: "/" } }, } as unknown as Router; vi.mocked(useRouter).mockImplementation(() => mockRouter); const user = userEvent.setup(); render(NavigationMenu); await user.click(screen.getByText("Profile")); expect(mockPush).toHaveBeenCalledWith("/profile"); }); }); ``` ### RouterLink Stub for Isolated Testing Create a RouterLink stub to test navigation without router-link behavior: ```ts // test-utils.ts export const RouterLinkStub: Component = { name: "RouterLinkStub", props: { to: { type: [String, Object], required: true, }, tag: { type: String, default: "a", }, exact: Boolean, exactPath: Boolean, append: Boolean, replace: Boolean, activeClass: String, exactActiveClass: String, exactPathActiveClass: String, event: { type: [String, Array], default: "click", }, }, setup(props) { const router = useRouter(); const navigate = () => { router.push(props.to); }; return { navigate }; }, render() { return h( this.tag, { onClick: () => this.navigate(), }, this.$slots.default?.() ); }, }; ``` Use the RouterLinkStub in tests: ```ts const mockPush = vi.fn(); vi.mock("vue-router", () => ({ useRouter: vi.fn(), })); describe("NavigationMenu with mocked router", () => { it("should handle navigation with mocked router", async () => { const mockRouter = { push: mockPush, currentRoute: { value: { path: "/" } }, } as unknown as Router; vi.mocked(useRouter).mockImplementation(() => mockRouter); const user = userEvent.setup(); render(NavigationMenu, { global: { stubs: { RouterLink: RouterLinkStub, }, }, }); await user.click(screen.getByText("Dashboard")); expect(mockPush).toHaveBeenCalledWith("/dashboard"); }); }); ``` ### Testing Navigation Guards Test navigation guards by rendering the component within a route context: ```vue ``` Test the navigation guard: ```ts const routes = [ { path: "/", component: RouteLeaveGuardDemo }, { path: "/about", component: { template: "
    <div>About</div>
    " } }, ]; const router = createRouter({ history: createWebHistory(), routes, }); const App = { template: "" }; describe("RouteLeaveGuardDemo", () => { beforeEach(async () => { vi.clearAllMocks(); window.confirm = vi.fn(); await router.push("/"); await router.isReady(); }); it("should prompt when guard is triggered and user confirms", async () => { // Set window.confirm to simulate a user confirming the prompt window.confirm = vi.fn(() => true); // Render the component within a router context render(App, { global: { plugins: [router], }, }); const user = userEvent.setup(); // Find the 'About' link and simulate a user click const aboutLink = screen.getByRole("link", { name: /About/i }); await user.click(aboutLink); // Assert that the confirm dialog was shown with the correct message expect(window.confirm).toHaveBeenCalledWith( "Do you really want to leave this page?" ); // Verify that the navigation was allowed and the route changed to '/about' expect(router.currentRoute.value.path).toBe("/about"); }); }); ``` ### Reusable Router Test Helper Create a helper function to simplify router setup: ```typescript // test-utils.ts // path of the definition of your routes interface RenderWithRouterOptions extends Omit, "global"> { initialRoute?: string; routerOptions?: { routes?: typeof routes; history?: ReturnType; }; } export function renderWithRouter( Component: any, options: RenderWithRouterOptions = {} ) { const { initialRoute = "/", routerOptions = {}, ...renderOptions } = options; const router = createRouter({ history: createWebHistory(), // Use provided routes or import from your router file routes: routerOptions.routes || routes, }); router.push(initialRoute); return { // Return everything from regular render, plus the router instance ...render(Component, { global: { plugins: [router], }, ...renderOptions, }), router, }; } ``` Use the helper in tests: ```typescript describe("NavigationMenu", () => { it("should navigate using router links", async () => { const { router } = renderWithRouter(NavigationMenu, { initialRoute: "/", }); await router.isReady(); const user = userEvent.setup(); await user.click(screen.getByText("Dashboard")); expect(router.currentRoute.value.path).toBe("/dashboard"); }); }); ``` ### Conclusion: Best Practices for Vue Router Component Testing When we test components that rely on the router, we need to consider whether we want to test the functionality in the most realistic use case or in isolation. In my humble opinion, the more you mock a test, the worse it will get. My personal advice would be to aim to use the real router instead of mocking it. Sometimes, there are exceptions, so keep that in mind. Also, you can help yourself by focusing on components that don't rely on router functionality. Reserve router logic for view/page components. While keeping our components simple, we will never have the problem of mocking the router in the first place. --- --- title: How to Use AI for Effective Diagram Creation: A Guide to ChatGPT and Mermaid description: Learn how to leverage ChatGPT and Mermaid to create effective diagrams for technical documentation and communication. tags: ['ai', 'productivity'] url: https://alexop.dev/posts/how-to-use-ai-for-effective-diagram-creation-a-guide-to-chatgpt-and-mermaid --- # How to Use AI for Effective Diagram Creation: A Guide to ChatGPT and Mermaid ## TLDR Learn how to combine ChatGPT and Mermaid to quickly create professional diagrams for technical documentation. 
This approach eliminates the complexity of traditional diagramming tools while maintaining high-quality output. ## Introduction Mermaid is a markdown-like script language that generates diagrams from text descriptions. When combined with ChatGPT, it becomes a powerful tool for creating technical diagrams quickly and efficiently. ## Key Diagram Types ### Flowcharts Perfect for visualizing processes: ```plaintext flowchart LR A[Customer selects products] --> B[Customer reviews order] B --> C{Payment Successful?} C -->|Yes| D[Generate Invoice] D --> E[Dispatch goods] C -->|No| F[Redirect to Payment] ``` ```mermaid flowchart LR A[Customer selects products] --> B[Customer reviews order] B --> C{Payment Successful?} C -->|Yes| D[Generate Invoice] D --> E[Dispatch goods] C -->|No| F[Redirect to Payment] ``` ### Sequence Diagrams Ideal for system interactions: ```plaintext sequenceDiagram participant Client participant Server Client->>Server: Request (GET /resource) Server-->>Client: Response (200 OK) ``` ```mermaid sequenceDiagram participant Client participant Server Client->>Server: Request (GET /resource) Server-->>Client: Response (200 OK) ``` ## Using ChatGPT with Mermaid 1. Ask ChatGPT to explain your concept 2. Request a Mermaid diagram representation 3. Iterate on the diagram with follow-up questions Example prompt: "Create a Mermaid sequence diagram showing how Nuxt.js performs server-side rendering" ```plaintext sequenceDiagram participant Client as Client Browser participant Nuxt as Nuxt.js Server participant Vue as Vue.js Application participant API as Backend API Client->>Nuxt: Initial Request Nuxt->>Vue: SSR Starts Vue->>API: API Calls (if any) API-->>Vue: API Responses Vue->>Nuxt: Rendered HTML Nuxt-->>Client: HTML Content ``` ```mermaid sequenceDiagram participant Client as Client Browser participant Nuxt as Nuxt.js Server participant Vue as Vue.js Application participant API as Backend API Client->>Nuxt: Initial Request Nuxt->>Vue: SSR Starts Vue->>API: API Calls (if any) API-->>Vue: API Responses Vue->>Nuxt: Rendered HTML Nuxt-->>Client: HTML Content ``` ## Quick Setup Guide ### Online Editor Use [Mermaid Live Editor](https://mermaid.live/) for quick prototyping. ### VS Code Integration 1. Install "Markdown Preview Mermaid Support" extension 2. Create `.md` file with Mermaid code blocks 3. Preview with built-in markdown viewer ### Web Integration ```html
<!-- Requires the Mermaid script to be loaded on the page (e.g. from a CDN) -->
<pre class="mermaid">
  graph TD
    A-->B
</pre>
    ``` ## Conclusion The combination of ChatGPT and Mermaid streamlines technical diagramming, making it accessible and efficient. Try it in your next documentation project to save time while creating professional diagrams. --- --- title: Building a Pinia Plugin for Cross-Tab State Syncing description: Learn how to create a Pinia plugin that synchronizes state across browser tabs using the BroadcastChannel API and Vue 3's Script Setup syntax. tags: ['vue', 'pinia'] url: https://alexop.dev/posts/building-pinia-plugin-cross-tab-sync --- # Building a Pinia Plugin for Cross-Tab State Syncing ## TLDR Create a Pinia plugin that enables state synchronization across browser tabs using the BroadcastChannel API. The plugin allows you to mark specific stores for cross-tab syncing and handles state updates automatically with timestamp-based conflict resolution. ## Introduction In modern web applications, users often work with multiple browser tabs open. When using Pinia for state management, we sometimes need to ensure that state changes in one tab are reflected across all open instances of our application. This post will guide you through creating a plugin that adds cross-tab state synchronization to your Pinia stores. ## Understanding Pinia Plugins A Pinia plugin is a function that extends the functionality of Pinia stores. Plugins are powerful tools that help: - Reduce code duplication - Add reusable functionality across stores - Keep store definitions clean and focused - Implement cross-cutting concerns ## Cross-Tab Communication with BroadcastChannel The BroadcastChannel API provides a simple way to send messages between different browser contexts (tabs, windows, or iframes) of the same origin. It's perfect for our use case of synchronizing state across tabs. Key features of BroadcastChannel: - Built-in browser API - Same-origin security model - Simple pub/sub messaging pattern - No need for external dependencies ### How BroadcastChannel Works The BroadcastChannel API operates on a simple principle: any browsing context (window, tab, iframe, or worker) can join a channel by creating a `BroadcastChannel` object with the same channel name. Once joined: 1. Messages are sent using the `postMessage()` method 2. Messages are received through the `onmessage` event handler 3. Contexts can leave the channel using the `close()` method ## Implementing the Plugin ### Store Configuration To use our plugin, stores need to opt-in to state sharing through configuration: ```ts export const useCounterStore = defineStore( "counter", () => { const count = ref(0); const doubleCount = computed(() => count.value * 2); function increment() { count.value++; } return { count, doubleCount, increment }; }, { share: { enable: true, initialize: true, }, } ); ``` The `share` option enables cross-tab synchronization and controls whether the store should initialize its state from other tabs. 
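Before wiring this into a plugin, here's the BroadcastChannel API in isolation — a minimal sketch where the channel name and message shape are arbitrary examples:

```ts
// Any tab that creates a channel with the same name joins it
const channel = new BroadcastChannel("counter");

// Send a message to every other tab on the channel
channel.postMessage({ type: "STATE_UPDATE", payload: { count: 1 } });

// Receive messages from other tabs (a tab never receives its own messages)
channel.onmessage = (event) => {
  console.log("Received from another tab:", event.data);
};

// Leave the channel when it's no longer needed
channel.close();
```

The plugin below builds on exactly this pattern: one channel per store, keyed by the store's `$id`.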
### Plugin Registration

Register the plugin in `main.ts` when creating your Pinia instance:

```ts
const pinia = createPinia();
pinia.use(PiniaSharedState());
```

### Plugin Implementation

Here's the complete plugin implementation (`plugin/plugin.ts`) with TypeScript support:

```ts
import type {
  DefineStoreOptions,
  PiniaPluginContext,
  StateTree,
} from "pinia";

type Serializer<T extends StateTree> = {
  serialize: (value: T) => string;
  deserialize: (value: string) => T;
};

interface BroadcastMessage {
  type: "STATE_UPDATE" | "SYNC_REQUEST";
  timestamp?: number;
  state?: string;
}

type PluginOptions<T extends StateTree> = {
  enable?: boolean;
  initialize?: boolean;
  serializer?: Serializer<T>;
};

export interface StoreOptions<
  S extends StateTree = StateTree,
  G = object,
  A = object,
> extends DefineStoreOptions<string, S, G, A> {
  share?: PluginOptions<S>;
}

// Add type extension for Pinia
declare module "pinia" {
  // eslint-disable-next-line @typescript-eslint/no-unused-vars
  export interface DefineStoreOptionsBase<S extends StateTree, Store> {
    share?: PluginOptions<S>;
  }
}

export function PiniaSharedState<T extends StateTree>({
  enable = false,
  initialize = false,
  serializer = {
    serialize: JSON.stringify,
    deserialize: JSON.parse,
  },
}: PluginOptions<T> = {}) {
  return ({ store, options }: PiniaPluginContext) => {
    if (!(options.share?.enable ?? enable)) return;

    const channel = new BroadcastChannel(store.$id);
    let timestamp = 0;
    let externalUpdate = false;

    // Initial state sync
    if (options.share?.initialize ?? initialize) {
      channel.postMessage({ type: "SYNC_REQUEST" });
    }

    // State change listener
    store.$subscribe((_mutation, state) => {
      if (externalUpdate) return;
      timestamp = Date.now();
      channel.postMessage({
        type: "STATE_UPDATE",
        timestamp,
        state: serializer.serialize(state as T),
      });
    });

    // Message handler
    channel.onmessage = (event: MessageEvent<BroadcastMessage>) => {
      const data = event.data;
      if (
        data.type === "STATE_UPDATE" &&
        data.timestamp &&
        data.timestamp > timestamp &&
        data.state
      ) {
        externalUpdate = true;
        timestamp = data.timestamp;
        store.$patch(serializer.deserialize(data.state));
        externalUpdate = false;
      }
      if (data.type === "SYNC_REQUEST") {
        channel.postMessage({
          type: "STATE_UPDATE",
          timestamp,
          state: serializer.serialize(store.$state as T),
        });
      }
    };
  };
}
```

The plugin works by:

1. Creating a BroadcastChannel for each store
2. Subscribing to store changes and broadcasting updates
3. Handling incoming messages from other tabs
4. Using timestamps and an `externalUpdate` flag to skip stale updates and echo loops
5. Supporting custom serialization for complex state

### Communication Flow Diagram

```mermaid
flowchart LR
    A[User interacts with store in Tab 1] --> B[Store state changes]
    B --> C[Plugin detects change]
    C --> D[BroadcastChannel posts STATE_UPDATE]
    D --> E[Other tabs receive STATE_UPDATE]
    E --> F[Plugin patches store state in Tab 2]
```

## Using the Synchronized Store

Components can use the synchronized store just like any other Pinia store:

```ts
const counterStore = useCounterStore();

// State changes will automatically sync across tabs
counterStore.increment();
```

## Conclusion

With this Pinia plugin, we've added cross-tab state synchronization with minimal configuration. The solution is lightweight, type-safe, and leverages the built-in BroadcastChannel API. This pattern is particularly useful for applications where users frequently work across multiple tabs and need a consistent state experience.
Remember to consider the following when using this plugin: - Only enable sharing for stores that truly need it - Be mindful of performance with large state objects - Consider custom serialization for complex data structures - Test thoroughly across different browser scenarios ## Future Optimization: Web Workers For applications with heavy cross-tab communication or complex state transformations, consider offloading the BroadcastChannel handling to a Web Worker. This approach can improve performance by: - Moving message processing off the main thread - Handling complex state transformations without blocking UI - Reducing main thread load when syncing large state objects - Buffering and batching state updates for better performance This is particularly beneficial when: - Your application has many tabs open simultaneously - State updates are frequent or computationally intensive - You need to perform validation or transformation on synced data - The application handles large datasets that need to be synced You can find the complete code for this plugin in the [GitHub repository](https://github.com/alexanderop/pluginPiniaTabs). It also has examples of how to use it with Web Workers. --- --- title: The Browser That Speaks 200 Languages: Building an AI Translator Without APIs description: Learn how to build a browser-based translator that works offline and handles 200 languages using Vue and Transformers.js tags: ['vue', 'ai'] url: https://alexop.dev/posts/building-client-side-ai-translator-vue --- # The Browser That Speaks 200 Languages: Building an AI Translator Without APIs ## Introduction Most AI translation tools rely on external APIs. This means sending data to servers and paying for each request. But what if you could run translations directly in your browser? This guide shows you how to build a free, offline translator that handles 200 languages using Vue and Transformers.js. ## The Tools - Vue 3 for the interface - Transformers.js to run AI models locally - Web Workers to handle heavy processing - NLLB-200, Meta's translation model ```mermaid --- title: Architecture Overview --- graph LR Frontend[Vue Frontend] Worker[Web Worker] TJS[Transformers.js] Model[NLLB-200 Model] Frontend -->|"Text"| Worker Worker -->|"Initialize"| TJS TJS -->|"Load"| Model Model -->|"Results"| TJS TJS -->|"Stream"| Worker Worker -->|"Translation"| Frontend classDef default fill:#344060,stroke:#AB4B99,color:#EAEDF3 classDef accent fill:#8A337B,stroke:#AB4B99,color:#EAEDF3 class TJS,Model accent ``` ## Building the Translator ![AI Translator](../../assets/images/vue-ai-translate.png) ### 1. Set Up Your Project Create a new Vue project with TypeScript: ```bash npm create vite@latest vue-translator -- --template vue-ts cd vue-translator npm install npm install @huggingface/transformers ``` ### 2. Create the Translation Worker The translation happens in a background process. 
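Before writing the worker itself, here's roughly how the Vue side will create and talk to it — a sketch assuming Vite's module-worker syntax; the message fields mirror the worker defined next, and the NLLB language codes are just examples:

```typescript
// Sketch: create the worker from the main thread (Vite resolves the URL at build time)
const worker = new Worker(
  new URL("./worker/translation.worker.ts", import.meta.url),
  { type: "module" }
);

// Ask the worker for a translation (NLLB-200 uses FLORES-style codes like eng_Latn)
worker.postMessage({
  text: "Hello world",
  src_lang: "eng_Latn",
  tgt_lang: "deu_Latn",
});

// React to streamed partial output and the final result
worker.onmessage = (event: MessageEvent) => {
  if (event.data.status === "update") {
    console.log("Partial translation:", event.data.output);
  }
  if (event.data.status === "complete") {
    console.log("Done:", event.data.output);
  }
};
```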
Create `src/worker/translation.worker.ts`: ```typescript import { pipeline, TextStreamer, TranslationPipeline, } from "@huggingface/transformers"; // Singleton pattern for the translation pipeline class MyTranslationPipeline { static task: PipelineType = "translation"; // We use the distilled model for faster loading and inference static model = "Xenova/nllb-200-distilled-600M"; static instance: TranslationPipeline | null = null; static async getInstance(progress_callback?: ProgressCallback) { if (!this.instance) { this.instance = (await pipeline(this.task, this.model, { progress_callback, })) as TranslationPipeline; } return this.instance; } } // Type definitions for worker messages interface TranslationRequest { text: string; src_lang: string; tgt_lang: string; } // Worker message handler self.addEventListener( "message", async (event: MessageEvent) => { try { // Initialize the translation pipeline with progress tracking const translator = await MyTranslationPipeline.getInstance(x => { self.postMessage(x); }); // Configure streaming for real-time translation updates const streamer = new TextStreamer(translator.tokenizer, { skip_prompt: true, skip_special_tokens: true, callback_function: (text: string) => { self.postMessage({ status: "update", output: text, }); }, }); // Perform the translation const output = await translator(event.data.text, { tgt_lang: event.data.tgt_lang, src_lang: event.data.src_lang, streamer, }); // Send the final result self.postMessage({ status: "complete", output, }); } catch (error) { self.postMessage({ status: "error", error: error instanceof Error ? error.message : "An unknown error occurred", }); } } ); ``` ### 3. Build the Interface Create a clean interface with two main components: #### Language Selector (`src/components/LanguageSelector.vue`) ```vue ``` #### Progress Bar (`src/components/ProgressBar.vue`) ```vue ``` ### 4. Put It All Together In your main app file: ```vue