
Building an AI QA Engineer with Claude Code and Playwright MCP

Introduction

Manual testing gets old fast. Clicking through your app after every change, checking if forms still work, making sure nothing breaks on mobile—it’s tedious work that most developers avoid.

So I built an AI that does it for me.

Meet Quinn, my automated QA engineer. Quinn tests my app like a real person would. It clicks buttons. It fills forms with weird inputs. It resizes the browser to check mobile layouts. And it writes detailed bug reports.

The best part? Quinn runs automatically every time I open a pull request.

The automated QA workflow: a developer opens a PR, GitHub Actions triggers the workflow, Claude Code tests the app through Playwright, and the QA report is posted back to the PR.

The secret sauce: Claude Code + Playwright

Two tools make this possible:

Claude Code is Anthropic’s coding assistant. It can run commands, create files, and—here’s the magic—control a web browser.

Playwright is a browser automation tool. It can click, type, take screenshots, and do anything a human can do in a browser.

When you combine them through the Model Context Protocol (MCP), Claude can literally browse your app like a real user.
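If you want to try the same setup locally before touching CI, Claude Code can pick up project-scoped MCP servers from a .mcp.json file in the repo root. A minimal sketch, mirroring the exact config the workflow below passes via --mcp-config:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}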

Step 1: Give Claude a personality

I didn’t want a boring test robot. I wanted a QA engineer with opinions.

So I created a prompt file that gives Claude a backstory:

# QA Engineer Identity

You are **Quinn**, a veteran QA engineer with 12 years
of experience breaking software. You've seen it all -
apps that crash on empty input, forms that lose data,
buttons that do nothing.

## Your Philosophy

- **Trust nothing.** Developers say it works? Prove it.
- **Users are creative.** They'll do things no one anticipated.
- **Edge cases are where bugs hide.** The happy path is boring.

This isn’t just for fun. The personality makes Claude test more thoroughly. Quinn doesn’t just check if buttons work—Quinn tries to break things.

I also gave Quinn strict rules:

## Non-Negotiable Rules

1. **UI ONLY.** You interact through the browser like a
   real user. You cannot read source code.

2. **SCREENSHOT BUGS.** Every bug gets a screenshot.

3. **CONTINUE AFTER BUGS.** Finding a bug is not the end.
   Document it, then KEEP TESTING.

4. **MOBILE MATTERS.** Always test mobile viewport (375x667).
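Rule 4 maps straight onto a Playwright MCP tool. When Quinn switches to mobile, Claude Code issues a browser_resize tool call to the MCP server; on the wire that's a JSON-RPC request along these lines (a sketch of the envelope, which Claude Code builds for you):

{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "browser_resize",
    "arguments": { "width": 375, "height": 667 }
  }
}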

Step 2: Create the GitHub Action

GitHub Actions are like little robots that run tasks for you. They trigger when something happens (like opening a PR) and run whatever commands you specify.

Here’s the core of the workflow:

name: Claude QA

on:
  pull_request:
    types: [labeled]

jobs:
  qa:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Start my app
        run: |
          pnpm dev &
          # Wait for server to be ready
          sleep 10

      - name: Run Claude QA
        uses: anthropics/claude-code-action@v1
        with:
          prompt: ${{ steps.load-prompts.outputs.prompt }}
          claude_args: |
            --mcp-config '{"mcpServers":{"playwright":{
              "command":"npx",
              "args":["@playwright/mcp@latest","--headless"]
            }}}'

Let me break this down:

  1. Trigger: The workflow runs when you add a label to a PR (like qa-verify)
  2. Start the app: Launch your dev server so Claude has something to test
  3. Run Claude: Use Anthropic’s official GitHub Action with Playwright MCP connected
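One subtlety: types: [labeled] fires for every label added to a PR, not just the one you care about. You can guard the job on the specific label name; a sketch, assuming the label is called qa-verify:

jobs:
  qa:
    if: github.event.label.name == 'qa-verify'
    runs-on: ubuntu-latest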

💪 Headless mode

The --headless flag runs the browser without a visible window. This is required for CI environments like GitHub Actions where there’s no display.
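While we're in the workflow: the fixed sleep 10 after pnpm dev works until your dev server has a slow day. A sturdier variant polls until the server actually responds; a sketch, assuming the app listens on localhost:3000 and you don't mind pulling in the wait-on package:

- name: Start my app
  run: |
    pnpm install --frozen-lockfile
    pnpm dev &
    # Poll the server instead of hoping 10 seconds is enough
    npx wait-on http://localhost:3000 --timeout 60000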

Step 3: Tell Claude what to test

For each PR, I want Claude to verify the actual changes. So I pass in the PR description:

# PR Verification Testing

**PR #32**: Improve set editing and fix playlist overflow

## Your Mission

This PR claims to implement something. Your job is to:
1. **Verify** the claimed changes actually work
2. **Break** them with edge cases
3. **Ensure** no regressions in related features

## Test This PR

- Can users edit ANY set during active workout?
- Do completed sets stay editable?
- Do long exercise names truncate properly?

Claude reads this, understands what changed, and tests specifically for those features.
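The workflow earlier references a load-prompts step I didn't show. Here's a minimal sketch of it, assuming Quinn's identity file lives at .github/qa/identity.md (a path made up for illustration) and using GitHub Actions' multiline output syntax. The PR title and body go through environment variables so PR content can't inject into the shell:

- name: Load prompts
  id: load-prompts
  env:
    PR_TITLE: ${{ github.event.pull_request.title }}
    PR_BODY: ${{ github.event.pull_request.body }}
  run: |
    {
      echo 'prompt<<PROMPT_EOF'
      cat .github/qa/identity.md
      echo "**PR #${{ github.event.pull_request.number }}**: $PR_TITLE"
      echo "$PR_BODY"
      echo 'PROMPT_EOF'
    } >> "$GITHUB_OUTPUT"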

What Quinn actually does

Here’s a real example from my workout tracker. I opened a PR that said “allow editing any set during a workout.”

Quinn went to work:

  1. Happy path: start a workout, add an exercise, fill in set data, and mark a set complete
  2. Test the edit feature: change the weight on a completed set ✓
  3. Edge cases: enter -50 as a weight, then 999999
  4. Re-check everything in the mobile viewport and with an extra-long exercise name
  5. Generate the report

The report

Quinn generates a full QA report in Markdown:

# QA Verification Report

**PR**: #32 - Improve set editing
**Tester**: Quinn (Claude QA)

## Executive Summary

**APPROVED** - All claimed features work as described.

## Requirements Verification

| Requirement | Status | How Tested |
|-------------|--------|------------|
| Edit any set | PASS | Changed weight after marking complete |
| Long names truncate | PASS | Added 27-character exercise name |
| Mobile layout | PASS | Tested at 375x667 viewport |

## Bugs Found

None

## Verdict

**APPROVED** - Ready to merge.

This report gets posted automatically as a comment on your PR. You can see exactly what Quinn tested and whether your code is safe to merge.
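If your action doesn't post the comment itself, a final workflow step with the gh CLI can do it. A sketch; qa-report.md is an assumed path that Quinn's Write tool would need to save to:

- name: Post QA report
  if: always()
  env:
    GH_TOKEN: ${{ github.token }}
  run: gh pr comment ${{ github.event.pull_request.number }} --body-file qa-report.md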

The toolbox

Quinn only gets access to browser tools—no code access:

--allowedTools "
  mcp__playwright__browser_navigate,
  mcp__playwright__browser_click,
  mcp__playwright__browser_type,
  mcp__playwright__browser_take_screenshot,
  mcp__playwright__browser_resize,
  Write
"

This keeps things realistic. A real QA engineer tests through the UI, not by reading code. Quinn does the same. The one non-browser tool, Write, is there so Quinn can save its report to a file.

Why this works

Three reasons this approach beats traditional testing:

It tests like a human

Unit tests check if functions return the right values. Quinn checks if users can actually accomplish their goals.

It’s flexible

You don’t write test scripts that break when you change a button’s text. Quinn understands intent and adapts.

It finds unexpected bugs

Quinn tries things you wouldn’t think to try. Negative numbers? Extremely long inputs? Clicking the same button five times fast? Quinn tests all of it.

Comparison: AI QA vs traditional testing

| Aspect | Unit Tests | E2E Scripts | AI QA (Quinn) |
|--------|------------|-------------|---------------|
| Tests user flows | ❌ | ✅ | ✅ |
| Handles UI changes | ❌ | ❌ | ✅ |
| Finds edge cases | Manual | Manual | ✅ Automatic |
| Setup complexity | Low | High | Medium |
| Maintenance | Low | High | Low |

Getting started

Want to build your own AI QA engineer? Here’s what you need:

  1. Get Claude Code access — Sign up at Anthropic and get an API token

  2. Create your QA prompt — Give Claude a personality and testing philosophy

  3. Set up the GitHub Action — Use anthropics/claude-code-action with Playwright MCP

  4. Write a verification template — Tell Claude what to test for each PR

💡 Learn more about Claude Code

If you’re new to Claude Code, check out my comprehensive guide to Claude Code’s full stack, covering MCP, Skills, Subagents, Hooks, and more.

A word of caution

This approach is experimental. AI-driven QA is exciting, but it’s not a replacement for deterministic testing.

A solid testing foundation of unit tests, integration tests, and deterministic end-to-end suites still matters more.

AI QA works best as a complement to these, not a replacement. Use it for exploratory testing, edge case discovery, and verifying user flows that are hard to script.

💪 Beyond QA

Claude Code in GitHub Actions isn’t limited to QA. The same pattern works for:

  • SEO audits — Check meta tags, heading structure, Core Web Vitals
  • Accessibility testing — Verify ARIA labels, keyboard navigation, color contrast
  • Content review — Validate links, check for broken images, lint prose
  • Visual regression — Compare screenshots across deployments

Any task where you’d open a browser and manually check something can be automated this way.
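For a taste, here’s a hedged sketch of what an SEO-audit mission file might look like; same workflow, different prompt:

# SEO Audit Mission

Visit the homepage and every page linked from its navigation.
For each page:

1. Confirm the title and meta description exist and are unique
2. Verify there is exactly one h1 and headings don't skip levels
3. Screenshot anything that renders broken at 375x667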

Conclusion

Building an AI QA engineer combines two powerful tools: Claude Code for intelligence and Playwright MCP for browser control. The result is automated testing that thinks like a human but works tirelessly.

It’s still early days for this approach. But someday, Quinn might find a bug that would have embarrassed me in production. On that day, this whole experiment will have paid for itself.
