What This Series Covers
We built an AI-powered testing framework that turned our QA (Quality Assurance) process from a bottleneck into an advantage. This three-part series walks through exactly how we did it.
The Problem We Had
Our development team was shipping faster than QA could validate. Releases slowed simply to make room for testing.
That’s backwards.
But without proper QA, software breaks and users lose trust. Skipping testing was never an option; that would only compound the problem over time. Shipping faster doesn’t matter if your software is buggy.
We needed automation that could test the way a user does. Clicking buttons. Filling forms. Navigating pages. All at scale, without collapsing every time the UI changed.
This was never about replacing our QA team. They’re essential. The goal was to free them from repetitive regression testing so they could focus on exploratory testing, edge cases, and judgment calls automation can’t make.
What We Built
A three-layer system with Claude Code handling translation between intent and execution.
Specification Layer: Human-readable test cases written in plain language.
Execution Layer: Auto-generated Playwright tests created from those specs.
Page Object Layer: Reusable, resilient UI interactions designed to survive change.
Claude Code sits between the Specification and Execution layers, translating intent into working tests.
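As a rough sketch of how those layers can sit side by side on disk (the file and folder names here are illustrative, not our exact layout):

```
tests/
├── specs/        # Specification Layer: plain-language test cases
│   └── login.md
├── generated/    # Execution Layer: Playwright tests produced by Claude Code
│   └── login.spec.ts
└── pages/        # Page Object Layer: reusable, resilient UI interactions
    └── login-page.ts
```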
The Numbers
| Metric | Before the Framework | After the Framework |
| --- | --- | --- |
| Time to write a new test | 2–4 hours | 15–30 minutes |
| Test maintenance overhead | ~40% of a sprint | ~10% of a sprint |
| Regression suite runtime | Hours | Minutes |
| QA bottleneck | Severe | Gone |
Impressive results, right? This series will detail how we got from point A to point B.
I’ll be publishing our work in three parts. Before we dig into Part 1, here’s what all three posts cover when read together as a single narrative.
Part 1: The Problem and Architecture
Why we built this system:
- What really caused our QA bottleneck
- Why skipping testing was never an option
- What we actually needed versus what we thought we needed
- The hidden cost of slow validation
- Why traditional automation failed us before
- Our five non-negotiable requirements
- High-level architecture
Read this if you want to understand the motivation and decide whether this approach fits your situation. (You’re already here. Just scroll down to begin.)
Part 2: Building the Framework
How we built it:
- The three-layer architecture in detail
- AI test generation scripts and prompts
- Page objects that don’t break
- Selector strategies that survive UI changes
- The full workflow from spec to passing test
Read this if you want to implement something similar yourself. Part 2 will publish on January 28, 2026.
Part 3: Advanced Topics and Results
Production features and outcomes:
- Multi-user authentication
- Smarter handling of flaky tests
- A real-world financial workflow example
- Lessons learned
- Business impact
Read this if you want to see advanced patterns and real results. Part 3 will publish on February 4, 2026.
If you want to follow along with this series and get a reminder when the next post publishes, click here to sign up for my weekly email.

Quick Start
If you want to jump straight in, the basic flow looks like this:
- Set up Playwright.
- Write a human-readable test spec.
- Generate executable tests using Claude Code.
- Run Playwright against the generated tests.
The mechanics are simple. The leverage comes from separating intent from execution.
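Here’s a hypothetical example of step 2. Our actual spec format is covered in Part 2; the file name and steps below are illustrative:

```markdown
# Test: User can log in

## Steps
1. Go to the login page
2. Enter a valid email and password
3. Click "Sign in"

## Expected
- The dashboard loads
- A "Dashboard" heading is visible
```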
The core idea we’re working with is that humans define what to test, while AI figures out how to test it.
QA writes plain-English specs. Claude handles selectors, waits, and assertions. Everyone stays focused on what they’re best at, and the system scales.
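For instance, given a spec like the one above, the generated Playwright test might look roughly like this. The URL, labels, and credentials are placeholders, not output from our actual pipeline:

```typescript
import { test, expect } from '@playwright/test';

test('user can log in and see the dashboard', async ({ page }) => {
  // Step 1: go to the login page
  await page.goto('https://app.example.com/login');

  // Step 2: fill the form using user-facing labels instead of brittle CSS paths
  await page.getByLabel('Email').fill('qa.user@example.com');
  await page.getByLabel('Password').fill('example-password');

  // Step 3: submit the way a user would
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Expected: assert on what the user actually sees
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```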
Part 1: The Problem and Architecture
QA became a serious bottleneck once development velocity outpaced our validation capacity. This wasn’t a theoretical problem; it actively hurt our business.
To keep our quality standards high, we slowed innovation to accommodate QA.
Read that again:
Innovation slowed because validation couldn’t keep up.
That’s not a situation any software company wants to be in, and it wasn’t one we were willing to accept. But the numbers didn’t lie about our capacity problems.
The Math That Broke Us
| | Sprint 1 | Sprint 5 | Sprint 10 | Sprint 20 |
| --- | --- | --- | --- | --- |
| New features | 5 | 5 | 5 | 5 |
| Regression tests needed | 20 | 100 | 200 | 400 |
| QA capacity | 20 tests per sprint | 20 tests per sprint | 20 tests per sprint | 20 tests per sprint |
Development speed stayed constant. Regression burden compounded. QA eventually spent all their time re-testing old functionality, leaving no capacity for new features.
What This Isn’t
Before I go further, I want to make one thing crystal clear. This exercise is not about replacing QA engineers.
They understand users deeply. They think through edge cases that developers miss. They catch usability issues no automation would flag.
They’re irreplaceable. Full stop.
What we needed was automation that could:
- Test core functionality
- Mimic real user behavior
- Operate at scale
- Free humans to do human work
The Real Costs
Slowing down for QA didn’t just hurt our ability to ship more features; it created problems down the line. It hurt usability for our advisors, and it hurt quality of life for our employees.
For our business:
- Features stuck in “ready for QA” limbo
- Release cycles stretched from weeks to months
- Timelines lost credibility

For development:
- Context switching while waiting on QA
- Harder bug fixes due to stale code
- Growing technical debt

For QA:
- Burnout from repetitive testing
- No time for exploratory work
- Declining job satisfaction

For users:
- Bugs slipping through incomplete regression
- Feature delays
- Inconsistent quality
Why Manual Testing Hits a Wall and Traditional Automation Fails
Manual testing scales linearly while regression suites grow without bound. Fatigue leads to missed defects, knowledge lives in people’s heads, and coverage becomes inconsistent. Heroics don’t scale.
Traditional automation had already failed us, too. Brittle selectors broke tests with every UI change, and only developers could write tests, which isolated QA expertise. Flaky tests eroded trust in results, and maintenance overhead rivaled feature development itself. We needed a better way forward.
Our Requirements
Before building anything, we defined five non-negotiables:
| Requirement | What it means |
| --- | --- |
| 1. Test at the User Interaction Level | Tests must behave exactly like users do: no API shortcuts, no database manipulation. If a user can hit a bug, the tests should catch it. |
| 2. Human-Readable Test Cases | Specs must be written in plain English so anyone on the team, especially non-developers, can instantly understand what’s being validated. |
| 3. AI-Powered Translation | Humans define intent. AI handles selectors, waits, assertions, and edge cases. |
| 4. Resilience to Application Changes | Renaming a button shouldn’t break the suite. Tests must adapt to routine UI changes. |
| 5. Scalable Execution | Full regression runs must complete in minutes, not hours. |
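To give a taste of what requirement 5 can look like in practice, here’s a minimal Playwright config sketch with parallel workers and cached authentication. The worker count and file path are placeholders, not our production values:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true, // run tests within each file in parallel, not just across files
  workers: 8,          // placeholder worker count; tune to your CI hardware
  use: {
    // Reuse a signed-in session captured once by a setup step, so
    // individual tests skip the login flow entirely (cached authentication).
    storageState: 'auth/storage-state.json',
  },
});
```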
The Architecture
Each requirement maps directly to a system choice:
- User interaction → Playwright browser automation
- Readable specs → Markdown anyone can write
- AI translation → Claude Code
- Resilience → Flexible selectors with fallback logic
- Scalability → Parallel execution with cached authentication
The system layers separate intent, execution, and interaction so each can evolve independently.
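To make the resilience piece concrete, here’s a hedged sketch of what “flexible selectors with fallback logic” might look like inside a page object. The class, test id, and candidate selectors are illustrative, not our actual code; Part 2 covers the real strategy:

```typescript
import { Page, Locator } from '@playwright/test';

export class LoginPage {
  constructor(private readonly page: Page) {}

  // Try candidates from most specific to most generic, so a renamed test id
  // or relabeled button falls through to the next selector instead of
  // failing the whole suite.
  private async firstVisible(candidates: Locator[]): Promise<Locator> {
    for (const candidate of candidates) {
      if (await candidate.isVisible().catch(() => false)) {
        return candidate;
      }
    }
    throw new Error('No candidate selector matched a visible element');
  }

  async submit(): Promise<void> {
    const button = await this.firstVisible([
      this.page.getByTestId('login-submit'),                       // stable hook
      this.page.getByRole('button', { name: /sign in|log in/i }),  // user-facing
      this.page.locator('form button[type="submit"]'),             // structural fallback
    ]);
    await button.click();
  }
}
```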
Next Up
In one week, I’ll publish Part 2: Building the Framework. In that post, I’ll cover:
- How specs, tests, and page objects fit together
- The prompts and scripts that power AI test generation
- Selector strategies that survive UI changes
I hope you’ll follow along as we explore how AI can solve real limitations without displacing the critical role humans play.
If you want a reminder when the next post publishes, click here to sign up for my weekly email.