We Were Shipping Faster Than We Could Test, So We Changed One Thing

Graphic showing a developer working at a desk with code on computer screens, overlaid with the text “How we created an AI-powered testing framework” and “Building in public: Part 1.”

What This Series Covers

We built an AI-powered testing framework that turned our QA (Quality Assurance) process from a bottleneck into an advantage. This three-part series walks through exactly how we did it.

The Problem We Had

Our development team was shipping faster than QA could validate. Releases slowed simply to make room for testing.

That’s backwards.

But without proper QA, software breaks and users lose trust. Skipping testing was never an option; that would only make the problem worse over time. Shipping faster doesn’t matter if your software is buggy.

We needed automation that could test the way a user does. Clicking buttons. Filling forms. Navigating pages. Doing it at scale without collapsing every time the UI changed.

This was never about replacing our QA team. They’re essential. The goal was to free them from repetitive regression testing so they could focus on exploratory testing, edge cases, and judgment calls automation can’t make.

What We Built

A three-layer system with Claude Code handling translation between intent and execution.

Specification Layer: Human-readable test cases written in plain language.

Execution Layer: Auto-generated Playwright tests created from those specs.

Page Object Layer: Reusable, resilient UI interactions designed to survive change.

Claude Code sits between the Specification and Execution layers, translating intent into working tests.
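
To make that translation concrete, here’s a hypothetical sketch (the feature, spec wording, selectors, and URL are illustrative, not taken from our actual suite). The plain-language spec sits in the Specification Layer, and the test beneath it is the kind of Playwright code Claude Code generates into the Execution Layer.

```typescript
// Specification Layer (plain language, written by QA):
//
//   Test: User can log in with valid credentials
//   Steps:
//     1. Go to the login page
//     2. Enter a valid email and password
//     3. Click "Sign in"
//   Expected: The dashboard loads and greets the user
//
// Execution Layer (generated Playwright test):
import { test, expect } from '@playwright/test';

test('user can log in with valid credentials', async ({ page }) => {
  // Step 1: go to the login page
  await page.goto('/login');

  // Step 2: enter valid credentials (placeholder values)
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('example-password');

  // Step 3: submit the form the way a user would
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Expected: the dashboard loads and greets the user
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole('heading', { name: /welcome/i })).toBeVisible();
});
```

The point isn’t this particular test; it’s that QA only ever touches the spec at the top, and the generated code below it can be regenerated whenever the application changes.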

The Numbers

Before the framework → after the framework:

  • Time to write a new test: 2–4 hours → 15–30 minutes
  • Test maintenance overhead: ~40% of a sprint → ~10% of a sprint
  • Regression suite runtime: hours → minutes
  • QA bottleneck: severe → gone

Impressive results, right? This series will detail how we got from point A to point B.

I’ll be publishing our work in three parts. Before we dig into Part 1, here’s a look at what all three posts will cover when read together as a single narrative.

Part 1: The Problem and Architecture

Why we built this system:

  • The truth about our QA bottleneck
  • Why skipping testing was never an option
  • What we actually needed versus what we thought we needed
  • The hidden cost of slow validation
  • Why traditional automation failed us before
  • Our five non-negotiable requirements
  • High-level architecture

Read this if you want to understand the motivation and decide whether this approach fits your situation. (You’re already here; just scroll down to begin.)

Part 2: Building the Framework

How we built it:

  • The three-layer architecture in detail
  • AI test generation scripts and prompts
  • Page objects that don’t break
  • Selector strategies that survive UI changes
  • The full workflow from spec to passing test

Read this if you want to implement something similar yourself. Part 2 will publish on January 28, 2026.

Part 3: Advanced Topics and Results

Production features and outcomes:

  • Multi-user authentication
  • Smarter handling of flaky tests
  • A real-world financial workflow example
  • Lessons learned
  • Business impact

Read this if you want to see advanced patterns and real results. Part 3 will publish on February 4, 2026.

If you want to follow along with this series and get a reminder when the next blog publishes, click here to sign up for my weekly email.

Quick Start

If you want to jump straight in, the basic flow looks like this:

  • Set up Playwright.
  • Write a human-readable test spec.
  • Generate executable tests using Claude Code.
  • Run Playwright against the generated tests.

The mechanics are simple. The leverage comes from separating intent from execution.
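
As a rough sketch of the first step, a minimal Playwright configuration might look like this (the paths, URL, and options are illustrative assumptions, not our production config):

```typescript
// playwright.config.ts: a minimal starting point
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests/generated',   // where the AI-generated tests are written
  fullyParallel: true,            // run test files in parallel workers
  use: {
    baseURL: 'https://app.example.com', // placeholder application URL
    trace: 'on-first-retry',            // keep a trace when a test has to retry
  },
});
```

From there, `npx playwright test` runs whatever the generation step produced.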

The Core Idea

Humans define what to test; AI figures out how to test it.

QA writes plain-English specs. Claude handles selectors, waits, and assertions. Everyone stays focused on what they’re best at, and the system scales.

Part 1: The Problem and Architecture

QA became a serious bottleneck once development velocity outpaced our validation capacity. This wasn’t a theoretical problem; it actively hurt our business.

To keep quality high, we slowed down innovation to accommodate QA.

Read that again: 

Innovation slowed because validation couldn’t keep up.

That’s not a situation any software company wants to be in, and it wasn’t one we were willing to accept. But the numbers didn’t lie about our capacity problems.

The Math That Broke Us

  • Sprint 1: 5 new features, 20 regression tests needed, QA capacity of 20 tests per sprint
  • Sprint 5: 5 new features, 100 regression tests needed, QA capacity of 20 tests per sprint
  • Sprint 10: 5 new features, 200 regression tests needed, QA capacity of 20 tests per sprint
  • Sprint 20: 5 new features, 400 regression tests needed, QA capacity of 20 tests per sprint

Development speed stayed constant. Regression burden compounded. QA eventually spent all their time re-testing old functionality, leaving no capacity for new features.

What This Isn’t

Before I go further, I want to make one thing crystal clear. This exercise is not about replacing QA engineers.

They understand users deeply. They think through edge cases that developers miss. They catch usability issues no automation would flag.

They’re irreplaceable. Full stop.

What we needed was automation that could:

  1. Test core functionality
  2. Mimic real user behavior
  3. Operate at scale
  4. Free humans to do human work

The Real Costs

Slowing down QA didn’t just hurt our ability to ship features; it created problems downstream. A slower QA process affects usability for advisors and quality of life for our employees.

For our business:

• Features stuck in “ready for QA” limbo
• Release cycles stretched from weeks to months
• Timelines lost credibility

For development:

• Context switching while waiting on QA
• Bug fixes harder due to stale code
• Growing technical debt

For QA:

• Burnout from repetitive testing
• No time for exploratory work
• Declining job satisfaction

For users:

• Bugs slipping through incomplete regression
• Feature delays
• Inconsistent quality

Why Manual Testing Hits a Wall and Traditional Automation Fails

Manual testing scales linearly while regression suites grow without bound. Fatigue leads to missed defects, knowledge lives in people’s heads, and coverage becomes inconsistent. Heroics don’t scale.

Traditional automation didn’t fare much better. Brittle selectors broke tests with every UI change, and only developers could write tests, which siloed QA expertise. Flaky tests eroded trust in results, and maintenance overhead rivaled feature development itself. We needed a better way forward.

Our Requirements

Before building anything, we defined five non-negotiables:

1. Test at the User Interaction Level. Tests must behave exactly like users do, with no API shortcuts and no database manipulation. If users can hit a bug, the tests should catch it.

2. Human-Readable Test Cases. Specs must be written in plain English so anyone on the team, especially non-developers, can instantly understand what’s being validated.

3. AI-Powered Translation. Humans define intent; AI handles selectors, waits, assertions, and edge cases.

4. Resilience to Application Changes. Renaming a button shouldn’t break the suite; tests must adapt to that kind of change.

5. Scalable Execution. Full regression runs must complete in minutes, not hours.

The Architecture

Each requirement maps directly to a system choice:

User interaction → Playwright browser automation
Readable specs → Markdown anyone can write
AI translation → Claude Code
Resilience → Flexible selectors with fallback logic
Scalability → Parallel execution with cached authentication

The system layers separate intent, execution, and interaction so each can evolve independently.
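
To give a feel for the resilience and page-object pieces, here’s a hypothetical sketch (the class name, labels, and test id are illustrative). It leans on user-facing locators and uses Playwright’s Locator.or() to fall back to a stable test id if a visible label gets reworded:

```typescript
import { Page, Locator } from '@playwright/test';

// Page Object Layer (illustrative sketch): one place that knows how to
// find and drive the UI, so generated tests never hard-code fragile selectors.
export class LoginPage {
  readonly page: Page;
  readonly submitButton: Locator;

  constructor(page: Page) {
    this.page = page;
    // Prefer the user-visible role and name; fall back to a test id
    // if the button label changes. Locator.or() matches either one.
    this.submitButton = page
      .getByRole('button', { name: /sign in|log in/i })
      .or(page.getByTestId('login-submit'));
  }

  async login(email: string, password: string) {
    await this.page.getByLabel('Email').fill(email);
    await this.page.getByLabel('Password').fill(password);
    await this.submitButton.click();
  }
}
```

Generated tests call methods like login() instead of touching selectors directly, so a renamed button means updating one locator rather than dozens of tests. Scalability comes from Playwright’s parallel workers combined with reusing authenticated storage state instead of logging in fresh for every test.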

Next Up

In one week, I’ll publish Part 2: Building the Framework. In that blog, I’ll cover:

• How specs, tests, and page objects fit together
• The prompts and scripts that power AI test generation
• Selector strategies that survive UI changes

I hope you’ll follow along as I explore how AI can solve real limitations without displacing the critical role of humans.

If you want a reminder when the next blog publishes, click here to sign up for my weekly email.
