We Were Shipping Faster Than We Could Test, So We Changed One Thing

Graphic showing a developer working at a desk with code on computer screens, overlaid with the text “How we created an AI-powered testing framework” and “Building in public: Part 1.”

What This Series Covers

We built an AI-powered testing framework that turned our QA (Quality Assurance) process from a bottleneck into an advantage. This three-part series walks through exactly how we did it.

The Problem We Had

Our development team was shipping faster than QA could validate. Releases slowed simply to make room for testing.

That’s backwards.

But without proper QA, software breaks and users lose trust. Skipping testing was never an option; that would only make the problem worse over time. Shipping faster doesn’t matter if your software is buggy.

We needed automation that could test the way a user does. Clicking buttons. Filling forms. Navigating pages. Doing it at scale without collapsing every time the UI changed.

This was never about replacing our QA team. They’re essential. The goal was to free them from repetitive regression testing so they could focus on exploratory testing, edge cases, and judgment calls automation can’t make.

What We Built

A three-layer system with Claude Code handling translation between intent and execution.

Specification Layer: Human-readable test cases written in plain language.

Execution Layer: Auto-generated Playwright tests created from those specs.

Page Object Layer: Reusable, resilient UI interactions designed to survive change.

Claude Code sits between the Specification and Execution layers, translating intent into working tests.
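
To make that translation concrete, here’s a hypothetical sketch (the feature, spec wording, selectors, and URL are illustrative, not taken from our actual suite). The plain-language spec sits in the Specification Layer, and the test beneath it is the kind of Playwright code Claude Code generates into the Execution Layer.

```typescript
// Specification Layer (plain language, written by QA):
//
//   Test: User can log in with valid credentials
//   Steps:
//     1. Go to the login page
//     2. Enter a valid email and password
//     3. Click "Sign in"
//   Expected: The dashboard loads and greets the user
//
// Execution Layer (generated Playwright test):
import { test, expect } from '@playwright/test';

test('user can log in with valid credentials', async ({ page }) => {
  // Step 1: go to the login page
  await page.goto('/login');

  // Step 2: enter valid credentials (placeholder values)
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('example-password');

  // Step 3: submit the form the way a user would
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Expected: the dashboard loads and greets the user
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole('heading', { name: /welcome/i })).toBeVisible();
});
```

The point isn’t this particular test; it’s that QA only ever touches the spec at the top, and the generated code below it can be regenerated whenever the application changes.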

The Numbers

Before the framework → after the framework:

  • Time to write a new test: 2–4 hours → 15–30 minutes
  • Test maintenance overhead: ~40% of a sprint → ~10% of a sprint
  • Regression suite runtime: hours → minutes
  • QA bottleneck: severe → gone

Impressive results, right? This series will detail how we got from point A to point B.

I’ll be publishing our work in three parts. Before we dig into Part 1, here’s a look at what all three posts will cover when read together as a single narrative.

Part 1: The Problem and Architecture

Why we built this system:

  • The truth about our QA bottleneck
  • Why skipping testing was never an option
  • What we actually needed versus what we thought we needed
  • The hidden cost of slow validation
  • Why traditional automation failed us before
  • Our five non-negotiable requirements
  • High-level architecture

Read this if you want to understand the motivation and decide whether this approach fits your situation. (You’re already here; just scroll down to begin.)

Part 2: Building the Framework

How we built it:

  • The three-layer architecture in detail
  • AI test generation scripts and prompts
  • Page objects that don’t break
  • Selector strategies that survive UI changes
  • The full workflow from spec to passing test

Read this if you want to implement something similar yourself. Part 2 will publish on January 28, 2026.

Part 3: Advanced Topics and Results

Production features and outcomes:

  • Multi-user authentication
  • Smarter handling of flaky tests
  • A real-world financial workflow example
  • Lessons learned
  • Business impact

Read this if you want to see advanced patterns and real results. Part 3 will publish on February 4, 2026.

If you want to follow along with this series and get a reminder when the next blog publishes, click here to sign up for my weekly email.

Quick Start

If you want to jump straight in, the basic flow looks like this:

  • Set up Playwright.
  • Write a human-readable test spec.
  • Generate executable tests using Claude Code.
  • Run Playwright against the generated tests.

The mechanics are simple. The leverage comes from separating intent from execution.
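
As a rough sketch of the first step, a minimal Playwright configuration might look like this (the paths, URL, and options are illustrative assumptions, not our production config):

```typescript
// playwright.config.ts: a minimal starting point
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests/generated',   // where the AI-generated tests are written
  fullyParallel: true,            // run test files in parallel workers
  use: {
    baseURL: 'https://app.example.com', // placeholder application URL
    trace: 'on-first-retry',            // keep a trace when a test has to retry
  },
});
```

From there, `npx playwright test` runs whatever the generation step produced.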

The Core Idea

Humans define what to test; AI figures out how to test it.

QA writes plain-English specs. Claude handles selectors, waits, and assertions. Everyone stays focused on what they’re best at, and the system scales.

Part 1: The Problem and Architecture

QA became a serious bottleneck once development velocity outpaced our validation capacity. This wasn’t a theoretical problem; it actively hurt our business.

To keep quality high, we slowed down innovation to accommodate QA.

Read that again: 

Innovation slowed because validation couldn’t keep up.

That’s not a situation any software company wants to be in, and it wasn’t one we were willing to accept. But the numbers didn’t lie about our capacity problems.

The Math That Broke Us

  • Sprint 1: 5 new features, 20 regression tests needed, QA capacity of 20 tests per sprint
  • Sprint 5: 5 new features, 100 regression tests needed, QA capacity of 20 tests per sprint
  • Sprint 10: 5 new features, 200 regression tests needed, QA capacity of 20 tests per sprint
  • Sprint 20: 5 new features, 400 regression tests needed, QA capacity of 20 tests per sprint

Development speed stayed constant. Regression burden compounded. QA eventually spent all their time re-testing old functionality, leaving no capacity for new features.

What This Isn’t

Before I go further, I want to make one thing crystal clear. This exercise is not about replacing QA engineers.

They understand users deeply. They think through edge cases that developers miss. They catch usability issues no automation would flag.

They’re irreplaceable. Full stop.

What we needed was automation that could:

  1. Test core functionality
  2. Mimic real user behavior
  3. Operate at scale
  4. Free humans to do human work

The Real Costs

Slowing down QA didn’t just hurt our ability to ship features; it created problems downstream. A slower QA process affects usability for advisors and quality of life for our employees.

For our business:

• Features stuck in “ready for QA” limbo
• Release cycles stretched from weeks to months
• Timelines lost credibility

For development:

• Context switching while waiting on QA
• Bug fixes harder due to stale code
• Growing technical debt

For QA:

• Burnout from repetitive testing
• No time for exploratory work
• Declining job satisfaction

For users:

• Bugs slipping through incomplete regression
• Feature delays
• Inconsistent quality

Why Manual Testing Hits a Wall and Traditional Automation Fails

Manual testing scales linearly while regression suites grow without bound. Fatigue leads to missed defects, knowledge lives in people’s heads, and coverage becomes inconsistent. Heroics don’t scale.

Traditional automation didn’t fare much better. Brittle selectors broke tests with every UI change, and only developers could write tests, which siloed QA expertise. Flaky tests eroded trust in results, and maintenance overhead rivaled feature development itself. We needed a better way forward.

Our Requirements

Before building anything, we defined five non-negotiables:

1. Test at the User Interaction Level. Tests must behave exactly like users do, with no API shortcuts and no database manipulation. If users can hit a bug, the tests should catch it.

2. Human-Readable Test Cases. Specs must be written in plain English so anyone on the team, especially non-developers, can instantly understand what’s being validated.

3. AI-Powered Translation. Humans define intent; AI handles selectors, waits, assertions, and edge cases.

4. Resilience to Application Changes. Renaming a button shouldn’t break the suite; tests must adapt to that kind of change.

5. Scalable Execution. Full regression runs must complete in minutes, not hours.

The Architecture

Each requirement maps directly to a system choice:

User interaction → Playwright browser automation
Readable specs → Markdown anyone can write
AI translation → Claude Code
Resilience → Flexible selectors with fallback logic
Scalability → Parallel execution with cached authentication

The system layers separate intent, execution, and interaction so each can evolve independently.
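
To give a feel for the resilience and page-object pieces, here’s a hypothetical sketch (the class name, labels, and test id are illustrative). It leans on user-facing locators and uses Playwright’s Locator.or() to fall back to a stable test id if a visible label gets reworded:

```typescript
import { Page, Locator } from '@playwright/test';

// Page Object Layer (illustrative sketch): one place that knows how to
// find and drive the UI, so generated tests never hard-code fragile selectors.
export class LoginPage {
  readonly page: Page;
  readonly submitButton: Locator;

  constructor(page: Page) {
    this.page = page;
    // Prefer the user-visible role and name; fall back to a test id
    // if the button label changes. Locator.or() matches either one.
    this.submitButton = page
      .getByRole('button', { name: /sign in|log in/i })
      .or(page.getByTestId('login-submit'));
  }

  async login(email: string, password: string) {
    await this.page.getByLabel('Email').fill(email);
    await this.page.getByLabel('Password').fill(password);
    await this.submitButton.click();
  }
}
```

Generated tests call methods like login() instead of touching selectors directly, so a renamed button means updating one locator rather than dozens of tests. Scalability comes from Playwright’s parallel workers combined with reusing authenticated storage state instead of logging in fresh for every test.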

Next Up

In one week, I’ll publish Part 2: Building the Framework. In that blog, I’ll cover:

• How specs, tests, and page objects fit together
• The prompts and scripts that power AI test generation
• Selector strategies that survive UI changes

I hope you’ll follow along as I explore how AI can solve real limitations without displacing the critical role of humans.

If you want a reminder when the next blog publishes, click here to sign up for my weekly email.
