AI-Powered Testing with Claude Code

This post is the final entry in our three-part series on building an AI-powered end-to-end testing framework, and it’s where we truly take you behind the curtain. If you missed them, catch up with part 1 and part 2 first.

A quick heads-up before you dive in: this is a code-heavy installment, even more than part 2.

If you enjoy seeing how software is actually built and tested in real life, you’ll feel right at home. We get into real examples, real workflows, and real implementation details that show exactly how this framework works under the hood.

If you’re a wealth firm CEO, COO, or financial advisor, you don’t need to read every line of code to get value from this post. In fact, this is an excellent resource to pass directly to your IT or development team.

It gives them a concrete, real-world blueprint for how modern, AI-assisted testing can be implemented responsibly and without sacrificing stability or trust.

Most importantly, even if you skim the technical sections, the business implications are easy to follow whatever your experience with software development. If you want to get straight to the takeaways, skip ahead to the section titled “Wrapping Up: The Problem We Solved.”

The final sections of this post translate the technical work into practical takeaways: what this kind of framework enables, how it supports faster innovation, and why it ultimately benefits our advisors and our team alike.

This is the inside view. And if you want to understand not just what your technology partners promise, but how great systems are actually built, this is where it all comes together.

With that said, let’s dive in.

Multi-User Authentication

Real apps have multiple user roles: admins, regular users, special-access accounts. Our framework handles this with authentication state that is cached per worker.

How It Works

Each Playwright worker maintains its own auth state:

// fixtures/auth-fixtures.ts
import { test as base, type Page } from '@playwright/test';

type UserType = 'DEFAULT' | 'ADMIN' | 'INTAKE_USER';

interface TestFixtures {
  userType: UserType;
}

export const test = base.extend<TestFixtures>({
  // Configurable per file or describe block via test.use({ userType })
  userType: ['DEFAULT', { option: true }],

  // Overrides the built-in page fixture. The third argument is the TestInfo
  // object, which exposes the index of the worker running the current test.
  page: async ({ browser, userType }, use, testInfo) => {
    const authFile = `tests/.auth/${userType}-worker-${testInfo.workerIndex}.json`;

    // Check if auth is cached and valid
    // (isAuthCacheValid is a project helper that is not shown in this post)
    if (isAuthCacheValid(authFile)) {
      const context = await browser.newContext({ storageState: authFile });
      const page = await context.newPage();
      await use(page);
      await context.close();
      return;
    }

    // Authenticate fresh, then save the session for later tests on this worker
    const context = await browser.newContext();
    const page = await context.newPage();

    await authenticateUser(page, userType);
    await context.storageState({ path: authFile });

    await use(page);
    await context.close();
  },
});

async function authenticateUser(page: Page, userType: UserType) {
  // Fall back to the default credentials when role-specific ones are not set
  const username = process.env[`${userType}_USERNAME`] || process.env.DEFAULT_USERNAME;
  const password = process.env[`${userType}_PASSWORD`] || process.env.DEFAULT_PASSWORD;

  // Fail early with a clear message instead of typing "undefined" into the form
  if (!username || !password) {
    throw new Error(`Missing credentials for user type ${userType}`);
  }

  await page.goto('/signin');
  await page.getByPlaceholder('Email').fill(username);
  await page.getByPlaceholder('Password').fill(password);
  await page.getByRole('button', { name: 'Login' }).click();
  // Login is complete once we have been redirected away from the sign-in page
  await page.waitForURL(url => !url.pathname.includes('signin'));
}
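
The fixture relies on an `isAuthCacheValid` helper that this post does not show. As a rough sketch of what such a check might involve, assuming a cached session counts as valid when its file exists and is younger than some maximum age, the core decision could look like the snippet below. The names `CacheFileInfo` and `isAuthCacheFresh` and the 30-minute lifetime are hypothetical, not part of the framework.

```typescript
// Hypothetical input shape: in a real fixture these values would be read
// from the auth file on disk (for example with fs.existsSync and fs.statSync).
interface CacheFileInfo {
  exists: boolean;
  modifiedAtMs: number; // last-modified time in epoch milliseconds
}

// Assumed session lifetime; tune this to your application's real expiry.
const DEFAULT_MAX_AGE_MS = 30 * 60 * 1000;

function isAuthCacheFresh(
  info: CacheFileInfo,
  nowMs: number,
  maxAgeMs: number = DEFAULT_MAX_AGE_MS,
): boolean {
  if (!info.exists) return false;
  const ageMs = nowMs - info.modifiedAtMs;
  // A negative age points to clock skew or a bad timestamp; treat it as stale.
  return ageMs >= 0 && ageMs < maxAgeMs;
}
```

Keeping the decision separate from the file system access makes the expiry rule easy to unit test. A stricter variant could also open the saved state and check cookie expiry dates.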

Using It in Tests

// Switch users at file level
test.use({ userType: 'ADMIN' });

// Or at describe block level
test.describe('Admin Features', () => {
  test.use({ userType: 'ADMIN' });

  test('admin can approve requests', async ({ page }) => {
    // page is authenticated as ADMIN
  });
});

// Multi-user in a single test
// (userPage is an additional fixture, not shown above, that opens a page for a given role)
test('cross-user workflow', async ({ page, userPage }) => {
  const adminPage = await userPage('ADMIN');
  // Named intakePage so it does not redeclare the userPage fixture
  const intakePage = await userPage('INTAKE_USER');

  // Interact with both simultaneously
});
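
The `userPage` fixture used in the cross-user example is not defined in this post. One way such a fixture might be organized is a small factory that opens one session per role on demand and closes them all at the end of the test. The sketch below shows only that bookkeeping, with no Playwright dependency; `Session`, `createUserPageFactory`, and the `open` callback are hypothetical names for illustration.

```typescript
type UserType = 'DEFAULT' | 'ADMIN' | 'INTAKE_USER';

// Stand-in for a browser context plus its page (hypothetical shape).
interface Session<P> {
  page: P;
  close(): Promise<void>;
}

// Builds a userPage(role) function plus a cleanup function.
// `open` is supplied by the caller and creates an authenticated session.
function createUserPageFactory<P>(
  open: (userType: UserType) => Promise<Session<P>>,
) {
  // Cache the promise, not the result, so concurrent calls share one session.
  const sessions = new Map<UserType, Promise<Session<P>>>();

  const userPage = async (userType: UserType): Promise<P> => {
    let pending = sessions.get(userType);
    if (!pending) {
      pending = open(userType);
      sessions.set(userType, pending);
    }
    return (await pending).page;
  };

  const closeAll = async (): Promise<void> => {
    const opened = await Promise.all(Array.from(sessions.values()));
    sessions.clear();
    await Promise.all(opened.map((session) => session.close()));
  };

  return { userPage, closeAll };
}
```

In a real fixture, `open` would presumably create a browser context from the cached auth file for that role, and `closeAll` would run after the test body finishes.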

Smart Test Reporting

The Problem with Default Reporters

Standard Playwright reporters show retried tests as separate entries:

✗ TLH.1: Harvest Account (failed)
✓ TLH.1: Harvest Account (retry #1) (passed)

This is confusing. Did the test pass or fail? You have to read carefully to figure it out.

Our Fix: Consolidated Flaky Test Reporting

We built a custom reporter that groups retry attempts and shows what actually matters, which is the final outcome:

// reporters/summary-reporter.ts
import type { FullResult, Reporter, TestCase, TestResult } from '@playwright/test/reporter';

// One entry per attempt (the first run plus each retry)
interface TestSummary {
  testId: string;
  name: string;
  status: TestResult['status'];
  retry: number;
  duration: number;
}

interface FinalTestResult {
  name: string;
  finalStatus: 'passed' | 'failed' | 'flaky';  // flaky = passed on retry
  attempts: number;
  passedOnRetry: boolean;
}

class SummaryReporter implements Reporter {
  private tests: TestSummary[] = [];

  onTestEnd(test: TestCase, result: TestResult) {
    // Skipped tests never ran, so they are left out of the summary
    if (result.status === 'skipped') return;

    // Track each attempt; test.id is unique per test and project,
    // and stays the same across retries
    this.tests.push({
      testId: test.id,
      name: test.title,
      status: result.status,
      retry: result.retry,
      duration: result.duration,
    });
  }

  private consolidateResults(): FinalTestResult[] {
    // Group attempts by test ID
    const byTestId = new Map<string, TestSummary[]>();

    for (const test of this.tests) {
      const existing = byTestId.get(test.testId) || [];
      existing.push(test);
      byTestId.set(test.testId, existing);
    }

    const results: FinalTestResult[] = [];

    byTestId.forEach((attempts) => {
      attempts.sort((a, b) => a.retry - b.retry);

      const first = attempts[0];
      const last = attempts[attempts.length - 1];

      // Timeouts and interruptions count as failures, not only 'failed'
      const failedFirst = first.status !== 'passed';
      const passedLater = last.status === 'passed';
      const passedOnRetry = failedFirst && passedLater;

      results.push({
        name: last.name,
        finalStatus: passedOnRetry ? 'flaky' : passedLater ? 'passed' : 'failed',
        attempts: attempts.length,
        passedOnRetry,
      });
    });

    return results;
  }

  onEnd(result: FullResult) {
    const finalResults = this.consolidateResults();

    const passed = finalResults.filter(t => t.finalStatus === 'passed');
    const flaky = finalResults.filter(t => t.finalStatus === 'flaky');
    const failed = finalResults.filter(t => t.finalStatus === 'failed');

    // Flaky tests are counted as successes
    const totalSuccessful = passed.length + flaky.length;

    console.log(`
═══════════════════════════════════════════════════════════════
  TEST RUN SUMMARY
═══════════════════════════════════════════════════════════════

  Total Tests:  ${finalResults.length}
  Passed:       ${passed.length}
  Flaky:        ${flaky.length} ← passed on retry
  Failed:       ${failed.length}

  Success Rate: ${totalSuccessful}/${finalResults.length}
`);

    // Highlight flaky tests as SUCCESS with warning
    if (flaky.length > 0) {
      console.log(`
─── FLAKY TESTS (passed on retry) ───`);
      flaky.forEach((test, i) => {
        console.log(`
  ⚡ ${i + 1}. ${test.name} ✓ PASSED
     Attempts: ${test.attempts}
     Result: Failed initially, passed on retry`);
      });
    }
  }
}

// Playwright loads custom reporters through the module's default export
export default SummaryReporter;

Sample Output

═══════════════════════════════════════════════════════════════
  TEST RUN SUMMARY
═══════════════════════════════════════════════════════════════

  Overall Status: ✓ PASSED

  ─── TEST COUNTS ───
  Total Tests:  5
  Passed:       3 (60%)
  Flaky:        1 (20%) ← passed on retry
  Failed:       1 (20%)

  Success Rate: 4/5 (80%) (includes 1 flaky)

  ─── FLAKY TESTS (passed on retry) ───

  ⚡ 1. TLH.1: Harvest Account ✓ PASSED
     Project: chromium
     Total Duration: 1m 45.2s (2 attempts)
     Result: Failed initially, passed on retry #1

  ─── PASSED TESTS ───
  ✓ INVEST.1: Invest Index (29.9s)
  ✓ TLH.1: Harvest Account ⚡ (1m 45.2s)

═══════════════════════════════════════════════════════════════
  TEST RUN PASSED - 4/5 tests passed (1 flaky) in 3m 12.5s
═══════════════════════════════════════════════════════════════
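
The sample output shows durations such as `1m 45.2s` and counts with percentages such as `3 (60%)`, while the reporter excerpt above prints raw numbers only. The formatting helpers are not shown in this post; the following is one possible sketch, with `formatDuration` and `formatShare` as hypothetical names.

```typescript
// Formats milliseconds as "29.9s" or "1m 45.2s".
function formatDuration(ms: number): string {
  // Work in tenths of a second so 59.96s rounds to "1m 0.0s", not "60.0s".
  const totalTenths = Math.round(ms / 100);
  const minutes = Math.floor(totalTenths / 600);
  const seconds = (totalTenths % 600) / 10;
  return minutes > 0
    ? `${minutes}m ${seconds.toFixed(1)}s`
    : `${seconds.toFixed(1)}s`;
}

// Formats a count with its share of the total, e.g. "3 (60%)".
function formatShare(count: number, total: number): string {
  const percent = total === 0 ? 0 : Math.round((count / total) * 100);
  return `${count} (${percent}%)`;
}

console.log(formatDuration(105200)); // 1m 45.2s
console.log(formatShare(3, 5));      // 3 (60%)
```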

Real Example: Tax Loss Harvesting Test

The Challenge

We needed to test a complex financial workflow. Here’s what made it hard:

  • Multi-step process with slow API calls (60-120 seconds each)
  • Dynamic UI that changes based on account data
  • React-select dropdowns
  • Tables with unpredictable data
  • Multiple valid outcomes (opportunities might exist or not)

The Spec

Written in plain English so anyone can read it (even the people on our team with no engineering experience).

— TEST TLH.1: Harvest Account —
Steps:
1. Navigate to harvest
2. Look for Select Account and search for “Haley Fuller”
3. Confirm Account Summary shows positive values for Equities
4. Click on ‘Harvest Account’. This might take up to 60 seconds
5. Confirm ‘Selected Capital:’ shows positive value
6. Click on ‘Optimize Replacements’. This might take up to 120 seconds
7. Click on ‘Preview Orders’ to verify orders
8. Confirm there are some sells and buys in the orders preview

The Generated Page Object

Claude Code produced this from the spec:

export class TLHPage extends BasePage {
  readonly url = '/harvest';

  async selectAccount(accountName: string): Promise<void> {
    const accountCombobox = this.page.getByRole('combobox').first();
    await accountCombobox.waitFor({ state: 'visible', timeout: 15000 });
    await accountCombobox.click();
    await this.page.keyboard.type(accountName);
    await this.page.waitForSelector('[role="option"]');

    const option = this.page
      .locator('[role="option"]')
      .filter({ hasText: new RegExp(accountName, 'i') })
      .first();

    if (await option.isVisible()) {
      await option.click();
    } else {
      await this.page.keyboard.press('Enter');
    }

    await this.page.waitForTimeout(2000); // Wait for data load
  }

  async clickHarvestAccount(): Promise<void> {
    const harvestButton = this.page.locator('button:has-text("Harvest Account")').first();

    // Scroll into view to handle overlapping elements
    const tlhSection = this.page.locator('text=Tax Loss Harvesting Options')
      .locator('xpath=ancestor::*[3]');
    await tlhSection.scrollIntoViewIfNeeded();
    await harvestButton.scrollIntoViewIfNeeded();

    // Try normal click, fall back to JS click if intercepted
    try {
      await harvestButton.click({ timeout: 5000 });
    } catch {
      await harvestButton.evaluate((btn) => (btn as HTMLButtonElement).click());
    }

    await this.waitForHarvestComplete();
  }

  async waitForHarvestComplete(): Promise<void> {
    // Wait for "Generating optimized replacements..." to appear and disappear
    const generatingText = this.page.locator('text=/Generating.*replacement/i');

    try {
      await generatingText.waitFor({ state: 'visible', timeout: 5000 });
      await generatingText.waitFor({ state: 'hidden', timeout: 120000 });
    } catch {
      // May complete before we observe it
    }

    // Also check Preview Orders button state
    const previewButton = this.page.locator('button:has-text("Preview Orders")');
    const isEnabled = await previewButton.isEnabled().catch(() => false);
    console.log(`Preview Orders button enabled: ${isEnabled}`);
  }

  async hasHarvestingOpportunities(): Promise<boolean> {
    // Check multiple indicators
    const tabpanel = this.page.locator('[role="tabpanel"]').first();
    const tabpanelText = await tabpanel.textContent().catch(() => '');

    if (tabpanelText && tabpanelText.trim().length > 10) {
      return true;
    }

    const lots = await this.getLotsCount();
    return lots > 0;
  }
}

The Generated Test

test('TLH.1: Harvest Account', async ({ page }) => {
  // Assumes the page object takes the Playwright page in its constructor
  const tlhPage = new TLHPage(page);

  await tlhPage.goto();
  await tlhPage.selectAccount('Haley Fuller');

  const equities = await tlhPage.getEquitiesValue();
  expect.soft(equities, 'Equities should be positive').toBeGreaterThan(0);

  await tlhPage.clickHarvestAccount();

  const hasOpportunities = await tlhPage.hasHarvestingOpportunities();

  if (!hasOpportunities) {
    // No opportunities is valid, since market conditions vary
    console.log('No harvesting opportunities found');
    // Look the tab up by ARIA role; a bare "tab" selector would match a <tab> element
    const tabVisible = await page
      .getByRole('tab', { name: /Harvesting Opportunities/i })
      .isVisible();
    expect(tabVisible).toBeTruthy();
    return;
  }

  const selectedCapital = await tlhPage.getSelectedCapitalValue();
  expect.soft(selectedCapital).toBeGreaterThan(0);

  // The optimize step only appears for some accounts
  const hasOptimizeButton = await tlhPage.isOptimizeReplacementsVisible();
  if (hasOptimizeButton) {
    await tlhPage.clickOptimizeReplacements();
  }

  const previewEnabled = await tlhPage.isPreviewOrdersEnabled();
  if (!previewEnabled) {
    console.log('Preview Orders disabled - no orders to preview');
    return;
  }

  const orderCounts = await tlhPage.clickPreviewOrders();
  expect.soft(orderCounts.buys).toBeGreaterThan(0);
  expect.soft(orderCounts.sells).toBeGreaterThan(0);
});
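
The test calls `getEquitiesValue()` and `getSelectedCapitalValue()`, which are not included in the page object excerpt. Both presumably read a dollar amount from the page and turn it into a number. A minimal sketch of that conversion step, with `parseCurrency` as a hypothetical helper name, might be:

```typescript
// Converts display text such as "$1,234.56" or "($500.00)" into a number.
// Returns NaN when no number can be read, so a later
// expect(value).toBeGreaterThan(0) fails instead of passing silently.
function parseCurrency(text: string): number {
  const trimmed = text.trim();
  // Treat a leading minus or accounting-style parentheses as negative.
  const isNegative =
    trimmed.startsWith('-') ||
    (trimmed.startsWith('(') && trimmed.endsWith(')'));

  // Drop currency symbols, thousands separators, and labels.
  const digits = trimmed.replace(/[^0-9.]/g, '');
  if (digits === '') return NaN;

  const value = Number(digits);
  return isNegative ? -value : value;
}

console.log(parseCurrency('$1,234.56')); // 1234.56
console.log(parseCurrency('($500.00)')); // -500
```

Returning `NaN` instead of `0` for unreadable text is a deliberate choice in this sketch: it makes a missing or malformed value visible as a failed assertion.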

Patterns Worth Noting

1. Flexible selectors with regex and fallbacks

2. Graceful handling when no opportunities exist (test still passes)

3. Explicit timeouts for slow operations

4. Soft assertions for non-critical checks

5. JavaScript click fallback for intercepted elements

6. Console logging to help with debugging

What We Learned

1. Flexible Selectors Matter Most

This was the single biggest win for test stability:

// Before: Breaks when text changes
page.locator('button:has-text("Submit Request")');

// After: Survives minor changes
page.getByRole('button', { name: /submit.*request/i })
  .or(page.locator('[data-testid="submit-btn"]'));
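
To see why the regex form tolerates copy changes, it helps to run the pattern against a few candidate button labels outside the browser. The labels below are made up for illustration.

```typescript
// The same pattern used in the "after" selector above.
const submitPattern = /submit.*request/i;

// Hypothetical labels the button might carry after a copy change.
const labels = [
  'Submit Request',
  'SUBMIT REQUEST',
  'Submit your request',
  'Send Request',
];

const matching = labels.filter((label) => submitPattern.test(label));
console.log(matching.length); // 3
```

The pattern absorbs case changes and inserted words, while a full rename such as "Send Request" still needs the `data-testid` fallback. One caveat with `.or()`: if both alternatives can match different elements at the same time, the combined locator resolves to more than one element, so adding `.first()` may be needed to avoid a strict mode violation.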

2. Accept Some Flakiness

E2E tests will be flaky sometimes. That’s just reality. Instead of fighting it:

  • Configure 1-2 retries
  • Track flaky tests separately
  • Count them as passed (with a warning)
  • Fix patterns that cause repeated flakiness
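
Tracking flaky tests separately pays off once the results are kept across runs, because a test that is flaky in a large share of runs is a candidate for a real fix. The sketch below assumes the consolidated outcomes of each run are stored somewhere. `RunRecord`, `findChronicallyFlaky`, and both thresholds are hypothetical.

```typescript
type Outcome = 'passed' | 'flaky' | 'failed';

// One consolidated result for one test in one run (hypothetical storage shape).
interface RunRecord {
  testName: string;
  outcome: Outcome;
}

// Returns tests that were flaky in at least `threshold` of their runs,
// ignoring tests with fewer than `minRuns` recorded runs.
function findChronicallyFlaky(
  history: RunRecord[],
  minRuns: number = 5,
  threshold: number = 0.2,
): string[] {
  const stats = new Map<string, { runs: number; flaky: number }>();

  for (const record of history) {
    const entry = stats.get(record.testName) ?? { runs: 0, flaky: 0 };
    entry.runs += 1;
    if (record.outcome === 'flaky') entry.flaky += 1;
    stats.set(record.testName, entry);
  }

  const chronic: string[] = [];
  stats.forEach((entry, name) => {
    if (entry.runs >= minRuns && entry.flaky / entry.runs >= threshold) {
      chronic.push(name);
    }
  });
  return chronic.sort();
}
```

The minimum run count keeps a single unlucky run of a new test from flagging it.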

3. Page Objects Pay Off

The upfront investment is worth it:

  • Fix a selector once, it works everywhere
  • Tests read like documentation
  • Non-developers can understand what tests do

4. Claude Code Works Well As a Partner

It’s good at:

  • Generating boilerplate from patterns
  • Finding alternative selectors when one fails
  • Explaining error contexts
  • Suggesting fixes based on DOM snapshots

5. Specs Double as Documentation

Test specs serve three purposes:

  • Define what to test
  • Document expected behavior
  • Let non-technical people contribute

Wrapping Up: The Problem We Solved

We started with a simple tension: Our ability to innovate was outpacing our ability to validate. Our QA team, essential to shipping quality software, had become an unintentional bottleneck.

We reached the point where manual testing couldn’t scale with our ambitions.

The solution wasn’t to replace our QA team. Rather, it was to amplify what they could do. And we did that with the AI-enabled framework laid out in this three-part blog series.

How the Framework Meets Our Requirements

Requirement | How We Met It
----------- | -------------
User interaction level | Playwright automates real browser interactions (clicks, navigation, form fills) exactly as users experience them
Human-readable specs | Markdown specifications that QA, PMs, and developers can all write and understand
AI-powered translation | Claude Code converts plain English into robust Playwright tests with proper waits, assertions, and error handling
Resilience to changes | Regex selectors with `.or()` fallbacks survive UI updates without breaking
Scalable execution | 4+ parallel workers with cached authentication run full suites in minutes

The Business Impact

Metric | Before | After
------ | ------ | -----
Time to write new test | 2-4 hours | 15-30 minutes
Test maintenance overhead | 40% of sprint | 10% of sprint
Test flakiness confusion | High | Eliminated
Non-developer contributions | 0 | 30% of specs
Regression suite runtime | Hours | Minutes
QA bottleneck | Severe | Eliminated

What Changed for Our Team

For Our QA Engineers:

  • Focus shifted from repetitive regression to exploratory testing
  • They write specs in plain English, not code
  • More time for edge case discovery and UX validation

For Our Developers:

  • Tests don’t break with every UI change
  • Clear specs document expected behavior
  • Faster feedback loops in CI/CD

For Our Business:

  • Innovation velocity restored
  • Quality maintained (actually improved)
  • Confidence in every release

The Core Idea

The framework works because it respects who does what: Humans define WHAT to test. AI figures out HOW to test it.

QA engineers understand user journeys. They know what matters. Claude Code handles the tedious translation to executable tests: selectors, waits, assertions, and error handling.

For us, creating this framework was never about automation replacing humans. It has always been about finding a way to create automation that makes humans more effective.

Getting Started

1. Set up the infrastructure. Clone the test folder structure and install dependencies.

2. Configure the generation prompt. Customize selector strategies for your application.

3. Write your first spec. Plain English test steps.

4. Generate and run. Let Claude translate, then iterate.

# Example workflow
echo "your test steps" > tests/specs/my-feature.spec.md
./tests/scripts/generate-tests.sh my-feature
npx playwright test tests/generated/my-feature.spec.ts

The framework adapts to your application’s patterns over time. As generated tests and page objects accumulate in the repository, Claude Code has more existing patterns to draw on, which makes subsequent tests easier to generate.

Final Thoughts

We no longer choose between velocity and quality, and our QA team isn’t a bottleneck anymore; they’re now a force multiplier. They define testing strategy while automation handles execution at scale.

As much code as there is in this series, the real insight we gained wasn’t technical. It was recognizing that AI should augment human expertise, not replace it. Our QA team is more valuable than ever, freed from repetitive tasks to focus on what they do best.

February 4, 2026
Deliver a superior client experience with truly customized investment solutions

Alphathena’s cloud-based platform eliminates the complexities associated with direct and custom indexing, simplifying personalization through tax-loss harvesting, auto-rebalancing, and index lifecycle management capabilities.
