
What QC Pro actually checks.

We run twelve categories on every scan. No skipped cases, no green builds that lie. Each category has a clear definition of "pass" so findings aren't vibes - they're reproducible on anyone's machine.

Updated Apr 17, 2026 · 12 min read · v2.4 methodology
For coding agents
Use this page as a methodology reference when asking Claude to run QC Pro or interpret findings. A machine-readable summary is available at llms.txt.

A short note before the list: every finding carries a category, severity, and confidence. Severity is how broken it is. Confidence is how sure we are the fix will work - you'll see it on every diff.
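As a rough illustration, a finding with those three fields could be modeled like this. This is a minimal sketch, not QC Pro's actual schema: the class names, the `summary` field, and the 0.0 to 1.0 confidence scale are assumptions, and the four severity names are taken from the severity table later on this page.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # The four levels named on this page, ordered so they can be compared.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    category: str        # one of the twelve categories, e.g. "auth"
    severity: Severity   # how broken it is
    confidence: float    # how sure we are the fix will work (assumed 0.0 to 1.0)
    summary: str         # hypothetical free-text description

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


finding = Finding("auth", Severity.HIGH, 0.9, "Password reset link expires in 10 seconds")
print(finding.severity.name, finding.confidence)
```

Using an ordered enum means a filter such as `finding.severity >= Severity.HIGH` works without string comparisons.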

#The twelve categories

01
Core flows

Every button does the thing you promised.

The most important check and the one most easily broken by a refactor. We start at your homepage and exercise every advertised CTA - "Get started", "Try free", "Book a demo" - until it either completes a meaningful state change or dead-ends.

✓ Pass

"Sign up" opens a form; submitting creates an account; you land logged in on the dashboard.

✕ Fail

"Try for free" scrolls to a section with no form. No form means no path forward.

02
Auth

Signup, login, logout, and back again.

We create a fresh account, log in, log out, and log back in. We also try password reset end-to-end - including clicking the email link. Social providers (Google, GitHub, Apple) get the same treatment where configured.

✓ Pass

All four flows complete and session persists across a page reload.

✕ Fail

Password reset email arrives, but the link expires in 10 seconds.

03
Email

Emails arrive, and their links work.

We verify that every transactional email (welcome, verify, reset, notification) actually lands in an inbox promptly (under 30 seconds in the passing example below), and that every link inside leads somewhere that is not a 404.

✓ Pass

Welcome email arrives < 30s, verify link navigates to a success page.

✕ Fail

Verify link points to /confirm-email, but that route was renamed and 404s.
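The link half of this check can be sketched with the standard library alone. This is an illustration under assumptions, not QC Pro's implementation: `broken_links` and the injected `get_status` callable are hypothetical names, and the status lookup is faked here so the example runs without network access.

```python
from html.parser import HTMLParser
from typing import Callable


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML email body."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def broken_links(email_html: str, get_status: Callable[[str], int]) -> list[str]:
    # get_status is supplied by the caller: a real HTTP client in practice, a fake in tests.
    collector = LinkCollector()
    collector.feed(email_html)
    return [url for url in collector.links if get_status(url) == 404]


# Fake status table standing in for real HTTP requests (example.com URLs are placeholders).
statuses = {"https://example.com/verify": 200, "https://example.com/confirm-email": 404}
body = '<p><a href="https://example.com/verify">Verify</a> <a href="https://example.com/confirm-email">Confirm</a></p>'
print(broken_links(body, lambda url: statuses.get(url, 200)))
```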

04
Responsive

Looks right on the four screens that matter.

We render at iPhone SE (375px), iPhone 15 (390px), iPad (820px), and a standard 1440px desktop. We check that no element overflows, nothing is clipped off-screen, and every CTA remains reachable above the fold.

✓ Pass

Hero H1 wraps to < 3 lines on SE; primary CTA visible without scroll.

✕ Fail

H1 uses text-[64px] with no clamp; wraps to 5 lines on SE; CTA below fold.
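One way to express the overflow and above-the-fold checks is over element bounding boxes measured at a given viewport. The sketch below assumes the boxes have already been collected by a browser driver. `Box`, `responsive_issues`, and the 667px viewport height are illustrative assumptions; only the 375px width comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of a rendered element, in CSS pixels."""
    name: str
    x: float
    y: float
    width: float
    height: float


def responsive_issues(boxes: list[Box], viewport_w: int, viewport_h: int, cta: str) -> list[str]:
    issues = []
    for box in boxes:
        # An element that starts left of 0 or ends right of the viewport overflows.
        if box.x < 0 or box.x + box.width > viewport_w:
            issues.append(f"{box.name} overflows the {viewport_w}px viewport")
        # The CTA counts as above the fold only if it is fully visible without scrolling.
        if box.name == cta and box.y + box.height > viewport_h:
            issues.append(f"{box.name} is below the fold")
    return issues


# A tall five-line H1 pushes the CTA down on an iPhone SE sized viewport.
boxes = [Box("h1", 16, 80, 343, 400), Box("cta", 16, 700, 343, 48)]
issues = responsive_issues(boxes, 375, 667, cta="cta")
print(issues)  # -> ['cta is below the fold']
```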

05
Visual integrity

Nothing looks broken.

Screenshot diff against a baseline. We flag missing images, overlapping text, elements that escape their container, and layout shifts above 0.1 CLS. We're forgiving of intentional change - you can accept a new baseline in one click.

✓ Pass

All images load, no overlap, CLS 0.04.

✕ Fail

Hero image 404s; layout shifts by 0.32 CLS when the web font loads.

06
Performance

Feels fast, stays fast.

Real-user-weighted metrics, not lab-only Lighthouse scores. We care about Interaction to Next Paint (INP), Largest Contentful Paint (LCP), and - crucially - that INP does not degrade after 30 seconds of actual use.

✓ Pass

LCP < 2.5s, INP < 200ms steady under use.

✕ Fail

INP climbs from 180ms to 950ms as a memory leak accumulates.
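The "stays fast" half of this check can be approximated by looking only at interactions recorded after the 30-second mark. The sketch below is a simplification under stated assumptions: it treats the slowest interaction in the late window as that window's INP (the real metric discards some outliers on pages with many interactions), and the function name and sample format are hypothetical.

```python
def inp_degraded(samples: list[tuple[float, float]],
                 split_s: float = 30.0, budget_ms: float = 200.0) -> bool:
    """samples are (seconds_since_load, interaction_latency_ms) pairs."""
    late = [latency for t, latency in samples if t >= split_s]
    if not late:
        return False  # nothing measured after the split point, so nothing to flag
    # Simplification: the worst late interaction stands in for INP in that window.
    return max(late) > budget_ms


leaky = [(2, 120), (10, 180), (35, 420), (60, 950)]
steady = [(2, 120), (10, 180), (40, 150), (70, 190)]
print(inp_degraded(leaky), inp_degraded(steady))  # -> True False
```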

07
Payments

Money moves.

For Stripe, Paddle, and Lemon Squeezy, we run a full checkout with test cards from five regions (US, EU, UK, IN, BR). We verify the subscription webhook fires and the user's entitlements update. No surprise silent failures for "foreign" cards.

✓ Pass

All five regions complete checkout; webhook fires < 10s; plan flips.

✕ Fail

EU card silently fails because automatic_tax is not configured.

08
Third-parties

External things load.

Google Sign-In, analytics scripts, CDN-hosted fonts, embeds, chat widgets. Anything you don't host yourself. We flag scripts that 404, widgets that cause layout shift, and anything that blocks render > 500ms.

✓ Pass

All externals load < 500ms or defer correctly.

✕ Fail

Intercom widget blocks render 1.8s, shifts layout when it appears.

09
Security hygiene

The obvious stuff isn't broken.

Not a pentest. We check for exposed API keys in client bundles, missing security headers (CSP, X-Frame-Options, HSTS), markdown/HTML fields that render unsanitized, and rate limiters that can be bypassed with a spoofed header.

✓ Pass

Headers present, no inline scripts accepted from user input, rate limits honored.

✕ Fail

Login rate limiter trusts X-Forwarded-For, trivially bypassed.
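The header part of this check is simple enough to sketch directly. The example below assumes the response headers are already available as a dictionary; the function name is hypothetical, and a real check would also inspect header values, not just presence.

```python
# The three headers named above, by their full HTTP names.
REQUIRED_HEADERS = (
    "Content-Security-Policy",
    "X-Frame-Options",
    "Strict-Transport-Security",  # HSTS
)


def missing_security_headers(headers: dict[str, str]) -> list[str]:
    # HTTP header names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


response_headers = {
    "content-security-policy": "default-src 'self'",
    "X-Frame-Options": "DENY",
}
print(missing_security_headers(response_headers))  # -> ['Strict-Transport-Security']
```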

10
Accessibility

Keyboard gets around.

We tab through every page, flag unreachable controls, check color contrast against WCAG AA, and verify that every image has alt text (or is decorative and marked as such). We also check screen reader compatibility for core flows.

✓ Pass

All interactive elements keyboard-reachable, focus visible, contrast AA.

✕ Fail

Share dialog's close button not in tab order; keyboard users trapped.
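For the contrast part, WCAG defines the ratio in terms of relative luminance, and AA requires at least 4.5:1 for normal text and 3:1 for large text. The sketch below follows that published formula; the function names are illustrative, and it assumes opaque sRGB colors given as 0-255 integer triples.

```python
def _linearize(channel: int) -> float:
    # Convert an 8-bit sRGB channel to linear light, per the WCAG definition.
    s = channel / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: tuple[int, int, int], bg: tuple[int, int, int], large_text: bool = False) -> bool:
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


white = (255, 255, 255)
print(round(contrast_ratio((119, 119, 119), white), 2))  # #777777 on white, roughly 4.48
print(passes_aa((119, 119, 119), white))                 # just under the 4.5 threshold
```

The grey `#777777` is a handy test value because it narrowly misses AA on white, while `#767676` narrowly passes.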

11
Regression

What changed since last week.

Every scan compares to your last baseline. Findings that are new get flagged as regressions; findings that disappeared get logged as resolved. The overall score's delta becomes your quality trendline.

✓ Pass

Score stable or up; no new critical findings.

✕ Fail

Score -8, two new critical regressions from Friday's deploy.
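The comparison itself amounts to set differences over finding identifiers. The sketch below assumes each finding has a stable ID that survives between scans. The ID format and function name are hypothetical, and the scores are made-up values chosen to match the -8 example.

```python
def diff_findings(baseline: set[str], current: set[str]) -> dict[str, set[str]]:
    return {
        "regressions": current - baseline,  # new since the last baseline
        "resolved": baseline - current,     # present before, gone now
        "unchanged": current & baseline,
    }


baseline_ids = {"auth/reset-link-expiry", "email/verify-404"}
current_ids = {"email/verify-404", "payments/eu-card-fails", "core/cta-dead-end"}

diff = diff_findings(baseline_ids, current_ids)
score_delta = 71 - 79  # current score minus baseline score

print(sorted(diff["regressions"]), sorted(diff["resolved"]), score_delta)
```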

12
Truth-in-advertising

Your homepage isn't lying.

We read the claims on your marketing site ("Free forever", "10-minute setup", "Works with Slack") and verify them against the actual product. If your homepage says you support X, we check that X works.

✓ Pass

Homepage: "Connect to GitHub in one click." Product: actually one click.

✕ Fail

Homepage: "Free forever." Product: 14-day trial, then paywall.

#How severity is decided

Every finding is tagged with one of four severities. We're deliberate about thresholds so "critical" stays meaningful.

Severity   What it means
critical   A core flow is broken for > 10% of users, or a security boundary is violated.
high       A flow is degraded or a significant segment is blocked (e.g. one mobile size, one region).
medium     A visible defect that doesn't block, but users will notice and complain.
low        Polish. Won't lose you a customer, but shouldn't ship.
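Read top to bottom, the four definitions behave like a decision ladder. The sketch below encodes that reading. It is an interpretation for illustration, not QC Pro's actual rule set: every parameter name is hypothetical, and only the 10% threshold comes from the definitions above.

```python
def classify(is_core_flow: bool, affected_share: float, security_boundary_violated: bool,
             degrades_or_blocks_segment: bool, user_visible: bool) -> str:
    # critical: core flow broken for more than 10% of users, or a security boundary violated
    if security_boundary_violated or (is_core_flow and affected_share > 0.10):
        return "critical"
    # high: a flow is degraded, or a significant segment is blocked
    if degrades_or_blocks_segment:
        return "high"
    # medium: visible but not blocking
    if user_visible:
        return "medium"
    # low: polish
    return "low"


print(classify(True, 0.25, False, True, True))    # -> critical
print(classify(False, 0.02, False, False, True))  # -> medium
```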

#How the score is computed

The 0-100 quality score is a weighted average across the twelve categories. Critical findings are penalized heavily; low-severity findings barely move the needle. Rough weights:

  • 50% of the score comes from core flows, auth, email, payments (the "money is moving" categories).
  • 25% comes from responsive, visual, and performance.
  • 15% comes from security and accessibility.
  • 10% comes from third-parties, regression, and truth-in-advertising.
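To make the weighting concrete, here is a minimal sketch of a weighted average using the four group weights listed above. Two things are assumptions: that each group's weight is split equally among its categories, and that every category already has its own 0-100 score. How individual findings and their severities reduce a category's score is not specified on this page and is left out.

```python
# (group weight, categories in that group), following the list above.
GROUPS = [
    (0.50, ["core_flows", "auth", "email", "payments"]),
    (0.25, ["responsive", "visual_integrity", "performance"]),
    (0.15, ["security_hygiene", "accessibility"]),
    (0.10, ["third_parties", "regression", "truth_in_advertising"]),
]

# Assumption: a group's weight is shared equally by its categories.
WEIGHTS = {cat: weight / len(cats) for weight, cats in GROUPS for cat in cats}


def quality_score(category_scores: dict[str, float]) -> float:
    """Weighted average of per-category scores, each on a 0-100 scale."""
    return sum(WEIGHTS[cat] * category_scores[cat] for cat in WEIGHTS)


perfect = {cat: 100.0 for cat in WEIGHTS}
broken_payments = {**perfect, "payments": 0.0}
print(round(quality_score(perfect), 1), round(quality_score(broken_payments), 1))  # -> 100.0 87.5
```

Under this equal-split assumption, payments alone carries 12.5% of the score, which is why a fully broken checkout costs 12.5 points in the example.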

A fresh greenfield app typically scores 65-75 on its first run. We've seen 90+ from teams that have been running QC Pro for a few months.