Using AI agents to test website accessibility over MCP

AI agent website accessibility testing is now a single tool call away. Here is how OverlayRiskWitness exposes its two-pass axe-core witness over the Model Context Protocol — and what a real tool response looks like.

Most accessibility checks live outside the developer toolchain. A QA engineer runs a scanner in a browser extension, exports a report, pastes findings into a ticket. The agent doing the rest of the build has no visibility into any of it. The Model Context Protocol changes the integration surface: if the accessibility engine speaks MCP, any agent that can call a tool can run the check, read the findings, and act on them — without leaving its own context window.

OverlayRiskWitness exposes its witness as an MCP server. The witness loads a public page twice in a real browser — once with the overlay active, once with it blocked — runs axe-core on each pass, and diffs the results per rule. That full contract is now available as a tool call. This post walks through the transport options, the tool shape, and what the response actually contains.

Why MCP is the right integration surface for accessibility checks

An accessibility scanner has three properties that make it a good MCP tool: it is read-only, it operates on public URLs, and its output is structured enough that an agent can reason over it without parsing freeform text. The witness never mutates the target site — it only loads a public page and reports what axe-core observed. That makes it safe to hand to an autonomous agent as a side-effect-free tool. The worst it can do is load a page twice and return findings.

The MCP wrapper also keeps the heavy machinery — Browserbase sessions, axe-core rule evaluation, claim extraction — server-side. The agent never needs to know that a real browser loaded the page or that a WCAG rule engine ran against the DOM. It asks a question, gets a structured answer, and can act on the finding states without understanding the scanner implementation.

Two transports: stdio and hosted Streamable-HTTP

The server ships on two transports. For local development and desktop AI clients like Claude Desktop or Cursor, the stdio transport is the standard path: you point the client config at the Node binary and the server process starts on demand. For hosted clients and agent pipelines that cannot manage local processes, the hosted Streamable-HTTP endpoint at POST /mcp on overlayrisk.com is the alternative. Both transports expose the same tool with the same schema.

The hosted endpoint is stateless — each request carries its own full context and the server holds no session between calls. It is guarded the same way the public /api/witness route is: per-IP rate limiting, a global kill-switch, and same-URL response caching to avoid redundant browser sessions. Because the endpoint is stateless, it is safe to call from serverless functions and from agent orchestration frameworks that do not guarantee sticky connections.
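As a rough illustration of what one stateless call could look like on the wire, the sketch below builds the arguments for a single HTTP POST. The endpoint path and the tool name `witness` come from this post; the helper name `buildWitnessCall`, the request id, and the exact header set are assumptions based on the general MCP Streamable-HTTP conventions, not a statement of what this server requires. Whether the server expects an initialize handshake first is not covered here.

```typescript
// Endpoint taken from the post; treat it as illustrative.
const MCP_ENDPOINT = "https://overlayrisk.com/mcp";

interface HttpCall {
  endpoint: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: returns everything needed for fetch(endpoint, init).
function buildWitnessCall(targetUrl: string, id: number = 1): HttpCall {
  // The tool takes a public URL, so reject anything that is not http(s).
  if (!/^https?:\/\//i.test(targetUrl)) {
    throw new Error(`witness expects a public http(s) URL, got: ${targetUrl}`);
  }
  const payload = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "witness", arguments: { url: targetUrl } },
  };
  return {
    endpoint: MCP_ENDPOINT,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Streamable-HTTP clients advertise both response content types.
        Accept: "application/json, text/event-stream",
      },
      body: JSON.stringify(payload),
    },
  };
}

// Usage (not executed here):
//   const { endpoint, init } = buildWitnessCall("https://example.com");
//   const res = await fetch(endpoint, init);
```

Because every request is self-contained, the same helper works from a serverless function or a long-running agent process without any connection state to carry between calls.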

Agent toolchain to MCP server to Browserbase to axe-core: the witness engine runs server-side; the agent receives a structured finding payload.

Local stdio setup

To configure the stdio transport in a desktop MCP client, add the server entry to the client config file. The server process needs no persistent daemon — the client spawns it on the first tool call and manages the process lifetime.

MCP client config for the local stdio transport (JSON):

{
  "mcpServers": {
    "overlayrisk-witness": {
      "command": "node",
      "args": ["./app/bin/start-mcp.js"],
      "env": {
        "APP_URL": "https://overlayrisk.com"
      }
    }
  }
}

The server is also published to npm under the package name shown in the official MCP registry listing, so clients that resolve servers by package name rather than local path can install it without cloning the repo. The Glama registry listing covers clients that pull from that index. The hosted Streamable-HTTP endpoint at overlayrisk.com/mcp is the Smithery integration point.

Tool call shape and response structure

The witness tool takes one parameter: a public URL. The server runs the two-pass scan — overlay blocked, then overlay active — and returns a single JSON payload. The free tier returns the overlay vendor detected, the count of claims tested, the first finding in full detail, and a count of additional findings locked behind the Risk Packet. Unlocking the full packet requires a $49 one-time purchase; the Drift Monitor subscription re-runs the witness on a schedule and alerts on state changes.

Example tool-call request and response, truncated (JSON):

// Request
{
  "method": "tools/call",
  "params": {
    "name": "witness",
    "arguments": {
      "url": "https://example.com"
    }
  }
}

// Response (free tier)
{
  "overlayVendor": "accessiBe",
  "claimsTested": 12,
  "firstFinding": {
    "rule": "color-contrast",
    "wcagCriteria": "1.4.3",
    "state": "didNotHoldUp",
    "overlayOff": { "violations": 6 },
    "overlayOn":  { "violations": 6 },
    "transition": "no_effect",
    "claim": "This site meets WCAG 2.1 AA color contrast requirements."
  },
  "lockedFindingCount": 9,
  "packetUrl": "https://overlayrisk.com/pricing"
}
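For an agent or script that consumes this payload, a typed view of the free-tier response can make the fields easier to reason over. The sketch below mirrors the field names from the example response; the interface names, the type guard, and the `summarize` helper are illustrative additions, and only the `didNotHoldUp` state spelling is actually shown in the example.

```typescript
interface Finding {
  rule: string;
  wcagCriteria: string;
  state: string; // only "didNotHoldUp" appears in the example; other spellings are not shown
  overlayOff: { violations: number };
  overlayOn: { violations: number };
  transition: string;
  claim: string;
}

interface FreeTierResponse {
  overlayVendor: string;
  claimsTested: number;
  firstFinding: Finding;
  lockedFindingCount: number;
  packetUrl: string;
}

// Minimal runtime check before trusting parsed JSON (hypothetical helper).
function isFreeTierResponse(value: unknown): value is FreeTierResponse {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  const f = r["firstFinding"];
  if (typeof f !== "object" || f === null) return false;
  const finding = f as Record<string, unknown>;
  return (
    typeof r["overlayVendor"] === "string" &&
    typeof r["lockedFindingCount"] === "number" &&
    typeof finding["rule"] === "string" &&
    typeof finding["state"] === "string"
  );
}

// One-line summary an agent could post to a ticket or chat channel.
function summarize(r: FreeTierResponse): string {
  const f = r.firstFinding;
  return (
    `${r.overlayVendor}: ${f.rule} (WCAG ${f.wcagCriteria}) ${f.state}, ` +
    `${f.overlayOff.violations} violations off / ${f.overlayOn.violations} on; ` +
    `${r.lockedFindingCount} more findings locked`
  );
}
```

Keeping the guard separate from the summary means a malformed or error response can be handled before any field access.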

To see what those fields look like outside the MCP payload, read "How to document website accessibility evidence that holds up". It covers the page-level packet structure that the tool response is meant to feed: exact URL, timestamp, snapshot hash, quoted claim, and first broken step.

Finding states are observations, not legal conclusions

The three finding states — held up, did not hold up, not testable — describe what axe-core observed on a specific page at a specific moment. They are timestamped evidence. Whether a gap between a public claim and an observed finding has legal significance is a question for counsel. The witness provides the evidence; it does not provide a compliance certificate.

What the two-pass engine is actually doing

Understanding the response is easier if you understand the scan mechanics. The witness uses Browserbase — a hosted browser service — to load the target page in a real Chromium instance. This matters: overlay scripts run JavaScript that relies on a real DOM, real CSS computed styles, and real browser rendering. A headless sandbox that does not fully execute the overlay script would produce misleading results.

Pass 1: overlay blocked at network layer, axe-core runs against the base DOM. Pass 2: overlay active, given time to inject, axe-core re-runs. The diff is computed per rule.

Pass one blocks the overlay script at the network layer, so axe-core sees the site exactly as it ships, without any runtime augmentation. Pass two loads the same URL with the overlay active and waits for it to inject its changes. Navigation waits for the domcontentloaded event rather than the full load event, because heavy sites on cold Browserbase sessions often never fire load within the timeout budget. Both passes run the same axe-core rule set. The diff is computed rule by rule: a rule that still fails with the overlay on is the one that matters.

held up — the overlay did not introduce new violations on this rule and the claim is consistent with what axe-core saw.
did not hold up — violations persist with the overlay active, or the overlay introduced new failures; the public claim is inconsistent with the observation.
not testable — the rule could not be evaluated in one or both passes; this is a gap in evidence, not a pass.

The three finding states that every witness result resolves to, mapped to the underlying per-rule transition that produced them.
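The per-rule diff described above can be sketched as a small classification function. This is a simplified reading of the three state definitions, not the witness engine's actual logic: it assumes each pass yields a violation count per rule, uses `null` (or a missing entry) to mean the rule could not be evaluated, and ignores the claim-matching step entirely. The names `classifyRule` and `diffPasses` are hypothetical.

```typescript
type State = "held up" | "did not hold up" | "not testable";

// A count per axe-core rule id; null means the rule could not be evaluated.
type PassResult = Record<string, number | null>;

function classifyRule(
  overlayOff: number | null | undefined,
  overlayOn: number | null | undefined
): State {
  // A gap in either pass is a gap in evidence, not a pass.
  if (overlayOff == null || overlayOn == null) return "not testable";
  // Violations with the overlay active cover both cases in the definitions:
  // failures that persisted and failures the overlay introduced.
  return overlayOn > 0 ? "did not hold up" : "held up";
}

// Diff two passes rule by rule, covering every rule seen in either pass.
function diffPasses(off: PassResult, on: PassResult): Record<string, State> {
  const rules = new Set([...Object.keys(off), ...Object.keys(on)]);
  const out: Record<string, State> = {};
  for (const rule of rules) {
    out[rule] = classifyRule(off[rule], on[rule]);
  }
  return out;
}
```

Under these assumptions, a rule with six violations in both passes resolves to did not hold up, which matches the color-contrast example in the tool response.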

Agent workflow patterns

Because the witness is a standard MCP tool, it composes with other tools in an agent's toolchain without any custom integration work. A few patterns that make practical sense:

Pre-deploy audit: agent calls witness on the staging URL before a deploy approval step, surfaces any did-not-hold-up findings as blocking observations.
Vendor evaluation: agent runs witness on three competitor sites and a prospect's own site, compares lockedFindingCount and firstFinding state across all four before a sales call.
Regression triage: agent detects a deploy event via webhook, calls witness on the affected pages, and posts a summary of state changes to a Slack channel — no manual QA step in the middle.
Drift alerting: Drift Monitor subscription covers up to 20 pages on a schedule ($99/mo); agent consumes the webhook payload and routes findings to the relevant on-call channel.
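The regression triage pattern comes down to comparing finding states between two runs and reporting only what changed. A minimal sketch, assuming the agent has already reduced each witness response to one state string per page; the `StateByPage` shape and the `summarizeStateChanges` name are hypothetical, and posting to Slack is left out.

```typescript
// Hypothetical shape: page URL -> finding state reported for that page.
type StateByPage = Record<string, string>;

// Returns one human-readable line per page whose state changed between runs.
function summarizeStateChanges(before: StateByPage, after: StateByPage): string[] {
  const lines: string[] = [];
  // Sort for stable output so repeated runs produce comparable summaries.
  for (const page of Object.keys(after).sort()) {
    const prev = before[page] ?? "not previously scanned";
    const next = after[page] ?? "unknown";
    if (prev !== next) {
      lines.push(`${page}: ${prev} -> ${next}`);
    }
  }
  return lines;
}
```

An empty result means no page changed state, so the agent can skip the notification entirely.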

Read-only and safe for autonomous agents

The witness tool never writes to the target site. It loads a public URL, runs axe-core, and returns observations. An agent with tool-call autonomy can call it without a human approval gate — there is no mutation risk. The same property that makes it safe to hand to an agent also makes it audit-friendly: every call is logged with the input URL, the timestamp, and the result.

The hosted endpoint enforces a per-IP rate limit and a same-URL cache, so a misconfigured agent that fires the same URL in a tight loop will not exhaust Browserbase session capacity or produce unbilled scan volume. Both guards fail open: if the cache layer errors, the scan runs normally; if the rate-limit store is unavailable, the request is allowed through rather than blocked.
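Fail-open behavior is easiest to see in code. The sketch below is not the server's implementation; the `RateLimitStore` and `ScanCache` interfaces and both function names are made up for illustration. It shows the general pattern: an error in the guard itself is treated as "let the request proceed" rather than as a rejection.

```typescript
// Hypothetical store: records a hit and returns the count in the current window.
interface RateLimitStore {
  hit(ip: string): number; // may throw if the backing store is unavailable
}

// Hypothetical cache: returns a previously stored scan result for a URL, if any.
interface ScanCache {
  get(url: string): string | undefined; // may throw if the cache layer errors
}

// Fail-open rate limit: an unavailable store allows the request through.
function allowRequest(store: RateLimitStore, ip: string, limit: number): boolean {
  try {
    return store.hit(ip) <= limit;
  } catch {
    return true;
  }
}

// Fail-open cache lookup: an erroring cache behaves like a cache miss,
// so the caller falls through to running the scan normally.
function cachedResult(cache: ScanCache, url: string): string | undefined {
  try {
    return cache.get(url);
  } catch {
    return undefined;
  }
}
```

The trade-off is that an outage in the guard layer temporarily removes the protection, in exchange for the tool staying available to callers.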

The MCP server is read-only, stateless on the hosted transport, published to the official MCP registry, Glama, and npm, and has a hosted Streamable-HTTP endpoint for Smithery. If your agent can call a tool, it can run an accessibility witness on any public URL without any browser automation code on the agent side. The engine, the browser sessions, and the axe-core evaluation all stay server-side.
