Allure Agent Mode

Test reporting for the agentic era

Allure Agent Mode turns your existing test command into agent-readable test reporting for Codex, Claude, and other coding agents, so they can review what ran before claiming a change is safe.

Try Agent Mode

Keep your test reporting workflow. Let your AI agent use it too.

check checkout feature coverage

Worked for 3m 22s

Coverage Check

I checked checkout coverage with two fresh Allure agent runs:

checkout e2e agent run: 7 passed, expectations matched, 0 findings.
checkout api agent run: 5 passed, expectations matched, 0 findings.

Covered happy paths:

cart summary before payment
successful order creation
declined-payment error state

Main gaps I see:

No focused expired-card test.
No direct retry-after-failure coverage.

Why it exists

"Tests passed" is not enough for AI test work

You ask an agent to check a change. It runs a command, sees green, and says it looks safe. But you still have to ask the real questions: did it run the right tests, did it notice what was skipped, and did it understand what those tests actually proved?

Green does not tell you what was checked

A passing command can hide a lot. The agent may have run too broad a smoke test, missed the feature you cared about, ignored skipped tests, or treated a weak assertion as real validation.

$ npm test -- --grep smokeRUN v2.0.5 /app/checkout✓ test/checkout.smoke.test.ts (12)  ✓ cart summary  ✓ order successTest Files  1 passed (1)Tests       7 passed, 5 skipped (12)Duration    1.4s

Agent seesSmoke passed

Agent missesA leftover it.only skipped half of the checkout smoke suite

The important details are not all in the terminal

The useful story often lives in the report: which tests ran, what steps happened, what retried, what was attached, which screenshots or traces matter, and where the run looks thin.

$ npm test checkout.spec.tsRUN v2.0.5 /app/checkout✓ test/checkout.spec.ts (2)  ✓ creates an order  ✓ declines invalid cardTest Files  1 passed (1)Tests       2 passed (2)Duration    349ms

Agent seesCheckout works as expected

Agent missesThe tests checked the legacy checkout page

You still need to trust the agent's conclusion

When an agent says covered or safe, you need more than confidence. You need to see what it checked, what reporting it reviewed, and what gaps it found before you accept the answer.

$ npm test checkout.spec.tsRUN v2.0.5 /app/checkout× test/checkout.spec.ts (2)  ✓ creates an order  × keeps total after retryExpected: "$42.00"  Received: "$39.00"Test Files  1 failed (1)Tests       1 failed, 1 passed (2)Duration    612ms

Agent seesCheckout is broken

Agent missesA flaky test read stale cart data after retry

That is what Allure Agent Mode gives the agent.

It lets your AI agent use the same test reporting workflow your team already uses, so its answer can be based on what actually ran, not just what the terminal happened to print.

Allure agent outputindex.md

Coverage check

I checked checkout coverage with three fresh Allure agent runs.

Scope gap:checkout.smoke.test.ts passed, but 5 skipped scenarios were present in the same run.
Context gap:checkout.spec.ts covered the legacy checkout route, not the new checkout page.
Test quality: the retry failure read stale cart data after navigation, so I would not treat it as proof that checkout is broken.

I would not call checkout fully covered yet. The useful next test is the new-page retry path with fresh cart state.

How it works

A test report for your agent

Allure Agent Mode wraps the test command your agent would run anyway and writes agent-readable output, including a Markdown run report. The Allure skills teach Codex, Claude, and other coding agents how to set it up, when to run it, and how to review the result before answering.

Test authoring

Add or change tests without losing the behavior you meant to protect. The agent runs the intended scope, checks what actually executed, and strengthens weak assertions or evidence before calling the work done.

Coverage review

Review what is covered at runtime, not just what appears covered from source code. The agent compares intended scope with observed tests, skipped cases, attachments, and gaps.

Regression safety checks

Before saying a change is safe, the agent runs the smallest meaningful test scope and reports what passed, what was skipped or retried, and what remains uncertain.

Failure debugging

When tests fail, the agent reviews per-test evidence, findings, logs, screenshots, traces, retries, and stderr before deciding whether it is a product bug, test bug, fixture issue, or environment problem.

Flaky test investigation

Separate real failures from unstable tests. The agent can inspect retries, timing, stale state, attachments, and repeated runs before recommending a fix.

Evidence enrichment

Improve tests that pass but are hard to review. The agent adds meaningful steps, attachments, labels, parameters, or descriptions, then reruns the same scope to confirm the report is clearer.

Getting started

Try Agent Mode in your repo

Start by adding the Allure skills. If your project already has Allure reporting, you can go straight to the agent workflow setup. If not, ask your agent to configure reporting first. The examples use different coding agents to show the workflow is portable; use whichever CLI you prefer.

Shell

Add the skills

Install the Allure skills in the repository where your tests live.

$ npx skills add allure-framework/skills

Claude

Configure Allure reporting

Optional. Use this when the project does not already emit useful Allure results.

$ claude "configure allure reporting"

Codex

Enable Agent Mode

Let the agent discover local wrappers, test commands, result paths, and supported agent options.

$ codex "enable allure agent mode"

OpenCode

Ask for real test work

Now use normal testing requests. Agent Mode gives the agent a report to review before it answers.

$ opencode run "review checkout test coverage"

Works your way

Use Agent Mode with the AI workflow you already have

Allure Agent Mode does not require MCP, a hosted service, or a specific AI vendor. It works through the test command your coding agent can already run, then gives that agent a readable report to review before it answers.

No MCP requiredRun it from the shell. Your agent only needs to execute project commands and read the generated report.
Any agentic code workflowUse Codex, Claude, OpenCode, custom scripts, or CI-driven agents with the same testing setup.
Any modelSwitch tools or models without changing how your tests report.
Humans still get Allure reportsYour team keeps visual reports for charts, history, CI artifacts, and review.