Agent Mode Allure 3
Allure Agent Mode lets an AI coding agent run your existing test command and review agent-readable test reporting before it answers.
Use it when an agent needs to make a claim about test coverage, a new or changed test, a failing run, flaky behavior, or regression safety. Instead of relying only on terminal output, the agent can review what ran, what passed or failed, what was skipped or retried, and what evidence was attached.
Agent Mode does not replace Allure reports for humans. It adds an agent-readable layer on top of the same reporting workflow your team already uses.
What Agent Mode does
Agent Mode is useful when you ask a coding agent to:
- review whether a feature is actually covered by tests;
- add or improve a test and validate the intended scope;
- debug a failing or flaky test with runtime evidence;
- check whether a change is safe before claiming it is done;
- enrich tests with useful steps, attachments, labels, parameters, or descriptions.
The goal is not just to run a command. The goal is to make the agent base its answer on test reporting that can be reviewed.
How Agent Mode works
allure agent wraps the test command your project already uses:
allure agent -- npm test checkout.spec.ts
Everything after -- is still your real test command. Your test runner executes the tests, Allure adapters emit Allure results, and Allure captures command output and runtime evidence to write Agent Mode output for that run.
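As a minimal sketch of the wrapping described above, a small shell helper can leave the project's real test command untouched and only prepend allure agent --. The function name run_with_agent and the ALLURE_CMD variable are hypothetical conveniences, not part of Allure. ALLURE_CMD is there so that a project wrapper can stand in for a global allure binary.

```shell
# Hypothetical helper: run the project's real test command through Agent Mode.
# ALLURE_CMD is an assumed variable, not an Allure setting. Point it at your
# project's Allure wrapper if you do not use a global allure binary.
ALLURE_CMD="${ALLURE_CMD:-allure}"

run_with_agent() {
  if [ "$#" -eq 0 ]; then
    echo "usage: run_with_agent <test-command> [args...]" >&2
    return 2
  fi
  # ALLURE_CMD is left unquoted on purpose so a multi-word wrapper can split.
  # Everything after -- is passed through unchanged as the real test command.
  $ALLURE_CMD agent -- "$@"
}

# Example (not executed here):
#   run_with_agent npm test checkout.spec.ts
```

Keeping the test command as plain arguments means the helper never has to know which test runner the project uses.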
A typical Agent Mode workflow is:
- Choose a focused validation goal.
- Run the smallest meaningful test scope through allure agent.
- Review the generated Agent Mode output before reading source code.
- Fix product code, tests, fixtures, or evidence quality when needed.
- Rerun when the conclusion depends on the fix.
- Report what ran, what was checked, what evidence supports the conclusion, and what remains uncertain.
Live runs and existing results
Agent Mode can give your coding agent reporting context in two ways:
- wrap a live test command with allure agent -- <test-command>;
- inspect existing Allure results or dump artifacts with allure agent inspect.
The live-command path creates fresh evidence: the agent runs the relevant test scope through allure agent, Allure captures the command output and runtime evidence, and Allure writes Agent Mode output for that run. The agent then inspects that output to analyze coverage, failures, findings, and limitations.
Sometimes the useful evidence already exists. For example, a run may have happened in CI, on a teammate's machine, or earlier on your own machine. When that run preserved raw allure-results directories or Allure dump artifacts, Allure turns those artifacts into Agent Mode output without rerunning the tests.
Both paths give the agent a reporting-first review surface before it reads long logs, generated HTML reports, or source code. If the original run did not preserve raw results or dump artifacts, inspection has less runtime context to work with.
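The choice between the two paths can be sketched as a small check, assuming the preserved results live in a directory you name yourself. The helper choose_agent_path is hypothetical. It only looks at whether the directory exists and is non-empty, and it does not validate that the files are real Allure results.

```shell
# Hypothetical helper: decide whether existing results can be inspected
# or whether a fresh live run is needed.
# Prints "inspect" when the directory exists and contains at least one entry,
# otherwise prints "live".
choose_agent_path() (
  results_dir=${1:?usage: choose_agent_path <results-dir>}
  if [ -d "$results_dir" ] && [ -n "$(ls -A "$results_dir" 2>/dev/null)" ]; then
    echo inspect
  else
    echo live
  fi
)

# Example (not executed here):
#   case "$(choose_agent_path allure-results)" in
#     inspect) allure agent inspect allure-results --output agent-output ;;
#     live)    allure agent -- npm test ;;
#   esac
```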
Agent output
allure agent creates and prints an output directory when no output path is provided. Use that default unless your project needs an explicit output location.
Agent output is the review surface your coding agent reads before it makes a test-quality claim. A typical output directory includes:
agent-output/
  index.md
  manifest/
  tests/
  logs/
  awesome/
The exact file names and internal layout may change in future Allure versions, but the idea stays the same: index.md is the entry point, and the directory contains the run summary plus the supporting runtime details.
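Because index.md is the only part of the layout described as stable, a script that hands Agent Mode output to an agent can check for that file and nothing else. The following is a minimal sketch. agent_output_entry is a hypothetical helper name, not an Allure command.

```shell
# Hypothetical helper: print the entry point of an Agent Mode output directory.
# Only index.md is checked, because the rest of the layout may change.
agent_output_entry() (
  out_dir=${1:?usage: agent_output_entry <output-dir>}
  out_dir=${out_dir%/}            # tolerate a trailing slash
  entry="$out_dir/index.md"
  if [ -f "$entry" ]; then
    printf '%s\n' "$entry"
  else
    echo "no index.md in $out_dir; was this written by allure agent?" >&2
    exit 1
  fi
)

# Example (not executed here):
#   entry=$(agent_output_entry agent-output) && echo "Start reading at $entry"
```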
Depending on what the run produced, Agent Mode output can give the agent access to:
- the wrapped command, exit code, timing, and selected test scope;
- passed, failed, skipped, broken, retried, and missing-test signals;
- findings about weak evidence, scope mismatches, unexpected skips, or reporting problems;
- per-test summaries with steps, labels, parameters, attachments, retries, and failure details;
- stdout, stderr, logs, and process-level artifacts when they matter;
- the generated human-report status and path, when a human-readable report was produced.
You can inspect the same Agent Mode output yourself. For example, if you already have an allure-results directory from a local run or CI artifact, create a readable Agent Mode output directory with:
allure agent inspect allure-results --output agent-output
Each run should have distinct Agent Mode output. Do not reuse one output directory for unrelated runs, and do not share one output path between parallel runs.
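One way to keep output distinct per run is to generate a fresh path each time. This sketch assumes a base directory of your choosing (agent-runs is a made-up default) and uses mktemp -d to create a unique parent directory. new_agent_output_dir is a hypothetical helper.

```shell
# Hypothetical helper: print a fresh, not-yet-existing output path for one run.
# mktemp -d creates a unique parent directory, so two runs (including parallel
# ones) never receive the same path.
new_agent_output_dir() (
  base=${1:-agent-runs}
  mkdir -p "$base" || exit 1
  parent=$(mktemp -d "$base/run-XXXXXX") || exit 1
  printf '%s/agent-output\n' "$parent"
)

# Example (not executed here):
#   out=$(new_agent_output_dir)
#   allure agent inspect allure-results --output "$out"
```

The helper returns a path inside the new directory rather than the directory itself, so the output location does not exist until Allure writes it.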
After each run, the agent should report the generated index.md path. For final or user-facing runs, if a human-readable report was generated, the agent should also report its path or state that no human report was generated.
Before you start
Requirements
To use Agent Mode, you need:
- Allure Report 3.12.0 or newer with Agent Mode support;
- a project test command, such as npm test, pytest, mvn test, or ./gradlew test;
- Allure results emitted by the tests in a location Allure can discover;
- an AI coding agent that can run shell commands and read generated files in your repository.
If your project does not meet the reporting requirement yet, see Configure Allure reporting before enabling Agent Mode.
Agent Mode does not require MCP, a hosted service, or a specific AI vendor. You can use it with Codex, Claude, OpenCode, custom scripts, CI-driven agents, or another agentic code workflow.
Use the installed CLI as the source of truth for exact options:
allure --version
allure agent --help
allure agent capabilities --json
If your project uses a wrapper, package script, or build-system command for Allure, run the checks through that wrapper.
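If you want to script the version requirement, a comparison like the one below may help. It is a sketch that assumes allure --version prints a bare major.minor.patch version, which you should confirm against your installed CLI. version_at_least is a hypothetical helper, and it ignores any pre-release suffix instead of applying full semantic-version ordering.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Compares major.minor.patch numerically; a suffix such as "-beta.1" is ignored.
version_at_least() (
  have=${1%%[!0-9.]*}
  want=${2%%[!0-9.]*}
  IFS=.
  set -- $have
  h1=${1:-0}; h2=${2:-0}; h3=${3:-0}
  set -- $want
  w1=${1:-0}; w2=${2:-0}; w3=${3:-0}
  # The first differing component decides the result.
  [ "$h1" -ne "$w1" ] && { [ "$h1" -gt "$w1" ]; exit $?; }
  [ "$h2" -ne "$w2" ] && { [ "$h2" -gt "$w2" ]; exit $?; }
  [ "$h3" -ge "$w3" ]
)

# Example (not executed here; assumes a bare version string is printed):
#   version_at_least "$(allure --version)" 3.12.0 ||
#     echo "Allure is older than 3.12.0" >&2
```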
For reviewing Allure results that were already produced by a local run or CI job, use:
allure agent inspect --help
Install or update Allure
Install Allure Report 3 using the normal installation flow for your project or environment.
Prefer the project-approved Allure command. In many repositories, that may be a package script, build task, wrapper, or local dependency rather than a global allure binary.
Configure Allure reporting
Agent Mode depends on useful Allure results. If your project does not emit Allure results yet, configure the appropriate Allure adapter for your test framework first.
Find your test framework in the Allure framework integrations list and follow its setup instructions.
In practice, allure agent -- <test-command> follows the same reporting constraints as allure run -- <test-command>: the wrapped command must be the trusted project test command, it must run with the wrappers and environment that make Allure result emission work, and Allure must be able to discover the produced result directories. A report command cannot repair missing or empty results after the fact.
Keep these boundaries clear:
- Allure results are raw files emitted by framework adapters.
- Agent Mode output is the Markdown and metadata generated for one allure agent run.
- Allure reports are visual reports for humans.
Do not write all three artifact types into the same directory. Framework result directories should usually keep allure-results as the final path component, such as target/allure-results, build/allure-results, or out/allure-results.
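The directory rule above can be checked mechanically before a run. This sketch uses a hypothetical helper, check_artifact_dirs, and deliberately treats the "usually" about the allure-results path component as a hard rule. Relax that part if your project has a different, confirmed convention. The directory names in the example are illustrative.

```shell
# Hypothetical helper: check that results, Agent Mode output, and the human
# report use three different directories, and that the results directory
# keeps allure-results as its final path component.
check_artifact_dirs() (
  if [ "$#" -ne 3 ]; then
    echo "usage: check_artifact_dirs <results> <agent-output> <report>" >&2
    exit 2
  fi
  results=${1%/}; agent_out=${2%/}; report=${3%/}
  if [ "$results" = "$agent_out" ] || [ "$results" = "$report" ] ||
     [ "$agent_out" = "$report" ]; then
    echo "results, agent output, and report must be separate directories" >&2
    exit 1
  fi
  case "$results" in
    allure-results|*/allure-results) ;;
    *) echo "results directory should end in allure-results: $results" >&2
       exit 1 ;;
  esac
)

# Example (not executed here):
#   check_artifact_dirs target/allure-results agent-output allure-report
```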
Set up your coding agent
Add Allure skills for coding agents
If you use Codex, Claude, OpenCode, or another agent that supports local skills, add the Allure skills to the repository where your tests live:
npx skills add allure-framework/skills
The repository currently includes these Allure skills:
- allure-configure-reporting: configures Allure adapters, result directories, local report commands, evidence integrations, and CI artifact or report handling when a project does not emit useful Allure results yet.
- allure-configure-agent-workflow: configures the local Agent Mode workflow once reporting exists by discovering project test commands, wrappers, result paths, supported allure agent options, and agent-output conventions.
- allure-test-agent: helps agents do day-to-day test work with Agent Mode, including coverage review, test authoring, failure debugging, flaky-test investigation, evidence enrichment, and regression checks.
Enable the local Agent Mode workflow
After the skills are installed, configure the local Agent Mode workflow with the allure-configure-agent-workflow skill.
If your agent supports direct skill invocation, call allure-configure-agent-workflow directly. Otherwise, ask your coding agent to enable Agent Mode for the project:
codex "enable allure agent mode"
claude "enable allure agent mode"
opencode run "enable allure agent mode"
The agent should discover the local test commands, wrappers, result paths, supported allure agent options, and where generated Agent Mode output should be read from. If your project does not have Allure reporting yet, finish the reporting setup first, then return to Agent Mode.
The expected result is:
- a project guide at docs/allure-agent-mode.md;
- a short root agent entry file, such as AGENTS.md, CLAUDE.md, or another project-specific instruction file, that links test-related work to docs/allure-agent-mode.md;
- local notes about confirmed test commands, Allure wrappers, result paths, Agent Mode capabilities, output conventions, and known limitations.
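To confirm that the setup produced the first two items, you can check the files directly. The sketch below uses a hypothetical helper, check_agent_mode_setup. It assumes the root entry file mentions the guide by its literal path, and it does not inspect the local notes.

```shell
# Hypothetical helper: check the expected setup result inside a repository.
# Succeeds when docs/allure-agent-mode.md exists and at least one of the given
# root entry files mentions that path. Defaults to AGENTS.md and CLAUDE.md.
check_agent_mode_setup() (
  repo=${1:-.}
  [ "$#" -gt 0 ] && shift
  [ "$#" -gt 0 ] || set -- AGENTS.md CLAUDE.md
  guide=docs/allure-agent-mode.md
  if [ ! -f "$repo/$guide" ]; then
    echo "missing $guide" >&2
    exit 1
  fi
  for entry in "$@"; do
    if [ -f "$repo/$entry" ] && grep -qF "$guide" "$repo/$entry"; then
      exit 0
    fi
  done
  echo "no root entry file links to $guide" >&2
  exit 1
)

# Example (not executed here):
#   check_agent_mode_setup . AGENTS.md
```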
Try it on real test work
There are many possible uses for Agent Mode. The main goal is always the same: give your coding agent test reporting it can inspect before it makes a claim.
This can make agentic development work faster or more precise when the extra verification is useful. An agent that can review executed tests, skipped cases, retries, attachments, and report output has better ways to check its own work than an agent that only sees terminal text.
The tradeoff is worth knowing up front: report-backed runs use more tokens and larger context than a plain terminal command. That is normal for evidence-rich agent work. Try it on the tasks below and judge the benefit in your own workflow.
Here are a few real tasks you can try with Agent Mode. If you try them, please share feedback with us so we can learn where it helps most.
codex "review checkout test coverage"
claude "add a test for expired-card checkout"
opencode run "debug the checkout retry test failure"
Review test coverage
Use this when you want to know whether a feature, component, package, or test suite is actually covered at runtime.
Expected agent behavior: choose a focused scope, run it through Agent Mode, compare intended coverage with observed runtime coverage, and report which tests ran, which cases were skipped or missing, what evidence looked weak, and which coverage claims are still uncertain.
Add or improve a test
Use this when you want the agent to add a missing test, strengthen weak assertions, or improve test evidence with meaningful steps, attachments, labels, parameters, or descriptions.
Expected agent behavior: identify the behavior and risk, choose the test layer, add or improve the test, run the intended scope through Agent Mode, review the generated output, and only then explain what the new or changed test proves.
Debug a failing test
Use this when a test fails, hangs, or produces confusing terminal output. Agent Mode helps the agent review per-test evidence, findings, attachments, retries, logs, and process output before deciding what broke.
Expected agent behavior: run the smallest failing scope, inspect the generated Agent Mode output before source code, separate product failures from test bugs, fixture or setup issues, environment problems, and reporting or adapter issues, then rerun when the fix needs validation.
Investigate flaky behavior
Use this when a test changes outcome across retries, environments, browsers, shards, or reruns.
Expected agent behavior: inspect retries, timing, stale state, attachments, environment facts, and repeated evidence before recommending a fix. If the run is non-gating, quarantined, or too narrow, the agent should say so.
Check regression safety
Use this before accepting a change as safe. Ask the agent to run the smallest meaningful scope through Agent Mode and explain what the run actually proves.
Expected agent behavior: select the smallest meaningful scope, run it through Agent Mode, review the output, and state exactly what is proven and what is not. The agent should avoid broad claims when the selected scope was only a smoke test, skipped important cases, lacked evidence, or depended on mocks or unavailable services.