CLI Reference — ReplayCI
Complete reference for the replayci command-line tool.
Installation
npm install -D @replayci/cli
Run via npx:
npx replayci [command] [options]
Commands
replayci (default)
Run contracts against a provider. Reads config from .replayci.yml, overridden by CLI flags.
npx replayci
npx replayci --provider openai --model gpt-4o-mini
npx replayci --pack packs/my-pack --persist
replayci init
Scaffold a new project with a config file and starter pack.
npx replayci init
Creates:
- .replayci.yml — config file with defaults
- packs/starter/ — a working contract pack
Skips files that already exist (never overwrites).
replayci observe
Auto-generate contracts by observing live LLM behavior. Provide simple JSON observation specs (messages + tools), and ReplayCI calls the provider, infers contract invariants from the response, and generates a full runnable pack.
npx replayci observe --provider openai --model gpt-4o-mini
npx replayci observe --provider anthropic --model claude-sonnet-4-6 --input specs/ --output packs/my-pack
Generated contracts have status: observed (draft) and must be reviewed before promotion to truth contracts. See the Observe Guide for the full workflow.
Observe-specific flags:
| Flag | Description | Default |
|---|---|---|
| --provider <name> | LLM provider: openai, anthropic | Required |
| --model <name> | Model ID (e.g. gpt-4o-mini, claude-sonnet-4-6) | Required |
| --input <dir> | Directory containing observation spec JSON files | observe/ |
| --output <dir> | Output directory for the generated pack | packs/observed |
| --json | Force JSON output | Auto-detect |
| --timeout_ms <ms> | Timeout per provider call in milliseconds | 30000 |
Observation spec format:
Each .json file in the input directory defines one observation — the messages to send and the tools to make available:
{
"messages": [
{ "role": "user", "content": "What's the weather in San Francisco?" }
],
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" }
},
"required": ["location"]
}
}
]
}
Optional fields: tool_choice ("auto", "none", "required" — default "auto"), temperature (default 0), max_tokens (default 1024).
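As a quick sanity check, a spec like the one above can be written to the input directory and verified as valid JSON before running observe. The filename and the python3 check below are illustrative conventions, not part of the CLI:

```shell
# Write a minimal observation spec (one spec per file; filename is arbitrary)
mkdir -p observe
cat > observe/get_weather.json <<'EOF'
{
  "messages": [
    { "role": "user", "content": "What's the weather in San Francisco?" }
  ],
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": { "location": { "type": "string" } },
        "required": ["location"]
      }
    }
  ],
  "tool_choice": "auto",
  "temperature": 0
}
EOF

# Confirm the file parses as JSON before handing it to `replayci observe`
python3 -m json.tool observe/get_weather.json > /dev/null && echo "spec OK"
```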
What gets generated:
For each successful observation:
- contracts/<tool>.yaml — contract with inferred invariants, marked status: observed
- golden/<tool>.success.json — golden fixture with boundary hashes
- recordings/<tool>.success.recording.json — recording file for replay
Pack-level files (generated once):
- pack.yaml — pack metadata listing all contracts
- NeverNormalize.json — standard normalization exclusions
- nr-allowlist.json — empty NR allowlist
Pretty output example:
replayci observe · openai/gpt-4o-mini · 3 specs
✓ get_weather → contracts/get_weather.yaml
✓ deploy_service → contracts/deploy_service.yaml
✗ send_email error: HTTP 401
2 contracts generated · 1 skipped
Next steps:
1. Review contracts in packs/observed/contracts/
2. Run: npx replayci --pack packs/observed --provider recorded
replayci promote
Promote an observed pack to a truth pack. Copies the pack directory and applies promotion transforms: removes status: observed from contracts, expands provider_modes to include live providers, and adds a promotion comment.
npx replayci promote --from packs/observed --to packs/my-truth
Output:
Promoted packs/observed -> packs/my-truth
Next steps:
[ ] Review expect_tools — add/remove required tools
[ ] Review expected_tool_calls — tighten argument invariants
[ ] Adjust pass_threshold (currently 1.0)
[ ] Test: npx replayci --pack packs/my-truth --provider openai --model gpt-4o
| Flag | Description | Required |
|---|---|---|
| --from <path> | Source observed pack directory | Yes |
| --to <path> | Destination truth pack directory (must not exist) | Yes |
After promoting, edit the contracts in packs/my-truth/contracts/ to tighten invariants. See the Promoting Contracts guide.
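The status-removal transform described above can be sketched on a toy contract file. The grep filter below illustrates the effect on that one field; it is not the promote implementation:

```shell
# Illustrative input: an observed contract carrying the draft marker
printf 'status: observed\npass_threshold: 1.0\n' > /tmp/observed-contract.yaml

# Effect of promotion on this field: the status: observed line is dropped
grep -v '^status: observed$' /tmp/observed-contract.yaml > /tmp/truth-contract.yaml
cat /tmp/truth-contract.yaml
```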
replayci validate
Validate a pack's structure without running it. Checks that all files exist, contracts parse correctly, golden cases reference valid fixtures, and invariant paths are syntactically valid.
npx replayci validate --pack packs/my-pack
Output:
Validating packs/my-pack...
pack.yaml OK
contracts/apt_triage.yaml OK
golden/apt_triage.success.json OK
recordings/ WARN: No recording files found.
NeverNormalize.json OK
nr-allowlist.json OK
Contract: apt_triage.yaml
expect_tools (6 tools) OK
expected_tool_calls (6 matchers) OK
pass_threshold: 0.85 OK
golden_cases (1 case) OK
Result: VALID (1 warning)
| Flag | Description | Default |
|---|---|---|
| --pack <path> | Pack directory to validate | Required |
| --json | Force JSON output | Auto-detect |
Exit codes: 0 = valid, 1 = errors found, 2 = usage error.
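In a CI script, these exit codes can drive branching. The stub function below stands in for the real `npx replayci validate` call (here it pretends validation found errors) so the pattern itself is runnable:

```shell
# Stub standing in for `npx replayci validate --pack packs/my-pack`;
# pretend validation found errors (exit code 1)
run_validate() { return 1; }

run_validate
code=$?
case "$code" in
  0) echo "pack valid" ;;
  1) echo "pack has errors" ;;
  2) echo "usage error" ;;
esac
```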
replayci compare
Compare two run outputs side-by-side. Useful for evaluating whether a different model or provider is safer for your contracts.
npx replayci --pack packs/my-truth --provider openai --model gpt-4o --json > /tmp/baseline.json
npx replayci --pack packs/my-truth --provider openai --model gpt-4o-mini --json > /tmp/candidate.json
npx replayci compare --baseline /tmp/baseline.json --candidate /tmp/candidate.json
| Flag | Description | Required |
|---|---|---|
| --baseline <path> | Path to baseline run JSON file | Yes |
| --candidate <path> | Path to candidate run JSON file | Yes |
| --json | Force JSON output | No |
replayci sync
Sync reviewed contracts from the hosted dashboard to your local project. Downloads contracts, golden fixtures, and recordings into a replayci/ directory. Uses content-addressed hashing to skip unchanged files.
npx replayci sync # sync all contracts
npx replayci sync --agent tax-router # sync single agent
npx replayci sync --force # force re-sync (skip hash check)
| Flag | Description | Default |
|---|---|---|
| --agent <name> | Sync contracts for a specific agent only | All agents |
| --output <dir> | Output directory | replayci/ |
| --force | Skip content-addressed hash check, re-download everything | false |
Requires REPLAYCI_API_KEY.
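The content-addressed skip can be sketched as a hash comparison. The exact hashing scheme is an assumption here (SHA-256 of file contents is used for illustration), and the "remote" hash is faked locally:

```shell
# A synced local file and the hash the dashboard would report for it
mkdir -p replayci
printf 'contract body\n' > replayci/example.yaml
local_hash=$(python3 -c 'import hashlib; print(hashlib.sha256(open("replayci/example.yaml","rb").read()).hexdigest())')
remote_hash="$local_hash"   # pretend the server reports an identical hash

if [ "$local_hash" = "$remote_hash" ]; then
  echo "unchanged: skip download"
else
  echo "changed: re-download"
fi
```

With --force, this comparison would be skipped and every file re-downloaded.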
replayci doctor
Diagnose your ReplayCI integration health. Checks local SDK state, hosted API connectivity, sync freshness, and produces a verdict.
npx replayci doctor # full check
npx replayci doctor --agent support-bot # check specific agent
npx replayci doctor --no-network # local-only (skip hosted API)
npx replayci doctor --hosted-only # hosted-only (skip local SDK)
npx replayci doctor --strict # exit 1 on warnings
npx replayci doctor --json # JSON output
| Flag | Description | Default |
|---|---|---|
| --agent <name> | Check a specific agent | All |
| --no-network | Skip hosted API checks | false |
| --hosted-only | Skip local SDK checks | false |
| --strict | Exit 1 on warnings (not just failures) | false |
| --json | Force JSON output | Auto-detect |
| --state_dir <path> | Override SDK state directory | .replayci/runtime |
| --timeout_ms <ms> | Hosted API request timeout | 10000 |
Exit codes: 0 = healthy, 1 = failures (or warnings with --strict), 2 = usage error.
replayci models
List registered model families and aliases for a provider. Shows how model resolution works — which family patterns match, what request profile each family uses, and available shorthand aliases.
Also supports --check (registry coverage check against live provider API) and --probe (account model access check).
npx replayci models --provider openai
npx replayci models --provider anthropic
npx replayci models --provider openai --json
npx replayci models --check
npx replayci models --provider openai --check
npx replayci models --provider openai --probe
| Flag | Description | Default |
|---|---|---|
| --provider <name> | Provider: openai, anthropic | Required for list/probe |
| --check | Check registry coverage against live provider API | false |
| --probe | Probe account model access (accessible vs denied) | false |
| --json | Force JSON output | Auto-detect |
--check fetches the provider's model listing API and tests each model ID against family patterns in the registry. Reports unmatched models. Exits non-zero if unmatched models are found. Without --provider, checks both OpenAI and Anthropic. Requires REPLAYCI_PROVIDER_KEY and/or ANTHROPIC_API_KEY.
--probe calls the provider API with your API key and reports which registered models (aliases and family matches) are accessible to your account vs denied. Useful for onboarding verification. Requires REPLAYCI_PROVIDER_KEY (or OPENAI_API_KEY / ANTHROPIC_API_KEY).
--resolve example (on the default run command, not models):
npx replayci --provider openai --model 5.2 --resolve
Model Resolution
Input: 5.2
Provider: openai
Resolved: gpt-5.2
Family: GPT-5
Source: alias
Token field: max_completion_tokens
Temperature: yes
Exits immediately after printing — no contracts are run. Useful for verifying how an alias or raw model ID resolves before committing it to .replayci.yml.
Pretty output example (models --provider):
OpenAI — Registered Families
GPT-5 family match: ^gpt-5
token field: max_completion_tokens
temperature: yes
GPT-4 family match: ^gpt-4
token field: max_tokens
temperature: yes
O-series (reasoning) match: ^o[1-9]
token field: max_completion_tokens
temperature: no
Aliases:
5.2 → gpt-5.2
4o-mini → gpt-4o-mini
Any model matching a family prefix works automatically.
Unregistered models pass through with provider defaults.
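The alias-then-passthrough behavior can be sketched as a simple lookup. The alias table below mirrors the example output above; it is an illustration, not the registry implementation:

```shell
# Resolve a model name: known aliases expand, anything else passes through
resolve_model() {
  case "$1" in
    5.2)     echo "gpt-5.2" ;;
    4o-mini) echo "gpt-4o-mini" ;;
    *)       echo "$1" ;;   # unregistered IDs pass through unchanged
  esac
}

resolve_model 5.2      # prints gpt-5.2
resolve_model gpt-4o   # prints gpt-4o (passthrough)
```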
replayci help
Show help text with all flags and examples.
npx replayci --help
npx replayci -h
npx replayci help
replayci drift
Run drift detection against the current baseline.
npx replayci drift
replayci export-bundle
Export a replay bundle for offline verification.
npx replayci export-bundle
replayci replay-bundle
Replay a previously exported bundle.
npx replayci replay-bundle
Options
Provider & model
| Flag | Description | Default |
|---|---|---|
| --provider <name> | LLM provider: openai, anthropic, recorded | From .replayci.yml |
| --model <name> | Model ID or alias (e.g. gpt-4o-mini, opus-4, 5.2) | From .replayci.yml |
| --pack <path> | Path to contract pack directory | packs/starter |
| --strict-model | Fail if model has no matching family (instead of passthrough) | false |
| --resolve | Print model resolution details and exit (no run) | false |
Execution
| Flag | Description | Default |
|---|---|---|
| --persist | Write run artifacts to disk | false |
| --capture-recordings | Save live responses as recordings/*.recording.json | false |
| --draft | Run observed contracts against live providers (skip provider_modes filter) | false |
| --repeat <n> | Run N times for determinism proof | 1 |
| --only_contracts <csv> | Filter to specific contract filenames (comma-separated) | All contracts |
| --json | Force JSON output (default for non-TTY) | Auto-detect |
Identity
| Flag | Description | Default |
|---|---|---|
| --tenant_id <id> | Tenant identifier | t_default |
| --run_mode <mode> | Run mode: manual, scheduled, ci | manual |
Timeouts & limits
| Flag | Description | Default |
|---|---|---|
| --timeout_ms <ms> | Timeout per contract in milliseconds | 30000 |
| --retry_cap <n> | Maximum retry attempts | 2 |
| --max_contracts <n> | Limit number of contracts to run | All |
Shadow mode
| Flag | Description |
|---|---|
| --shadow-capture | Enable shadow capture for live provider runs |
| --shadow-provider <name> | Shadow provider for comparison (openai, anthropic) |
| --shadow-model <name> | Shadow model for comparison |
Other
| Flag | Description |
|---|---|
| --artifact_root <path> | Override artifact storage directory |
| --side_effect_mode <mode> | read_only or allow_all |
Pushing to Dashboard
Set REPLAYCI_API_KEY to automatically push every run result to the hosted dashboard at app.replayci.com:
export REPLAYCI_API_KEY=rci_live_your_key_here
npx replayci --pack packs/my-pack --provider openai --model gpt-4o-mini
Results appear in your dashboard within seconds. No additional flags needed — push is automatic when the API key is set.
- Push is independent of --persist: --persist writes artifacts to local disk (requires a database), while dashboard push works without it.
- Push failure is non-fatal: if the push fails (network error, invalid key), the run still completes and results are printed to stdout. A warning is emitted to stderr.
- Override the API URL with REPLAYCI_API_URL for self-hosted setups.
Get your API key from app.replayci.com/signup or from Settings > API Keys in the dashboard.
Environment variables
| Variable | Description | Required |
|---|---|---|
| REPLAYCI_API_KEY | API key for pushing results to the dashboard (see above) | For dashboard push |
| REPLAYCI_PROVIDER_KEY | Provider API key (OpenAI or Anthropic) | For live provider runs |
| OPENAI_API_KEY | OpenAI API key (fallback if REPLAYCI_PROVIDER_KEY not set) | Alternative for OpenAI |
| ANTHROPIC_API_KEY | Anthropic API key (fallback if REPLAYCI_PROVIDER_KEY not set) | Alternative for Anthropic |
| REPLAYCI_API_URL | Override API base URL | No (default: https://app.replayci.com) |
| DATABASE_URL | PostgreSQL connection string (local dev only) | For local DB persistence |
API keys always come from environment variables — never from .replayci.yml or CLI flags.
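The documented fallback from REPLAYCI_PROVIDER_KEY to the provider-specific variable can be sketched with standard shell defaulting. The key values here are placeholders, not real credentials:

```shell
# REPLAYCI_PROVIDER_KEY wins when set; otherwise fall back to OPENAI_API_KEY
unset REPLAYCI_PROVIDER_KEY
OPENAI_API_KEY="sk-example-placeholder"

provider_key="${REPLAYCI_PROVIDER_KEY:-$OPENAI_API_KEY}"
echo "$provider_key"   # prints sk-example-placeholder
```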
Config file
ReplayCI reads .replayci.yml from the project root. CLI flags override config values.
pack: "./packs/starter"
provider: openai
model: gpt-4o-mini
persist: false
capture_recordings: false
draft: false
tenant_id: t_default
run_mode: manual
observe_input: "./observe"
observe_output: "./packs/observed"
All fields are optional. Missing fields use defaults.
Resolution order: CLI flags > .replayci.yml > built-in defaults.
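The resolution order can be sketched with nested shell defaulting. The built-in default value shown is hypothetical, purely for illustration:

```shell
# CLI flag > .replayci.yml > built-in default
cli_model=""                # no --model flag passed on the command line
config_model="gpt-4o-mini"  # value read from .replayci.yml
default_model="gpt-4o"      # hypothetical built-in default

model="${cli_model:-${config_model:-$default_model}}"
echo "$model"   # prints gpt-4o-mini (config wins over the default)
```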
Output formats
Pretty (default for TTY)
replayci · openai/gpt-4o-mini · 4 contracts
✓ tool_call Pass a3f82c91
✓ function_call Pass b7d104e3
✓ structured_output Pass c9a2f156
✗ error_handling Fail e1b3d478 ← schema_payload
3/4 passed · 1 failed
results → https://app.replayci.com/runs/r_8f3a2c
Each line shows: status indicator, contract name, pass/fail, 8-character fingerprint, and optional failure category.
The results → line appears when REPLAYCI_API_KEY is set and results are pushed to the dashboard.
JSON (default for pipes/CI, or --json)
Full structured output including traceability, step details, and all metadata. Suitable for parsing in CI pipelines.
npx replayci --json | jq '.provider_run.steps[0].state'
Dashboard push
When REPLAYCI_API_KEY is set and a provider run completes, results are automatically pushed to the hosted dashboard at https://app.replayci.com.
- Push is independent of --persist: both can run simultaneously.
- Push failure warns to stderr but does not affect the exit code.
- Override the API URL with REPLAYCI_API_URL for local development.
Determinism proof
Run contracts multiple times and compare fingerprints:
npx replayci --repeat 3
Output includes a determinism_proof object showing whether each step produced identical fingerprints across all runs. Useful for verifying that provider responses are stable.
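The check behind the proof can be sketched as counting distinct fingerprints across repeats. The fingerprints below are hypothetical values for a single step:

```shell
# Three repeats produced these (hypothetical) fingerprints for one step
fp1="a3f82c91"; fp2="a3f82c91"; fp3="a3f82c91"

distinct=$(printf '%s\n%s\n%s\n' "$fp1" "$fp2" "$fp3" | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
  echo "deterministic"   # all runs produced an identical fingerprint
else
  echo "drifted"
fi
```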
Exit codes
| Code | Meaning |
|---|---|
| 0 | All contracts passed |
| 1 | One or more contracts failed, or a runtime error occurred |
| 2 | Drift detection or unknown-rate gate failure |
Examples
# Initialize a new project
npx replayci init
# Run with defaults from .replayci.yml
npx replayci
# Run against OpenAI with a specific model
npx replayci --provider openai --model gpt-4o-mini
# Run against Anthropic
npx replayci --provider anthropic --model claude-sonnet-4-6
# Use model aliases (shorter names)
npx replayci --provider openai --model 5.2
npx replayci --provider anthropic --model opus-4
# Check how a model alias resolves
npx replayci --provider anthropic --model opus-4 --resolve
# Strict mode: fail if model is not in registry
npx replayci --provider openai --model gpt-4o-mini --strict-model
# List registered models and aliases for a provider
npx replayci models --provider openai
# Check registry coverage against live provider API
npx replayci models --check
# Check account model access
npx replayci models --provider openai --probe
# Run against recorded fixtures (offline, deterministic)
npx replayci --provider recorded
# Run specific contracts only
npx replayci --only_contracts tool_call.yaml,function_call.yaml
# Prove determinism with 3 identical runs
npx replayci --repeat 3
# Force JSON output for CI
npx replayci --json
# Run and persist to local disk
npx replayci --persist
# Capture live responses as recording files for replay
npx replayci --provider openai --model gpt-4o-mini --capture-recordings
# Test observed contracts against a live provider without promoting first
npx replayci --pack packs/observed --provider openai --model gpt-4o-mini --draft
# Shadow comparison: OpenAI primary, Anthropic shadow
npx replayci --provider openai --model gpt-4o-mini \
--shadow-capture --shadow-provider anthropic --shadow-model claude-sonnet-4-6
# Auto-generate contracts from observation specs
npx replayci observe --provider openai --model gpt-4o-mini
# Generate contracts with custom input/output directories
npx replayci observe --provider anthropic --model claude-sonnet-4-6 \
--input specs/ --output packs/anthropic-observed
# Run the generated pack against recorded fixtures
npx replayci --pack packs/observed --provider recorded
# Promote an observed pack to a truth pack
npx replayci promote --from packs/observed --to packs/my-truth
# Validate pack structure without running
npx replayci validate --pack packs/my-truth
# Compare two model runs
npx replayci compare --baseline /tmp/gpt4o.json --candidate /tmp/gpt4o-mini.json
# Sync contracts from dashboard
npx replayci sync
# Check integration health
npx replayci doctor
# Show help
npx replayci --help