CLI Reference — ReplayCI
Complete reference for the replayci command-line tool.
Installation
npm install -D @replayci/cli
Run via npx:
npx replayci [command] [options]
Commands
replayci (default)
Run contracts against a provider. Reads config from .replayci.yml, overridden by CLI flags.
npx replayci
npx replayci --provider openai --model gpt-4o-mini
npx replayci --pack packs/my-pack --persist
replayci init
Scaffold a new project with a config file and starter pack.
npx replayci init
Creates:
- .replayci.yml — config file with defaults
- packs/starter/ — a working contract pack
Skips files that already exist (never overwrites).
replayci observe
Auto-generate contracts by observing live LLM behavior. Provide simple JSON observation specs (messages + tools), and ReplayCI calls the provider, infers contract invariants from the response, and generates a full runnable pack.
npx replayci observe --provider openai --model gpt-4o-mini
npx replayci observe --provider anthropic --model claude-sonnet-4-6 --input specs/ --output packs/my-pack
Generated contracts have status: observed (draft) and must be reviewed before promotion to truth contracts. See the Observe Guide for the full workflow.
Observe-specific flags:
| Flag | Description | Default |
|---|---|---|
| --provider <name> | LLM provider: openai, anthropic | Required |
| --model <name> | Model ID (e.g. gpt-4o-mini, claude-sonnet-4-6) | Required |
| --input <dir> | Directory containing observation spec JSON files | observe/ |
| --output <dir> | Output directory for the generated pack | packs/observed |
| --json | Force JSON output | Auto-detect |
| --timeout_ms <ms> | Timeout per provider call in milliseconds | 30000 |
Observation spec format:
Each .json file in the input directory defines one observation — the messages to send and the tools to make available:
{
"messages": [
{ "role": "user", "content": "What's the weather in San Francisco?" }
],
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" }
},
"required": ["location"]
}
}
]
}
Optional fields: tool_choice ("auto", "none", "required" — default "auto"), temperature (default 0), max_tokens (default 1024).
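As a quick sanity check, a spec like the one above can be written to the input directory and verified as valid JSON before running observe. The filename and the python3 check below are illustrative conventions, not part of the CLI:

```shell
# Write a minimal observation spec (one spec per file; filename is arbitrary)
mkdir -p observe
cat > observe/get_weather.json <<'EOF'
{
  "messages": [
    { "role": "user", "content": "What's the weather in San Francisco?" }
  ],
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": { "location": { "type": "string" } },
        "required": ["location"]
      }
    }
  ],
  "tool_choice": "auto",
  "temperature": 0
}
EOF

# Confirm the file parses as JSON before handing it to `replayci observe`
python3 -m json.tool observe/get_weather.json > /dev/null && echo "spec OK"
```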
What gets generated:
For each successful observation:
- contracts/<tool>.yaml — contract with inferred invariants, marked status: observed
- golden/<tool>.success.json — golden fixture with boundary hashes
- recordings/<tool>.success.recording.json — recording file for replay
Pack-level files (generated once):
- pack.yaml — pack metadata listing all contracts
- NeverNormalize.json — standard normalization exclusions
- nr-allowlist.json — empty NR allowlist
Pretty output example:
replayci observe · openai/gpt-4o-mini · 3 specs
✓ get_weather → contracts/get_weather.yaml
✓ deploy_service → contracts/deploy_service.yaml
✗ send_email error: HTTP 401
2 contracts generated · 1 skipped
Next steps:
1. Review contracts in packs/observed/contracts/
2. Run: npx replayci --pack packs/observed --provider recorded
replayci promote
Promote an observed pack to a truth pack. Copies the pack directory and applies promotion transforms: removes status: observed from contracts, expands provider_modes to include live providers, and adds a promotion comment.
npx replayci promote --from packs/observed --to packs/my-truth
Output:
Promoted packs/observed -> packs/my-truth
Next steps:
[ ] Review expect_tools — add/remove required tools
[ ] Review expected_tool_calls — tighten argument invariants
[ ] Adjust pass_threshold (currently 1.0)
[ ] Test: npx replayci --pack packs/my-truth --provider openai --model gpt-4o
| Flag | Description | Required |
|---|---|---|
| --from <path> | Source observed pack directory | Yes |
| --to <path> | Destination truth pack directory (must not exist) | Yes |
After promoting, edit the contracts in packs/my-truth/contracts/ to tighten invariants. See the Promoting Contracts guide.
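The status-removal transform described above can be sketched on a toy contract file. The grep filter below illustrates the effect on that one field; it is not the promote implementation:

```shell
# Illustrative input: an observed contract carrying the draft marker
printf 'status: observed\npass_threshold: 1.0\n' > /tmp/observed-contract.yaml

# Effect of promotion on this field: the status: observed line is dropped
grep -v '^status: observed$' /tmp/observed-contract.yaml > /tmp/truth-contract.yaml
cat /tmp/truth-contract.yaml
```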
replayci validate
Validate a pack's structure without running it. Checks that all files exist, contracts parse correctly, golden cases reference valid fixtures, and invariant paths are syntactically valid.
npx replayci validate --pack packs/my-pack
Output:
Validating packs/my-pack...
pack.yaml OK
contracts/apt_triage.yaml OK
golden/apt_triage.success.json OK
recordings/ WARN: No recording files found.
NeverNormalize.json OK
nr-allowlist.json OK
Contract: apt_triage.yaml
expect_tools (6 tools) OK
expected_tool_calls (6 matchers) OK
pass_threshold: 0.85 OK
golden_cases (1 case) OK
Result: VALID (1 warning)
| Flag | Description | Default |
|---|---|---|
| --pack <path> | Pack directory to validate | Required |
| --json | Force JSON output | Auto-detect |
Exit codes: 0 = valid, 1 = errors found, 2 = usage error.
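In a CI script, these exit codes can drive branching. The stub function below stands in for the real `npx replayci validate` call (here it pretends validation found errors) so the pattern itself is runnable:

```shell
# Stub standing in for `npx replayci validate --pack packs/my-pack`;
# pretend validation found errors (exit code 1)
run_validate() { return 1; }

run_validate
code=$?
case "$code" in
  0) echo "pack valid" ;;
  1) echo "pack has errors" ;;
  2) echo "usage error" ;;
esac
```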
replayci compare
Compare two run outputs side-by-side. Useful for evaluating whether a different model or provider is safer for your contracts.
npx replayci --pack packs/my-truth --provider openai --model gpt-4o --json > /tmp/baseline.json
npx replayci --pack packs/my-truth --provider openai --model gpt-4o-mini --json > /tmp/candidate.json
npx replayci compare --baseline /tmp/baseline.json --candidate /tmp/candidate.json
| Flag | Description | Required |
|---|---|---|
| --baseline <path> | Path to baseline run JSON file | Yes |
| --candidate <path> | Path to candidate run JSON file | Yes |
| --json | Force JSON output | No |
replayci sync
Sync reviewed contracts from the hosted dashboard to your local project. Downloads contracts, golden fixtures, and recordings into a replayci/ directory. Uses content-addressed hashing to skip unchanged files.
npx replayci sync # sync all contracts
npx replayci sync --agent tax-router # sync single agent
npx replayci sync --force # force re-sync (skip hash check)
| Flag | Description | Default |
|---|---|---|
| --agent <name> | Sync contracts for a specific agent only | All agents |
| --output <dir> | Output directory | replayci/ |
| --force | Skip content-addressed hash check, re-download everything | false |
Requires REPLAYCI_API_KEY.
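The content-addressed skip can be sketched as a hash comparison. The exact hashing scheme is an assumption here (SHA-256 of file contents is used for illustration), and the "remote" hash is faked locally:

```shell
# A synced local file and the hash the dashboard would report for it
mkdir -p replayci
printf 'contract body\n' > replayci/example.yaml
local_hash=$(python3 -c 'import hashlib; print(hashlib.sha256(open("replayci/example.yaml","rb").read()).hexdigest())')
remote_hash="$local_hash"   # pretend the server reports an identical hash

if [ "$local_hash" = "$remote_hash" ]; then
  echo "unchanged: skip download"
else
  echo "changed: re-download"
fi
```

With --force, this comparison would be skipped and every file re-downloaded.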
replayci doctor
Diagnose your ReplayCI integration health. Checks local SDK state, hosted API connectivity, sync freshness, and produces a verdict.
npx replayci doctor # full check
npx replayci doctor --agent support-bot # check specific agent
npx replayci doctor --no-network # local-only (skip hosted API)
npx replayci doctor --hosted-only # hosted-only (skip local SDK)
npx replayci doctor --strict # exit 1 on warnings
npx replayci doctor --json # JSON output
| Flag | Description | Default |
|---|---|---|
| --agent <name> | Check a specific agent | All |
| --no-network | Skip hosted API checks | false |
| --hosted-only | Skip local SDK checks | false |
| --strict | Exit 1 on warnings (not just failures) | false |
| --json | Force JSON output | Auto-detect |
| --state_dir <path> | Override SDK state directory | .replayci/runtime |
| --timeout_ms <ms> | Hosted API request timeout | 10000 |
Exit codes: 0 = healthy, 1 = failures (or warnings with --strict), 2 = usage error.
replayci models
List registered model families and aliases for a provider. Shows how model resolution works — which family patterns match, what request profile each family uses, and available shorthand aliases.
Also supports --check (registry coverage check against live provider API) and --probe (account model access check).
npx replayci models --provider openai
npx replayci models --provider anthropic
npx replayci models --provider openai --json
npx replayci models --check
npx replayci models --provider openai --check
npx replayci models --provider openai --probe
| Flag | Description | Default |
|---|---|---|
| --provider <name> | Provider: openai, anthropic | Required for list/probe |
| --check | Check registry coverage against live provider API | false |
| --probe | Probe account model access (accessible vs denied) | false |
| --json | Force JSON output | Auto-detect |
--check fetches the provider's model listing API and tests each model ID against family patterns in the registry. Reports unmatched models. Exits non-zero if unmatched models are found. Without --provider, checks both OpenAI and Anthropic. Requires REPLAYCI_PROVIDER_KEY and/or ANTHROPIC_API_KEY.
--probe calls the provider API with your API key and reports which registered models (aliases and family matches) are accessible to your account vs denied. Useful for onboarding verification. Requires REPLAYCI_PROVIDER_KEY (or OPENAI_API_KEY / ANTHROPIC_API_KEY).
--resolve example (on the default run command, not models):
npx replayci --provider openai --model 5.2 --resolve
Model Resolution
Input: 5.2
Provider: openai
Resolved: gpt-5.2
Family: GPT-5
Source: alias
Token field: max_completion_tokens
Temperature: yes
Exits immediately after printing — no contracts are run. Useful for verifying how an alias or raw model ID resolves before committing it to .replayci.yml.
Pretty output example (models --provider):
OpenAI — Registered Families
GPT-5 family match: ^gpt-5
token field: max_completion_tokens
temperature: yes
GPT-4 family match: ^gpt-4
token field: max_tokens
temperature: yes
O-series (reasoning) match: ^o[1-9]
token field: max_completion_tokens
temperature: no
Aliases:
5.2 → gpt-5.2
4o-mini → gpt-4o-mini
Any model matching a family prefix works automatically.
Unregistered models pass through with provider defaults.
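The alias-then-passthrough behavior can be sketched as a simple lookup. The alias table below mirrors the example output above; it is an illustration, not the registry implementation:

```shell
# Resolve a model name: known aliases expand, anything else passes through
resolve_model() {
  case "$1" in
    5.2)     echo "gpt-5.2" ;;
    4o-mini) echo "gpt-4o-mini" ;;
    *)       echo "$1" ;;   # unregistered IDs pass through unchanged
  esac
}

resolve_model 5.2      # prints gpt-5.2
resolve_model gpt-4o   # prints gpt-4o (passthrough)
```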
replayci help
Show help text with all flags and examples.
npx replayci --help
npx replayci -h
npx replayci help
replayci drift
Run drift detection against the current baseline.
npx replayci drift
replayci export-bundle
Export a replay bundle for offline verification.
npx replayci export-bundle
replayci replay-bundle
Replay a previously exported bundle.
npx replayci replay-bundle
Options
Provider & model
| Flag | Description | Default |
|---|---|---|
| --provider <name> | LLM provider: openai, anthropic, recorded | From .replayci.yml |
| --model <name> | Model ID or alias (e.g. gpt-4o-mini, opus-4, 5.2) | From .replayci.yml |
| --pack <path> | Path to contract pack directory | packs/starter |
| --strict-model | Fail if model has no matching family (instead of passthrough) | false |
| --resolve | Print model resolution details and exit (no run) | false |
Execution
| Flag | Description | Default |
|---|---|---|
| --persist | Write run artifacts to disk | false |
| --capture-recordings | Save live responses as recordings/*.recording.json | false |
| --draft | Run observed contracts against live providers (skip provider_modes filter) | false |
| --repeat <n> | Run N times for determinism proof | 1 |
| --only_contracts <csv> | Filter to specific contract filenames (comma-separated) | All contracts |
| --json | Force JSON output (default for non-TTY) | Auto-detect |
Identity
| Flag | Description | Default |
|---|---|---|
| --tenant_id <id> | Tenant identifier | t_default |
| --run_mode <mode> | Run mode: manual, scheduled, ci | manual |
Timeouts & limits
| Flag | Description | Default |
|---|---|---|
| --timeout_ms <ms> | Timeout per contract in milliseconds | 30000 |
| --retry_cap <n> | Maximum retry attempts | 2 |
| --max_contracts <n> | Limit number of contracts to run | All |
Shadow mode
| Flag | Description |
|---|---|
| --shadow-capture | Enable shadow capture for live provider runs |
| --shadow-provider <name> | Shadow provider for comparison (openai, anthropic) |
| --shadow-model <name> | Shadow model for comparison |
Other
| Flag | Description |
|---|---|
| --artifact_root <path> | Override artifact storage directory |
| --side_effect_mode <mode> | read_only or allow_all |
Pushing to Dashboard
Set REPLAYCI_API_KEY to automatically push every run result to the hosted dashboard at app.replayci.com:
export REPLAYCI_API_KEY=rci_live_your_key_here
npx replayci --pack packs/my-pack --provider openai --model gpt-4o-mini
Results appear in your dashboard within seconds. No additional flags needed — push is automatic when the API key is set.
- Push is independent of --persist: --persist writes artifacts to local disk (requires a database), while dashboard push works without it.
- Push failure is non-fatal: if the push fails (network error, invalid key), the run still completes and results are printed to stdout. A warning is emitted to stderr.
- Override the API URL with REPLAYCI_API_URL for self-hosted setups.
Get your API key from app.replayci.com/signup or from Settings > API Keys in the dashboard.
Environment variables
| Variable | Description | Required |
|---|---|---|
| REPLAYCI_API_KEY | API key for pushing results to the dashboard (see above) | For dashboard push |
| REPLAYCI_PROVIDER_KEY | Provider API key (OpenAI or Anthropic) | For live provider runs |
| OPENAI_API_KEY | OpenAI API key (fallback if REPLAYCI_PROVIDER_KEY not set) | Alternative for OpenAI |
| ANTHROPIC_API_KEY | Anthropic API key (fallback if REPLAYCI_PROVIDER_KEY not set) | Alternative for Anthropic |
| REPLAYCI_API_URL | Override API base URL | No (default: https://app.replayci.com) |
| DATABASE_URL | PostgreSQL connection string (local dev only) | For local DB persistence |
API keys always come from environment variables — never from .replayci.yml or CLI flags.
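The documented fallback from REPLAYCI_PROVIDER_KEY to the provider-specific variable can be sketched with standard shell defaulting. The key values here are placeholders, not real credentials:

```shell
# REPLAYCI_PROVIDER_KEY wins when set; otherwise fall back to OPENAI_API_KEY
unset REPLAYCI_PROVIDER_KEY
OPENAI_API_KEY="sk-example-placeholder"

provider_key="${REPLAYCI_PROVIDER_KEY:-$OPENAI_API_KEY}"
echo "$provider_key"   # prints sk-example-placeholder
```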
Config file
ReplayCI reads .replayci.yml from the project root. CLI flags override config values.
pack: "./packs/starter"
provider: openai
model: gpt-4o-mini
persist: false
capture_recordings: false
draft: false
tenant_id: t_default
run_mode: manual
observe_input: "./observe"
observe_output: "./packs/observed"
All fields are optional. Missing fields use defaults.
Resolution order: CLI flags > .replayci.yml > built-in defaults.
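The resolution order can be sketched with nested shell defaulting. The built-in default value shown is hypothetical, purely for illustration:

```shell
# CLI flag > .replayci.yml > built-in default
cli_model=""                # no --model flag passed on the command line
config_model="gpt-4o-mini"  # value read from .replayci.yml
default_model="gpt-4o"      # hypothetical built-in default

model="${cli_model:-${config_model:-$default_model}}"
echo "$model"   # prints gpt-4o-mini (config wins over the default)
```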
Output formats
Pretty (default for TTY)
replayci · openai/gpt-4o-mini · 4 contracts
✓ tool_call Pass a3f82c91
✓ function_call Pass b7d104e3
✓ structured_output Pass c9a2f156
✗ error_handling Fail e1b3d478 ← schema_payload
3/4 passed · 1 failed
results → https://app.replayci.com/runs/r_8f3a2c
Each line shows: status indicator, contract name, pass/fail, 8-character fingerprint, and optional failure category.
The results → line appears when REPLAYCI_API_KEY is set and results are pushed to the dashboard.
JSON (default for pipes/CI, or --json)
Full structured output including traceability, step details, and all metadata. Suitable for parsing in CI pipelines.
npx replayci --json | jq '.provider_run.steps[0].state'
Dashboard push
When REPLAYCI_API_KEY is set and a provider run completes, results are automatically pushed to the hosted dashboard at https://app.replayci.com.
- Push is independent of --persist: both can run simultaneously.
- Push failure warns to stderr but does not affect the exit code.
- Override the API URL with REPLAYCI_API_URL for local development.
Determinism proof
Run contracts multiple times and compare fingerprints:
npx replayci --repeat 3
Output includes a determinism_proof object showing whether each step produced identical fingerprints across all runs. Useful for verifying that provider responses are stable.
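The check behind the proof can be sketched as counting distinct fingerprints across repeats. The fingerprints below are hypothetical values for a single step:

```shell
# Three repeats produced these (hypothetical) fingerprints for one step
fp1="a3f82c91"; fp2="a3f82c91"; fp3="a3f82c91"

distinct=$(printf '%s\n%s\n%s\n' "$fp1" "$fp2" "$fp3" | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
  echo "deterministic"   # all runs produced an identical fingerprint
else
  echo "drifted"
fi
```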
Exit codes
| Code | Meaning |
|---|---|
| 0 | All contracts passed |
| 1 | One or more contracts failed, or a runtime error occurred |
| 2 | Drift detection or unknown-rate gate failure |
Examples
# Initialize a new project
npx replayci init
# Run with defaults from .replayci.yml
npx replayci
# Run against OpenAI with a specific model
npx replayci --provider openai --model gpt-4o-mini
# Run against Anthropic
npx replayci --provider anthropic --model claude-sonnet-4-6
# Use model aliases (shorter names)
npx replayci --provider openai --model 5.2
npx replayci --provider anthropic --model opus-4
# Check how a model alias resolves
npx replayci --provider anthropic --model opus-4 --resolve
# Strict mode: fail if model is not in registry
npx replayci --provider openai --model gpt-4o-mini --strict-model
# List registered models and aliases for a provider
npx replayci models --provider openai
# Check registry coverage against live provider API
npx replayci models --check
# Check account model access
npx replayci models --provider openai --probe
# Run against recorded fixtures (offline, deterministic)
npx replayci --provider recorded
# Run specific contracts only
npx replayci --only_contracts tool_call.yaml,function_call.yaml
# Prove determinism with 3 identical runs
npx replayci --repeat 3
# Force JSON output for CI
npx replayci --json
# Run and persist to local disk
npx replayci --persist
# Capture live responses as recording files for replay
npx replayci --provider openai --model gpt-4o-mini --capture-recordings
# Test observed contracts against a live provider without promoting first
npx replayci --pack packs/observed --provider openai --model gpt-4o-mini --draft
# Shadow comparison: OpenAI primary, Anthropic shadow
npx replayci --provider openai --model gpt-4o-mini \
--shadow-capture --shadow-provider anthropic --shadow-model claude-sonnet-4-6
# Auto-generate contracts from observation specs
npx replayci observe --provider openai --model gpt-4o-mini
# Generate contracts with custom input/output directories
npx replayci observe --provider anthropic --model claude-sonnet-4-6 \
--input specs/ --output packs/anthropic-observed
# Run the generated pack against recorded fixtures
npx replayci --pack packs/observed --provider recorded
# Promote an observed pack to a truth pack
npx replayci promote --from packs/observed --to packs/my-truth
# Validate pack structure without running
npx replayci validate --pack packs/my-truth
# Compare two model runs
npx replayci compare --baseline /tmp/gpt4o.json --candidate /tmp/gpt4o-mini.json
# Sync contracts from dashboard
npx replayci sync
# Check integration health
npx replayci doctor
# Show help
npx replayci --help