# CI Integration — ReplayCI
Add ReplayCI to your CI pipeline to catch tool-call regressions before they reach production. This guide covers GitHub Actions, GitLab CI, and general CI setup.
## Quick setup
The simplest CI integration is one line:
```bash
npx replayci --provider recorded
```
This runs your contracts against recorded fixtures — no API keys needed, no network calls, deterministic results. If any contract fails, the command exits with a non-zero code and your CI pipeline fails.
## GitHub Actions
### Basic workflow
```yaml
# .github/workflows/replayci.yml
name: ReplayCI Gate

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  replayci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Run ReplayCI contracts
        run: npx replayci --provider recorded
```
That's it. Contracts run against recorded fixtures, and the results appear in your PR checks.
### With dashboard push
To see results in the ReplayCI dashboard, add your API key:
```yaml
- name: Run ReplayCI contracts
  env:
    REPLAYCI_API_KEY: ${{ secrets.REPLAYCI_API_KEY }}
  run: npx replayci --provider recorded
```
Results are automatically pushed to app.replayci.com where you can see run history, failure trends, and fingerprint tracking. Push is non-blocking — if the dashboard is unreachable, your CI still passes or fails based on contract results alone.
### With live provider testing
For advisory live-provider testing alongside your recorded gate:
```yaml
jobs:
  # Hard gate — recorded fixtures, merge-blocking
  replayci-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Recorded gate
        run: npx replayci --provider recorded

  # Advisory — live provider, non-blocking
  replayci-live:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    continue-on-error: true  # never blocks merge
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Live provider check
        env:
          REPLAYCI_PROVIDER_KEY: ${{ secrets.OPENAI_API_KEY }}
          REPLAYCI_API_KEY: ${{ secrets.REPLAYCI_API_KEY }}
        run: npx replayci --provider openai --model gpt-4o-mini
```
This follows the two-lane pattern: the recorded gate blocks merges (Lane A), while the live check provides advisory evidence (Lane B).
## GitLab CI
```yaml
# .gitlab-ci.yml
replayci:
  image: node:20
  stage: test
  script:
    - npm ci
    - npx replayci --provider recorded
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
```
With dashboard push:
```yaml
replayci:
  image: node:20
  stage: test
  variables:
    REPLAYCI_API_KEY: $REPLAYCI_API_KEY
  script:
    - npm ci
    - npx replayci --provider recorded
```
## General CI setup
For any CI system, the pattern is the same:
- Install dependencies (`npm ci`)
- Run `npx replayci --provider recorded`
- Check the exit code
### Exit codes
| Code | Meaning | CI behavior |
|---|---|---|
| `0` | All contracts passed | Pipeline passes |
| `1` | One or more contracts failed, or a runtime error | Pipeline fails |
| `2` | Drift or unknown-rate gate failure | Pipeline fails |
## JSON output in CI
In non-TTY environments (CI pipelines, piped output), ReplayCI automatically outputs JSON instead of pretty-printed text. You can also force it with `--json`:
```bash
npx replayci --provider recorded --json | jq '.provider_run.steps[] | select(.status == "Fail")'
```
The JSON output includes everything you need for CI integration:
```json
{
  "pack": "packs/starter",
  "contracts_count": 1,
  "provider_run": {
    "provider": "recorded",
    "model": "recorded",
    "steps": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "status": "Pass",
        "fingerprint": "71ac81a7..."
      }
    ]
  }
}
```
### Parsing results
Extract specific information with jq:
```bash
# Count failures
npx replayci --json | jq '[.provider_run.steps[] | select(.status == "Fail")] | length'

# Get failure fingerprints
npx replayci --json | jq '.provider_run.steps[] | select(.status == "Fail") | .fingerprint'

# Check if determinism proof passed
npx replayci --repeat 3 --json | jq '.determinism_proof.proven'
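If `jq` isn't available in your CI image, the same checks are easy to script in any language. Here is a minimal Python sketch that assumes the JSON shape shown above; the `replayci_output` string stands in for output you would capture from the CLI:

```python
import json

# Stand-in for output captured from `npx replayci --provider recorded --json`.
replayci_output = """
{
  "pack": "packs/starter",
  "contracts_count": 1,
  "provider_run": {
    "provider": "recorded",
    "model": "recorded",
    "steps": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "status": "Pass",
        "fingerprint": "71ac81a7..."
      }
    ]
  }
}
"""

run = json.loads(replayci_output)
steps = run["provider_run"]["steps"]
failures = [s for s in steps if s["status"] == "Fail"]

print(f"{len(steps) - len(failures)}/{len(steps)} contracts passed")
for step in failures:
    # Fingerprints let you match a failure against previous runs.
    print(f"FAIL {step['contract_path']} fingerprint={step['fingerprint']}")

# In a CI wrapper you would exit with this code, e.g. sys.exit(exit_code).
exit_code = 1 if failures else 0
```

In practice you would pipe the real CLI output into the script instead of embedding a sample string.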
## The two-lane CI model
ReplayCI is designed around a two-lane CI architecture that separates deterministic safety from live advisory testing.
### Lane A — Hard gate (merge-blocking)
- Runs against recorded fixtures only (no live API calls)
- Deterministic — same input always produces the same output
- Fast — no network latency
- Free — no API costs
- Blocks merges on failure
Lane A catches regressions in your contract definitions, fixture files, and runner logic. If a recorded test that used to pass now fails, something changed in your code.
What blocks a merge:

- A contract assertion fails against a recorded fixture
- A new failure fingerprint appears that wasn't in the previous baseline
- The unknown classification rate exceeds 20%
- A `NonReproducible` result on the deterministic corpus (without an allowlist entry)
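The unknown-rate check amounts to a simple ratio over step results. A sketch in Python for illustration only: the `Unknown` status value and the hand-written step list are assumptions, not the CLI's documented output, and the tool enforces this gate itself:

```python
# Sketch of the 20% unknown-rate gate. The "Unknown" status value is an
# assumed placeholder; ReplayCI applies this check internally.
steps = [
    {"contract_path": "a.yaml", "status": "Pass"},
    {"contract_path": "b.yaml", "status": "Unknown"},
    {"contract_path": "c.yaml", "status": "Pass"},
    {"contract_path": "d.yaml", "status": "Pass"},
    {"contract_path": "e.yaml", "status": "Pass"},
]

unknown_rate = sum(s["status"] == "Unknown" for s in steps) / len(steps)
gate_failed = unknown_rate > 0.20  # exit code 2 when the gate trips

print(f"unknown rate: {unknown_rate:.0%}, gate failed: {gate_failed}")
```

Note the strict inequality: one unknown out of five steps is exactly 20% and passes; two would trip the gate.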
### Lane B — Evidence lane (advisory)
- Runs against live providers (OpenAI, Anthropic, etc.)
- Non-deterministic — model responses can vary
- Advisory only — never blocks merges
- Results feed the dashboard for trending and comparison
Lane B answers questions like: "Does gpt-4o-mini still call my tools correctly?" and "How does Anthropic compare to OpenAI on my contracts?"
### Setting up both lanes
The two-lane model maps naturally to CI jobs:
```yaml
# Lane A: merge-blocking
replayci-gate:
  script: npx replayci --provider recorded

# Lane B: advisory
replayci-live:
  allow_failure: true
  script: npx replayci --provider openai --model gpt-4o-mini
```
## Determinism proof in CI
Verify that your provider returns consistent results by running contracts multiple times:
```bash
npx replayci --provider openai --model gpt-4o-mini --repeat 3
```
This runs every contract 3 times and compares fingerprints. If all runs produce identical fingerprints for each step, the proof passes.
The JSON output includes a `determinism_proof` field:
```json
{
  "determinism_proof": {
    "proven": true,
    "total_runs": 3,
    "per_step": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "deterministic": true,
        "run_count": 3
      }
    ]
  }
}
```
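If you want CI to surface exactly which contracts were flaky, the `per_step` entries can be inspected in a script. A minimal Python sketch, assuming the JSON shape shown above (the `proof_output` string stands in for captured CLI output):

```python
import json

# Stand-in for output from a --repeat 3 --json run.
proof_output = """
{
  "determinism_proof": {
    "proven": true,
    "total_runs": 3,
    "per_step": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "deterministic": true,
        "run_count": 3
      }
    ]
  }
}
"""

proof = json.loads(proof_output)["determinism_proof"]

# Collect the contracts whose fingerprints varied across runs.
flaky = [s["contract_path"] for s in proof["per_step"] if not s["deterministic"]]

if proof["proven"]:
    print(f"determinism proven across {proof['total_runs']} runs")
else:
    print("flaky contracts:", flaky)
```

Printing the flaky contract paths in the job log saves a round trip to the dashboard when a determinism check fails.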
Use this to build confidence before promoting a model version or adding new contracts.
## Keeping recordings up to date
When you change your tool definitions or message prompts, your recorded fixtures become stale. The recorded provider detects this via boundary hash validation and flags it as `NonReproducible`.
To refresh your recordings:
```bash
# Re-capture from live provider
npx replayci --provider openai --model gpt-4o-mini --capture-recordings

# Verify the new recordings pass
npx replayci --provider recorded

# Commit the updated recordings
git add packs/*/recordings/
git commit -m "Update recorded fixtures"
```
A good workflow is to update recordings in a dedicated PR, separate from feature changes. This keeps your CI gate stable and makes recording changes easy to review.
## Environment variables reference
| Variable | Purpose | Required in CI? |
|---|---|---|
| `REPLAYCI_API_KEY` | Push results to dashboard | No (optional) |
| `REPLAYCI_PROVIDER_KEY` | API key for live providers | Only for Lane B |
| `REPLAYCI_API_URL` | Override dashboard URL | No (default: `https://app.replayci.com`) |
For recorded-only CI (Lane A), no environment variables are needed.
## Next steps
- Write contracts: See Writing Tests for the full YAML format
- Understand providers: See Providers for how provider abstraction works
- Debug failures: See Troubleshooting for common CI issues