CI Integration — ReplayCI

Add ReplayCI to your CI pipeline to catch tool-call regressions before they reach production. This guide covers GitHub Actions, GitLab CI, and general CI setup.


Quick setup

The simplest CI integration is one line:

npx replayci --provider recorded

This runs your contracts against recorded fixtures — no API keys needed, no network calls, deterministic results. If any contract fails, the command exits with a non-zero code and your CI pipeline fails.


GitHub Actions

Basic workflow

# .github/workflows/replayci.yml
name: ReplayCI Gate

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  replayci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - run: npm ci

      - name: Run ReplayCI contracts
        run: npx replayci --provider recorded

That's it. Contracts run against recorded fixtures, and results appear in your PR checks.

With dashboard push

To see results in the ReplayCI dashboard, add your API key:

      - name: Run ReplayCI contracts
        env:
          REPLAYCI_API_KEY: ${{ secrets.REPLAYCI_API_KEY }}
        run: npx replayci --provider recorded

Results are automatically pushed to app.replayci.com where you can see run history, failure trends, and fingerprint tracking. Push is non-blocking — if the dashboard is unreachable, your CI still passes or fails based on contract results alone.

With live provider testing

For advisory live-provider testing alongside your recorded gate:

jobs:
  # Hard gate — recorded fixtures, merge-blocking
  replayci-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Recorded gate
        run: npx replayci --provider recorded

  # Advisory — live provider, non-blocking
  replayci-live:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    continue-on-error: true # never blocks merge
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Live provider check
        env:
          REPLAYCI_PROVIDER_KEY: ${{ secrets.OPENAI_API_KEY }}
          REPLAYCI_API_KEY: ${{ secrets.REPLAYCI_API_KEY }}
        run: npx replayci --provider openai --model gpt-4o-mini

This follows the two-lane pattern: the recorded gate blocks merges (Lane A), while the live check provides advisory evidence (Lane B).


GitLab CI

# .gitlab-ci.yml
replayci:
  image: node:20
  stage: test
  script:
    - npm ci
    - npx replayci --provider recorded
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

With dashboard push:

replayci:
  image: node:20
  stage: test
  variables:
    REPLAYCI_API_KEY: $REPLAYCI_API_KEY
  script:
    - npm ci
    - npx replayci --provider recorded

General CI setup

For any CI system, the pattern is the same:

  1. Install dependencies (npm ci)
  2. Run npx replayci --provider recorded
  3. Check the exit code

Exit codes

Code  Meaning                                           CI behavior
0     All contracts passed                              Pipeline passes
1     One or more contracts failed, or a runtime error  Pipeline fails
2     Drift or unknown-rate gate failure                Pipeline fails
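A CI script can branch on these codes explicitly, for example to post different notifications per failure class. A minimal sketch; `simulate_gate` is a hypothetical stub standing in for `npx replayci --provider recorded` so the snippet runs without the CLI:

```shell
# simulate_gate stands in for `npx replayci --provider recorded`.
# Here it simulates exit code 2, a drift or unknown-rate gate failure.
simulate_gate() { return 2; }

simulate_gate
code=$?
case "$code" in
  0) msg="all contracts passed" ;;
  1) msg="contract failure or runtime error" ;;
  2) msg="drift or unknown-rate gate failure" ;;
  *) msg="unexpected exit code: $code" ;;
esac
echo "$msg"
```

In a real pipeline you would usually let the non-zero exit code fail the job directly; the case statement is only useful when you want distinct reporting per code.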

JSON output in CI

In non-TTY environments (CI pipelines, piped output), ReplayCI automatically outputs JSON instead of pretty-printed text. You can also force it with --json:

npx replayci --provider recorded --json | jq '.provider_run.steps[] | select(.status == "Fail")'

The JSON output includes everything you need for CI integration:

{
  "pack": "packs/starter",
  "contracts_count": 1,
  "provider_run": {
    "provider": "recorded",
    "model": "recorded",
    "steps": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "status": "Pass",
        "fingerprint": "71ac81a7..."
      }
    ]
  }
}
```

Parsing results

Extract specific information with jq:

# Count failures
npx replayci --json | jq '[.provider_run.steps[] | select(.status == "Fail")] | length'

# Get failure fingerprints
npx replayci --json | jq '.provider_run.steps[] | select(.status == "Fail") | .fingerprint'

# Check if determinism proof passed
npx replayci --repeat 3 --json | jq '.determinism_proof.proven'
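On runners without jq, plain grep over the saved JSON covers the common case of counting failures. A sketch against the sample output shown above; the file name `results.json` is arbitrary, and in CI you would populate it with `npx replayci --provider recorded --json > results.json` instead of the heredoc:

```shell
# Write the sample output from earlier to a file for the sketch.
cat > results.json <<'EOF'
{
  "pack": "packs/starter",
  "contracts_count": 1,
  "provider_run": {
    "provider": "recorded",
    "model": "recorded",
    "steps": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "status": "Pass",
        "fingerprint": "71ac81a7..."
      }
    ]
  }
}
EOF

# grep -c exits non-zero when the count is 0, so append `|| true`
# to avoid failing the step when there are no failures at all.
fails=$(grep -c '"status": "Fail"' results.json || true)
echo "failing steps: $fails"
```

This is cruder than jq (it matches on exact key/value formatting), so prefer jq when it is available.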

The two-lane CI model

ReplayCI is designed around a two-lane CI architecture that separates deterministic safety from live advisory testing.

Lane A — Hard gate (merge-blocking)

  • Runs against recorded fixtures only (no live API calls)
  • Deterministic — same input always produces the same output
  • Fast — no network latency
  • Free — no API costs
  • Blocks merges on failure

Lane A catches regressions in your contract definitions, fixture files, and runner logic. If a recorded test that used to pass now fails, something changed in your code.

What blocks a merge:

  • A contract assertion fails against a recorded fixture
  • A new failure fingerprint appears that wasn't in the previous baseline
  • The unknown classification rate exceeds 20%
  • A NonReproducible result on the deterministic corpus (without an allowlist entry)

Lane B — Evidence lane (advisory)

  • Runs against live providers (OpenAI, Anthropic, etc.)
  • Non-deterministic — model responses can vary
  • Advisory only — never blocks merges
  • Results feed the dashboard for trending and comparison

Lane B answers questions like: "Does gpt-4o-mini still call my tools correctly?" and "How does Anthropic compare to OpenAI on my contracts?"

Setting up both lanes

The two-lane model maps naturally to CI jobs:

# Lane A: merge-blocking
replayci-gate:
  script: npx replayci --provider recorded

# Lane B: advisory
replayci-live:
  allow_failure: true
  script: npx replayci --provider openai --model gpt-4o-mini

Determinism proof in CI

Verify that your provider returns consistent results by running contracts multiple times:

npx replayci --provider openai --model gpt-4o-mini --repeat 3

This runs every contract 3 times and compares fingerprints. If all runs produce identical fingerprints for each step, the proof passes.
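The comparison itself is simple: the proof holds only if every repeat of a step yields the same fingerprint. A sketch of that check with hypothetical fingerprint values, not real ReplayCI output:

```shell
# Fingerprints for one step across three --repeat runs (hypothetical values).
run1="71ac81a7"
run2="71ac81a7"
run3="71ac81a7"

if [ "$run1" = "$run2" ] && [ "$run2" = "$run3" ]; then
  proven="true"
else
  proven="false"
fi
echo "determinism proven: $proven"
```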

The JSON output includes a determinism_proof field:

{
  "determinism_proof": {
    "proven": true,
    "total_runs": 3,
    "per_step": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "deterministic": true,
        "run_count": 3
      }
    ]
  }
}

Use this to build confidence before promoting a model version or adding new contracts.


Keeping recordings up to date

When you change your tool definitions or message prompts, your recorded fixtures become stale. The recorded provider detects this via boundary hash validation and flags it as NonReproducible.
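The idea behind boundary hash validation can be sketched as hashing the request boundary (tool definitions plus messages) and comparing it to the hash stored with the fixture. The strings and the use of `cksum` below are purely illustrative, not ReplayCI's actual hashing scheme:

```shell
# Hash of the boundary when the fixture was captured (illustrative input).
recorded=$(printf '%s' "tools-v1 + messages-v1" | cksum)
# Hash of the boundary as it exists now, after a prompt change.
current=$(printf '%s' "tools-v1 + messages-v2" | cksum)

if [ "$current" = "$recorded" ]; then
  verdict="recording valid"
else
  verdict="NonReproducible: boundary changed, re-capture recordings"
fi
echo "$verdict"
```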

To refresh your recordings:

# Re-capture from live provider
npx replayci --provider openai --model gpt-4o-mini --capture-recordings

# Verify the new recordings pass
npx replayci --provider recorded

# Commit the updated recordings
git add packs/*/recordings/
git commit -m "Update recorded fixtures"

A good workflow is to update recordings in a dedicated PR, separate from feature changes. This keeps your CI gate stable and makes recording changes easy to review.


Environment variables reference

Variable               Purpose                      Required in CI?
REPLAYCI_API_KEY       Push results to dashboard    No (optional)
REPLAYCI_PROVIDER_KEY  API key for live providers   Only for Lane B
REPLAYCI_API_URL       Override dashboard URL       No (default: https://app.replayci.com)

For recorded-only CI (Lane A), no environment variables are needed.


Next steps

  • Write contracts: See Writing Tests for the full YAML format
  • Understand providers: See Providers for how provider abstraction works
  • Debug failures: See Troubleshooting for common CI issues