# CI Integration — ReplayCI
Add ReplayCI to your CI pipeline to catch tool-call regressions before they reach production. This guide covers GitHub Actions, GitLab CI, and general CI setup.
## Quick setup
The simplest CI integration is one line:
```bash
npx replayci --provider recorded
```
This runs your contracts against recorded fixtures — no API keys needed, no network calls, deterministic results. If any contract fails, the command exits with a non-zero code and your CI pipeline fails.
## GitHub Actions
### Basic workflow
```yaml
# .github/workflows/replayci.yml
name: ReplayCI Gate

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  replayci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Run ReplayCI contracts
        run: npx replayci --provider recorded
```
That's it. Contracts run against recorded fixtures, and the results appear in your PR checks.
### With dashboard push
To see results in the ReplayCI dashboard, add your API key:
```yaml
- name: Run ReplayCI contracts
  env:
    REPLAYCI_API_KEY: ${{ secrets.REPLAYCI_API_KEY }}
  run: npx replayci --provider recorded
```
Results are automatically pushed to app.replayci.com where you can see run history, failure trends, and fingerprint tracking. Push is non-blocking — if the dashboard is unreachable, your CI still passes or fails based on contract results alone.
### With live provider testing
For advisory live-provider testing alongside your recorded gate:
```yaml
jobs:
  # Hard gate — recorded fixtures, merge-blocking
  replayci-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Recorded gate
        run: npx replayci --provider recorded

  # Advisory — live provider, non-blocking
  replayci-live:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    continue-on-error: true  # never blocks merge
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Live provider check
        env:
          REPLAYCI_PROVIDER_KEY: ${{ secrets.OPENAI_API_KEY }}
          REPLAYCI_API_KEY: ${{ secrets.REPLAYCI_API_KEY }}
        run: npx replayci --provider openai --model gpt-4o-mini
```
This follows the two-lane pattern: the recorded gate blocks merges (Lane A), while the live check provides advisory evidence (Lane B).
## GitLab CI
```yaml
# .gitlab-ci.yml
replayci:
  image: node:20
  stage: test
  script:
    - npm ci
    - npx replayci --provider recorded
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
```
With dashboard push:
```yaml
replayci:
  image: node:20
  stage: test
  variables:
    REPLAYCI_API_KEY: $REPLAYCI_API_KEY
  script:
    - npm ci
    - npx replayci --provider recorded
```
## General CI setup
For any CI system, the pattern is the same:
- Install dependencies (`npm ci`)
- Run `npx replayci --provider recorded`
- Check the exit code
### Exit codes
| Code | Meaning | CI behavior |
|---|---|---|
| `0` | All contracts passed | Pipeline passes |
| `1` | One or more contracts failed, or a runtime error | Pipeline fails |
| `2` | Drift or unknown-rate gate failure | Pipeline fails |
## JSON output in CI
In non-TTY environments (CI pipelines, piped output), ReplayCI automatically outputs JSON instead of pretty-printed text. You can also force it with `--json`:
```bash
npx replayci --provider recorded --json | jq '.provider_run.steps[] | select(.status == "Fail")'
```
The JSON output includes everything you need for CI integration:
```json
{
  "pack": "packs/starter",
  "contracts_count": 1,
  "provider_run": {
    "provider": "recorded",
    "model": "recorded",
    "steps": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "status": "Pass",
        "fingerprint": "71ac81a7..."
      }
    ]
  }
}
```
### Parsing results
Extract specific information with jq:
```bash
# Count failures
npx replayci --json | jq '[.provider_run.steps[] | select(.status == "Fail")] | length'

# Get failure fingerprints
npx replayci --json | jq '.provider_run.steps[] | select(.status == "Fail") | .fingerprint'

# Check if determinism proof passed
npx replayci --repeat 3 --json | jq '.determinism_proof.proven'
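If `jq` isn't available in your CI image, the same checks are easy to script in any language. Here is a minimal Python sketch that assumes the JSON shape shown above; the `replayci_output` string stands in for output you would capture from the CLI:

```python
import json

# Stand-in for output captured from `npx replayci --provider recorded --json`.
replayci_output = """
{
  "pack": "packs/starter",
  "contracts_count": 1,
  "provider_run": {
    "provider": "recorded",
    "model": "recorded",
    "steps": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "status": "Pass",
        "fingerprint": "71ac81a7..."
      }
    ]
  }
}
"""

run = json.loads(replayci_output)
steps = run["provider_run"]["steps"]
failures = [s for s in steps if s["status"] == "Fail"]

print(f"{len(steps) - len(failures)}/{len(steps)} contracts passed")
for step in failures:
    # Fingerprints let you match a failure against previous runs.
    print(f"FAIL {step['contract_path']} fingerprint={step['fingerprint']}")

# In a CI wrapper you would exit with this code, e.g. sys.exit(exit_code).
exit_code = 1 if failures else 0
```

In practice you would pipe the real CLI output into the script instead of embedding a sample string.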
## The two-lane CI model
ReplayCI is designed around a two-lane CI architecture that separates deterministic safety from live advisory testing.
### Lane A — Hard gate (merge-blocking)
- Runs against recorded fixtures only (no live API calls)
- Deterministic — same input always produces the same output
- Fast — no network latency
- Free — no API costs
- Blocks merges on failure
Lane A catches regressions in your contract definitions, fixture files, and runner logic. If a recorded test that used to pass now fails, something changed in your code.
What blocks a merge:

- A contract assertion fails against a recorded fixture
- A new failure fingerprint appears that wasn't in the previous baseline
- The unknown classification rate exceeds 20%
- A `NonReproducible` result on the deterministic corpus (without an allowlist entry)
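The unknown-rate check amounts to a simple ratio over step results. A sketch in Python for illustration only: the `Unknown` status value and the hand-written step list are assumptions, not the CLI's documented output, and the tool enforces this gate itself:

```python
# Sketch of the 20% unknown-rate gate. The "Unknown" status value is an
# assumed placeholder; ReplayCI applies this check internally.
steps = [
    {"contract_path": "a.yaml", "status": "Pass"},
    {"contract_path": "b.yaml", "status": "Unknown"},
    {"contract_path": "c.yaml", "status": "Pass"},
    {"contract_path": "d.yaml", "status": "Pass"},
    {"contract_path": "e.yaml", "status": "Pass"},
]

unknown_rate = sum(s["status"] == "Unknown" for s in steps) / len(steps)
gate_failed = unknown_rate > 0.20  # exit code 2 when the gate trips

print(f"unknown rate: {unknown_rate:.0%}, gate failed: {gate_failed}")
```

Note the strict inequality: one unknown out of five steps is exactly 20% and passes; two would trip the gate.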
### Lane B — Evidence lane (advisory)
- Runs against live providers (OpenAI, Anthropic, etc.)
- Non-deterministic — model responses can vary
- Advisory only — never blocks merges
- Results feed the dashboard for trending and comparison
Lane B answers questions like: "Does gpt-4o-mini still call my tools correctly?" and "How does Anthropic compare to OpenAI on my contracts?"
### Setting up both lanes
The two-lane model maps naturally to CI jobs:
```yaml
# Lane A: merge-blocking
replayci-gate:
  script: npx replayci --provider recorded

# Lane B: advisory
replayci-live:
  allow_failure: true
  script: npx replayci --provider openai --model gpt-4o-mini
```
## Determinism proof in CI
Verify that your provider returns consistent results by running contracts multiple times:
```bash
npx replayci --provider openai --model gpt-4o-mini --repeat 3
```
This runs every contract 3 times and compares fingerprints. If all runs produce identical fingerprints for each step, the proof passes.
The JSON output includes a `determinism_proof` field:
```json
{
  "determinism_proof": {
    "proven": true,
    "total_runs": 3,
    "per_step": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "deterministic": true,
        "run_count": 3
      }
    ]
  }
}
```
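If you want CI to surface exactly which contracts were flaky, the `per_step` entries can be inspected in a script. A minimal Python sketch, assuming the JSON shape shown above (the `proof_output` string stands in for captured CLI output):

```python
import json

# Stand-in for output from a --repeat 3 --json run.
proof_output = """
{
  "determinism_proof": {
    "proven": true,
    "total_runs": 3,
    "per_step": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "deterministic": true,
        "run_count": 3
      }
    ]
  }
}
"""

proof = json.loads(proof_output)["determinism_proof"]

# Collect the contracts whose fingerprints varied across runs.
flaky = [s["contract_path"] for s in proof["per_step"] if not s["deterministic"]]

if proof["proven"]:
    print(f"determinism proven across {proof['total_runs']} runs")
else:
    print("flaky contracts:", flaky)
```

Printing the flaky contract paths in the job log saves a round trip to the dashboard when a determinism check fails.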
Use this to build confidence before promoting a model version or adding new contracts.
## Keeping recordings up to date
When you change your tool definitions or message prompts, your recorded fixtures become stale. The recorded provider detects this via boundary hash validation and flags it as `NonReproducible`.
To refresh your recordings:
```bash
# Re-capture from live provider
npx replayci --provider openai --model gpt-4o-mini --capture-recordings

# Verify the new recordings pass
npx replayci --provider recorded

# Commit the updated recordings
git add packs/*/recordings/
git commit -m "Update recorded fixtures"
```
A good workflow is to update recordings in a dedicated PR, separate from feature changes. This keeps your CI gate stable and makes recording changes easy to review.
## Environment variables reference
| Variable | Purpose | Required in CI? |
|---|---|---|
| `REPLAYCI_API_KEY` | Push results to dashboard | No (optional) |
| `REPLAYCI_PROVIDER_KEY` | API key for live providers | Only for Lane B |
| `REPLAYCI_API_URL` | Override dashboard URL | No (default: `https://app.replayci.com`) |
For recorded-only CI (Lane A), no environment variables are needed.
## Next steps
- Write contracts: See Writing Tests for the full YAML format
- Understand providers: See Providers for how provider abstraction works
- Debug failures: See Troubleshooting for common CI issues