SDK Integration — ReplayCI
The ReplayCI SDK lets you validate tool calls and capture observations directly from your application code. Instead of running tests separately with the CLI, you embed validation into your LLM pipeline.
Two packages:
- @replayci/replay — observe tool calls passively and validate responses against contracts
- @replayci/contracts-core — shared contract types and evaluation engine (used internally by @replayci/replay)
Install
npm install @replayci/replay
This pulls in @replayci/contracts-core automatically. You also need your LLM provider SDK:
# OpenAI
npm install openai
# Anthropic
npm install @anthropic-ai/sdk
Both provider SDKs are optional peer dependencies — install whichever you use.
Observe — passive capture
observe() wraps your LLM client and captures every tool call in the background. No code changes to your existing logic.
import OpenAI from "openai";
import { observe } from "@replayci/replay";
const openai = new OpenAI();
// Start observing — captures all tool calls automatically
const handle = observe(openai, {
apiKey: process.env.REPLAYCI_API_KEY,
agent: "my-agent",
});
// Use your client normally — observe is transparent
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "What's the weather in SF?" }],
tools: [{ type: "function", function: { name: "get_weather", ... } }],
});
// Check health at any time
const health = handle.getHealth();
console.log(health.state); // "active" | "stopped" | "inactive"
console.log(health.runtime); // captures_seen, queue_size, circuit status
// When done, restore the original client
handle.restore();
How it works
observe() transparently intercepts LLM calls, captures tool-call data, and sends it to the dashboard asynchronously. Your application code is unaffected — the original response is always returned untouched. The server auto-generates contracts from captured calls.
Provider detection
The SDK auto-detects your provider from the client shape:
- OpenAI — patches client.chat.completions.create
- Anthropic — patches client.messages.create
No configuration needed. If detection fails, observe() returns a no-op handle (your app continues normally).
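If you want to confirm at startup that detection succeeded, the documented getHealth() fields are enough. The sketch below is an illustration, not SDK code — the ObserveHealth type is a local copy of the fields described in the Health monitoring section, not an import:

```typescript
// Local copy of the documented getHealth() fields (not an SDK import).
type ObserveHealth = {
  state: "active" | "stopped" | "inactive";
  runtime: { circuit_open_until: string | null };
};

// True when the session is active and the circuit breaker is not currently open.
function captureIsHealthy(health: ObserveHealth, now: Date = new Date()): boolean {
  if (health.state !== "active") return false;
  const until = health.runtime.circuit_open_until;
  return until === null || new Date(until) <= now;
}
```

A no-op handle from failed detection would report a non-active state, so a check like `captureIsHealthy(handle.getHealth())` can flag the problem in logs without affecting your application.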
Options
observe(client, {
// Required
apiKey: "rci_live_...", // or set REPLAYCI_API_KEY env var
// Optional
agent: "my-agent", // agent name for grouping (default: "default")
captureLevel: "redacted", // privacy tier (see below)
endpoint: "https://...", // custom API endpoint
maxBuffer: 100, // max buffered items (default: 100, max: 1000)
flushMs: 5000, // flush interval in ms (default: 5000)
timeoutMs: 5000, // API timeout in ms (default: 5000, max: 10000)
stateDir: ".replayci/runtime", // health store directory (default: .replayci/runtime)
disabled: false, // disable capture entirely
diagnostics: (event) => {}, // callback for diagnostic events
});
Privacy tiers
Control what gets captured with captureLevel:
| Tier | Tool names | Arguments | Messages | Content |
|---|---|---|---|---|
| metadata | Yes | No | No | No |
| redacted (default) | Yes | Yes | No | No |
| full | Yes | Yes | Yes | Yes |
Use the full capture level only in development or staging. Messages may contain sensitive customer data, internal IDs, or credentials. The default redacted tier is recommended for production — it captures tool names and arguments but not conversation messages.
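One way to follow that guidance is to derive captureLevel from the deployment environment. This is a sketch under the assumption that your app uses the NODE_ENV convention; adapt it to however you distinguish environments:

```typescript
// Capture levels as documented in the table above.
type CaptureLevel = "metadata" | "redacted" | "full";

// Assumption: NODE_ENV distinguishes production from dev/staging.
function captureLevelFor(env: string | undefined): CaptureLevel {
  // "full" only outside production, per the guidance above.
  return env === "production" ? "redacted" : "full";
}

// Usage sketch:
// observe(client, { apiKey, captureLevel: captureLevelFor(process.env.NODE_ENV) });
```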
Streaming support
observe() handles streaming responses automatically. When your code uses stream: true, the SDK collects chunks and captures the complete response after the stream finishes.
Circuit breaker
If the capture API fails 5 times in a row, the SDK auto-disables for 10 minutes. Your application is never affected by capture failures.
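The breaker policy can be pictured as a small state machine. This sketch mirrors the documented thresholds (5 consecutive failures, 10-minute cooldown) purely for illustration — it is not the SDK's internal code:

```typescript
// Illustration of the documented breaker policy, not the SDK's implementation.
class CircuitBreaker {
  private failures = 0;
  private openUntil: number | null = null;

  constructor(private threshold = 5, private cooldownMs = 10 * 60_000) {}

  // Called after each failed flush; opens the circuit at the threshold.
  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openUntil = now + this.cooldownMs;
  }

  // Any success resets the count and closes the circuit.
  recordSuccess(): void {
    this.failures = 0;
    this.openUntil = null;
  }

  // While open, captures are dropped instead of sent.
  isOpen(now: number = Date.now()): boolean {
    return this.openUntil !== null && now < this.openUntil;
  }
}
```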
Health monitoring
Every observe() handle exposes getHealth() — a snapshot of the SDK's internal state. Use it for runtime health checks or pass it to monitoring.
const handle = observe(openai, {
apiKey: process.env.REPLAYCI_API_KEY,
agent: "my-agent",
});
const health = handle.getHealth();
// Session identity
health.session_id; // "obs_..." — unique per observe() call
health.agent; // "my-agent"
health.provider; // "openai" | "anthropic" | null
// Session state
health.state; // "active" | "stopped" | "inactive"
health.activation; // { active: true, reason_code: "ok", activated_at: "..." }
// Runtime stats
health.runtime.captures_seen; // total captures this session
health.runtime.queue_size; // pending items in buffer
health.runtime.consecutive_failures; // flush failures in a row
health.runtime.circuit_open_until; // null or ISO timestamp
health.runtime.last_flush_error; // null or error message
The SDK also writes health snapshots to disk at .replayci/runtime/observe-sessions/{session_id}.json. The CLI's replayci doctor command reads these files to diagnose SDK integration issues without touching your application code.
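If you want to read those snapshots yourself (for a custom health endpoint, say), the path layout above is all you need. A minimal sketch, assuming each file is JSON in the same shape getHealth() returns:

```typescript
import * as fs from "fs";
import * as path from "path";

// Read every on-disk session snapshot under {stateDir}/observe-sessions/.
// Assumes each *.json file matches the getHealth() shape documented above.
function readObserveSessions(stateDir = ".replayci/runtime"): Array<Record<string, unknown>> {
  const dir = path.join(stateDir, "observe-sessions");
  if (!fs.existsSync(dir)) return []; // no sessions recorded yet
  return fs
    .readdirSync(dir)
    .filter((f) => f.endsWith(".json"))
    .map((f) => JSON.parse(fs.readFileSync(path.join(dir, f), "utf8")));
}
```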
Disable at runtime
# Disable via environment variable
export REPLAYCI_DISABLE=true
Or pass disabled: true in options.
Validate — in-code contract checks
validate() checks an LLM response against your contracts synchronously. Use this to catch contract violations at runtime.
import OpenAI from "openai";
import { prepareContracts, validate } from "@replayci/replay";
const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-pack");
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "What's the weather?" }],
tools: [...],
});
const result = validate(response, { contracts });
if (!result.pass) {
console.error("Contract violations:", result.failures);
// [{ path: "$.tool_calls[0].name", operator: "equals",
// expected: "get_weather", found: "search", message: "..." }]
}
Loading contracts
prepareContracts() accepts multiple input formats:
// From a pack directory (loads all contracts)
const contracts = prepareContracts("./packs/my-pack");
// From specific files
const contracts = prepareContracts([
"./contracts/weather.yaml",
"./contracts/search.yaml",
]);
// From contract objects directly
const contracts = prepareContracts({
tool: "get_weather",
assertions: {
output_invariants: [
{ path: "$.tool_calls[0].name", equals: "get_weather" },
],
},
});
Validation result
type ValidationResult = {
pass: boolean; // true if all contracts pass
failures: ContractFailure[]; // list of violations
matched_contracts: number; // how many contracts matched
unmatched_tools: string[]; // tool calls with no matching contract
evaluation_ms: number; // how long validation took
};
type ContractFailure = {
path: string; // JSON path that failed (e.g. "$.tool_calls[0].name")
operator: string; // which check failed (e.g. "equals", "type", "exists")
expected: unknown; // what the contract expected
found: unknown; // what the response contained
message?: string; // human-readable description
contract_file?: string; // which contract file triggered this
};
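When logging violations, it helps to flatten each failure into one line. A small formatting sketch over the ContractFailure shape above (the type is copied locally so the example is self-contained):

```typescript
// Local copy of the ContractFailure shape documented above.
type ContractFailure = {
  path: string;
  operator: string;
  expected: unknown;
  found: unknown;
  message?: string;
  contract_file?: string;
};

// One-line summary suitable for structured logs.
function formatFailure(f: ContractFailure): string {
  const loc = f.contract_file ? ` (${f.contract_file})` : "";
  return `${f.path} ${f.operator}: expected ${JSON.stringify(f.expected)}, found ${JSON.stringify(f.found)}${loc}`;
}
```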
Unmatched tool policy
By default, tool calls with no matching contract cause a failure. Change this with unmatchedPolicy:
// Fail if any tool call has no contract (default)
validate(response, { contracts, unmatchedPolicy: "deny" });
// Ignore tool calls without contracts
validate(response, { contracts, unmatchedPolicy: "allow" });
Provider response formats
validate() handles responses from both OpenAI and Anthropic natively. It auto-detects the format and normalizes tool calls for evaluation.
// OpenAI response — works directly
const openaiResponse = await openai.chat.completions.create({ ... });
validate(openaiResponse, { contracts });
// Anthropic response — works directly
const anthropicResponse = await anthropic.messages.create({ ... });
validate(anthropicResponse, { contracts });
// Pre-normalized response — also works
validate({
tool_calls: [{ id: "1", name: "get_weather", arguments: '{"location":"SF"}' }],
}, { contracts });
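To make the pre-normalized shape concrete, here is roughly what normalizing an OpenAI-style response looks like. The SDK does this internally; the sketch below uses simplified response types and exists only to show the mapping between the two shapes:

```typescript
// The pre-normalized tool-call shape shown above.
type NormalizedToolCall = { id: string; name: string; arguments: string };

// Simplified stand-in for an OpenAI chat completion response.
type OpenAIShaped = {
  choices: Array<{
    message: {
      tool_calls?: Array<{ id: string; function: { name: string; arguments: string } }>;
    };
  }>;
};

// Flatten the first choice's tool calls into the normalized shape.
function normalizeOpenAI(response: OpenAIShaped): { tool_calls: NormalizedToolCall[] } {
  const calls = response.choices[0]?.message.tool_calls ?? [];
  return {
    tool_calls: calls.map((c) => ({ id: c.id, name: c.function.name, arguments: c.function.arguments })),
  };
}
```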
Multi-turn agents
The CLI runner evaluates one LLM response per contract — it does not chain tool results back into follow-up turns. For agents that call multiple tools across a conversation loop, use validate() inside your agent's own loop to check each turn independently.
import OpenAI from "openai";
import { observe, prepareContracts, validate } from "@replayci/replay";
const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-agent");
// Start observing all turns
const handle = observe(openai, {
apiKey: process.env.REPLAYCI_API_KEY,
agent: "export-compliance",
});
// Your agent's conversation loop
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
{ role: "system", content: "You are an export compliance agent..." },
{ role: "user", content: "Classify this shipment to Canada..." },
];
// Assumes tools, maxTurns, and executeToolLocally are defined elsewhere in your app
const turnResults = [];
for (let turn = 0; turn < maxTurns; turn++) {
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages,
tools,
});
// Validate THIS turn's tool calls against contracts
const result = validate(response, { contracts, unmatchedPolicy: "allow" });
turnResults.push({ turn, ...result });
if (!result.pass) {
console.error(`Turn ${turn} contract violation:`, result.failures);
// Decide: retry, fallback, or abort
}
// Extract tool calls and feed results back for next turn
const toolCalls = response.choices[0].message.tool_calls;
if (!toolCalls || toolCalls.length === 0) break;
messages.push(response.choices[0].message);
for (const tc of toolCalls) {
const toolResult = await executeToolLocally(tc);
messages.push({ role: "tool", tool_call_id: tc.id, content: toolResult });
}
}
// Summary: did every turn pass?
const allPassed = turnResults.every(r => r.pass);
console.log(`Agent finished: ${turnResults.length} turns, all passed: ${allPassed}`);
handle.restore();
Why the CLI can't do this
Each CLI contract maps to one fixture (one request/response pair). A 7-tool agent that calls tools across 7 turns will only trigger 1 tool call per CLI run — the other 6 happen in subsequent turns that the CLI doesn't execute. This isn't a bug; the CLI tests individual contract compliance, not full agent trajectories.
Use the SDK for multi-turn validation, and the CLI for single-turn regression testing in CI.
Observe + Validate together
The most common pattern uses both: observe captures calls for the dashboard, validate catches violations in real-time.
import OpenAI from "openai";
import { observe, prepareContracts, validate } from "@replayci/replay";
const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-pack");
// Start observing
const handle = observe(openai, {
apiKey: process.env.REPLAYCI_API_KEY,
agent: "weather-agent",
});
// Make the call
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "What's the weather?" }],
tools: [...],
});
// Validate locally
const result = validate(response, { contracts });
if (!result.pass) {
// Handle violation — log, retry, fallback, etc.
}
// Clean up when done
handle.restore();
Diagnostics
Pass a diagnostics callback to observe() to trace what the SDK is doing:
observe(openai, {
apiKey: process.env.REPLAYCI_API_KEY,
diagnostics: (event) => {
switch (event.type) {
case "double_wrap":
// observe() called twice on same client
console.warn("Client already observed");
break;
case "unsupported_client":
// Client shape not recognized
console.warn("Unsupported client:", event.detail);
break;
case "buffer_overflow":
// Too many captures buffered
console.warn(`Dropped ${event.dropped} captures`);
break;
case "flush_error":
// Failed to send captures to API
console.warn("Flush failed:", event.error);
break;
case "circuit_open":
// Too many consecutive failures — captures paused
console.warn(`Circuit breaker open for ${event.backoffMs / 60000} min`);
break;
}
},
});
Environment variables
| Variable | Description |
|---|---|
| REPLAYCI_API_KEY | API key for capture ingestion (fallback if not passed in options) |
| REPLAYCI_DISABLE | Set to true to disable all capture (1, yes, on also work) |
| REPLAYCI_API_URL | Custom API endpoint (default: https://app.replayci.com) |
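The truthiness rule for REPLAYCI_DISABLE (true, 1, yes, on) can be captured in one helper. A sketch of that parsing logic, useful if you mirror the same convention for your own flags:

```typescript
// Accepts the documented truthy spellings, case-insensitively.
function isDisabled(value: string | undefined): boolean {
  return ["true", "1", "yes", "on"].includes((value ?? "").trim().toLowerCase());
}

// Usage sketch: isDisabled(process.env.REPLAYCI_DISABLE)
```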
What happens after capture
Once observe() sends captures to the server:
- Contracts auto-generated — the server infers contracts from observed tool calls (structure, types, schema bounds)
- Confidence scoring — contracts gain confidence as more samples arrive (low < 5, medium 5–9, high ≥ 10)
- Dashboard visibility — captured tools appear on the Contracts page with coverage analysis
- Guard evaluation — the Guard page shows pass rates and failure patterns across all captured calls
See Dashboard Guide for how to review and promote auto-generated contracts.
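The confidence thresholds above reduce to a simple bucketing function. A sketch, included only to make the cutoffs concrete:

```typescript
// Bucketing per the documented thresholds: low < 5, medium 5–9, high ≥ 10.
function confidenceFor(samples: number): "low" | "medium" | "high" {
  if (samples >= 10) return "high";
  if (samples >= 5) return "medium";
  return "low";
}
```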
Next steps
- Check integration health — run replayci doctor to verify the SDK is capturing and uploading correctly
- Pull contracts locally — run replayci sync to materialize reviewed contracts for local testing and CI
- Review captured contracts — Dashboard Guide
- Write contracts manually — Writing Tests
- Add to CI — CI Integration