SDK Integration — ReplayCI
The ReplayCI SDK lets you validate tool calls and capture observations directly from your application code. Instead of running tests separately with the CLI, you embed validation into your LLM pipeline.
Two packages:
- @replayci/replay — observe tool calls passively and validate responses against contracts
- @replayci/contracts-core — shared contract types and evaluation engine (used internally by @replayci/replay)
Install
npm install @replayci/replay
This pulls in @replayci/contracts-core automatically. You also need your LLM provider SDK:
# OpenAI
npm install openai
# Anthropic
npm install @anthropic-ai/sdk
Both provider SDKs are optional peer dependencies — install whichever you use.
Observe — passive capture
observe() wraps your LLM client and captures every tool call in the background. No code changes to your existing logic.
import OpenAI from "openai";
import { observe } from "@replayci/replay";
const openai = new OpenAI();
// Start observing — captures all tool calls automatically
const handle = observe(openai, {
apiKey: process.env.REPLAYCI_API_KEY,
agent: "my-agent",
});
// Use your client normally — observe is transparent
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "What's the weather in SF?" }],
tools: [{ type: "function", function: { name: "get_weather", ... } }],
});
// Check health at any time
const health = handle.getHealth();
console.log(health.state); // "active" | "stopped" | "inactive"
console.log(health.runtime); // captures_seen, queue_size, circuit status
// When done, restore the original client
handle.restore();
How it works
observe() transparently intercepts LLM calls, captures tool-call data, and sends it to the dashboard asynchronously. Your application code is unaffected — the original response is always returned untouched. The server auto-generates contracts from captured calls.
Provider detection
The SDK auto-detects your provider from the client shape:
- OpenAI — patches client.chat.completions.create
- Anthropic — patches client.messages.create
No configuration needed. If detection fails, observe() returns a no-op handle (your app continues normally).
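If you want to confirm at startup that detection succeeded, the documented getHealth() fields are enough. The sketch below is an illustration, not SDK code — the ObserveHealth type is a local copy of the fields described in the Health monitoring section, not an import:

```typescript
// Local copy of the documented getHealth() fields (not an SDK import).
type ObserveHealth = {
  state: "active" | "stopped" | "inactive";
  runtime: { circuit_open_until: string | null };
};

// True when the session is active and the circuit breaker is not currently open.
function captureIsHealthy(health: ObserveHealth, now: Date = new Date()): boolean {
  if (health.state !== "active") return false;
  const until = health.runtime.circuit_open_until;
  return until === null || new Date(until) <= now;
}
```

A no-op handle from failed detection would report a non-active state, so a check like `captureIsHealthy(handle.getHealth())` can flag the problem in logs without affecting your application.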
Options
observe(client, {
// Required
apiKey: "rci_live_...", // or set REPLAYCI_API_KEY env var
// Optional
agent: "my-agent", // agent name for grouping (default: "default")
captureLevel: "redacted", // privacy tier (see below)
endpoint: "https://...", // custom API endpoint
maxBuffer: 100, // max buffered items (default: 100, max: 1000)
flushMs: 5000, // flush interval in ms (default: 5000)
timeoutMs: 5000, // API timeout in ms (default: 5000, max: 10000)
stateDir: ".replayci/runtime", // health store directory (default: .replayci/runtime)
disabled: false, // disable capture entirely
diagnostics: (event) => {}, // callback for diagnostic events
});
Privacy tiers
Control what gets captured with captureLevel:
| Tier | Tool names | Arguments | Messages | Content |
|---|---|---|---|---|
| metadata | Yes | No | No | No |
| redacted (default) | Yes | Yes | No | No |
| full | Yes | Yes | Yes | Yes |
Use the full capture level only in development or staging. Messages may contain sensitive customer data, internal IDs, or credentials. The default redacted tier is recommended for production — it captures tool names and arguments but not conversation messages.
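One way to follow that guidance is to derive captureLevel from the deployment environment. This is a sketch under the assumption that your app uses the NODE_ENV convention; adapt it to however you distinguish environments:

```typescript
// Capture levels as documented in the table above.
type CaptureLevel = "metadata" | "redacted" | "full";

// Assumption: NODE_ENV distinguishes production from dev/staging.
function captureLevelFor(env: string | undefined): CaptureLevel {
  // "full" only outside production, per the guidance above.
  return env === "production" ? "redacted" : "full";
}

// Usage sketch:
// observe(client, { apiKey, captureLevel: captureLevelFor(process.env.NODE_ENV) });
```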
Streaming support
observe() handles streaming responses automatically. When your code uses stream: true, the SDK collects chunks and captures the complete response after the stream finishes.
Circuit breaker
If the capture API fails 5 times in a row, the SDK auto-disables for 10 minutes. Your application is never affected by capture failures.
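The breaker policy can be pictured as a small state machine. This sketch mirrors the documented thresholds (5 consecutive failures, 10-minute cooldown) purely for illustration — it is not the SDK's internal code:

```typescript
// Illustration of the documented breaker policy, not the SDK's implementation.
class CircuitBreaker {
  private failures = 0;
  private openUntil: number | null = null;

  constructor(private threshold = 5, private cooldownMs = 10 * 60_000) {}

  // Called after each failed flush; opens the circuit at the threshold.
  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openUntil = now + this.cooldownMs;
  }

  // Any success resets the count and closes the circuit.
  recordSuccess(): void {
    this.failures = 0;
    this.openUntil = null;
  }

  // While open, captures are dropped instead of sent.
  isOpen(now: number = Date.now()): boolean {
    return this.openUntil !== null && now < this.openUntil;
  }
}
```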
Health monitoring
Every observe() handle exposes getHealth() — a snapshot of the SDK's internal state. Use it for runtime health checks or pass it to monitoring.
const handle = observe(openai, {
apiKey: process.env.REPLAYCI_API_KEY,
agent: "my-agent",
});
const health = handle.getHealth();
// Session identity
health.session_id; // "obs_..." — unique per observe() call
health.agent; // "my-agent"
health.provider; // "openai" | "anthropic" | null
// Session state
health.state; // "active" | "stopped" | "inactive"
health.activation; // { active: true, reason_code: "ok", activated_at: "..." }
// Runtime stats
health.runtime.captures_seen; // total captures this session
health.runtime.queue_size; // pending items in buffer
health.runtime.consecutive_failures; // flush failures in a row
health.runtime.circuit_open_until; // null or ISO timestamp
health.runtime.last_flush_error; // null or error message
The SDK also writes health snapshots to disk at .replayci/runtime/observe-sessions/{session_id}.json. The CLI's replayci doctor command reads these files to diagnose SDK integration issues without touching your application code.
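If you want to read those snapshots yourself (for a custom health endpoint, say), the path layout above is all you need. A minimal sketch, assuming each file is JSON in the same shape getHealth() returns:

```typescript
import * as fs from "fs";
import * as path from "path";

// Read every on-disk session snapshot under {stateDir}/observe-sessions/.
// Assumes each *.json file matches the getHealth() shape documented above.
function readObserveSessions(stateDir = ".replayci/runtime"): Array<Record<string, unknown>> {
  const dir = path.join(stateDir, "observe-sessions");
  if (!fs.existsSync(dir)) return []; // no sessions recorded yet
  return fs
    .readdirSync(dir)
    .filter((f) => f.endsWith(".json"))
    .map((f) => JSON.parse(fs.readFileSync(path.join(dir, f), "utf8")));
}
```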
Disable at runtime
# Disable via environment variable
export REPLAYCI_DISABLE=true
Or pass disabled: true in options.
Validate — in-code contract checks
validate() checks an LLM response against your contracts synchronously. Use this to catch contract violations at runtime.
import OpenAI from "openai";
import { prepareContracts, validate } from "@replayci/replay";
const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-pack");
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "What's the weather?" }],
tools: [...],
});
const result = validate(response, { contracts });
if (!result.pass) {
console.error("Contract violations:", result.failures);
// [{ path: "$.tool_calls[0].name", operator: "equals",
// expected: "get_weather", found: "search", message: "..." }]
}
Loading contracts
prepareContracts() accepts multiple input formats:
// From a pack directory (loads all contracts)
const contracts = prepareContracts("./packs/my-pack");
// From specific files
const contracts = prepareContracts([
"./contracts/weather.yaml",
"./contracts/search.yaml",
]);
// From contract objects directly
const contracts = prepareContracts({
tool: "get_weather",
assertions: {
output_invariants: [
{ path: "$.tool_calls[0].name", equals: "get_weather" },
],
},
});
Validation result
type ValidationResult = {
pass: boolean; // true if all contracts pass
failures: ContractFailure[]; // list of violations
matched_contracts: number; // how many contracts matched
unmatched_tools: string[]; // tool calls with no matching contract
evaluation_ms: number; // how long validation took
};
type ContractFailure = {
path: string; // JSON path that failed (e.g. "$.tool_calls[0].name")
operator: string; // which check failed (e.g. "equals", "type", "exists")
expected: unknown; // what the contract expected
found: unknown; // what the response contained
message?: string; // human-readable description
contract_file?: string; // which contract file triggered this
};
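When logging violations, it helps to flatten each failure into one line. A small formatting sketch over the ContractFailure shape above (the type is copied locally so the example is self-contained):

```typescript
// Local copy of the ContractFailure shape documented above.
type ContractFailure = {
  path: string;
  operator: string;
  expected: unknown;
  found: unknown;
  message?: string;
  contract_file?: string;
};

// One-line summary suitable for structured logs.
function formatFailure(f: ContractFailure): string {
  const loc = f.contract_file ? ` (${f.contract_file})` : "";
  return `${f.path} ${f.operator}: expected ${JSON.stringify(f.expected)}, found ${JSON.stringify(f.found)}${loc}`;
}
```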
Unmatched tool policy
By default, tool calls with no matching contract cause a failure. Change this with unmatchedPolicy:
// Fail if any tool call has no contract (default)
validate(response, { contracts, unmatchedPolicy: "deny" });
// Ignore tool calls without contracts
validate(response, { contracts, unmatchedPolicy: "allow" });
Provider response formats
validate() handles responses from both OpenAI and Anthropic natively. It auto-detects the format and normalizes tool calls for evaluation.
// OpenAI response — works directly
const openaiResponse = await openai.chat.completions.create({ ... });
validate(openaiResponse, { contracts });
// Anthropic response — works directly
const anthropicResponse = await anthropic.messages.create({ ... });
validate(anthropicResponse, { contracts });
// Pre-normalized response — also works
validate({
tool_calls: [{ id: "1", name: "get_weather", arguments: '{"location":"SF"}' }],
}, { contracts });
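To make the pre-normalized shape concrete, here is roughly what normalizing an OpenAI-style response looks like. The SDK does this internally; the sketch below uses simplified response types and exists only to show the mapping between the two shapes:

```typescript
// The pre-normalized tool-call shape shown above.
type NormalizedToolCall = { id: string; name: string; arguments: string };

// Simplified stand-in for an OpenAI chat completion response.
type OpenAIShaped = {
  choices: Array<{
    message: {
      tool_calls?: Array<{ id: string; function: { name: string; arguments: string } }>;
    };
  }>;
};

// Flatten the first choice's tool calls into the normalized shape.
function normalizeOpenAI(response: OpenAIShaped): { tool_calls: NormalizedToolCall[] } {
  const calls = response.choices[0]?.message.tool_calls ?? [];
  return {
    tool_calls: calls.map((c) => ({ id: c.id, name: c.function.name, arguments: c.function.arguments })),
  };
}
```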
Multi-turn agents
The CLI runner evaluates one LLM response per contract — it does not chain tool results back into follow-up turns. For agents that call multiple tools across a conversation loop, use validate() inside your agent's own loop to check each turn independently.
import OpenAI from "openai";
import { observe, prepareContracts, validate } from "@replayci/replay";
const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-agent");
// Start observing all turns
const handle = observe(openai, {
apiKey: process.env.REPLAYCI_API_KEY,
agent: "export-compliance",
});
// Your agent's conversation loop
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
{ role: "system", content: "You are an export compliance agent..." },
{ role: "user", content: "Classify this shipment to Canada..." },
];
// Assumes tools, maxTurns, and executeToolLocally are defined elsewhere in your app
const turnResults = [];
for (let turn = 0; turn < maxTurns; turn++) {
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages,
tools,
});
// Validate THIS turn's tool calls against contracts
const result = validate(response, { contracts, unmatchedPolicy: "allow" });
turnResults.push({ turn, ...result });
if (!result.pass) {
console.error(`Turn ${turn} contract violation:`, result.failures);
// Decide: retry, fallback, or abort
}
// Extract tool calls and feed results back for next turn
const toolCalls = response.choices[0].message.tool_calls;
if (!toolCalls || toolCalls.length === 0) break;
messages.push(response.choices[0].message);
for (const tc of toolCalls) {
const toolResult = await executeToolLocally(tc);
messages.push({ role: "tool", tool_call_id: tc.id, content: toolResult });
}
}
// Summary: did every turn pass?
const allPassed = turnResults.every(r => r.pass);
console.log(`Agent finished: ${turnResults.length} turns, all passed: ${allPassed}`);
handle.restore();
Why the CLI can't do this
Each CLI contract maps to one fixture (one request/response pair). A 7-tool agent that calls tools across 7 turns will only trigger 1 tool call per CLI run — the other 6 happen in subsequent turns that the CLI doesn't execute. This isn't a bug; the CLI tests individual contract compliance, not full agent trajectories.
Use the SDK for multi-turn validation, and the CLI for single-turn regression testing in CI.
Observe + Validate together
The most common pattern uses both: observe captures calls for the dashboard, validate catches violations in real-time.
import OpenAI from "openai";
import { observe, prepareContracts, validate } from "@replayci/replay";
const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-pack");
// Start observing
const handle = observe(openai, {
apiKey: process.env.REPLAYCI_API_KEY,
agent: "weather-agent",
});
// Make the call
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "What's the weather?" }],
tools: [...],
});
// Validate locally
const result = validate(response, { contracts });
if (!result.pass) {
// Handle violation — log, retry, fallback, etc.
}
// Clean up when done
handle.restore();
Diagnostics
Pass a diagnostics callback to observe() to trace what the SDK is doing:
observe(openai, {
apiKey: process.env.REPLAYCI_API_KEY,
diagnostics: (event) => {
switch (event.type) {
case "double_wrap":
// observe() called twice on same client
console.warn("Client already observed");
break;
case "unsupported_client":
// Client shape not recognized
console.warn("Unsupported client:", event.detail);
break;
case "buffer_overflow":
// Too many captures buffered
console.warn(`Dropped ${event.dropped} captures`);
break;
case "flush_error":
// Failed to send captures to API
console.warn("Flush failed:", event.error);
break;
case "circuit_open":
// Too many consecutive failures — captures paused
console.warn(`Circuit breaker open for ${event.backoffMs / 60000} min`);
break;
}
},
});
Environment variables
| Variable | Description |
|---|---|
| REPLAYCI_API_KEY | API key for capture ingestion (fallback if not passed in options) |
| REPLAYCI_DISABLE | Set to true to disable all capture (1, yes, on also work) |
| REPLAYCI_API_URL | Custom API endpoint (default: https://app.replayci.com) |
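The truthiness rule for REPLAYCI_DISABLE (true, 1, yes, on) can be captured in one helper. A sketch of that parsing logic, useful if you mirror the same convention for your own flags:

```typescript
// Accepts the documented truthy spellings, case-insensitively.
function isDisabled(value: string | undefined): boolean {
  return ["true", "1", "yes", "on"].includes((value ?? "").trim().toLowerCase());
}

// Usage sketch: isDisabled(process.env.REPLAYCI_DISABLE)
```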
What happens after capture
Once observe() sends captures to the server:
- Contracts auto-generated — the server infers contracts from observed tool calls (structure, types, schema bounds)
- Confidence scoring — contracts gain confidence as more samples arrive (low < 5, medium 5–9, high ≥ 10)
- Dashboard visibility — captured tools appear on the Contracts page with coverage analysis
- Guard evaluation — the Guard page shows pass rates and failure patterns across all captured calls
See Dashboard Guide for how to review and promote auto-generated contracts.
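The confidence thresholds above reduce to a simple bucketing function. A sketch, included only to make the cutoffs concrete:

```typescript
// Bucketing per the documented thresholds: low < 5, medium 5–9, high ≥ 10.
function confidenceFor(samples: number): "low" | "medium" | "high" {
  if (samples >= 10) return "high";
  if (samples >= 5) return "medium";
  return "low";
}
```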
Next steps
- Check integration health — run replayci doctor to verify the SDK is capturing and uploading correctly
- Pull contracts locally — run replayci sync to materialize reviewed contracts for local testing and CI
- Review captured contracts — Dashboard Guide
- Write contracts manually — Writing Tests
- Add to CI — CI Integration