
# Providers

ReplayCI supports multiple LLM providers through a unified interface. Write your contracts once, run them against any provider — same YAML, same assertions, same golden fixtures.


## Supported providers

| Provider | Flag value | API |
|----------|------------|-----|
| OpenAI | `openai` | Chat Completions (`/v1/chat/completions`) |
| Anthropic | `anthropic` | Messages (`/v1/messages`) |
| Recorded | `recorded` | Offline — reads from fixture files |

Set the provider in `.replayci.yml` or via CLI flag:

```yaml
# .replayci.yml
provider: openai
model: gpt-4o-mini
```

```sh
# Or override with flags
npx replayci --provider anthropic --model claude-sonnet-4-6
```

## How provider abstraction works

Each provider adapter translates between the ReplayCI contract format and the provider's native API. This happens transparently — your contracts never reference provider-specific fields.

What the adapters handle for you:

- **Tool definitions** — OpenAI uses `{ type: "function", function: { parameters } }`, Anthropic uses `{ input_schema }`. You write tools once; the adapter translates.
- **System messages** — OpenAI accepts `role: "system"` in the messages array. Anthropic expects a separate `system` field. The adapter extracts and routes it correctly.
- **Tool call responses** — OpenAI returns `tool_calls[].function.arguments` as a JSON string. Anthropic returns `tool_use` content blocks with `input` as an object. Both are normalized to `{ id, name, arguments }` before your assertions run, with `arguments` stored as a JSON string. When you use nested paths like `$.tool_calls[0].arguments.location`, ReplayCI auto-parses that string during path traversal.
- **Tool choice** — `"auto"`, `"required"`, `"none"`, or a specific tool name. Each maps to the provider's native format.

The result: contract assertions like `$.tool_calls[0].name` work identically regardless of which provider executed the request.
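As a rough illustration of the normalization step, here is a sketch of how the two tool-call shapes could be mapped onto the common `{ id, name, arguments }` form. The function and type names are hypothetical, not ReplayCI internals; only the input shapes follow the public OpenAI and Anthropic response formats.

```typescript
// Hypothetical sketch of adapter normalization (names are illustrative).
interface NormalizedToolCall {
  id: string;
  name: string;
  arguments: string; // always a JSON string in the normalized shape
}

// OpenAI: tool_calls[].function.arguments is already a JSON string.
function fromOpenAI(call: {
  id: string;
  function: { name: string; arguments: string };
}): NormalizedToolCall {
  return { id: call.id, name: call.function.name, arguments: call.function.arguments };
}

// Anthropic: tool_use blocks carry `input` as an object, so it is serialized.
function fromAnthropic(block: {
  id: string;
  name: string;
  input: Record<string, unknown>;
}): NormalizedToolCall {
  return { id: block.id, name: block.name, arguments: JSON.stringify(block.input) };
}
```

Assertions like `$.tool_calls[0].name` then only ever see the normalized shape.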


## API key setup

ReplayCI uses a single environment variable for all providers:

```sh
export REPLAYCI_PROVIDER_KEY="sk-..."
```

This key is passed to whichever provider you select. For OpenAI, it's your OpenAI API key. For Anthropic, it's your Anthropic API key.

The `recorded` provider doesn't need an API key.


## Model resolution

ReplayCI resolves model names through a registry before calling the provider API. This gives you short aliases, family-based request profiles, and validation — without changing how you write contracts.

### How it works

Resolution is a 3-step process:

1. **Alias lookup** — short names like `5.2` or `opus-4` expand to full model IDs (`gpt-5.2`, `claude-opus-4-20250514`)
2. **Family matching** — the resolved ID is tested against family prefix patterns (e.g., `^gpt-5`, `^claude-`). The first matching family provides the request profile.
3. **Passthrough** — if no family matches, the model ID is sent as-is with provider defaults. A warning is emitted to stderr.

The resolved raw model ID is always what gets sent to the provider API and recorded in baseline keys. Aliases are a convenience layer — they never appear in run artifacts.
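The three steps can be sketched as a toy resolver. The alias and family entries below are a small subset of the registry tables that follow; the function names and data structures are illustrative, not ReplayCI's actual implementation.

```typescript
// Toy model resolver sketch (registry subset; names are illustrative).
const ALIASES: Record<string, string> = {
  "4o-mini": "gpt-4o-mini",
  "opus-4": "claude-opus-4-20250514",
};

const FAMILIES = [
  { name: "GPT-5", pattern: /^gpt-5/ },
  { name: "GPT-4", pattern: /^gpt-4/ },
  { name: "Claude", pattern: /^claude-/ },
];

function resolveModel(input: string): { modelId: string; family: string | null } {
  // Step 1: alias lookup (case-insensitive, whitespace-trimmed).
  const key = input.trim().toLowerCase();
  const modelId = ALIASES[key] ?? key;
  // Step 2: family matching; the first matching prefix pattern wins.
  const family = FAMILIES.find((f) => f.pattern.test(modelId));
  // Step 3: passthrough; no family means provider defaults apply.
  return { modelId, family: family?.name ?? null };
}
```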

### Aliases

Aliases are short names that expand to full model IDs:

| Alias | Resolves to | Provider |
|-------|-------------|----------|
| `opus-4` | `claude-opus-4-20250514` | Anthropic |
| `sonnet-4.6` | `claude-sonnet-4-6` | Anthropic |
| `sonnet-4.5` | `claude-sonnet-4-5-20250929` | Anthropic |
| `haiku-4.5` | `claude-haiku-4-5-20251001` | Anthropic |
| `5.2` | `gpt-5.2` | OpenAI |
| `5` | `gpt-5` | OpenAI |
| `5-mini` | `gpt-5-mini` | OpenAI |
| `4.1` | `gpt-4.1` | OpenAI |
| `4.1-mini` | `gpt-4.1-mini` | OpenAI |
| `4.1-nano` | `gpt-4.1-nano` | OpenAI |
| `4o` | `gpt-4o` | OpenAI |
| `4o-mini` | `gpt-4o-mini` | OpenAI |

Aliases are case-insensitive and whitespace-trimmed. You can use them anywhere a model ID is accepted:

```sh
npx replayci --provider openai --model 4o-mini
npx replayci --provider anthropic --model opus-4
```

### Families and request profiles

Each model family has a regex pattern and a RequestProfile that controls how the provider adapter builds API requests:

| Family | Pattern | Token field | Temperature | API format |
|--------|---------|-------------|-------------|------------|
| GPT-5 | `^gpt-5` | `max_completion_tokens` | yes | `chat_completions` |
| GPT-4 | `^gpt-4` | `max_tokens` | yes | `chat_completions` |
| O-series | `^o[1-9]` | `max_completion_tokens` | no | `chat_completions` |
| Claude | `^claude-` | `max_tokens` | yes | `messages` |

The `token_field` determines whether the adapter sends `max_tokens` or `max_completion_tokens` in the request body. The `supports_temperature` flag controls whether a `temperature` parameter is included — reasoning models (O-series) don't support it. The `required_headers` field adds provider-specific headers (e.g., `anthropic-version: 2023-06-01` for Claude).

This means you don't need to remember which parameter each model family expects — the resolver handles it based on the registry.
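A minimal sketch of how a profile could drive request construction. The field and function names here are assumptions for illustration, not ReplayCI's actual types.

```typescript
// Sketch: a request profile selects the token field and gates temperature.
interface RequestProfile {
  tokenField: "max_tokens" | "max_completion_tokens";
  supportsTemperature: boolean;
}

function buildBody(
  profile: RequestProfile,
  opts: { model: string; maxTokens: number; temperature?: number }
): Record<string, unknown> {
  const body: Record<string, unknown> = { model: opts.model };
  // max_tokens vs max_completion_tokens, per the family table above.
  body[profile.tokenField] = opts.maxTokens;
  if (profile.supportsTemperature && opts.temperature !== undefined) {
    body.temperature = opts.temperature; // omitted for O-series reasoning models
  }
  return body;
}
```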

### Passthrough behavior

Models not matching any family pattern still work — they pass through to the provider with sensible defaults:

- **OpenAI default** — `max_tokens`, temperature enabled, `chat_completions` format
- **Anthropic default** — `max_tokens`, temperature enabled, `messages` format with the `anthropic-version` header

A warning is printed to stderr:

```
⚠ Model "my-custom-model" not found in registry; using openai defaults
```

Use `--strict-model` to turn this warning into a hard error:

```sh
npx replayci --provider openai --model unknown-model --strict-model
# Error: Model "unknown-model" has no matching family for provider openai
```

### Inspecting resolution

Use `--resolve` to see how a model name resolves without running contracts:

```sh
npx replayci --provider openai --model 5.2 --resolve
```

Use `replayci models --provider openai` to list all registered families and aliases for a provider. See the CLI Reference for full details.


## The recorded provider

The `recorded` provider is the key to fast, free, deterministic testing. Instead of calling a live API, it reads responses from recording files stored alongside your fixtures.

### Why use it

- **No API costs** — recorded responses are free to replay
- **Deterministic** — the same input always produces the same output
- **Fast** — no network latency; runs in milliseconds
- **Offline** — works without internet access
- **CI-friendly** — no API keys needed in your CI environment for recorded tests

### How it works

1. You run your contracts against a live provider with `--capture-recordings`
2. ReplayCI saves each response as a `.recording.json` file
3. Later, you run with `--provider recorded` and ReplayCI reads those files instead of calling the API

```sh
# Step 1: Capture responses from a live provider
npx replayci --provider openai --model gpt-4o-mini --capture-recordings

# Step 2: Run offline using captured responses
npx replayci --provider recorded
```

### Recording file location

Recording files live in a `recordings/` directory next to your `golden/` directory:

```
packs/my-pack/
  golden/
    tool_call.success.json              # your fixture
  recordings/
    tool_call.success.recording.json    # captured response
```

The naming convention is automatic: `foo.json` → `foo.recording.json` in the `recordings/` directory.
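The mapping can be expressed as a small path helper. This is a hypothetical sketch of the convention, not a ReplayCI API:

```typescript
// Sketch: map a golden fixture path to its recording path
// (golden/foo.json -> recordings/foo.recording.json).
function recordingPath(fixturePath: string): string {
  return fixturePath
    .replace(/(^|\/)golden\//, "$1recordings/")
    .replace(/\.json$/, ".recording.json");
}
```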

### What's in a recording file

A recording captures everything needed to replay a response:

```json
{
  "schema_version": "1.0",
  "boundary": {
    "version": "1.0",
    "provider": "openai",
    "model_id": "gpt-4o-mini",
    "tool_schema_hash": "a3f82c91b4d7e6f0",
    "tool_choice_mode": "auto",
    "system_prompt_hash": "7b2d4f1e8a9c3d56",
    "messages_hash": "c5e8f2a1d6b94370"
  },
  "response": {
    "success": true,
    "tool_calls": [
      {
        "id": "call_abc123",
        "name": "get_weather",
        "arguments": "{\"location\": \"San Francisco, CA\"}"
      }
    ],
    "content": null,
    "error": null
  },
  "metadata": {
    "recorded_at": "2026-03-01T10:00:00Z",
    "original_latency_ms": 150,
    "model_version": "gpt-4o-mini-2024-07-18",
    "usage": {
      "prompt_tokens": 95,
      "completion_tokens": 22,
      "total_tokens": 117
    }
  }
}
```

### Boundary validation

When the recorded provider loads a recording, it validates the boundary — a set of hashes that ensure the recording matches the current fixture:

| Field | What it checks |
|-------|----------------|
| `provider` | Must match the original provider |
| `tool_schema_hash` | SHA-256 of your tool definitions (sorted, canonical) |
| `messages_hash` | SHA-256 of your messages array |
| `tool_choice_mode` | Must match the original tool choice setting |

If your tool definitions or messages change after capturing a recording, the hashes won't match. ReplayCI flags this as a `NonReproducible` result with a clear reason code:

- `SCHEMA_DRIFT` — tool definitions changed since the recording was captured
- `NON_DETERMINISTIC_INPUT` — messages or tool choice changed
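The boundary check can be sketched as follows. The exact hashing scheme (SHA-256 of a canonical JSON serialization, truncated to 16 hex characters) is an assumption inferred from the example recording; the function names are illustrative.

```typescript
import { createHash } from "node:crypto";

// Assumed canonical hash: SHA-256 over JSON, truncated to 16 hex chars.
function canonicalHash(value: unknown): string {
  return createHash("sha256").update(JSON.stringify(value)).digest("hex").slice(0, 16);
}

type Verdict = "OK" | "SCHEMA_DRIFT" | "NON_DETERMINISTIC_INPUT";

// Sketch of comparing a recording's boundary against the current fixture.
function checkBoundary(
  recorded: { tool_schema_hash: string; messages_hash: string; tool_choice_mode: string },
  current: { tools: unknown; messages: unknown; toolChoice: string }
): Verdict {
  if (recorded.tool_schema_hash !== canonicalHash(current.tools)) return "SCHEMA_DRIFT";
  if (
    recorded.messages_hash !== canonicalHash(current.messages) ||
    recorded.tool_choice_mode !== current.toolChoice
  ) {
    return "NON_DETERMINISTIC_INPUT";
  }
  return "OK";
}
```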

When this happens, re-capture your recordings:

```sh
npx replayci --provider openai --model gpt-4o-mini --capture-recordings
```

## Switching providers

Same contracts, different provider — just change the flag:

```sh
# Test against OpenAI
npx replayci --provider openai --model gpt-4o-mini

# Same contracts against Anthropic
npx replayci --provider anthropic --model claude-sonnet-4-6

# Offline with recorded fixtures
npx replayci --provider recorded
```

You can capture recordings from each provider separately and test them offline:

```sh
# Capture OpenAI responses
npx replayci --provider openai --model gpt-4o-mini --capture-recordings \
  --pack packs/openai-v0.1

# Capture Anthropic responses
npx replayci --provider anthropic --model claude-sonnet-4-6 --capture-recordings \
  --pack packs/anthropic-v0.1
```

## Shadow mode

Shadow mode lets you compare two providers side-by-side without affecting your primary results:

```sh
npx replayci --provider openai --model gpt-4o-mini \
  --shadow-capture \
  --shadow-provider anthropic --shadow-model claude-sonnet-4-6
```

The primary provider (OpenAI) determines pass/fail. The shadow provider (Anthropic) runs in parallel for comparison only — its results are captured but never affect your CI gate.


## NeverNormalize

Each pack contains a `NeverNormalize.json` file that protects semantically important fields from being normalized away during baseline comparison:

```json
{
  "schema_version": "1.0",
  "pack_id": "my-pack",
  "fields": [
    "tool_calls[].name",
    "tool_calls[].arguments",
    "content",
    "success"
  ]
}
```

These four fields should always be listed — they're the core semantic content of every response. Add any additional fields that are important to your specific use case (e.g., specific argument paths).

Do not list volatile fields like timestamps or request IDs — those should be normalized for stable fingerprints.
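How a pattern like `tool_calls[].name` is matched against concrete response paths isn't specified here; one plausible sketch, purely illustrative and not ReplayCI's matcher, treats `[]` as matching any numeric index:

```typescript
// Hypothetical matcher: "[]" in a NeverNormalize pattern matches any
// numeric index in the concrete path (e.g. tool_calls[0].name).
function matchesPattern(pattern: string, path: string): boolean {
  const re = new RegExp(
    "^" +
      pattern
        .replace(/[.[\]]/g, "\\$&")              // escape ., [, ]
        .replace(/\\\[\\\]/g, "\\[\\d+\\]") +    // "[]" -> any index
      "$"
  );
  return re.test(path);
}
```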

Every pack directory must contain this file, even if you're just starting out. The starter pack includes one with sensible defaults.


## Next steps