Zero-Cost AI Testing: Record Once, Replay Forever

Orchid Team, 5 min read

How do you write tests for code that calls an LLM? If you've shipped an AI feature, you've wrestled with this question, and you've probably landed on one of two unsatisfying answers.

Option one: hit the live API in CI. Your tests are realistic, but they're also slow, flaky, and non-deterministic. The same prompt can return different completions on different runs, so assertions are fuzzy at best. And every CI run costs real money. Multiply that by every push on every branch and the bill adds up fast.

Option two: hand-write mocks. Your tests are fast and free, but now you're maintaining a parallel universe of fake LLM responses. Every time a prompt changes, the mocks drift. Eventually they describe a model that doesn't exist, and your green test suite is testing fiction.

There's a third option: record real responses once, then replay them forever. Orchid makes this a first-class workflow.

How Replay Mode Works

Orchid is a recording proxy that sits between your app and your LLM providers. In capture mode, it records every request and response pair to a local SQLite database. The full story is in "Record, Inspect, Replay."

Replay mode is where the recording pays for itself. When the proxy runs in replay mode, it blocks all outbound network traffic. Each incoming request is hashed by its semantic content, matched against the recording, and answered with the stored response. If no match exists, the proxy returns a deterministic mock error instead of silently calling the provider.
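
To make the matching step concrete, here is a minimal sketch of content-based lookup. It is not Orchid's actual implementation. The choice of fields to hash (`model` and `messages`), the SHA-256 key over canonical JSON, the `ReplayStore` class, and the shape of the error payload are all assumptions made for illustration.

```python
import hashlib
import json

# Assumption: which request fields count as "semantic content" is a guess.
SEMANTIC_FIELDS = ("model", "messages")


def request_key(body):
    """Hash only the fields assumed to affect the completion, in canonical form."""
    semantic = {k: body[k] for k in SEMANTIC_FIELDS if k in body}
    canonical = json.dumps(semantic, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReplayStore:
    """Hypothetical in-memory stand-in for the recording database."""

    def __init__(self):
        self._responses = {}

    def record(self, body, response):
        self._responses[request_key(body)] = response

    def replay(self, body):
        key = request_key(body)
        if key in self._responses:
            return self._responses[key]
        # No match: return the same error every time and never call the provider.
        return {"error": {"type": "replay_miss", "request_key": key}}
```

Because the key is built from sorted, whitespace-free JSON, two requests that differ only in key order or in fields outside `SEMANTIC_FIELDS` map to the same recording in this sketch.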

Three properties fall out of this design.

Zero API cost. No outbound calls means no tokens billed. Ever.
Deterministic results. The same request always gets the same response, so your assertions can be exact.
Offline execution. CI runs with no provider credentials and no network dependency on OpenAI, Anthropic, or anyone else.

The Core Workflow

The loop looks like this regardless of language or framework.

1. Run your test or application once in capture mode against the live API. The proxy records everything.
2. Export the session as a JSON fixture and commit it to your repository.
3. Configure CI to import the fixture and run in replay mode. Tests execute instantly, offline, and for free.
4. When a prompt changes meaningfully, re-record. The fixture diff in your pull request shows reviewers exactly how the model interaction changed.

That last point is an underrated benefit. Fixtures are reviewable artifacts. A prompt change shows up in version control alongside the real response it produced.
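
If you want to preview that diff before opening a pull request, a small helper can do it. The sketch below assumes only that a fixture is valid JSON; it knows nothing about Orchid's fixture schema, and `fixture_diff` is a hypothetical name. Sorting keys first keeps the diff focused on content changes rather than key order.

```python
import difflib
import json


def fixture_diff(old_text, new_text):
    """Return a unified diff of two JSON fixtures after canonical formatting."""

    def canonical(text):
        # Re-serialize with sorted keys so ordering noise never shows up as a change.
        return json.dumps(json.loads(text), indent=2, sort_keys=True).splitlines()

    lines = difflib.unified_diff(
        canonical(old_text),
        canonical(new_text),
        fromfile="fixture (before)",
        tofile="fixture (after)",
        lineterm="",
    )
    return "\n".join(lines)
```

An empty string means the two fixtures carry the same data, even if the files differ byte for byte.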

Python with pytest

The Python SDK wraps the whole loop in a decorator. Bind a test to a fixture file, and the decorator handles recording and replaying.

import orchid
from openai import OpenAI

@orchid.replay("tests/fixtures/summarizer_test.json")
def test_summarizer():
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize: the quick brown fox..."}]
    )
    assert response.choices[0].message.content

Run once with ORCHID_RECORD=true to capture live responses into the fixture. Every subsequent run replays from the file. No mocking library, no patching, no fake response objects to maintain. TypeScript projects get the same pattern with the withReplay() helper from the npm package.
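
If the record-then-replay toggle feels abstract, the pattern itself fits in a few lines. The sketch below is not the Orchid SDK and does not intercept HTTP traffic. `record_or_replay` and the `SKETCH_RECORD` variable are hypothetical names, and it simply caches one JSON-serializable return value per fixture file.

```python
import functools
import json
import os
from pathlib import Path


def record_or_replay(fixture_path):
    """Cache a function's JSON-serializable result in a fixture file.

    Hypothetical helper for illustration only; this is not the Orchid SDK.
    """
    path = Path(fixture_path)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            recording = os.environ.get("SKETCH_RECORD") == "true"
            if not recording:
                if not path.exists():
                    raise FileNotFoundError(f"no fixture at {path}; record first")
                return json.loads(path.read_text())
            result = func(*args, **kwargs)  # the only place a live call happens
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(result, indent=2, sort_keys=True))
            return result

        return wrapper

    return decorator
```

This version ignores the function's arguments when replaying, so it suits one fixture per call site. A proxy that keys recordings by request content avoids that limitation.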

Whole-Application Replay

The decorator approach is great for unit tests, but you can also replay an entire application run without touching the code. The thin SDK reads its configuration from environment variables, so switching modes is a shell-level concern.

export ORCHID_SESSION_ID="my-app-run-001"
export ORCHID_MODE="replay"
python your_app.py

Your application executes its full logic, every LLM call answered from the recording. This is ideal for integration tests, demos that need to work without network access, and reproducing a production issue locally using the exact responses that triggered it.
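
For integration tests, you may prefer to set those variables per run rather than exporting them in the shell. Here is one way that could look from Python. The two variable names come from the snippet above; `run_in_replay` and `run_script_in_replay` are hypothetical helpers, not part of any SDK.

```python
import os
import subprocess
import sys


def run_in_replay(command, session_id):
    """Run a command with the replay variables set, leaving os.environ untouched."""
    env = dict(os.environ)  # copy, so the parent process keeps its own mode
    env["ORCHID_SESSION_ID"] = session_id
    env["ORCHID_MODE"] = "replay"
    return subprocess.run(command, env=env, capture_output=True, text=True, check=False)


def run_script_in_replay(script_path, session_id):
    """Convenience wrapper: run a Python script with the current interpreter."""
    return run_in_replay([sys.executable, script_path], session_id)
```

A test can then call `run_script_in_replay("your_app.py", "my-app-run-001")` and assert on `returncode` and `stdout`.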

No SDK? No Problem

The replay machinery is driven by HTTP headers, so any language can use it. Record with X-Orchid-Mode: capture, then replay by changing that single header to replay.

curl http://localhost:4320/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "X-Orchid-Session-Id: manual-test-1" \
  -H "X-Orchid-Mode: replay" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
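
The same request can be built with nothing but the Python standard library. This is a sketch: the URL and header names are taken from the curl example, `build_replay_request` is a hypothetical helper, and the Authorization header is left out on the assumption that replay never reaches the provider. Add it back if your setup expects it.

```python
import json
import urllib.request

# Proxy endpoint as used in the curl example.
PROXY_URL = "http://localhost:4320/v1/chat/completions"


def build_replay_request(session_id, payload, mode="replay"):
    """Build (but do not send) a chat completion request routed through the proxy."""
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Orchid-Session-Id": session_id,
            "X-Orchid-Mode": mode,  # "capture" to record, "replay" to replay
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` inside a `with` block and decode the body with `json.load`.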

Fixtures move between environments through the control API.

# Export locally after recording
curl -H "X-Orchid-Api-Key: $ORCHID_API_KEY" \
  http://localhost:4321/v1/sessions/manual-test-1/export > fixture.json

# Import in CI before the replay run
curl -X POST -H "X-Orchid-Api-Key: $ORCHID_API_KEY" \
  -H "Content-Type: application/json" \
  --data @fixture.json \
  http://localhost:4321/v1/sessions/import
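
If your CI setup step is a Python script rather than a shell script, the two control API calls above translate directly. The endpoints and the X-Orchid-Api-Key header are copied from the curl commands; `export_request` and `import_request` are hypothetical helper names, and nothing here assumes anything about the fixture's contents.

```python
import urllib.request
from pathlib import Path

# Control API base URL as used in the curl examples.
CONTROL_URL = "http://localhost:4321"


def export_request(session_id, api_key):
    """Build the GET request that downloads a recorded session as a fixture."""
    return urllib.request.Request(
        f"{CONTROL_URL}/v1/sessions/{session_id}/export",
        headers={"X-Orchid-Api-Key": api_key},
    )


def import_request(fixture_path, api_key):
    """Build the POST request that uploads a fixture file before a replay run."""
    return urllib.request.Request(
        f"{CONTROL_URL}/v1/sessions/import",
        data=Path(fixture_path).read_bytes(),
        headers={"X-Orchid-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Both helpers only build the request, so they can be unit tested without a running proxy. Send either one with `urllib.request.urlopen`.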

More language examples live in "No SDK Required."

What About Streaming?

Streaming responses are recorded too. The proxy buffers SSE chunks while serving them to your client in real time, then writes the fully reassembled completion to the database. On replay, your application receives the response just as it would from the provider. Your streaming code paths get tested along with everything else.
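
The buffer-while-forwarding idea can be sketched as a generator. This is an illustration, not the proxy's code. It assumes OpenAI-style chunks (`data: {...}` lines carrying `choices[0].delta.content`, ending with `data: [DONE]`), and `tee_sse_stream` and its `on_complete` callback are hypothetical names. It also stores only the assembled text, whereas a real recorder would need to keep enough detail to stream the response back.

```python
import json


def tee_sse_stream(lines, on_complete):
    """Yield each SSE line to the client as it arrives, then hand off the full text."""
    parts = []
    for line in lines:
        yield line  # forward immediately so the client still sees a live stream
        if not line.startswith("data: "):
            continue  # blank separators and other SSE fields carry no content
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            continue
        choices = json.loads(payload).get("choices") or []
        if choices:
            delta = choices[0].get("delta", {})
            parts.append(delta.get("content") or "")
    # Reached only after the upstream stream is fully consumed.
    on_complete("".join(parts))
```

One limitation of this sketch: if the client stops reading partway through, the generator never finishes and `on_complete` is not called.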

A Bonus You Didn't Ask For

Because replayed responses arrive with near-zero latency, replay mode also strips network time and provider variance out of your measurements. That makes it a surprisingly good harness for profiling your own agent logic. We dig into that in "Benchmarking Your Agent Logic Without the Network Noise."

Try It on One Test

You don't need to migrate your whole suite to see the value. Pick one test that currently hits a live API or leans on a brittle mock. Run the proxy, record a fixture, switch to replay, and watch the test become fast, free, and deterministic.

docker run -d \
  --name orchid-proxy \
  -p 4320:4320 -p 4321:4321 \
  -v orchid-data:/data \
  -e ORCHID_API_KEY=your-secure-api-key \
  -e ORCHID_DB_PATH=/data/orchid.db \
  ghcr.io/mario-guerra/orchid-proxy:latest

Your test suite should not have a monthly bill. Get started at orchidtrace.xyz.