Get Orchid
Back to Blog

Record, Inspect, Replay. A Better Way to Build AI Agents

Orchid Team7 min read

If you're building with LLMs, you've probably noticed something uncomfortable. The hardest part isn't writing the code. It's everything that happens after the code runs.

Your agent fails, and the logs tell you nothing useful. Your test suite either hits live APIs and costs real money, or it relies on hand-written mocks that drift out of date the moment you change a prompt. And at the end of the month, your provider bill arrives with a number nobody can explain.

These feel like three separate problems. They're not. They're all symptoms of the same root cause. You can't see the traffic between your application and the model.

Orchid fixes that with one simple idea. Record everything, then build on the recording.

The Three Problems Every AI Team Hits

Let's name them properly, because you've likely felt all three.

Invisible failures. An agent makes a dozen LLM calls, picks tools, branches on responses, and somewhere in that chain something goes wrong. Traditional logs give you a flat stream of text. The actual decision data, the prompts, the completions, the token counts, the latencies, is scattered or missing entirely. Debugging becomes archaeology.

Fragile, expensive tests. How do you test code that calls an LLM? Most teams choose between two bad options. Hit the live API in CI, which is slow, flaky, and costs money on every run. Or hand-write mocks, which are tedious to build and silently rot as your prompts evolve.

Unaccountable spend. Token costs are tiny per call and shocking in aggregate. A stuck agent loop can burn through real money before anyone notices. Without per-session, per-step cost attribution, you're budgeting blind.

Each of these problems has spawned its own category of tooling. Observability platforms for the first. Mocking libraries for the second. Cost dashboards for the third. Three vendors, three integrations, three places your data lives.

But notice what all three problems have in common. They would each be trivial if you had a complete, queryable record of every request and response that crossed the wire.

One Proxy, Complete Visibility

Orchid is a lightweight proxy that sits between your application and your LLM providers. Your app points at the proxy instead of the provider. That's the whole integration.

From that position, the proxy sees everything. Every prompt, every completion, every streaming chunk, every token count, every millisecond of latency. It records all of it to a local SQLite database, and everything else Orchid does is built on that recording.

And it isn't limited to LLM traffic. Agents call search APIs, vector stores, and internal services too, and one extra header routes any HTTP API through the same recording. Your tool calls land in the same session timeline as your prompts, and they replay in tests just like everything else.

Here's what that means in practice.

Inspect Any Run, Step by Step

Open the embedded web visualizer and you get a chronological timeline of every exchange in a session. Each call is a clickable step. You see which calls succeeded, which failed, and how long each one took. Click any step and the full payload opens, including the system messages, parameters, and the raw response from the provider.

It feels like a browser network inspector for your AI pipeline. No grepping. No reconstructing state from log lines. Just point, click, and see exactly what your agent saw.

We wrote more about why this matters in Why Debugging AI Pipelines Is Broken.

Let Your AI Debug Your AI

Orchid ships with a built-in Model Context Protocol server. That means your coding assistant, whether it's in Cursor, VS Code, or Claude Desktop, can query your recorded sessions directly.

Ask your assistant why last night's pipeline run failed, and it can list the sessions, pull the step outline, search the payloads for errors, and report back with the failing exchange and the exact response that caused it. Your debugging partner has access to the same evidence you do.

This changes the debugging workflow entirely. Read the full walkthrough in Let Your AI Debug Your AI.

Test for Free with Deterministic Replay

Because Orchid records complete request and response pairs, it can serve them back. Switch the proxy to replay mode and it stops making outbound calls entirely. Incoming requests are matched against the recording by a semantic hash of the prompt, and the stored response is returned instantly.

The result is a test suite that runs offline, deterministically, and at zero API cost. Record a session once, export it as a JSON fixture, commit it to your repo, and replay it in CI forever. No hand-written mocks, no drift, no flaky network calls.

We cover the full workflow in Zero-Cost AI Testing.

Know What Everything Costs

Every recorded exchange gets a real-time USD cost attribution based on a pricing engine you control. Costs roll up per session, per step, and per provider, so you can answer questions like "what did that agent run actually cost?" and "which step in the pipeline is burning the budget?" the moment they come up.

More on the pricing engine in Know What Every Agent Run Costs.

What Makes This Approach Different

You might be wondering how this compares to the LLM observability platforms you've seen. A few design choices set Orchid apart.

Zero instrumentation. You don't wrap your LLM calls, add decorators around your generation code, or adopt a framework. You change a base URL, or let the thin SDK patch the HTTP transport layer for you. Your prompt-handling code stays untouched.

Any language. The proxy is driven entirely by HTTP headers. Python, TypeScript, and Rust get first-class SDKs, but Go, Java, Ruby, or a shell script can capture and replay traffic with two headers. See No SDK Required for examples.

Any API, not just LLMs. The same record and replay machinery works for every HTTP call your agent makes. Search APIs, vector stores, internal services, all captured in the same timeline and all served back deterministically in tests.

Local first. Orchid is a single binary with an embedded UI and a SQLite database. Your prompts and completions never leave your infrastructure. There's no cloud backend, no phone-home, and secret-like headers and fields are redacted before anything touches disk. We go deeper on this in Local-First Observability.

One tool, three jobs. Debugging, testing, and cost tracking all run on the same recording. You set it up once and every capability comes along for free.

Where Orchid Shines

A few scenarios where teams get value on day one.

  • Debugging a stuck agent. Open the timeline, spot the loop visually, click into the exchange where reasoning went sideways. We walk through a real case in How to Debug a Stuck LangChain Agent in 30 Seconds.
  • CI for LLM apps. Record fixtures locally, replay in CI. Tests run in milliseconds with zero API spend.
  • Performance profiling. Replay mode removes network latency and provider variance, so you can benchmark your own agent logic in isolation. Details in Benchmarking Your Agent Logic Without the Network Noise.
  • Cost audits. Pull per-session cost summaries, find the expensive steps, and fix them before the invoice does it for you.
  • Agent-assisted triage. Connect your coding assistant over MCP and let it investigate failures with real evidence instead of guesses.

Getting Started Takes Five Minutes

The proxy ships as a Docker image with multi-architecture support.

docker run -d \
  --name orchid-proxy \
  -p 4320:4320 \
  -p 4321:4321 \
  -v orchid-data:/data \
  -e ORCHID_API_KEY=your-secure-api-key \
  -e ORCHID_DB_PATH=/data/orchid.db \
  ghcr.io/mario-guerra/orchid-proxy:latest

Point your LLM client at http://localhost:4320/v1, run your application, and open http://localhost:4321 to watch the recording appear. From there, every capability in this article is one header or one click away.

Building AI agents is hard enough without flying blind. Record your traffic, inspect it when things break, replay it when you test, and track every dollar as you go. Have questions about latency, security, or production use? The FAQ has honest answers. Visit orchidtrace.xyz or check out the GitHub repository to get started.