Frequently Asked Questions

Answers to the questions developers and managers ask before routing traffic through Orchid. If yours is missing, ask me.

The Basics

Who are you?

This is me: marioguerra.xyz

This is my consulting venture: ignitionai.xyz

Why did you build this?

Because I needed it. I was building my own AI-enabled products and kept hitting the same wall. When an agent misbehaved, I had no good way to see what actually went over the wire, and when I wanted to test agent logic, my options were burning real tokens on every run or hand-writing mocks that drifted out of date the moment a prompt changed.

I looked at the existing tools and none of them fit how I wanted to work. Most required instrumenting my code with a vendor SDK, shipped my prompts to someone else's cloud, or solved observability without touching the testing problem at all. So I built the tool I wished existed. A local proxy that records everything with a one-line change, replays it deterministically for free, and keeps all of it on my own machine. Orchid is that tool, cleaned up and shared.

Architecture and Reliability

Does the proxy add latency to my LLM calls?

Some, since it is an extra hop in the request path, but it is designed to stay out of the way. The proxy is a compiled Rust binary, and streaming responses are forwarded to your client chunk by chunk as they arrive while recording happens in parallel. Against LLM calls that routinely take seconds, the proxy's overhead is a rounding error. In replay mode it is actually faster than the real provider, since responses come from local storage.

What happens if the proxy goes down? Is it a single point of failure?

If you use one of the SDKs, no. All three SDKs fail soft. The Python SDK health checks the proxy at startup and routes traffic directly upstream if it is unreachable. The TypeScript and Rust SDKs go further and recover per request, so if the proxy dies mid-run they strip the Orchid headers and retry the call against the original provider directly. Your application keeps working, you just lose recording until the proxy is back.

If you integrate without an SDK by pointing a base URL at the proxy, that fallback is your responsibility. A dead proxy means failed calls until you repoint the base URL or restart the proxy.

Why is it shipped as a container?

Because it is the most reliable way to hand you a working product regardless of where you run it. The proxy is a single Rust executable with an embedded UI and database, and the container packages it so the same image runs identically on macOS, Linux, or Windows, Apple Silicon or x86, your laptop or your cloud. Multi-architecture images are published to GHCR with every release. There is no runtime to install and no dependency matrix to debug.

Does the proxy intercept TLS or act as a man-in-the-middle?

No. Orchid is a reverse proxy that your client points at explicitly, not a transparent network interceptor. There are no certificates to install and no traffic is captured that you did not deliberately route through it. Requests you send to the proxy are forwarded over HTTPS to the upstream provider.

Which providers, frameworks, and languages are supported?

OpenAI, Anthropic, and Google Gemini API shapes are recognized automatically, so for those you only change the base URL. Any other HTTP API, including self-hosted OpenAI-compatible servers like Ollama or vLLM, works by adding an X-Orchid-Target-Url header that tells the proxy where the request was headed.

Frameworks built on the standard clients, such as LangChain, LlamaIndex, CrewAI, and AutoGen, are captured automatically. SDKs ship for Python, TypeScript, and Rust, and every other language integrates with a base URL and two headers. See No SDK Required for examples.

Can it record API calls that are not LLM calls?

Yes, and this matters more than it sounds. Agents call search APIs, vector stores, and internal services alongside language models, and failures there are just as hard to debug. Route any HTTP request through the proxy with an X-Orchid-Target-Url header naming the real upstream, and it is recorded in the same session timeline as your prompts, inspectable in the same visualizer, and served back deterministically in replay mode. One recording covers your agent's entire interaction with the outside world, not just the model.

How much disk space will recordings use?

Recordings are bounded by default. The proxy prunes whole sessions oldest-first once they are older than 30 days or once the database exceeds 1 GB, and both limits are configurable, including off entirely. The active session is never pruned mid-run. Everything lives in a single SQLite file you can inspect, copy, or delete with normal tools.

Security and Privacy

Where does my data go?

Nowhere. The proxy forwards requests only to the upstream APIs your application was already calling, and everything it records stays in a local SQLite database on your infrastructure. There is no cloud backend, no phone-home, and no telemetry. Nothing about your usage is sent to me, and your prompts are never used to train anything. The full architecture case is in Local-First Observability.

What happens to my API keys?

Your provider key travels in the Authorization header exactly as before. The proxy forwards it untouched to the upstream and never writes it to disk. Headers, query strings, and body fields with secret-like names, such as keys, tokens, passwords, and cookies, are stored as redacted values, scrubbed in memory before anything is persisted.

One honest caveat. Redaction recognizes field names, not prompt contents. Prompt and completion text is recorded verbatim, which is the point of a forensic recorder, so a secret pasted into a prompt will be stored with the rest of the prompt text. Treat the database file with the same care as your application logs.

Can my whole team share one proxy?

Yes, and a small VM running one container is a common setup. Be aware that access control is currently a single shared API key, not per-user accounts or roles. Everyone with the key can see every recorded session. For most staging environments that is fine, but it is worth knowing before you point anything sensitive at a shared instance.

Testing and Replay

How does replay matching actually work?

In replay mode the proxy blocks all outbound traffic, hashes the semantic content of each incoming request, and serves the matching recorded response from the database. The same request gets the same response every time, which is what makes tests deterministic. If no match is found, the proxy returns a clear mock error instead of silently calling the live API, so a test can never burn tokens by accident. The full workflow is in Zero-Cost AI Testing.

What happens to my fixtures when I change a prompt?

A meaningfully changed request no longer matches the recording, and the replay returns a mock error, which fails your test loudly rather than passing it falsely. The fix is to re-record the fixture, and the resulting diff in your pull request shows reviewers exactly how the model interaction changed. That is a feature. Silently passing tests against stale mocks is how teams ship regressions.

Are streaming responses supported?

Yes. The proxy forwards SSE chunks to your client in real time while buffering them in parallel, then writes the fully reassembled completion to the database. Replay serves the response back so your streaming code paths get exercised in tests like everything else.

The Project

Can I use Orchid in production?

Orchid is a developer tool, and staging is its home. That said, the design leaves the choice to you. The SDKs fail soft if the proxy is unreachable, the proxy deploys to cloud infrastructure independently of any environment, and integration is an environment variable. Some users temporarily point a production run at the proxy to capture data they cannot reproduce elsewhere, then revert. Nothing in the architecture prevents that, and the fail-soft behavior makes it a low-risk maneuver with an SDK in place.

What I am not claiming today is a high-availability production observability platform with SLAs. If that is what you need, you should know that before you adopt, not after.

What does it cost?

Orchid is free to run today. The binary, the SDKs, the visualizer, and the MCP server have no usage fees and no tiers. I will not pretend to know the long-term answer before I do, but any future commercial offering will not retroactively paywall what you are running now, and this page will be updated before anything changes.

Why is the proxy not open source?

Orchid is an early-stage project, and an open-source license is a one-way door. I am keeping the core closed while I work out what a sustainable long-term model for the project looks like, because I would rather make that decision once and deliberately than rush it and regret it. What I can commit to today is that the proxy is free to run, your data never leaves your infrastructure, and everything it records is stored in open formats you can read without me. If the licensing changes, this page will say so.

Am I locked in if I adopt Orchid?

No, and this is by construction. Recordings live in SQLite, sessions export to plain JSON, and the integration is a base URL change plus a couple of headers. Removing Orchid is the same one-line change as adding it. Worst case, the project disappears tomorrow and your historical data remains readable with tools that will exist for decades.

How does Orchid compare to LangSmith, Langfuse, or Helicone?

Those are capable platforms, and if you want cloud-hosted analytics with team dashboards, they are worth evaluating. Orchid makes a different set of design decisions. It is local-first, so your traffic never leaves your infrastructure. It is zero-instrumentation, so you change a base URL instead of adopting a framework. And the recording does triple duty, powering deterministic replay testing and agent-driven debugging over MCP alongside observability.

If your priority is fleet-level production analytics, a cloud platform may fit better. If your priority is debugging, testing, and cost visibility during development without sending prompts to a third party, that is what Orchid was built for.

Where do I get help or report a bug?

Open an issue on GitHub or use the contact form. Reports that include a session export are much easier to act on if you're comfortable sharing it, which is one more thing the recording is good for.