Know What Every Agent Run Costs Before the Bill Arrives

Orchid Team · 5 min read

Quick question. What did your most expensive agent run cost last week?

If you're like most teams building with LLMs, you don't know. You know the monthly invoice from each provider, and maybe a rough split by API key. But the questions that actually matter for engineering decisions go unanswered. Which pipeline step burns the most budget? What does one customer interaction cost end to end? Did that prompt refactor make things cheaper or more expensive?

This isn't a billing problem. It's an attribution problem. Providers bill you at the account level, but you make engineering decisions at the level of sessions, steps, and prompts. Orchid closes that gap by computing real USD costs for every single exchange as it happens.

Cost Attribution at the Point of Capture

Orchid is a recording proxy that sits between your application and your LLM providers. Every request and response passes through it, so it sees the exact token counts in every response. That includes streaming completions, which it reassembles before recording.

When an exchange is captured, the proxy's pricing engine looks up the provider and model, applies the per-token rates, and stores the computed USD cost right alongside the payload. Costs then roll up naturally.

  • Per exchange. Each LLM call has its own cost, visible the moment it completes.
  • Per session. Session summaries show total spend for an agent run or test pass.
  • Per step and provider. Performance profiles group cost by pipeline step and by provider, so you can see which stage of your workflow dominates the budget.

There's no nightly batch job and no waiting for the provider's dashboard to catch up. The cost data exists the moment the response lands.
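To make the arithmetic concrete, here is a minimal sketch of per-exchange costing and the roll-ups described above. It is not Orchid's internal code. The `Exchange` record, the `exchange_cost` and `roll_up` helpers, and the token counts are all hypothetical; the only things taken from this post are the schema shape and the convention that rates are in USD per million tokens.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Exchange:
    """Hypothetical record of one captured LLM call."""
    session_id: str
    step: str
    provider: str
    model: str
    prompt_tokens: int
    completion_tokens: int


# Same shape as the pricing schema in this post: USD per million tokens.
PRICING = {"openai": {"gpt-5.4": {"prompt": 2.50, "completion": 15.00}}}


def exchange_cost(ex, pricing):
    """Cost of one exchange in USD; 0.0 when the model has no configured rate."""
    rate = pricing.get(ex.provider, {}).get(ex.model)
    if rate is None:
        return 0.0
    usd_per_million = (ex.prompt_tokens * rate["prompt"]
                       + ex.completion_tokens * rate["completion"])
    return usd_per_million / 1_000_000


def roll_up(exchanges, pricing, key):
    """Sum exchange costs into buckets chosen by `key` (session, step, ...)."""
    totals = defaultdict(float)
    for ex in exchanges:
        totals[key(ex)] += exchange_cost(ex, pricing)
    return dict(totals)


exchanges = [
    Exchange("run-1", "retrieve", "openai", "gpt-5.4", 1_000, 500),
    Exchange("run-1", "summarize", "openai", "gpt-5.4", 4_000, 2_000),
]
per_session = roll_up(exchanges, PRICING, key=lambda ex: ex.session_id)
per_step = roll_up(exchanges, PRICING, key=lambda ex: (ex.step, ex.provider))
```

Because each exchange carries its own cost, every roll-up is just a different grouping key over the same records.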

You Control the Pricing Schema

Model pricing changes frequently, and no hardcoded table stays correct for long. So Orchid doesn't hardcode one. You push the pricing schema to the running proxy, and you can update it at any time without a restart.

The schema is a simple JSON document. Providers at the top level, models nested inside, and rates expressed in USD per million tokens.

{
  "openai": {
    "gpt-5.4": { "prompt": 2.50, "completion": 15.00 }
  },
  "anthropic": {
    "claude-4-6-sonnet": { "prompt": 3.00, "completion": 15.00 }
  }
}

Push it with one request.

curl -X POST http://localhost:4321/v1/pricing \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d @pricing.json

Matching is case-insensitive and supports substring fallbacks, so a rate defined for gpt-4o automatically covers dated variants like gpt-4o-2024-05-13. If a model isn't in your schema at all, Orchid applies a transparent zero-cost fallback rather than inventing a number. You can inspect the active, normalized schema anytime with a GET on the same endpoint.
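The matching rules above could look roughly like the sketch below. This is an illustration, not Orchid's actual implementation. The `lookup_rate` function is hypothetical, and the tie-break (the longest configured name contained in the requested model name wins) is an assumption, since this post doesn't say how competing substring matches are resolved.

```python
ZERO_RATE = {"prompt": 0.0, "completion": 0.0}


def normalize(schema):
    """Lowercase provider and model names so lookups are case-insensitive."""
    return {
        provider.lower(): {model.lower(): rate for model, rate in models.items()}
        for provider, models in schema.items()
    }


def lookup_rate(schema, provider, model):
    """Exact match first, then substring fallback, then a zero-cost fallback."""
    models = normalize(schema).get(provider.lower(), {})
    name = model.lower()
    if name in models:
        return models[name]
    # Substring fallback: a rate for "gpt-4o" also covers "gpt-4o-2024-05-13".
    # Assumption: when several configured names match, the longest one wins.
    candidates = [configured for configured in models if configured in name]
    if candidates:
        return models[max(candidates, key=len)]
    # Unknown model: report zero cost rather than inventing a number.
    return ZERO_RATE


schema = {"OpenAI": {"gpt-4o": {"prompt": 2.50, "completion": 10.00}}}  # example rates
dated = lookup_rate(schema, "openai", "GPT-4o-2024-05-13")
missing = lookup_rate(schema, "openai", "some-unlisted-model")
```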

Already Recorded Sessions? Backfill Them

What about exchanges captured before you loaded pricing, or before a rate correction? Orchid includes a recompute endpoint that re-runs cost attribution across stored exchanges using the current schema.

curl -X POST http://localhost:4321/api/pricing/recompute \
  -H "Authorization: Bearer <your-api-key>"

The response tells you how many exchanges were updated. Rows with no pricing match are left untouched, so a backfill never overwrites good data with guesses.
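Conceptually, the backfill behaves like the loop sketched below. The stored-row layout (`provider`, `model`, `prompt_tokens`, `completion_tokens`, `cost_usd`) and the `recompute` function are assumptions made for illustration, and the lookup is simplified to an exact match. The real endpoint's storage and response format may differ.

```python
def find_rate(schema, provider, model):
    """Simplified exact-match lookup; returns None when nothing matches."""
    return schema.get(provider, {}).get(model)


def recompute(rows, schema):
    """Re-price stored rows in place and return how many were updated."""
    updated = 0
    for row in rows:
        rate = find_rate(schema, row["provider"], row["model"])
        if rate is None:
            # No pricing match: leave the stored cost exactly as it is.
            continue
        row["cost_usd"] = (
            row["prompt_tokens"] * rate["prompt"]
            + row["completion_tokens"] * rate["completion"]
        ) / 1_000_000
        updated += 1
    return updated


schema = {"openai": {"gpt-5.4": {"prompt": 2.50, "completion": 15.00}}}
rows = [
    {"provider": "openai", "model": "gpt-5.4",
     "prompt_tokens": 2_000, "completion_tokens": 1_000, "cost_usd": 0.0},
    {"provider": "other", "model": "unlisted",
     "prompt_tokens": 500, "completion_tokens": 500, "cost_usd": 0.42},
]
updated_count = recompute(rows, schema)
```

The second row keeps its previously stored cost because nothing in the schema matches it.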

Ask Your Assistant About Spend

The pricing engine is also exposed through Orchid's MCP server, which means your coding assistant can work with cost data directly. The update_pricing and get_pricing tools manage the schema, recompute_pricing runs backfills, and get_perf_profile answers questions like "which step in this job cost the most?"

That last one changes how cost conversations happen. Instead of pulling up a dashboard in a meeting, an engineer asks their assistant mid-task and gets a per-step breakdown in seconds. More on agent-driven workflows in Let Your AI Debug Your AI.

What Cost Visibility Actually Buys You

If you're an engineer, per-step cost data turns optimization from guesswork into engineering.

  • Find the expensive step. A profile showing one summarization step responsible for most of a pipeline's spend tells you exactly where a cheaper model or a shorter prompt pays off.
  • Catch runaway loops early. A stuck agent shows up as a session whose cost is climbing abnormally. You can spot it in the session list before it becomes a line item. We covered the debugging side of this in Why Debugging AI Pipelines Is Broken.
  • Measure prompt changes. Re-run the pipeline after a refactor and compare session costs directly. Cheaper or not, now you know.
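If you export session totals, the runaway-loop check can also be automated. The sketch below flags any session whose cost exceeds a multiple of the median session cost. The `flag_runaways` function, the session IDs, and the 5x threshold are all hypothetical; Orchid itself surfaces this in the session list, and a sensible threshold depends on your workload.

```python
from statistics import median


def flag_runaways(session_costs, factor=5.0):
    """Return IDs of sessions costing more than `factor` times the median."""
    if not session_costs:
        return []
    baseline = median(session_costs.values())
    if baseline <= 0:
        return []  # no meaningful baseline to compare against
    return sorted(
        session_id
        for session_id, cost in session_costs.items()
        if cost > factor * baseline
    )


# Hypothetical session totals in USD.
session_costs = {"run-a": 0.10, "run-b": 0.12, "run-c": 0.11, "run-d": 2.40}
suspects = flag_runaways(session_costs)
```

The median is used instead of the mean so that one runaway session doesn't inflate the baseline it is being compared against.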

If you manage a team, the same data answers the questions you actually get asked.

  • Unit economics. What does one customer request cost across the whole pipeline? That's a session summary.
  • Budget forecasting. Average per-session cost multiplied by projected volume gives you a defensible estimate instead of a shrug.
  • Model migration decisions. Considering a switch to a cheaper model? Update the pricing schema, replay a recorded workload, and compare. The replay workflow in Zero-Cost AI Testing makes the experiment itself free.
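The forecasting and migration arithmetic is simple enough to sketch. Everything below is illustrative: the function names, session costs, token totals, and the candidate model's rates are made up. Note also that re-pricing the same token counts is only a first approximation, since a different model will usually produce different completion lengths. That is why replaying the workload gives the better comparison.

```python
def forecast_spend(session_costs, projected_sessions):
    """Average observed session cost multiplied by projected volume."""
    average = sum(session_costs) / len(session_costs)
    return average * projected_sessions


def price_workload(prompt_tokens, completion_tokens, rate):
    """Price a recorded workload's token totals at a USD-per-million rate."""
    return (prompt_tokens * rate["prompt"]
            + completion_tokens * rate["completion"]) / 1_000_000


# Hypothetical observed session costs in USD and a projected monthly volume.
monthly_estimate = forecast_spend([0.02, 0.04], projected_sessions=10_000)

# Hypothetical recorded workload: 1M prompt tokens, 200k completion tokens.
current_rate = {"prompt": 2.50, "completion": 15.00}
candidate_rate = {"prompt": 0.50, "completion": 2.00}  # made-up cheaper model
current_cost = price_workload(1_000_000, 200_000, current_rate)
candidate_cost = price_workload(1_000_000, 200_000, candidate_rate)
savings = current_cost - candidate_cost
```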

One Recording, Many Answers

Cost tracking in Orchid isn't a separate product or an extra integration. It's a property of the recording. The same captured exchanges that power debugging and replay also carry their own price tags, visible in the embedded visualizer, queryable over the API, and accessible to your AI assistant. The full picture is in Record, Inspect, Replay.

Run the proxy, push your pricing schema, and route some traffic through it. The next time someone asks what an agent run costs, you'll have a number instead of an estimate. Curious what it costs to run Orchid itself? The answer is in the FAQ. Get started at orchidtrace.xyz.