Token usage tracking: Controlling AI costs

Fri Oct 31 2025

AI bills tend to spike quietly: a few extra tokens here, an unexpectedly long response there. Tokens are the meter, and if the meter is hidden, the bill surprises never end. Strong AI observability is the difference between guessing and knowing. This guide focuses on tracking tokens, keeping spend predictable, and proving value when the budget comes under scrutiny. Expect concrete steps, not hand-wavy theory.

The goal is practical: instrumentation that works in production, not just on a slide. You’ll see what to log, how to estimate, and where to push costs down without hurting quality.

Why tokens matter for cost control

Tokens drive AI cost. No tokens, no bill; more tokens, higher bill. Input and output tokens usually have different price tags, which means long responses often dominate invoices. That shows up in the wild too: teams in r/LLMDevs discuss how output-heavy paths blow up bills and why they track usage in production with real numbers, not vibes (LLM usage and cost).

Multi-modal plans add more moving parts. Pricing for text, vision, and audio rarely lines up, so a shared unit helps when reporting across features. Builders in r/SaaS are converging on the idea of unified token accounting to compare apples to apples across modalities (unified tokens across modalities).

Where to focus first:

  • Response-heavy paths: summarization, code generation, long-form chat. Output tokens pile up fast.

  • Per-user and per-session cost: put it on a dashboard so owners can see it and act on it (user-level costs).

  • Workflow runs: map spend to each execution and use logs as the source of truth (workflow cost control; execution logs approach).

Accurate token math strengthens budgets and headroom plans. It also informs pricing choices: bundle an AI feature, sell it as an add-on, or charge as a standalone SKU. Lenny Rachitsky’s breakdown of monetization tradeoffs is a good framing for those calls (monetize AI features). And when it’s time to ship a change, tie costs to outcomes with experiments; the Statsig team shows how to measure AI-generated code in production and how to optimize LLM choices with online tests (measure and optimize AI-generated code; LLM optimization via online tests).

Setting up a token tracking framework

Start simple so the measurement layer can grow with the product. Raw request counts give fast, directional estimates, but they miss token spikes. To get real AI observability, attach token context to every request: model, user, route, and timing.

A practical maturity path:

  1. Stage 0: log request counts and model names. Good for early red flags.

  2. Stage 1: capture input_tokens and output_tokens from providers; compute total_cost using rate cards (a minimal cost sketch follows this list). Now budgets feel real.

  3. Stage 2: attribute costs to users, sessions, and workflows; connect them to outcomes with experiments. This is where product and finance align.
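
To make Stage 1 concrete, here is a minimal sketch of the rate-card math; the model names and prices are placeholders, not real rates.

```python
# Minimal sketch of Stage 1 accounting. The models and per-million-token
# rates below are placeholders, not real prices; use your provider's rate card.
RATE_CARD = {
    # model: (USD per 1M input tokens, USD per 1M output tokens)
    "small-model": (0.15, 0.60),
    "large-model": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute total_cost for one request from provider-reported token counts."""
    input_rate, output_rate = RATE_CARD[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a response-heavy call where output tokens dominate the bill.
print(request_cost("large-model", input_tokens=1_200, output_tokens=4_000))
# -> 0.0636
```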

Use layered methods that balance speed and accuracy:

  • Estimate tokens client-side with a lightweight tokenizer and log the estimate.

  • Record the provider’s token counts server-side; reconcile differences routinely (see the sketch after this list).

  • Cache prompts and responses to avoid paying for repeat tokens.

  • Attribute costs per user or workflow so owners can control their slice (how to track API usage and costs).
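
A minimal sketch of that estimate-then-reconcile loop, assuming tiktoken as the client-side tokenizer and a 10 percent drift tolerance; both choices are assumptions, not requirements.

```python
# Sketch of the estimate-then-reconcile loop. tiktoken is used here as one
# example of a lightweight tokenizer; if your model uses a different
# tokenizer, the client-side number is only a rough estimate.
import tiktoken

_enc = tiktoken.get_encoding("cl100k_base")

def estimate_tokens(text: str) -> int:
    """Client-side estimate, logged before the request is sent."""
    return len(_enc.encode(text))

def reconcile(estimated: int, provider_reported: int, tolerance: float = 0.10) -> bool:
    """Return False when the estimate drifts more than `tolerance` from the
    provider's count; persistent drift usually means the wrong tokenizer."""
    if provider_reported == 0:
        return estimated == 0
    return abs(estimated - provider_reported) / provider_reported <= tolerance
```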

Centralize everything in a well-structured schema. A minimal set:

  • request_id, user_id, route, model, timestamp

  • input_tokens, output_tokens, total_tokens, total_cost

  • workflow_id or session_id, experiment_variant, cache_hit
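
As a sketch, that schema can live in code as a simple record type; the field names come from the list above, and the types and optionality are assumptions.

```python
# Hypothetical log record for the minimal schema; field names follow the
# list above, while types and optionality are assumptions.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TokenUsageRecord:
    request_id: str
    user_id: str
    route: str
    model: str
    timestamp: datetime
    input_tokens: int
    output_tokens: int
    total_tokens: int
    total_cost: float                        # in your billing currency
    workflow_id: Optional[str] = None
    session_id: Optional[str] = None
    experiment_variant: Optional[str] = None
    cache_hit: bool = False
```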

Multi-modal math is cleaner with a shared unit or conversion factors. r/SaaS threads outline workable approaches for estimating outputs across text and other modes (token estimates across modalities).
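
Here is one way a shared unit can look in practice. The conversion factors are invented for illustration; derive real ones from your rate cards so a unified token maps to roughly equal cost across modalities.

```python
# Illustrative only: these conversion factors are invented to show the shape
# of unified token accounting, not real pricing ratios. Derive real factors
# from your rate cards so a unified token maps to roughly equal cost.
TEXT_TOKEN_EQUIVALENT = {
    "text": 1.0,           # 1 text token = 1 unified token
    "image_tile": 85.0,    # e.g. one image tile counted as 85 unified tokens
    "audio_second": 25.0,  # e.g. one second of audio counted as 25 unified tokens
}

def unified_tokens(modality: str, units: float) -> float:
    """Convert raw usage (tokens, tiles, seconds) into one comparable unit."""
    return units * TEXT_TOKEN_EQUIVALENT[modality]
```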

Close the loop by tying spend to impact. Feed token and cost metrics into online tests to validate value. Statsig’s write-ups on LLM optimization and AI code generation provide concrete patterns for running those experiments without turning the product into a science project (LLM optimization via online tests; measure and optimize AI-generated code). Cost without outcomes is noise; outcomes without cost is hope.

Practical strategies for limiting spending

Start with prompt discipline. Run quick A/B tests so token cuts do not tank quality. The method matters: experiment design from Statsig’s blog lays out how to measure accuracy, latency, and cost together, which is the only way to avoid penny-wise decisions that hurt UX (experiment design).

Simple prompt fixes:

  • Strip boilerplate and repeated context blocks.

  • Set firm output limits; ask for JSON or bullets, not essays.

  • Keep system messages concise; avoid long few-shot examples unless they pay for themselves.

  • Route to smaller models for simple steps; reserve large models for edge cases (a routing sketch follows this list).
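
A rough sketch of the last two fixes combined, with placeholder model names, caps, and request fields:

```python
# Rough sketch of output limits plus model routing. Model names, caps, the
# complexity check, and the response_format value are all placeholders for
# your provider's actual request shape.
def build_request(task: str, complexity: str) -> dict:
    if complexity == "simple":
        model, max_output_tokens = "small-model", 256
    else:
        model, max_output_tokens = "large-model", 1024

    return {
        "model": model,
        "max_tokens": max_output_tokens,  # firm output limit
        "response_format": "json",        # ask for JSON or bullets, not essays
        "messages": [
            {"role": "system", "content": "Answer concisely. Return JSON only."},
            {"role": "user", "content": task},
        ],
    }
```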

Next, add a cache. Hash the full input payload, including system prompt, tools, and version. Reuse outputs when there is a hit, and track TTLs so freshness stays reasonable. Teams using n8n report big token savings and steadier workflow costs when cache hit rates climb past 30 percent (workflow costs). That stability carries through to per-user costs when attribution is set up properly (per-user costs).
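
A minimal sketch of that cache key, assuming the request payload serializes cleanly to JSON:

```python
# Sketch of a cache key covering the full input payload. Assumes the request
# can be serialized to JSON; prompt_version keeps incompatible prompt formats
# from sharing cache entries.
import hashlib
import json

def cache_key(model: str, system: str, messages: list, tools: list, prompt_version: str) -> str:
    payload = {
        "model": model,
        "system": system,
        "messages": messages,
        "tools": tools,
        "prompt_version": prompt_version,
    }
    # sort_keys keeps the hash stable regardless of dict ordering
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```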

Treat the cache as a first-class feature:

  • Collect cache_hit as a metric; alert when it drops.

  • Compare token use before and after cache changes.

  • Version prompts so cache keys do not bleed across incompatible formats.

Finally, enforce quotas and clear usage tiers. Cap tokens per user or workspace; throttle expensive routes by role. Unify cross-modality pricing with a consistent token system, then align price fences with the chosen monetization strategy from Lenny’s analysis (unified token system; monetization strategy). No magic here: limits turn surprises into plans.
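
A quota check can stay this small; the tier names and limits below are illustrative:

```python
# Sketch of a per-workspace monthly token cap; tier names and limits are
# illustrative, and the usage number would come from the logs described above.
QUOTAS = {"free": 50_000, "pro": 2_000_000, "enterprise": 20_000_000}

def within_quota(tier: str, tokens_used_this_month: int, requested_tokens: int) -> bool:
    """Return False when a request would push the workspace past its cap."""
    return tokens_used_this_month + requested_tokens <= QUOTAS[tier]
```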

Integrating token governance with financial oversight

Bring finance in early and make token governance a shared routine, not a quarterly fire drill. Set alert thresholds, decide who responds, and agree on what gets paused when costs spike. Use the pricing lens from Lenny’s newsletter to map variable token costs to plans and quotas that customers understand (monetize your AI features).

Ground monthly expenses in real workloads, not estimates. Logs are the truth, and several community write-ups show how teams make that the backbone of cost reports, from r/LLMDevs to n8n builders (track LLM usage and cost; calculate AI usage costs). Statsig customers often pair this with experiment results so finance can see cost per outcome, not just cost per call.
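
As a sketch of logs-as-truth reporting, the aggregation below assumes records shaped like the earlier schema and groups spend by workflow:

```python
# Sketch of logs-as-truth reporting: aggregate spend by workflow from records
# shaped like TokenUsageRecord above. Joining experiment results onto these
# totals is what turns cost per call into cost per outcome.
from collections import defaultdict

def monthly_cost_by_workflow(records) -> dict:
    totals = defaultdict(float)
    for r in records:
        totals[r.workflow_id or "unattributed"] += r.total_cost
    return dict(totals)
```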

Metrics that make CFOs and PMs equally happy:

  • Product teams: cost per feature and cost per experiment.

  • Finance: unit costs and forecast accuracy.

  • Experiment reviews: cost deltas exposed right next to outcomes.

Tailor dashboards to the audience accordingly; showing cost beside outcomes is the cleanest way to justify model changes and caching strategies, and the approach mirrors the guidance in Statsig’s LLM optimization posts (LLM optimization; measure AI-generated code).

Closing thoughts

Tokens are the lever for AI cost control, and AI observability is how that lever gets pulled with confidence. Track tokens with a clean schema, attribute costs to users and workflows, and link every optimization to measurable outcomes. Then keep spending in check with prompt discipline, smart caching, and quotas that match the pricing strategy.

Want to dig deeper? Check the community threads on usage tracking in r/LLMDevs, r/node, and r/n8n, Lenny’s take on monetization strategy, and Statsig’s guides on running online experiments with LLMs (LLM usage and cost; user-level costs; workflow cost control; monetize AI features; LLM optimization via online tests; measure and optimize AI-generated code).

Hope you find this useful!


