Your AI Bill Is A Context Problem

Within four months, Uber burned through its entire 2026 AI budget. It then capped every engineer at $1,500 a month. Similarly, ServiceNow exhausted its full-year Anthropic coding budget in the first few months. Even Microsoft is weaning its own engineers off Claude Code. All these examples remind me of the cloud bill shock of a decade ago, which was met with the same conviction that the cure is a tighter spend cap and a sharper rate.

This time, though, I think it is the wrong cure.

To be sure, “tokenmaxxing” is what you get when you incentivize adoption without governing value. Uber, like Meta, literally ranked teams on a usage leaderboard to drive adoption. But capping the bill mistakes a problem of value for a problem of price. The Uber COO said it clearly: between all that Claude Code spend and anything customers can feel, “that link is not there yet.” And that gap, not the bill, is the problem.

Your AI bill exposes context debt

Why did the token consumption numbers explode? Agentic workflows don’t call a model once: they loop. Anthropic’s own engineering research put a single agent at 4x the tokens of a chat interaction and a multi-agent system at 15x, and every turn of that loop re-feeds the context window. As the window fills, the signal-to-noise ratio collapses, and quality, latency, and cost degrade together. So the bill includes both the price of the intelligence the company consumes and the runtime tax it pays when its own knowledge isn’t machine-readable and its agents have to brute-force the missing meaning back into the window on every single call. This is context debt, billed by the token.
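To make the loop effect concrete, here is a back-of-the-envelope sketch in Python. It assumes a hypothetical agent that re-sends its whole window on every turn; the 2,000-token starting context and the 500 tokens added per turn are illustrative assumptions, not measurements from any vendor. It counts input tokens only and ignores output tokens and any caching discounts.

```python
def cumulative_input_tokens(base_context: int, added_per_turn: int, turns: int) -> int:
    """Total input tokens billed when every turn re-sends the full window."""
    total = 0
    window = base_context
    for _ in range(turns):
        total += window            # the whole window is billed as input this turn
        window += added_per_turn   # tool output and replies are appended for next turn
    return total


# Illustrative numbers only (assumptions, not benchmarks).
single_call = cumulative_input_tokens(2_000, 500, turns=1)
ten_turn_loop = cumulative_input_tokens(2_000, 500, turns=10)

print(single_call)                  # 2000
print(ten_turn_loop)                # 42500
print(ten_turn_loop / single_call)  # 21.25
```

Under these assumptions a ten-turn loop bills about 21 times the input of a single call, and the total grows roughly with the square of the number of turns. That is why trimming what enters the window matters more than trimming the price per token.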

Some of that context is visible; much of it is not. A user sees the prompt they typed, but the model call may also include platform-supplied instructions, prior interaction history, tool metadata, retrieval scaffolding, and other orchestration elements. That hidden context may be necessary, but it is also billable. So you are not just paying for the answer; you’re paying for the full assembled information scaffolding required to produce that answer safely and reliably in a world of probabilities.

And given that today’s prices ride on discounted compute and venture-subsidized economics, these bills will only get worse. After all, OpenAI and Anthropic are soon heading for public markets that will want those subsidies gone sooner rather than later.

You cannot cap your way to business value

Forrester’s IT spend management framework is useful here. Visibility into the bill is table stakes, and control capabilities are necessary hygiene. But hygiene is all they are. None of it creates value. Visibility into the total is no longer enough. Enterprises need attribution: which tokens reflected user intent, which grounded the agent in business context, which supported orchestration, and which were avoidable repetition. Without that breakdown, leaders can limit spend, but they cannot improve the economics of the work. The Linux Foundation’s Tokenomics Foundation may help by advancing open standards for AI cost management.
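As a rough illustration of what attribution could look like, the Python sketch below splits one model call into labeled segments and reports each category's share of the tokens. The `Segment` type, the category labels, and the token counts are all hypothetical; they mirror the four buckets named above and are not taken from any standard or vendor API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical labels: "user_intent", "business_context",
    # "orchestration", "repetition".
    category: str
    tokens: int


def attribute(segments: list[Segment]) -> dict[str, float]:
    """Return each category's share of the call's total tokens."""
    totals: Counter[str] = Counter()
    for seg in segments:
        totals[seg.category] += seg.tokens
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: count / grand_total for cat, count in totals.items()}


# One assembled call, with made-up token counts.
call = [
    Segment("user_intent", 200),
    Segment("business_context", 1_800),
    Segment("orchestration", 1_000),
    Segment("repetition", 1_000),
]

shares = attribute(call)
for category, share in shares.items():
    print(f"{category}: {share:.0%}")
```

In this made-up call the user's own prompt is 5% of the tokens and avoidable repetition is 25%. A spend cap sees only the 4,000-token total; the breakdown shows which quarter of it could be engineered away.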

Look, we are still very early in the agentic era. When your organisation budgets tokens, you know you’re not buying an off-the-shelf product with a predefined return. You’re making a deliberate choice to operate at the leading edge, where value only emerges through experimentation. We’re all learning how agents behave, what they cost, where they create value, and how that value compounds. And that’s OK: that learning is an investment in your AI reinvention. But cap it bluntly and you shut down the reinvention the spend was meant to fund.

Optimization is the opposite move. It does not ask how to spend less on intelligence, but how to tie that spend to the outcomes it produces. This is where unit economics come in: you run the system until every dollar buys the most result it can. In an agentic system, what each token returns depends on the context the agent reasons over. Give it the right context, through GraphRAG or another retrieval layer, at the right moment, and the same token buys a better outcome with less waste. Give it brittle or irrelevant context, and you pay full price for confusion. This is context engineering.

The build bill and the run bill

Two bills hide inside your token spend. First, building agentic systems is CAPEX: that’s the experimentation, the coding, the ontology, the wiring of systems into something that hopefully delivers value. It is closer to R&D than to a utility charge, and I would argue this is where the vast majority of the spend lands today. The agents are being put to work building software; few of them are running business processes yet.

The artisans of that build are increasingly forward-deployed engineers (FDEs). They embed in the operational mess of workflows, decision rights, governance, data, and systems, then translate it into working software, shaping the technical decisions until the ontology and the agents run in production. These experiments earn their tokens by showing their unit economics converging inside a window you set, and when they stop converging, you kill the workload, not the budget. A robust strategic portfolio management capability is key here to keep reallocating capital to workloads that are compounding.
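One way to picture that kill rule is the small Python sketch below. It assumes a hypothetical workload that reports token spend and completed outcomes per period, and it treats "converging" simply as unit cost being lower at the end of the window than at its start. Both the function names and that definition are illustrative assumptions; a real portfolio review would use richer signals.

```python
def cost_per_outcome(token_cost: float, outcomes: int) -> float:
    """Unit cost of one business outcome; infinite if nothing was delivered."""
    return float("inf") if outcomes == 0 else token_cost / outcomes


def is_converging(unit_costs: list[float], window: int) -> bool:
    """True if unit cost fell across the last `window` periods."""
    if len(unit_costs) < window + 1:
        return True  # not enough history yet, so keep experimenting
    recent = unit_costs[-(window + 1):]
    return recent[-1] < recent[0]


# Made-up weekly figures: (token spend in dollars, outcomes delivered).
improving = [(1_000, 10), (1_200, 15), (1_300, 20), (1_500, 30)]
stalled = [(1_000, 10), (1_100, 10), (1_200, 10), (1_300, 10)]

improving_costs = [cost_per_outcome(c, n) for c, n in improving]
stalled_costs = [cost_per_outcome(c, n) for c, n in stalled]

print(is_converging(improving_costs, window=3))  # True: 100 -> 50 per outcome
print(is_converging(stalled_costs, window=3))    # False: 100 -> 130 per outcome
```

Note that the improving workload spends more every week and still passes, while the stalled one would be retired. The test is on what each dollar buys, not on the size of the bill.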

Running those systems is the other bill: runtime inference is the recurring, per-call cost of agents once they are live, including the visible and invisible context assembled for each call. This creates a supplier-side conflict: the same platform that designs the runtime and assembles the hidden context also bills for the tokens it creates, so poor context design can quietly become a recurring platform tax the customer cannot easily see, tune, or challenge. This is pure OPEX, and that’s a bill most enterprises have barely begun to see. Here optimization stops being a project and becomes a standing discipline. If the building bill already shocks you, wait until you receive the running bill as you deploy and scale your agentic workflows. To optimize that spend, I argue that a new discipline will emerge: ContextOps.

ContextOps is the operating discipline

The ontology an FDE constructs is a snapshot of a business that won’t hold still: Model capabilities evolve, business and IT processes drift, enterprise decision rights change, and the context that made an agent correct on Tuesday stops being correct by the following quarter. While building is a project motion, keeping the thing grounded as the business shifts beneath it is an operating one.

ContextOps is the FinOps of the agentic era, born at the same kind of inflection: a spend and control surface that turned continuous, consequential, and ungoverned, until someone had to own it as a discipline rather than a cleanup. Of course, ContextOps governs a different object. FinOps optimizes how tokens are consumed, reaching at maturity beyond raw cost to tie consumption to business outcomes through model routing, caching, and inference economics. ContextOps governs what those tokens represent: whether the agent is still reasoning over a faithful, current picture of how the business runs.

In other words, while every cost lever acts on the price of processing context, none of them checks whether the context is still true. Imagine an agent that is cheap, fast, well-routed, comfortably under budget, yet approving exceptions against an org chart redrawn two quarters ago. A narrow FinOps view sees a healthy line item. The business, meanwhile, has an agent acting confidently on a world that no longer exists. Only the discipline watching fidelity catches it.
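A minimal Python sketch of such a fidelity check follows, assuming each piece of context carries two dates: when the agent's copy was last refreshed and when the system of record last changed. The `ContextSource` type, the source names, and the dates are hypothetical, chosen to mirror the org chart example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ContextSource:
    name: str
    snapshot_taken: date   # when the agent's copy was last refreshed
    source_changed: date   # when the system of record last changed


def stale_sources(sources: list[ContextSource]) -> list[str]:
    """Names of sources that changed after the agent's snapshot was taken."""
    return [s.name for s in sources if s.source_changed > s.snapshot_taken]


# Illustrative data only.
sources = [
    ContextSource("org_chart", date(2025, 1, 15), date(2025, 7, 1)),
    ContextSource("pricing_policy", date(2025, 9, 1), date(2025, 6, 1)),
]

print(stale_sources(sources))  # ['org_chart']
```

Nothing in this check looks at cost. An agent reading the stale org chart could sit comfortably under budget, which is why a cost dashboard alone would not flag it.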

So ContextOps keeps the ontology current as the business moves, optimizes what each token buys, strips out unnecessary context, and feeds every run back in so the context sharpens instead of going stale. Context stops being something you build and becomes something you operate.

Why ContextOps becomes a managed service

One last thought: As companies increasingly use FDEs to build agentic workflows, many will hire service providers to optimize and operate them through ContextOps. The context work is continuous and outcome-focused, and it depends on proximity to business processes and workflows, to leadership and governance, to how demand shows up and decisions get made. That proximity is what makes it possible to harvest decision traces, retire context debt before it becomes a problem, adapt ontologies as the business evolves, and re-ground agents as models, including private models, evolve.

But proximity is not granted once. It is earned continuously through demonstrated value. The provider has to keep proving it, and the only way to keep proving it is to improve the performance of the agentic workflows and the fidelity of the context they run on. That is the flywheel: better performance earns deeper trust, deeper trust grants closer proximity, and closer proximity makes the next improvement possible.

None of this is a one-time fix. The context degrades, and so does the trust that funds access to maintain it, the moment improvement stops. That is why ContextOps will not be bought as a project. It is a long-lived operational motion, likely to emerge as a managed-services category, because the thing it governs never stops moving.
