The cost problem with enterprise AI is a success problem. The tools work, often well enough that people reach for them constantly, and under the pricing model the industry has settled on, every reach costs money. The organisations now cancelling contracts and rationing access are not the ones where AI failed. They are the ones where it worked, usage climbed, and nobody had modelled what that would cost at full volume.
What just happened
Earlier this year Microsoft moved to withdraw a widely used AI coding tool from its developers after the bill ran far beyond what anyone had planned. The tool was charged by consumption rather than by seat, and reports put the cost at somewhere between several hundred and a couple of thousand dollars per developer per month, enough that it became more expensive than some of the people using it. Uber described burning through its entire annual AI budget in four months, and coined an internal term for the effect, the steady creep of consumption that delivers real work but does not show up as anything visible in the product.
In both cases the deployment succeeded. The cost came precisely because people used the tools, and nobody had capped what that use would add up to. These cases are not isolated, because the pricing model underneath them is now the industry norm rather than one vendor’s choice.
Why success and cost rise together
Traditional enterprise software decoupled success from cost. You bought a licence, and once you had bought it, more usage was free, so the more your people used the system the better the investment looked. This shaped how a generation of executives think about software spend: buy it, roll it out, drive adoption, and the economics improve as usage grows.
Consumption-based AI pricing reverses that relationship. When you pay by the token, which is the unit most frontier vendors bill on, cost rises with use, so the more useful the tool and the more your people reach for it, the more it costs. Value and cost climb together, with no licence ceiling to flatten the curve. The thing that made the bill large was that the tool was good. This is unfamiliar to most organisations, and it breaks the long-held intuition that wider adoption is always the goal.
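The contrast between the two pricing models can be sketched in a few lines of arithmetic. This is an illustration only: the seat count, the per-seat price and the per-million-token price are hypothetical figures chosen to show the shape of the curves, not real vendor rates.

```python
def licence_cost(seats: int, price_per_seat: float) -> float:
    """Flat monthly cost: it does not depend on how much each seat uses the tool."""
    return seats * price_per_seat


def consumption_cost(tokens_used: int, price_per_million_tokens: float) -> float:
    """Monthly cost under per-token billing: it grows in direct proportion to use."""
    return tokens_used / 1_000_000 * price_per_million_tokens


# Hypothetical figures, for illustration only.
SEATS = 100
PRICE_PER_SEAT = 30.0      # assumed flat price per seat per month
PRICE_PER_MILLION = 10.0   # assumed blended price per million tokens

for tokens_per_seat in (1_000_000, 10_000_000, 100_000_000):
    total_tokens = SEATS * tokens_per_seat
    print(
        f"{tokens_per_seat:>11,} tokens per seat: "
        f"licence {licence_cost(SEATS, PRICE_PER_SEAT):>10,.2f}, "
        f"consumption {consumption_cost(total_tokens, PRICE_PER_MILLION):>10,.2f}"
    )
```

The licence column stays flat however hard the tool is used, while the consumption column grows tenfold with every tenfold rise in use.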
Where the cost risk actually sits
The risk is not spread evenly, and knowing where it concentrates is most of the battle. Individual productivity use, a person working through a chat interface on a fixed monthly plan, is usually capped. The plan has a ceiling, the person can see it, and they ration themselves against it without thinking of it as cost discipline, prioritising what is worth doing within the limit they have. The limit does the governing.
The real exposure lives in programmatic use: applications, automated workflows, and agentic tools that call a model through an API and are billed by the token for everything they consume. Here there is no natural ceiling. A process can call a model thousands of times, an agent can loop through a task consuming far more than a person ever would, and the consumption happens out of sight, measured in tokens rather than in anything a manager would notice on a normal working day. The most dangerous combination is high-value tooling, billed by consumption and used heavily, which is exactly what the coding tools that blew up those budgets were.
The accountability problem underneath
Underneath the pricing sits a question of accountability. When the person consuming the AI neither sees nor bears the cost, they consume in proportion to how useful the tool is, which is a great deal, and every individual decision looks reasonable while the aggregate runs away. This is an old problem in new clothes. A shared resource with no individual accountability gets over-consumed, not because anyone behaves badly, but because everyone behaves sensibly against a cost they cannot see.
A capped plan solves this quietly, by making the constraint visible at the point of use. The person with a monthly limit weighs each heavy use against the ceiling they will hit, and prioritises accordingly. Uncapped consumption removes that check, and the cost-benefit decision that should happen at the point of use simply does not happen, because nobody there is weighing it. The governance answer follows from the diagnosis. Push accountability for consumption down to the individual or the team, give them a ceiling they can see, and let the constraint do the work that a central budget forecast cannot do in real time.
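For programmatic use, one way to make the ceiling visible at the point of use is a small budget object that every model call has to pass through. The following is a minimal sketch under stated assumptions: `TokenBudget`, `BudgetExceeded` and the limits shown are hypothetical names and figures, not part of any vendor's API, and a real deployment would also need persistence and a monthly reset.

```python
class BudgetExceeded(Exception):
    """Raised when a request would take the owner past their ceiling."""


class TokenBudget:
    """A visible, per-owner monthly token ceiling (hypothetical helper)."""

    def __init__(self, owner: str, monthly_limit: int) -> None:
        if monthly_limit <= 0:
            raise ValueError("monthly_limit must be positive")
        self.owner = owner
        self.monthly_limit = monthly_limit
        self.used = 0

    @property
    def remaining(self) -> int:
        """Tokens still available this month."""
        return self.monthly_limit - self.used

    def charge(self, tokens: int) -> int:
        """Record a request's token use, refusing it if it would break the ceiling.

        Returns the tokens remaining after the charge.
        """
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if tokens > self.remaining:
            raise BudgetExceeded(
                f"{self.owner}: request for {tokens} tokens exceeds "
                f"the {self.remaining} remaining this month"
            )
        self.used += tokens
        return self.remaining


# Usage: the team sees what is left before and after each heavy call.
budget = TokenBudget(owner="platform-team", monthly_limit=1_000_000)
print(budget.charge(250_000))  # tokens left after the first call
try:
    budget.charge(900_000)     # larger than what remains, so it is refused
except BudgetExceeded as err:
    print(err)
```

The design choice is that a refused request leaves the counter untouched, so the owner can decide whether a smaller request is still worth making.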
The human in the loop holds the budget line
This connects to a question that runs through most of how AI actually gets used, which is whether the AI is replacing a person or supporting one. Replacement, the autonomous system that does the job with no human involved, is still mostly early and oversold. Support, the AI that augments a person who remains responsible for the work, is the realistic and common case today, and the cost argument gives another reason to prefer it.
When AI supports a person, that person is in the loop, and the same person is the natural governor of the spend, because they are deciding at the point of use whether a given task is worth doing at all. When AI replaces the person, that check disappears with them, and consumption runs unchecked because nobody is sitting there weighing each call. The human in the loop is not only doing the judgement work that inference cannot be trusted with on its own. They are holding the budget line. Remove them, and an organisation loses both the judgement and the cost discipline in a single move.
What the discipline actually requires
None of this is easy to do in advance, which is the real difficulty at the centre of the problem. Forecasting consumption across an organisation with variable usage is hard, the pilot tells you little because pilot usage is not production usage, and a handful of heavy users can drain a budget faster than finance can see it moving. The difficulty is the reason the discipline matters, rather than an excuse to skip it.
What the discipline requires is a return to a question the hype encouraged people to stop asking: does this deliver more value than it costs, against the alternative? The alternative is sometimes another tool, sometimes an existing process, and sometimes a person. When a consumption-billed tool costs more per month than it would cost to pay a person to do the same work, the maths has flipped, and an organisation paying attention acts on it. The return is real and often large where AI is pointed at genuinely expensive cognitive work and kept within a cost that stays comfortably below the value it releases. Applying that discipline treats AI the way every other operating cost is already treated, as a category that was briefly exempt from scrutiny during the excitement and is now rejoining the normal rules.
Deployment model is one of the levers, and an underused one. Consumption billing through a cloud API is a variable cost that scales with use indefinitely. Running a model on fixed local compute is a higher upfront cost with a low, predictable marginal cost. For high-volume, predictable workloads, moving off per-token billing onto owned compute breaks the coupling between use and cost and turns an open-ended variable into a fixed line on the budget. It is not right for every case, though it is a lever most organisations have not considered, because the default assumption is that AI means the cloud.
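The trade-off can be framed as a break-even volume: the monthly token count above which fixed compute becomes cheaper than per-token billing. The sketch below assumes a simple linear cost model, and every figure in it is hypothetical. In practice the fixed cost would need to include hardware amortisation, power and staff time.

```python
def break_even_tokens(
    fixed_monthly_cost: float,
    api_price_per_million: float,
    local_price_per_million: float,
) -> float:
    """Monthly token volume at which owned compute and per-token billing cost the same.

    Assumes a linear model: the API costs api_price_per_million for every
    million tokens, while owned compute costs fixed_monthly_cost plus
    local_price_per_million for every million tokens.
    """
    saving_per_million = api_price_per_million - local_price_per_million
    if saving_per_million <= 0:
        raise ValueError("owned compute never breaks even at these prices")
    return fixed_monthly_cost / saving_per_million * 1_000_000


# Hypothetical figures, for illustration only.
volume = break_even_tokens(
    fixed_monthly_cost=9_000.0,    # assumed amortised hardware and running cost
    api_price_per_million=10.0,    # assumed cloud price per million tokens
    local_price_per_million=1.0,   # assumed marginal cost on owned compute
)
print(f"Break-even at {volume:,.0f} tokens per month")
```

Below that volume the API is cheaper. Above it, each additional million tokens widens the gap in favour of owned compute, which is why the lever suits high-volume, predictable workloads rather than occasional use.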
The question worth sitting with
If your organisation is scaling AI, the question worth sitting with is not whether it works, because it probably does. It is what happens to the bill when it works at full volume, used by everyone you intend to give it to, at the intensity your most enthusiastic users are already showing. If you cannot answer that, you are in the position those companies were in before the budget told them the answer. The tool succeeding is not the end of the cost question. It is the beginning of it.
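A rough way to put a number on that question is to scale pilot usage to the intended headcount twice, once at the average user's intensity and once at the heaviest user's. The pilot figures, headcount and price below are invented for illustration, and the model ignores growth in usage per person over time.

```python
from statistics import mean


def projected_monthly_cost(
    headcount: int, tokens_per_user: float, price_per_million: float
) -> float:
    """Monthly bill if every user consumed tokens_per_user tokens."""
    return headcount * tokens_per_user / 1_000_000 * price_per_million


# Hypothetical pilot data: monthly tokens consumed by each pilot user.
pilot_usage = [2_000_000, 3_000_000, 5_000_000, 40_000_000]
HEADCOUNT = 500            # assumed number of people at full rollout
PRICE_PER_MILLION = 10.0   # assumed blended price per million tokens

at_average = projected_monthly_cost(HEADCOUNT, mean(pilot_usage), PRICE_PER_MILLION)
at_heaviest = projected_monthly_cost(HEADCOUNT, max(pilot_usage), PRICE_PER_MILLION)

print(f"Everyone at the pilot average: {at_average:,.2f} per month")
print(f"Everyone at the heaviest user: {at_heaviest:,.2f} per month")
```

The gap between the two figures is the range a budget has to be able to absorb, or to cap, before rollout rather than after it.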
MultipleWorks helps organisations work out where AI earns its cost and where it does not, and how to govern the spend before it governs them. If this article reflects something you are working through, get in touch at hello@multipleworks.com.hk.