What is Usage-Based Pricing?

A billing model that meters each AI prompt, token, or task against a per-model rate card instead of bundling unlimited usage into a flat subscription.

Usage-Based Pricing | The Wise Operator

What It Is

Usage-based pricing is the AI industry’s name for charging by the meter rather than by the seat. Under the older subscription model, a developer or team paid a flat monthly fee and could run as many prompts as they wanted up to a soft cap or a vague “fair use” line. Under usage-based pricing, every prompt routes through a per-model rate card. Cheap models cost a fraction of a cent, frontier reasoning models cost several cents per request, and long-horizon agent runs can cost dollars in a single task. The invoice arrives at the end of the month, and the size is whatever you actually spent.

GitHub Copilot’s June 1, 2026 switch made the model visible to a mass developer audience for the first time. Premium request units, the flat allowance that previously absorbed most prompts, were replaced by “AI Credits” denominated against each model’s API rate. The change did not raise prices on a flat-rate basis. It removed the ceiling that hid the true cost.

How It Actually Works

A usage-based plan does two things at once. It maps every action you take to a billable unit, usually tokens, requests, or compute-seconds. Then it multiplies that unit by the model’s published rate. A code completion against a small, fast model might cost a small fraction of the rate card. A reasoning request against a frontier model can cost a hundred times more. Agent runs that loop through tool use, retrieval, and multi-step planning multiply faster, because each loop is another billable call.

The mechanism that makes the bills feel surprising is rarely the rate card itself. It is the loop. A single user action can fan out into dozens of underlying model calls if the tool is doing planning, retrieval, or self-correction behind the surface. Vendors choose how much of that loop to disclose, and the choice shapes whether the meter feels honest or hidden.

Why It Matters Right Now

The AI industry can no longer afford to give frontier capabilities away under a flat fee. Frontier inference costs more per call than a year ago, agent workflows consume many calls per task, and the gap between what users will pay and what vendors spend has widened past the point where loss-leader pricing pays for itself. The shift to usage-based plans is the industry harmonizing what the customer sees with what the vendor pays. It is also, less politely, an admission that the prior flat tier was a subsidy that funded user habits more than it funded sustainable products.

A Concrete Operator Scenario

An operator runs a small team that built a habit on flat-rate Copilot. Five engineers ran roughly two hundred prompts a day against the frontier model and a few thousand completions, paying $100 a month total. After the switch, the first week’s metered run lands at $1,400 projected for the month. The work is the same. The cost is now fourteen times the seat price.

The operator has three real moves. They can keep the workflow and absorb the bill, which is fine if the AI is producing fifteen times its cost in finished work. They can audit the loop, identify which prompts route to which models, and cap the frontier model behind explicit operator approval so the cheap model handles the bulk of completions. They can shift parts of the workflow to a token-budget discipline, where each engineer carries a daily soft cap that surfaces the meter before the bill does. The right move depends on what the work is actually worth, and on how much of the prior usage was reflexive rather than necessary.

How TWO Uses It

The TWO use of usage-based pricing is not as a billing topic. It is as a formation topic. A flat subscription teaches the operator to ignore the meter, because the meter does not exist on the invoice. A metered plan teaches the operator to weigh each prompt, because each prompt has a visible number next to it. Both shape habits, and the habits compound. After a year on a flat plan the operator’s instinct is to ask the model first and think later. After a year on a metered plan the instinct flips, because the meter trained it.

The Scott-perspective sentence is this: I prefer the metered plan for serious work even when it costs more, because it tells me the truth about what each request is worth. The flat plan lets the AI do my thinking on credit. The metered plan keeps the price of thinking visible, which keeps me deciding what is worth thinking about. The same logic ties this term to agent-loop-cost and to compute-commitment: when the upstream cost is fixed and the downstream cost is metered, the operator is the one absorbing the change in regime.