FOR CTOS

Don't Rewind a Generation, Drop a Tier

The cost-savings playbook most teams get backwards.

May 26, 2026 3 minFreddy Compute

The instinct is that last year's flagship is the cheap option. Often it is not. Anthropic has held Opus at five dollars in and twenty-five out per million tokens from 4.5 all the way through 4.8. The older version is the same price with less capability. Running it to save money is strictly worse on both axes.

Go smaller, not older

Real savings come from dropping a tier, not rewinding a generation. Haiku, Gemini Flash, and Grok Fast can be a fraction of flagship cost. Match the tier to the task. Routine classification, extraction, and formatting do not need a frontier brain, and paying for one is the quiet leak in most AI budgets.

The exception is vendors who do cut prices on prior models, like OpenAI, where the older generation can be a genuine value play. Check before you assume.

Check before you assume

The Value Index shows the flat-pricing labs and the cut-with-age labs side by side.

Go there

MORE FOR CTOS

Keep reading.

Routing Is the New Architecture

One model per job beats one model to rule them all.

3 min read

Token Governance 101: Caps, Roles, and Dashboards First

The controls that stop a half-billion-dollar month.

3 min read

Agentic Workflows Eat 1000x the Tokens. Budget Accordingly.

Why autonomous agents break naive cost models.

3 min read