FOR CTOS
Don't Rewind a Generation, Drop a Tier
The cost-savings playbook most teams get backwards.
The instinct is that last year's flagship is the cheap option. Often it is not. Anthropic has held Opus at five dollars in and twenty-five out per million tokens from 4.5 all the way through 4.8. The older version is the same price with less capability. Running it to save money is strictly worse on both axes.
Go smaller, not older
Real savings come from dropping a tier, not rewinding a generation. Haiku, Gemini Flash, and Grok Fast can be a fraction of flagship cost. Match the tier to the task. Routine classification, extraction, and formatting do not need a frontier brain, and paying for one is the quiet leak in most AI budgets.
The exception is vendors who do cut prices on prior models, like OpenAI, where the older generation can be a genuine value play. Check before you assume.
Check before you assume
The Value Index shows the flat-pricing labs and the cut-with-age labs side by side.
Go thereMORE FOR CTOS
Keep reading.
Routing Is the New Architecture
One model per job beats one model to rule them all.
3 min read
Token Governance 101: Caps, Roles, and Dashboards First
The controls that stop a half-billion-dollar month.
3 min read
Agentic Workflows Eat 1000x the Tokens. Budget Accordingly.
Why autonomous agents break naive cost models.
3 min read