
We Have No Idea How to Value the AI App Layer
Closed‑source AI coding tools are now celebrated for "$500M ARR" and "$1B next year." At the same time, reporting shows many of these businesses operate at "very negative" gross margins because frontier inference is expensive and scales directly with usage. If revenue and costs both rise with tokens, that is not a software‑margin engine; it is commodity flow (like gasoline) sold as subscriptions.
In the open‑source harness world, the picture is different. Ecosystems of projects like Cline and their forks collectively route hundreds of millions of dollars of model usage with zero platform losses, because inference is bring‑your‑own. That throughput is Gross Token Volume (GTV), a scale signal, not ARR. The harness does not resell the fuel and does not bleed for it. Users capture the benefit of model competition directly; the platform earns by building better software.
The a16z piece "Questioning Margins is a Boring Cliché" makes arguments worth agreeing with. Margins can improve with proper tiering and routing. Not every task needs a frontier model. Funnels that convert to teams and enterprise matter more than one screenshot of an angry cohort. I agree with that optimism. What follows is not pessimism. It is precision about incentives and accounting.
https://a16z.com/questioning-margins-is-a-boring-cliche/
Routing is good when it is disclosed and consented to. It is harmful when it is silent inside a closed harness. If a tool quietly shifts your work from a frontier model to a cheaper one to protect its unit economics, you are paying for one thing and receiving another. That is how you get subscription fog. The right way to make routing a feature is to put the developer in control. Let them select a policy that matches their intent: best capability, best price, enterprise approved. Disclose what ran. Publish caps. If a smaller model is sufficient and better for cost, show it and let the user choose it. If you want to claim the benefits of routing, own the responsibility to make it visible and user‑controlled.
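To make "put the developer in control" concrete, here is a minimal sketch in TypeScript of what consented routing could look like. The policy names, fields, and selection logic are assumptions for illustration, not any shipping tool's API.

```typescript
// Illustrative sketch only: policy names, fields, and scoring are assumptions,
// not any shipping tool's API.
type RoutingPolicy = "best-capability" | "best-price" | "enterprise-approved";

interface ModelOption {
  id: string;               // model identifier (hypothetical)
  capabilityScore: number;  // the tool's own capability ranking, higher is better
  pricePerMTok: number;     // blended price per million tokens, in USD
  enterpriseApproved: boolean;
}

interface RouteDecision {
  model: ModelOption;       // what will actually run
  policy: RoutingPolicy;    // what policy was in effect
  capApplied: boolean;      // whether a usage cap changed the choice
}

// Pick a model under the user's chosen policy and return a decision that is
// shown in the UI and written to a visible usage log, never applied silently.
function route(policy: RoutingPolicy, options: ModelOption[]): RouteDecision {
  const candidates =
    policy === "enterprise-approved"
      ? options.filter(o => o.enterpriseApproved)
      : options;
  if (candidates.length === 0) {
    throw new Error("No model satisfies the selected policy");
  }
  const model = candidates.reduce((best, o) =>
    policy === "best-price"
      ? (o.pricePerMTok < best.pricePerMTok ? o : best)
      : (o.capabilityScore > best.capabilityScore ? o : best)
  );
  return { model, policy, capApplied: false };
}
```

The important part is the return type: whichever model runs, the policy in effect and any cap applied come back to the user instead of staying inside the harness.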
Accounting clarity is the other thing that keeps getting lost. You can be bullish on margin improvement and still insist on clean revenue categories. Payments companies don't call total payment volume ARR. Marketplaces don't call GMV ARR. Electricity retailers don't call kilowatt‑hours ARR. AI apps built on paid models should not call pass‑through token spend ARR. Use GTV for scale. For resellers, report net revenue or take‑rate and contribution margin after model COGS; for harnesses, report enterprise software revenue. You can be optimistic and precise at the same time.
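To show the arithmetic behind those categories, here is a minimal sketch in TypeScript of how a reseller's quarter could be reported; every field name and number is invented for illustration.

```typescript
// Invented figures purely to show the arithmetic; not any company's numbers.
interface ResellerQuarter {
  grossTokenBillings: number; // what users paid for tokens routed through the app (GTV)
  modelCogs: number;          // what the app paid model providers for those tokens
  softwareRevenue: number;    // seats and enterprise features: the ARR-like part
  otherVariableCosts: number; // serving, support, payment fees tied to usage
}

function report(q: ResellerQuarter) {
  const gtv = q.grossTokenBillings;      // scale signal, not ARR
  const netRevenue = gtv - q.modelCogs;  // the take on the token flow
  const takeRate = netRevenue / gtv;     // spread, as a fraction of GTV
  const contributionMargin =             // after model COGS
    netRevenue + q.softwareRevenue - q.otherVariableCosts;
  return { gtv, netRevenue, takeRate, contributionMargin };
}

// $10M of token flow, $9.2M paid out for inference, $1.2M of software revenue,
// $0.5M of other variable costs:
console.log(report({
  grossTokenBillings: 10_000_000,
  modelCogs: 9_200_000,
  softwareRevenue: 1_200_000,
  otherVariableCosts: 500_000,
}));
// -> gtv: 10,000,000  netRevenue: 800,000  takeRate: 0.08  contributionMargin: 1,500,000
```

In that illustration the $10M of flow shows up as GTV and nowhere in the ARR line; the software revenue does, and the spread on the flow is reported as the spread it is.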
What the open‑source ecosystems prove is that aligned incentives work. When inference is bring‑your‑own, the platform cannot profit by starving the user of tokens. The platform wins by making the work better. The developer captures the benefit of a brutally competitive model market. The moment a better capability or a better price appears, the developer can exploit it. If your app architecture forces them through one vendor's internal mixer, you have taken away the very benefit competition is creating. If you give them a policy, capability first or price first or enterprise approved, you make the market work for the user.
A simple example of routing done right vs wrong:
Right: the developer picks "best price" for code refactors. The app routes to a cheaper model, shows the model in the header, and logs usage visibly. The developer sees the savings and keeps the policy, switching to "best capability" when they need frontier performance.
Wrong: the developer believes they are on a frontier model, hits a hidden cap, and the app quietly routes to a cheaper model mid‑session. The output quality shifts, latency changes, and the developer is left guessing. That is not optimization; that is breakage dressed as margin.
A minimal transparency baseline for AI coding tools (a sketch of a per‑request disclosure record follows the list):
1) User‑selectable routing policy: best capability, best price, or enterprise‑approved list.
2) Explicit disclosure: which model ran, what policy was in effect, and any caps applied.
3) Pricing clarity: pass‑through and usage shown; no surprise surcharges.
4) Honest reporting: GTV as scale; ARR as software; contribution margin after model COGS for resellers.
5) No silent downgrades: if you are trading performance for margin, surface it and let the user opt in.
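Paired with the routing sketch above, a per‑request disclosure record covering items 1, 2, 3, and 5 might look like this; the field names are assumptions, not a standard.

```typescript
// Hypothetical per-request disclosure record; the shape is an assumption, not a standard.
interface DisclosureRecord {
  requestedPolicy: "best-capability" | "best-price" | "enterprise-approved";
  modelRan: string;           // the model that actually served the request
  capApplied: string | null;  // e.g. "monthly frontier-token cap reached", or null
  downgraded: boolean;        // true when the model falls below the policy's default choice
  passThroughCostUsd: number; // usage cost shown as-is, no hidden surcharge
  timestamp: string;          // ISO 8601, so the usage log can be audited later
}

// Item 5 in code: a downgrade is only acceptable if the user explicitly opts in.
function needsOptIn(record: DisclosureRecord): boolean {
  return record.downgraded;
}
```

None of this is heavy engineering; it is a decision to show the record rather than bury it.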
An approach to valuing the app layer:
If you resell inference, report GTV, take‑rate, net revenue, and contribution margin after model COGS. Value on net revenue growth and margin quality. If you are a harness, report enterprise software revenue, adoption, and measured productivity lift, and disclose GTV as a scale signal with caveats. Value on software economics, not commodity throughput. In both cases, stop pretending throughput is ARR.
I am optimistic about AI coding tools, and I am optimistic about open‑source harnesses for a simple reason: the incentives are clean. Give developers the wheel, tell them the truth, and let them capture the benefit of the model market. That is how you compound usage and earn pricing power in the part of the stack that should have it, the software.
Throughput is not ARR. Call the flow GTV. Give users control of routing. Price the software. Let model competition work for the developer instead of the spread.
— Nick Baumann
Sources:
- TechCrunch: High costs and thin margins threatening AI coding startups (Aug 7, 2025)
- TechCrunch: Cursor apologizes for unclear pricing changes (July 7, 2025)
- TechCrunch: Cursor's funding and headline ARR (June 5, 2025)
- TechCrunch: Lovable's $1B ARR projection (Aug 14, 2025)
- a16z: Questioning Margins is a Boring Cliché (Aug 21, 2025)