Expensively Quadratic: the LLM Agent Cost Curve
2026-02-03 Philip Zeyliger
Pop quiz: at what context length do cached reads account for half the cost of a coding agent's next API call? By 50,000 tokens, your conversation's costs are probably dominated by cache reads.
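To make the quiz concrete, here is a back-of-the-envelope model of a single API call. It is a sketch under simplifying assumptions: the prior context is billed entirely as a cache read, the tokens added since the last call are billed as a cache write, and the model's reply is billed as output. The prices (in cents per million tokens) and the 1,000 new / 500 output tokens per call are hypothetical, chosen only to mirror a common pricing shape: cached reads at a tenth of the base input rate, cache writes at a premium, output several times the input rate.

```python
# Hypothetical prices in cents per million tokens (assumptions, not quoted rates).
CACHE_READ = 30
CACHE_WRITE = 375
OUTPUT = 1500


def cache_read_share(context_tokens: int, new_tokens: int, output_tokens: int) -> float:
    """Fraction of one call's cost that comes from re-reading the cached context."""
    read = CACHE_READ * context_tokens
    rest = CACHE_WRITE * new_tokens + OUTPUT * output_tokens
    return read / (read + rest)


def half_cost_context(new_tokens: int, output_tokens: int) -> int:
    """Smallest context length at which cache reads are at least half the call's cost."""
    rest = CACHE_WRITE * new_tokens + OUTPUT * output_tokens
    # Cache reads are half the total once they match everything else: ceil(rest / CACHE_READ).
    return -(-rest // CACHE_READ)


if __name__ == "__main__":
    crossover = half_cost_context(new_tokens=1_000, output_tokens=500)
    print(f"cache reads reach half the call cost at {crossover:,} tokens")
    for context in (10_000, 50_000, 100_000):
        share = cache_read_share(context, new_tokens=1_000, output_tokens=500)
        print(f"{context:>7,} tokens: {share:.0%} of the call is cache reads")
```

With these made-up numbers the crossover lands at 37,500 tokens, in the same neighborhood as the 50,000-token figure; the exact point shifts with how many new and output tokens each call carries.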
Let’s take a step back. We’ve previously written about how coding agents work: they post the conversation thus far to the LLM, and continue doing that in a loop as long as the LLM is requesting tool calls. When there are no more tools to run, the loop waits for user input, and the whole cycle starts over.
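The loop described above can be sketched as a simulation that tracks only token counts. The fixed sizes for the user message, tool results, and model replies are hypothetical parameters; a real agent would call an LLM API and run real tools. The point is the accounting: every iteration re-posts the whole conversation, so the tokens sent per call grow linearly and the total sent over a turn grows quadratically.

```python
def run_turn(context_tokens: int, user_tokens: int, tool_calls: int,
             tool_result_tokens: int, output_tokens: int) -> tuple[list[int], int]:
    """Simulate one user turn of an agent loop, counting tokens only.

    Returns the context size posted on each LLM call and the final context size.
    All sizes are hypothetical fixed values; a real agent gets them from the API.
    """
    context_tokens += user_tokens  # the user's message joins the conversation
    posted = []
    for call in range(tool_calls + 1):
        posted.append(context_tokens)  # the entire conversation so far is re-sent
        context_tokens += output_tokens  # the model's reply is appended
        if call == tool_calls:
            break  # no more tool requests: wait for the next user message
        context_tokens += tool_result_tokens  # tool output is appended; loop again
    return posted, context_tokens


if __name__ == "__main__":
    posted, final = run_turn(0, user_tokens=200, tool_calls=10,
                             tool_result_tokens=1_000, output_tokens=300)
    print("tokens posted per call:", posted)
    print("total tokens sent:", sum(posted), "final context:", final)
```

For one 200-token user message followed by 10 tool calls (1,000-token results, 300-token replies), the 11 calls post 200, 1,500, 2,800, ... tokens, adding up to 73,700 tokens sent even though the conversation itself ends at 13,500 tokens.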