
RadixAttention vs PagedAttention: The New Frontier in Context Management
A deep dive into the mechanics of SGLang's RadixAttention and why it represents a breakthrough for multi-turn agentic workflows compared to vLLM's PagedAttention.
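The core idea behind RadixAttention is reusing KV-cache entries across requests that share a token prefix, which a radix/prefix tree makes cheap to look up. The sketch below is a minimal illustration of that lookup, not SGLang's actual implementation; the names (`PrefixCache`, `insert`, `match_prefix`) are invented for this example, and real KV pages are replaced with a boolean flag.

```python
# Illustrative radix-style prefix cache over token IDs.
# Assumption: tokens are ints; `cached` stands in for resident KV pages.

class _Node:
    def __init__(self):
        self.children = {}   # token -> _Node
        self.cached = False  # True if the KV entries for this prefix are resident

class PrefixCache:
    """Trie over token sequences; longest-match lookup models KV-cache reuse."""

    def __init__(self):
        self.root = _Node()

    def insert(self, tokens):
        """Record that KV entries for every prefix of `tokens` are cached."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, _Node())
            node.cached = True

    def match_prefix(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        node, matched = self.root, 0
        for t in tokens:
            nxt = node.children.get(t)
            if nxt is None or not nxt.cached:
                break
            node, matched = nxt, matched + 1
        return matched

cache = PrefixCache()
cache.insert([1, 2, 3, 4])              # turn 1: system prompt + user message
hit = cache.match_prefix([1, 2, 3, 9])  # turn 2 shares a 3-token prefix -> hit == 3
```

In a multi-turn agent loop, every new turn extends the previous conversation, so the matched prefix grows with each request; that is the workload where prefix reuse pays off most.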


Inference price isn't a fixed cost; it's an engineering variable. We break down the three distinct levers of efficiency: Model Compression, Runtime Optimization, and Deployment Strategy.

Non-determinism is a bug, not a feature. We explore how to whip the model into compliance using grammar enforcers, Pydantic schemas, and constrained generation.
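One half of that pipeline can be shown without any model at all: validating a model's raw JSON output against a Pydantic schema. This is a hedged sketch, not the article's code; the model call is stubbed with a string, and in a real setup a constrained decoder (an "enforcer" that masks tokens violating the grammar) would guarantee the output parses.

```python
# Sketch: schema validation of LLM output with Pydantic (v2 API).
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    vendor: str
    total: float

raw = '{"vendor": "Acme", "total": 19.99}'  # stand-in for the model's output

try:
    invoice = Invoice.model_validate_json(raw)
except ValidationError:
    invoice = None  # in a real loop: re-prompt, or tighten the decoding grammar
```

Validation after the fact catches bad output; constrained generation prevents it, by restricting each decoding step to tokens that keep the output inside the schema's grammar.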