
· Rajat Pandit · AI Infrastructure
Continuous Batching in vLLM: Killing Hardware Idle Time
If your GPUs are idling at 40% utilization during inference, you are burning capital on memory bottlenecks, not computation.
