
· Rajat Pandit · AI Infrastructure
Continuous Batching in vLLM: Killing Hardware Idle Time
If your GPUs are idling at 40% utilization during inference, you are burning capital on memory bottlenecks, not computation.
