AI Infrastructure

Feb 24, 2026 · AI Infrastructure
JAX Pallas: Writing GPU Kernels for Maximum Performance
JAX Pallas is a JAX extension for writing custom high-performance compute kernels on GPUs and TPUs. Write optimized kernels for matrix multiplication and memory access patterns.
- JAX
- XLA
- TPUs
- GCP
- Pallas
- Compilers
Feb 19, 2026 · AI Infrastructure
Single-Batch Inference: Speculative Decoding on an A100
See how speculative decoding performs for single-batch requests on an NVIDIA A100. We analyze acceptance rates, latency, and the mechanics of the draft model gamble.
Feb 6, 2026 · AI Infrastructure
My Profiling Nightmare: The Warp Stall
A war story of chasing a 5 ms latency spike down to a single loose thread. How to read Nsight Systems traces and spot warp divergence.
Feb 3, 2026 · AI Infrastructure
JAX XLA: Why Your GPU is Idle 40% of the Time
Recompilation is the silent killer of training throughput. If JIT compilation keeps showing up in your profiler, you are losing money. We dive into XLA internals.
Jan 12, 2026 · AI Infrastructure
The Compute-to-Cashflow Gap
The AI industry is shifting from celebrating large compute budgets to hunting for efficiency. Your competitive advantage is no longer your GPU count, but your cost-per-inference.
Jan 11, 2026 · AI Infrastructure
AI Quantization and Hardware Co-Design
Explore how quantization and hardware co-design overcome memory bottlenecks, comparing NVIDIA and Google architectures while looking toward the 1-bit future of efficient AI model development.
