Tag: XLA

Feb 24, 2026 · AI Infrastructure
JAX Pallas: Writing GPU Kernels for Maximum Performance
JAX Pallas is a kernel-programming extension for JAX that lets you write custom high-performance compute kernels for GPUs and TPUs. Write optimized kernels for matrix multiplication and memory access patterns.
- JAX
- XLA
- TPUs
- GCP
- Pallas
- Compilers
Feb 3, 2026 · AI Infrastructure
JAX XLA: Why Your GPU is Idle 40% of the Time
Recompilation is the silent killer of training throughput. If you see 'Jit' in your profiler, you are losing money. We dive into XLA internals.
Dec 29, 2025 · AI Infrastructure
Business Case for JAX: JAX vs Custom C++ AI Training Stack Performance
Business case for JAX in AI training: compare JAX vs custom C++ training stack performance. See how compiler-first JAX reduces data movement overhead and improves throughput by 2.7x.
Dec 28, 2025 · AI Infrastructure
Why More GPUs Is No Longer a Viable Strategy in 2026
As hardware lead times and power constraints hit a ceiling, the competitive advantage in AI has shifted from chip volume to architectural efficiency. This article explores how JAX, Pallas, and Megakernels are redefining Model FLOPs Utilization (MFU) and providing the hardware-agnostic Universal Adapter needed to escape vendor lock-in.