

JAX Pallas: Writing GPU Kernels for Maximum Performance
JAX Pallas is NVIDIA's GPU programming API for high-performance compute kernels. Write optimized kernels for matrix multiplication and memory access patterns.


JAX Pallas is NVIDIA's GPU programming API for high-performance compute kernels. Write optimized kernels for matrix multiplication and memory access patterns.


Recompilation is the silent killer of training throughput. If you see 'Jit' in your profiler, you are losing money. We dive into XLA internals.