
The Real Performance Improvement Rate of AI Training Chips
Analyze the actual performance improvement rate of training chips and GPUs vs marketing hype. Here is the data on real compute scaling for training and inference.

Analyze the actual performance improvement rate of training chips and GPUs vs marketing hype. Here is the data on real compute scaling for training and inference.

When XLA's heuristics fail for custom attention mechanisms, you can't just hope for a compiler update. Here is how you write Triton-like kernels directly in Python using JAX Pallas.