
The Real Performance Improvement Rate of AI Training Chips
We analyze the actual performance improvement rate of AI training chips and GPUs against the marketing hype, with data on real compute scaling for both training and inference.

FP8 is the new frontier for training efficiency, but it breaks in the most sensitive layers. We dissect the E4M3/E5M2 split and how to spot divergence.
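To make the E4M3/E5M2 split concrete, here is a minimal sketch (not code from the article) computing the largest finite value of each format. It assumes the OCP FP8 convention where E4M3 reclaims its top exponent code for finite values, which is why its max is 448 rather than 240, while E5M2 follows the usual IEEE-style layout and reaches 57344.

```python
def fp8_max_normal(exp_bits, man_bits, e4m3_style=False):
    """Largest finite value of an IEEE-like mini-float.

    e4m3_style=True applies the OCP E4M3 convention: the top exponent
    code encodes finite values (no infinities, one NaN pattern), which
    extends the maximum representable magnitude.
    """
    bias = 2 ** (exp_bits - 1) - 1
    if e4m3_style:
        # Top exponent code is usable; only the all-ones mantissa is NaN,
        # so the largest mantissa is 1.110 (binary) = 1.75.
        max_exp = (2 ** exp_bits - 1) - bias
        max_man = 1 + (2 ** man_bits - 2) / 2 ** man_bits
    else:
        # Top exponent code is reserved for inf/NaN, as in IEEE 754.
        max_exp = (2 ** exp_bits - 2) - bias
        max_man = 2 - 2 ** -man_bits
    return max_man * 2 ** max_exp

print(fp8_max_normal(4, 3, e4m3_style=True))  # E4M3 -> 448.0
print(fp8_max_normal(5, 2))                   # E5M2 -> 57344.0
```

The ~128x range gap is the whole story of the split: E5M2 has the headroom that gradients need, while E4M3's extra mantissa bit gives weights and activations the precision they need.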

A model is only as smart as its router. We explore the physics of expert zones, the tax of token dropping, and how to keep your load balancer honest.
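The "honest load balancer" idea above is usually enforced with an auxiliary loss. Here is a hedged sketch (Switch Transformer-style, not code from the article) of that loss: it multiplies the fraction of tokens each expert actually receives by the router's mean probability for that expert, so skew in either quantity is penalized and perfectly balanced routing scores exactly 1.0.

```python
import numpy as np

def load_balance_loss(router_probs, expert_assignment, num_experts):
    """Switch-style load-balancing auxiliary loss.

    router_probs: (tokens, experts) softmax outputs of the router.
    expert_assignment: (tokens,) index of the expert each token was sent to.
    Returns num_experts * sum_i f_i * P_i, which equals 1.0 when both the
    token fractions f_i and mean probabilities P_i are uniform.
    """
    # f_i: fraction of tokens routed to expert i
    f = np.bincount(expert_assignment, minlength=num_experts) / len(expert_assignment)
    # P_i: mean router probability assigned to expert i
    p = router_probs.mean(axis=0)
    return num_experts * float(np.dot(f, p))

# Usage: a perfectly balanced router over 4 experts scores 1.0.
probs = np.full((100, 4), 0.25)
assignment = np.arange(100) % 4
print(load_balance_loss(probs, assignment, 4))  # -> 1.0
```

Token dropping is what happens when this loss fails: experts have fixed capacity, and tokens routed past a full expert are silently discarded, which is the "tax" the blurb refers to.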