

MoE Routing Collapse: When Your Specialists Stop Specializing
A model is only as smart as its router. We explore the physics of expert zones, the tax of token dropping, and how to keep your load balancer honest.


A model is only as smart as its router. We explore the physics of expert zones, the tax of token dropping, and how to keep your load balancer honest.


When your model doesn't fit on one GPU, you're no longer just learning coding-you're learning physics. We dive deep into the primitives of NCCL, distributed collectives, and why the interconnect is the computer.


NCCL debugging is critical for distributed training bottlenecks. Learn to set NCCL_DEBUG, tune the NCCL_ALGO environment variable for Ring, Tree, or CollNet, and troubleshoot GPU network failures.
Nvidia Blackwell microscaling and the new FP4 formats double inference speeds. Dive into how the second-generation Transformer Engine uses scale factors and sparsity for AI workloads.