

How xAI Built Grok: JAX, Rust, and the GPU Training Infrastructure
How xAI built Grok without PyTorch: Discover the JAX and Rust training infrastructure, GPU cluster architecture, and hardware stack powering Grok.


The bottleneck for LLM inference is memory bandwidth, not compute. Discover how to use speculative decoding on GCP to achieve 3x speedups by pairing a small "draft" model with a massive "oracle" model.


When XLA's heuristics fail for custom attention mechanisms, you can't just hope for a compiler update. Here is how you write Triton-like kernels directly in Python using JAX Pallas.


See how speculative decoding performs for single-batch requests on an NVIDIA A100. We analyze acceptance rates, latency, and the mechanics of the draft model gamble.


A Mixture-of-Experts model is only as smart as its router. We explore the physics of expert zones, the tax of token dropping, and how to keep your load balancer honest.


Recompilation is the silent killer of training throughput. If you see 'Jit' in your profiler, you are losing money. We dive into XLA internals.