
Writing Pallas Kernels for JAX: Stepping Outside the XLA Safety Net
When XLA's heuristics fail for a custom attention mechanism, you can't just hope for a compiler update. Here's how to write Triton-like kernels directly in Python with JAX Pallas.
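
To make that concrete before we dig in, here is a minimal sketch of a blocked vector add in Pallas. The names (`add_kernel`, `add`, the block size of 256) are illustrative, and it assumes a recent JAX where `pl.BlockSpec` takes a block shape and an index map; Pallas lowers through Mosaic on TPU and Triton on GPU.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Inside the kernel you work with Refs: views of one block of each
    # operand, already staged into fast on-chip memory by pallas_call.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    block = 256  # illustrative block size; tune per hardware
    grid = (x.shape[0] // block,)  # one program instance per block
    spec = pl.BlockSpec((block,), lambda i: (i,))  # block shape + index map
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        grid=grid,
        in_specs=[spec, spec],
        out_specs=spec,
    )(x, y)

x = jnp.arange(1024, dtype=jnp.float32)
print(add(x, x)[:4])  # [0. 2. 4. 6.]
```

The grid and BlockSpecs are what make this Triton-like: you choose how the arrays are tiled and how each program instance maps to a tile, rather than leaving those decisions to XLA.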
