From the Front Lines of Tech

Sharing strategic insights and lessons from over 20 years of building scalable systems, leading high-performing teams, and navigating complex technology shifts.

Apr 10, 2026 · Agentic AI
The Infinite Board Problem: Pruning State in Long-Running Reasoning Loops
How to keep shared state small in complex reasoning loops so the context window does not overflow, without losing critical history.
Apr 9, 2026 · AI Infrastructure
Hierarchical KV Caching: Scaling Context Beyond VRAM Limits
As context windows scale to a million tokens, the KV cache becomes too large for GPU memory. The solution is a multi-tiered cache that offloads data to CPU and NVMe without killing latency.
Apr 8, 2026 · AI Infrastructure
How xAI Built Grok: Training Data and Compute Infrastructure
How xAI built Grok from training data to compute infrastructure: the JAX and Rust stack, GPU cluster architecture, and why they moved beyond PyTorch.
Apr 7, 2026 · Strategy
The CAIO's First 100 Days: Beyond Pilot Purgatory
Moving from setting up the office to surviving the execution phase without failing ROI checks. A guide for the new Chief AI Officer.
Apr 7, 2026 · AI Infrastructure
Demystifying Google TPU SparseCore: Accelerating Recommendation Systems
How Google TPU SparseCore solves embedding lookup bottlenecks in recommender models. Learn the co-designed architecture of Trillium's SparseCores.
Apr 6, 2026 · AI Infrastructure
AI Training Chip Performance: Real Scaling Data vs Marketing Hype (Blackwell to Hopper)
AI training chip performance data: analyzing real scaling from Hopper to Blackwell. 3.2x training, 50x inference gains, and why memory bandwidth matters more than FLOPs.
