
Squeezing the Inference Lever: The Economics of LLM Throughput
Inference price isn't a fixed cost; it's an engineering variable. We break down the three distinct levers of efficiency: Model Compression, Runtime Optimization, and Deployment Strategy.

A war story of chasing a 5 ms latency spike down to a single loose thread. How to read Nsight Systems timelines and spot Warp Divergence.

Non-determinism is a bug, not a feature. We explore how to whip the model into compliance using Enforcers, Pydantic, and Constrained Generation.
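As a taste of the constrained-generation idea: at each decoding step, mask out any candidate token that would break the target format, so only schema-valid output can ever be emitted. A toy sketch follows; the vocabulary, "model preferences", and yes/no pattern are all illustrative stand-ins, not the API of any particular enforcer library.

```python
# Toy sketch of constrained generation: at each step, mask out candidate
# tokens that would break the target pattern. The targets and "model
# preferences" below are illustrative stand-ins, not a real LLM.
TARGETS = ("yes", "no")  # the only outputs the constraint permits

def allowed(prefix: str, token: str) -> bool:
    """A token is legal if prefix+token can still extend to a full target."""
    candidate = prefix + token
    return any(full.startswith(candidate) for full in TARGETS)

def generate(step_preferences):
    """Greedy decode under the constraint.

    `step_preferences` is a list of per-step token rankings, standing in
    for the model's logits sorted from most to least preferred.
    """
    out = ""
    for prefs in step_preferences:
        legal = [t for t in prefs if allowed(out, t)]
        if not legal:
            break
        out += legal[0]  # take the highest-ranked token that stays legal
        if out in TARGETS:
            break
    return out

# Unconstrained, the model would start with "maybe"; the mask forces "no".
print(generate([["maybe", "n"], ["e", "o"], ["s"]]))  # -> no
```

The same prefix-masking trick is what grammar- and schema-based enforcers do at scale, with the pattern compiled from a JSON schema or Pydantic model instead of a two-word tuple.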

Chat is reactive. Hooks are proactive. We explore how to use Gemini CLI Hooks to inject context and enforce security before the model thinks.

Recompilation is the silent killer of training throughput. If you see 'Jit' in your profiler, you are losing money. We dive into XLA internals.
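To see why shape churn hurts, here is a minimal stand-in for an XLA-style compile cache keyed on input shape (the `compile_for` function is a hypothetical placeholder for a slow compilation step, not real XLA API):

```python
# Toy sketch of a JIT compile cache specialized on input shape, assuming an
# XLA-style compiler; `compile_for` is a hypothetical stand-in for the slow
# compilation step.
compile_cache = {}
compile_count = 0

def compile_for(shape):
    global compile_count
    compile_count += 1            # every cache miss pays compilation cost
    return lambda xs: sum(xs)     # the "compiled" kernel

def run(xs):
    shape = len(xs)               # shape key: here, just the length
    if shape not in compile_cache:
        compile_cache[shape] = compile_for(shape)
    return compile_cache[shape](xs)

run([1, 2]); run([1, 2, 3]); run([1, 2])
print(compile_count)  # -> 2: the third call hits the cache
```

Two distinct shapes mean two compilations; the cached third call is free. This is why padding or bucketing inputs to a small set of shapes keeps the recompilation count, and your profiler's 'Jit' time, low.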

Most enterprise AI fails not because of the model, but because of 'Last Mile' integration costs. We break down the hidden latency budget of RAG.
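The shape of a RAG latency budget can be sketched as a simple sum of stages; every number below is made up purely for illustration, and the stage names are assumptions about a typical pipeline:

```python
# Illustrative RAG latency budget (all figures are invented for the sketch).
budget_ms = {
    "embed_query": 20,       # encode the user query
    "vector_search": 35,     # ANN lookup in the store
    "rerank": 60,            # cross-encoder rerank of candidates
    "prompt_assembly": 5,    # stitch retrieved chunks into the prompt
    "llm_first_token": 400,  # time to first generated token
}
print(sum(budget_ms.values()))  # -> 520
```

The point of writing it out: the retrieval stages users blame are often a small slice of the total, and the budget is dominated by the stage you control least.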