
LLMs are Terrible Backends (Unless You Force JSON)
Non-determinism is a bug, not a feature. We explore how to whip the model into compliance using Enforcers, Pydantic, and Constrained Generation.
