
Squeezing the Inference Lever: The Economics of LLM Throughput
Inference price isn't a fixed cost; it's an engineering variable. We break down the three distinct levers of efficiency: Model Compression, Runtime Optimization, and Deployment Strategy.
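As a taste of the Model Compression lever, here is a toy sketch of post-training int8 quantization. The per-tensor scaling scheme and the example weights are purely illustrative, not any specific library's implementation; real schemes add per-channel scales, zero-points, and calibration data.

```python
# Toy post-training quantization: map float weights to int8 with a
# single per-tensor scale. Illustrative only.

def quantize(weights):
    # Size the scale so the largest-magnitude weight lands at the
    # edge of the int8 range [-127, 127].
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

q, scale = quantize([0.5, -1.27, 0.03])
approx = dequantize(q, scale)  # close to the originals at 1/4 the bytes
```

The memory saving is the point: each weight drops from 4 bytes to 1, at the cost of a small, bounded rounding error.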

A war story of chasing a 5 ms latency spike down to a single loose thread, plus how to read Nsight Systems and spot Warp Divergence.

Non-determinism is a bug, not a feature. We explore how to whip the model into compliance using Enforcers, Pydantic, and Constrained Generation.
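The schema-enforcement idea can be sketched with the standard library alone. The `Ticket` schema and `parse_or_raise` helper below are hypothetical examples of the pattern; in practice a library like Pydantic plays this role.

```python
import json
from dataclasses import dataclass

@dataclass
class Ticket:
    # Hypothetical schema the model's JSON output must satisfy.
    title: str
    priority: int

def parse_or_raise(raw: str) -> Ticket:
    """Accept model output only if it matches the schema exactly."""
    data = json.loads(raw)
    if set(data) != {"title", "priority"}:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    if not isinstance(data["title"], str) or not isinstance(data["priority"], int):
        raise ValueError("field has the wrong type")
    return Ticket(**data)
```

A validation failure can feed a retry loop; constrained generation goes one step further and prevents the failure at decode time by masking tokens that would break the schema.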

Gemini CLI Hooks let you automate infrastructure tasks. This guide covers how to implement hooks, inject database schemas, and build custom ReAct loops locally.
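Hook specifics aside, the ReAct loop itself is small enough to build locally. A minimal sketch, with a hard-coded stand-in for the model and a single hypothetical `add` tool; a real loop would call the LLM each turn and parse its output defensively.

```python
# Minimal ReAct loop: act -> observe, repeated until the model
# emits a final answer. `fake_model` is a hard-coded stand-in for
# an LLM call; `TOOLS` maps tool names to callables.

TOOLS = {"add": lambda a, b: a + b}

def fake_model(history: str) -> str:
    # A real implementation would send `history` to the model here.
    if "Observation: 4" in history:
        return "Final Answer: 4"
    return "Action: add(2, 2)"

def react(question: str, max_turns: int = 5):
    history = f"Question: {question}"
    for _ in range(max_turns):
        step = fake_model(history)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Parse "Action: name(arg, arg)" and run the named tool.
        name, args = step.removeprefix("Action: ").rstrip(")").split("(")
        result = TOOLS[name](*(int(a) for a in args.split(",")))
        history += f"\n{step}\nObservation: {result}"
    return None
```

The transcript-as-state design is the key idea: each observation is appended to `history`, so the next model call sees everything that has happened so far.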

Recompilation is the silent killer of training throughput. If you see 'Jit' in your profiler, you are losing money. We dive into XLA internals.
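The mechanic behind that cost is shape-keyed caching: an XLA-style JIT compiles one program per input shape/dtype signature, so every fresh shape pays the compile bill again. A toy model of that cache (not real XLA; list length stands in for the shape signature):

```python
# Toy model of an XLA-style JIT cache: compiled programs are keyed
# by input shape, so every new shape triggers a recompile.
compile_count = 0
_cache = {}

def toy_jit(fn):
    def wrapper(xs):
        global compile_count
        key = len(xs)  # stand-in for the (shape, dtype) signature
        if key not in _cache:
            compile_count += 1   # in XLA this step costs real seconds
            _cache[key] = fn     # stand-in for the compiled program
        return _cache[key](xs)
    return wrapper

@toy_jit
def total(xs):
    return sum(xs)

total([1, 2, 3]); total([4, 5, 6])  # same "shape": compiled once
total([1, 2])                       # new "shape": compiled again
```

This is why padding inputs to a small set of bucketed shapes is the standard fix: it bounds the number of distinct cache keys, and therefore the number of compiles.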

Most enterprise AI fails not because of the model, but because of the 'Last Mile' integration costs. We break down the hidden latency budget of RAG.
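The 'latency budget' framing becomes concrete the moment you write the stages down. Every stage name and millisecond figure below is a hypothetical placeholder for the sake of the arithmetic, not a measurement of any real system.

```python
# Illustrative latency budget for one RAG request (made-up figures).
budget_ms = {
    "embed_query": 15,
    "vector_search": 40,
    "rerank": 120,
    "prompt_assembly": 5,
    "llm_first_token": 350,
}
total_ms = sum(budget_ms.values())  # 530 ms before the first answer token
```

Laid out this way, it is obvious where optimization effort pays off: the model's time to first token dominates, but the retrieval stages together can still cost more than a third of the budget.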