Tag: GPUs

May 20, 2026 · AI Infrastructure
Serverless Inference: Conquering the 5-Second Cold Start
The infrastructure hacks required to make scale-to-zero LLM inference viable for production latency.
Apr 21, 2026 · AI Infrastructure
Multi-Cloud GPU Arbitrage: Routing Workloads Between Hyperscalers and Neoclouds
Don't lock into one vendor. Learn how to use an abstraction layer to route training and inference workloads to the cheapest available capacity across hyperscalers and neoclouds.
Dec 21, 2025 · AI Infrastructure
Layered improvements with G4 / RTX 6000 Pro
Google Cloud’s G4 architecture delivers 168% higher throughput by maximizing PCIe Gen 5 performance. This deep dive examines the engineering stack driving these gains, from direct P2P communication and NUMA optimizations to Titanium offloads. Explore how G4 transforms standard connectivity into a high-speed fabric for demanding AI inference and training.
Dec 20, 2025 · AI Infrastructure
Getting most out of your GPUs using MIG
Understanding how to partition a single GPU into multiple isolated instances for cost-efficient AI workloads, with a deep dive into NVIDIA's MIG technology and the architectural differences between GKE and EKS..