Cluster Hub

AI Infrastructure

Silicon, JAX, Networking (NCCL/Ring bottlenecks), TPUs, GPU optimization, and Deep Tech.

Cluster Articles

The Compute-to-Cashflow Gap

The AI industry is shifting from celebrating large compute budgets to hunting for efficiency. Your competitive advantage is no longer your GPU count, but your cost-per-inference.
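The cost-per-inference metric the teaser names can be sketched as simple arithmetic (the rental rate and throughput below are hypothetical, not figures from the article):

```python
def cost_per_inference(gpu_hour_usd: float, inferences_per_second: float) -> float:
    """Dollars spent per single inference for one GPU running flat out."""
    return gpu_hour_usd / (inferences_per_second * 3600)

# Hypothetical numbers: a $3.60/hr GPU serving 100 inferences/s.
print(cost_per_inference(3.60, 100))  # 1e-05, i.e. $0.00001 per inference
```

Doubling throughput at the same rental rate halves the metric, which is why the post frames efficiency, rather than raw GPU count, as the advantage.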

AI Quantization and Hardware Co-Design

Explore how quantization and hardware co-design overcome memory bottlenecks, comparing NVIDIA and Google architectures while looking toward the 1-bit future of efficient AI model development.
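As a minimal illustration of the memory-bandwidth angle, symmetric int8 quantization stores each weight in one byte instead of four (function names and sample values here are illustrative, not from the post):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]
    using a single scale, cutting weight memory traffic roughly 4x."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())  # reconstruction error <= scale/2
```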

Layered improvements with G4 / RTX 6000 Pro

Google Cloud’s G4 architecture delivers 168% higher throughput by maximizing PCIe Gen 5 performance. This deep dive examines the engineering stack driving these gains, from direct P2P communication...

Getting the most out of your GPUs with MIG

Understanding how to partition a single GPU into multiple isolated instances for cost-efficient AI workloads, with a deep dive into NVIDIA's MIG technology and the architectural differences between...

Why do large enterprises need a Chief AI Officer?

As organizations pivot from AI experimentation to enterprise-scale deployment, a recurring structural friction emerges. Through my engagements with leadership teams in APAC, it has become clear...

Network Design for AI Workloads

Generative AI has shifted data center traffic patterns, making network performance the new bottleneck for model training. This post contrasts how the "Big Three" cloud providers utilize distinct...

Not All Zeros Are the Same - Sparsity Explained

Demystifying hardware acceleration and the competing sparsity philosophies of Google TPUs and NVIDIA GPUs. This post connects novel architectures, like Mixture-of-Experts, to hardware design strategy and...
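One concrete point in the NVIDIA side of that story is 2:4 structured sparsity: in every group of four weights, exactly two are kept. A minimal sketch of the pruning pattern, with made-up weights:

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """NVIDIA-style 2:4 structured sparsity: within every contiguous group
    of four weights, zero the two smallest magnitudes. The fixed pattern is
    what lets Sparse Tensor Cores skip the zeros in hardware."""
    groups = w.reshape(-1, 4).copy()
    smallest = np.argsort(np.abs(groups), axis=1)[:, :2]  # 2 smallest per group
    np.put_along_axis(groups, smallest, 0.0, axis=1)
    return groups.reshape(w.shape)

w = np.array([1.0, -3.0, 0.5, 2.0, 4.0, 0.1, -0.2, 5.0])
print(prune_2_4(w))  # [ 0. -3.  0.  2.  4.  0.  0.  5.]
```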

Switching Technologies in AI Accelerators

This post contrasts the switching technologies of NVIDIA's GPUs and Google's TPUs. Understanding their different approaches is key to matching modern AI workloads, which demand heavy data movement, to the...

The Case for SparseCore

Large-scale recommendation models involve a two-part process. First, a "sparse lookup" phase retrieves data from memory, a task that is challenging for standard GPUs. Second, a "dense computation"...
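The two-phase split the teaser describes can be sketched in plain NumPy (the table sizes and function names here are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
VOCAB, EMB_DIM, HIDDEN = 10_000, 16, 32

# Embedding table: the large, sparsely accessed structure.
table = rng.normal(size=(VOCAB, EMB_DIM))

def sparse_lookup(ids: np.ndarray) -> np.ndarray:
    """Phase 1: gather a handful of rows from a large table.
    Memory-bound: dominated by irregular reads, not math."""
    return table[ids]  # fancy indexing == gather

# Dense layer weights for phase 2.
W = rng.normal(size=(EMB_DIM, HIDDEN))

def dense_compute(embs: np.ndarray) -> np.ndarray:
    """Phase 2: a regular matrix multiply over the gathered rows.
    Compute-bound: the workload matrix units are built for."""
    return np.maximum(embs @ W, 0.0)  # matmul + ReLU

ids = np.array([3, 981, 42])          # e.g. user/item feature IDs
out = dense_compute(sparse_lookup(ids))
print(out.shape)                      # (3, 32)
```

The teaser's point is that phase 1 stresses the memory system while phase 2 stresses the matrix units, which is the division of labor SparseCore is built around.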

The theory behind Technical Debt

Technical debt is not a new idea. This weekend I went down the trail of reading up on its impact now that AI has sharply increased the throughput of code generation. It turns out AI code generation is a double-edged...