
Speculative Decoding: Cheating Physics for Latency
Using a 'Draft' model costs 10% more VRAM but saves 50% Latency. Here is the mechanics of the gamble.

Using a 'Draft' model costs 10% more VRAM but saves 50% Latency. Here is the mechanics of the gamble.