VRAMfit guide · updated 2026-06-28
How much VRAM do you need to run DeepSeek-R1?
In short: DeepSeek-R1 ships in several sizes, and VRAM scales with the size you pick. At Q4_K_M the small distills run on a single 8-16 GB card, while the full 671B build needs about 393 GB - workstation or multi-GPU territory. Pick the largest size that fits your card comfortably.
How much VRAM does DeepSeek-R1 need?
The figures below are per size at an 8,192-token context. The last column is the smallest retail card that runs that size comfortably at Q4_K_M (datacenter and Mac options excluded):
| Model | Params | Q4_K_M VRAM | Q8_0 VRAM | Smallest comfortable retail GPU |
|---|---|---|---|---|
| deepseek-r1:1.5b | 1.5B | 2.7 GB | 3.4 GB | NVIDIA GeForce GTX 1650 |
| deepseek-r1:7b | 7.6B | 5.5 GB | 8.9 GB | NVIDIA GeForce RTX 5060 |
| deepseek-r1:8b | 8B | 6.4 GB | 9.9 GB | NVIDIA GeForce RTX 5060 |
| deepseek-r1:14b | 14B | 9.8 GB | 15.9 GB | NVIDIA GeForce RTX 5070 |
| deepseek-r1:32b | 32B | 19.9 GB | 33.9 GB | NVIDIA GeForce RTX 4090 |
| deepseek-r1:70b | 70B | 41.2 GB | 71.9 GB | NVIDIA RTX PRO 5000 Blackwell 72GB |
| deepseek-r1:671b | 671B | 392.6 GB | 686.1 GB | Multi-GPU only (48 GB+ cards); exceeds any single retail card |
Which size should you run?
The distilled small sizes are the practical local picks - they fit common cards and stay fast. Larger sizes raise quality but quickly exceed a single consumer card, so match the size to your VRAM rather than always reaching for the biggest.
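Matching a size to a card can be sketched as a lookup against the per-size table. This is a minimal sketch, not the guide's own method: the function name `largest_fitting_size` and the `headroom_gb` reserve are hypothetical, and the 1 GB default is an assumed stand-in for "comfortable", which the guide does not define numerically.

```python
from typing import Optional

# VRAM in GB per size at an 8,192-token context, copied from the table above.
VRAM_GB = {
    "deepseek-r1:1.5b": {"Q4_K_M": 2.7, "Q8_0": 3.4},
    "deepseek-r1:7b": {"Q4_K_M": 5.5, "Q8_0": 8.9},
    "deepseek-r1:8b": {"Q4_K_M": 6.4, "Q8_0": 9.9},
    "deepseek-r1:14b": {"Q4_K_M": 9.8, "Q8_0": 15.9},
    "deepseek-r1:32b": {"Q4_K_M": 19.9, "Q8_0": 33.9},
    "deepseek-r1:70b": {"Q4_K_M": 41.2, "Q8_0": 71.9},
    "deepseek-r1:671b": {"Q4_K_M": 392.6, "Q8_0": 686.1},
}


def largest_fitting_size(
    card_vram_gb: float,
    quant: str = "Q4_K_M",
    headroom_gb: float = 1.0,  # assumed reserve, not a figure from the guide
) -> Optional[str]:
    """Return the largest size whose table VRAM fits the card, or None."""
    budget = card_vram_gb - headroom_gb
    fitting = [
        (need[quant], name)
        for name, need in VRAM_GB.items()
        if need[quant] <= budget
    ]
    # max() compares on the VRAM figure first, so it picks the biggest fit.
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for card in (8, 16, 24):
        print(card, "GB ->", largest_fitting_size(card))
```

With these assumptions, a 24 GB card lands on deepseek-r1:32b at Q4_K_M and drops to deepseek-r1:14b at Q8_0. A larger `headroom_gb` is worth trying if you plan on longer contexts than the table's 8,192 tokens.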
The card to run it on
For a strong, all-in-VRAM experience, the NVIDIA RTX PRO 5000 Blackwell 72GB runs deepseek-r1:70b comfortably at Q4_K_M, at about 19 tok/s (estimated). Check any other size on the fit board.
Frequently asked questions
How much VRAM does DeepSeek-R1 need?
It depends on the size. The small distills fit 8-16 GB cards at Q4_K_M; the full 671B build needs about 393 GB - see the per-size table above.
Can I run DeepSeek-R1 on a 24 GB GPU?
Yes, up to deepseek-r1:32b, which needs 19.9 GB at Q4_K_M. The 70b and 671b sizes overflow 24 GB and need offloading or a bigger card.
Does a higher quant need a bigger card?
Yes - Q8_0 weights are nearly twice the size of Q4_K_M weights. In the per-size table, total VRAM rises by about 1.5-1.75x for the 7B and larger sizes, which can push a size up into the next card tier.
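The size of the Q8_0 jump can be read straight off the per-size table. The sketch below simply divides the two VRAM columns; the helper name `q8_over_q4` is hypothetical, and the figures are total VRAM at an 8,192-token context, so they include more than the weights alone.

```python
# (Q4_K_M GB, Q8_0 GB) per size, copied from the per-size table.
TABLE_GB = {
    "deepseek-r1:1.5b": (2.7, 3.4),
    "deepseek-r1:7b": (5.5, 8.9),
    "deepseek-r1:8b": (6.4, 9.9),
    "deepseek-r1:14b": (9.8, 15.9),
    "deepseek-r1:32b": (19.9, 33.9),
    "deepseek-r1:70b": (41.2, 71.9),
    "deepseek-r1:671b": (392.6, 686.1),
}


def q8_over_q4() -> dict:
    """Ratio of Q8_0 to Q4_K_M total VRAM for each size in the table."""
    return {name: q8 / q4 for name, (q4, q8) in TABLE_GB.items()}


if __name__ == "__main__":
    for name, ratio in q8_over_q4().items():
        print(f"{name}: {ratio:.2f}x")
```

The ratio is smallest for the 1.5b size and grows toward the large sizes, where weights account for more of the total.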