VRAMfit guide · updated 2026-06-28
What GPU do I need to run Llama 3.3 70B?
In short: llama3.3:70b is a 70B dense model, so at the popular Q4_K_M quantization it needs about 41 GB of VRAM (weights + KV cache + overhead) at an 8,192-token context. A single 24 GB card cannot hold it, the smallest comfortable retail option is an NVIDIA RTX PRO 5000 Blackwell 72GB, and higher quants need even more.
How much VRAM does llama3.3:70b need?
Computed at an 8,192-token context. "Smallest comfortable retail GPU" excludes datacenter accelerators and unified-memory Macs; it answers which card you could actually buy:
| Quantization | Weights | Total VRAM | Smallest comfortable retail GPU |
|---|---|---|---|
| Q4_K_M | 39.4 GB | 41.2 GB | NVIDIA RTX PRO 5000 Blackwell 72GB (72 GB) |
| Q5_K_M | 48.1 GB | 50.0 GB | NVIDIA RTX PRO 5000 Blackwell 72GB (72 GB) |
| Q8_0 | 70.0 GB | 71.9 GB | NVIDIA RTX PRO 6000 Blackwell (96 GB) |
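The Weights column lines up with a simple parameters × bits-per-weight / 8 estimate if you assume nominal bits per weight of 4.5 (Q4_K_M), 5.5 (Q5_K_M) and 8.0 (Q8_0). Those values are inferred from the numbers, not a published VRAMfit method, and real GGUF files run somewhat larger (Q8_0, for example, stores 8.5 bits per weight). The sketch below also picks the smallest card under a hypothetical 85% utilization cap, chosen only because it happens to reproduce the table's picks; the "48 GB workstation card" entry is a generic placeholder, not a specific model.

```python
from typing import Optional

# ASSUMED nominal bits per weight, inferred from the table's Weights column.
PARAMS = 70e9
NOMINAL_BPW = {"Q4_K_M": 4.5, "Q5_K_M": 5.5, "Q8_0": 8.0}

# Total VRAM from the table (weights + KV cache + overhead at 8,192 tokens).
TABLE_TOTAL_GB = {"Q4_K_M": 41.2, "Q5_K_M": 50.0, "Q8_0": 71.9}

# Card capacities mentioned on this page, in GB (48 GB entry is generic).
CARDS = {
    "RTX 4090 / 3090": 24,
    "48 GB workstation card": 48,
    "RTX PRO 5000 Blackwell 72GB": 72,
    "RTX PRO 6000 Blackwell": 96,
}

# HYPOTHETICAL comfort rule: use at most 85% of a card's VRAM.
MAX_UTILIZATION = 0.85


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def smallest_comfortable(total_gb: float) -> Optional[str]:
    """Smallest card whose usable VRAM (under the cap) covers total_gb."""
    for name, vram in sorted(CARDS.items(), key=lambda kv: kv[1]):
        if total_gb <= vram * MAX_UTILIZATION:
            return name
    return None


for quant, bpw in NOMINAL_BPW.items():
    print(f"{quant}: weights ~{weights_gb(PARAMS, bpw):.1f} GB, "
          f"pick = {smallest_comfortable(TABLE_TOTAL_GB[quant])}")
```

Under this cap a 48 GB card can use about 40.8 GB, just short of the 41.2 GB Q4_K_M total, which is why the pick jumps to 72 GB.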
Why not a 24 GB card?
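A 70B model's weights alone exceed 24 GB at 4-bit (about 39 GB at Q4_K_M), so a single 24 GB card (RTX 4090/3090) must offload layers to system RAM, and generation slows sharply. To keep the whole model resident, you want a 48 GB-class workstation card, two 24 GB cards pooled, or a large unified-memory machine. A 48 GB card fits the ~41 GB Q4_K_M total with about 7 GB to spare, which is less headroom than the 72 GB card the table above rates as comfortable.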
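To see roughly how much spills over, here is a minimal sketch that counts how many of the model's 80 decoder layers fit on one card at Q4_K_M. It assumes all layers are the same size (embeddings and output head ignored) and that the table's non-weight memory (KV cache + overhead) stays on the GPU; it is an illustration, not VRAMfit's estimator. Runtimes such as llama.cpp expose this layer count as `--n-gpu-layers`.

```python
import math

N_LAYERS = 80       # decoder layers in Llama 3.3 70B
WEIGHTS_GB = 39.4   # Q4_K_M weights, from the table
TOTAL_GB = 41.2     # Q4_K_M total at 8,192 tokens, from the table


def layers_on_gpu(vram_gb: float) -> int:
    """Layers that fit in vram_gb, ASSUMING equal-size layers and that
    KV cache + overhead are reserved on the GPU first."""
    per_layer_gb = WEIGHTS_GB / N_LAYERS
    reserved_gb = TOTAL_GB - WEIGHTS_GB  # KV cache + overhead
    free_gb = max(vram_gb - reserved_gb, 0.0)
    return min(N_LAYERS, math.floor(free_gb / per_layer_gb))


gpu = layers_on_gpu(24)
print(f"24 GB card: {gpu} layers on GPU, {N_LAYERS - gpu} in system RAM")
```

On these assumptions a 24 GB card keeps 45 layers and pushes the other 35 (over 40% of the weights) to system RAM.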
The practical pick
To keep a 70B model fully in VRAM on one card, the NVIDIA RTX PRO 5000 Blackwell 72GB runs llama3.3:70b comfortably at about 19 tok/s (estimated) at Q4_K_M. Check the exact fit and speed for any card on the fit board.
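Speed estimates like this usually rest on the observation that single-stream generation is memory-bandwidth bound: each token costs roughly one full read of the weights. The sketch below applies that model to an all-in-VRAM setup and to a partly offloaded one. The bandwidths (1,000 GB/s GPU, 60 GB/s system RAM) and the 0.6 efficiency factor are hypothetical placeholders, so it is not meant to reproduce the 19 tok/s figure; it only shows why offloading hurts so much.

```python
def tokens_per_second(gpu_gb: float, cpu_gb: float,
                      gpu_bw_gbs: float, cpu_bw_gbs: float,
                      efficiency: float = 0.6) -> float:
    """Bandwidth-bound estimate: gpu_gb of weights read from VRAM and
    cpu_gb read from system RAM for every generated token."""
    seconds_per_token = gpu_gb / gpu_bw_gbs + cpu_gb / cpu_bw_gbs
    return efficiency / seconds_per_token


WEIGHTS_GB = 39.4   # Q4_K_M weights, from the table
GPU_BW = 1000.0     # HYPOTHETICAL GPU memory bandwidth, GB/s
CPU_BW = 60.0       # HYPOTHETICAL system-RAM bandwidth, GB/s

full = tokens_per_second(WEIGHTS_GB, 0.0, GPU_BW, CPU_BW)

# Illustrative 24 GB case: (24 - 1.8 GB overhead) / (39.4 GB / 80 layers)
# leaves room for about 45 of 80 layers; the rest sit in system RAM.
on_gpu = WEIGHTS_GB * 45 / 80
split = tokens_per_second(on_gpu, WEIGHTS_GB - on_gpu, GPU_BW, CPU_BW)
print(f"all in VRAM: {full:.1f} tok/s, partly offloaded: {split:.1f} tok/s")
```

With these placeholders the offloaded run comes out several times slower, because the slice read from system RAM dominates the time per token even though it is the smaller share of the weights.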
Frequently asked questions
Can a single 24 GB GPU run llama3.3:70b?
Not fully in VRAM at Q4_K_M: the model needs about 41 GB, more than a 24 GB card holds. It still runs with CPU offloading at reduced speed, or you can pool two 24 GB cards.
What is the cheapest way to run a 70B model fully in VRAM?
A single 48 GB workstation card holds it. Pooling two used 24 GB cards (e.g. RTX 3090) is often the cheapest way to get 48 GB of VRAM, at the cost of a small multi-GPU efficiency penalty.
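Pooling works by placing different layers on different cards. As a minimal sketch, assuming equal-size layers and that the table's non-weight memory is divided in the same proportion as VRAM (real runtimes add per-device buffers), the split for two 24 GB cards looks like this. It is similar in spirit to llama.cpp's `--tensor-split` proportions; `split_layers` and `usage_gb` are hypothetical helpers, not part of any tool.

```python
N_LAYERS = 80
WEIGHTS_GB = 39.4   # Q4_K_M weights, from the table
TOTAL_GB = 41.2     # Q4_K_M total at 8,192 tokens, from the table


def split_layers(vram_per_gpu: list[float]) -> list[int]:
    """Assign layers in proportion to each GPU's VRAM."""
    total_vram = sum(vram_per_gpu)
    counts = [int(N_LAYERS * v / total_vram) for v in vram_per_gpu]
    counts[-1] += N_LAYERS - sum(counts)  # rounding leftovers go to the last GPU
    return counts


def usage_gb(vram_per_gpu: list[float], counts: list[int]) -> list[float]:
    """Estimated GB per GPU: its layers plus a VRAM-proportional share of
    KV cache + overhead (an ASSUMPTION, not how every runtime does it)."""
    total_vram = sum(vram_per_gpu)
    overhead = TOTAL_GB - WEIGHTS_GB
    per_layer = WEIGHTS_GB / N_LAYERS
    return [n * per_layer + overhead * v / total_vram
            for v, n in zip(vram_per_gpu, counts)]


cards = [24.0, 24.0]  # e.g. two RTX 3090s
counts = split_layers(cards)
for i, (n, used) in enumerate(zip(counts, usage_gb(cards, counts))):
    print(f"GPU {i}: {n} layers, ~{used:.1f} GB of {cards[i]:.0f} GB")
```

Each card ends up with 40 layers and roughly 20.6 GB in use, which is why two 24 GB cards can hold the Q4_K_M model entirely in VRAM.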
Does a higher quant change the GPU I need?
Yes. Q5_K_M and Q8_0 store more bits per weight, so the weights grow and push the VRAM requirement higher; the table above shows the smallest comfortable retail card for each quant.