VRAMfit guide · updated 2026-06-27

How to choose a GPU for local LLMs (2026)

In short: VRAM is the spec that decides what you can run; memory bandwidth decides how fast it runs. 16 GB comfortably runs ~14B models, 24 GB runs ~32B, and 96 GB puts 120B-class MoE models on a single workstation card. Buy as much VRAM as the budget allows; everything else is secondary for local LLMs.

Why VRAM first?

A model either fits in VRAM or it does not - no amount of GPU compute rescues a model that spills to system RAM. Bandwidth then sets the token rate, because decoding reads every active weight from VRAM for each generated token. Raw compute (TFLOPS) mostly affects prompt-processing speed.
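The two rules above can be put into rough numbers. The sketch below is a back-of-envelope estimate, not VRAMfit's actual method. It assumes roughly 4.85 bits per weight for Q4_K_M (an approximate figure) and a hypothetical flat 1.5 GB allowance for KV cache and runtime buffers, and it treats decode speed as bandwidth divided by the bytes read per token. All function names and defaults are illustrative.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes).

    4.85 bits/weight is an assumed average for Q4_K_M, not an exact value.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if weights plus an assumed flat overhead fit in VRAM."""
    return weights_gb(params_billion) + overhead_gb <= vram_gb


def decode_ceiling_tok_s(params_read_billion: float, bandwidth_gb_s: float,
                         bits_per_weight: float = 4.85) -> float:
    """Upper bound on decode speed if every token reads these weights once.

    For a dense model, params_read_billion is the full parameter count;
    for an MoE model it would be the active parameters per token.
    """
    return bandwidth_gb_s / weights_gb(params_read_billion, bits_per_weight)


if __name__ == "__main__":
    print(fits(32, 24))                               # dense 32B on a 24 GB card
    print(round(decode_ceiling_tok_s(32, 1008), 1))   # ceiling at 1008 GB/s
```

For a dense 32B model this gives about 19.4 GB of weights, which fits a 24 GB card under these assumptions. The bandwidth figure is a ceiling only; measured decode speed lands below it.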

What each VRAM tier buys you

The biggest model that fits comfortably at Q4_K_M quantization with an 8,192-token context, with VRAMfit's estimated decode speed on the example card:

| VRAM tier | Example card | Biggest comfortable model | Est. speed |
|---|---|---|---|
| 8 GB | NVIDIA GeForce RTX 4060 | llama3:8b (8B) | ~33 tok/s |
| 12 GB | NVIDIA GeForce RTX 5070 | deepseek-r1:14b (14B) | ~47 tok/s |
| 16 GB | NVIDIA GeForce RTX 5080 | gpt-oss:20b (20B) | ~47 tok/s |
| 24 GB | NVIDIA GeForce RTX 4090 | deepseek-r1:32b (32B) | ~31 tok/s |
| 32 GB | NVIDIA GeForce RTX 5090 | falcon:40b (40B) | ~44 tok/s |
| 48 GB | NVIDIA RTX PRO 5000 Blackwell | deepseek-llm:67b (67B) | ~20 tok/s |
| 96 GB | NVIDIA RTX PRO 6000 Blackwell | zephyr:141b (141B) | ~12 tok/s |
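The context length in the table's conditions costs VRAM on top of the weights, because the KV cache grows with every token kept in context. A minimal sketch of the usual sizing formula follows, assuming an fp16 cache (2 bytes per value) and grouped-query attention. The layer, head, and dimension figures are those published for Llama 3 8B and are assumed here to apply to the llama3:8b tag; the function name is illustrative and this is not VRAMfit's own calculation.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for a full context window."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed architecture: 32 layers, 8 KV heads, head dimension 128.
llama3_8b_cache = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                                 context_tokens=8192)
print(llama3_8b_cache / 2**30)  # size in GiB -> 1.0
```

Under these assumptions an 8,192-token context adds about 1 GiB, which is why a model whose weights alone nearly fill the card does not count as a comfortable fit.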

See your exact card - including used-market and workstation options - on the fit board or the GPU comparison chart.

Frequently asked questions

Is a used RTX 3090 still a good buy for local AI?

Often, yes: 24 GB of VRAM at used prices runs the same models a 4090 fits, just more slowly. Memory bandwidth is 936 GB/s versus the 4090's 1008 GB/s, and the 3090 has far less compute for processing long prompts.
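To see how small the decode gap is, compare the two bandwidth figures directly. This is a rough sketch that assumes decoding is purely bandwidth-bound, so it says nothing about the larger prompt-processing gap; the function name is illustrative.

```python
def bandwidth_ratio(slower_gb_s: float, faster_gb_s: float) -> float:
    """Relative decode speed if token rate scales only with memory bandwidth."""
    return slower_gb_s / faster_gb_s


ratio = bandwidth_ratio(936, 1008)  # RTX 3090 vs RTX 4090
print(f"{ratio:.0%}")  # prints "93%"
```

Under that assumption a 3090 would decode at roughly 93% of a 4090's rate on the same model.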

Do AMD and Intel GPUs work for local LLMs?

Yes - llama.cpp/Ollama support Radeon (ROCm/Vulkan) and Intel Arc (Vulkan/SYCL). Software is less turnkey than CUDA but improving quickly; the VRAM math is identical.

Can I combine two GPUs?

Yes, inference engines can split layers across cards, pooling VRAM with a modest bandwidth-efficiency penalty. VRAMfit's board has a card-count control that models this.
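As an illustration of layer splitting, the sketch below divides a model's layers across cards in proportion to each card's VRAM. It is a hypothetical helper, not the logic of any particular inference engine or of VRAMfit's card-count control, and it ignores the bandwidth-efficiency penalty mentioned above.

```python
def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Assign layers to cards in proportion to their VRAM.

    Uses largest-remainder rounding so the counts always sum to n_layers.
    """
    total = sum(vram_gb)
    shares = [n_layers * v / total for v in vram_gb]
    counts = [int(s) for s in shares]  # round every share down first
    leftover = n_layers - sum(counts)
    # Hand the remaining layers to the cards with the largest fractional share.
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


print(split_layers(64, [24, 24]))  # two 24 GB cards -> [32, 32]
print(split_layers(80, [24, 16]))  # mixed 24 GB + 16 GB -> [48, 32]
```

Pooled capacity is simply the sum of the cards' VRAM, so two 24 GB cards behave like a 48 GB tier for fitting purposes, minus per-card overhead.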
