What GPU runs qwen3.5?

In short: qwen3.5 ships in several sizes, and VRAM scales with the size you pick. At Q4_K_M the small distills run on a single 8-16 GB card, while the full 122B build needs about 70 GB - workstation or multi-GPU territory. Pick the largest size that fits your card comfortably.

How much VRAM does qwen3.5 need?

Figures are per size at an 8,192-token context. The last column is the smallest RETAIL card that runs that size comfortably at Q4_K_M (datacenter and Mac options excluded):

Model	Params	Q4_K_M VRAM	Q8_0 VRAM	Smallest comfortable retail GPU
qwen3.5:0.8b	0.8B	2.3 GB	2.7 GB	NVIDIA GeForce GTX 1650
qwen3.5:2b	2B	3.0 GB	3.9 GB	NVIDIA GeForce GTX 1650
qwen3.5:4b	4B	4.1 GB	5.9 GB	NVIDIA GeForce RTX 2060
qwen3.5:9b	9B	6.9 GB	10.9 GB	NVIDIA GeForce RTX 3080
qwen3.5:27b	27B	17.1 GB	28.9 GB	NVIDIA GeForce RTX 4090
qwen3.5:35b	35B	21.6 GB	36.9 GB	NVIDIA GeForce RTX 5090
qwen3.5:122b	122B	70.5 GB	123.9 GB	NVIDIA RTX PRO 6000 Blackwell

Which size should you run?

The distilled small sizes (qwen3.5:0.8b through qwen3.5:9b) are the practical local picks - they need under 7 GB at Q4_K_M, fit common cards, and stay fast. Larger sizes raise quality but quickly exceed a single consumer card, so match the size to your VRAM rather than always reaching for the biggest.
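"Match the size to your VRAM" can be expressed as a small lookup over the table. The Python sketch below copies the table's figures and returns the largest size that fits a given card after reserving some headroom. The function name `largest_fitting_size` and the 15% default `headroom` are hypothetical choices for illustration, not values or tools from this page.

```python
from typing import Optional

# (Q4_K_M GB, Q8_0 GB) at an 8,192-token context, copied from the table above.
# Ordered smallest to largest.
QWEN35_VRAM_GB = {
    "qwen3.5:0.8b": (2.3, 2.7),
    "qwen3.5:2b": (3.0, 3.9),
    "qwen3.5:4b": (4.1, 5.9),
    "qwen3.5:9b": (6.9, 10.9),
    "qwen3.5:27b": (17.1, 28.9),
    "qwen3.5:35b": (21.6, 36.9),
    "qwen3.5:122b": (70.5, 123.9),
}


def largest_fitting_size(card_vram_gb: float, quant: str = "Q4_K_M",
                         headroom: float = 0.15) -> Optional[str]:
    """Return the largest size whose table figure fits the card, or None.

    `headroom` is a hypothetical safety margin (fraction of VRAM kept free);
    it is an assumption, not a number taken from the table.
    """
    column = {"Q4_K_M": 0, "Q8_0": 1}[quant]
    budget = card_vram_gb * (1.0 - headroom)
    best = None
    # Dicts keep insertion order, so the last size that fits is the largest.
    for name, figures in QWEN35_VRAM_GB.items():
        if figures[column] <= budget:
            best = name
    return best


print(largest_fitting_size(24))                # 24 GB card at Q4_K_M
print(largest_fitting_size(24, quant="Q8_0"))  # same card at Q8_0
```

With 15% headroom a 24 GB card lands on qwen3.5:27b at Q4_K_M, which lines up with the RTX 4090 row in the table; raising or lowering `headroom` trades safety margin for a bigger model.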

The card to run it on

For the largest size with everything held in VRAM, the NVIDIA RTX PRO 6000 Blackwell runs qwen3.5:122b comfortably at Q4_K_M, at an estimated 14 tok/s. Check any other size on the fit board.

Frequently asked questions

How much VRAM does qwen3.5 need?

It depends on the size. The small distills fit 8-16 GB cards at Q4_K_M; the full 122B build needs about 70 GB - see the per-size table above.

Can I run qwen3.5 on a 24 GB GPU?

Yes for sizes up to qwen3.5:27b at Q4_K_M (17.1 GB). qwen3.5:35b (21.6 GB) fits but leaves little headroom, and qwen3.5:122b (70.5 GB) overflows 24 GB and needs offloading or a bigger card.

Does a higher quant need a bigger card?

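Yes - Q8_0 uses noticeably more memory than Q4_K_M. In the table above the Q8_0 total is roughly 1.2x the Q4_K_M total for qwen3.5:0.8b and rises to roughly 1.76x for qwen3.5:122b, which can push a size up into the next card tier (qwen3.5:9b, for example, goes from 6.9 GB to 10.9 GB).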
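The Q8_0 versus Q4_K_M difference can be read straight off the per-size table. The short Python sketch below simply divides the two columns; it uses only the table's figures, which are totals at an 8,192-token context rather than weight-only sizes. The names `VRAM_GB` and `q8_over_q4` are illustrative.

```python
# (Q4_K_M GB, Q8_0 GB) at an 8,192-token context, copied from the per-size table.
VRAM_GB = {
    "qwen3.5:0.8b": (2.3, 2.7),
    "qwen3.5:2b": (3.0, 3.9),
    "qwen3.5:4b": (4.1, 5.9),
    "qwen3.5:9b": (6.9, 10.9),
    "qwen3.5:27b": (17.1, 28.9),
    "qwen3.5:35b": (21.6, 36.9),
    "qwen3.5:122b": (70.5, 123.9),
}


def q8_over_q4(name: str) -> float:
    """Ratio of the Q8_0 total to the Q4_K_M total for one size."""
    q4, q8 = VRAM_GB[name]
    return q8 / q4


for name, (q4, q8) in VRAM_GB.items():
    print(f"{name:>13}  Q4_K_M {q4:6.1f} GB  Q8_0 {q8:6.1f} GB  "
          f"ratio {q8_over_q4(name):.2f}x")
```

The ratio climbs from about 1.17x for qwen3.5:0.8b to about 1.76x for qwen3.5:122b. Presumably fixed costs such as the context cache and runtime overhead make up a bigger share of the total on the small sizes, which pulls their ratio down.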
