RTX 5090 vs RTX 4090 for AI: which should you buy?

In short: the RTX 5090 beats the RTX 4090 for AI on the two specs that matter most for running models locally: 32 GB vs 24 GB of VRAM, and 1792 GB/s vs 1008 GB/s of memory bandwidth (about 78% more). It holds bigger models and decodes them faster. The 4090 remains the value pick at 24 GB; the 5090 is the upgrade when you want larger models and more speed.

RTX 5090 vs RTX 4090: the specs

Spec	NVIDIA GeForce RTX 5090	NVIDIA GeForce RTX 4090
VRAM	32 GB	24 GB
Memory bandwidth	1792 GB/s	1008 GB/s
FP16 compute	209.6 TFLOPS	165.2 TFLOPS
Architecture	Blackwell	Ada Lovelace
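
As a quick check on the table, the relative uplift for each spec can be computed directly. This is a minimal sketch in Python using only the figures above; the `specs` dictionary and the `pct_uplift` helper are illustrative names, not part of any tool mentioned here.

```python
# Spec figures copied from the table above: (RTX 5090, RTX 4090).
specs = {
    "VRAM (GB)": (32, 24),
    "Memory bandwidth (GB/s)": (1792, 1008),
    "FP16 compute (TFLOPS)": (209.6, 165.2),
}


def pct_uplift(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1) * 100


uplifts = {name: pct_uplift(*pair) for name, pair in specs.items()}

for name, pct in uplifts.items():
    print(f"{name}: +{pct:.1f}%")
# VRAM (GB): +33.3%
# Memory bandwidth (GB/s): +77.8%
# FP16 compute (TFLOPS): +26.9%
```

The bandwidth gap is by far the largest of the three, which is why it dominates the speed comparison further down.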

What each runs, computed

From VRAMfit's catalog, at Q4_K_M quantization and an 8,192-token context:

Measure	NVIDIA GeForce RTX 5090	NVIDIA GeForce RTX 4090
Biggest comfortable model	falcon:40b (40B)	deepseek-r1:32b (32B)
Models that fit comfortably	341	320
Speed on qwen3:32b	~55 tok/s	~31 tok/s

On qwen3:32b the 5090 is about 77% faster (~55 vs ~31 tok/s). That is close to its 78% bandwidth advantage and well above its 27% FP16 advantage: decode speed tracks memory bandwidth, not raw TFLOPS.
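
One rough way to see why bandwidth dominates: during decode, each generated token requires reading approximately all of the model's weights from VRAM once, so bandwidth divided by weight size gives a ceiling on tokens per second. The sketch below assumes about 4.85 bits per weight for Q4_K_M and a round 32 billion parameters. Both are assumptions for illustration, the estimate ignores KV-cache traffic, and it is not a description of how VRAMfit computes its figures.

```python
# Assumptions (not from the article): average bits per weight for Q4_K_M
# and a round parameter count for a "32B" model.
BITS_PER_WEIGHT_Q4_K_M = 4.85
PARAMS_BILLION = 32.0

# Memory bandwidth in GB/s, from the spec table.
BANDWIDTH_GB_S = {"RTX 5090": 1792.0, "RTX 4090": 1008.0}


def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in (decimal) GB."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed if every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


weights_gb = weight_size_gb(PARAMS_BILLION, BITS_PER_WEIGHT_Q4_K_M)
ceilings = {
    gpu: decode_ceiling_tok_s(bw, weights_gb)
    for gpu, bw in BANDWIDTH_GB_S.items()
}

for gpu, tok_s in ceilings.items():
    print(f"{gpu}: at most ~{tok_s:.0f} tok/s on ~{weights_gb:.1f} GB of weights")
```

Under these assumptions the ceilings come out near 92 and 52 tok/s. The catalog figures (~55 and ~31 tok/s) are lower, as expected for an upper bound, but they sit in nearly the same ratio.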

Verdict by use case

For the biggest models on one card and the fastest decode, the NVIDIA GeForce RTX 5090 wins. If you mostly run models up to ~32B and price matters most, the NVIDIA GeForce RTX 4090 is still excellent value. Compare both against your own models on the fit board.

Frequently asked questions

Is the RTX 5090 better than the 4090 for AI?

Yes - more VRAM and much higher memory bandwidth mean it holds larger models and decodes faster. The 4090 is the cheaper option for models up to ~32B.

Does the 5090 run 70B models?

Not comfortably on its own at Q4_K_M: the weights of a 70B model alone take roughly 40 GB, more than the card's 32 GB. It does run them with light offloading better than a 4090 does, and it pools well with a second card.
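
A back-of-the-envelope estimate shows the size of the gap. The sketch below assumes about 4.85 bits per weight for Q4_K_M and a Llama-style 70B layout (80 layers, 8 KV heads, head dimension 128, FP16 KV cache) at an 8,192-token context. These architecture numbers are assumptions for illustration, and the estimate leaves out runtime overhead such as activations and scratch buffers.

```python
# All constants below are illustrative assumptions, not VRAMfit data.
BITS_PER_WEIGHT = 4.85      # rough average for Q4_K_M
PARAMS_BILLION = 70.0
N_LAYERS, N_KV_HEADS, HEAD_DIM = 80, 8, 128   # Llama-style 70B layout
KV_BYTES_PER_VALUE = 2      # FP16 KV cache
CONTEXT_TOKENS = 8192

# Quantized weights, in decimal GB.
weights_gb = PARAMS_BILLION * BITS_PER_WEIGHT / 8

# KV cache: one key and one value vector per layer, per KV head, per token.
kv_bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES_PER_VALUE
kv_cache_gb = kv_bytes_per_token * CONTEXT_TOKENS / 1e9

total_gb = weights_gb + kv_cache_gb


def shortfall_gb(vram_gb: float) -> float:
    """GB that would have to live outside VRAM (0 if everything fits)."""
    return max(0.0, total_gb - vram_gb)


print(f"weights ~{weights_gb:.1f} GB + KV cache ~{kv_cache_gb:.1f} GB "
      f"= ~{total_gb:.1f} GB")
for label, vram_gb in [("24 GB card", 24), ("32 GB card", 32),
                       ("two 32 GB cards", 64)]:
    print(f"{label}: ~{shortfall_gb(vram_gb):.1f} GB to offload")
```

With these inputs the total is about 45 GB, so a 32 GB card needs to offload roughly 13 GB and a 24 GB card roughly 21 GB, while a pooled 64 GB holds everything.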

How much faster is the 5090 for inference?

Decode is bandwidth-bound, and the 5090 has about 78% more memory bandwidth than the 4090, so on models that fit on both cards its token rate is roughly 1.8x the 4090's.
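
As a sanity check, scaling the 4090's catalog figure for qwen3:32b by the bandwidth ratio should land near the 5090's figure. A minimal sketch, using only numbers from the tables above (variable names are illustrative):

```python
# Figures from the spec and catalog tables above.
bw_5090_gb_s, bw_4090_gb_s = 1792, 1008
tok_s_4090 = 31  # ~31 tok/s on qwen3:32b

bandwidth_ratio = bw_5090_gb_s / bw_4090_gb_s
predicted_tok_s_5090 = tok_s_4090 * bandwidth_ratio

print(f"ratio {bandwidth_ratio:.2f}x, predicted ~{predicted_tok_s_5090:.0f} tok/s")
# ratio 1.78x, predicted ~55 tok/s
```

The prediction matches the ~55 tok/s listed for the 5090, which is consistent with decode being limited by bandwidth on this model.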
