VRAMfit guide · updated 2026-06-28
RTX 5090 vs RTX 4090 for AI: which should you buy?
In short: the RTX 5090 beats the RTX 4090 for AI on the two specs that matter most for local inference - VRAM (32 GB vs 24 GB) and memory bandwidth (1792 GB/s vs 1008 GB/s, about 78% more) - so it holds bigger models and decodes them faster. The 4090 remains the value pick at 24 GB; the 5090 is the upgrade when you want larger models and more speed.
RTX 5090 vs RTX 4090: the specs
| Spec | NVIDIA GeForce RTX 5090 | NVIDIA GeForce RTX 4090 |
|---|---|---|
| VRAM | 32 GB | 24 GB |
| Memory bandwidth | 1792 GB/s | 1008 GB/s |
| FP16 compute | 209.6 TFLOPS | 165.2 TFLOPS |
| Architecture | Blackwell | Ada Lovelace |
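The relative gaps in the table can be worked out directly. Below is a minimal Python sketch that uses only the figures above; the dictionary and function names are illustrative and not part of VRAMfit.

```python
# Spec figures copied from the table above.
SPECS = {
    "RTX 5090": {"vram_gb": 32, "bandwidth_gb_s": 1792, "fp16_tflops": 209.6},
    "RTX 4090": {"vram_gb": 24, "bandwidth_gb_s": 1008, "fp16_tflops": 165.2},
}


def uplift_percent(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


def spec_uplifts(specs=SPECS) -> dict:
    """Rounded percentage advantage of the 5090 over the 4090, per spec."""
    newer, older = specs["RTX 5090"], specs["RTX 4090"]
    return {key: round(uplift_percent(newer[key], older[key])) for key in newer}


if __name__ == "__main__":
    for key, pct in spec_uplifts().items():
        print(f"{key}: +{pct}%")
```

Bandwidth is the largest gap by a wide margin (about 78%), ahead of VRAM (about 33%) and FP16 compute (about 27%).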
What each card runs, computed
Computed from VRAMfit's catalog at Q4_K_M quantization with an 8,192-token context:
| Measure | NVIDIA GeForce RTX 5090 | NVIDIA GeForce RTX 4090 |
|---|---|---|
| Biggest comfortable model | falcon:40b (40B) | deepseek-r1:32b (32B) |
| Models that fit comfortably | 341 | 320 |
| Speed on qwen3:32b | ~55 tok/s | ~31 tok/s |
On qwen3:32b, a 32B sample model, the 5090 is about 77% faster (~55 vs ~31 tok/s). That closely matches its 78% bandwidth advantage rather than its roughly 27% FP16 advantage: decode speed tracks memory bandwidth, not raw TFLOPS.
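The bandwidth argument can be checked with a back-of-envelope model: if generating each token means reading all the weights from VRAM once, then bandwidth divided by weight size gives a ceiling on decode speed. The sketch below is a rough estimate under that assumption, not VRAMfit's actual method. The 20 GB weight size for a 32B model at Q4_K_M is an assumed round figure, and the function names are illustrative.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/s if each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


def scale_by_bandwidth(known_tok_s: float, known_bw: float, target_bw: float) -> float:
    """Project a token rate onto another card, assuming decode is bandwidth-bound."""
    return known_tok_s * target_bw / known_bw


if __name__ == "__main__":
    WEIGHTS_GB = 20.0  # ASSUMPTION: approximate Q4_K_M size of a 32B model
    for name, bw in (("RTX 5090", 1792), ("RTX 4090", 1008)):
        ceiling = decode_ceiling_tok_s(bw, WEIGHTS_GB)
        print(f"{name} ceiling: {ceiling:.1f} tok/s")
    # Scale the 4090's ~31 tok/s from the table by the bandwidth ratio.
    projected = scale_by_bandwidth(31, 1008, 1792)
    print(f"projected 5090 rate: {projected:.1f} tok/s")
```

Scaling the 4090's ~31 tok/s by the bandwidth ratio gives about 55 tok/s, in line with the table. Under the assumed 20 GB weight size, both quoted rates sit at roughly 60% of the theoretical ceiling, which is consistent with both cards being limited by the same bottleneck.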
Verdict by use case
For the biggest models on one card, and for the fastest decode, the NVIDIA GeForce RTX 5090 wins. If you mostly run models up to ~32B and want the best price, the NVIDIA GeForce RTX 4090 is still excellent value. Compare both against your models on the fit board.
Frequently asked questions
Is the RTX 5090 better than the 4090 for AI?
Yes - more VRAM and much higher memory bandwidth mean it holds larger models and decodes faster. The 4090 is the cheaper option for models up to ~32B.
Does the 5090 run 70B models?
Not comfortably on its own at Q4_K_M: 32 GB falls short of what a 70B model needs. With light offloading, though, it runs them better than a 4090 does, and it pools well with a second card.
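A rough size estimate shows why. The sketch below assumes Q4_K_M averages about 4.85 bits per weight (an approximation; the real figure varies by model) and counts weights only, ignoring the KV cache and runtime overhead, which add to the total. The function name is illustrative.

```python
def weights_gb(params_billions: float, bits_per_weight: float = 4.85) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes).

    ASSUMPTION: 4.85 bits/weight is a rough average for Q4_K_M.
    """
    return params_billions * bits_per_weight / 8.0


if __name__ == "__main__":
    need = weights_gb(70)
    for name, vram in (("RTX 5090", 32), ("RTX 4090", 24)):
        shortfall = max(need - vram, 0.0)
        print(f"{name}: {need:.1f} GB of weights vs {vram} GB VRAM, "
              f"{shortfall:.1f} GB to offload")
```

Under these assumptions a 70B model needs about 42 GB for weights alone, so the 5090 has to offload roughly 10 GB, against roughly 18 GB on the 4090.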
How much faster is the 5090 for inference?
Decode is bandwidth-bound, and the 5090 has about 78% more memory bandwidth than the 4090, so on models both cards can hold, expect token rates to rise by roughly the same proportion. On qwen3:32b that works out to ~55 vs ~31 tok/s.