Consumer and datacenter cards
Pool identical GPUs
8192
Tokens held in the KV cache
Smaller bits, smaller footprint
Catalog status will appear here.
Each bar shows where VRAM goes: Weights KV cache Overhead Dashed line = your GPU's limit

Comfortable

Plenty of headroom.

Tight

Fits, with little to spare.

Needs offloading

Runs only with layers on system RAM.

Won't fit

Too large for this card.