Guides

Written by VRAMfit, with every number computed by the same math that powers the fit board. See also the buying guide for AI machines (DGX Spark, Mac Studio, Strix Halo desktops).

How much VRAM does an LLM need?

The VRAM formula for local LLMs, with computed requirements for popular open-weight models at Q4_K_M and the smallest GPU that runs each comfortably.

LLM quantization explained: Q4_K_M vs Q8_0 vs FP16

What GGUF quantization levels mean, computed VRAM needs at every level for 8B and 70B models, and how to choose the right level.

How to choose a GPU for local LLMs (2026)

VRAM tiers from 8 GB to 96 GB with the biggest model each runs comfortably, computed from VRAMfit's catalog, plus why memory bandwidth matters more than TFLOPS for decoding.

MoE models: why a 120B model can feel like a 5B

Mixture-of-experts explained: active vs total parameters, computed VRAM needs for the catalog's MoE models, and when MoE is the right choice.

KV cache: how context length eats your VRAM

What the KV cache is, computed growth from 2K to 32K context for 8B and 70B models, and how to keep long contexts affordable.

Latest AI articles

Aggregated automatically from trusted sources (Hugging Face, Ollama, NVIDIA, Google DeepMind, OpenAI, Qwen and more) and kept only when they cover the models and hardware VRAMfit tracks. Headlines link to the original articles.