LLaMA 4 is Meta's latest open-weight model family. Scout uses a 109B-parameter mixture-of-experts (MoE) architecture with 17B active parameters, a 10M-token context window, and native multimodal capabilities. LLaMA 3.3 70B remains a strong general-purpose option.
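To make the active-parameter figure concrete: in a mixture-of-experts model, a router activates only a few experts per token, so per-token compute scales with the 17B active parameters even though all 109B stay loaded. A back-of-the-envelope sketch, using the common ~2 FLOPs per active parameter per token rule of thumb (an approximation, not Meta's published numbers):

```python
# Rough per-token compute: a forward pass costs ~2 FLOPs per ACTIVE parameter,
# no matter how many total parameters are stored in memory.
def tflops_per_token(active_params_billions: float) -> float:
    return 2 * active_params_billions * 1e9 / 1e12

for name, active_b in [
    ("LLaMA 4 Scout (109B total, 17B active)", 17),
    ("LLaMA 3.3 70B (dense, all 70B active)", 70),
]:
    print(f"{name}: ~{tflops_per_token(active_b):.3f} TFLOPs per token")
# Scout computes like a ~17B dense model per token; memory, however, is set
# by the full 109B parameter count (see the VRAM estimate further down).
```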
Deploy LLaMA 4 in minutes
Starting at $0.66/hr on a dedicated GPU
| Model | GPU | VRAM | Price |
|---|---|---|---|
| LLaMA 4 Scout (109B MoE) | A100 80GB PCIe | 80 GB | $1.85/hr |
| LLaMA 3.3 70B | RTX A6000 | 48 GB | $0.66/hr |
Prices include a 30% service fee and are billed per minute while running.
LLaMA 4 requires 48–80 GB of VRAM depending on the variant, which rules out consumer GPUs like the RTX 5080 (16 GB) or RTX 4090 (24 GB). Cloud GPUs such as the RTX A6000 (48 GB) or A100 (80 GB) are recommended.
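The VRAM figures follow from the total (not active) parameter count, since every expert's weights must be resident. A rough weights-only estimate, ignoring KV cache and activation overhead; the precision each listed deployment actually uses is not specified here, so the quantized figures are an assumption:

```python
def weights_gb(total_params_billions: float, bits_per_param: int) -> float:
    """Weights-only memory estimate in GB; ignores KV cache and activations."""
    return total_params_billions * 1e9 * bits_per_param / 8 / 1e9

for name, params_b in [("LLaMA 4 Scout", 109), ("LLaMA 3.3 70B", 70)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weights_gb(params_b, bits):.0f} GB")
# Scout: ~218 GB @ 16-bit, ~109 GB @ 8-bit, ~55 GB @ 4-bit (fits the A100 80 GB)
# 70B:   ~140 GB @ 16-bit,  ~70 GB @ 8-bit, ~35 GB @ 4-bit (fits the A6000 48 GB)
```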
On ModelPilot, you can deploy it on a dedicated cloud GPU (up to 80 GB VRAM) starting at $0.66/hr with no setup required.
Billing is per minute while the instance runs, with auto-stop when credits run out. Text models typically deploy in 5–15 minutes, including model download.
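A quick cost sanity check under the listed rates, with per-minute billing and the 30% service fee already baked into the hourly price:

```python
def session_cost(price_per_hr: float, minutes: float) -> float:
    """Per-minute billing: cost accrues only while the instance is running."""
    return round(price_per_hr / 60 * minutes, 2)

print(session_cost(1.85, 60))   # A100 80GB, 1 hour         -> 1.85
print(session_cost(0.66, 480))  # RTX A6000, 8-hour workday -> 5.28
```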
Pick your GPU and have it running in minutes. No infrastructure setup required.