Deploy GPT-OSS

Text & Chat

GPT-OSS is OpenAI's open-weight model family. Both the 20B and 120B variants support native function calling and visible chain-of-thought reasoning, with the 120B flagship targeting quality comparable to GPT-4-class models.

Deploy GPT-OSS in minutes

Starting at $0.53/hr on dedicated GPU

Available Variants (2)

Model          Size           GPU              VRAM    Price
GPT-OSS 20B    Medium (20B)   NVIDIA L4        24 GB   $0.53/hr
GPT-OSS 120B   Large (120B)   A100 80GB PCIe   80 GB   $1.85/hr

Prices include 30% service fee. Billed per minute while running.

Requirements

GPT-OSS requires 24–80GB VRAM depending on variant. Consumer GPUs such as the RTX 5080 (16GB) fall short of the 20B requirement, and even the RTX 4090 (24GB) cannot host the 120B variant.

On ModelPilot, deploy on a dedicated cloud GPU (up to 80GB VRAM) starting at $0.53/hr with no setup required.

Includes OpenWebUI chat interface and OpenAI-compatible API endpoint.
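Because the deployment exposes an OpenAI-compatible endpoint, any OpenAI-style client can talk to it by pointing at the deployment URL. A minimal sketch of building a Chat Completions request with only the standard library; the base URL, API key, and model name below are placeholders, not confirmed values — substitute the ones shown in your ModelPilot dashboard:

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the URL shown in your dashboard
# after deployment. The path follows the OpenAI Chat Completions API.
BASE_URL = "https://your-deployment.example.com/v1"

def chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = chat_request("gpt-oss-20b", "Hello!", "YOUR_API_KEY")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` (or swapping in the official `openai` Python client with `base_url` set to the same URL) returns a standard Chat Completions response.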

Use Cases

  • Function calling and tool use
  • Chain-of-thought reasoning
  • AI agent development
  • Enterprise deployments
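Function calling works through the standard OpenAI `tools` parameter on the same endpoint. A sketch of the request body, assuming an illustrative `get_weather` tool (the tool and model name are examples, not part of the deployment):

```python
import json

# Function-calling sketch: the OpenAI-compatible endpoint accepts a `tools`
# array describing callable functions in JSON Schema form.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = json.dumps({
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": tools,
})
print(request_body)
```

When the model decides to use a tool, the response carries a `tool_calls` entry with the function name and JSON arguments for your code to execute.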

Frequently Asked Questions

How much VRAM does GPT-OSS need?

GPT-OSS requires 24–80GB VRAM depending on the variant.

How much does it cost to run GPT-OSS?

Starting at $0.53/hr on a dedicated GPU. Billed per minute while running, with auto-stop when credits run out.

How long does GPT-OSS take to deploy?

Text models typically deploy in 5–15 minutes including model download.

Can I run GPT-OSS on my local GPU?

You can run smaller variants locally if your GPU has enough VRAM. For larger variants or sustained production use, cloud GPUs offer more capacity and reliability.

Ready to deploy GPT-OSS?

Pick your GPU and have it running in minutes. No infrastructure setup required.