← Back to hard AIs

Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · OpenAI

gpt-oss-120b

Model family: gpt-oss

Size
large (117.0B params)
Context
131,072 tokens
Released
2025-08-04
Openness
open-weight
License
Cost tier
mixed
Rating
4.0 — A genuinely strong open reasoning model — near o4-mini quality, single-GPU, clean Apache 2.0, full chain-of-thought and tool use. Held to 4.0 by being text-only and needing an 80GB GPU rather than consumer hardware.
Modalities
text
Capabilities
chat, coding, function-calling, instruction-following, long-context, math, reasoning, tool-use
Access
api-first-party, api-third-party, local-runtime-ollama, local-runtime-vllm, weights-download-hf

Quick Take

OpenAI's open comeback: an Apache 2.0 reasoning modelA model trained to "think through" problems step by step before answering, often by producing internal reasoning that's either shown or hidden from the user. Reasoning models trade speed for accuracy on hard problems — they're slower and more expensive per answer, but markedly better at math, logic, and complex analysis. OpenAI's o1 series and Mistral's Magistral are reasoning models. that nears o4-mini quality, runs on a single 80GB GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models., and you can download, fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch., and self-host freely.

Plain-English Description

gpt-oss-120b, released in August 2025, was a notable moment — OpenAI's first open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. model since GPT-2, and a real one. Released under the permissive Apache 2.0 license, it brings capability that used to be API-only into weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself. you can download, fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch., and run on your own hardware. OpenAI positions it as near-parity with its o4-mini reasoning modelA model trained to "think through" problems step by step before answering, often by producing internal reasoning that's either shown or hidden from the user. Reasoning models trade speed for accuracy on hard problems — they're slower and more expensive per answer, but markedly better at math, logic, and complex analysis. OpenAI's o1 series and Mistral's Magistral are reasoning models. on core benchmarks.

It's a mixture-of-experts model: 117 billion parameters total, but only about 5.1 billion active per tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words., which is how it manages to fit on a single 80GB datacenter GPUA GPU designed for server and cloud use, typically Nvidia H100, H200, B200, or A100. Datacenter GPUs have 40-192GB of VRAM and cost tens of thousands of dollars each. Required for training frontier models and for running the largest models in full precision. Most cloud AI APIs run on datacenter GPUs behind the scenes.. It's built for reasoning and agentic work — adjustable reasoning effort (low/medium/high), full chain-of-thought you can inspect, and native tool use including function calling, web browsing, and Python execution. One limitation to note: it's text-only, with no image or audio input.

For a business that wants strong, self-hostable reasoning with full data control and a clean license, this is one of the more credible options — and it carries the weight of OpenAI's name, which matters to some buyers evaluating open models.

Best For

  • Self-hostedRunning a model on hardware you control — your own servers, your own cloud instance, or your own laptop — rather than paying to access it through someone else's API. Self-hosting gives you full control over data and predictable costs, but requires the hardware and operational effort to run the model. Only possible with open-weight models. reasoning and agentic workloads where data must stay in-house.
  • Organizations that want OpenAI-lineage capability they can own and fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch..
  • Single-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. (80GB) deployments needing near-o4-mini reasoning at no per-tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words. cost.
  • Building agents with tool use, code execution, and inspectable chain-of-thought.

Not For

  • Laptop or consumer-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. deployment — it needs an 80GB card; use gpt-oss-20b.
  • MultimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. tasks — it's text-only.
  • Teams that want the absolute frontier — the closed GPT-5.5 goes higher.
  • Buyers who'd rather not manage inferenceRunning a model to get outputs — as opposed to training it. When you send a prompt to ChatGPT, that's inference. Inference is much cheaper than training per operation but adds up quickly at scale. Pricing pages almost always refer to inference costs (per million tokens, per request, etc.), not training costs. infrastructure at all.

License — Plain-English Summary

Apache 2.0 — unrestricted commercial use, modification, fine-tuning, and redistribution, no royalties or user-count carve-outs; keep the notices and flag significant changes. OpenAI attaches a short "gpt-oss usage policy" describing acceptable use, which doesn't restrict commercial deployment but is worth reading. Self-hostedRunning a model on hardware you control — your own servers, your own cloud instance, or your own laptop — rather than paying to access it through someone else's API. Self-hosting gives you full control over data and predictable costs, but requires the hardware and operational effort to run the model. Only possible with open-weight models., the model keeps all data in-house. As open licenses go, this is among the cleanest — on par with Gemma 4 and the Apache-licensed Qwen models.

How It Compares

Against gpt-oss-20b, the 120b is far more capable but needs a datacenter GPUA GPU designed for server and cloud use, typically Nvidia H100, H200, B200, or A100. Datacenter GPUs have 40-192GB of VRAM and cost tens of thousands of dollars each. Required for training frontier models and for running the largest models in full precision. Most cloud AI APIs run on datacenter GPUs behind the scenes. rather than a laptop. Against the closed GPT-5.4, gpt-oss-120b is the self-hostable option — less peak capability, full ownership and no per-tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words. cost. Against other open flagships like Gemma 4 31B and the Apache-licensed Qwen models, gpt-oss-120b competes on reasoning and agentic tooling under an equally clean license, though those rivals add multimodality that gpt-oss lacks.

Cost

Self-hosted cost
$0.00 beyond compute
Notes
Free to self-host under Apache 2.0; also served by OpenAI and third parties per-token. Reasoning effort is adjustable (low / medium / high).

Hardware requirements

Min VRAM
80 GB
Recommended VRAM
80 GB
Runs on laptop
No
Notes
Designed to fit a single 80GB datacenter GPU (H100 / MI300X) thanks to MXFP4 quantization of the expert weights. Not a laptop model — use gpt-oss-20b for that.

Comparable models

Commercial-use conditions

Apache 2.0 permits unrestricted commercial use, modification, fine-tuning, and redistribution. OpenAI also attaches a short "gpt-oss usage policy" covering acceptable use; it doesn't restrict commercial deployment but is worth a read.

Sources