Verify critical details (pricing, licensing, availability) with the model's source before business decisions.
DeepSeek-R1
Model family: deepseek-r1
- llm
- open-weight
- commercial-friendly
- frontier
- reasoning
- math
- china-based
- mixture-of-experts
- distillable
Quick Take
The MIT-licensed reasoning model that put DeepSeek on the map: it matched OpenAI's best at a fraction of the cost, and it ships in small distilled versions you can run on a laptop.
Plain-English Description
DeepSeek-R1, released in January 2025, is the model that made DeepSeek a household name in AI. It's a "reasoning" model, one trained to think step by step before answering, which makes it strong at math, code, and multi-step logic. What stunned the industry wasn't only that R1 matched OpenAI's o1 on hard reasoning tasks; it was that DeepSeek built it on comparatively modest computing resources and then gave the weights away for free. The release triggered a sharp market reaction and set off a wave of open-weight releases across the Chinese AI sector.
R1 was trained with an unusual recipe: instead of the standard approach of teaching the model with large labeled datasets first, DeepSeek applied reinforcement learning directly and let reasoning behaviors emerge. The current version, R1-0528, deepened that reasoning further — its score on the AIME 2025 math exam jumped from 70% to 87.5% between versions.
The full R1 is a 685-billion-parameter model that needs serious hardware, but DeepSeek also released six "distilled" versions: smaller models (built on Llama and Qwen) trained to imitate R1's reasoning. Distilling is like having a brilliant professor train a sharp student: the student is far smaller and cheaper to run but inherits much of the reasoning skill. The 7B and 8B distills run on a single consumer GPU, which is how most people actually use R1 today.
Best For
- Math, logic, and step-by-step reasoning tasks, especially where you want to inspect the model's chain of thought.
- Running a capable reasoning model locally and cheaply via the small distilled variants.
- Fine-tuning and distillation projects: DeepSeek explicitly permits both, and R1's reasoning traces are valuable training material.
- Educational and research use where the open RL-trained recipe is itself the point of interest.
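Inspecting the chain of thought usually means separating the reasoning from the final answer in the raw output. The sketch below assumes the reasoning is closed by a literal `</think>` tag, which is how R1-style output is commonly delimited, and treats the opening `<think>` tag as optional. The function name `split_reasoning` is hypothetical, and the tag format should be checked against the checkpoint and chat template you actually run.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Assumption: the reasoning ends with a literal </think> tag. The opening
    <think> tag is optional, since some chat templates place it in the prompt
    rather than in the generated text.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Hypothetical raw output, for illustration only.
raw = "<think>\n2 + 2 is 4.\n</think>\nThe answer is 4."
reasoning, answer = split_reasoning(raw)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the two parts separate lets you log or audit the reasoning while showing users only the answer.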
Not For
- Frontier production work today — DeepSeek-V4-Pro and DeepSeek-V4-Flash have overtaken R1 on capability and context length.
- Multimodal tasks: R1 is text-only.
- Anyone relying on DeepSeek's hosted service in a regulated or privacy-sensitive setting. R1 is the specific model whose hosted app drew bans and restrictions from multiple governments over data routing to China; if that's a concern, run the open weights yourself rather than using the hosted API.
- General-purpose chat where a non-reasoning model would be faster and cheaper.
License — Plain-English Summary
R1's weights are MIT-licensed, and DeepSeek spells out that commercial use and distillation of the R1 series are allowed, including the base and chat variants. That makes R1 one of the most legally unencumbered reasoning models available; you can build on it, ship products with it, and even train your own models from its outputs. One caveat that recurs across DeepSeek's models applies with extra force here: R1's hosted app is the one governments restricted over data-sovereignty concerns, but those concerns attach to the hosted service, not to the open weights you run yourself.
How It Compares
Against the V4 family, R1 is the previous generation: V4 is smarter, handles far longer context, and is the model to choose for new frontier work. Against DeepSeek-V3.2, R1 is the dedicated reasoning specialist where V3.2 is more general-purpose (though V3.2-Speciale blurred that line). Against OpenAI's o1/o3 reasoning models, R1 reached comparable quality on math and coding at a tiny fraction of the cost and as open weights, the headline contrast that made it famous. And uniquely in DeepSeek's lineup, R1's distilled variants give it a genuine accessibility story: there's a version small enough for almost any hardware budget.
Cost
- Self-hosted cost: $0.00 beyond compute
- Notes: R1 was served first-party behind the deepseek-reasoner endpoint at roughly $0.55 per million input and $2.19 per million output tokens; that endpoint is now a compatibility alias that points to V4 Flash's thinking mode and retires 2026-07-24. R1's open weights remain freely downloadable under MIT and are served by third-party hosts. Treat it now mainly as a self-host or third-party option.
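To make the per-million-token prices concrete, here is a minimal sketch of the arithmetic using the historical first-party rates quoted above. The function name and the example token counts are hypothetical, and third-party hosts set their own prices, so treat the result as an illustration rather than a quote.

```python
# Historical first-party prices quoted in this entry (USD per million tokens).
INPUT_USD_PER_M = 0.55
OUTPUT_USD_PER_M = 2.19


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request at the quoted per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: a 2,000-token prompt and 8,000 output tokens.
cost = request_cost_usd(2_000, 8_000)
print(f"${cost:.4f}")  # prints $0.0186
```

Reasoning models tend to produce long outputs because the step-by-step reasoning typically counts toward output tokens, so the output rate usually dominates the bill.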
Hardware requirements
- Min VRAM: 16 GB
- Recommended VRAM: 384 GB
- Runs on laptop: Yes
- Notes: The full 685B model needs a multi-GPU cluster. But the distilled variants span the range: the Qwen-7B and Llama-8B distills run on a single consumer GPU or a capable laptop, which is how most people actually run "R1."
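A rough way to sanity-check VRAM figures like these is to multiply the parameter count by the bytes stored per parameter. The sketch below covers the weights only and assumes 1 GB = 10^9 bytes. It ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The function name and the chosen precisions are illustrative assumptions, not DeepSeek's published sizing.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


# Illustrative configurations; the precisions are assumptions for this sketch.
configs = [
    ("7B distill, 16-bit", 7, 16),
    ("7B distill, 4-bit", 7, 4),
    ("full 685B, 8-bit", 685, 8),
    ("full 685B, 4-bit", 685, 4),
]
for name, params_b, bits in configs:
    print(f"{name}: ~{weight_memory_gb(params_b, bits):.1f} GB for weights")
```

Under these assumptions a 7B distill at 16-bit needs about 14 GB for weights, which fits within 16 GB, and the full model at 4-bit needs about 342.5 GB, which fits within 384 GB.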