Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.

DeepSeek

4.0 ★: Outsized ecosystem impact and an unusually clean MIT posture on its flagship models, at prices that undercut every Western lab. It is held back from a higher rating only by the data-governance questions around the hosted service and a mixed license picture on its older model families.

Type: ai-native-company
Country: CN
Founded: 2023
License posture: predominantly-open-weight

Quick Take

DeepSeek is the Chinese AI lab that proved a small team could build frontier-class models cheaply and give the weights away, with the data-governance asterisk that comes from being headquartered in China.

Who They Are

DeepSeek is an AI-native research lab based in Hangzhou, China, founded in 2023 by Liang Wenfeng, who also co-founded the quantitative hedge fund High-Flyer. That origin matters: the team came out of mathematical optimization and efficient computing rather than the usual big-tech AI playbook, and it shows in their work — DeepSeek's whole reputation rests on squeezing top-tier results out of far less compute than rivals assumed was necessary.

The lab became globally famous in January 2025 when it released DeepSeek-R1, a reasoning model that matched OpenAI's best at a fraction of the cost and was given away as open weights. The release rattled markets and kicked off a wave of open-weight releases from other Chinese labs. DeepSeek has kept that cadence since, shipping the V3 generation, the V3.2 reasoning models that posted gold-medal-level scores on the 2025 International Mathematical Olympiad, and in April 2026 the V4 family: its current flagship and, at the time of writing, one of the strongest open-weight models in the world.

Model Philosophy

DeepSeek's posture is "frontier capability, open weights, lowest possible price." Its headline models (the V3, V3.2, R1 and V4 families) ship under the permissive MIT license, which lets you download, run, modify, fine-tune and redistribute the weights commercially with essentially no strings beyond keeping the copyright and license notice. That is more open than Meta's Llama license (which carries a large-user carve-out) and far more open than the closed APIs from OpenAI and Anthropic. The lab pairs this with a first-party API priced to undercut everyone, plus automatic prompt caching that drops repeat-context costs close to zero.

Two caveats keep this from being a simple "most open lab wins" story. First, the openness is uneven across the catalog: DeepSeek's older and more specialized models (the original DeepSeek-LLM and Coder families, the Janus multimodal models, the math and vision-language models) ship under a custom DeepSeek Model License with acceptable-use restrictions rather than plain MIT. Second, open weights solve the licensing question but not the trust question, which the next section covers.
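For teams that track which models are cleared for commercial use, the license split described above can be captured as a small lookup. This is a minimal sketch, not an official registry: the family keys, the `license_for` and `needs_legal_review` helper names, and the "unknown" fallback are all assumptions made for illustration, and the mapping only restates what this profile says. Confirm each model's license on its model card before relying on it.

```python
MIT = "MIT"
CUSTOM = "DeepSeek Model License (custom, acceptable-use restrictions)"
UNKNOWN = "unknown: check the model card"

# Family -> license, restating the profile text. The keys are illustrative
# labels invented for this sketch, not official model identifiers.
LICENSE_BY_FAMILY = {
    "v3": MIT,
    "v3.2": MIT,
    "r1": MIT,
    "v4": MIT,
    "deepseek-llm": CUSTOM,
    "coder": CUSTOM,
    "janus": CUSTOM,
    "math": CUSTOM,
    "vision-language": CUSTOM,
}


def license_for(family: str) -> str:
    """Return the license recorded for a family, or UNKNOWN if not listed."""
    return LICENSE_BY_FAMILY.get(family.strip().lower(), UNKNOWN)


def needs_legal_review(family: str) -> bool:
    """Treat anything that is not plain MIT as requiring a license read."""
    return license_for(family) != MIT


if __name__ == "__main__":
    for name in ("V4", "Janus", "some-new-family"):
        print(f"{name}: {license_for(name)} (review: {needs_legal_review(name)})")
```

Defaulting unlisted families to "needs review" is a deliberate choice here: with a mixed catalog, an unrecognized model is safer treated as restricted until someone has read its license.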

What To Know Before You Commit

The single most important thing a business reader needs to understand about DeepSeek is the difference between the hosted service and the open weights.

If you use DeepSeek's own API or chat app, your data travels to servers in China and falls under Chinese jurisdiction. That is why multiple US states, Australia, Taiwan, South Korea, Denmark, Italy and others have restricted the hosted apps on government and corporate devices, and why many regulated industries treat the hosted service as a non-starter. The hosted service also applies content controls to its outputs on politically sensitive topics. None of this is hidden: it is a direct consequence of where the company is based.

The open weights are a different proposition entirely. Because the flagship models are MIT-licensed and downloadable, you can run them on infrastructure you control, with no data leaving your environment, or use a Western hosting provider such as Together, Fireworks or OpenRouter to keep data out of Chinese jurisdiction. For a privacy-sensitive business, "self-host the open weights" is the escape hatch that makes DeepSeek's capability-per-dollar usable without the data-sovereignty exposure. The trade-off is that the largest models are big: running V4 Pro at production speed needs a serious GPU cluster, though the smaller variants are far more approachable.

How They Compare

Against Meta, DeepSeek is more permissively licensed on its flagships (clean MIT versus Llama's community license with its large-user carve-out) but carries the China-jurisdiction concern Meta doesn't. Against Mistral AI, the two are the leading open-weight labs in their respective regions (Mistral AI in the EU, DeepSeek in China), with Mistral offering an easier data-governance story for Western buyers and DeepSeek typically winning on raw capability-per-dollar. Against the closed labs (OpenAI, Anthropic, Google), DeepSeek's pitch is simple: comparable frontier performance on coding and reasoning, open weights you can self-host, and prices often tens of times lower, at the cost of the trust and jurisdiction questions a US-based closed API doesn't raise.

Original Models

DeepSeek V4

The practical V4: most of Pro's smarts at a fraction of the cost, small enough that a mid-size team can actually self-host it, and MIT-licensed.

The MIT-licensed base checkpoint behind V4 Flash: the affordable starting point for domain-specific fine-tuning of the V4 architecture.

DeepSeek's current flagship: a frontier-class open-weight model that goes toe-to-toe with the best closed systems on coding and reasoning, ships under MIT, and costs a fraction of the price.

The raw 1.6T base checkpoint behind V4 Pro, MIT-licensed, intended for continued pre-training and custom fine-tunes rather than direct use.

DeepSeek V3.2

DeepSeek's prior flagship: an MIT-licensed reasoning model strong enough that its high-compute variant took competition-math gold. It is now overtaken by V4 but still a solid self-host choice.

The reasoning-maxed variant of V3.2 — MIT-licensed, competition-math gold-level, but reasoning-only with no tool-calling support.

The experimental V3.2 build that pioneered sparse attention for cheaper long context; MIT-licensed and now superseded by stable V3.2 and V4.

DeepSeek V3.1

The V3.1 general-purpose model — MIT-licensed, capable on chat and long context, now a generation behind V3.2 and V4.

DeepSeek V3

The March 2025 refresh of DeepSeek V3 — same architecture and MIT license, with improved coding and reasoning.

The foundational DeepSeek V3 — MIT-licensed, GPT-4-class at a fraction of the cost, and the architectural base for everything that followed.

Janus

The tiny 1B Janus-Pro — runs almost anywhere and both reads and generates images, under the custom DeepSeek Model License, with quality limited by its size.

DeepSeek's small open multimodal model: it both reads and creates images, runs on a laptop, and beat DALL-E 3 on prompt-following at launch, though it is now aging and carries a custom license.

DeepSeek R1

The MIT-licensed reasoning model that put DeepSeek on the map: it matched OpenAI's best at a fraction of the cost, and ships in small distilled versions you can run on a laptop.

The pure-reinforcement-learning reasoning model behind R1: MIT-licensed and fascinating for research, but rougher than R1 for production use.

Derivatives Authored

DeepSeek R1 Distill

DeepSeek-R1-0528-Qwen3-8B

DeepSeek-R1-Distill-Llama-70B

DeepSeek-R1-Distill-Llama-8B

DeepSeek-R1-Distill-Qwen-1.5B

DeepSeek-R1-Distill-Qwen-14B

DeepSeek-R1-Distill-Qwen-32B

DeepSeek-R1-Distill-Qwen-7B
