Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · DeepSeek

DeepSeek-V4-Pro

Model family: deepseek-v4

Size

frontier (1600.0B params)

Context

1,048,576 tokens

Released

2026-04-23

Openness

open-weight

License

MIT License · commercial: yes

Cost tier

mixed

Rating

4.5 ★ — Frontier-class coding and reasoning at an MIT license and an API price a fraction of the closed labs; docked half a point because the model is still a preview and is too large to self-host without serious hardware.

Modalities

text

Capabilities

chat, coding, function-calling, instruction-following, long-context, math, reasoning, tool-use

Access

api-first-party, api-third-party, local-runtime-vllm, weights-download-hf

llm
open-weight
commercial-friendly
frontier
long-context
reasoning
coding
china-based
mixture-of-experts

Quick Take

DeepSeek's current flagship: a frontier-class open-weight model that goes toe-to-toe with the best closed systems on coding and reasoning, ships under MIT, and costs a fraction of the price.

Plain-English Description

DeepSeek-V4-Pro is the top model in DeepSeek's V4 family, released as a preview in April 2026. It is a "mixture-of-experts" model, which means that although it contains 1.6 trillion parameters in total, it only switches on a small slice — about 49 billion — for any given word it processes. Think of it as a large hospital with hundreds of specialists on staff, but only the two or three relevant to your case get called into the room. That design is how DeepSeek delivers frontier-scale capability while keeping the per-query cost low.

Two things make V4 Pro stand out. The first is its memory: a one-million-token context window (with up to 384,000 tokens of output), large enough to drop an entire codebase or a stack of long contracts into a single prompt. The second is its coding and reasoning performance — on the benchmarks DeepSeek reports, it lands alongside the strongest closed models from OpenAI and Anthropic rather than a tier below them, which is unusual for a model you can download for free.

The catch is size. At 1.6 trillion parameters, V4 Pro is genuinely a data-center model — self-hosting it at production speed takes a serious GPU cluster. Most teams will reach it through the API instead, where DeepSeek's pricing undercuts Western labs dramatically and automatic caching makes repeated-context work (like agents reading the same files over and over) cheaper still.

Best For

Agentic coding and software-engineering tasks where you want frontier-quality output without frontier-level API bills.
Long-document and whole-codebase analysis that needs the full million-token window.
Complex multi-step reasoning and math where you'd otherwise reach for a closed reasoning model.
Teams that want the option to self-host eventually — the MIT license means you're never locked into the API.
High-volume production workloads with repeated context, where prompt caching slashes the effective cost.

Not For

Anyone who can't send data to China and isn't set up to self-host. The first-party API routes data to Chinese servers under Chinese jurisdiction; if that's disqualifying for your industry, you must self-host the weights or use a Western third-party host — and Pro's size makes self-hosting expensive.
Small teams wanting to run the flagship on their own modest hardware — that's what DeepSeek-V4-Flash is for.
Image, audio, or video work — V4 Pro is text-only.
Buyers who need a vendor with Western data-residency guarantees and enterprise support contracts out of the box.

License — Plain-English Summary

DeepSeek-V4-Pro's weights are released under the MIT license — about as permissive as licenses get. You can use it commercially, modify it, fine-tune it, and redistribute it, and the only obligation is to keep DeepSeek's copyright notice with the weights. There is no large-user carve-out like Llama's and no non-commercial restriction. The license is genuinely not the thing to worry about here. The thing to worry about is data governance: MIT tells you what you may do with the weights, but it says nothing about where your data goes if you use DeepSeek's hosted API. Run the weights yourself (or via a trusted host) and the licensing and the privacy questions both resolve in your favor.

How It Compares

Against its own sibling DeepSeek-V4-Flash, Pro is the higher-capability, higher-cost option — Flash trades some benchmark headroom for far lower price and the ability to self-host on modest hardware. Against the closed frontier models from OpenAI and Anthropic, V4 Pro reaches comparable coding and reasoning quality while being open-weight and far cheaper per token — the gap it doesn't close is the data-governance and enterprise-support story a US-based vendor provides. Against other open-weight contenders like Qwen's Qwen and Zhipu's GLM, V4 Pro currently sits at or near the top of the open-weight rankings for coding and agentic work, though independent reproductions of DeepSeek's benchmark claims are still being published — treat the headline scores as directional until the community confirms them.

Under the Hood

V4 uses a mixture-of-experts transformer (61 layers, hidden dimension 7168) with a new hybrid attention design DeepSeek calls Compressed Sparse Attention combined with a Heavily Compressed Attention head, aimed at making the million-token context cheap to process. It was trained using the Muon optimizer and is notable for being among the first frontier open-weight models tuned for day-one deployment on Huawei's Ascend accelerators, reducing dependence on Nvidia hardware. Reported benchmarks include 80.6% on SWE-bench Verified and 93.5% on LiveCodeBench, with a Codeforces rating around 3206. These are vendor-reported figures from the April 2026 announcement; independent verification is ongoing. The model serves through vLLM and SGLang and exposes both OpenAI-compatible and Anthropic-compatible API endpoints.

Cost

Self-hosted cost: $0.00 beyond compute
API input (per 1M tokens): $1.74
API output (per 1M tokens): $3.48
API providers: deepseek, openrouter, together, fireworks
Notes: First-party API steady-state pricing after the launch promo ended 2026-05-31. Cache-hit input is far cheaper (about $0.0145 per million tokens), so agent and repeated-context workloads cost much less in practice. Self-hosting is free beyond your own compute. Third-party Western providers (OpenRouter, Together, Fireworks) host the open weights outside China.

Hardware requirements

Min VRAM: 640 GB
Recommended VRAM: 1128 GB
Runs on laptop: No
Notes: Full-precision weights are roughly 3TB; you cannot fit Pro on a single 8-GPU node. Realistic self-hosting means multi-node or 4x H200-class cards. Most teams will use the API for Pro and reserve self-hosting for V4 Flash.