Alibaba's closed, agent-first flagship: frontier-tier coding and reasoning with a million-tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words. memory, priced at roughly half its Western rivals — but API-only, with no weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself. to own.
Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →
Qwen
4.5 ★ — The broadest and most-adopted open-weight family in the world, almost all of it under clean Apache 2.0, spanning phone-size to cluster-scale — held back from a perfect score only by the recent pivot to keeping the frontier flagships closed, which muddies an otherwise exemplary open posture.
- open-weight
- china-based
- big-tech-lab
- commercial-friendly
- apache-2-0
- multilingual
Quick Take
Alibaba's Qwen is the world's broadest open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. model family — Apache 2.0 weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself. spanning phone-size to 397 billion parameters — now paired with a closed, API-only frontier flagship.
Who They Are
Qwen (from the Chinese "Tongyi Qianwen") is the large-model team inside Alibaba Cloud, the cloud-computing arm of Alibaba Group. Since 2023 it has been the most prolific frontier-model shipper of any major tech company, releasing models at a pace that outstrips most dedicated AI labs — dense models, mixture-of-experts models, coding specialists, vision-language models, audio models, and embeddings, across more than a dozen size points.
The strategy behind that firehose is straightforward and worth understanding, because it explains why the models are so generously licensed. Alibaba Cloud makes its money from cloud compute and API access, not from selling model licenses. Open-sourcing Qwen drives adoption; adoption drives people to run those models — often on Alibaba Cloud. The result is that Qwen has become one of the most-downloaded model families on Hugging Face and the default open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. choice for a huge swath of developers and businesses worldwide.
Model Philosophy
For most of its history, Qwen's answer to "how open are you?" was "very": nearly the entire lineup ships under Apache 2.0, the gold standard of permissive licenses — unrestricted commercial use, modification, and redistribution, no royalties, no user-count carve-outs. That is more permissive than Meta's Llama license and on par with the most open models anywhere.
In 2026 the posture got more nuanced. Alibaba began holding its absolute frontier models closed: the agent-tuned "Max" flagships (Qwen3.6 Max, then Qwen3.7-Max) and their multimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. "Plus" siblings are proprietary and API-only, with no downloadable weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself.. The tier just below — the numbered open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. models like Qwen3.5-397B-A17B and the Qwen3 and Qwen3.6 families — stays Apache 2.0. So the lineup now splits cleanly: open-weight workhorses you can self-host freely, and a closed frontier model you can only rent through the API. For a business reader, that split is the single most important thing to keep straight.
What To Know Before You Commit
Match the model to the job, and mind the open/closed line. If you want maximum capability and you're comfortable using a hosted APIAccessing a model by sending requests to the creator's (or a provider's) servers, typically pay-per-use. Hosted APIs handle all the operational work — scaling, hardware, uptime — in exchange for a per-token or per-request fee. Every closed-API model is hosted; many open-weight models are also available via hosted APIs from providers like Together, Fireworks, or Groq., the closed Max flagship is the top of the range. If you want to own your stack — self-host, keep data in-house, avoid vendor lock-in, fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch. freely — the open Apache 2.0 models are the reason Qwen is so widely used, and they run on everything from a laptop to an H100 cluster.
The China-jurisdiction consideration applies the same way it does for any Chinese lab: the hosted DashScope API routes data to Alibaba Cloud under Chinese law, while the open weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself., run on your own infrastructure, carry no such routing. Qwen has drawn far less government-restriction attentionThe mechanism inside a Transformer that lets the model weigh which parts of the input matter most when processing each word. When you read "the cat sat on the mat," attention is how the model knows that "it" in a later sentence refers back to the cat, not the mat. Attention is what made modern language models possible. than DeepSeek did, but the underlying data-governance logic is identical — and for the closed Max/Plus models, the hosted API is the only way to use them, so there's no self-host escape hatch for those specific models.
How They Compare
Against Meta, Qwen is more permissively licensed across its open tier (clean Apache 2.0 versus Llama's community license with its large-user carve-out) and offers a far wider range of sizes, but carries the China-jurisdiction consideration Meta doesn't. Against DeepSeek, the two are the leading Chinese open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. labs — DeepSeek tends to win on raw frontier capability-per-dollar and keeps its flagships MIT-open, while Qwen wins on breadth of sizes, Apache licensing, and multilingual coverage, but has moved its very top models closed. Against the Western closed labs (OpenAI, Anthropic, Google) and Mistral AI, Qwen's open tier is the pitch: competitive capability you can download and self-host for free, in exchange for the data-governance questions a US- or EU-based vendor doesn't raise.
Original Models
Qwen3 7
Qwen3.7-Plus — the closed, multimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. sibling of the Max flagship; vision input, 1M context, API-only, proprietary.
Identity
- Creator
- Qwen
- Model family
- qwen3-7
- Release date
- 2026-05-19
Technical specs
- Parameter count
- The closed, multimodal sibling of Qwen3.7-Max — adds image and video input. API-only on DashScope; no downloadable weights.
- Context window
- 1M tokens
- Modalities
- Image Input
- Text
- Video Input
- Primary capabilities
- Chat
- Function Calling
- Instruction Following
- Long Context
- Reasoning
- Tool Use
- Vision
License
- License
- Qwen Proprietary (Alibaba Cloud)
- Commercial use
- Allowed
- Terms
- Modification ✗
- Redistribution ✗
- Attribution ✗
Access
- Openness
- Closed Api
- Access methods
- Api First Party
- Api Third Party
- Cost tier
- Paid Api
Qwen3 6
The best open model you can actually run yourself: a dense 27B that beats Qwen's own 397B flagship on agentic coding while fitting on a single consumer GPUA GPU designed for desktop PCs and gaming — typically Nvidia RTX 3090, 4090, 5090 or similar. Consumer GPUs have 8-32GB of VRAM and cost a few thousand dollars each. Capable of running small and medium models, especially when quantized. The boundary between "runs on a consumer GPU" and "needs a datacenter GPU" roughly separates small from large models in the catalog., under Apache 2.0.
The Qwen3.6 open MoEA model architecture that splits the model into many smaller specialized "expert" networks, only activating a handful per input rather than running the whole model every time. The practical effect: you get the knowledge capacity of a big model with the compute cost of a much smaller one. Mistral Large 3 and Mistral Small 4 are both MoE models. (35B-A3B) — the efficient sibling of the dense 3.6-27B, Apache 2.0, single-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models.-friendly.
Identity
- Creator
- Qwen
- Model family
- qwen3-6
- Release date
- 2026-04-15
Technical specs
- Parameter count
- 35B
- Context window
- 262K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Coding
- Instruction Following
- Long Context
- Multilingual
- Reasoning
- Tool Use
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Api Third Party
- Local Runtime Ollama
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed
Qwen3 5
The most capable model you can legally download and self-host with no strings — a 397B multimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. Apache-2.0 flagship that rivals the frontier and speaks 201 languages.
Qwen3 Coder
Qwen's open coding workhorse: a 30B mixture-of-experts model tuned for agentic coding and tool-calling, with repo-scale context, that runs on a single GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. — under Apache 2.0.
Qwen3
The 0.6B dense Qwen3 — the family's smallest model, Apache 2.0, for highly constrained and edge deployments.
Identity
- Creator
- Qwen
- Model family
- qwen3
- Release date
- 2025-04-27
Technical specs
- Parameter count
- 600M
- Context window
- 33K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Instruction Following
- Multilingual
- Reasoning
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Local Runtime Llama Cpp
- Local Runtime Ollama
- Weights Download Hf
- Cost tier
- Mixed
The 1.7B dense Qwen3 — an edge/on-deviceRunning a model directly on a consumer device — a laptop, a phone, a smart speaker — rather than in a data center. On-device inference keeps data private by never leaving the device, and works offline. Small models (under ~10B parameters, often quantized) can run on-device; larger models cannot yet. size, Apache 2.0, for phones and embedded use.
Identity
- Creator
- Qwen
- Model family
- qwen3
- Release date
- 2025-04-27
Technical specs
- Parameter count
- 1.7B
- Context window
- 33K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Instruction Following
- Multilingual
- Reasoning
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Local Runtime Llama Cpp
- Local Runtime Lm Studio
- Local Runtime Ollama
- Weights Download Hf
- Cost tier
- Mixed
The 14B dense Qwen3 — a balanced single-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. generalist under Apache 2.0.
Identity
- Creator
- Qwen
- Model family
- qwen3
- Release date
- 2025-04-27
Technical specs
- Parameter count
- 14B
- Context window
- 131K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Coding
- Instruction Following
- Long Context
- Multilingual
- Reasoning
- Tool Use
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Api Third Party
- Local Runtime Llama Cpp
- Local Runtime Ollama
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed
The open generalist that defined Qwen3: a 235B Apache-2.0 mixture-of-experts model that went toe-to-toe with the closed frontier and became one of the most-deployed open models anywhere.
The general-purpose 30B-A3B MoEA model architecture that splits the model into many smaller specialized "expert" networks, only activating a handful per input rather than running the whole model every time. The practical effect: you get the knowledge capacity of a big model with the compute cost of a much smaller one. Mistral Large 3 and Mistral Small 4 are both MoE models. — fast, single-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models.-friendly, Apache 2.0; the all-rounder counterpart to Qwen3-Coder.
Identity
- Creator
- Qwen
- Model family
- qwen3
- Release date
- 2025-04-27
Technical specs
- Parameter count
- 30.5B
- Context window
- 262K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Coding
- Instruction Following
- Long Context
- Multilingual
- Reasoning
- Tool Use
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Api Third Party
- Local Runtime Llama Cpp
- Local Runtime Ollama
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed
The 32B dense Qwen3 — the largest single-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models.-friendly dense modelA model where every parameter is used for every input — the entire model runs on every token. Contrast with sparse or Mixture of Experts models, which activate only a fraction of the model per input. Dense models are simpler and more predictable; MoE models are more efficient at scale. in the family, Apache 2.0, with hybrid thinking modes.
Identity
- Creator
- Qwen
- Model family
- qwen3
- Release date
- 2025-04-27
Technical specs
- Parameter count
- 32B
- Context window
- 131K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Coding
- Instruction Following
- Long Context
- Multilingual
- Reasoning
- Tool Use
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Api Third Party
- Local Runtime Llama Cpp
- Local Runtime Ollama
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed
The 4B dense Qwen3 — surprisingly strong for its size, Apache 2.0, runs on modest hardware.
Identity
- Creator
- Qwen
- Model family
- qwen3
- Release date
- 2025-04-27
Technical specs
- Parameter count
- 4B
- Context window
- 33K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Instruction Following
- Multilingual
- Reasoning
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Local Runtime Llama Cpp
- Local Runtime Lm Studio
- Local Runtime Ollama
- Weights Download Hf
- Cost tier
- Mixed
The 8B dense Qwen3 — laptop-feasible, Apache 2.0, a common base for local apps and fine-tunes.
Identity
- Creator
- Qwen
- Model family
- qwen3
- Release date
- 2025-04-27
Technical specs
- Parameter count
- 8B
- Context window
- 131K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Coding
- Instruction Following
- Long Context
- Multilingual
- Reasoning
- Tool Use
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Local Runtime Llama Cpp
- Local Runtime Lm Studio
- Local Runtime Ollama
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed
Qwq
QwQ-32B — Qwen's early dedicated reasoning modelA model trained to "think through" problems step by step before answering, often by producing internal reasoning that's either shown or hidden from the user. Reasoning models trade speed for accuracy on hard problems — they're slower and more expensive per answer, but markedly better at math, logic, and complex analysis. OpenAI's o1 series and Mistral's Magistral are reasoning models., Apache 2.0; still capable but superseded by Qwen3's native thinking modes.
Identity
- Creator
- Qwen
- Model family
- qwq
- Release date
- 2025-03-05
Technical specs
- Parameter count
- 32B
- Context window
- 131K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Coding
- Instruction Following
- Math
- Reasoning
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Local Runtime Llama Cpp
- Local Runtime Ollama
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed
Qwen2 5
Qwen2.5's 14B general model, Apache 2.0 — the base for DeepSeek-R1-Distill-Qwen-14B.
Identity
- Creator
- Qwen
- Model family
- qwen2-5
- Release date
- 2024-09-18
Technical specs
- Parameter count
- 14B
- Context window
- 131K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Coding
- Instruction Following
- Long Context
- Multilingual
- Reasoning
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Local Runtime Llama Cpp
- Local Runtime Ollama
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed
Qwen2.5's 32B general model, Apache 2.0 — the base for DeepSeek-R1-Distill-Qwen-32B (the standout o1-mini-class distill).
Identity
- Creator
- Qwen
- Model family
- qwen2-5
- Release date
- 2024-09-18
Technical specs
- Parameter count
- 32B
- Context window
- 131K tokens
- Modalities
- Text
- Primary capabilities
- Chat
- Coding
- Instruction Following
- Long Context
- Multilingual
- Reasoning
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Local Runtime Llama Cpp
- Local Runtime Ollama
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed
Qwen2 5 Math
Qwen2.5's 1.5B math model, Apache 2.0 — notable as the base for DeepSeek-R1-Distill-Qwen-1.5B.
Identity
- Creator
- Qwen
- Model family
- qwen2-5-math
- Release date
- 2024-09-18
Technical specs
- Parameter count
- 1.5B
- Context window
- 4.1K tokens
- Modalities
- Text
- Primary capabilities
- Math
- Reasoning
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Local Runtime Llama Cpp
- Local Runtime Ollama
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed
Qwen2.5's 7B math model, Apache 2.0 — the base for DeepSeek-R1-Distill-Qwen-7B.
Identity
- Creator
- Qwen
- Model family
- qwen2-5-math
- Release date
- 2024-09-18
Technical specs
- Parameter count
- 7B
- Context window
- 4.1K tokens
- Modalities
- Text
- Primary capabilities
- Math
- Reasoning
License
- License
- Apache License 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Local Runtime Llama Cpp
- Local Runtime Ollama
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed