← Back to hard AIs

Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · Qwen

Qwen3-Coder-30B-A3B

Model family: qwen3-coder

Size
mid (30.5B params)
Context
262,144 tokens
Released
2025-07-30
Openness
open-weight
License
Apache License 2.0 · commercial: yes
Cost tier
mixed
Rating
4.5 — The open coding workhorse: excellent agentic-coding and tool-calling for its size, repo-scale context, runs on one GPU, Apache 2.0. Still the local-coding default because no newer Qwen-Coder has shipped.
Modalities
code, text
Capabilities
coding, function-calling, instruction-following, long-context, tool-use
Access
api-third-party, local-runtime-llama-cpp, local-runtime-lm-studio, local-runtime-mlx, local-runtime-ollama, local-runtime-vllm, weights-download-hf

Quick Take

Qwen's open coding workhorse: a 30B mixture-of-experts model tuned for agentic coding and tool-calling, with repo-scale context, that runs on a single GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. — under Apache 2.0.

Plain-English Description

Qwen3-Coder-30B-A3B is the coding specialist of the open Qwen lineup. It's a mixture-of-experts model — 30.5 billion parameters total, but only about 3.3 billion active per tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words. — which is why it punches well above its memory footprint. The design goal is agentic coding: not just autocompleting a function, but operating inside coding tools (it ships with a function-call format built for platforms like Qwen Code and Cline), reading large codebases, and executing multi-step tasks.

Two practical strengths matter for business use. First, context: it handles 256,000 tokens natively and can stretch to about a million with a scaling technique called YaRN, which means it can hold an entire repository in working memory rather than squinting at one file at a time. Second, it runs locally — with only 3.3B active parametersIn a Mixture of Experts model, the number of parameters that actually run for any given input, as opposed to the total parameter count that's stored. Mistral Large 3, for example, has 675B total parameters but only 41B active per query — meaning it runs at roughly the cost of a 41B dense model while drawing on 675B worth of knowledge. it's a long-standing favorite for coding on a single consumer GPUA GPU designed for desktop PCs and gaming — typically Nvidia RTX 3090, 4090, 5090 or similar. Consumer GPUs have 8-32GB of VRAM and cost a few thousand dollars each. Capable of running small and medium models, especially when quantized. The boundary between "runs on a consumer GPU" and "needs a datacenter GPU" roughly separates small from large models in the catalog. or an Apple Silicon Mac via the MLX runtime.

It's a focused tool, not a general chatbot — it runs in a direct "non-thinking" mode without the step-by-step reasoning traces some models expose. And because no newer Qwen-Coder has shipped (there's no 3.6 or 3.7 Coder yet), this remains the go-to open coding model from Qwen in 2026.

Best For

  • Self-hostedRunning a model on hardware you control — your own servers, your own cloud instance, or your own laptop — rather than paying to access it through someone else's API. Self-hosting gives you full control over data and predictable costs, but requires the hardware and operational effort to run the model. Only possible with open-weight models. coding assistants and agents wired into tools like Cline, Qwen Code, or your own IDE integration.
  • Repository-scale tasks — refactors, codebase Q&A, multi-file changes — that need the long context.
  • Local development on a single GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. or a Mac, with no per-tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words. API costs and no code leaving your machine.
  • Teams that want an Apache 2.0 coding model they can fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch. on their own codebase.

Not For

  • General chat, reasoning, or writing — it's tuned for code; use a generalist like Qwen3.6-27B for mixed work.
  • Tasks that benefit from visible chain-of-thought reasoning — this model runs in direct, non-thinking mode only.
  • MultimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. needs — it's text and code, no image or video input.
  • Anyone wanting the absolute strongest coding scores regardless of footprint — the larger flagships and the dense Qwen3.6-27B can edge it on some agentic-coding suites.

License — Plain-English Summary

Apache 2.0 — unrestricted commercial use, modification, fine-tuning, and redistribution, no royalties, no carve-outs. For a coding model that's especially valuable: you can fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch. it on your proprietary codebase and ship the result in a commercial product with no licensing entanglements. Keep the notices, flag significant changes if you redistribute, and that's the whole obligation. Self-hostedRunning a model on hardware you control — your own servers, your own cloud instance, or your own laptop — rather than paying to access it through someone else's API. Self-hosting gives you full control over data and predictable costs, but requires the hardware and operational effort to run the model. Only possible with open-weight models., it also keeps your source code entirely in-house.

How It Compares

Against the dense Qwen3.6-27B, the Coder is more specialized for tool-driven coding workflows while the 27B is the stronger all-rounder of similar size — many teams now reach for the 27B for general agentic coding and keep the Coder for tooling-heavy pipelines. Against Qwen3-30B-A3B, the general MoEA model architecture that splits the model into many smaller specialized "expert" networks, only activating a handful per input rather than running the whole model every time. The practical effect: you get the knowledge capacity of a big model with the compute cost of a much smaller one. Mistral Large 3 and Mistral Small 4 are both MoE models. of the same size, the Coder trades broad capability for sharper code and tool-calling behavior. Against closed coding options, Qwen3-Coder's pitch is ownership: comparable everyday coding help that you self-host for free, versus renting a closed model by the tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words..

Cost

Self-hosted cost
$0.00 beyond compute
Notes
Free to self-host under Apache 2.0. 256K native context extends to ~1M via YaRN scaling for repository-scale work. Third-party hosts also serve it.

Hardware requirements

Min VRAM
18 GB
Recommended VRAM
24 GB
Runs on laptop
Yes
Notes
With only 3.3B active parameters it runs efficiently on a single consumer GPU; a long-time favorite for local coding on Apple Silicon via MLX.

Comparable models

Sources