Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →
Claude Sonnet 4.6
Model family: claude-sonnet-4
- llm
- closed-api
- frontier
- multimodal
- long-context
- coding
- us-based
- proprietary
Quick Take
The Claude most teams should actually use: near-Opus coding and reasoning at a third of the price, with a 1M-tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words. window and strong prompt-caching economics.
Plain-English Description
Claude Sonnet 4.6 is Anthropic's recommended default, and the recommendation is sound. It sits in the middle tier between the flagship Opus and the budget Haiku, but "middle" undersells it — thanks to Anthropic's pattern of capability creeping upward, Sonnet 4.6 matches or beats the previous generation's Opus on coding and document tasks, at Sonnet's $3 in / $15 out pricing.
For the majority of production workloads — coding assistance, analysis, writing, customer-facing apps, RAG pipelines — it delivers most of what the flagship offers at a fraction of the cost. It carries the same 1-million-tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words. context windowThe maximum amount of text the model can "see" at once — prompt plus prior conversation plus any documents you give it. Measured in tokens (which are roughly three-quarters of a word each). A 128K context window is about 96,000 words of input — roughly a 400-page book. Larger context windows let the model work with bigger documents but cost more to run. as Opus, and its prompt-caching economics are a genuine advantage: you can park a large document corpus or codebase in the cache once and re-read it cheaply on every request, which makes long-context agents affordable.
Like all of Claude, it's closed and API-only, available through Anthropic, Amazon Bedrock, and Google Vertex. The usual caveat applies — there's no self-hosting.
Best For
- The default model for most production work — coding, analysis, writing, support, RAG.
- Long-context agents that benefit from 1M tokens plus cheap cached re-reads.
- Teams that want most of Opus's quality without the flagship bill.
- Coding-heavy workflows where Claude's reliability shows.
Not For
- The very hardest agentic coding or reasoning — step up to Claude Opus 4.8.
- Ultra-high-volume, simple tasks — Claude Haiku 4.5 is cheaper for routing/extraction.
- Self-hosting or in-house-only data — Claude is closed.
- Buyers chasing the lowest tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words. price — GPT-5.2/5.4 and open models can be cheaper.
License — Plain-English Summary
Proprietary and API-only, on Anthropic's commercial terms: commercial rights to your outputs, no rights to the model, accessed through Anthropic's API or via Bedrock/Vertex. Enterprise and zero-retention data options are available. No open weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself., no self-hosting — that's true of every Claude model. The main practical lever here isn't licensing but caching: lean on prompt caching to keep long-context costs down.
How It Compares
Against Claude Opus 4.8, Sonnet gives up some peak capability for roughly a third of the cost — the better default for most work. Against Claude Haiku 4.5, it's more capable but pricier; route the simplest high-volume tasks down to Haiku. Against GPT-5.4 and Google's Gemini 3.5 Flash, Sonnet 4.6 competes as a mid-tier workhorse — often preferred for coding, while the others may win on price or multimodality.
Cost
- API input (per 1M tokens)
- $3.00
- API output (per 1M tokens)
- $15.00
- API providers
- anthropic-api, amazon-bedrock, google-vertex-ai
- Notes
- $3.00 input / $15.00 output per million tokens — held steady across several Sonnet generations. Prompt caching cuts cached input ~90% (a big lever for long-context agents); batch ~50% off. 1M-token context at flat rate.
Comparable models
Commercial-use conditions
Commercial use permitted through the Anthropic API / Bedrock / Vertex under Anthropic's terms; no weights, no modification, no redistribution.