Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →
Seed-OSS-36B-Instruct
Model family: seed-oss
- llm
- open-weight
- commercial-friendly
- apache-2-0
- long-context
- reasoning
- mid-to-large
- china-based
- tool-use
Quick Take
ByteDance's open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. flagship-by-proxy: a 36-billion-parameter Apache-2.0 model with a native 512K context — four times the open-sourceA stricter standard than open-weight: the weights, the training code, and the training data are all released publicly. Very few large language models meet the full open-source bar — most "open" models in the AI world are actually open-weight. When in doubt, check the license file and the creator's documentation. norm — built for long-document reasoning and agent workflows you can run on your own hardware.
Plain-English Description
Seed-OSS-36B is the model ByteDance gives away. Following the same playbook OpenAI used with GPT-OSS, ByteDance kept its commercial flagship (the Seed 2.0 / Doubao line) closed and instead trained a separate, capable model specifically for the open-sourceA stricter standard than open-weight: the weights, the training code, and the training data are all released publicly. Very few large language models meet the full open-source bar — most "open" models in the AI world are actually open-weight. When in doubt, check the license file and the creator's documentation. community. It launched in August 2025 in three flavors: a Base modelA model straight out of pretraining, before any fine-tuning for chat or specific tasks. Base models predict the next token but don't follow instructions well — they'll continue your prompt rather than respond to it. Most people never use base models directly; they use the instruct-tuned or chat versions built on top. Useful mostly for researchers and people doing their own fine-tuning., a Base variant trained without synthetic data (for researchers who want a cleaner baseline), and the Instruct model most users will want.
Its headline feature is context length. Most open models top out around 128K tokens; Seed-OSS handles 512,000 — and that window was built in during pre-training, not bolted on afterward, which tends to mean it holds up better on genuinely long inputs. It also ships a "thinking budget" control that lets you dial how much the model reasons before answering, trading speed for depth. For a 36B model trained on a relatively lean 12 trillion tokens, its reasoning, math, and coding scores are strong, beating similarly-sized open models on several public benchmarks.
The practical cost is hardware. A 36B model already needs serious GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. memory; pushing toward that 512K context pushes memory use up further. This is a workstation-or-server model, not a laptop model — but everything about it is yours to run, modify, and ship.
Best For
- Long-document and large-codebase work where the 512K context genuinely earns its keep — multi-document analysis, long-context RAG, repository-level reasoning.
- Teams that need open weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself. for data-privacy, on-premise, or auditability reasons and want a permissive Apache license.
- Agent and tool-using workflows (it ships with native tool-call support).
- Reasoning-heavy tasks where the adjustable thinking budget lets you tune cost vs. quality.
Not For
- Laptop or low-VRAMThe memory built into a GPU. VRAM size determines what models you can load and run — a model's weights must fit in VRAM (or be cleverly swapped in and out). A 7B model in 4-bit quantization needs about 6GB of VRAM; a 70B model in 4-bit needs about 40GB; full-precision frontier models need multiple high-end GPUs. When people talk about a model "fitting" on a GPU, they mean VRAM. deployment — this needs workstation- or server-class GPUs, especially at long context.
- Buyers who want turnkey hosted access with no infrastructure: it's primarily a self-host model (some third-party API hosts exist, but it's not a first-party API product).
- MultimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. needs — Seed-OSS is text-only; for images or video you're looking at BAGEL or the closed Seed 2.0 line.
License — Plain-English Summary
Seed-OSS-36B is Apache 2.0 — about as business-friendly as open licenses get. You can use it commercially, modify it, fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch. it, and redistribute it; you just keep the license and copyright notices intact. No user-count carve-outs, no field-of-use restrictions. If you can run it, you can build on it.
How It Compares
- Qwen open models (similar size) — the natural comparison; Seed-OSS counters with a much longer native context, while Qwen offers broader multilingual tiers and a larger ecosystem.
- DeepSeek open models — strong reasoning peers; DeepSeek's flagships are larger and heavier to run, Seed-OSS is more workstation-friendly at 36B.
- Seed 2.0 Pro — ByteDance's own closed flagship; far more capable and multimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default., but you can't download it. Seed-OSS is the open alternative when ownership beats peak capability.
Under the Hood
Dense 36B decoderThe part of a model that generates output, one token at a time, from an internal representation. Chat models are almost all decoder-only architectures — they take your prompt, process it, and stream out a response token by token. "Decoder-only" is the technical name for the family most people just call "chatbots." (64 layers) using Grouped-Query AttentionThe mechanism inside a Transformer that lets the model weigh which parts of the input matter most when processing each word. When you read "the cat sat on the mat," attention is how the model knows that "it" in a later sentence refers back to the cat, not the mat. Attention is what made modern language models possible., SwiGLU, RMSNorm, and RoPE; ~155K vocabulary; native 512K context from pre-training. Trained on ~12T tokens. Reported open-benchmark results include MMLUA broad knowledge test covering 57 subjects from law and medicine to mathematics and history. Scores are reported as percentage correct. A score around 85% is strong for a frontier model; above 90% is state-of-the-art. MMLU is probably the most-cited benchmark in AI model comparisons, though it has known weaknesses — models can memorize the questions, and the test reflects a specific cultural and academic context.-Pro 65.1 and BBH 87.7 on the base modelA model straight out of pretraining, before any fine-tuning for chat or specific tasks. Base models predict the next token but don't follow instructions well — they'll continue your prompt rather than respond to it. Most people never use base models directly; they use the instruct-tuned or chat versions built on top. Useful mostly for researchers and people doing their own fine-tuning. and AIME24 91.7 on Instruct — competitive-to-leading among open models of its scale. Released under Apache 2.0 by the ByteDance Seed team and optimized for international (i18n) use.