Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · ByteDance

Seed-OSS-36B-Instruct

Model family: seed-oss

Size

mid (36.0B params)

Context

512,000 tokens

Released

2025-08-19

Openness

open-weight

License

Apache License 2.0 · commercial: yes

Cost tier

self-hosted-only

Rating

4.0 ★ — A genuinely permissive open model with a standout 512K native context and strong reasoning for its size; the catch is the hardware needed to use that context.

Modalities

text

Capabilities

chat, coding, instruction-following, long-context, math, multilingual, reasoning, tool-use

Access

api-third-party, local-runtime-vllm, weights-download-hf

llm
open-weight
commercial-friendly
apache-2-0
long-context
reasoning
mid-to-large
china-based
tool-use

Quick Take

ByteDance's open-weight flagship-by-proxy: a 36-billion-parameter Apache-2.0 model with a native 512K context — four times the open-source norm — built for long-document reasoning and agent workflows you can run on your own hardware.

Plain-English Description

Seed-OSS-36B is the model ByteDance gives away. Following the same playbook OpenAI used with GPT-OSS, ByteDance kept its commercial flagship (the Seed 2.0 / Doubao line) closed and instead trained a separate, capable model specifically for the open-source community. It launched in August 2025 in three flavors: a Base model, a Base variant trained without synthetic data (for researchers who want a cleaner baseline), and the Instruct model most users will want.

Its headline feature is context length. Most open models top out around 128K tokens; Seed-OSS handles 512,000 — and that window was built in during pre-training, not bolted on afterward, which tends to mean it holds up better on genuinely long inputs. It also ships a "thinking budget" control that lets you dial how much the model reasons before answering, trading speed for depth. For a 36B model trained on a relatively lean 12 trillion tokens, its reasoning, math, and coding scores are strong, beating similarly-sized open models on several public benchmarks.

The practical cost is hardware. A 36B model already needs serious GPU memory; pushing toward that 512K context pushes memory use up further. This is a workstation-or-server model, not a laptop model — but everything about it is yours to run, modify, and ship.

Best For

Long-document and large-codebase work where the 512K context genuinely earns its keep — multi-document analysis, long-context RAG, repository-level reasoning.
Teams that need open weights for data-privacy, on-premise, or auditability reasons and want a permissive Apache license.
Agent and tool-using workflows (it ships with native tool-call support).
Reasoning-heavy tasks where the adjustable thinking budget lets you tune cost vs. quality.

Not For

Laptop or low-VRAM deployment — this needs workstation- or server-class GPUs, especially at long context.
Buyers who want turnkey hosted access with no infrastructure: it's primarily a self-host model (some third-party API hosts exist, but it's not a first-party API product).
Multimodal needs — Seed-OSS is text-only; for images or video you're looking at BAGEL or the closed Seed 2.0 line.

License — Plain-English Summary

Seed-OSS-36B is Apache 2.0 — about as business-friendly as open licenses get. You can use it commercially, modify it, fine-tune it, and redistribute it; you just keep the license and copyright notices intact. No user-count carve-outs, no field-of-use restrictions. If you can run it, you can build on it.

How It Compares

Qwen open models (similar size) — the natural comparison; Seed-OSS counters with a much longer native context, while Qwen offers broader multilingual tiers and a larger ecosystem.
DeepSeek open models — strong reasoning peers; DeepSeek's flagships are larger and heavier to run, Seed-OSS is more workstation-friendly at 36B.
Seed 2.0 Pro — ByteDance's own closed flagship; far more capable and multimodal, but you can't download it. Seed-OSS is the open alternative when ownership beats peak capability.

Under the Hood

Dense 36B decoder (64 layers) using Grouped-Query Attention, SwiGLU, RMSNorm, and RoPE; ~155K vocabulary; native 512K context from pre-training. Trained on ~12T tokens. Reported open-benchmark results include MMLU-Pro 65.1 and BBH 87.7 on the base model and AIME24 91.7 on Instruct — competitive-to-leading among open models of its scale. Released under Apache 2.0 by the ByteDance Seed team and optimized for international (i18n) use.