Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.
Hermes 4 405B
fine-tune derivative of Llama 3.1 405B by Nous Research
Nous Research's post-training of Llama 3.1 405B into Hermes 4 — adding hybrid reasoning (toggleable <think> tags), stronger schema-adherent output, and steerable, low-refusal instruction following.
- llm
- open-weight
- large
- reasoning
- agentic
- self-hostable
- fine-tune
- us-based
- llama-derivative
Quick Take
Nous Research's flagship: a 405B hybrid-reasoning fine-tune of Llama 3.1 405B, state-of-the-art among open-weight models on reasoning, with toggleable chain-of-thought.
Plain-English Description
Hermes 4 405B is the top of Nous Research's lineup: their post-training applied to Meta's largest Llama 3.1 model. The headline feature is hybrid reasoning: a system prompt toggles <think>-tagged chain-of-thought reasoning on or off.
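For applications that consume the model's output, the <think> tags usually need to be separated from the final answer. The following is a minimal sketch, assuming the reasoning arrives as a single <think>...</think> block inside the response text; that layout and the helper name `split_reasoning` are illustrative assumptions, not something documented here, so check them against real model output.

```python
import re

# Assumption: when reasoning is enabled, it appears in one <think>...</think>
# block within the response text. Verify this against actual model output.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no block is present."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```

Keeping the two parts separate lets a product log or hide the reasoning while showing only the answer to end users.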
Beyond reasoning, Hermes is known for steerability: it adopts strong personas, follows detailed system prompts closely, and refuses less than Meta's own Instruct release, which makes it a favorite for builders who want control over voice and behavior. It's a serious model that needs serious hardware: at 405B it's a multi-GPU or cloud-inference deployment.
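To see why 405B parameters implies multiple GPUs, a rough back-of-envelope calculation helps. The sketch below counts memory for the weights only, at three common precisions, and assumes 80 GB accelerators; the 80 GB figure and the precision choices are assumptions for illustration. Real deployments also need room for the KV cache, activations, and runtime overhead, so treat the GPU counts as lower bounds.

```python
import math

PARAMS = 405_000_000_000  # roughly 405B parameters, per the model name
GPU_MEMORY_GB = 80        # assumption: 80 GB accelerators

# Bytes per parameter for common weight precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(precision: str) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9


def min_gpus(precision: str) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weight_memory_gb(precision) / GPU_MEMORY_GB)


for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(precision)
    print(f"{precision}: {gb:.1f} GB of weights, at least {min_gpus(precision)} GPUs")
```

Under these assumptions the weights alone come to about 810 GB at 16-bit precision, which already exceeds a single 8-GPU node of 80 GB cards.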
Like all Llama-based Hermes models, its license is inherited from Llama (see below).
Best For
- Top-tier open reasoning you can self-host (with cluster-scale hardware).
- Applications wanting strong steerability and low-refusal instruction following at the frontier.
- Agentic and schema-adherent output (structured tool calls) at maximum capability.
- Research and high-end deployments where owning the weights matters.
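Even with a model tuned for schema-adherent output, application code typically validates structured tool calls before acting on them. The following is a minimal sketch using only the standard library; the tool-call shape (`name` plus `arguments`) and the function name `parse_tool_call` are hypothetical, chosen for illustration rather than taken from Nous's documentation.

```python
import json

# Hypothetical tool-call shape for illustration; not a format defined by Nous.
REQUIRED_FIELDS = {"name": str, "arguments": dict}


def parse_tool_call(raw: str) -> dict:
    """Parse model output as JSON and check required fields and their types."""
    call = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in call:
            raise ValueError(f"missing field: {field}")
        if not isinstance(call[field], expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")
    return call


call = parse_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}')
print(call["name"], call["arguments"])
```

Rejecting malformed calls at this boundary keeps an agent loop from executing a tool with missing or mistyped arguments.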
Not For
- Anyone without multi-GPU or cluster capacity (use Hermes 4 70B instead).
- Products near the 700M-MAU mark, which trip Llama's license carve-out.
- Teams wanting a clean, unrestricted license — the Apache-based Hermes 4.3 36B avoids Llama's terms.
- Multimodal tasks: the model is text only.
License — Plain-English Summary
Two layers. Nous releases the Hermes 4 weights openly, but the base is Meta's Llama 3.1 405B, so Meta's Llama 3.1 Community License governs the model and travels with it: commercial use is allowed, but you must display "Built with Llama," observe the acceptable-use terms, and secure a separate Meta license if your product exceeds 700 million monthly active users. That threshold is irrelevant for nearly all businesses, but the attribution and acceptable-use conditions apply to everyone, hence "conditional." For a similar model without Llama's strings, the Apache-licensed Hermes 4.3 36B is the alternative.
How It Compares
Against Hermes 4 70B, the 405B is more capable but far heavier — the 70B is what Nous recommends for hosted use. Against the prior Hermes 3 405B, Hermes 4 adds hybrid reasoning and sharper outputs. Against its base Llama 3.1 405B, Hermes is the steerable, lower-refusal, reasoning-toggle alternative to Meta's own Instruct tuning.
Cost
- Self-hosted cost: $0.00 beyond compute
- Notes: Free to self-host; the base model's license governs commercial use (see License).
Commercial-use conditions
Nous releases the Hermes weights openly, but the base is Meta's Llama 3.1, so Meta's Llama 3.1 Community License governs the model — including the clause requiring a separate Meta license if your product exceeds 700 million monthly active users.