← Back to hard AIs

Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · Meta · Llama 3.1 8B

Feature-frozen. The creator has frozen feature development on this model (critical fixes only).

DeepHermes 3 8B

fine-tune derivative of Llama 3.1 8B by Nous Research

Nous Research's reasoning-focused fine-tune of Llama 3.1 8B — one of the first models to unify fast 'intuitive' responses and long chain-of-thought in one model, toggled via system prompt.

Size
small (8.0B params)
Context
131,072 tokens
Released
2025-02-12
Openness
open-weight
License
Cost tier
mixed
Rating
3.5 — An influential early unified-reasoning model, but a preview on an 8B base with the Llama license — 3.5.
Modalities
text
Capabilities
chat, instruction-following, long-context, reasoning, tool-use
Access
local-runtime-llama-cpp, local-runtime-ollama, local-runtime-vllm, weights-download-hf

Quick Take

A laptop-class reasoning modelA model trained to "think through" problems step by step before answering, often by producing internal reasoning that's either shown or hidden from the user. Reasoning models trade speed for accuracy on hard problems — they're slower and more expensive per answer, but markedly better at math, logic, and complex analysis. OpenAI's o1 series and Mistral's Magistral are reasoning models.: Nous's DeepHermes fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch. of Llama 3.1 8B, one of the first to unify fast answers and toggleable chain-of-thought in a single model.

Plain-English Description

DeepHermes 3 8B (a preview release, February 2025) was an early example of a now-common idea: one model that can either answer directly or, when told to via system prompt, switch into explicit chain-of-thought reasoning. Built on Llama 3.1 8B, it puts that toggleable-reasoning behavior on hardware almost anyone has.

It's a small reasoning modelA model trained to "think through" problems step by step before answering, often by producing internal reasoning that's either shown or hidden from the user. Reasoning models trade speed for accuracy on hard problems — they're slower and more expensive per answer, but markedly better at math, logic, and complex analysis. OpenAI's o1 series and Mistral's Magistral are reasoning models. — useful for local, private reasoning on math, logic, and structured problems, with Hermes's usual steerability. As a preview on an 8B base, set expectations accordingly: it demonstrated the approach rather than topping benchmarks.

License is inherited from Llama (see below).

Best For

  • Local, private reasoning on a laptop with a mode toggle for chain-of-thought.
  • Experimenting with unified intuitive/reasoning behavior at small scale.
  • Steerable instruction following with optional deliberation.

Not For

  • Strong reasoning — larger reasoning models go much further; this is an 8B preview.
  • Products near 700M MAU (Llama carve-out).
  • MultimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. tasks — text only.

License — Plain-English Summary

Two layers — Nous's open DeepHermes weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself. over Meta's Llama 3.1 8B, governed by the Llama 3.1 Community License (commercial use with attribution; 700M-MAU carve-out). For a clean-license DeepHermes, the Mistral-based DeepHermes 3 Mistral 24B is Apache 2.0.

How It Compares

Against DeepHermes 3 Mistral 24B, the 8B is far lighter but less capable, and the Mistral variant carries a cleaner Apache license. Against the standard Hermes 3 — Llama 3.1 8B, DeepHermes adds toggleable reasoning. Against its base Llama 3.1 8B, it's Nous's reasoning-tuned variant.

Cost

Self-hosted cost
$0.00 beyond compute
Notes
Free to self-host; the base model's license governs commercial use (see License).

Comparable models

Commercial-use conditions

Nous releases the Hermes weights openly, but the base is Meta's Llama 3.1, so Meta's Llama 3.1 Community License governs the model — including the clause requiring a separate Meta license if your product exceeds 700 million monthly active users.

Sources