Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · Meta · Llama 3.1 8B Instruct

Feature-frozen. The creator has frozen feature development on this model (critical fixes only).

Catalog entry last reviewed 92 days ago.

Hermes 3 — Llama 3.1 8B

fine-tune derivative of Llama 3.1 8B Instruct by Nous Research

Full-parameter fine-tune of Llama 3.1 8B (base, not Instruct) produced by Nous Research. Adds improved function calling, structured output (JSON mode), better roleplaying behavior, stronger steerability via system prompts, and ChatML prompt format. Claimed to be competitive with or superior to Meta's own Llama 3.1 8B Instruct on most general capabilities.

Size

small (8.0B params)

Context

131,072 tokens

Released

2024-08-14

Openness

open-weight

License

Llama 3.1 Community License (inherited) · commercial: conditional

Cost tier

mixed

Rating

4.0 ★ — A high-quality fine-tune that meaningfully improves on Meta's own Instruct version for steerability, function calling, and roleplay — but it's still fundamentally an 8B model, so it inherits the capability ceiling of its base. Choose this over Meta's Instruct when you want more control over behavior; stick with Meta's when first-party alignment matters more.

Modalities

text

Capabilities

chat, function-calling, instruction-following, long-context, multilingual, tool-use

Access

api-third-party, local-runtime-llama-cpp, local-runtime-lm-studio, local-runtime-ollama, local-runtime-vllm, weights-download-hf

llm
open-weight
commercial-friendly
small-to-mid
fine-tune
llama-derivative
function-calling
tool-use
us-based

Quick Take

Nous Research's full-parameter fine-tune of Llama 3.1 8B — trades first-party Meta alignment for better steerability, stronger function calling, and a more flexible prompt format.

Plain-English Description

Hermes 3 is Nous Research's fine-tuned version of Meta's Llama 3.1 8B base model. The quick version: same 8 billion parameters, same 128K context window, same general capability ceiling, but different personality and different strengths in how it responds to you. Nous took Meta's base Llama 3.1 8B (not the Instruct variant — they started from the raw pretrained model) and did their own instruction tuning on it, with a specific focus on function calling, structured output (JSON mode), and steerability via system prompts.

"Steerability" is the word that matters most here. Meta's own Llama 3.1 8B Instruct is tuned to Meta's standards for what a helpful, harmless assistant should be — reasonable defaults but relatively opinionated about what it will and won't do, and not especially responsive to attempts to change its voice through system prompts. Hermes is tuned in the opposite direction: more willing to adopt whatever persona, role, or behavioral rules you specify in the system prompt, and less likely to break character in the middle of a session. That's a feature if you're building an application where you want tight control over the model's behavior, and potentially a problem if you need the model to consistently refuse certain categories of request regardless of what your users prompt it with.

The other practical difference is the prompt format. Hermes uses ChatML — the same prompt format OpenAI's API uses — which makes it drop-in compatible with a lot of tooling that already expects that format. Meta's Instruct versions use their own Llama prompt format. If you're switching between multiple models in your stack, the ChatML compatibility is convenient.

Best For

Applications where system-prompt steerability is a feature — you want the model to take on specific personas, follow detailed behavioral rules, or operate as part of an agent framework with defined roles
Function calling and tool use — Nous has invested specifically in making these reliable, and the model ships with documented function-calling templates
Developers already building on OpenAI-compatible tooling who want a drop-in open-weight alternative with matching prompt format
Roleplay, interactive fiction, and creative applications where Meta's first-party alignment is too restrictive for the use case
Self-hosted deployment where you want a fine-tune that's widely discussed and documented in the open-source community

Not For

Consumer-facing applications in regulated industries where you need the stricter refusal behavior of a first-party instruction-tuned model
Any use case where "Llama 3.1 8B was too small" — Hermes is the same model underneath, with the same capability ceiling
Teams who don't want to think about license inheritance — the Llama 3.1 Community License still governs this model, and you still need to display "Built with Llama" attribution
Organizations above 700M monthly active users (the underlying Llama license applies)
Businesses that need long-term support guarantees — Nous Research is a well-funded lab, but Hermes 3 is feature-frozen and the next-generation Hermes 4 series has already started shipping elsewhere

License — Plain-English Summary

The license situation here is actually simple, once you understand the inheritance: Nous Research publishes Hermes 3 under the Llama 3.1 Community License from Meta. They didn't add new restrictions; they didn't relicense it under something more permissive; the license is Meta's, and all of Meta's terms apply. That means: free for commercial use unless you had more than 700M monthly active users on July 23, 2024; must display "Built with Llama" attribution; must include the license file when redistributing; no using it to train non-Llama foundation models; standard prohibited uses (CSAM, illegal activity, military weapons development). For the vast majority of businesses, this is permissive commercial use.

How It Compares

Llama 3.1 8B Instruct (see Llama 3.1 8B Instruct — Meta's own fine-tune of the same base model; tighter default alignment, same hardware, same license, Meta's own prompt format instead of ChatML)
Llama 3.3 70B Instruct (see Llama 3.3 70B Instruct — if Hermes 8B's capability ceiling is the issue, the larger base model is the answer, not a different fine-tune of the same base)
Other Hermes 3 variants (70B and 405B versions exist, built on Llama 3.1's larger models — same Nous Research post-training approach scaled up; same license inheritance pattern)

Under the Hood

Hermes 3 is a full-parameter fine-tune of Llama 3.1 8B base, not a LoRA or adapter. Training used Nous Research's own post-training pipeline with a focus on instruction-following, agentic behaviors, ChatML formatting, and function calling. The model retains Llama 3.1's architectural details — dense decoder-only transformer, Grouped-Query Attention, 128K context window — and the December 2023 pretraining knowledge cutoff of the base. Native function calling is supported via a documented JSON schema approach (see Nous Research's Hermes-Function-Calling GitHub repository for current templates). ChatML is the default prompt format. Official GGUF quantizations are published by Nous Research themselves, making Ollama and llama.cpp deployment straightforward. The Hermes 3 Technical Report (arXiv:2408.11857) documents the training approach and evaluation methodology.

Cost

Self-hosted cost: $0.00 beyond compute
API providers: openrouter, lambda-labs, fireworks, together
Notes: API pricing varies by provider and is not consistently published for Hermes variants; check providers directly. Self-hosting is the more common deployment pattern for Hermes models. Official GGUF quantizations are published by Nous Research themselves for direct download.

Pricing data is 92 days old. Verify with the source before relying on it.

Hardware requirements

Min VRAM: 6 GB
Recommended VRAM: 16 GB
Runs on laptop: Yes
Notes: Same hardware profile as Llama 3.1 8B — 4-bit quantized runs on 6GB cards, full precision wants ~16GB. Official GGUF quantizations from Nous Research are available for direct use in llama.cpp, Ollama, and LM Studio.

Comparable models

Commercial-use conditions

Licensing inherits directly from Meta's Llama 3.1 base model. Free for commercial use unless your product had more than 700 million monthly active users on July 23, 2024. Past that threshold, a separate Meta license is required. Nous Research has not added restrictions beyond the base Llama license.