Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.


Gemma 4 E2B

Model family: gemma-4

Size

small (2.0B params)

Context

131,072 tokens

Released

2026-04-01

Openness

open-weight

License

Apache License 2.0 · commercial: yes

Cost tier

mixed

Rating

4.0 ★ — The standout on-device model: genuinely useful capability on phones and edge hardware, multimodal, under clean Apache 2.0. Rated 4.0 rather than higher only because the small size naturally limits what it can do versus larger models.

Modalities

image-input, text

Capabilities

chat, instruction-following, multilingual, vision

Access

Local runtimes: llama.cpp, LM Studio, MLX, Ollama · Weights: download from Hugging Face

llm
open-weight
commercial-friendly
small
on-device
multimodal
multilingual
self-hostable
us-based
apache-2-0

Quick Take

The smallest model in Google's Gemma 4 open-model family: capable enough to be useful, small enough to run on a phone or a Raspberry Pi, multimodal, and licensed under Apache 2.0.

Plain-English Description

Gemma 4 E2B is the entry point of the Gemma 4 family, built for the edge: phones, embedded devices, single-board computers, and any setting where the model has to run locally with low latency and a tiny memory footprint. The "E2B" name means it behaves like a model of roughly 2 billion parameters in speed and memory use, which is small enough to fit comfortably where a full-size model never could.

What makes it interesting is that "small" no longer means "useless." E2B inherits Gemma 4's architecture and multimodal abilities (it accepts images as input, not just text), so it can power genuinely helpful on-device features, such as a private assistant, document or photo understanding, classification, or simple agents, without sending data to a server. For privacy-sensitive or offline use cases, that is the whole point: the model runs on the device, so the data never leaves it.

Set expectations accordingly: this is a small model, so it won't match the larger Gemma 4 sizes or the Gemini cloud models on hard reasoning or complex coding. It is the right tool when footprint, latency, privacy, and running cost matter more than raw capability, which, for a surprising number of real products, they do.

Best For

On-device and mobile apps that need AI without a server round-trip.
Privacy-first or offline use cases where data must stay on the device.
Embedded and IoT deployments on constrained hardware.
Low-latency, high-volume tasks where a small fast model is cheaper than calling an API.
Prototyping local AI features before deciding whether you need a larger model.

Not For

Hard reasoning, complex coding, or nuanced long-form work — step up to Gemma 4 26B-A4B or Gemma 4 31B.
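Tasks that need very long context in practice (usable length on a device is limited by its memory) or deep world knowledge.
Audio input: this entry lists only text and image input, so verify audio support on your target runtime and device before relying on it.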
Anyone expecting frontier-model quality from a phone-class model.

License — Plain-English Summary

Apache 2.0, like all of Gemma 4: commercial use, modification, fine-tuning, and redistribution are all permitted, with no royalties or carve-outs. For an on-device model this is especially clean, because you can embed it in a shipping product, tune it for your use case, and distribute it. The obligations are light: include a copy of the license, keep the copyright and NOTICE attributions, and mark files you have modified. And because the model runs locally, it sidesteps data-routing questions; inference happens on the device, so prompts and outputs need not leave it.

How It Compares

Against its larger Gemma 4 siblings, E2B trades capability for footprint: Gemma 4 26B-A4B is far more capable but needs a dedicated GPU, while E2B runs on a phone. Against other small on-device models (small Llama, Qwen, and Phi-class models), Gemma 4 E2B competes on capability per byte and multimodality under a clean Apache 2.0 license. Against any cloud API, the comparison isn't really about capability. E2B runs entirely on the user's device, which brings privacy, offline operation, and no per-request fees.

Cost

Self-hosted cost: $0.00 beyond compute
Notes: Free to self-host under Apache 2.0. Designed for on-device deployment via Google AI Edge / LiteRT, Ollama, llama.cpp, and similar runtimes. The 131,072-token context follows the Gemma 4 family standard; the length you can actually use on a device depends on the host's memory.
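The notes above name Ollama as one local runtime. Below is a minimal sketch of querying the model through Ollama's local REST API (`/api/generate` on the default port 11434) using only the Python standard library. The model tag `gemma4:e2b` is a hypothetical placeholder, so check `ollama list` or the Ollama model library for the real name. The `num_ctx` default of 8192 is an arbitrary illustrative choice that keeps memory use modest, not a recommended setting.

```python
import base64
import json
import urllib.request
from pathlib import Path
from typing import Optional

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical tag: confirm the actual name with `ollama list` before use.
MODEL_TAG = "gemma4:e2b"


def build_payload(prompt: str, image_path: Optional[str] = None, num_ctx: int = 8192) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        # Ask for a modest context window; the full 131,072 tokens is rarely
        # practical on phone- or laptop-class memory.
        "options": {"num_ctx": num_ctx},
    }
    if image_path is not None:
        # Ollama expects images as a list of base64-encoded strings.
        encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
        payload["images"] = [encoded]
    return payload


def generate(prompt: str, image_path: Optional[str] = None, num_ctx: int = 8192) -> str:
    """Send the request to the local server and return the model's text reply."""
    body = json.dumps(build_payload(prompt, image_path, num_ctx)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

With a server running and the model pulled, a call such as `generate("Describe this photo.", image_path="photo.jpg")` would return the reply text. Separating `build_payload` from `generate` keeps the request shape easy to inspect without a live server.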

Hardware requirements

Min VRAM: 4 GB
Recommended VRAM: 8 GB
Runs on laptop: Yes
Notes: Runs on phones, single-board computers (e.g. Raspberry Pi class), laptops, and modest GPUs. The most accessible model in the catalog.
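The VRAM figures above are easier to read against a rough weight-memory estimate. This minimal sketch assumes the catalog's 2.0B parameter figure and nominal bytes per parameter for common precisions. Real quantized files (for example llama.cpp's 4-bit variants) carry some extra bits per weight, and the "E2B" naming may mean the stored parameter count differs from 2.0B. It counts weights only; the KV cache, which grows with context length, plus activations and runtime overhead come on top.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# Assumption: the catalog's 2.0B figure; the stored count may differ.
PARAMS = 2.0e9

# Nominal storage cost per parameter; real quantized formats add some overhead.
BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for precision, bpp in BYTES_PER_PARAM.items():
    print(f"{precision:>9}: ~{weight_gb(PARAMS, bpp):.1f} GB of weights")
```

On this rough arithmetic, half-precision weights alone (about 4 GB) would take up nearly all of the 4 GB minimum before any KV cache is allocated, while 4-bit weights (about 1 GB) leave room for context, which is why quantized builds are the practical choice on the smallest devices.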