Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.

Google

4.5 ★ — A research pedigree second to none, a closed frontier line (Gemini) that competes at the top, and an open family (Gemma) that just moved to clean Apache 2.0 and runs from phones to servers — a rare creator that's genuinely strong on both the closed and open sides.

Type: big-tech-lab
Country: US
Founded: 1998
License posture: mixed
Quick Take

Google runs two AI lines at once: the closed, frontier-tier Gemini family for the cloud, and the open, Apache 2.0 Gemma family for everything from phones to on-prem servers.

Who They Are

Google's AI work runs through Google DeepMind, the combined research organization behind both the Gemini models and the open Gemma family. The research pedigree is arguably the deepest in the field: the original Transformer architecture that underpins essentially every modern AI model came out of Google in 2017. Today Gemini powers the Gemini app and AI Mode in Google Search for well over 900 million monthly users, which makes Google one of the largest deployers of AI on the planet, not just one of the largest builders.

For business readers, the important thing is that Google deliberately ships in two directions. Gemini is the closed, hosted frontier line: you rent it through the API or get it free in Google's apps. Gemma is the open-weight line: you download it and run it yourself. Google's pitch is that you can combine them: Gemini in the cloud for the hardest work, Gemma on your own hardware for private, low-latency, low-cost local processing.

Model Philosophy

The two lines have opposite licensing philosophies, and that's the point. Gemini is proprietary: no weights, API-only, you're buying access. Gemma is open: as of Gemma 4 (April 2026) it ships under Apache 2.0, the permissive gold standard, which allows commercial use, modification, and redistribution subject to its attribution and notice conditions. That Apache 2.0 move was notable, because earlier Gemma versions used Google's own custom "Gemma Terms of Use," which allowed commercial use but layered on a prohibited-use policy and wasn't a true open-source license. So within the Gemma family itself there are now two license regimes: Gemma 4 under clean Apache 2.0, and older Gemma 2 and Gemma 3 models under the custom Gemma terms.

The strategic logic mirrors Alibaba's with Qwen: Google makes its money from cloud and ads, not model licenses, so giving away capable open models drives the ecosystem (and, ideally, Google Cloud usage) while the closed Gemini line monetizes the frontier directly.

What To Know Before You Commit

Decide first whether you're renting or owning. If you want frontier capability and you're comfortable with a hosted API, Gemini is the line: Gemini 3.5 Flash is now Google's default model, and the Pro tier targets the hardest reasoning and long-context work. If you want to self-host, keep data in-house, or run on the edge, Gemma is the line, and with Gemma 4 on Apache 2.0, the licensing is as clean as it gets.

The data-governance picture is the gentlest of the recent creators in this catalog: Google is US-based, the hosted Gemini API runs on Google Cloud with enterprise data-residency options, and the open Gemma weights run on your own hardware, so no data is routed to Google at all. The main diligence items are ordinary ones: Google Cloud's data terms for hosted use, and (for older Gemma models) the custom Gemma Terms of Use rather than Apache 2.0. Check which Gemma generation you're actually downloading.

How They Compare

Against the US open-weight player Meta, Google now matches and arguably beats Llama on licensing for its current open models (Gemma 4's Apache 2.0 versus Llama's community license with its large-user carve-out), while also fielding a closed frontier line Meta doesn't really have. Against the China-based open labs DeepSeek and Qwen, Google offers a cleaner data-governance story (US jurisdiction) and a true cloud-frontier model in Gemini, while those labs often win on raw open-weight capability-per-dollar. Against the other closed Western frontier labs (OpenAI, Anthropic), Gemini's distinguishing strengths are native multimodality (especially video understanding), the largest production context windows, and deep integration with Google Cloud and Workspace.

Original Models

Gemini 3.5

Google's new default model: frontier-tier coding and reasoning, full text-image-audio-video input, a million-token memory, and Flash-tier speed; free in Google's apps, cheap on the API.

Gemma 4

The Gemma 4 to start with: a mixture-of-experts model that gives 26B-class quality at roughly 4B-class cost, multimodal and Apache 2.0, comfortable on a single consumer GPU.

Google's open flagship: a 31B multimodal model under clean Apache 2.0 that beats far larger models on math and coding, and runs on a single consumer GPU.

Google's smallest open model: capable enough to be useful, small enough to run on a phone or a Raspberry Pi, multimodal, and Apache 2.0.

The 4B-class on-device Gemma 4: more capable than E2B for higher-end mobile and laptop use, multimodal, Apache 2.0.

Gemini 3.1

Gemini 3.1 Flash: the prior fast Gemini tier, multimodal and 1M-context, now superseded by 3.5 Flash.

Gemini 3.1 Flash-Lite — Google's cheapest closed tier (~$0.25/$1.50), for high-volume simpler workloads.

Google's current Pro flagship: the model to reach for when a job leans on deep reasoning or precise retrieval across very long documents — capable today, but with a successor weeks away.

Gemini 2.5

Gemini 2.5 Flash: the 2025 fast tier, multimodal and long-context, superseded by the 3.x Flash line.

Gemini 2.5 Pro: the 2025 Pro flagship, long-context and multimodal, now two generations behind.

Gemma 3

The 12B Gemma 3: a single-GPU mid-size model under the custom Gemma Terms of Use, superseded by Gemma 4.

The 1B Gemma 3 — smallest, text-only, edge-oriented, under the custom Gemma Terms of Use.

The 27B Gemma 3 flagship — the prior open generation under the custom Gemma Terms of Use; superseded by Gemma 4.

The 4B Gemma 3: laptop-feasible and multimodal, under the custom Gemma Terms of Use; superseded by Gemma 4.
