Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.

Gemini 3.5 Flash

Model family: gemini-3-5

Context

1,048,576 tokens

Released

2026-05-18

Openness

closed-api

License

Google Gemini API Terms · commercial: yes

Cost tier

freemium

Rating

4.5 ★: Google's default for good reason, with frontier-tier coding and agentic performance, full multimodality, a 1M-token window, fast responses, and free access in the app for individuals. It misses a 5 only because it's closed (no self-hosting).

Modalities

audio-input, image-input, text, video-input

Capabilities

chat, coding, function-calling, instruction-following, long-context, multilingual, reasoning, tool-use, vision

Access

api-first-party, api-third-party, hosted-chat-ui

llm
closed-api
frontier
multimodal
long-context
coding
agentic
us-based
proprietary

Quick Take

Google's new default model: frontier-tier coding and reasoning, full text-image-audio-video input, a million-token context window, and Flash-tier speed. It is free in Google's apps and mid-priced on the API.

Plain-English Description

Gemini 3.5 Flash, launched at Google I/O in May 2026, is the model now powering the Gemini app and AI Mode in Google Search for hundreds of millions of people — and it's Google's recommended default for developers too. "Flash" used to mean "the cheap, fast, weaker option," but Google flipped that: this Flash actually outperforms the previous generation's Pro model on coding and agentic tasks while running about four times faster. The headline framing from Google is that the everyday model is now strong enough to handle work that used to require the flagship.

It's natively multimodal in the fullest sense — a single request can mix text, images, audio, video, and PDFs, with text out — and video understanding in particular is a Google strength. It carries a one-million-token context window (enough for hundreds of pages or a large codebase at once) and a "dynamic thinking" mode that decides how much step-by-step reasoning to spend, with manual levels if you want to tune the cost/quality trade-off.
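As a rough illustration of what the one-million-token window means in practice, the sketch below estimates whether a set of documents would fit. The 4-characters-per-token ratio is a common rule of thumb and an assumption here, not the model's tokenizer. Reserving the full maximum output inside the window is also a conservative assumption, since this page does not say whether output counts against the context limit. The function names are hypothetical.

```python
import math

CONTEXT_WINDOW_TOKENS = 1_048_576  # context size listed on this page
MAX_OUTPUT_TOKENS = 65_536         # max output listed in the Cost notes
CHARS_PER_TOKEN = 4.0              # ASSUMPTION: rough heuristic, not a real tokenizer


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Roughly estimate a token count from character length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(texts: list[str], reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Return True if the estimated prompt plus reserved output fits the window.

    ASSUMPTION: output is reserved inside the window to stay on the safe side.
    """
    prompt_tokens = sum(estimate_tokens(t) for t in texts)
    return prompt_tokens + reserved_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    docs = ["x" * 3_000_000]  # about 3 MB of text, roughly 750k estimated tokens
    print(fits_in_context(docs))
```

For real budgeting, a token count from the provider's own tooling is more reliable than a character heuristic, especially for code, non-English text, and media inputs.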

The practical catch is that it's closed: there are no weights to download, so everything runs through Google's API or apps. For individuals that's a feature (it's free in the Gemini app); for businesses with strict data-residency needs it means relying on Google Cloud's terms rather than self-hosting. At $1.50 in / $9 out per million tokens it's mid-priced among frontier models — cheaper than the top closed flagships, pricier than the budget tiers.

Best For

General-purpose frontier work where you want one fast, capable, multimodal model for most tasks.
Agentic and coding workflows — it's tuned for tool use and parallel agent loops.
Multimodal jobs involving video or audio, where Gemini's understanding leads the field.
Long-document and large-codebase analysis that benefits from the 1M-token window.
Individuals and teams already in the Google ecosystem (Workspace, Cloud, Search) who want tight integration — and free access in the app.

Not For

Anyone who needs to self-host or keep data fully in-house — it's closed and API-only; use Gemma 4 31B for that.
The very hardest deep-reasoning tasks or precise long-context retrieval, where the Pro tier still leads; see Gemini 3.1 Pro (and the imminent 3.5 Pro on our watchlist).
Cost-minimizing high-volume workloads where a cheaper budget tier or an open self-hosted model would do — at $1.50/$9 it's not the cheapest option.
Teams wanting to fine-tune or own the model.

License — Plain-English Summary

There's no open license here: Gemini 3.5 Flash is proprietary, accessed through Google's API under Google Cloud's terms. You get commercial rights to what you build with the outputs, but no rights to the model itself: no weights, no modification, no redistribution. Treat it like any closed frontier API. Your due diligence should cover Google's Generative AI Prohibited Use Policy and the data-handling terms of the Gemini API / Vertex AI, which (being Google Cloud offerings) include enterprise data-residency and processing options. If owning or self-hosting matters, Gemma is the Google line to look at instead.

How It Compares

Against Gemini 3.1 Pro, Flash is faster and cheaper and now beats it on coding and agentic benchmarks, but 3.1 Pro still leads on the hardest academic reasoning and precise long-context retrieval — pick Flash for most work, Pro for the deep end (until Gemini 3.5 Pro ships). Against Gemma 4 31B, the open Google option, Flash is more capable and fully managed but closed — Gemma is the choice when you need to self-host. Against the other closed frontier models (GPT-5.5, Claude Opus 4.7, and the China-based closed flagships), Gemini's edge is native multimodality — especially video — the 1M context window, and Google ecosystem integration, at a price that undercuts the top Western flagships.

Under the Hood

Gemini 3.5 Flash is the first model in the Gemini 3.5 family; the API model ID is gemini-3.5-flash (no preview suffix), internal version 3.5-flash-05-2026, with a January 2026 knowledge cutoff. Independent measurement put it around 55 on the Artificial Analysis Intelligence Index at launch, with roughly 186 output tokens/second — fast for a reasoning-capable model. Reported agentic/coding scores include 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas (tool use), and 84.2% on CharXiv reasoning (multimodal charts). It supports function calling, structured output, search-as-a-tool, and code execution, and is distributed across the Gemini API, Google AI Studio, Vertex AI, Google Antigravity, Android Studio, the Gemini app, and AI Mode in Search.
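To show where the model ID above would plug in, here is a minimal sketch that builds (but does not send) a text-only request. It assumes this model is served through the same public REST shape that earlier Gemini models use: a `generateContent` method under `generativelanguage.googleapis.com/v1beta`, with a `contents` / `parts` body. This page does not confirm that, so check the official API reference before relying on it. The helper name is hypothetical.

```python
import json

# ASSUMPTION: endpoint shape carried over from earlier Gemini API models.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL_ID = "gemini-3.5-flash"  # model ID as listed on this page


def build_generate_request(prompt: str, model: str = MODEL_ID) -> tuple[str, dict]:
    """Return the (url, json_body) pair for a single-turn, text-only request."""
    url = f"{API_BASE}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, body


if __name__ == "__main__":
    url, body = build_generate_request("Summarize this changelog in three bullets.")
    print(url)
    print(json.dumps(body, indent=2))
```

Sending the request would also need an API key (for earlier models, passed in an `x-goog-api-key` header) and an HTTP client. Both are left out so the sketch stays runnable offline.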

Cost

API input (per 1M tokens): $1.50
API output (per 1M tokens): $9.00
API providers: google-gemini-api, google-vertex-ai, openrouter
Notes: Standard pricing is $1.50 input / $9.00 output per million tokens; non-global regions are $1.65 / $9.90. Cached input is $0.15 per million tokens. Free for consumers in the Gemini app and AI Mode in Search. Max output is 65,536 tokens. No self-hosting (closed model).
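The listed rates translate into a per-request estimate as sketched below. The rates come from this page. The function name, and the assumption that cached tokens are a subset of input tokens billed at the cached rate instead of the standard one, are illustrative. Regional pricing and any free-tier allowance are ignored.

```python
from decimal import Decimal

# Standard per-million-token rates listed on this page (USD).
INPUT_PER_M = Decimal("1.50")
OUTPUT_PER_M = Decimal("9.00")
CACHED_INPUT_PER_M = Decimal("0.15")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request at standard rates.

    ASSUMPTION: cached_input_tokens is the part of input_tokens billed at the
    cached rate; the remainder is billed at the standard input rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed input tokens")
    fresh_input = input_tokens - cached_input_tokens
    total = (
        fresh_input * INPUT_PER_M
        + cached_input_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / MILLION


if __name__ == "__main__":
    # A 200k-token prompt with a 10k-token answer, with and without caching.
    print(estimate_cost(200_000, 10_000))
    print(estimate_cost(200_000, 10_000, cached_input_tokens=150_000))
```

Output tokens cost six times as much as input tokens at these rates, so long generations tend to dominate the bill even when prompts are large.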

Commercial-use conditions

Commercial use is permitted through the Gemini API / Vertex AI under Google Cloud's terms. You're buying access, not the model — no weights, no modification, no redistribution.