Gemini 3.5 Flash Is Now Powering Google Search – Here Is What Changed

If you’ve used Google Search in the past few weeks and noticed its AI answers feel sharper, that’s not your imagination. Google made Gemini 3.5 Flash the default model behind the Gemini app and AI Mode in Google Search worldwide, rolling it out the same day it was announced at Google I/O 2026 on May 19. No opt-in required. Billions of search queries are now running through it every day.

The model is now generally available (GA) under the API identifier gemini-3.5-flash. Here’s what actually changed, what it costs, and whether the price hike is worth it.

What Gemini 3.5 Flash Is

Google describes 3.5 Flash as its “strongest agentic and coding model yet” – designed for multi-step workflows and long-horizon tasks, not just simple chat. The headline claim: it beats Gemini 3.1 Pro on key coding and agent benchmarks while running roughly 4x faster than other frontier models.

Google’s published benchmark scores for Gemini 3.5 Flash:

| Benchmark | Gemini 3.5 Flash Score | What It Measures |
|---|---|---|
| Terminal-Bench 2.1 | 76.2% | Agentic terminal and coding tasks |
| MCP Atlas | 83.6% | Tool use via Model Context Protocol |
| CharXiv Reasoning | 84.2% | Multimodal chart and document understanding |
| GDPval-AA | 1656 Elo | Real-world economically valuable work |

Caveat: These are Google’s own numbers from their I/O announcement. Third-party blogs showing exact head-to-head comparisons with competitor models on these specific benchmarks are largely unverified – treat them with skepticism until independent testing confirms them.

The Full Spec Sheet

| Property | Value |
|---|---|
| Model ID | gemini-3.5-flash |
| Input context window | 1,048,576 tokens (~1M) |
| Max output tokens | 65,536 (~65k) |
| Supported inputs | Text, image, video, audio, PDF |
| Knowledge cutoff | January 2025 |
| Computer Use support | No |
| Batch API discount | 50% off |
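
If you want to guard against oversized requests before they hit the API, a minimal sketch looks like this. It only uses the two token limits from the spec sheet; the function name is made up for illustration, and you would need to count tokens with whatever tokenizer or counting endpoint you already use.

```python
# Limits copied from the spec sheet above.
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays inside both published limits.

    Hypothetical helper: the token counts must come from your own tooling.
    """
    input_ok = 0 <= prompt_tokens <= MAX_INPUT_TOKENS
    output_ok = 0 < requested_output_tokens <= MAX_OUTPUT_TOKENS
    return input_ok and output_ok


print(fits_limits(900_000, 8_000))    # inside both limits
print(fits_limits(1_200_000, 8_000))  # prompt exceeds the input window
```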

The Price Hike: 3x More Expensive Than What It Replaced

This is where it gets uncomfortable for teams already using Google’s Flash models at scale. Gemini 3.5 Flash costs three times what its predecessor did – on both input and output tokens:

| Model | Input / 1M tokens | Output / 1M tokens | Status |
|---|---|---|---|
| Gemini 3.5 Flash | $1.50 | $9.00 | GA, stable |
| Gemini 3 Flash Preview | $0.50 | $3.00 | Preview (predecessor) |
| Gemini 3.1 Pro Preview | $2.00 | $12.00 | Preview (≤200k-token prompts) |

The silver lining: 3.5 Flash is 25% cheaper than Gemini 3.1 Pro ($1.50 vs $2.00 input, $9.00 vs $12.00 output). If you were already on Pro and can switch, you save money. If you were on the old Flash preview expecting Flash-tier prices, you’re paying 3x more.
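
To see what those ratios mean for a real bill, here is a rough sketch that prices the same monthly workload on all three models. The rates come from the table above; the workload (200M input tokens, 40M output tokens) is a made-up example, and the dictionary keys are display names, not API identifiers.

```python
# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3 Flash Preview": (0.50, 3.00),
    "Gemini 3.1 Pro Preview": (2.00, 12.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly spend in USD for a given token volume."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical workload: 200M input tokens and 40M output tokens per month.
for name in PRICES:
    cost = monthly_cost(name, 200_000_000, 40_000_000)
    print(f"{name}: ${cost:,.2f}")
```

Swap in your own token counts from billing or logging data. Because both rates scale by the same factor, the 3x and 25% ratios hold for any input/output mix.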

Google’s own migration guide acknowledges the cost jump and suggests “highly cost-sensitive users” consider Gemini 3.1 Flash-Lite instead. That’s a candid admission that not everyone should upgrade automatically.

One more billing detail worth knowing: thinking tokens are billed at the output rate ($9.00/1M). The default thinking_level has been changed from “high” to “medium” – likely to control costs – but if your application explicitly sets thinking_level to high, expect more thinking tokens and a higher bill than under the new default.
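
A quick way to see why thinking tokens matter: they are added to the visible output before the output rate is applied. The sketch below assumes that simple billing rule as described above; the token counts are invented for illustration and say nothing about how many thinking tokens either level actually produces.

```python
INPUT_RATE_PER_M = 1.50   # USD per 1M input tokens (from the article)
OUTPUT_RATE_PER_M = 9.00  # USD per 1M output tokens (from the article)


def request_cost(input_tokens: int, visible_output_tokens: int,
                 thinking_tokens: int) -> float:
    """Cost of one request in USD, with thinking tokens billed as output."""
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * INPUT_RATE_PER_M
            + billed_output * OUTPUT_RATE_PER_M) / 1_000_000


# Hypothetical request: 2,000 input tokens and 500 visible output tokens,
# with two invented thinking budgets for comparison.
light = request_cost(2_000, 500, thinking_tokens=1_000)
heavy = request_cost(2_000, 500, thinking_tokens=4_000)
print(f"lighter thinking: ${light:.4f} per request")
print(f"heavier thinking: ${heavy:.4f} per request")
```

In this made-up case the visible answer is identical, but the heavier run costs well over twice as much, which is why the thinking budget belongs in any cost model.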

What’s Changed in the API (That Will Break Your Code)

If you’re a developer calling the Gemini API, two breaking changes apply to all Gemini 3.x models:

  • temperature, top_p, and top_k are no longer recommended – Google’s documentation advises against setting these parameters on any Gemini 3.x model
  • Default thinking_level changed from “high” to “medium” – if you relied on high-reasoning output by default, you need to set this explicitly
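
Both changes can be handled in one small migration step. The sketch below works on a plain dictionary of generation settings; that flat shape and the function name are assumptions for illustration, so map the keys onto wherever your SDK or HTTP client actually expects them.

```python
# Sampling parameters the article says are no longer recommended on Gemini 3.x.
SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def migrate_config(old_config: dict, thinking_level: str = "high"):
    """Return (new_config, removed_keys) without mutating the input.

    Hypothetical helper: drops the discouraged sampling keys and pins
    thinking_level explicitly if the caller had not set it already.
    """
    removed = [key for key in SAMPLING_KEYS if key in old_config]
    new_config = {k: v for k, v in old_config.items() if k not in SAMPLING_KEYS}
    new_config.setdefault("thinking_level", thinking_level)
    return new_config, removed


legacy = {"temperature": 0.2, "top_p": 0.9, "max_output_tokens": 1024}
config, removed = migrate_config(legacy)
print(config)
print("removed:", removed)
```

Pinning the level explicitly means a future change to the default cannot silently alter your output quality or your bill.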

What About Gemini 3.5 Pro?

Google announced Gemini 3.5 Pro at the same I/O event, describing it as “already being used internally” with a rollout coming “next month” – i.e., June 2026. As of June 21, 2026, it has not reached general availability. It remains in limited preview for select Vertex AI enterprise customers only and is not available in the public Gemini app, Google AI Studio, or the general API.

Reported specs for 3.5 Pro: a 2M token context window and a Deep Think reasoning mode, with pricing expected to land around $15/$60 per million tokens. If your workloads require very long document retrieval or frontier-level reasoning, it may be worth waiting before committing to Flash.

Who’s Actually Using This Already

Beyond Google Search and the Gemini app, Manus AI announced it integrated Gemini 3.5 as the backbone of its latest agent release, with its co-founder noting that capabilities like Wide Research “have become significantly more powerful” with the upgrade. That kind of real-world signal – from a production agent company – is more meaningful than benchmark tables alone.

What To Do About It

If you’re a developer on the Gemini API: Audit your code for temperature/top_p/top_k parameters before migrating. Set thinking_level explicitly if you need high-quality reasoning. Run a cost estimate on your average monthly token usage at the new $1.50/$9.00 rate before switching from the old Flash preview.

If you’re a business using Gemini via Vertex AI: The benchmark gains on coding and agentic tasks are substantial according to Google’s own data, and the claimed speed improvement is large. If you’re running AI agents or code generation pipelines at scale, the 4x speed claim alone could justify the 3x cost increase through shorter end-to-end workflow time.

If you’re a regular user: You’re already on it. Google Search’s AI Mode and the Gemini app have been running 3.5 Flash since May 19. If it feels better, that’s likely why.
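
For developers, the audit step can start as a simple text scan. This is a rough sketch that flags any line mentioning temperature, top_p, or top_k; it is a plain regex match, so it will also catch comments and unrelated variables, and the sample snippet is invented for illustration.

```python
import re

# Matches the three sampling parameters as whole words,
# e.g. temperature=0.2 or "top_p": 0.9.
PATTERN = re.compile(r"\b(temperature|top_p|top_k)\b")


def find_sampling_params(source: str):
    """Return (line_number, stripped_line) for every line that matches."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        if PATTERN.search(line):
            hits.append((line_no, line.strip()))
    return hits


# Invented sample standing in for one of your source files.
sample = '''config = {
    "temperature": 0.2,
    "max_output_tokens": 1024,
    "top_p": 0.9,
}'''

for line_no, line in find_sampling_params(sample):
    print(f"line {line_no}: {line}")
```

To run it over a codebase, read each file's text and pass it to find_sampling_params, then review the hits by hand before deleting anything.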

BetOnAI Verdict

Gemini 3.5 Flash is a genuine step up in capability – faster than other frontier models, better on agentic benchmarks than the Pro-tier model it’s replacing in practice, and now running at planetary scale inside Google’s own products. Those are real data points.

But the 3x price increase is not a small detail. Anyone running Flash workloads at volume needs to rerun their cost models before treating this as a drop-in replacement. Google calling out its own Flash-Lite as the cost-sensitive alternative is an unusual level of transparency – take the hint seriously.

The bigger story is Gemini 3.5 Pro still being MIA. Google promised June and the month is running out. When it arrives with a 2M context window and Deep Think reasoning, the calculus on whether to stick with Flash or upgrade to Pro will shift significantly. For now, 3.5 Flash is the best Google model most people can actually access – and if the benchmark numbers hold up under independent testing, it’s a strong one.

Bottom line: Worth upgrading from 3.1 Pro (you’ll save 25% and likely get better speed). Worth careful cost analysis before upgrading from old Flash. Worth waiting to see 3.5 Pro before locking in long-context architecture decisions.

