Google officially advanced its Gemini 2.5 model lineup on June 17, 2025: Gemini 2.5 Pro and Gemini 2.5 Flash are now stable and generally available, while Gemini 2.5 Flash‑Lite enters public preview. Pro offers top-tier reasoning, multimodal understanding, and coding capability, and handles a context of up to 1 million tokens, making it suited to complex, mission-critical tasks. Flash balances speed, cost efficiency, and robust reasoning, with simplified pricing of $0.30 per million input tokens and $2.50 per million output tokens. Flash‑Lite is the fastest and most economical option, optimized for high-throughput tasks such as translation and classification, with reasoning off by default and support for tool use. All three models share the million-token context window, adjustable “thinking” control, native multimodality, and tool use including grounding with Google Search, code execution, and function calling, and all are accessible through the Gemini app, Google AI Studio, Vertex AI, and more.
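
For a concrete sense of API access, the sketch below makes a single call to the generally available Flash model through the Gemini API with the google-genai Python SDK. The SDK name, environment variable, and model ID reflect Google's public documentation, but treat them as assumptions to verify against the current docs.

    # Minimal sketch: one call to Gemini 2.5 Flash via the google-genai Python SDK.
    # Assumes `pip install google-genai` and an API key exported as GEMINI_API_KEY
    # (or passed explicitly via api_key=...).
    from google import genai

    client = genai.Client()  # picks up the API key from the environment

    response = client.models.generate_content(
        model="gemini-2.5-flash",  # stable, generally available model ID
        contents="In one sentence, what does a 1 million-token context window allow?",
    )
    print(response.text)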

Here are some key features across the Gemini 2.5 models:

  • Hybrid Reasoning Models: Reasoning can be switched on, off, or budgeted per request, letting each model deliver strong quality while staying efficient on cost and speed.
  • “Thinking” Capabilities: The models can reason internally before responding, which improves accuracy on complex prompts. Developers control the “thinking budget” to balance quality, latency, and cost (see the sketch after this list).
  • Native Multimodality: Understands and processes inputs across various modalities including text, images, audio, and video.
  • Long Context Window: All three models feature a 1 million-token context length, allowing them to take in vast amounts of material and tackle complex problems that draw on multiple information sources.
  • Tool Integration: Models can connect to tools such as grounding with Google Search, code execution, and function calling.
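
As referenced in the list above, the following sketch shows how a developer might cap the thinking budget and attach Google Search grounding in a single request, again using the google-genai Python SDK. The field names (thinking_config, thinking_budget, google_search) are assumptions drawn from the SDK's documented types and may change between releases.

    # Sketch: bounded "thinking" plus Google Search grounding on Gemini 2.5 Flash.
    from google import genai
    from google.genai import types

    client = genai.Client()

    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents="Which Gemini 2.5 models reached general availability in June 2025?",
        config=types.GenerateContentConfig(
            # Cap internal reasoning tokens; 0 disables thinking, larger values
            # trade extra latency and cost for accuracy on harder prompts.
            thinking_config=types.ThinkingConfig(thinking_budget=1024),
            # Ground the answer in live Google Search results.
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    print(response.text)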

Gemini 2.5 Pro:

  • Most Advanced Model: Excels at coding and highly complex tasks.
  • Enhanced Reasoning: State-of-the-art in key math and science benchmarks.
  • Advanced Coding: Capable of generating code for web development tasks and creating interactive simulations.

Gemini 2.5 Flash:

  • Fast Performance: Optimized for everyday tasks and large-scale processing.
  • Cost-Efficient: Balances price and performance.
  • Live API Native Audio: Produces natural, conversational audio output with improved voice quality and adaptability, including features such as Proactive Audio and Affective Dialog.

Gemini 2.5 Flash-Lite:

  • Most Cost-Efficient and Fastest: Designed for high-volume, latency-sensitive tasks such as translation and classification (a sketch follows this list).
  • Higher Quality: Outperforms 2.0 Flash-Lite on coding, math, science, reasoning, and multimodal benchmarks.
  • Lower Latency: Offers lower latency compared to 2.0 Flash-Lite and 2.0 Flash for a broad range of prompts.
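
As noted above, Flash-Lite targets high-volume, latency-sensitive work; the sketch below runs a small batch classification with thinking explicitly disabled. The preview model ID shown is an assumption and may differ from the one exposed in AI Studio or Vertex AI.

    # Sketch: high-throughput classification with Gemini 2.5 Flash-Lite (preview).
    from google import genai
    from google.genai import types

    client = genai.Client()

    LABELS = ["billing", "bug report", "feature request"]
    tickets = [
        "I was charged twice this month.",
        "The export button crashes the app on save.",
    ]

    for ticket in tickets:
        response = client.models.generate_content(
            # Preview model ID is an assumption; check the model list before use.
            model="gemini-2.5-flash-lite-preview-06-17",
            contents=f"Classify this support ticket as one of {LABELS}: {ticket}",
            config=types.GenerateContentConfig(
                # Flash-Lite ships with thinking off by default; setting the
                # budget to 0 makes that explicit for latency-sensitive batches.
                thinking_config=types.ThinkingConfig(thinking_budget=0),
            ),
        )
        print(f"{ticket} -> {response.text.strip()}")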

Gemini Technical Report