Google launches Gemini 3.1 Flash-Lite; cuts inference costs 45% vs. predecessor
Gemini · release · model · api · performance · deepmind.google

Overview

Google has introduced Gemini 3.1 Flash-Lite, a new lightweight model in the Gemini 3 series designed for cost-sensitive, high-volume applications. The model is now available in preview via the Gemini API in Google AI Studio and for enterprise deployments via Vertex AI.

Performance & Pricing

Gemini 3.1 Flash-Lite delivers significant performance improvements over its predecessor:

  • 2.5x faster time to first token
  • 45% faster output token generation
  • Identical pricing: $0.25/1M input tokens and $1.50/1M output tokens
  • Strong benchmarks: 1432 on the Arena.ai leaderboard, 86.9% accuracy on GPQA Diamond, and 76.8% on MMMU Pro, exceeding larger Gemini 2.5 models
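At the listed rates, per-request cost is easy to estimate. A minimal sketch (the token counts in the example are hypothetical; the prices are those quoted above):

```python
# Per-request cost at the listed Gemini 3.1 Flash-Lite rates.
INPUT_PRICE_PER_M = 0.25   # USD per 1M input tokens (from the article)
OUTPUT_PRICE_PER_M = 1.50  # USD per 1M output tokens (from the article)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the published rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply.
print(f"${request_cost(2_000, 500):.6f}")  # $0.001250
```

At this price point, a million such requests would cost $1,250, which is the kind of arithmetic that matters for the high-volume use cases below.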

Capabilities & Use Cases

The model comes with built-in thinking levels in both AI Studio and Vertex AI, allowing developers to control reasoning depth based on task complexity. This makes it suitable for:

  • High-volume translation and content moderation
  • Real-time UI and dashboard generation
  • Simulation creation
  • Multi-step instruction following

Developer Access

Gemini 3.1 Flash-Lite is available today in preview. Developers can access it immediately via Google AI Studio, while enterprise customers can deploy it through Vertex AI. The model is optimized for responsive, real-time applications where latency and cost are critical concerns.
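Preview access via the Gemini API follows the usual generateContent pattern. A stdlib-only sketch; the model ID `gemini-3.1-flash-lite-preview` is a guess at the preview identifier, so confirm the exact string in AI Studio:

```python
import json
import os
import urllib.request

# Hypothetical preview model ID; verify the real one in Google AI Studio.
MODEL = "gemini-3.1-flash-lite-preview"
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

def build_call(prompt: str, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) a generateContent POST request."""
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode()
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json",
                 "x-goog-api-key": api_key},
        method="POST",
    )

if __name__ == "__main__":
    req = build_call("Summarize today's AI news in one line.",
                     os.environ.get("GEMINI_API_KEY", ""))
    print(req.full_url)
    # Uncomment to actually send (requires a valid API key):
    # print(urllib.request.urlopen(req).read().decode())
```

Vertex AI deployments use a different endpoint and IAM-based auth rather than an API key header; this sketch covers only the AI Studio path.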