Google launches Gemini 3.1 Flash-Lite; cuts inference costs 45% vs. predecessor
Gemini · release · model · api · performance · deepmind.google

Overview

Google has introduced Gemini 3.1 Flash-Lite, a new lightweight model in the Gemini 3 series designed for cost-sensitive, high-volume applications. The model is now available in preview via the Gemini API in Google AI Studio and for enterprise deployments via Vertex AI.

Performance & Pricing

Gemini 3.1 Flash-Lite delivers significant performance improvements over its predecessor:

  • 2.5x faster time to first token
  • 45% faster output token generation
  • Identical pricing: $0.25/1M input tokens and $1.50/1M output tokens
  • Strong benchmarks: 1432 on the Arena.ai leaderboard, 86.9% accuracy on GPQA Diamond, and 76.8% on MMMU Pro, exceeding larger Gemini 2.5 models
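At the listed rates, per-request cost is easy to estimate. A minimal sketch (the token counts in the example are hypothetical; the prices are those quoted above):

```python
# Per-request cost at the listed Gemini 3.1 Flash-Lite rates.
INPUT_PRICE_PER_M = 0.25   # USD per 1M input tokens (from the article)
OUTPUT_PRICE_PER_M = 1.50  # USD per 1M output tokens (from the article)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the published rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply.
print(f"${request_cost(2_000, 500):.6f}")  # $0.001250
```

At this price point, a million such requests would cost $1,250, which is the kind of arithmetic that matters for the high-volume use cases below.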

Capabilities & Use Cases

The model comes with built-in thinking levels in both AI Studio and Vertex AI, allowing developers to control reasoning depth based on task complexity. This makes it suitable for:

  • High-volume translation and content moderation
  • Real-time UI and dashboard generation
  • Simulation creation
  • Multi-step instruction following

Developer Access

Gemini 3.1 Flash-Lite is available today in preview. Developers can access it immediately via Google AI Studio, while enterprise customers can deploy it through Vertex AI. The model is optimized for responsive, real-time applications where latency and cost are critical concerns.
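Preview access via the Gemini API follows the usual generateContent pattern. A stdlib-only sketch; the model ID `gemini-3.1-flash-lite-preview` is a guess at the preview identifier, so confirm the exact string in AI Studio:

```python
import json
import os
import urllib.request

# Hypothetical preview model ID; verify the real one in Google AI Studio.
MODEL = "gemini-3.1-flash-lite-preview"
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

def build_call(prompt: str, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) a generateContent POST request."""
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode()
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json",
                 "x-goog-api-key": api_key},
        method="POST",
    )

if __name__ == "__main__":
    req = build_call("Summarize today's AI news in one line.",
                     os.environ.get("GEMINI_API_KEY", ""))
    print(req.full_url)
    # Uncomment to actually send (requires a valid API key):
    # print(urllib.request.urlopen(req).read().decode())
```

Vertex AI deployments use a different endpoint and IAM-based auth rather than an API key header; this sketch covers only the AI Studio path.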