Google releases Gemini 3.1 Flash-Lite: 2.5X faster time to first token and 45% faster output than Gemini 2.5 Flash

Gemini 3.1 Flash-Lite Now Available

Google has released Gemini 3.1 Flash-Lite, a new model designed for developers building at scale. The model is now available in preview through Google AI Studio and, for enterprises, through Vertex AI.

Performance and Cost Metrics

The published figures for 3.1 Flash-Lite cover speed, pricing, and benchmarks:

2.5X faster Time to First Answer Token than Gemini 2.5 Flash
45% higher output speed than Gemini 2.5 Flash
Pricing: $0.25/1M input tokens and $1.50/1M output tokens
Arena.ai Leaderboard score: 1432 Elo, ahead of other models in the same tier
Benchmark results: 86.9% on GPQA Diamond and 76.8% on MMMU Pro, surpassing larger models from prior Gemini generations
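
To make the pricing concrete, the per-token rates above can be turned into a rough budget estimate. The following is a minimal sketch, not an official billing calculator: the function name and the sample workload (10,000 requests per day at 800 input and 200 output tokens each) are hypothetical, and it assumes the quoted rates apply uniformly to every token, with no other charges or discounts.

```python
# Prices quoted in the article, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD, assuming flat per-token pricing (an assumption)."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical workload: 10,000 requests/day, 800 input and 200 output tokens each.
requests_per_day = 10_000
daily_cost = estimate_cost_usd(requests_per_day * 800, requests_per_day * 200)
print(f"Estimated daily cost: ${daily_cost:.2f}")  # Estimated daily cost: $5.00
```

Because output tokens cost six times as much as input tokens at these rates, the output side dominates here even though it accounts for only a fifth of the traffic.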

Key Capabilities and Use Cases

The model includes thinking levels as a standard feature in both AI Studio and Vertex AI, allowing developers to control inference depth based on task complexity. Recommended use cases include:

High-volume translation and content moderation
User interface and dashboard generation
Real-time simulations and complex instruction following
Cost-sensitive, high-frequency workloads

Developer Access

Developers can start using 3.1 Flash-Lite immediately through Google AI Studio for experimentation, while enterprises can access it via Vertex AI with additional management and security features.
