Google releases Gemini 3.1 Flash-Lite: 2.5X faster time to first token and 45% faster output than Gemini 2.5 Flash
Gemini 3.1 Flash-Lite Now Available
Google has released Gemini 3.1 Flash-Lite, a new model designed for developers building at scale. The model is available in preview through Google AI Studio for developers and through Vertex AI for enterprises.
Performance and Cost Metrics
Google reports the following speed, pricing, and benchmark figures for 3.1 Flash-Lite:
- 2.5X faster Time to First Answer Token compared to Gemini 2.5 Flash
- 45% increase in output speed compared to Gemini 2.5 Flash
- Pricing: $0.25/1M input tokens and $1.50/1M output tokens
- Arena.ai Leaderboard score: 1432 Elo, outperforming other models in its tier
- Benchmark results: 86.9% on GPQA Diamond and 76.8% on MMMU Pro, surpassing larger models from prior Gemini generations
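To make the listed pricing concrete, the sketch below estimates the cost of a workload from its token counts. Only the two per-million-token prices come from the figures above; the function name and the example workload (1 million requests of 500 input and 100 output tokens each) are hypothetical, and real bills may differ if other charges apply.

```python
# Prices quoted above, in USD per 1 million tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload at the listed prices."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical workload: 1 million requests, each with
# 500 input tokens and 100 output tokens.
requests = 1_000_000
cost = estimate_cost(requests * 500, requests * 100)
print(f"${cost:.2f}")  # prints "$275.00"
```

In this example the output tokens account for $150 of the $275 total even though they are a fifth of the input volume, because output is priced six times higher per token.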
Key Capabilities and Use Cases
The model includes thinking levels as a standard feature in both AI Studio and Vertex AI, allowing developers to control inference depth based on task complexity. Recommended use cases include:
- High-volume translation and content moderation
- User interface and dashboard generation
- Real-time simulations and complex instruction following
- Cost-sensitive, high-frequency workloads
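One way to apply thinking levels across these use cases is to choose a level per task type before sending a request. The sketch below is illustrative only: the task names, the level names ("low", "medium", "high"), and the mapping itself are assumptions rather than values taken from the announcement, and it makes no API calls.

```python
# Hypothetical mapping from task type to thinking level.
# The level names and assignments are assumptions for illustration;
# consult the AI Studio or Vertex AI documentation for supported values.
THINKING_LEVEL_BY_TASK = {
    "translation": "low",
    "content_moderation": "low",
    "ui_generation": "medium",
    "simulation": "high",
    "complex_instructions": "high",
}


def pick_thinking_level(task_type: str, default: str = "low") -> str:
    """Return the thinking level for a task, falling back to a cheap default."""
    return THINKING_LEVEL_BY_TASK.get(task_type, default)


print(pick_thinking_level("translation"))
print(pick_thinking_level("simulation"))
print(pick_thinking_level("unknown_task"))
```

Defaulting unknown tasks to the lowest level keeps high-frequency workloads cheap; a service with stricter quality needs might prefer a higher default.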
Developer Access
Developers can start using 3.1 Flash-Lite immediately through Google AI Studio for experimentation, while enterprises can access it via Vertex AI with additional management and security features.