Overview
Google has introduced Gemini 3.1 Flash-Lite, a new lightweight model in the Gemini 3 series designed for cost-sensitive, high-volume applications. The model is now available in preview via the Gemini API in Google AI Studio and for enterprise deployments via Vertex AI.
Performance & Pricing
Gemini 3.1 Flash-Lite delivers significant performance improvements over its predecessor:
- 2.5x faster time to first answer token
- 45% faster output generation speed
- Identical pricing: $0.25/1M input tokens and $1.50/1M output tokens
- Strong benchmarks: Scores 1432 on the Arena.ai leaderboard, with 86.9% accuracy on GPQA Diamond and 76.8% on MMMU Pro—exceeding larger Gemini 2.5 models
Capabilities & Use Cases
The model comes with built-in thinking levels in both AI Studio and Vertex AI, allowing developers to control reasoning depth based on task complexity. This makes it suitable for:
- High-volume translation and content moderation
- Real-time UI and dashboard generation
- Simulation creation
- Multi-step instruction following
Developer Access
3.1 Flash-Lite is available today in preview. Developers can access it immediately via Google AI Studio, while enterprise customers can deploy it through Vertex AI. The model is optimized for responsive, real-time applications where latency and cost are critical concerns.