Google releases Gemini 3.1 Flash-Lite: 2.5X faster time to first answer token and 45% higher output speed than Gemini 2.5 Flash
Tags: Gemini, Google, release, model, feature, api · Source: deepmind.google

Gemini 3.1 Flash-Lite Now Available

Google has released Gemini 3.1 Flash-Lite, a new model designed for developers building at scale. It is available now in preview through Google AI Studio and, for enterprises, through Vertex AI.

Performance and Cost Metrics

Headline speed, pricing, and benchmark figures for 3.1 Flash-Lite, with speed measured against Gemini 2.5 Flash:

  • 2.5X faster Time to First Answer Token
  • 45% increase in output speed compared to Gemini 2.5 Flash
  • Pricing: $0.25/1M input tokens and $1.50/1M output tokens
  • Arena.ai Leaderboard score: 1432 Elo, ahead of other models in the same tier
  • Benchmark results: 86.9% on GPQA Diamond and 76.8% on MMMU Pro, surpassing larger models from prior Gemini generations
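
To make the pricing concrete, here is a minimal Python sketch that estimates spend from token counts at the listed rates. The function name and the example workload (one million requests at 500 input and 200 output tokens each) are illustrative assumptions, not figures from Google, and the sketch ignores any discounts, caching, or tiering that may apply in practice.

```python
# Prices quoted in the list above, in US dollars per 1M tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload (an assumption for illustration only):
# 1M requests, each with 500 input tokens and 200 output tokens.
requests = 1_000_000
cost = estimate_cost(requests * 500, requests * 200)
print(f"${cost:.2f}")  # prints "$425.00"
```

At these rates output tokens cost six times as much as input tokens, so for this workload the 200M output tokens ($300) outweigh the 500M input tokens ($125).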

Key Capabilities and Use Cases

The model includes thinking levels as a standard feature in both AI Studio and Vertex AI, allowing developers to control inference depth based on task complexity. Recommended use cases include:

  • High-volume translation and content moderation
  • User interface and dashboard generation
  • Real-time simulations and complex instruction following
  • Cost-sensitive, high-frequency workloads
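
One way an application might choose a thinking level per request is sketched below in Python. Everything in it is a hypothetical illustration: the level names ("low", "medium", "high"), the task kinds, the routing table, and the 8,000-token threshold are assumptions, not values taken from the announcement or from the AI Studio or Vertex AI APIs. The sketch only picks a level; passing it to the model is left out.

```python
from dataclasses import dataclass

# Placeholder level names, ordered shallow to deep.
# The values the real API accepts may differ.
LEVELS = ("low", "medium", "high")


@dataclass
class Task:
    kind: str           # e.g. "translation", "moderation", "ui_generation"
    prompt_tokens: int  # size of the prompt for this request


# Hypothetical routing table: simple, high-volume tasks get shallow thinking.
DEFAULT_LEVEL_BY_KIND = {
    "translation": "low",
    "moderation": "low",
    "ui_generation": "medium",
    "simulation": "high",
}


def pick_thinking_level(task: Task, long_prompt_tokens: int = 8_000) -> str:
    """Pick a level from the task kind, stepping up once for long prompts."""
    level = DEFAULT_LEVEL_BY_KIND.get(task.kind, "medium")
    if task.prompt_tokens > long_prompt_tokens and level != "high":
        level = LEVELS[LEVELS.index(level) + 1]
    return level


print(pick_thinking_level(Task("translation", 200)))     # prints "low"
print(pick_thinking_level(Task("translation", 10_000)))  # prints "medium"
```

Keeping the routing in a plain table makes it easy to tune per workload: cost-sensitive, high-frequency tasks stay at the shallowest level unless a prompt is unusually long.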

Developer Access

Developers can start using 3.1 Flash-Lite immediately through Google AI Studio for experimentation, while enterprises can access it via Vertex AI with additional management and security features.