Google launches Gemini 3.1 Flash-Lite, a 2.5X faster budget model priced at $0.25 per million input tokens
Gemini · release · model · feature · api · deepmind.google ↗

Introducing Gemini 3.1 Flash-Lite

Google has announced Gemini 3.1 Flash-Lite, a new lightweight model purpose-built for high-volume developer workloads. Available today in preview via Google AI Studio and Vertex AI, the model prioritizes speed and cost-efficiency without sacrificing quality.

Performance and Pricing

The model is priced at just $0.25 per million input tokens and $1.50 per million output tokens, making it one of the most cost-effective options available. Performance improvements over its predecessor are substantial:

  • 2.5X faster time to first token than Gemini 2.5 Flash
  • 45% higher output speed according to Artificial Analysis benchmarks
  • Elo score of 1432 on the Arena.ai leaderboard
  • Scores of 86.9% on GPQA Diamond and 76.8% on MMMU Pro, outperforming several larger models from prior generations
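At these rates, per-request cost is straightforward to estimate. A minimal sketch using the published prices; the token counts in the example are illustrative, not from the announcement:

```python
# Published preview pricing for Gemini 3.1 Flash-Lite (USD per million tokens).
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a moderation call with a 1,000-token prompt and a 50-token verdict.
cost = request_cost(1_000, 50)
print(f"${cost:.6f} per request")                        # $0.000325
print(f"${cost * 1_000_000:,.0f} per million requests")  # $325
```

At that hypothetical request shape, a million calls would cost roughly $325, which is the kind of arithmetic that makes the model attractive for bulk workloads.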

Adaptive Intelligence with Thinking Levels

A key differentiator is the inclusion of thinking levels in both AI Studio and Vertex AI. This feature gives developers the ability to control how much computational "thinking" the model applies to each task, allowing fine-tuned control over the speed-quality tradeoff for different workloads.
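As a sketch of what selecting a thinking level might look like from a client, the snippet below builds a `generateContent`-style request body. The `thinkingLevel` field name and its `"low"`/`"high"` values are assumptions modeled on Google's published thinking-config API, not confirmed for 3.1 Flash-Lite by the announcement; check the current Gemini API docs before relying on them.

```python
import json

def build_request(prompt: str, thinking_level: str) -> dict:
    """Build a generateContent-style request body with an explicit thinking level.

    NOTE: "thinkingLevel" and its accepted values are assumptions based on
    Google's thinking-config API, not confirmed for this model.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }

# Low thinking for a high-volume, latency-sensitive task such as moderation:
body = build_request("Classify this comment as SAFE or UNSAFE: ...", "low")
print(json.dumps(body, indent=2))
```

The point of the knob is that the same model can serve both cheap bulk classification (low thinking) and harder reasoning tasks (high thinking) without switching endpoints.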

Use Cases and Capabilities

Gemini 3.1 Flash-Lite is optimized for a wide range of applications:

  • High-volume translation and content moderation at scale
  • User interface and dashboard generation from wireframes and specifications
  • Simulations and dynamic content creation (e.g., real-time weather dashboards)
  • Instruction following and complex reasoning tasks where cost and latency are critical

The model demonstrates strong multimodal understanding and can handle complex reasoning tasks while maintaining the cost-efficiency needed for production systems that process thousands of requests.
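For the high-volume use cases above, production systems typically fan requests out concurrently since each call is I/O-bound. A minimal sketch, with a placeholder `classify` function standing in for the actual model call (the function and its keyword check are hypothetical, purely for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def classify(comment: str) -> str:
    """Placeholder for a real moderation call to the model (hypothetical stub)."""
    return "UNSAFE" if "spam" in comment.lower() else "SAFE"

def moderate_batch(comments: list[str], workers: int = 8) -> list[str]:
    # Fan the requests out across a thread pool; because API calls spend most
    # of their time waiting on the network, threads overlap well here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify, comments))

verdicts = moderate_batch(["hello there", "buy spam now"])
print(verdicts)  # ['SAFE', 'UNSAFE']
```

`pool.map` preserves input order, so verdicts line up with their source comments; swapping the stub for a real client call is the only change needed to turn this into a working pipeline.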