Cloudflare Workers AI adds frontier-scale Kimi K2.5 model for agent development
Cloudflare Workers · Cloudflare · feature, model, api, release, platform, performance · blog.cloudflare.com

New Frontier Models on Workers AI

Cloudflare is bringing large-scale inference to its Workers AI platform, starting with Moonshot AI's Kimi K2.5 model. This marks a significant expansion of Workers AI's capabilities, moving beyond small models to frontier-scale open-source LLMs optimized for agentic applications. Kimi K2.5 offers a full 256k context window and supports multi-turn tool calling, vision inputs, and structured outputs, all key requirements for building reliable autonomous agents.
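
As a rough illustration of multi-turn tool calling, the sketch below assembles a tool-enabled chat request and feeds a tool result back into the conversation. It is not taken from the announcement. The model identifier, the OpenAI-style message and tool shapes, and the `lookup_advisory` tool are all assumptions for illustration. `AiBinding` is a minimal stand-in for the Workers AI binding's `run` method. In a real exchange, the assistant message that requested the tool call would already be in the history before the tool result is appended.

```typescript
// Assumed model identifier; confirm it against the Workers AI model catalog.
const MODEL_ID = "@cf/moonshotai/kimi-k2.5";

type Role = "system" | "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string;
  tool_call_id?: string; // links a "tool" message to the call it answers
}

interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>; // JSON Schema describing the arguments
  };
}

interface ChatRequest {
  messages: ChatMessage[];
  tools: ToolDefinition[];
}

// Minimal stand-in for the Workers AI binding (env.AI); only run() is used.
interface AiBinding {
  run(model: string, inputs: ChatRequest): Promise<unknown>;
}

// Hypothetical tool, invented for this example.
const lookupAdvisoryTool: ToolDefinition = {
  type: "function",
  function: {
    name: "lookup_advisory",
    description: "Fetch a security advisory by its identifier.",
    parameters: {
      type: "object",
      properties: { id: { type: "string" } },
      required: ["id"],
    },
  },
};

// Builds the payload for one turn without mutating the caller's history.
function buildChatRequest(history: ChatMessage[], tools: ToolDefinition[]): ChatRequest {
  return { messages: [...history], tools };
}

// Returns a new history with a tool result appended for the next turn.
function withToolResult(history: ChatMessage[], toolCallId: string, result: unknown): ChatMessage[] {
  return [...history, { role: "tool", tool_call_id: toolCallId, content: JSON.stringify(result) }];
}

// One model turn; the response type stays unknown because its shape is not specified here.
async function runTurn(ai: AiBinding, history: ChatMessage[]): Promise<unknown> {
  return ai.run(MODEL_ID, buildChatRequest(history, [lookupAdvisoryTool]));
}
```

Keeping the history immutable makes each turn easy to replay or persist, which matters once an agent runs for many turns.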

Performance and Cost Advantages

Cloudflare has deployed Kimi K2.5 internally across multiple use cases with strong results. Most notably, a security code review agent that processes 7 billion tokens daily identified more than 15 confirmed issues while costing 77% less than mid-tier proprietary models, a difference of approximately $2.4M annually on that single use case. This cost-performance advantage is becoming critical as agent adoption scales across organizations.
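
The quoted figures can be sanity-checked with a little arithmetic. The sketch below derives the annual costs and blended per-token prices they imply. It assumes that the $2.4M saving is exactly 77% of the baseline annual cost, that the agent runs 365 days a year, and that a single blended price covers input and output tokens. The source states none of these, so the results are only a rough consistency check.

```typescript
// Figures quoted in the text.
const TOKENS_PER_DAY = 7e9;
const COST_REDUCTION = 0.77;
const ANNUAL_SAVINGS_USD = 2.4e6;

interface ImpliedCosts {
  baselineAnnualUsd: number;   // implied yearly cost on the proprietary model
  kimiAnnualUsd: number;       // implied yearly cost on Kimi K2.5
  baselineUsdPerMTok: number;  // implied blended price per million tokens
  kimiUsdPerMTok: number;
}

// Assumes savings = reduction * baseline and a 365-day year (both assumptions).
function impliedCosts(tokensPerDay: number, reduction: number, annualSavingsUsd: number): ImpliedCosts {
  const baselineAnnualUsd = annualSavingsUsd / reduction;
  const kimiAnnualUsd = baselineAnnualUsd - annualSavingsUsd;
  const millionTokensPerYear = (tokensPerDay * 365) / 1e6;
  return {
    baselineAnnualUsd,
    kimiAnnualUsd,
    baselineUsdPerMTok: baselineAnnualUsd / millionTokensPerYear,
    kimiUsdPerMTok: kimiAnnualUsd / millionTokensPerYear,
  };
}

const implied = impliedCosts(TOKENS_PER_DAY, COST_REDUCTION, ANNUAL_SAVINGS_USD);
console.log(implied);
```

Under these assumptions, the baseline comes to roughly $3.1M per year against roughly $0.7M for Kimi K2.5, or about $1.22 versus $0.28 per million tokens.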

Infrastructure Optimizations

To serve models like Kimi effectively, Cloudflare has implemented several backend optimizations:

  • Custom kernels built on its proprietary Infire inference engine to maximize GPU utilization
  • Disaggregated prefill, which runs the prefill and generation stages on separate machines to improve throughput
  • Tensor and expert parallelization, techniques that require deep ML infrastructure expertise and are now abstracted away from developers

These optimizations are applied automatically; developers only need to call the API and do not manage the underlying infrastructure.
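
To make the disaggregated prefill idea concrete, here is a toy sketch of the two stages and the handoff between them. It is purely conceptual and says nothing about how Infire or Workers AI implements it. `KvCache`, `transfer`, and the injected `nextToken` function are hypothetical stand-ins. A real system moves per-layer key/value tensors between GPUs, not arrays of strings.

```typescript
// Toy stand-in for the key/value cache a real engine builds during prefill.
interface KvCache {
  tokens: string[];
}

// Prefill stage: processes the whole prompt in one pass and produces the cache.
// In real systems this stage is typically compute-bound.
function prefill(prompt: string): KvCache {
  return { tokens: prompt.split(/\s+/).filter((t) => t.length > 0) };
}

// Handoff between machines, simulated here by a serialize/deserialize round trip.
function transfer(cache: KvCache): KvCache {
  return JSON.parse(JSON.stringify(cache)) as KvCache;
}

// Generation (decode) stage: emits one token at a time, extending the context.
// In real systems this stage is typically bound by memory bandwidth.
function decode(
  cache: KvCache,
  maxNewTokens: number,
  nextToken: (context: readonly string[]) => string, // hypothetical model step
): string[] {
  const context = [...cache.tokens];
  const output: string[] = [];
  for (let i = 0; i < maxNewTokens; i++) {
    const token = nextToken(context);
    context.push(token);
    output.push(token);
  }
  return output;
}

// The prefill "machine" and decode "machine" only share the transferred cache.
const handedOff = transfer(prefill("review this diff for injection bugs"));
const generated = decode(handedOff, 3, (ctx) => `tok${ctx.length}`);
console.log(generated);
```

Because the two stages stress hardware differently, running them on separate machines lets each pool be sized and scheduled on its own, which is the throughput argument made above.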

Platform Enhancements

The launch adds platform features for agent workloads: prefix caching, and cached-token metrics surfaced to developers. Both aim to improve inference efficiency and reduce costs for long-running agentic tasks that reuse common contexts.
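
A sketch of how an agent might benefit from prefix caching and read the cached-token metrics follows. The `Usage` field names and the per-token prices are hypothetical, since the source does not say how the metrics are exposed or how cached tokens are billed. The general idea is to keep the stable, shared context at the front of every request so that repeated calls share a prefix, then track how much of each prompt was served from cache.

```typescript
interface PromptMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical usage record; real field names may differ.
interface Usage {
  promptTokens: number;
  cachedPromptTokens: number; // portion of promptTokens served from the prefix cache
  completionTokens: number;
}

// Puts the shared, unchanging context first so successive requests share a prefix.
function buildPrompt(sharedContext: string, task: string): PromptMessage[] {
  return [
    { role: "system", content: sharedContext },
    { role: "user", content: task },
  ];
}

// Fraction of prompt tokens that hit the cache (0 when there is no prompt).
function cacheHitRatio(usage: Usage): number {
  return usage.promptTokens === 0 ? 0 : usage.cachedPromptTokens / usage.promptTokens;
}

// Input cost given separate (assumed) prices for cached and uncached tokens.
function inputCostUsd(usage: Usage, usdPerMTok: number, cachedUsdPerMTok: number): number {
  const uncached = usage.promptTokens - usage.cachedPromptTokens;
  return (uncached * usdPerMTok + usage.cachedPromptTokens * cachedUsdPerMTok) / 1e6;
}

// Illustrative numbers only.
const sample: Usage = { promptTokens: 1000, cachedPromptTokens: 800, completionTokens: 50 };
console.log(cacheHitRatio(sample), inputCostUsd(sample, 1.0, 0.1));
```

Tracking the hit ratio per task is a simple way to notice when a prompt change has moved volatile content ahead of the shared context and broken the common prefix.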

What Developers Need to Know

Kimi K2.5 is available now on Workers AI and can be integrated directly into existing Cloudflare Agents SDK projects. The unified platform covers the complete agent lifecycle (state persistence via Durable Objects, long-running tasks via Workflows, and secure execution via Sandbox), all with a frontier-scale reasoning model at the core.