Cloudflare
Cloudflare Workers AI adds frontier open-source models, launches Kimi K2.5 with 77% cost savings
Cloudflare Workers · Cloudflare · release, feature, model, api, integration, performance, pricing · blog.cloudflare.com

Workers AI Enters the Large Model Market

Cloudflare has expanded Workers AI to support frontier-scale open-source models, starting with Moonshot AI's Kimi K2.5. This is a significant shift for a platform that has historically focused on smaller models. Kimi K2.5 supports a full 256k-token context window, multi-turn tool calling, vision inputs, and structured outputs, all of which sophisticated agentic applications depend on.
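Workers AI models are invoked through the `AI` binding's `run(model, inputs)` method. The sketch below shows how a tool-calling request to Kimi K2.5 might be assembled. It rests on assumptions: the model ID `@cf/moonshotai/kimi-k2.5` and the OpenAI-style `tools` schema should be checked against the model's page in the Workers AI catalog, the `lookup_file` tool is hypothetical, and the binding is modeled as a minimal `AiBinding` interface so the code type-checks outside the Workers runtime.

```typescript
// Minimal shape of the Workers AI binding; in a deployed Worker this is `env.AI`,
// declared as an `ai` binding in the Wrangler configuration.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// ASSUMPTION: catalog ID for Kimi K2.5; confirm it on the Workers AI models page.
const KIMI_MODEL = "@cf/moonshotai/kimi-k2.5";

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// HYPOTHETICAL tool, described with JSON Schema in the OpenAI-style format (assumed).
const lookupFileTool = {
  type: "function",
  function: {
    name: "lookup_file",
    description: "Return the contents of a file in the repository under review.",
    parameters: {
      type: "object",
      properties: { path: { type: "string", description: "Repository-relative path" } },
      required: ["path"],
    },
  },
};

// Builds the request body: the conversation so far plus the tools the model may call.
function buildKimiInput(history: ChatMessage[], maxTokens = 1024): Record<string, unknown> {
  return { messages: history, tools: [lookupFileTool], max_tokens: maxTokens };
}

// Sends one turn to the model. Executing any returned tool calls and sending their
// results back on the next turn is left out of this sketch.
async function askKimi(ai: AiBinding, history: ChatMessage[]): Promise<unknown> {
  return ai.run(KIMI_MODEL, buildKimiInput(history));
}
```

Keeping the request builder separate from the call makes the payload easy to inspect and test without a live binding.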

Proven Cost Efficiency in Production

Cloudflare tested Kimi K2.5 extensively in internal tools and production workloads. One security review agent, which processes 7 billion tokens a day on a single codebase, illustrates the economics: the same workload would cost about $2.4M a year on a mid-tier proprietary model, and Kimi K2.5 runs it for 77% less. The model also powers Cloudflare's internal development tools, including the OpenCode environment and the public Bonk code review agent on GitHub.
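The cost figures can be translated into per-token terms with simple arithmetic. The sketch below uses only the three numbers quoted above; the per-million-token rate it derives is a blended input-plus-output price implied by those totals (the post does not give the token mix), and since 77% is likely a rounded figure, the Kimi K2.5 results are approximate.

```typescript
// Figures quoted in the post.
const TOKENS_PER_DAY = 7e9;            // security review agent throughput
const PROPRIETARY_ANNUAL_USD = 2.4e6;  // same workload on a mid-tier proprietary model
const REDUCTION = 0.77;                // reported saving with Kimi K2.5

const tokensPerYear = TOKENS_PER_DAY * 365; // 2.555e12 tokens

// Implied blended price per million tokens (input and output combined; an assumption).
const proprietaryPerMTok = PROPRIETARY_ANNUAL_USD / (tokensPerYear / 1e6); // ≈ $0.94

// Applying the 77% reduction to both the annual total and the per-token rate.
const kimiAnnualUsd = PROPRIETARY_ANNUAL_USD * (1 - REDUCTION); // ≈ $552,000
const kimiPerMTok = proprietaryPerMTok * (1 - REDUCTION);       // ≈ $0.22

console.log(`tokens per year: ${tokensPerYear.toExponential(3)}`);
console.log(`proprietary: $${proprietaryPerMTok.toFixed(3)} per 1M tokens`);
console.log(
  `Kimi K2.5: $${Math.round(kimiAnnualUsd).toLocaleString("en-US")} per year, ` +
    `$${kimiPerMTok.toFixed(3)} per 1M tokens`
);
```

At this volume, a fraction of a dollar per million tokens adds up to roughly $1.85M a year in savings.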

Infrastructure Optimizations for Scale

Serving large models at scale takes more than running them out of the box. Cloudflare has implemented:

  • Custom kernels for Kimi K2.5, built on Cloudflare's proprietary Infire inference engine
  • Data, tensor, and expert parallelism to distribute the model's work across GPUs
  • Disaggregated prefill, which separates the prompt-processing (prefill) and token-generation stages to improve throughput and GPU utilization
  • Prefix caching to reduce redundant processing in multi-turn agent conversations
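Prefix caching pays off in agent loops because each turn resends the whole conversation, so most of the prompt matches the previous request. The toy sketch below illustrates the general technique of block-level prefix caching; it is not Infire's implementation. Tokens are grouped into fixed-size blocks, each block is keyed by everything before it, and a new request reuses the longest run of leading blocks already in the cache. `BLOCK_SIZE`, `PrefixCache`, and the string placeholders standing in for KV tensors are all hypothetical, and a real engine would key blocks with a chained hash rather than the literal prefix string.

```typescript
// HYPOTHETICAL block size: tokens whose KV entries are cached as one unit.
const BLOCK_SIZE = 4;

class PrefixCache {
  // Maps a block key to a placeholder for that block's KV tensors.
  private blocks = new Map<string, string>();

  // One key per *full* block; each key encodes the entire prefix up to that block,
  // so a block is reused only when every earlier token also matches.
  private blockKeys(tokens: number[]): string[] {
    const keys: string[] = [];
    let prefix = "";
    for (let end = BLOCK_SIZE; end <= tokens.length; end += BLOCK_SIZE) {
      prefix += tokens.slice(end - BLOCK_SIZE, end).join(",") + "|";
      keys.push(prefix);
    }
    return keys;
  }

  // Returns how many leading tokens were served from cache, then stores the
  // remaining full blocks as if prefill had just computed them.
  prefill(tokens: number[]): number {
    const keys = this.blockKeys(tokens);
    let hits = 0;
    while (hits < keys.length && this.blocks.has(keys[hits])) hits++;
    for (let i = hits; i < keys.length; i++) this.blocks.set(keys[i], `kv-block-${i}`);
    return hits * BLOCK_SIZE;
  }
}

// Usage: the second turn resends the first turn's tokens plus new ones.
const cache = new PrefixCache();
const turn1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const turn2 = [...turn1, 11, 12, 13, 14, 15, 16];
console.log(cache.prefill(turn1)); // 0: cold cache
console.log(cache.prefill(turn2)); // 8: the two full blocks from turn 1 are reused
```

Only full blocks are cached, so the trailing partial block of turn 1 (tokens 9 and 10) is recomputed on turn 2; the block size trades cache granularity against bookkeeping overhead.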

Developers get these optimizations without needing ML engineering expertise; Cloudflare handles the serving complexity.

Platform-Wide Improvements for Agents

Beyond the Kimi launch, Cloudflare is releasing platform improvements specifically designed for agentic workloads. These enhancements complement existing agent infrastructure including Durable Objects for state persistence, Workflows for long-running tasks, the Agents SDK for building agentic applications, and Sandbox containers for secure execution.

What's Next

As AI adoption accelerates and always-on personal agents such as OpenClaw become commonplace, cost rather than capability becomes the primary constraint. Enterprises can now move from proprietary models to frontier open-source alternatives on Workers AI, supporting everything from personal agents to organization-wide autonomous systems on a single, unified platform.