Workers AI enters the frontier model era
Cloudflare has officially expanded Workers AI beyond smaller models to support frontier-scale open-source models. The platform now offers Moonshot AI's Kimi K2.5, a large language model featuring a full 256k context window with support for multi-turn tool calling, vision inputs, and structured outputs. This release marks a significant step toward Cloudflare's goal of becoming the complete platform for building and deploying AI agents.
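In practice, an agent exercises those capabilities through a chat-completion request that bundles conversation messages, tool schemas, and an output-format constraint. Here is a minimal sketch of such a payload in the OpenAI-compatible shape; the model identifier and the `read_file` tool are illustrative assumptions, not confirmed names from the Workers AI catalog:

```python
import json

# Hypothetical model identifier -- the actual Workers AI catalog name may differ.
MODEL = "@cf/moonshotai/kimi-k2.5"

def build_chat_request(user_text: str) -> dict:
    """Assemble a chat-completion payload combining messages, a tool
    definition (for multi-turn tool calling), and a structured-output
    constraint."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a code review agent."},
            {"role": "user", "content": user_text},
        ],
        # Multi-turn tool calling: the model may respond with a call to this
        # tool; the caller executes it and appends the result as a new message.
        "tools": [{
            "type": "function",
            "function": {
                "name": "read_file",  # hypothetical tool for illustration
                "description": "Read a source file from the repository",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
        # Structured outputs: ask for a machine-parseable JSON answer.
        "response_format": {"type": "json_object"},
    }

payload = build_chat_request("Review src/auth.ts for injection bugs.")
print(json.dumps(payload, indent=2))
```

The same payload shape works whether it is sent to a serverless endpoint or a dedicated instance; only the URL and credentials change.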
Production-proven cost efficiency
Cloudflare has tested Kimi K2.5 extensively in production across internal tools, including their OpenCode development environment and the Bonk code review agent. The results demonstrate substantial cost savings: a security review agent processing 7 billion tokens per day caught more than 15 confirmed vulnerabilities in a single codebase at a fraction of the price of proprietary models, cutting inference costs by 77% compared with mid-tier alternatives.
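To see what a 77% reduction means at that volume, here is a back-of-the-envelope calculation. The per-million-token rates below are hypothetical placeholders chosen only so their ratio matches the reported figure; the post does not state actual prices:

```python
TOKENS_PER_DAY = 7_000_000_000  # 7B tokens/day, from the post

# Hypothetical $/1M-token rates, for illustration only; the ratio is
# picked to reproduce the reported 77% reduction.
MIDTIER_PER_M = 1.00
OPEN_PER_M = 0.23

midtier_daily = TOKENS_PER_DAY / 1e6 * MIDTIER_PER_M
open_daily = TOKENS_PER_DAY / 1e6 * OPEN_PER_M
savings = 1 - open_daily / midtier_daily

print(f"${open_daily:,.0f}/day vs ${midtier_daily:,.0f}/day "
      f"({savings:.0%} reduction)")  # → $1,610/day vs $7,000/day (77% reduction)
```

At billions of tokens per day, even a modest per-token difference compounds into thousands of dollars daily, which is why the percentage matters more than the absolute rate.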
Technical optimizations for large model serving
Supporting large models at scale required significant changes to Cloudflare's inference infrastructure. The team developed custom kernels for Kimi K2.5 built on their proprietary Infire inference engine, implementing advanced optimization techniques including:
- Disaggregated prefill: Separating prefill and generation stages across machines for better throughput
- Data, tensor, and expert parallelization strategies for improved GPU utilization
- Custom kernel optimization that improves performance beyond out-of-the-box model serving
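The first technique above can be sketched in miniature. In disaggregated prefill, one worker runs the compute-bound prompt pass and produces a KV cache, which is then shipped to a separate worker that runs the memory-bandwidth-bound generation loop. The toy below stands in for that split with two plain functions and fake cache entries; it illustrates the data flow, not Infire's actual implementation:

```python
# Toy sketch of disaggregated prefill: the "prefill worker" and the
# "decode worker" would run on different machines in a real system,
# with the KV cache transferred between them over a fast interconnect.

def prefill_worker(prompt_tokens: list[int]) -> list[tuple[int, int]]:
    """Process the whole prompt in one compute-bound pass and return a
    per-token KV-cache stand-in (fake key/value pairs)."""
    return [(t * 2, t * 3) for t in prompt_tokens]

def decode_worker(kv_cache: list[tuple[int, int]], steps: int) -> list[int]:
    """Generate tokens one at a time, extending the received cache at
    each step (the memory-bandwidth-bound stage)."""
    out = []
    for _ in range(steps):
        nxt = sum(k for k, _ in kv_cache) % 100  # stand-in for sampling
        out.append(nxt)
        kv_cache.append((nxt * 2, nxt * 3))  # decode also grows the cache
    return out

cache = prefill_worker([5, 7, 11])      # stage 1: prompt processing
tokens = decode_worker(cache, steps=3)  # stage 2: token generation
print(tokens)  # → [46, 38, 14]
```

Separating the two stages lets each pool of machines be sized and batched for its own bottleneck, which is where the throughput gain comes from.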
Developers using Workers AI no longer need ML engineering expertise to optimize large model deployment; Cloudflare handles the infrastructure complexity.
Platform enhancements for agents
Beyond the model addition, Cloudflare is releasing new platform improvements for agentic workloads, including prefix caching and enhanced token visibility to help optimize inference performance and reduce redundant processing costs.
Why this matters
As AI adoption accelerates and organizations deploy multiple agents processing millions of tokens hourly, open-source models offering competitive reasoning capabilities at lower costs become critical. Workers AI positions Cloudflare to serve this shift, offering everything from serverless endpoints for individual agents to dedicated instances supporting enterprise-scale deployments.