Frontier Model Now Available on Workers AI
Cloudflare has made Kimi K2.5, an open-source frontier-scale model from Moonshot AI, available on its Workers AI inference platform. This marks the first major frontier model natively integrated into Cloudflare's AI stack, enabling developers to build sophisticated AI agents without leaving the platform.
Key Capabilities
Kimi K2.5 brings several advanced features to Workers AI:
- 256,000-token context window — Maintains full conversation history, tool definitions, and entire codebases across long-running agent sessions
- Multi-turn tool calling — Agents can invoke external tools and APIs across multiple conversation turns
- Vision inputs — Process images alongside text in the same request
- Structured outputs — JSON mode and JSON Schema support for reliable downstream parsing
- Function calling — Seamless integration of external APIs and tools into agent workflows
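Structured outputs are the capability most directly useful for downstream automation. The sketch below shows what a JSON Schema request body might look like; the model ID and exact field names are assumptions based on the OpenAI-style chat/completions shape, so check the official model page for the canonical format.

```javascript
// Sketch: requesting a structured (JSON Schema) response from Kimi K2.5.
// The model ID and request fields below are illustrative assumptions.
const request = {
  model: "@cf/moonshotai/kimi-k2.5", // hypothetical model ID
  messages: [
    {
      role: "user",
      content: "Extract the city and temperature from: 'It is 18C in Lisbon.'",
    },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "weather_reading",
      schema: {
        type: "object",
        properties: {
          city: { type: "string" },
          temperature_c: { type: "number" },
        },
        required: ["city", "temperature_c"],
      },
    },
  },
};

// With structured outputs, the reply content should parse as JSON that
// matches the schema, so downstream code can rely on the fields existing.
const mockReplyContent = '{"city": "Lisbon", "temperature_c": 18}';
const parsed = JSON.parse(mockReplyContent);
console.log(parsed.city, parsed.temperature_c); // Lisbon 18
```

The mocked reply stands in for `choices[0].message.content` in a real response; the point is that a schema-constrained reply can be parsed and consumed without defensive validation code.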
Performance Optimizations
Cloudflare has enhanced its inference infrastructure with two major improvements:
Prefix Caching: The platform now surfaces cached tokens as a usage metric and applies discounted pricing to cached tokens compared to fresh input tokens. This optimization significantly reduces Time to First Token (TTFT) and increases Tokens Per Second (TPS) throughput, which is particularly beneficial for agents that resend context across multiple turns.
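To see why the cached-token metric matters for cost, here is a back-of-the-envelope sketch. The prices are placeholders, not Cloudflare's actual rates; the `usage` object mirrors the cached-token metric the platform surfaces.

```javascript
// Sketch: estimating the effect of prefix caching on per-request input cost.
// Prices are hypothetical placeholders, not Cloudflare's published rates.
const PRICE_PER_M_INPUT = 0.6; // $ per 1M fresh input tokens (assumed)
const PRICE_PER_M_CACHED = 0.1; // $ per 1M cached tokens (assumed)

function inputCost(usage) {
  const fresh = usage.prompt_tokens - usage.cached_tokens;
  return (
    (fresh * PRICE_PER_M_INPUT + usage.cached_tokens * PRICE_PER_M_CACHED) / 1e6
  );
}

// A multi-turn agent resends ~200k tokens of context; most hits the cache.
const usage = { prompt_tokens: 210_000, cached_tokens: 200_000 };
console.log(inputCost(usage).toFixed(4)); // 0.0260 (with caching)
console.log(((usage.prompt_tokens * PRICE_PER_M_INPUT) / 1e6).toFixed(4)); // 0.1260 (without)
```

Under these assumed prices, a turn that is mostly cached context costs roughly a fifth of the uncached price, which is why long-running agents benefit most.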
Asynchronous Batch API: A redesigned pull-based system allows developers to submit batches of inference requests that process as capacity becomes available, typically completing within 5 minutes. This eliminates capacity errors and is ideal for non-real-time workflows like code scanning or research agents.
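The pull-based flow amounts to "submit, then poll until complete." The sketch below mocks that loop; `submitBatch` and `pollBatch` are illustrative stand-ins, not the real Workers AI API surface.

```javascript
// Sketch: a pull-based batch workflow. submitBatch/pollBatch are mocked
// stand-ins for the async batch endpoints; names are illustrative.
async function submitBatch(requests) {
  // The mock batch reports "queued" for two polls before completing.
  return { id: "batch-123", remainingPolls: 2, requests };
}

async function pollBatch(batch) {
  if (batch.remainingPolls > 0) {
    batch.remainingPolls -= 1;
    return { status: "queued" };
  }
  // Results arrive once capacity has been available to run the batch.
  return {
    status: "complete",
    results: batch.requests.map((r) => ({ input: r, output: `echo: ${r}` })),
  };
}

async function runBatch(requests, intervalMs = 10) {
  const batch = await submitBatch(requests);
  for (;;) {
    const res = await pollBatch(batch);
    if (res.status === "complete") return res.results;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

runBatch(["scan file A", "scan file B"]).then((results) => {
  console.log(results.length); // 2
});
```

Because the caller pulls results rather than holding a connection open, there is no capacity error to handle; the batch simply completes when capacity is available.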
Multiple Access Methods
Developers can access Kimi K2.5 through several interfaces:
- Workers AI binding (`env.AI.run()`)
- REST API endpoints (`/run` or `/v1/chat/completions`)
- AI Gateway
- OpenAI-compatible endpoint for drop-in compatibility
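For the OpenAI-compatible path, an existing client typically only needs its base URL pointed at the Workers AI endpoint. The sketch below builds a standard chat/completions request; the base URL shape, placeholders, and model ID are illustrative assumptions.

```javascript
// Sketch: drop-in use of the OpenAI-compatible endpoint. The base URL
// shape, placeholders, and model ID are assumptions for illustration.
const BASE_URL =
  "https://api.cloudflare.com/client/v4/accounts/<account_id>/ai/v1"; // placeholder

function buildChatRequest(model, userMessage) {
  return {
    url: `${BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      Authorization: "Bearer <api_token>", // placeholder
      "Content-Type": "application/json",
    },
    // Standard OpenAI chat/completions payload, unchanged.
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: userMessage }],
    }),
  };
}

const req = buildChatRequest("@cf/moonshotai/kimi-k2.5", "Hello");
console.log(req.url.endsWith("/chat/completions")); // true
```

Because the payload is the standard chat/completions shape, existing SDKs and tooling built against the OpenAI API should work without code changes beyond configuration.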
Complete documentation, pricing details, and prompt caching guidelines are available on the official model page.