Powering the agents: Workers AI now runs large models, starting with Kimi K2.5
AI-Generated Summary: This is an automated summary. For full details and context, please read the original post.
Cloudflare has announced that its Workers AI platform now supports large models, starting with Moonshot AI's Kimi K2.5. The model offers a full 256k context window and supports multi-turn tool calling, vision inputs, and structured outputs, making it suitable for a wide range of agentic tasks. With this update, developers can run the entire agent lifecycle on a single, unified platform instead of stitching together multiple infrastructure components.
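As a rough illustration of what "multi-turn tool calling plus structured outputs" looks like in practice, the sketch below builds a chat-style request for a code-review agent. The model identifier, tool name, and endpoint shape are assumptions for illustration only, not confirmed values from the post; the request format follows the common OpenAI-compatible chat schema.

```typescript
// Hypothetical request builder for an agentic code-review call.
// Model ID ("@cf/moonshotai/kimi-k2.5") and the "fetch_file" tool
// are illustrative assumptions, not documented identifiers.

interface ToolDef {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  tools?: ToolDef[];
  response_format?: { type: "json_object" };
}

function buildReviewRequest(diff: string): ChatRequest {
  return {
    model: "@cf/moonshotai/kimi-k2.5", // assumed model ID
    messages: [
      { role: "system", content: "You are a code-review agent. Reply in JSON." },
      { role: "user", content: `Review this diff:\n${diff}` },
    ],
    // A tool the model can call across turns to pull more context.
    tools: [
      {
        type: "function",
        function: {
          name: "fetch_file", // hypothetical tool
          description: "Fetch the full contents of a file in the repository",
          parameters: {
            type: "object",
            properties: { path: { type: "string" } },
            required: ["path"],
          },
        },
      },
    ],
    // Structured output: constrain the reply to valid JSON.
    response_format: { type: "json_object" },
  };
}
```

In a real deployment this payload would be sent to Workers AI (for example via a Worker's AI binding or an HTTP endpoint), with the agent loop executing any tool calls the model emits and feeding results back as further turns.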
Key Technical Details and Practical Implications
Cloudflare has integrated Kimi K2.5 into its internal development tools and automated code review pipeline, where it has proven a fast, efficient alternative to larger proprietary models without sacrificing quality. For example, an agent that processes over 7B tokens per day using Kimi has caught more than 15 confirmed issues in a single codebase, at a 77% cost saving compared to running on a mid-tier proprietary model.
Large Model Inference Stack and Future Plans
Cloudflare has made changes to its inference stack to support large models like Kimi K2.5, and plans to continue supporting the shift to open-source models that offer frontier-level reasoning without the proprietary price tag. Workers AI will provide everything from serverless endpoints for personal agents to dedicated instances powering autonomous agents across entire organizations.