GLM-4.7-Flash Model Available on Workers AI
Cloudflare has introduced GLM-4.7-Flash, a fast multilingual text generation model optimized for dialogue and instruction-following tasks. The model features a 131,072-token context window, making it suitable for long-form content, complex reasoning, and processing extended documents. Key capabilities include:
- Multi-turn tool calling for building AI agents that invoke functions across multiple conversation turns
- Multilingual support for applications requiring content generation in multiple languages
- Fast inference optimized for low-latency responses in chatbots and virtual assistants
- Instruction following for code generation and structured task completion
The model is accessible via Workers AI binding (env.AI.run()), REST API endpoints (/run and /v1/chat/completions), AI Gateway, or the Vercel AI SDK through workers-ai-provider.
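As a minimal sketch, invoking the model through the Workers AI binding looks like the following. The model identifier and response shape below are assumptions for illustration; check the Workers AI model catalog for the exact ID, and note the binding is duck-typed here so the snippet stands alone outside a Worker.

```typescript
// Sketch of calling GLM-4.7-Flash through the Workers AI binding.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Minimal shape of the env.AI binding as used here (illustrative, not the full type).
interface AiBinding {
  run(model: string, input: { messages: ChatMessage[] }): Promise<{ response: string }>;
}

// Hypothetical model ID; confirm the real one in the Workers AI model catalog.
const MODEL_ID = "@cf/zai-org/glm-4.7-flash";

export async function chat(ai: AiBinding, prompt: string): Promise<string> {
  const messages: ChatMessage[] = [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: prompt },
  ];
  const result = await ai.run(MODEL_ID, { messages });
  return result.response;
}
```

Inside a real Worker, `ai` would be `env.AI` from the AI binding configured in your Wrangler file, and the same `messages` payload works against the REST endpoints.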
New TanStack AI Integration Package
Cloudflare released @cloudflare/tanstack-ai v0.1.1, a framework-agnostic package bringing Workers AI and AI Gateway support to TanStack AI. This package provides adapters for four configuration modes and supports:
- Chat completions with streaming, tool calling, structured output, and reasoning text
- Image generation using available text-to-image models
- Transcription for speech-to-text conversion
- Text-to-speech for audio generation
- Summarization for text processing
AI Gateway adapters also route requests to third-party providers (OpenAI, Anthropic, Gemini, Grok, OpenRouter) through Cloudflare, adding caching, rate limiting, and unified billing.
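In practice, routing a provider request through AI Gateway amounts to swapping the provider's base URL for the gateway endpoint. A small sketch of that URL construction follows; the account ID and gateway name are placeholders you create in the dashboard, and the provider slugs are assumptions to verify against the AI Gateway provider list.

```typescript
// Build an AI Gateway base URL that proxies a third-party provider through Cloudflare.
// accountId and gatewayName are placeholders; provider slugs are illustrative.
export function gatewayBaseUrl(
  accountId: string,
  gatewayName: string,
  provider: "openai" | "anthropic" | "google-ai-studio" | "grok" | "openrouter",
): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayName}/${provider}`;
}
```

A client SDK pointed at this base URL sends its usual requests, and Cloudflare applies caching and rate limiting before forwarding them to the provider.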
Enhanced workers-ai-provider with New Capabilities
The workers-ai-provider v3.1.1 for the Vercel AI SDK now supports three additional capabilities beyond chat and image generation:
- Transcription (provider.transcription(model)) — automatic handling of model-specific speech-to-text inputs
- Text-to-speech (provider.speech(model)) — audio generation with voice and speed customization
- Reranking (provider.reranking(model)) — document reranking for RAG pipelines and search optimization
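For the reranking case, the model scores candidate documents against a query, and the caller reorders its candidate set by those scores. A sketch of that reordering step, with an assumed (not the provider's actual) result shape:

```typescript
// Reorder RAG candidates by reranker relevance scores, highest first.
// The { index, relevanceScore } shape is an assumption for illustration.
type RerankResult = { index: number; relevanceScore: number };

export function applyReranking<T>(documents: T[], results: RerankResult[], topK: number): T[] {
  return [...results]
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, topK)
    .map((r) => documents[r.index]);
}
```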
A major reliability overhaul (v3.0.5) fixes token-by-token streaming delivery, tool call ID handling, and conversation history preservation, and introduces error detection for premature stream termination. The createAutoRAG export has been renamed to createAISearch, with backward compatibility preserved.
Getting Started
Install the packages via npm:
npm install @cloudflare/tanstack-ai @tanstack/ai
npm install workers-ai-provider@latest ai
With GLM-4.7-Flash's multi-turn tool calling and the TanStack AI or Vercel AI SDK integrations, developers can immediately begin building agentic applications that run entirely at the edge.
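The multi-turn tool-calling flow can be sketched as a plain loop, independent of any SDK: on each turn the model either answers or requests a tool; the runtime executes the tool, appends the result to the history, and continues until the model produces a final answer. Every shape here (`Turn`, `Message`, the model function) is illustrative, not an actual Workers AI or SDK type; real SDKs wrap this loop for you.

```typescript
// Minimal agent loop sketch: alternate model turns with tool execution.
type Turn =
  | { kind: "answer"; text: string }
  | { kind: "tool_call"; name: string; args: Record<string, unknown> };

type Message = { role: string; content: string };
type Model = (history: Message[]) => Promise<Turn>;
type Tools = Record<string, (args: Record<string, unknown>) => Promise<string>>;

export async function runAgent(
  model: Model,
  tools: Tools,
  prompt: string,
  maxTurns = 5,
): Promise<string> {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let i = 0; i < maxTurns; i++) {
    const turn = await model(history);
    if (turn.kind === "answer") return turn.text;
    // Execute the requested tool and feed its output back to the model.
    const result = await tools[turn.name](turn.args);
    history.push({ role: "tool", content: result });
  }
  throw new Error("agent exceeded maxTurns without a final answer");
}
```

The cap on turns guards against a model that keeps requesting tools without converging on an answer.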