GLM-4.7-Flash Model Available on Workers AI
Cloudflare has introduced GLM-4.7-Flash, a fast multilingual text generation model optimized for dialogue and instruction-following tasks. The model features a 131,072-token context window, making it suitable for long-form content, complex reasoning, and processing extended documents. Key capabilities include:
- Multi-turn tool calling for building AI agents that invoke functions across multiple conversation turns
- Multilingual support for applications requiring content generation in multiple languages
- Fast inference optimized for low-latency responses in chatbots and virtual assistants
- Instruction following for code generation and structured task completion
The model is accessible via Workers AI binding (env.AI.run()), REST API endpoints (/run and /v1/chat/completions), AI Gateway, or the Vercel AI SDK through workers-ai-provider.
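As a minimal sketch, invoking the model through the Workers AI binding looks like the following. The model identifier and response shape below are assumptions for illustration; check the Workers AI model catalog for the exact ID, and note the binding is duck-typed here so the snippet stands alone outside a Worker.

```typescript
// Sketch of calling GLM-4.7-Flash through the Workers AI binding.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Minimal shape of the env.AI binding as used here (illustrative, not the full type).
interface AiBinding {
  run(model: string, input: { messages: ChatMessage[] }): Promise<{ response: string }>;
}

// Hypothetical model ID; confirm the real one in the Workers AI model catalog.
const MODEL_ID = "@cf/zai-org/glm-4.7-flash";

export async function chat(ai: AiBinding, prompt: string): Promise<string> {
  const messages: ChatMessage[] = [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: prompt },
  ];
  const result = await ai.run(MODEL_ID, { messages });
  return result.response;
}
```

Inside a real Worker, `ai` would be `env.AI` from the AI binding configured in your Wrangler file, and the same `messages` payload works against the REST endpoints.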
New TanStack AI Integration Package
Cloudflare released @cloudflare/tanstack-ai v0.1.1, a framework-agnostic package bringing Workers AI and AI Gateway support to TanStack AI. This package provides adapters for four configuration modes and supports:
- Chat completions with streaming, tool calling, structured output, and reasoning text
- Image generation using available text-to-image models
- Transcription for speech-to-text conversion
- Text-to-speech for audio generation
- Summarization for text processing
AI Gateway adapters also route requests to third-party providers (OpenAI, Anthropic, Gemini, Grok, OpenRouter) through Cloudflare, adding caching, rate limiting, and unified billing.
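In practice, routing a provider request through AI Gateway amounts to swapping the provider's base URL for the gateway endpoint. A small sketch of that URL construction follows; the account ID and gateway name are placeholders you create in the dashboard, and the provider slugs are assumptions to verify against the AI Gateway provider list.

```typescript
// Build an AI Gateway base URL that proxies a third-party provider through Cloudflare.
// accountId and gatewayName are placeholders; provider slugs are illustrative.
export function gatewayBaseUrl(
  accountId: string,
  gatewayName: string,
  provider: "openai" | "anthropic" | "google-ai-studio" | "grok" | "openrouter",
): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayName}/${provider}`;
}
```

A client SDK pointed at this base URL sends its usual requests, and Cloudflare applies caching and rate limiting before forwarding them to the provider.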
Enhanced workers-ai-provider with New Capabilities
The workers-ai-provider v3.1.1 for the Vercel AI SDK now supports three additional capabilities beyond chat and image generation:
- Transcription (provider.transcription(model)) — automatic handling of model-specific speech-to-text inputs
- Text-to-speech (provider.speech(model)) — audio generation with voice and speed customization
- Reranking (provider.reranking(model)) — document reranking for RAG pipelines and search optimization
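For the reranking case, the model scores candidate documents against a query, and the caller reorders its candidate set by those scores. A sketch of that reordering step, with an assumed (not the provider's actual) result shape:

```typescript
// Reorder RAG candidates by reranker relevance scores, highest first.
// The { index, relevanceScore } shape is an assumption for illustration.
type RerankResult = { index: number; relevanceScore: number };

export function applyReranking<T>(documents: T[], results: RerankResult[], topK: number): T[] {
  return [...results]
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, topK)
    .map((r) => documents[r.index]);
}
```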
A major reliability overhaul (v3.0.5) fixes token-by-token streaming delivery, tool call ID handling, and conversation history preservation, and introduces error detection for premature stream termination. The createAutoRAG export has been renamed to createAISearch, with backward compatibility preserved.
Getting Started
Install the packages via npm:
npm install @cloudflare/tanstack-ai @tanstack/ai
npm install workers-ai-provider@latest ai
With GLM-4.7-Flash's multi-turn tool calling and the TanStack AI or Vercel AI SDK integrations, developers can immediately begin building agentic applications that run entirely at the edge.
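The multi-turn tool-calling flow can be sketched as a plain loop, independent of any SDK: on each turn the model either answers or requests a tool; the runtime executes the tool, appends the result to the history, and continues until the model produces a final answer. Every shape here (`Turn`, `Message`, the model function) is illustrative, not an actual Workers AI or SDK type; real SDKs wrap this loop for you.

```typescript
// Minimal agent loop sketch: alternate model turns with tool execution.
type Turn =
  | { kind: "answer"; text: string }
  | { kind: "tool_call"; name: string; args: Record<string, unknown> };

type Message = { role: string; content: string };
type Model = (history: Message[]) => Promise<Turn>;
type Tools = Record<string, (args: Record<string, unknown>) => Promise<string>>;

export async function runAgent(
  model: Model,
  tools: Tools,
  prompt: string,
  maxTurns = 5,
): Promise<string> {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let i = 0; i < maxTurns; i++) {
    const turn = await model(history);
    if (turn.kind === "answer") return turn.text;
    // Execute the requested tool and feed its output back to the model.
    const result = await tools[turn.name](turn.args);
    history.push({ role: "tool", content: result });
  }
  throw new Error("agent exceeded maxTurns without a final answer");
}
```

The cap on turns guards against a model that keeps requesting tools without converging on an answer.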