Cloudflare's Browser Rendering adds /crawl endpoint for automated website discovery in beta

New Website Crawling Capability

Cloudflare has launched a new /crawl endpoint in open beta for its Browser Rendering service, enabling developers to crawl an entire website with a single API call. You submit a starting URL, and the service automatically discovers, renders, and returns the site's pages in multiple formats, which makes it useful for building RAG pipelines, training models, or monitoring content across a site.

How It Works

The crawling process runs asynchronously: you submit a URL, receive a job ID, and then poll with that ID to retrieve results as pages are processed. The endpoint respects standard web conventions, including robots.txt directives and crawl-delay settings, and operates as a signed agent that respects Cloudflare's AI Crawl Control.
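Because results arrive asynchronously, a client typically loops on the job ID until the job finishes. The Python sketch below shows that polling pattern with a pluggable fetch_status callable standing in for the real status request. The function name wait_for_crawl, the status strings ("running", "completed", "errored", "cancelled"), and the shape of the returned dict are all hypothetical and not taken from Cloudflare's API reference; the usage at the end runs it against a fake that reports two in-progress polls before completing.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal job states; the real API's state names may differ.
TERMINAL_STATES = {"completed", "errored", "cancelled"}


def wait_for_crawl(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a crawl job until it reaches a terminal state or the timeout expires.

    The timeout counts total time spent sleeping between polls, not wall-clock time.
    """
    waited = 0.0
    while True:
        job = fetch_status(job_id)  # hypothetical: returns e.g. {"status": ..., "records": [...]}
        if job.get("status") in TERMINAL_STATES:
            return job
        if waited >= timeout:
            raise TimeoutError(f"crawl job {job_id} still running after {timeout}s")
        sleep(interval)
        waited += interval


# Usage: a fake status source that reports "running" twice, then "completed".
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed", "records": [{"url": "https://example.com/"}]},
])
job = wait_for_crawl(lambda _id: next(responses), "job-123", sleep=lambda s: None)
print(job["status"])
```

Injecting sleep keeps the loop testable without real delays; in practice fetch_status would wrap an authenticated HTTP request to the job's status URL.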

Key Features

Multiple output formats: Return crawled content as HTML, Markdown, or structured JSON (powered by Workers AI)
Flexible crawl controls: Configure crawl depth, page limits, and wildcard patterns to include or exclude URL paths
Automatic page discovery: Discovers URLs from sitemaps, page links, or both
Incremental crawling: Use modifiedSince and maxAge parameters to skip unchanged pages, reducing costs on repeated crawls
Static mode: Set render: false to fetch static HTML without browser overhead for faster processing
Respectful crawling: Honors robots.txt and crawl-delay directives by default
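The crawl controls above are applied by the service itself. To make their interaction concrete, here is a minimal local sketch in Python that walks an in-memory link graph breadth-first, honors a robots.txt Disallow rule and reads its Crawl-delay (via the standard library's urllib.robotparser), and applies depth, page-limit, and wildcard include/exclude filters. The function names plan_crawl and path_allowed, the pattern semantics (shell-style wildcards matched against the URL path), and the way depth is counted are assumptions for illustration, not the endpoint's documented behavior.

```python
from collections import deque
from fnmatch import fnmatchcase
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


def path_allowed(url, include=(), exclude=()):
    """Apply shell-style wildcard include/exclude patterns to the URL path.

    Hypothetical semantics: a URL must match at least one include pattern
    (when any are given) and must match no exclude pattern.
    """
    path = urlparse(url).path or "/"
    if include and not any(fnmatchcase(path, p) for p in include):
        return False
    return not any(fnmatchcase(path, p) for p in exclude)


def plan_crawl(start, links, robots_txt, *, max_depth=2, limit=10,
               include=(), exclude=(), user_agent="ExampleBot"):
    """Breadth-first discovery over an in-memory link graph.

    Returns the URLs that would be fetched, in order, plus the Crawl-delay
    (in seconds) that robots.txt asks this user agent to observe.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    site = urlparse(start).netloc
    seen = {start}
    queue = deque([(start, 0)])  # (url, depth from the start page)
    fetched = []
    while queue and len(fetched) < limit:  # page limit
        url, depth = queue.popleft()
        # Skip pages that robots.txt disallows or the path filters reject;
        # their outgoing links are not followed either.
        if not robots.can_fetch(user_agent, url) or not path_allowed(url, include, exclude):
            continue
        fetched.append(url)
        if depth == max_depth:  # depth limit: do not expand this page's links
            continue
        for href in links.get(url, []):
            nxt = urljoin(url, href)
            if urlparse(nxt).netloc == site and nxt not in seen:  # stay on one host
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return fetched, robots.crawl_delay(user_agent)


# Example: a toy site with a disallowed /private/ area and an excluded /blog/ section.
links = {
    "https://example.com/": ["/docs/", "/blog/", "/private/admin", "https://other.example/"],
    "https://example.com/docs/": ["/docs/api", "/docs/guide"],
    "https://example.com/blog/": ["/blog/post-1"],
    "https://example.com/docs/api": ["/docs/api/v2"],
}
robots_txt = "User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n"
pages, delay = plan_crawl("https://example.com/", links, robots_txt,
                          max_depth=2, exclude=["/blog/*"])
for page in pages:
    print(page)
print("crawl-delay:", delay)
```

In this toy run the off-site link is ignored, /private/admin is skipped because robots.txt disallows it, the /blog/ section is filtered out by the exclude pattern, and /docs/api/v2 is never reached because it sits one level beyond the depth limit.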

Availability & Limitations

The /crawl endpoint is available on both Workers Free and Paid plans. Note that the endpoint cannot bypass Cloudflare bot detection or CAPTCHAs, and because it self-identifies as a bot, some websites may block it.
