New Website Crawling Capability
Cloudflare has launched a new /crawl endpoint in open beta for its Browser Rendering service, enabling developers to crawl entire websites with a single API call. Submit a starting URL and the service automatically discovers pages, renders them, and returns their content in multiple formats, which suits building RAG pipelines, training models, or monitoring content across a site.
How It Works
The crawling process runs asynchronously. You submit a URL and receive a job ID, then check back for results as pages are processed. The endpoint follows standard web conventions, honoring robots.txt directives and crawl-delay settings, and it operates as a signed agent that respects Cloudflare's AI Crawl Control.
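The submit-then-poll flow described above can be sketched as a small polling helper. This is a minimal sketch, not Cloudflare's client: the `get_status` callable, the `status` field, and the terminal status strings are all assumptions made for illustration, so check the Browser Rendering documentation for the actual response shape before relying on them.

```python
import time
from typing import Callable

# Assumed status strings; the real API may report different values.
TERMINAL_STATES = {"completed", "errored", "cancelled"}


def poll_job(
    get_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status(job_id) until the job reports a terminal state.

    get_status is a hypothetical callable that performs the HTTP request
    for the job and returns the decoded JSON body as a dict.
    """
    for attempt in range(max_attempts):
        job = get_status(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        # Wait between checks, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} checks")
```

Keeping the HTTP call behind `get_status` means the loop can be exercised with a stub, and the real request code (authentication, endpoint path) can be filled in from the official documentation.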
Key Features
- Multiple output formats: Return crawled content as HTML, Markdown, or structured JSON (powered by Workers AI)
- Flexible crawl controls: Configure crawl depth, page limits, and wildcard patterns to include or exclude URL paths
- Automatic page discovery: Discovers URLs from sitemaps, page links, or both
- Incremental crawling: Use modifiedSince and maxAge parameters to skip unchanged pages, reducing costs on repeated crawls
- Static mode: Set render: false to fetch static HTML without browser overhead for faster processing
- Respectful crawling: Honors robots.txt and crawl-delay directives by default
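To illustrate how the options listed above might combine into one request body, here is a hedged sketch of a payload builder. Only `modifiedSince`, `maxAge`, and `render` are named in the text above; the other field names (`formats`, `depth`, `limit`, `includePatterns`, `excludePatterns`) and the format strings are assumptions chosen for illustration and may not match the real request schema.

```python
from typing import Optional, Sequence

# Assumed identifiers for the three output formats named in the feature list.
ALLOWED_FORMATS = ("html", "markdown", "json")


def build_crawl_payload(
    url: str,
    *,
    formats: Sequence[str] = ("markdown",),
    depth: Optional[int] = None,
    limit: Optional[int] = None,
    render: Optional[bool] = None,
    max_age: Optional[int] = None,
    modified_since: Optional[str] = None,
    include_patterns: Optional[Sequence[str]] = None,
    exclude_patterns: Optional[Sequence[str]] = None,
) -> dict:
    """Assemble a JSON-serializable request body, omitting unset options."""
    unknown = [f for f in formats if f not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")

    payload: dict = {"url": url, "formats": list(formats)}
    optional = {
        "depth": depth,                   # hypothetical field name
        "limit": limit,                   # hypothetical field name
        "render": render,                 # False requests static HTML only
        "maxAge": max_age,                # passed through unchanged
        "modifiedSince": modified_since,  # passed through unchanged
        "includePatterns": list(include_patterns) if include_patterns is not None else None,
        "excludePatterns": list(exclude_patterns) if exclude_patterns is not None else None,
    }
    # Compare against None so that render=False is kept in the payload.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, `build_crawl_payload("https://example.com", depth=2, render=False)` yields a body containing only the URL, the default format, and the two options that were set.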
Availability & Limitations
The /crawl endpoint is available on both Workers Free and Paid plans. Note that the endpoint cannot bypass Cloudflare bot detection or CAPTCHAs and self-identifies as a bot, which some websites may block.