New Browser Rendering Crawl Endpoint
Cloudflare has released a new /crawl endpoint for its Browser Rendering service, allowing developers to crawl entire websites with a single API call. The endpoint automatically discovers pages, renders them in a headless browser, and returns content in multiple formats including HTML, Markdown, and structured JSON powered by Workers AI.
Key Features
The crawl endpoint supports the following:
- Multiple output formats: Extract content as HTML, Markdown, or structured JSON
- Automatic page discovery: Discovers URLs from sitemaps, page links, or both
- Crawl scope controls: Configure depth, page limits, and URL path patterns to include or exclude specific sections
- Incremental crawling: Use the modifiedSince and maxAge parameters to skip unchanged pages, reducing time and cost on repeated crawls
- Static mode: Set render: false to fetch static HTML without spinning up a browser, for faster crawling of non-dynamic sites
- Respectful crawling: Honors robots.txt directives and AI Crawl Control by default, identifying itself as a signed agent
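As an illustration of how the incremental and static-mode options might be combined, here is a minimal Python sketch that assembles a request body. Only the parameter names modifiedSince, maxAge, and render come from the announcement. The "url" key, the helper name build_crawl_request, and the use of plain integers for modifiedSince and maxAge are assumptions made for this example, so check the crawl endpoint documentation for the actual schema.

```python
from typing import Any, Dict, Optional


def build_crawl_request(
    start_url: str,
    *,
    render: bool = True,
    modified_since: Optional[int] = None,
    max_age: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a JSON-serializable body for a crawl request (hypothetical helper)."""
    # ASSUMPTION: the starting URL is sent under a "url" key.
    body: Dict[str, Any] = {"url": start_url}

    # Static mode: only send render when disabling the headless browser.
    if not render:
        body["render"] = False

    # Incremental crawling: include the filters only when the caller sets them.
    # ASSUMPTION: both values are plain integers; their units are not stated
    # in the announcement.
    if modified_since is not None:
        body["modifiedSince"] = modified_since
    if max_age is not None:
        body["maxAge"] = max_age

    return body
```

Calling build_crawl_request("https://example.com", render=False, max_age=3600) would yield a body with the keys url, render, and maxAge. Leaving unset options out of the body, rather than sending nulls, lets the service apply its own defaults.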
How It Works
Crawl jobs run asynchronously. Developers submit a starting URL and receive a job ID, then use that ID to check back for results as pages are processed. The endpoint is available on both Workers Free and Paid plans.
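The submit-then-check-back flow can be sketched as a small polling loop. This Python example is not tied to the real API: the fetch_status callable, the "status" field, and the "running" value are hypothetical stand-ins for whatever the job lookup actually returns. The caller supplies the function that performs the HTTP request, which keeps the loop testable without network access.

```python
import time
from typing import Any, Callable, Dict


def wait_for_crawl(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    *,
    poll_interval: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a crawl job until it stops running, then return its last response.

    fetch_status is a caller-supplied function that looks up the job by ID.
    ASSUMPTION: the response is a dict whose "status" field is "running"
    while pages are still being processed.
    """
    for _ in range(max_polls):
        job = fetch_status(job_id)
        if job.get("status") != "running":
            return job  # finished, failed, or any other terminal state
        sleep(poll_interval)
    raise TimeoutError(f"crawl job {job_id} still running after {max_polls} polls")
```

Bounding the number of polls prevents a stalled job from blocking the caller indefinitely. In practice the interval would be tuned to the size of the site being crawled.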
Important Limitations
Note that the /crawl endpoint cannot bypass Cloudflare bot detection or CAPTCHAs and self-identifies as a bot, which may impact crawling of some protected sites.
For detailed setup instructions, refer to the crawl endpoint documentation.