Cloudflare launches Browser Rendering crawl endpoint for website-wide scraping with a single API call

New Browser Rendering Crawl Endpoint

Cloudflare has released a new /crawl endpoint for its Browser Rendering service, allowing developers to crawl entire websites with a single API call. The endpoint automatically discovers pages, renders them in a headless browser, and returns content in multiple formats including HTML, Markdown, and structured JSON powered by Workers AI.

Key Features

The crawl endpoint offers the following capabilities:

Multiple output formats: Extract content as HTML, Markdown, or structured JSON
Automatic page discovery: Discovers URLs from sitemaps, page links, or both
Crawl scope controls: Configure depth, page limits, and URL path patterns to include or exclude specific sections
Incremental crawling: Use modifiedSince and maxAge parameters to skip unchanged pages, reducing time and cost on repeated crawls
Static mode: Set render: false to fetch static HTML without spinning up a browser, which speeds up crawling of non-dynamic sites
Respectful crawling: Honors robots.txt directives and AI Crawl Control by default, identifying itself as a signed agent
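
To make the options above concrete, here is a minimal sketch of how a request body for a crawl job might be assembled. Only render, modifiedSince, and maxAge are named in this announcement. The url, limit, and depth keys, and the value formats for every field, are assumptions made for illustration and should be checked against the crawl endpoint documentation.

```python
import json
from typing import Any, Optional


def build_crawl_request(
    url: str,
    *,
    render: bool = True,
    modified_since: Optional[Any] = None,
    max_age: Optional[Any] = None,
    limit: Optional[int] = None,
    depth: Optional[int] = None,
) -> dict:
    """Assemble a JSON-serializable body for a crawl job (illustrative only)."""
    # "url" as the key for the starting URL is an assumption.
    body: dict = {"url": url}

    # Named in the announcement: render: false selects static mode.
    if not render:
        body["render"] = False

    # Named in the announcement: incremental-crawl parameters.
    # Their expected value formats are not stated there, so values pass through as given.
    if modified_since is not None:
        body["modifiedSince"] = modified_since
    if max_age is not None:
        body["maxAge"] = max_age

    # Hypothetical keys standing in for the page-limit and depth controls.
    if limit is not None:
        body["limit"] = limit
    if depth is not None:
        body["depth"] = depth

    return body


if __name__ == "__main__":
    # Example: a static-mode crawl capped at 50 pages.
    request_body = build_crawl_request("https://example.com", render=False, limit=50)
    print(json.dumps(request_body, indent=2))
```

Optional fields are omitted unless they are set, so the service's own defaults apply to anything the caller does not specify.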

How It Works

Crawl jobs run asynchronously. Developers submit a starting URL and receive a job ID, which they then use to check back for results as pages are processed. The endpoint is available on both Workers Free and Paid plans.
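
The submit-then-check-back flow can be sketched as a small polling loop. This is a generic pattern under stated assumptions, not Cloudflare's client code: fetch_status is a hypothetical callable that the caller supplies to perform the actual HTTP request for a job ID, and the "status" field and its "running" value are invented for illustration.

```python
import time
from typing import Callable


def wait_for_crawl(
    fetch_status: Callable[[str], dict],
    job_id: str,
    *,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a crawl job until it stops reporting as running, then return its result."""
    for attempt in range(max_attempts):
        result = fetch_status(job_id)

        # Hypothetical response shape: any status other than "running"
        # is treated as final (completed, failed, and so on).
        if result.get("status") != "running":
            return result

        # Wait between checks, but not after the last attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)

    raise TimeoutError(f"crawl job {job_id} still running after {max_attempts} checks")


if __name__ == "__main__":
    # Stand-in for a real HTTP call: two "running" responses, then a final one.
    fake_responses = iter(
        [{"status": "running"}, {"status": "running"}, {"status": "completed"}]
    )
    final = wait_for_crawl(lambda _job_id: next(fake_responses), "job-123", interval_s=0.0)
    print(final)
```

Passing the status lookup in as a function keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.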

Important Limitations

The /crawl endpoint cannot bypass Cloudflare bot detection or CAPTCHAs. Because it self-identifies as a bot, crawls of some protected sites may be blocked or incomplete.

For detailed setup instructions, refer to the crawl endpoint documentation.
