Having an effective Cloudflare scraper opens up a whole new world of public data that you can extract with automated connections. Basic scrapers lack dynamic fingerprinting and proxy rotation, so rate limits, IP blocks, and CAPTCHA challenges lock them out of many protected platforms.
In this guide, we help growing businesses and freelancers reliably fetch pages protected by Cloudflare using our beginner-friendly HTML API. We will explain common JavaScript rendering challenges and device fingerprinting issues, and show how our Python SDK resolves them under the hood through the provided API parameters. Follow the steps to build a small, testable proof of concept before scaling.
Quick Answer (TL;DR)
Use our API to make your life easier with JavaScript rendering and stealth proxies. Add a short js_scenario wait, and set stealth_proxy=True plus country_code when needed. Test one URL, confirm the rendered HTML contains real content, then scale slowly while logging status codes and response length. Below is the full code used in this tutorial that utilizes our Web Scraping API:
# Importing our HTML API
from scrapingbee import ScrapingBeeClient

# Initializing our API client in the "client" variable
client = ScrapingBeeClient(api_key='YOUR_API_KEY')

url = "https://www.yelp.com/search?find_desc=Restaurants&find_loc=Miami%2C+FL%2C+United+States"

# Start the function definition, the indented lines define its boundaries
def yelp_scraper():
    # Instructions for JavaScript Rendering that make the headless browser wait
    js_scenario = {
        "instructions": [
            {"wait": 2000},
        ]
    }
    response = client.get(
        url,
        params={
            "js_scenario": js_scenario,
            "stealth_proxy": "True",
            # GET API call will use US-based proxies
            "country_code": "us"
        }
    )
    print("STATUS CODE: ", response.status_code)
    result = response.text
    print(result)

yelp_scraper()
What Is a Cloudflare Scraper?
A Cloudflare web scraper is any scraper tool or setup designed to extract data from sites protected by Cloudflare’s anti‑bot measures.
Normal scrapers fail because Cloudflare’s defenses check browser behavior, fingerprinting, JavaScript runtime execution, cookies, and headers. Primitive web scrapers that do not behave like a real browser are often flagged and blocked.
A robust Cloudflare scraper mimics genuine browser requests (headers, TLS fingerprinting, JS execution) or uses an internal tool that handles the connection under the hood, like our HTML API. It removes the most complicated parts of data collection, allowing you to focus on data analysis and its implementation for your use cases. For more information about similar obstacles, check out our blog on Web Scraping Challenges.
Why Cloudflare Blocks Scrapers
Many websites use Cloudflare to reduce the amount of malicious bot traffic connecting to the platform. When you try to fetch such a site with a simple HTTP client, you may get blocked, challenged, or receive a JavaScript interstitial page instead of the real content.
If your scraper sends too many requests too quickly, uses suspicious IP addresses, or skips important browser behavior like JavaScript execution, Cloudflare steps in. It may show a challenge page, block the request, or trigger a CAPTCHA.
One of Cloudflare’s main defenses is fingerprinting. It checks technical details like browser headers, TLS settings, and how JavaScript runs. If something doesn’t match normal browser parameters, it will flag your request.
Fortunately, our API implements many techniques that know how to Bypass Bot Detection, like browser emulation and automatic management of proxy connections. By implementing our tools and applying available parameters, you will be able to bypass many anti-bot systems.
Common Mistakes Developers Make
Developers often try to build a web scraper that acts like a real web browser, but static scraping tools struggle to bypass Cloudflare and collect data from dynamically loaded pages.
One common mistake is using static headers or copying browser headers without adjusting for each session. Cloudflare easily detects these as fake.
Running your scraper connections through your main IP address is a bad idea, but free proxy servers are another ineffective solution that fails to bypass Cloudflare protection. These addresses are often overused, blacklisted, or managed by malicious third parties trying to steal your data.
Your scraper will not extract data if it ignores CAPTCHA and JavaScript challenges. Tools like Python's requests or curl can't process these obstacles, so they get stuck or return incomplete HTML.
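To illustrate what "getting stuck" looks like in practice, here is a minimal, hedged sketch of how you might detect that a plain HTTP client received a Cloudflare challenge page instead of real content. The marker strings are common examples seen on interstitial pages, not an exhaustive or guaranteed list:

```python
def looks_like_cloudflare_challenge(html: str, status_code: int) -> bool:
    """Heuristic check: did we get a challenge page instead of real content?"""
    # Markers commonly seen on Cloudflare interstitial pages (not exhaustive)
    markers = ["just a moment", "cf-challenge", "checking your browser"]
    lowered = html.lower()
    # A 403/503 status or a challenge marker strongly hints the request was blocked
    return status_code in (403, 503) or any(m in lowered for m in markers)
```

A check like this lets your script log blocked requests instead of silently saving challenge pages as if they were data.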
Attempting to scrape too quickly from a single IP address is another red flag, leading to blocks or rate limiting. The most effective scrapers use Rotating Proxies to avoid sending connection requests from the same IP address, which can trigger Cloudflare challenges.
By understanding these issues, we can show how a simple web scraping script automatically manages HTTP headers, user agents, cookies, and proxy connections to access the target website and extract its original content.
How to Scrape Cloudflare-Protected Sites With ScrapingBee
Let's create a basic web scraping script using our API to extract data from Yelp, a website that uses Cloudflare bot detection to limit web access for real user connections. Follow these steps or copy the full code and tweak it to match your use cases.
Step 1: Get Your ScrapingBee API Key
Before we start working on the script, make sure you have the following tools and libraries installed on your device:
Python. Version 3.6+, available via Microsoft Store, Linux package managers, or straight from the website: Python.org.
A ScrapingBee account. Register a new account to enjoy a free trial of 1,000 credits, or check ScrapingBee Pricing for long-term deals.
ScrapingBee Python SDK. An external Python library that combines basic scraping tools with JavaScript Rendering, proxy management, and other customizable features to bypass Cloudflare bot detection.
Pandas (optional). A data management library to structure extracted data in a readable and understandable format after a successful connection.
After installing Python, you can set up its external libraries with one line using its package manager pip. Go to Command Prompt (or Terminal for Linux users) and enter the following line:
pip install scrapingbee pandas
Before we start working on the script, log in to your account to retrieve the API key from the dashboard:

Now you can create a designated folder for your web scraping project. In it, make a text file with a .py extension and open it with a text editor of your choice.
Note: We recommend using Visual Studio Code for beginners to write code with clear syntax highlighting and options for real-time error markings.
Step 2: Make Your First Request
First, start your script by importing the downloaded libraries to enable our Python SDK in your Python script:
# Importing our HTML API
from scrapingbee import ScrapingBeeClient
# pandas dataframes for better data formatting
import pandas as pd
Then, create a "client" variable which will connect your code to our HTML API. In the parentheses, paste in your API key copied from the dashboard.
client = ScrapingBeeClient(api_key='YOUR_API_KEY')
Before we start working on the function that contains our scraping logic, create a URL variable. You can encode the base URL with user input to target different parts of the website. In our example, we are targeting a Yelp page with the best restaurants in Miami:
url="https://www.yelp.com/search?find_desc=Restaurants&find_loc=Miami%2C+FL%2C+United+States"
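If you want to build such URLs from user input instead of hard-coding them, Python's standard urllib.parse handles the encoding for you. A small sketch, using the query parameter names visible in the Yelp URL above:

```python
from urllib.parse import urlencode

def build_yelp_url(search_term: str, location: str) -> str:
    # urlencode defaults to quote_plus, so spaces become "+" and commas "%2C",
    # matching the hard-coded URL above
    query = urlencode({"find_desc": search_term, "find_loc": location})
    return "https://www.yelp.com/search?" + query

url = build_yelp_url("Restaurants", "Miami, FL, United States")
```

This way the same script can target different cuisines or cities by swapping the two arguments.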
Now we can begin defining the function. In it, we create a js_scenario variable that can handle specific instructions to interact with JavaScript elements on the web page. Let's make it wait for 2 seconds for the headless browser to load the page before extracting its content.
# Start the function definition, the indented lines define its boundaries
def yelp_scraper():
    # Rules for JavaScript Rendering
    js_scenario = {
        "instructions": [
            {"wait": 2000}
        ]
    }
Let's define our GET API call by assigning its result to the "response" variable. Here we assign our "js_scenario" variable and additional parameters to bypass Cloudflare bot protection:
render_js=true (enabled by default) – Runs JavaScript on the page to solve Cloudflare’s challenge scripts; required to bypass most JS-based protections and avoid getting stuck on interstitial pages.
block_resources=false (enabled by default) – Loads all resources, including challenge scripts and tokens, needed when bypassing Cloudflare.
stealth_proxy=true – Adds full browser fingerprinting emulation (headers, TLS fingerprints, cookies, user-agent rotation, and JavaScript challenge handling). It behaves like a genuine Chrome session.
country_code=XX – Selects a proxy from a specific country, useful for avoiding region-specific Cloudflare rules or accessing geo-restricted content.
After adding parameters that are not enabled by default, our variable responsible for the GET API call should look like this:
    response = client.get(
        url,
        params={
            "js_scenario": js_scenario,
            "stealth_proxy": "True",
            # GET API call will use US-based proxies
            "country_code": "us"
        }
    )
If you encounter any issues or lack coding experience, check out our blog on Python Web Scraping. Now all we need to do is print the result and the HTTP status code (200 means OK) so we know which problem to address if the connection fails. After that, we can finish the script by invoking the defined function:
    print("STATUS CODE: ", response.status_code)
    result = response.text
    print(result)

yelp_scraper()
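Before scaling up, it helps to confirm that the rendered HTML actually contains real content rather than a short error or interstitial page. A minimal sketch of such a sanity check; the 5,000-character threshold is an arbitrary assumption you should tune per target site:

```python
def response_looks_good(status_code: int, html: str, min_length: int = 5000) -> bool:
    # A 200 status plus a reasonably long body usually means real content arrived
    if status_code != 200:
        return False
    # Very short bodies are typically error or challenge pages, not listings
    return len(html) >= min_length

# Usage with the response from the function above:
# print(response_looks_good(response.status_code, response.text))
```

Logging this boolean alongside the status code and response length gives you a quick signal when a batch of requests starts failing.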
If your connection is successful, you will receive the extracted data from the Cloudflare-protected website; the result should look something like this:

Step 3: Handle JavaScript and Captchas
Most Cloudflare-protected sites rely on JavaScript challenges to verify that you’re a real visitor. When a scraper doesn’t run JavaScript or doesn't handle full browser fingerprinting emulation, Cloudflare blocks it before any real data loads. That’s why using our Headless Browser and applying "stealth_proxy" parameters through our Python SDK is so beneficial. It tells the API to use a full headless browser that executes the same scripts and uses the same headers as Chrome or Firefox. This lets Cloudflare finish its checks and deliver the real HTML content.
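Beyond a plain wait, js_scenario accepts other instructions for interacting with the page. The sketch below assumes the wait_for and scroll_y instruction names from the ScrapingBee documentation, and the "#main" selector is a hypothetical placeholder; verify both against the current API reference for your target page:

```python
import json

# A scenario that waits for a specific element, then scrolls to trigger lazy loading
js_scenario = {
    "instructions": [
        {"wait_for": "#main"},   # wait until an element matching "#main" appears
        {"scroll_y": 1000},      # scroll down 1000 pixels
        {"wait": 1000},          # give lazy-loaded content a moment to render
    ]
}

# The SDK serializes this dict for you; printed here only to show its shape
print(json.dumps(js_scenario))
```

Passing this dict as the js_scenario parameter works exactly like the simpler wait-only version used earlier in this tutorial.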
Advanced Tips for Cloudflare Scraping
To ensure consistent access to protected websites with your Cloudflare scraper, adjust the automated connection so it mimics a normal browser: rotate user agents per session and use randomized HTTP headers that match modern browsers.
Identical requests with the same fingerprints are the main trigger of Cloudflare protections, especially if all connections originate from the same IP address. Add small human-like delays in js_scenario instructions and randomize request intervals.
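Such human-like pacing between requests can be sketched with the standard library; the 2-6 second range is an arbitrary assumption you should tune to the target site:

```python
import random
import time

def polite_delay(min_s: float = 2.0, max_s: float = 6.0) -> float:
    # Randomized intervals look less robotic than a fixed sleep between requests
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

# for url in urls:
#     fetch(url)       # your scraping call
#     polite_delay()   # randomized pause before the next request
```

Combined with randomized js_scenario waits, this keeps request timing from forming an easily detectable pattern.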
Use Rotating Proxies and Geotargeting
Fortunately, with our API, you can simply enable the stealth proxy feature with country targeting when geolocation affects challenge behavior. For heavy scraping and access to protected sites, distribute traffic across accounts and IP pools rather than scaling a single session.
Without the stealth proxy parameter, you are far more likely to encounter HTTP errors, mostly 403/429/500 responses. For example, here is the output of our script without the "stealth_proxy" parameter:

With just these three parameters, we can bypass website protection and extract its contents:
params={
    "js_scenario": js_scenario,
    "stealth_proxy": "True",
    "country_code": "us"
}
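When you do hit 403/429/500 responses, retrying with exponential backoff often recovers transient failures. A hedged sketch; fetch_page stands in for a thin wrapper around client.get and is an assumption, not part of the SDK:

```python
import time

RETRYABLE = {403, 429, 500}

def fetch_with_retries(fetch_page, url, max_attempts=3, base_delay=1.0):
    """Retry a fetch on common block codes with exponential backoff.

    fetch_page is any callable returning an object with a .status_code
    attribute, e.g. a small wrapper around client.get from the SDK.
    """
    for attempt in range(max_attempts):
        response = fetch_page(url)
        if response.status_code not in RETRYABLE:
            return response
        # Back off 1s, 2s, 4s, ... before retrying
        time.sleep(base_delay * (2 ** attempt))
    return response
```

Keeping retries bounded and logged avoids hammering a site that is deliberately blocking you, which would only make the blocks worse.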
Start Scraping Cloudflare-Protected Websites Today
Scraping Cloudflare-protected sites can be a lot easier with the right tools at your disposal. Our API handles JavaScript Rendering for even the most stubborn websites, while stealth proxies and randomized user agents will make you forget about CAPTCHA challenges, letting you focus on the data.
Sign up today and test our services with a free trial of 1,000 credits – more than enough to test the convenience and comfort of our beginner-friendly Python SDK. Good luck with your scraping!
Frequently Asked Questions (FAQs)
Can I bypass Cloudflare with Python requests?
No. Plain requests cannot execute in-page JavaScript or reproduce full browser fingerprints. Our API incorporates JavaScript rendering and stealth proxy rotation to manage fingerprinting in the background, so you can focus on the data from the Cloudflare-protected website.
Does ScrapingBee solve Cloudflare captchas?
Yes, ScrapingBee can bypass many Cloudflare CAPTCHAs automatically when render_js=true is set. While not every challenge can be solved, our JavaScript rendering and proxy rotation allow most users to scrape Cloudflare-protected pages without triggering many detection systems.
How does ScrapingBee handle JavaScript on Cloudflare sites?
Our API uses JavaScript Rendering tools to execute scripts and wait for DOM elements. The API returns the post-rendered HTML and ensures a far more consistent connection to scrape websites protected by Cloudflare.
Is it legal to scrape Cloudflare-protected sites?
Yes, scraping a Cloudflare-protected site is legal, especially if you're accessing publicly available data. However, legality depends on factors like your location, the site’s terms of service, and what kind of data you're collecting.

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.
