Cloudscraper Python is a popular package for scraping websites protected by Cloudflare without spinning up a full browser. It helps you bypass basic JavaScript challenges, handle cookies automatically, and get real HTML instead of those annoying block or "checking your browser" pages.
In this guide, we'll break down how to set Cloudscraper up the right way, what it actually does under the hood, and where its hard limits are. You'll also learn when Cloudscraper is totally fine to use, and when it's smarter to switch to heavier, more reliable tools for production-grade scraping.

Quick answer (TL;DR)
The Cloudscraper Python package helps you get past mid-level Cloudflare protections so you can fetch real HTML instead of block pages. It works best when a site still returns server-side HTML and you mainly need cookie handling, challenge delays, and a browser-like request fingerprint.
That said, it's not a real browser. It won't render JavaScript-heavy apps or execute client-side frameworks. If a site relies heavily on JS or uses newer Cloudflare protection layers, Cloudscraper may simply stop working. In those cases, browser-based tools or managed scraping APIs like ScrapingBee are usually the more reliable choice.
Below you'll find a minimal example that shows how to create a configured Cloudscraper session, fetch a page, extract book data, and save the results to a JSON file.
Full example: Scrape data and save results to JSON
import json
import random
import time
from urllib.parse import urljoin

import cloudscraper25
from bs4 import BeautifulSoup


def build_scraper():
    """
    Create a Cloudscraper session with sane, browser-like defaults.
    These settings are usually enough for basic Cloudflare JS challenges.
    """
    scraper = cloudscraper25.create_scraper(
        browser={
            "browser": "chrome",
            "platform": "windows",
            "desktop": True,
        },
        # Node.js interpreter tends to be the most stable option
        interpreter="nodejs",
        # Cloudflare sometimes expects a short delay before the real request
        delay=7,
    )
    # Minimal but realistic headers
    scraper.headers.update(
        {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
        }
    )
    return scraper


def fetch_html(scraper, url, tries=3, timeout=30):
    """
    Fetch HTML with simple retry logic and exponential-ish backoff.
    Designed to handle temporary Cloudflare blocks or rate limits.
    """
    for attempt in range(1, tries + 1):
        # Small jitter helps avoid looking too robotic
        time.sleep(random.uniform(0.5, 1.5))
        try:
            response = scraper.get(url, timeout=timeout)
            status = response.status_code
            print(f"[attempt {attempt}] GET {url} -> {status}")
            if status == 200:
                return response.text
            if status in (403, 429):
                # Blocked or rate-limited, back off a bit before retrying
                backoff = attempt * random.uniform(3, 6)
                print(f"Blocked or rate limited. Sleeping {backoff:.1f}s")
                time.sleep(backoff)
                continue
        except Exception as e:
            print(f"Request failed: {e}")
            time.sleep(attempt * 2)
    raise RuntimeError("Failed to fetch page after multiple attempts")


def parse_rating(star_p):
    """
    Extract rating word from class list like:
    ['star-rating', 'Three']
    """
    if not star_p:
        return "Unknown"
    for cls in star_p.get("class", []):
        if cls != "star-rating":
            return cls
    return "Unknown"


def extract_books(html, base_url):
    """
    Parse book cards from the Books to Scrape homepage.
    """
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for pod in soup.select("article.product_pod"):
        link = pod.select_one("h3 a")
        title = link.get("title", "") if link else ""
        href = link.get("href", "") if link else ""
        book_url = urljoin(base_url, href)
        price_el = pod.select_one("p.price_color")
        price = price_el.get_text(strip=True) if price_el else ""
        avail_el = pod.select_one("p.instock.availability")
        availability = " ".join(avail_el.get_text(strip=True).split()) if avail_el else ""
        rating_el = pod.select_one("p.star-rating")
        rating = parse_rating(rating_el)
        books.append(
            {
                "title": title,
                "price": price,
                "availability": availability,
                "rating": rating,
                "url": book_url,
            }
        )
    return books


if __name__ == "__main__":
    target_url = "https://books.toscrape.com/"
    scraper = build_scraper()
    html = fetch_html(scraper, target_url)
    books = extract_books(html, target_url)
    print(f"Extracted {len(books)} books")
    # Persist results for later use or analysis
    with open("books.json", "w", encoding="utf-8") as f:
        json.dump(books, f, ensure_ascii=False, indent=2)
    print("Saved results to books.json")
If this starts failing due to newer Cloudflare protections, that is usually a signal to switch approaches. For those cases, browser-based scraping or a managed API like ScrapingBee tends to be more stable long-term.
Installing and setting up cloudscraper in Python
Cloudscraper is a Python library built for one job: fetching pages from Cloudflare-protected sites using normal HTTP requests, but with extra logic to handle common Cloudflare "are you a browser?" challenges (including JavaScript-based checks). In practice, Cloudscraper Python setups are what you reach for when requests gets blocked and you just want the raw HTML so you can parse it yourself with something like BeautifulSoup or lxml. It doesn't extract data, it doesn't parse pages, and it doesn't "scrape for you" — it only helps you get a usable response instead of a block page.
For this guide we'll use cloudscraper25, a newer enhanced package that's actively maintained and generally the most stable option right now. It also supports multiple JavaScript interpreters, which matters when you hit tougher challenge pages.
Cloudscraper Python install using pip
The package name is cloudscraper25. If you want maximum reproducibility, pin a known-good version (you can always update later after testing):
python -m pip install "cloudscraper25==2.7.0"
We'll be using Python 3.10+ in this tutorial.
To solve Cloudflare's JavaScript challenges, the library needs a way to execute (or emulate) small pieces of JavaScript. cloudscraper25 supports multiple approaches, and these are the options you'll usually see mentioned:
- js2py (the easiest way to get started)
- native solver (pure Python logic where possible)
- Node.js (often the most reliable when sites get stricter)
- V8-based options (via Python bindings, depending on your environment)
- ChakraCore (a legacy option on some setups)
You don't need to choose one manually on day one. The default configuration works fine for many targets. If you later switch to interpreter="nodejs" or a V8-backed option for harder sites, just make sure the required runtime or bindings are installed on your machine.
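As a sketch of what picking an interpreter looks like in code, assuming cloudscraper25 accepts the same interpreter names listed above:

import cloudscraper25

# A minimal sketch: explicitly pick the js2py interpreter, which needs no external runtime.
# If tougher challenges start failing, switching this to "nodejs" is the usual next step.
scraper = cloudscraper25.create_scraper(interpreter="js2py")

response = scraper.get("https://books.toscrape.com/")
print(response.status_code)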
If you're still wrapping your head around scraping basics in Python, this page covers common questions and pitfalls: Common questions about web scraping in Python.
Creating a virtual environment for scraping projects
Scraping dependencies change often. One site works with one version, another breaks unless you downgrade, and suddenly everything conflicts. Isolating each scraping project saves you from that pain.
We'll use uv because it's fast and modern. Create a new project:
uv init cloudscraper-project
cd cloudscraper-project
Install cloudscraper into the project:
uv add cloudscraper25
Installing BeautifulSoup for HTML parsing
Cloudscraper only fetches HTML, nothing more. Once you have the page content, you still need a parser to work with the DOM. BeautifulSoup does that job.
Install it with uv:
uv add beautifulsoup4
By default, BeautifulSoup uses Python's built-in html.parser, which is fine for most sites. You can switch to lxml later if you need better performance or more forgiving parsing, but don't overthink it early on.
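If you do decide to try lxml later, the switch is a one-line change. A small sketch, assuming lxml has been installed (for example with uv add lxml):

from bs4 import BeautifulSoup

html = "<ul><li>First</li><li>Second</li></ul>"

# Built-in parser: no extra dependency, fine for most pages.
soup_default = BeautifulSoup(html, "html.parser")

# lxml parser: usually faster and more forgiving, but requires the lxml package.
soup_lxml = BeautifulSoup(html, "lxml")

print(soup_default.li.get_text(), soup_lxml.li.get_text())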
If you're deciding between parsing libraries and frameworks, this comparison helps: Which is better Scrapy or BeautifulSoup?.
Creating a cloudscraper instance with custom settings
This is where Cloudscraper Python starts doing the real work. A plain, default scraper can fail almost instantly on Cloudflare-protected sites. A properly configured one behaves more like a real browser, waits when it should, and has a chance to solve challenges instead of immediately getting blocked.
The main function you'll work with is create_scraper(). Most of your success comes from how you configure this step, not from adding complex retry logic later.
Setting browser and platform in create_scraper()
Cloudflare fingerprints browsers pretty aggressively. It doesn't just look at the User-Agent string — it checks whether headers, platform signals, and request behavior all line up in a believable way.
You control most of this using the browser argument.
import cloudscraper25

scraper = cloudscraper25.create_scraper(
    browser={
        "browser": "chrome",
        "platform": "ios",
        "desktop": False,
    }
)
What this configuration does in practice:
- Generates a mobile Chrome User-Agent
- Aligns headers and platform hints to an iOS-style environment
- Avoids desktop-only signals that don't make sense for mobile devices
Mobile browser profiles often work better against more aggressive Cloudflare setups, simply because many sites are more permissive toward mobile traffic. Desktop profiles can still be faster and more stable on simpler targets. There's no universal "best" choice here — testing both is normal.
The key thing to remember is consistency. If you claim to be iOS but send headers or behaviors that only make sense for desktop Chrome, Cloudflare will notice. Mismatched signals are one of the fastest ways to get blocked.
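A quick way to act on that is to test both profiles against your target and keep whichever one consistently returns 200. A rough sketch, reusing the browser-dict keys shown above (example.com stands in for your real target):

import cloudscraper25

profiles = {
    "desktop-chrome": {"browser": "chrome", "platform": "windows", "desktop": True},
    "mobile-chrome": {"browser": "chrome", "platform": "android", "desktop": False},
}

for name, profile in profiles.items():
    # One fresh session per profile so cookies and headers stay consistent per test.
    scraper = cloudscraper25.create_scraper(browser=profile)
    response = scraper.get("https://example.com/", timeout=30)
    print(f"{name}: {response.status_code}")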
Using interpreter='nodejs' for JavaScript challenges
Many Cloudflare protections rely on executing JavaScript to validate the client. The default Python-based interpreter works for simpler cases, but it can struggle with newer or more complex challenge logic.
Switching to Node.js is often more reliable:
scraper = cloudscraper25.create_scraper(
    interpreter="nodejs"
)
This requires Node.js to be installed on the machine running your scraper. If Node isn't available, Cloudscraper will fail when it tries to solve a JavaScript challenge, usually with errors related to missing binaries or execution failures.
One important thing to keep in mind: even with Node.js, some heavy or very new Cloudflare challenges can still fail. At that point, the limitation is on the tool itself rather than your configuration, and there isn't always a clean workaround.
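A small preflight check can save you from confusing challenge errors. This is plain standard-library Python, not a cloudscraper25 feature:

import shutil

# Confirm a Node.js binary is on PATH before relying on interpreter="nodejs".
if shutil.which("node") is None:
    raise SystemExit("Node.js not found on PATH. Install it or switch to another interpreter.")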
If you want deeper context on how Cloudflare antibot systems work, this article helps: How to bypass cloudflare antibot protection at scale.
Adding delay to mimic human behavior
Cloudflare often expects a short pause before certain challenges are solved. Responding instantly can look unnatural and increase the chance of getting stuck in challenge loops or blocked outright.
You can control this behavior using the delay parameter:
scraper = cloudscraper25.create_scraper(
    delay=7
)
In practice, a delay somewhere between 5 and 10 seconds is usually safe. Very low values can cause Cloudflare to repeatedly re-issue challenges or flag the session as suspicious.
In more advanced scripts, this is commonly combined with small, random sleep intervals between requests. That helps break up obvious timing patterns, especially when you're scraping multiple pages in a row.
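A minimal sketch of that pattern, combining the create_scraper delay with random pauses between page fetches (the Books to Scrape URLs are just stand-ins for your own targets):

import random
import time

import cloudscraper25

scraper = cloudscraper25.create_scraper(delay=7)

urls = [
    "https://books.toscrape.com/catalogue/page-1.html",
    "https://books.toscrape.com/catalogue/page-2.html",
]

for url in urls:
    # Random jitter so consecutive requests don't land at perfectly regular intervals.
    time.sleep(random.uniform(1.0, 3.0))
    response = scraper.get(url, timeout=30)
    print(url, response.status_code)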
Cloudscraper Python usage example with 2captcha
Some sites add CAPTCHA challenges on top of Cloudflare's checks. When that happens, a CAPTCHA-solving service like 2captcha can be wired directly into Cloudscraper Python.
Here's a minimal example:
scraper = cloudscraper25.create_scraper(
    captcha={
        "provider": "2captcha",
        "api_key": "YOUR_2CAPTCHA_API_KEY",
    }
)

response = scraper.get("https://example.com")
print(response.text)
You only need this setup if the site actually presents CAPTCHA challenges. If a target doesn't trigger CAPTCHAs, you should remove the captcha argument entirely and keep the scraper configuration simpler.
Also worth stating clearly: 2captcha is a paid service. Never hardcode API keys into public repositories. Use environment variables, an .env file, or a secrets manager instead.
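For example, a small sketch that reads the key from an environment variable (the variable name TWOCAPTCHA_API_KEY is just a convention chosen here):

import os

import cloudscraper25

# Read the key from the environment instead of hardcoding it in the script.
api_key = os.environ.get("TWOCAPTCHA_API_KEY")
if not api_key:
    raise SystemExit("Set TWOCAPTCHA_API_KEY before running this script.")

scraper = cloudscraper25.create_scraper(
    captcha={
        "provider": "2captcha",
        "api_key": api_key,
    }
)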
If your scraper loads the page but the data still looks empty, this issue is often related: Scraper doesn't see the data I see in the browser.
Cloudscraper's hard technical limits
cloudscraper25 is basically a requests.Session-style client with Cloudflare challenge solving on top. It can emulate a lot (browser profiles, JS execution via interpreters, some fingerprint stuff), but it's still not the same thing as running a real browser.
It's not a full browser runtime
Even with "stealth" features, you're not getting the whole browser package: real rendering, storage quirks, navigation timing, and all the little signals that come from an actual Chromium instance.
What that means:
- you can look pretty legit and still get flagged on stricter setups
- random header changes usually don't help and can make you less consistent
No real user behavior
A real browser generates behavior signals: scrolling, clicks, event timing, JS runtime behavior across a full page lifecycle. cloudscraper25 doesn't interact with the page like a user, so if the site expects interaction (or scores you based on it), you'll eventually hit a wall.
Why "just tweak it" stops working
Delays and sane browser profiles can fix simpler challenge flows. But once you're being judged on deeper fingerprint + IP reputation + behavior signals, you'll see the pattern: repeated 403s, looping challenges, or "works once, then dies".
Rule of thumb: if you've done the reasonable tuning and it's still stuck, switch to a real browser (Playwright/Selenium) or a managed scraping API instead of burning days on micro-tweaks.
Fetching and parsing HTML from Cloudflare-protected sites
Once your scraper is configured, the next step is actually fetching the page and confirming that you received real HTML instead of a block or challenge page. This section focuses on the basic request flow and the early sanity checks you should do before starting any data extraction.
Identify the Cloudflare protection you're facing
Before you touch headers, delays, or retries, figure out what kind of Cloudflare protection you're actually dealing with. This saves hours of pointless tuning.
Cloudflare doesn't block everything the same way. Different modes mean very different outcomes.
Legacy JS challenge / "checking your browser" challenge page
This is the classic challenge-page flow.
What it looks like:
- Temporary redirect or short wait before loading the page
- HTML contains Cloudflare challenge markers (often __cf_chl_*, sometimes "Checking your browser...")
- After a delay, you get real server-side HTML
Reality check:
- This is the zone where Cloudscraper has the best chance
- Works more often when your IP reputation is decent
- Node.js interpreter can help on stricter variants
JavaScript-heavy challenge pages
Still server-side HTML, but tougher JS logic.
What it looks like:
- Challenge HTML includes heavier inline JS
- Cookies get set after JS runs
- Page eventually redirects to the real content
Reality check:
- Cloudscraper may work
- Success depends on interpreter, delay, and IP quality
- Expect occasional breakage when Cloudflare changes challenge logic
Turnstile challenges
Turnstile is Cloudflare's CAPTCHA replacement, delivered via the same Challenge Platform.
What it looks like:
- cf-turnstile references in HTML
- Invisible or visible challenge
- Page may load but content is gated
Reality check:
- cloudscraper25 claims Turnstile support, but results vary by site
- Often requires a CAPTCHA provider
- Passing Turnstile does not guarantee access if fingerprint or IP is still flagged
Hard WAF / managed blocks
This is the "nope" zone.
What it looks like:
- Instant 403, 1020, or access denied page
- No challenge loop, no "wait 5 seconds", no redirect path
Reality check:
- Error 1020 specifically means you're blocked by a Cloudflare firewall rule
- Headers and delays won't fix a hard rule block
- You'll need better IPs, a real browser, or a managed scraping service
Using scraper.get() to retrieve HTML content
The scraper.get() method behaves very similarly to requests.get(). The difference is that Cloudscraper handles Cloudflare challenges, cookies, and redirects for you behind the scenes.
response = scraper.get("https://example.com")
print(response.status_code)
print(response.text[:500])
Common status codes you will see:
- 200 means the request worked and HTML was returned
- 403 usually means Cloudflare or another WAF blocked the request
- 429 means you are sending too many requests and should slow down
It's a good habit to log the URL and status code for every request. When something suddenly stops working, those logs make it much easier to see what changed and why.
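A tiny logging setup is enough for that. Here's a sketch using Python's standard logging module:

import logging

import cloudscraper25

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("scraper")

scraper = cloudscraper25.create_scraper()
url = "https://books.toscrape.com/"

response = scraper.get(url, timeout=30)
# Log the URL, status code, and final URL so redirects and sudden 403s stand out later.
log.info("GET %s -> %s (final url: %s)", url, response.status_code, response.url)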
Checking response status codes for 403 errors
A 403 response is rarely a Python bug. In most cases, it simply means Cloudflare decided your request doesn't look legitimate enough. When this happens, a few things are worth trying before you give up:
- Increase the delay between requests
- Adjust the browser or platform settings
- Switch between mobile and desktop profiles
If a site keeps returning 403 responses no matter how you tune the configuration, that's a strong signal that Cloudscraper Python is no longer effective for that target. At that point, using a browser-based scraper or a managed scraping API is often the only realistic option.
More background on Cloudflare bans here: How to bypass error 1005.
A practical 403 debugging checklist
When you hit 403s, don't start random tuning. First, confirm what you're actually getting back.
Save the raw response
Write response.text to debug.html so you can see if it's real content, a challenge page, or an access denied. Also log:
- status code
- final URL (response.url)
- a couple headers like cf-ray / set-cookie
Challenge vs hard block
You might still win if:
- it's a "Checking your browser…" style page
- HTML contains __cf_chl_* or cf-turnstile
- cookies change across attempts
Stop wasting time if:
- you get 1020 (firewall rule) or instant access denied
- repeated 403 with the same HTML/cookies every time
- 429 shows up (slow down hard)
Make only a couple sane tweaks
Try:
- mobile vs desktop browser profile
- more delay + jitter
- lower request rate
If it's still stuck after that, switch approach (better IPs, real browser, or a managed API).
def dump_debug(response, path="debug.html"):
    print("status:", response.status_code)
    print("final url:", response.url)
    print("cf-ray:", response.headers.get("cf-ray"))
    print("set-cookie:", response.headers.get("set-cookie"))
    with open(path, "w", encoding="utf-8") as f:
        f.write(response.text)
Parsing HTML with BeautifulSoup and html.parser
Once you have real HTML, parsing it is the easy part.
from bs4 import BeautifulSoup
html = response.text
soup = BeautifulSoup(html, "html.parser")
Before you start writing selectors, always confirm that the content you want actually exists in the returned HTML. If it's missing, the site may be loading that data with JavaScript after the initial page load.
A quick debugging trick is to print a slice of the response body:
print(response.text[:1000])
If the data isn't there, no amount of BeautifulSoup logic will fix it. At that point you either need to execute JavaScript with a real browser, or find and call the site's underlying API directly.
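One way to make that check explicit is to look for a selector you expect around the data before writing any extraction logic. A sketch (the article.product_pod selector is just an example marker for Books to Scrape):

import cloudscraper25
from bs4 import BeautifulSoup

scraper = cloudscraper25.create_scraper()
response = scraper.get("https://books.toscrape.com/", timeout=30)

# A selector you expect to wrap the data you care about.
expected_selector = "article.product_pod"

soup = BeautifulSoup(response.text, "html.parser")
if soup.select(expected_selector):
    print("Data is present in the server-side HTML; selectors will work.")
else:
    print("Expected content missing; it is probably rendered by JavaScript after load.")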
Scraping nowsecure.nl with Cloudscraper Python
nowsecure.nl is a small "can my scraper get through?" kind of test page that people use to sanity-check their bot-protection setup. If you make it through, you should see simple HTML content like "NOWSECURE" and "by nodriver". The nice part is that this page isn't a heavy JavaScript app. When you're allowed in, it returns server-side HTML. That means you can usually fetch it with scraper.get() and then parse it normally with BeautifulSoup.
Creating a scraper that looks like a real browser
This setup aims to be stable and predictable. It uses a realistic browser profile, Node.js for tougher JavaScript challenges, and a delay so you don't look like a speed-run bot.
import random
import time

import cloudscraper25
from bs4 import BeautifulSoup


def build_scraper() -> "cloudscraper25.CloudScraper":
    """
    Create a configured CloudScraper session with a consistent fingerprint.
    """
    scraper = cloudscraper25.create_scraper(
        # Keep the fingerprint stable: don't mix platform + headers randomly.
        browser={
            "browser": "chrome",
            "platform": "windows",
            "desktop": True,
        },
        # Node.js is often the most reliable interpreter for JS-based checks.
        # Make sure Node.js is installed, or challenge solving can fail.
        interpreter="nodejs",
        # Cloudflare sometimes expects a short wait before IUAM-style challenges are solved.
        delay=7,
    )
    # Add a few realistic headers. Keep them consistent with the chosen platform.
    scraper.headers.update(
        {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
        }
    )
    return scraper


def fetch_html(scraper, url: str, timeout: int = 30, tries: int = 3) -> str:
    """
    Fetch HTML with simple retry logic and backoff.
    Returns response.text on success, raises RuntimeError on failure.
    """
    last_error = None
    for attempt in range(1, tries + 1):
        # Small jitter reduces obvious timing patterns.
        time.sleep(random.uniform(0.6, 1.6))
        try:
            response = scraper.get(url, timeout=timeout)
            status = response.status_code
            print(f"[attempt {attempt}/{tries}] GET {url} -> {status}")
            if status == 200:
                # Optional: if you want, you can sanity-check common block markers here.
                return response.text
            if status in (403, 429):
                # 403: blocked. 429: rate-limited. Back off harder before retrying.
                backoff = attempt * random.uniform(3.0, 6.0)
                print(f"Blocked or rate limited (status {status}). Backing off {backoff:.1f}s.")
                time.sleep(backoff)
                continue
            # Other statuses can happen; treat as retryable at first.
            backoff = attempt * random.uniform(1.5, 3.0)
            print(f"Unexpected status {status}. Retrying in {backoff:.1f}s.")
            time.sleep(backoff)
        except Exception as e:
            last_error = e
            backoff = attempt * random.uniform(2.0, 5.0)
            print(f"Request failed: {e}. Retrying in {backoff:.1f}s.")
            time.sleep(backoff)
    raise RuntimeError(f"Failed to fetch HTML after {tries} tries. Last error: {last_error}")


def parse_nowsecure_title(html: str) -> str:
    """
    Extract the visible title/heading from the nowsecure.nl page.
    """
    soup = BeautifulSoup(html, "html.parser")
    # The page typically shows "NOWSECURE" in a prominent heading.
    # Grab the first <h2> as a simple check that we got real content.
    h2 = soup.find("h2")
    return h2.get_text(strip=True) if h2 else ""


if __name__ == "__main__":
    url = "https://nowsecure.nl/"
    scraper = build_scraper()
    html = fetch_html(scraper, url)
    # Quick sanity check during debugging.
    print(html[:300])
    title = parse_nowsecure_title(html)
    if not title:
        raise RuntimeError("Fetched HTML, but could not find expected content. You may still be blocked.")
    print(f"Parsed title: {title}")
What success looks like
If your request makes it through, the parsed title should be something like NOWSECURE (you may also see NowSecure, depending on how you extract it). That's a good signal that you received real HTML and not a block or challenge page.
If you keep getting 403 responses instead, your requests are being blocked. At that point, try switching the browser profile, increasing the delay, and backing off more aggressively between attempts. If none of that helps, it's likely a sign that Cloudscraper Python isn't effective for this target anymore.
Extracting and saving data from target pages
Let's use books.toscrape.com as a demo target. It's a training site, but the workflow is exactly the same on real projects: fetch HTML, parse it, extract the fields you care about, then save the results so you can reuse them later.
This particular site doesn't require Cloudflare handling, so plain requests is enough. If you already have a cloudscraper25 session in your script, you can use that too — it won't hurt. Either way, the parsing and extraction logic stays the same.
Locating elements using class selectors
The basic loop looks like this:
- Open DevTools
- Inspect the thing you want (a book card)
- Identify selectors that are stable (not random IDs)
- Extract with BeautifulSoup
On Books to Scrape, the list sits inside ol.row, and each book card is an article.product_pod. Inside each card you'll find:
- title in h3 a[title]
- price in p.price_color
- rating in p.star-rating (the second class name is the rating, like Three)
- availability in p.instock.availability
Here's a clean extractor that pulls one page of books.
import json
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def parse_rating(star_p) -> str:
    """
    Ratings are encoded as a class like:
    <p class="star-rating Three">
    We return the word part (e.g. 'Three'). If missing, return 'Unknown'.
    """
    if not star_p:
        return "Unknown"
    classes = star_p.get("class", [])
    # Typically looks like: ["star-rating", "Three"]
    for cls in classes:
        if cls != "star-rating":
            return cls
    return "Unknown"


def scrape_books_page(url: str) -> list[dict]:
    """
    Fetch a single listing page and extract book cards.
    """
    response = requests.get(url, timeout=30)
    print(f"GET {url} -> {response.status_code}")
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    books = []
    for pod in soup.select("article.product_pod"):
        link = pod.select_one("h3 a")
        # Title (prefer the title attribute, fall back to text)
        title = (link.get("title") or link.get_text(strip=True)) if link else ""
        # Convert relative URL to absolute
        href = link.get("href", "") if link else ""
        book_url = urljoin(url, href)
        # Price
        price_el = pod.select_one("p.price_color")
        price = price_el.get_text(strip=True) if price_el else ""
        # Availability (normalize whitespace)
        avail_el = pod.select_one("p.instock.availability")
        availability = " ".join(avail_el.get_text(strip=True).split()) if avail_el else ""
        # Rating (One / Two / Three / Four / Five)
        rating_el = pod.select_one("p.star-rating")
        rating = parse_rating(rating_el)
        books.append(
            {
                "title": title,
                "price": price,
                "availability": availability,
                "rating": rating,
                "url": book_url,
            }
        )
    return books


if __name__ == "__main__":
    start_url = "https://books.toscrape.com/"
    data = scrape_books_page(start_url)
    print(f"Extracted {len(data)} books")
    print(data[0])
One thing to keep in mind: class names and HTML structure can change over time. When a scraper breaks, it's often not because the code is wrong, but because the site layout changed.
In many cases, the fix is as simple as updating a few selectors after re-inspecting the page in DevTools.
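One cheap safeguard is to fail loudly when a selector suddenly matches nothing, instead of silently writing empty results. A small sketch of that idea:

from bs4 import BeautifulSoup


def select_required(soup: BeautifulSoup, selector: str) -> list:
    """Return matching elements, raising if the selector matches nothing."""
    elements = soup.select(selector)
    if not elements:
        raise RuntimeError(f"Selector matched nothing: {selector!r} (layout may have changed)")
    return elements


# Usage inside the extraction code:
# cards = select_required(soup, "article.product_pod")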
Saving extracted data to JSON files
Once you have your extracted data as a list of dictionaries, saving it to disk is straightforward.
import json

# Save results to a JSON file
with open("books.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

print("Saved to books.json")
Reading the data back later is just as simple:
import json

with open("books.json", "r", encoding="utf-8") as f:
    books = json.load(f)

print(f"Loaded {len(books)} books")
print(books[0]["title"], books[0]["price"])
This pattern works well for small to medium datasets and makes it easy to debug, inspect, or pass data between scripts.
If you want a quick reality check on whether Python is a solid choice for scraping work, this is worth a read: Is Python good for web scraping?.
Limitations and alternatives to Cloudscraper
Cloudscraper Python is useful, but it's not magic. It works well against some Cloudflare setups and completely fails against others. Knowing when to stop tweaking settings and switch tools can save a lot of time and frustration.
Why Cloudscraper fails on newer Cloudflare versions
Cloudflare continuously evolves its bot detection stack. That includes:
- more advanced browser fingerprinting
- stricter and more dynamic JavaScript challenges
- Turnstile-based checks
- tighter behavior, timing, and interaction validation
Open-source solvers like Cloudscraper tend to lag behind these changes. When a site upgrades its protection, a scraper that worked yesterday can suddenly stop working even though your code hasn't changed. A common symptom is the "403 loop": you send a request, get 403, tweak headers or delays, retry, and still get 403. When this keeps happening, it usually means Cloudscraper is no longer effective for that specific target.
At that point, chasing Cloudflare changes yourself has a real cost. You end up spending time maintaining fragile workarounds instead of building features or actually collecting data.
Using ScrapingBee or ZenRows as alternatives
When Cloudscraper Python stops working, managed scraping APIs are often the fastest way forward. They handle Cloudflare challenges, proxy rotation, and browser behavior for you, so you don't have to keep chasing protection updates yourself.
ScrapingBee is a common choice because it focuses on HTML scraping and Cloudflare handling without forcing you into full browser automation. You send a URL, and you get clean HTML back.
Here's a simple example that replaces a cloudscraper.get() call with ScrapingBee:
import requests

API_KEY = "YOUR_SCRAPINGBEE_API_KEY"
target_url = "https://example.com/"

params = {
    "api_key": API_KEY,
    "url": target_url,
    # Enables Cloudflare-friendly proxy routing
    "stealth_proxy": True,
}

response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params=params,
    timeout=60,
)
response.raise_for_status()

html = response.text

# Save HTML for parsing later
with open("page.html", "w", encoding="utf-8") as f:
    f.write(html)

print("Fetched page via ScrapingBee")
ZenRows is another managed option in the same space. The core idea is similar: you provide a URL, they deal with Cloudflare and other anti-bot systems, and you receive usable HTML. The main differences tend to be pricing, configuration style, and which protections they handle best.
If you're scraping something business-critical, switching to a managed API is often cheaper than spending days tuning Cloudscraper settings that may break again next week.
When to switch to Playwright or Selenium
Sometimes even managed HTML APIs aren't enough. This usually happens when:
- the site is a heavy JavaScript application
- data loads only after user interactions
- there are multi-step flows or complex login sequences
- Cloudflare challenges depend on real, interactive browser behavior
In these cases, full browser automation with Playwright or Selenium is often the most reliable option. You're driving a real browser, so JavaScript execution, cookies, rendering, and timing behave the way Cloudflare expects.
A common workflow looks like this:
- Test the target locally using Playwright or Selenium
- Confirm that browser automation can consistently load the data
- Decide whether to scale it yourself or switch to a managed browser automation service
Browser automation is heavier and slower than pure HTTP scraping, but when a site is designed around client-side behavior, it's often the only approach that actually works.
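For reference, here's a minimal Playwright sketch that loads a page in real Chromium and returns the rendered HTML (it assumes you've run pip install playwright and playwright install chromium):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    # Real browser: JavaScript runs, cookies are set, and the DOM is fully rendered.
    page.goto("https://books.toscrape.com/", wait_until="networkidle")
    html = page.content()
    browser.close()

print(f"Fetched {len(html)} characters of rendered HTML")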
If you want a broader comparison of Python scraping frameworks, this overview is useful: What is the best framework for web scraping with Python?.
Start scaling Cloudflare scraping with ScrapingBee
At this point, you have a working setup with Cloudscraper Python. You've seen how to configure it, fetch pages, parse HTML, and save real data. For many sites, that's enough to get started and understand how Cloudflare-protected scraping works. The main downside is maintenance. Cloudflare changes a lot, and when protections tighten up, Cloudscraper can suddenly fall into endless 403 loops even if your code didn’t change.
For production workloads, ScrapingBee removes most of that pain. It handles Cloudflare, proxies, and browser-like behavior for you, so you can spend your time on extraction instead of bypass logic. And you don’t need to rewrite your parser — ScrapingBee returns clean HTML that you can feed into the same BeautifulSoup code you already have.
If you are ready to scrape real targets at scale, the next step is simple. Sign up, point ScrapingBee at a Cloudflare-protected page, and plug the response into your existing parsing pipeline. It is usually the fastest way to go from a working script to something you can rely on long term.
Conclusion
The Cloudscraper Python package is a good way to learn how Cloudflare protection works and how to get past simpler setups. It's well suited for experiments, small projects, and sites that still return server-side HTML.
The main drawback is maintenance. As Cloudflare protections get stricter, keeping Cloudscraper working can take more effort than it's worth. For anything that needs to run reliably or at scale, switching to managed APIs or browser-based tools is usually the better long-term choice.
If you want to go deeper, these reads are worth your time:
- How to bypass cloudflare antibot protection at scale in 2025
- How to bypass error 1005 "access denied, you have been banned" when scraping
Frequently asked questions (FAQs)
What is Cloudscraper in Python used for?
Cloudscraper in Python is used to bypass older and mid-level Cloudflare protections so you can fetch real HTML instead of block pages. It handles JavaScript challenges, cookies, and basic browser fingerprinting, making it easier to scrape Cloudflare-protected sites without running a full browser.
Why does Cloudscraper return 403 even with interpreter="nodejs"?
A 403 usually means Cloudflare still does not trust your request. Even with Node.js, newer Cloudflare checks use stronger fingerprinting and behavior signals. When you see repeated 403 responses, it often means Cloudscraper is outdated for that site, not that your Python code is broken.
How do I fix "Cloudscraper cannot solve the challenge" errors?
First, increase delays and verify your browser and platform settings are consistent. Make sure Node.js is installed and accessible. If errors persist, the site likely upgraded its Cloudflare protection. At that point, further tweaking usually wastes time, and switching tools is the practical move.
When should I use ScrapingBee instead of Cloudscraper?
Use ScrapingBee when Cloudscraper hits endless 403s, breaks after Cloudflare updates, or requires constant maintenance. ScrapingBee handles Cloudflare for you and returns clean HTML, letting you reuse the same parsing code without fighting anti-bot changes.


