Cloudscraper Python is a popular package for scraping websites protected by Cloudflare without spinning up a full browser. It helps you bypass basic JavaScript challenges, handle cookies automatically, and get real HTML instead of those annoying block or "checking your browser" pages.
In this guide, we'll break down how to set Cloudscraper up the right way, what it actually does under the hood, and where its hard limits are. You'll also learn when Cloudscraper is totally fine to use, and when it's smarter to switch to heavier, more reliable tools for production-grade scraping.

Quick answer (TL;DR)
The Cloudscraper Python package helps you get past mid-level Cloudflare protections so you can fetch real HTML instead of block pages. It works best when a site still returns server-side HTML and you mainly need cookie handling, challenge delays, and a browser-like request fingerprint.
That said, it's not a real browser. It won't render JavaScript-heavy apps or execute client-side frameworks. If a site relies heavily on JS or uses newer Cloudflare protection layers, Cloudscraper may simply stop working. In those cases, browser-based tools or managed scraping APIs like ScrapingBee are usually the more reliable choice.
Below you'll find a minimal example that shows how to create a configured Cloudscraper session, fetch a page, extract book data, and save the results to a JSON file.
Full example: Scrape data and save results to JSON
import json
import random
import time
from urllib.parse import urljoin

import cloudscraper25
from bs4 import BeautifulSoup


def build_scraper():
    """
    Create a Cloudscraper session with sane, browser-like defaults.
    These settings are usually enough for basic Cloudflare JS challenges.
    """
    scraper = cloudscraper25.create_scraper(
        browser={
            "browser": "chrome",
            "platform": "windows",
            "desktop": True,
        },
        # Node.js interpreter tends to be the most stable option
        interpreter="nodejs",
        # Cloudflare sometimes expects a short delay before the real request
        delay=7,
    )
    # Minimal but realistic headers
    scraper.headers.update(
        {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
        }
    )
    return scraper


def fetch_html(scraper, url, tries=3, timeout=30):
    """
    Fetch HTML with simple retry logic and exponential-ish backoff.
    Designed to handle temporary Cloudflare blocks or rate limits.
    """
    for attempt in range(1, tries + 1):
        # Small jitter helps avoid looking too robotic
        time.sleep(random.uniform(0.5, 1.5))
        try:
            response = scraper.get(url, timeout=timeout)
            status = response.status_code
            print(f"[attempt {attempt}] GET {url} -> {status}")
            if status == 200:
                return response.text
            if status in (403, 429):
                # Blocked or rate-limited, back off a bit before retrying
                backoff = attempt * random.uniform(3, 6)
                print(f"Blocked or rate limited. Sleeping {backoff:.1f}s")
                time.sleep(backoff)
                continue
        except Exception as e:
            print(f"Request failed: {e}")
            time.sleep(attempt * 2)
    raise RuntimeError("Failed to fetch page after multiple attempts")


def parse_rating(star_p):
    """
    Extract rating word from class list like:
    ['star-rating', 'Three']
    """
    if not star_p:
        return "Unknown"
    for cls in star_p.get("class", []):
        if cls != "star-rating":
            return cls
    return "Unknown"


def extract_books(html, base_url):
    """
    Parse book cards from the Books to Scrape homepage.
    """
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for pod in soup.select("article.product_pod"):
        link = pod.select_one("h3 a")
        title = link.get("title", "") if link else ""
        href = link.get("href", "") if link else ""
        book_url = urljoin(base_url, href)
        price_el = pod.select_one("p.price_color")
        price = price_el.get_text(strip=True) if price_el else ""
        avail_el = pod.select_one("p.instock.availability")
        availability = " ".join(avail_el.get_text(strip=True).split()) if avail_el else ""
        rating_el = pod.select_one("p.star-rating")
        rating = parse_rating(rating_el)
        books.append(
            {
                "title": title,
                "price": price,
                "availability": availability,
                "rating": rating,
                "url": book_url,
            }
        )
    return books


if __name__ == "__main__":
    target_url = "https://books.toscrape.com/"
    scraper = build_scraper()
    html = fetch_html(scraper, target_url)
    books = extract_books(html, target_url)
    print(f"Extracted {len(books)} books")
    # Persist results for later use or analysis
    with open("books.json", "w", encoding="utf-8") as f:
        json.dump(books, f, ensure_ascii=False, indent=2)
    print("Saved results to books.json")
If this starts failing due to newer Cloudflare protections, that is usually a signal to switch approaches. For those cases, browser-based scraping or a managed API like ScrapingBee tends to be more stable long-term.
Installing and setting up cloudscraper in Python
Cloudscraper is a Python library built for one job: fetching pages from Cloudflare-protected sites using normal HTTP requests, but with extra logic to handle common Cloudflare "are you a browser?" challenges (including JavaScript-based checks). In practice, Cloudscraper Python setups are what you reach for when requests gets blocked and you just want the raw HTML so you can parse it yourself with something like BeautifulSoup or lxml. It doesn't extract data, it doesn't parse pages, and it doesn't "scrape for you" — it only helps you get a usable response instead of a block page.
For this guide we'll use cloudscraper25, a newer enhanced package that's actively maintained and generally the most stable option right now. It also supports multiple JavaScript interpreters, which matters when you hit tougher challenge pages.
Cloudscraper Python install using pip
The package name is cloudscraper25. If you want maximum reproducibility, pin a known-good version (you can always update later after testing):
python -m pip install "cloudscraper25==2.7.0"
We'll be using Python 3.10+ in this tutorial.
To solve Cloudflare's JavaScript challenges, the library needs a way to execute (or emulate) small pieces of JavaScript. cloudscraper25 supports multiple approaches, and these are the options you'll usually see mentioned:
- js2py (the easiest way to get started)
- native solver (pure Python logic where possible)
- Node.js (often the most reliable when sites get stricter)
- V8-based options (via Python bindings, depending on your environment)
- ChakraCore (a legacy option on some setups)
You don't need to choose one manually on day one. The default configuration works fine for many targets. If you later switch to interpreter="nodejs" or a V8-backed option for harder sites, just make sure the required runtime or bindings are installed on your machine.
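As a sketch of what picking an interpreter looks like in code, assuming cloudscraper25 accepts the same interpreter names listed above:

import cloudscraper25

# A minimal sketch: explicitly pick the js2py interpreter, which needs no external runtime.
# If tougher challenges start failing, switching this to "nodejs" is the usual next step.
scraper = cloudscraper25.create_scraper(interpreter="js2py")

response = scraper.get("https://books.toscrape.com/")
print(response.status_code)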
If you're still wrapping your head around scraping basics in Python, this page covers common questions and pitfalls: Common questions about web scraping in Python.
Creating a virtual environment for scraping projects
Scraping dependencies change often. One site works with one version, another breaks unless you downgrade, and suddenly everything conflicts. Isolating each scraping project saves you from that pain.
We'll use uv because it's fast and modern. Create a new project:
uv init cloudscraper-project
cd cloudscraper-project
Install cloudscraper into the project:
uv add cloudscraper25
Installing BeautifulSoup for HTML parsing
Cloudscraper only fetches HTML, nothing more. Once you have the page content, you still need a parser to work with the DOM. BeautifulSoup does that job.
Install it with uv:
uv add beautifulsoup4
By default, BeautifulSoup uses Python's built-in html.parser, which is fine for most sites. You can switch to lxml later if you need better performance or more forgiving parsing, but don't overthink it early on.
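If you do decide to try lxml later, the switch is a one-line change. A small sketch, assuming lxml has been installed (for example with uv add lxml):

from bs4 import BeautifulSoup

html = "<ul><li>First</li><li>Second</li></ul>"

# Built-in parser: no extra dependency, fine for most pages.
soup_default = BeautifulSoup(html, "html.parser")

# lxml parser: usually faster and more forgiving, but requires the lxml package.
soup_lxml = BeautifulSoup(html, "lxml")

print(soup_default.li.get_text(), soup_lxml.li.get_text())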
If you're deciding between parsing libraries and frameworks, this comparison helps: Which is better Scrapy or BeautifulSoup?.
Creating a cloudscraper instance with custom settings
This is where Cloudscraper Python starts doing the real work. A plain, default scraper can fail almost instantly on Cloudflare-protected sites. A properly configured one behaves more like a real browser, waits when it should, and has a chance to solve challenges instead of immediately getting blocked.
The main function you'll work with is create_scraper(). Most of your success comes from how you configure this step, not from adding complex retry logic later.
Setting browser and platform in create_scraper()
Cloudflare fingerprints browsers pretty aggressively. It doesn't just look at the User-Agent string — it checks whether headers, platform signals, and request behavior all line up in a believable way.
You control most of this using the browser argument.
import cloudscraper25

scraper = cloudscraper25.create_scraper(
    browser={
        "browser": "chrome",
        "platform": "ios",
        "desktop": False,
    }
)
What this configuration does in practice:
- Generates a mobile Chrome User-Agent
- Aligns headers and platform hints to an iOS-style environment
- Avoids desktop-only signals that don't make sense for mobile devices
Mobile browser profiles often work better against more aggressive Cloudflare setups, simply because many sites are more permissive toward mobile traffic. Desktop profiles can still be faster and more stable on simpler targets. There's no universal "best" choice here — testing both is normal.
The key thing to remember is consistency. If you claim to be iOS but send headers or behaviors that only make sense for desktop Chrome, Cloudflare will notice. Mismatched signals are one of the fastest ways to get blocked.
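A quick way to act on that is to test both profiles against your target and keep whichever one consistently returns 200. A rough sketch, reusing the browser-dict keys shown above (example.com stands in for your real target):

import cloudscraper25

profiles = {
    "desktop-chrome": {"browser": "chrome", "platform": "windows", "desktop": True},
    "mobile-chrome": {"browser": "chrome", "platform": "android", "desktop": False},
}

for name, profile in profiles.items():
    # One fresh session per profile so cookies and headers stay consistent per test.
    scraper = cloudscraper25.create_scraper(browser=profile)
    response = scraper.get("https://example.com/", timeout=30)
    print(f"{name}: {response.status_code}")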
Using interpreter='nodejs' for JavaScript challenges
Many Cloudflare protections rely on executing JavaScript to validate the client. The default Python-based interpreter works for simpler cases, but it can struggle with newer or more complex challenge logic.
Switching to Node.js is often more reliable:
scraper = cloudscraper25.create_scraper(
    interpreter="nodejs"
)
This requires Node.js to be installed on the machine running your scraper. If Node isn't available, Cloudscraper will fail when it tries to solve a JavaScript challenge, usually with errors related to missing binaries or execution failures.
One important thing to keep in mind: even with Node.js, some heavy or very new Cloudflare challenges can still fail. At that point, the limitation is on the tool itself rather than your configuration, and there isn't always a clean workaround.
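A small preflight check can save you from confusing challenge errors. This is plain standard-library Python, not a cloudscraper25 feature:

import shutil

# Confirm a Node.js binary is on PATH before relying on interpreter="nodejs".
if shutil.which("node") is None:
    raise SystemExit("Node.js not found on PATH. Install it or switch to another interpreter.")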
If you want deeper context on how Cloudflare antibot systems work, this article helps: How to bypass cloudflare antibot protection at scale.
Adding delay to mimic human behavior
Cloudflare often expects a short pause before certain challenges are solved. Responding instantly can look unnatural and increase the chance of getting stuck in challenge loops or blocked outright.
You can control this behavior using the delay parameter:
scraper = cloudscraper25.create_scraper(
    delay=7
)
In practice, a delay somewhere between 5 and 10 seconds is usually safe. Very low values can cause Cloudflare to repeatedly re-issue challenges or flag the session as suspicious.
In more advanced scripts, this is commonly combined with small, random sleep intervals between requests. That helps break up obvious timing patterns, especially when you're scraping multiple pages in a row.
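A minimal sketch of that pattern, combining the create_scraper delay with random pauses between page fetches (the Books to Scrape URLs are just stand-ins for your own targets):

import random
import time

import cloudscraper25

scraper = cloudscraper25.create_scraper(delay=7)

urls = [
    "https://books.toscrape.com/catalogue/page-1.html",
    "https://books.toscrape.com/catalogue/page-2.html",
]

for url in urls:
    # Random jitter so consecutive requests don't land at perfectly regular intervals.
    time.sleep(random.uniform(1.0, 3.0))
    response = scraper.get(url, timeout=30)
    print(url, response.status_code)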
Cloudscraper Python usage example with 2captcha
Some sites add CAPTCHA challenges on top of Cloudflare's checks. When that happens, a CAPTCHA-solving service like 2captcha can be wired directly into Cloudscraper Python.
Here's a minimal example:
scraper = cloudscraper25.create_scraper(
    captcha={
        "provider": "2captcha",
        "api_key": "YOUR_2CAPTCHA_API_KEY",
    }
)

response = scraper.get("https://example.com")
print(response.text)
You only need this setup if the site actually presents CAPTCHA challenges. If a target doesn't trigger CAPTCHAs, you should remove the captcha argument entirely and keep the scraper configuration simpler.
Also worth stating clearly: 2captcha is a paid service. Never hardcode API keys into public repositories. Use environment variables, an .env file, or a secrets manager instead.
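For example, a small sketch that reads the key from an environment variable (the variable name TWOCAPTCHA_API_KEY is just a convention chosen here):

import os

import cloudscraper25

# Read the key from the environment instead of hardcoding it in the script.
api_key = os.environ.get("TWOCAPTCHA_API_KEY")
if not api_key:
    raise SystemExit("Set TWOCAPTCHA_API_KEY before running this script.")

scraper = cloudscraper25.create_scraper(
    captcha={
        "provider": "2captcha",
        "api_key": api_key,
    }
)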
If your scraper loads the page but the data still looks empty, this issue is often related: Scraper doesn't see the data I see in the browser.
Cloudscraper's hard technical limits
cloudscraper25 is basically a requests.Session-style client with Cloudflare challenge solving on top. It can emulate a lot (browser profiles, JS execution via interpreters, some fingerprint stuff), but it's still not the same thing as running a real browser.
It's not a full browser runtime
Even with "stealth" features, you're not getting the whole browser package: real rendering, storage quirks, navigation timing, and all the little signals that come from an actual Chromium instance.
What that means:
- you can look pretty legit and still get flagged on stricter setups
- random header changes usually don't help and can make you less consistent
No real user behavior
A real browser generates behavior signals: scrolling, clicks, event timing, JS runtime behavior across a full page lifecycle. cloudscraper25 doesn't interact with the page like a user, so if the site expects interaction (or scores you based on it), you'll eventually hit a wall.
Why "just tweak it" stops working
Delays and sane browser profiles can fix simpler challenge flows. But once you're being judged on deeper fingerprint + IP reputation + behavior signals, you'll see the pattern: repeated 403s, looping challenges, or "works once, then dies".
Rule of thumb: if you've done the reasonable tuning and it's still stuck, switch to a real browser (Playwright/Selenium) or a managed scraping API instead of burning days on micro-tweaks.
Fetching and parsing HTML from Cloudflare-protected sites
Once your scraper is configured, the next step is actually fetching the page and confirming that you received real HTML instead of a block or challenge page. This section focuses on the basic request flow and the early sanity checks you should do before starting any data extraction.
Identify the Cloudflare protection you're facing
Before you touch headers, delays, or retries, figure out what kind of Cloudflare protection you're actually dealing with. This saves hours of pointless tuning.
Cloudflare doesn't block everything the same way. Different modes mean very different outcomes.
Legacy JS challenge / "checking your browser" challenge page
This is the classic challenge-page flow.
What it looks like:
- Temporary redirect or short wait before loading the page
- HTML contains Cloudflare challenge markers (often __cf_chl_*, sometimes "Checking your browser...")
- After a delay, you get real server-side HTML
Reality check:
- This is the zone where Cloudscraper has the best chance
- Works more often when your IP reputation is decent
- Node.js interpreter can help on stricter variants
JavaScript-heavy challenge pages
Still server-side HTML, but tougher JS logic.
What it looks like:
- Challenge HTML includes heavier inline JS
- Cookies get set after JS runs
- Page eventually redirects to the real content
Reality check:
- Cloudscraper may work
- Success depends on interpreter, delay, and IP quality
- Expect occasional breakage when Cloudflare changes challenge logic
Turnstile challenges
Turnstile is Cloudflare's CAPTCHA replacement, delivered via the same Challenge Platform.
What it looks like:
- cf-turnstile references in HTML
- Invisible or visible challenge
- Page may load but content is gated
Reality check:
- cloudscraper25 claims Turnstile support, but results vary by site
- Often requires a CAPTCHA provider
- Passing Turnstile does not guarantee access if fingerprint or IP is still flagged
Hard WAF / managed blocks
This is the "nope" zone.
What it looks like:
- Instant 403, 1020, or access denied page
- No challenge loop, no "wait 5 seconds", no redirect path
Reality check:
- Error 1020 specifically means you're blocked by a Cloudflare firewall rule
- Headers and delays won't fix a hard rule block
- You'll need better IPs, a real browser, or a managed scraping service
Using scraper.get() to retrieve HTML content
The scraper.get() method behaves very similarly to requests.get(). The difference is that Cloudscraper handles Cloudflare challenges, cookies, and redirects for you behind the scenes.
response = scraper.get("https://example.com")
print(response.status_code)
print(response.text[:500])
Common status codes you will see:
- 200 means the request worked and HTML was returned
- 403 usually means Cloudflare or another WAF blocked the request
- 429 means you are sending too many requests and should slow down
It's a good habit to log the URL and status code for every request. When something suddenly stops working, those logs make it much easier to see what changed and why.
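A tiny logging setup is enough for that. Here's a sketch using Python's standard logging module:

import logging

import cloudscraper25

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("scraper")

scraper = cloudscraper25.create_scraper()
url = "https://books.toscrape.com/"

response = scraper.get(url, timeout=30)
# Log the URL, status code, and final URL so redirects and sudden 403s stand out later.
log.info("GET %s -> %s (final url: %s)", url, response.status_code, response.url)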
Checking response status codes for 403 errors
A 403 response is rarely a Python bug. In most cases, it simply means Cloudflare decided your request doesn't look legitimate enough. When this happens, a few things are worth trying before you give up:
- Increase the delay between requests
- Adjust the browser or platform settings
- Switch between mobile and desktop profiles
If a site keeps returning 403 responses no matter how you tune the configuration, that's a strong signal that Cloudscraper Python is no longer effective for that target. At that point, using a browser-based scraper or a managed scraping API is often the only realistic option.
More background on Cloudflare bans here: How to bypass error 1005.
A practical 403 debugging checklist
When you hit 403s, don't start random tuning. First, confirm what you're actually getting back.
Save the raw response
Write response.text to debug.html so you can see if it's real content, a challenge page, or an access denied. Also log:
- status code
- final URL (response.url)
- a couple headers like cf-ray / set-cookie
Challenge vs hard block
You might still win if:
- it's a "Checking your browser…" style page
- HTML contains __cf_chl_* or cf-turnstile
- cookies change across attempts
Stop wasting time if:
- you get 1020 (firewall rule) or instant access denied
- repeated 403 with the same HTML/cookies every time
- 429 shows up (slow down hard)
Make only a couple sane tweaks
Try:
- mobile vs desktop browser profile
- more delay + jitter
- lower request rate
If it's still stuck after that, switch approach (better IPs, real browser, or a managed API).
def dump_debug(response, path="debug.html"):
    print("status:", response.status_code)
    print("final url:", response.url)
    print("cf-ray:", response.headers.get("cf-ray"))
    print("set-cookie:", response.headers.get("set-cookie"))
    with open(path, "w", encoding="utf-8") as f:
        f.write(response.text)
Parsing HTML with BeautifulSoup and html.parser
Once you have real HTML, parsing it is the easy part.
from bs4 import BeautifulSoup
html = response.text
soup = BeautifulSoup(html, "html.parser")
Before you start writing selectors, always confirm that the content you want actually exists in the returned HTML. If it's missing, the site may be loading that data with JavaScript after the initial page load.
A quick debugging trick is to print a slice of the response body:
print(response.text[:1000])
If the data isn't there, no amount of BeautifulSoup logic will fix it. At that point you either need to execute JavaScript with a real browser, or find and call the site's underlying API directly.
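One way to make that check explicit is to look for a selector you expect around the data before writing any extraction logic. A sketch (the article.product_pod selector is just an example marker for Books to Scrape):

import cloudscraper25
from bs4 import BeautifulSoup

scraper = cloudscraper25.create_scraper()
response = scraper.get("https://books.toscrape.com/", timeout=30)

# A selector you expect to wrap the data you care about.
expected_selector = "article.product_pod"

soup = BeautifulSoup(response.text, "html.parser")
if soup.select(expected_selector):
    print("Data is present in the server-side HTML; selectors will work.")
else:
    print("Expected content missing; it is probably rendered by JavaScript after load.")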
Scraping nowsecure.nl with Cloudscraper Python
nowsecure.nl is a small "can my scraper get through?" kind of test page that people use to sanity-check their bot-protection setup. If you make it through, you should see simple HTML content like "NOWSECURE" and "by nodriver". The nice part is that this page isn't a heavy JavaScript app. When you're allowed in, it returns server-side HTML. That means you can usually fetch it with scraper.get() and then parse it normally with BeautifulSoup.
Creating a scraper that looks like a real browser
This setup aims to be stable and predictable. It uses a realistic browser profile, Node.js for tougher JavaScript challenges, and a delay so you don't look like a speed-run bot.
import random
import time

import cloudscraper25
from bs4 import BeautifulSoup


def build_scraper() -> "cloudscraper25.CloudScraper":
    """
    Create a configured CloudScraper session with a consistent fingerprint.
    """
    scraper = cloudscraper25.create_scraper(
        # Keep the fingerprint stable: don't mix platform + headers randomly.
        browser={
            "browser": "chrome",
            "platform": "windows",
            "desktop": True,
        },
        # Node.js is often the most reliable interpreter for JS-based checks.
        # Make sure Node.js is installed, or challenge solving can fail.
        interpreter="nodejs",
        # Cloudflare sometimes expects a short wait before IUAM-style challenges are solved.
        delay=7,
    )
    # Add a few realistic headers. Keep them consistent with the chosen platform.
    scraper.headers.update(
        {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
        }
    )
    return scraper


def fetch_html(scraper, url: str, timeout: int = 30, tries: int = 3) -> str:
    """
    Fetch HTML with simple retry logic and backoff.
    Returns response.text on success, raises RuntimeError on failure.
    """
    last_error = None
    for attempt in range(1, tries + 1):
        # Small jitter reduces obvious timing patterns.
        time.sleep(random.uniform(0.6, 1.6))
        try:
            response = scraper.get(url, timeout=timeout)
            status = response.status_code
            print(f"[attempt {attempt}/{tries}] GET {url} -> {status}")
            if status == 200:
                # Optional: if you want, you can sanity-check common block markers here.
                return response.text
            if status in (403, 429):
                # 403: blocked. 429: rate-limited. Back off harder before retrying.
                backoff = attempt * random.uniform(3.0, 6.0)
                print(f"Blocked or rate limited (status {status}). Backing off {backoff:.1f}s.")
                time.sleep(backoff)
                continue
            # Other statuses can happen; treat as retryable at first.
            backoff = attempt * random.uniform(1.5, 3.0)
            print(f"Unexpected status {status}. Retrying in {backoff:.1f}s.")
            time.sleep(backoff)
        except Exception as e:
            last_error = e
            backoff = attempt * random.uniform(2.0, 5.0)
            print(f"Request failed: {e}. Retrying in {backoff:.1f}s.")
            time.sleep(backoff)
    raise RuntimeError(f"Failed to fetch HTML after {tries} tries. Last error: {last_error}")


def parse_nowsecure_title(html: str) -> str:
    """
    Extract the visible title/heading from the nowsecure.nl page.
    """
    soup = BeautifulSoup(html, "html.parser")
    # The page typically shows "NOWSECURE" in a prominent heading.
    # Grab the first <h2> as a simple check that we got real content.
    h2 = soup.find("h2")
    return h2.get_text(strip=True) if h2 else ""


if __name__ == "__main__":
    url = "https://nowsecure.nl/"
    scraper = build_scraper()
    html = fetch_html(scraper, url)
    # Quick sanity check during debugging.
    print(html[:300])
    title = parse_nowsecure_title(html)
    if not title:
        raise RuntimeError("Fetched HTML, but could not find expected content. You may still be blocked.")
    print(f"Parsed title: {title}")
What success looks like
If your request makes it through, the parsed title should be something like NOWSECURE (you may also see NowSecure, depending on how you extract it). That's a good signal that you received real HTML and not a block or challenge page.
If you keep getting 403 responses instead, your requests are being blocked. At that point, try switching the browser profile, increasing the delay, and backing off more aggressively between attempts. If none of that helps, it's likely a sign that Cloudscraper Python isn't effective for this target anymore.
Extracting and saving data from target pages
Let's use books.toscrape.com as a demo target. It's a training site, but the workflow is exactly the same on real projects: fetch HTML, parse it, extract the fields you care about, then save the results so you can reuse them later.
This particular site doesn't require Cloudflare handling, so plain requests is enough. If you already have a cloudscraper25 session in your script, you can use that too — it won't hurt. Either way, the parsing and extraction logic stays the same.
Locating elements using class selectors
The basic loop looks like this:
- Open DevTools
- Inspect the thing you want (a book card)
- Identify selectors that are stable (not random IDs)
- Extract with BeautifulSoup
On Books to Scrape, the list sits inside ol.row, and each book card is an article.product_pod. Inside each card you'll find:
- title in h3 a[title]
- price in p.price_color
- rating in p.star-rating (the second class name is the rating, like Three)
- availability in p.instock.availability
Here's a clean extractor that pulls one page of books.
import json
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def parse_rating(star_p) -> str:
    """
    Ratings are encoded as a class like:
    <p class="star-rating Three">
    We return the word part (e.g. 'Three'). If missing, return 'Unknown'.
    """
    if not star_p:
        return "Unknown"
    classes = star_p.get("class", [])
    # Typically looks like: ["star-rating", "Three"]
    for cls in classes:
        if cls != "star-rating":
            return cls
    return "Unknown"


def scrape_books_page(url: str) -> list[dict]:
    """
    Fetch a single listing page and extract book cards.
    """
    response = requests.get(url, timeout=30)
    print(f"GET {url} -> {response.status_code}")
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    books = []
    for pod in soup.select("article.product_pod"):
        link = pod.select_one("h3 a")
        # Title (prefer the title attribute, fall back to text)
        title = (link.get("title") or link.get_text(strip=True)) if link else ""
        # Convert relative URL to absolute
        href = link.get("href", "") if link else ""
        book_url = urljoin(url, href)
        # Price
        price_el = pod.select_one("p.price_color")
        price = price_el.get_text(strip=True) if price_el else ""
        # Availability (normalize whitespace)
        avail_el = pod.select_one("p.instock.availability")
        availability = " ".join(avail_el.get_text(strip=True).split()) if avail_el else ""
        # Rating (One / Two / Three / Four / Five)
        rating_el = pod.select_one("p.star-rating")
        rating = parse_rating(rating_el)
        books.append(
            {
                "title": title,
                "price": price,
                "availability": availability,
                "rating": rating,
                "url": book_url,
            }
        )
    return books


if __name__ == "__main__":
    start_url = "https://books.toscrape.com/"
    data = scrape_books_page(start_url)
    print(f"Extracted {len(data)} books")
    print(data[0])
One thing to keep in mind: class names and HTML structure can change over time. When a scraper breaks, it's often not because the code is wrong, but because the site layout changed.
In many cases, the fix is as simple as updating a few selectors after re-inspecting the page in DevTools.
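One cheap safeguard is to fail loudly when a selector suddenly matches nothing, instead of silently writing empty results. A small sketch of that idea:

from bs4 import BeautifulSoup


def select_required(soup: BeautifulSoup, selector: str) -> list:
    """Return matching elements, raising if the selector matches nothing."""
    elements = soup.select(selector)
    if not elements:
        raise RuntimeError(f"Selector matched nothing: {selector!r} (layout may have changed)")
    return elements


# Usage inside the extraction code:
# cards = select_required(soup, "article.product_pod")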
Saving extracted data to JSON files
Once you have your extracted data as a list of dictionaries, saving it to disk is straightforward.
import json

# Save results to a JSON file
with open("books.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

print("Saved to books.json")
Reading the data back later is just as simple:
import json

with open("books.json", "r", encoding="utf-8") as f:
    books = json.load(f)

print(f"Loaded {len(books)} books")
print(books[0]["title"], books[0]["price"])
This pattern works well for small to medium datasets and makes it easy to debug, inspect, or pass data between scripts.
If you want a quick reality check on whether Python is a solid choice for scraping work, this is worth a read: Is Python good for web scraping?.
Limitations and alternatives to Cloudscraper
Cloudscraper Python is useful, but it's not magic. It works well against some Cloudflare setups and completely fails against others. Knowing when to stop tweaking settings and switch tools can save a lot of time and frustration.
Why Cloudscraper fails on newer Cloudflare versions
Cloudflare continuously evolves its bot detection stack. That includes:
- more advanced browser fingerprinting
- stricter and more dynamic JavaScript challenges
- Turnstile-based checks
- tighter behavior, timing, and interaction validation
Open-source solvers like Cloudscraper tend to lag behind these changes. When a site upgrades its protection, a scraper that worked yesterday can suddenly stop working even though your code hasn't changed. A common symptom is the "403 loop": you send a request, get 403, tweak headers or delays, retry, and still get 403. When this keeps happening, it usually means Cloudscraper is no longer effective for that specific target.
At that point, chasing Cloudflare changes yourself has a real cost. You end up spending time maintaining fragile workarounds instead of building features or actually collecting data.
Using ScrapingBee or ZenRows as alternatives
When Cloudscraper Python stops working, managed scraping APIs are often the fastest way forward. They handle Cloudflare challenges, proxy rotation, and browser behavior for you, so you don't have to keep chasing protection updates yourself.
ScrapingBee is a common choice because it focuses on HTML scraping and Cloudflare handling without forcing you into full browser automation. You send a URL, and you get clean HTML back.
Here's a simple example that replaces a cloudscraper.get() call with ScrapingBee:
import requests

API_KEY = "YOUR_SCRAPINGBEE_API_KEY"
target_url = "https://example.com/"

params = {
    "api_key": API_KEY,
    "url": target_url,
    # Enables Cloudflare-friendly proxy routing
    "stealth_proxy": True,
}

response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params=params,
    timeout=60,
)
response.raise_for_status()

html = response.text

# Save HTML for parsing later
with open("page.html", "w", encoding="utf-8") as f:
    f.write(html)

print("Fetched page via ScrapingBee")
ZenRows is another managed option in the same space. The core idea is similar: you provide a URL, they deal with Cloudflare and other anti-bot systems, and you receive usable HTML. The main differences tend to be pricing, configuration style, and which protections they handle best.
If you're scraping something business-critical, switching to a managed API is often cheaper than spending days tuning Cloudscraper settings that may break again next week.
When to switch to Playwright or Selenium
Sometimes even managed HTML APIs aren't enough. This usually happens when:
- the site is a heavy JavaScript application
- data loads only after user interactions
- there are multi-step flows or complex login sequences
- Cloudflare challenges depend on real, interactive browser behavior
In these cases, full browser automation with Playwright or Selenium is often the most reliable option. You're driving a real browser, so JavaScript execution, cookies, rendering, and timing behave the way Cloudflare expects.
A common workflow looks like this:
- Test the target locally using Playwright or Selenium
- Confirm that browser automation can consistently load the data
- Decide whether to scale it yourself or switch to a managed browser automation service
Browser automation is heavier and slower than pure HTTP scraping, but when a site is designed around client-side behavior, it's often the only approach that actually works.
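For reference, here's a minimal Playwright sketch that loads a page in real Chromium and returns the rendered HTML (it assumes you've run pip install playwright and playwright install chromium):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    # Real browser: JavaScript runs, cookies are set, and the DOM is fully rendered.
    page.goto("https://books.toscrape.com/", wait_until="networkidle")
    html = page.content()
    browser.close()

print(f"Fetched {len(html)} characters of rendered HTML")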
If you want a broader comparison of Python scraping frameworks, this overview is useful: What is the best framework for web scraping with Python?.
Start scaling Cloudflare scraping with ScrapingBee
At this point, you have a working setup with Cloudscraper Python. You've seen how to configure it, fetch pages, parse HTML, and save real data. For many sites, that's enough to get started and understand how Cloudflare-protected scraping works. The main downside is maintenance. Cloudflare changes a lot, and when protections tighten up, Cloudscraper can suddenly fall into endless 403 loops even if your code didn’t change.
For production workloads, ScrapingBee removes most of that pain. It handles Cloudflare, proxies, and browser-like behavior for you, so you can spend your time on extraction instead of bypass logic. And you don’t need to rewrite your parser — ScrapingBee returns clean HTML that you can feed into the same BeautifulSoup code you already have.
If you are ready to scrape real targets at scale, the next step is simple. Sign up, point ScrapingBee at a Cloudflare-protected page, and plug the response into your existing parsing pipeline. It is usually the fastest way to go from a working script to something you can rely on long term.
Conclusion
The Cloudscraper Python package is a good way to learn how Cloudflare protection works and how to get past simpler setups. It's well suited for experiments, small projects, and sites that still return server-side HTML.
The main drawback is maintenance. As Cloudflare protections get stricter, keeping Cloudscraper working can take more effort than it's worth. For anything that needs to run reliably or at scale, switching to managed APIs or browser-based tools is usually the better long-term choice.
If you want to go deeper, these reads are worth your time:
- How to bypass cloudflare antibot protection at scale in 2025
- How to bypass error 1005 "access denied, you have been banned" when scraping
Frequently asked questions (FAQs)
What is Cloudscraper in Python used for?
Cloudscraper in Python is used to bypass older and mid-level Cloudflare protections so you can fetch real HTML instead of block pages. It handles JavaScript challenges, cookies, and basic browser fingerprinting, making it easier to scrape Cloudflare-protected sites without running a full browser.
Why does Cloudscraper return 403 even with interpreter="nodejs"?
A 403 usually means Cloudflare still does not trust your request. Even with Node.js, newer Cloudflare checks use stronger fingerprinting and behavior signals. When you see repeated 403 responses, it often means Cloudscraper is outdated for that site, not that your Python code is broken.
How do I fix "Cloudscraper cannot solve the challenge" errors?
First, increase delays and verify your browser and platform settings are consistent. Make sure Node.js is installed and accessible. If errors persist, the site likely upgraded its Cloudflare protection. At that point, further tweaking usually wastes time, and switching tools is the practical move.
When should I use ScrapingBee instead of Cloudscraper?
Use ScrapingBee when Cloudscraper hits endless 403s, breaks after Cloudflare updates, or requires constant maintenance. ScrapingBee handles Cloudflare for you and returns clean HTML, letting you reuse the same parsing code without fighting anti-bot changes.


