Scrapling: Adaptive Python web scraping library that handles website structure changes

11 February 2026 | 37 min read

Scrapling is blowing up right now with nearly 9k stars on GitHub, and for good reason: anyone who's done Python web scraping knows the pain of a site changing one tiny thing and breaking your whole setup. A div moves, an attribute disappears, the markup shuffles a bit — boom, your selectors die, your pipeline stalls, and you're debugging instead of shipping.

Most classic tools still fall into this trap. They work fine until the layout shifts or an anti-bot wall wakes up, and suddenly you're playing whack-a-mole with CSS paths, headless browser quirks, and Cloudflare mood swings. Scrapling tries to stop that mess. It's an adaptive web scraping library that keeps track of elements even when the structure changes, so your scrapers keep running instead of collapsing. Plus, it brings stealth fetching, strong performance, and an API that feels familiar if you've used BeautifulSoup, Selectolax, Selenium, or any of the usual suspects.

In this guide, we'll walk through how Scrapling's adaptive tracking works, the features that make it stand out, how fast it really is, where it fits in compared to other tools, and a few working examples so you can start hacking on it right away.

TL;DR

Scrapling is a Python scraping library for devs who are tired of scrapers exploding every time a site sneezes. Key features:

  • Familiar selectors: CSS / XPath / find-style queries
  • Fast parsing: lightweight selector engine, good for batches
  • Adaptive element tracking: save an element fingerprint once → later, if the selector breaks, Scrapling can often relocate the closest match by similarity

Here's a tiny demo that shows the core idea (save → break → recover), plus a couple convenience helpers:

# Install Scrapling by running these two commands:
# pip install "scrapling[fetchers]"
# scrapling install

from scrapling import Selector
from scrapling.fetchers import Fetcher

URL = "https://quotes.toscrape.com/"


def main() -> None:
    # Turn on adaptive mode so auto_save/adaptive can use the fingerprint store.
    page = Fetcher.get(URL, selector_config={"adaptive": True})

    # 1) Normal scraping: grab data with familiar CSS selectors
    author_el = page.css_first("small.author", auto_save=True)  # <- saves fingerprint
    quote_el = page.css_first("span.text")

    author = (author_el.text or "").strip() if author_el else ""
    quote = (quote_el.text or "").strip() if quote_el else ""
    print("Run #1:", author, "|", quote[:60], "...")

    # 2) Convenience: jump to the quote "card" and find the author page link
    quote_box = author_el.find_ancestor(lambda a: a.has_class("quote")) if author_el else None
    about = quote_box.css_first('a[href^="/author/"]::attr(href)') if quote_box else None
    print("About link:", about or "N/A")

    # 3) Simulate a "site changed" scenario: selector breaks (class rename)
    html2 = page.body.decode(page.encoding or "utf-8", errors="replace").replace(
        'class="author"', 'class="writer"'
    )
    page2 = Selector(html2, adaptive=True, url=URL)

    # Plain selector fails...
    print("Run #2 plain hit:", bool(page2.css_first("small.author")))

    # ...but adaptive can often recover using the saved fingerprint
    recovered = page2.css_first("small.author", adaptive=True)
    print("Run #2 adaptive hit:", bool(recovered), "| author:", (recovered.text or "").strip() if recovered else "N/A")


if __name__ == "__main__":
    main()

If you remember one thing: Scrapling lets you write normal selectors, then gives you an adaptive fallback when the page layout shifts. It also ships with fetchers (fast HTTP + stealth options), sessions (cookies/state), and handy DOM navigation helpers so you don't end up reinventing half a scraping framework.

The web scraping maintenance problem

Before getting into what Scrapling brings to the table, it's worth calling out the real headache here: Python web scraping breaks mostly because the web won't sit still. Sites get redesigned, classes vanish, elements shuffle around, and suddenly the selector that worked yesterday is giving you nothing today.

Why traditional scrapers break

Most Python scrapers depend on CSS selectors or XPath. That works right up until the site ships a small layout tweak. A div shifts, a class gets renamed, some dynamic content loads differently, and your whole pipeline stalls. Now you're digging through HTML, trying to figure out why your queue stopped moving.

The fix is always the same loop: inspect the page, patch the selectors, test, redeploy, and pray the frontend doesn't change again tomorrow. It's repetitive, it's slow, and it pulls you away from actually building your product. Scrapers shouldn't collapse because a site shuffled its markup, but that's exactly where the usual tools leave you.

The current tooling landscape

If you've been doing Python web scraping for a while, you've probably cycled through the usual stack. Each tool covers part of the job, but none of them help when the site's structure shifts under your feet.

  • BeautifulSoup
    • Simple, friendly, great for quick parsing
    • Falls behind on performance with large documents
    • No stealth or anti-bot handling
    • Struggles with JavaScript-heavy pages
  • Scrapy
    • Solid framework for structured, long-running crawlers
    • Requires more setup and a deeper learning curve
    • Overkill for small scripts or prototypes
  • Selenium / Playwright
    • Full browser automation, so they render whatever the page does
    • More verbose for basic extraction
    • Heavier on memory and compute
    • Good choice when you need full JS execution, but not exactly lightweight

The shared limitation across all of them is pretty straightforward: none of them react when the website changes, and none offer automatic element relocation out of the box. Once the HTML shifts, you're back to updating selectors by hand. Scrapling's main goal is to reduce that maintenance loop without forcing you to learn a whole new scraping mindset.

Enter adaptive scraping

This is where Scrapling steps away from the usual Python web scraping playbook and actually tackles the "website changed again" problem head-on. Instead of treating selectors as brittle strings, it can store a lightweight fingerprint of an element (its attributes/text + surrounding context) and later use similarity matching to find the closest match if the original selector stops working.

The idea stays simple: your scraper should still find the right thing even if the surrounding markup gets moved around. Under the hood, Scrapling runs adaptive scraping in two phases. Let's cover those without diving too deep into all the dirty details.

Save phase

When you select an element with auto_save=True (and adaptive is enabled), Scrapling records that element's unique properties and stores them in its configured storage (SQLite DB by default). The stored fingerprint is keyed by:

  • domain (taken from the page URL; or default if you didn't provide one, or adaptive_domain if you set it), and
  • an identifier (for css/xpath methods this defaults to the selector string, unless you pass identifier= yourself).

The "unique properties" include:

  • the element's tag name, text, attributes (names+values), siblings (tag names only), and path (tag names only), plus
  • the parent's tag name, attributes, and text.

Saves aren't cumulative: saving again for the same domain + identifier overwrites the previous fingerprint.
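
Here's a small sketch of controlling that key yourself. The identifier= keyword on css/xpath methods is mentioned above; passing the same identifier again on the later adaptive lookup is an assumption about how the lookup is keyed, so verify it against your installed version.

from scrapling.fetchers import Fetcher

page = Fetcher.get("https://quotes.toscrape.com/", selector_config={"adaptive": True})

# Save the fingerprint under a stable, human-readable key instead of the raw selector string
author = page.css_first("small.author", auto_save=True, identifier="quote-author")

# Later run (after a redesign): look the fingerprint up under the same key.
# Assumption: identifier= is accepted on the adaptive lookup as well.
recovered = page.css_first("small.author", adaptive=True, identifier="quote-author")
print(bool(recovered))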

Match phase

On a later run (after the site changes), you re-run the selector and, if it no longer matches, you call the same selector with adaptive=True.

Scrapling loads the stored fingerprint for that domain + identifier, then compares page elements against it and assigns each a similarity score (the matching is fuzzy; even details like attribute order can factor in). The element(s) with the highest score are returned, letting the scraper "recover" from minor structural/layout changes without you rewriting the selector.

The payoff is straightforward: way fewer scrapers dying after tiny layout tweaks, and way less time spent diffing HTML just to figure out why a job stopped. The README nails the sentiment with a line every scraper dev has felt: "Stop fighting anti-bot systems. Stop rewriting selectors after every website update."

We'll see examples demonstrating this behavior below.

What is Scrapling

Overview

Scrapling is an open-source Python web scraping library (BSD-3-Clause) built by people who were clearly done babysitting broken selectors. The whole idea is "scraping shouldn't be this painful," and honestly, that's the experience you get once you start using it.

It's lightweight, modern, and focused on the problems that actually slow scraping work down: fragile selectors, anti-bot roadblocks, and the constant maintenance loop every scraper eventually falls into. If you've used BeautifulSoup or Scrapy, the API feels instantly familiar, but Scrapling adds adaptive parsing, several fetcher types, and stealth capabilities without turning into a giant framework or a browser-automation monster.

Disclaimer: Scrapling is still a pretty young project and a bit rough around the edges. While testing for this post, I ran into a couple small bugs, and some parts of the docs/features can drift as the library evolves (so a few details may be outdated depending on the version you install). That said, the core idea is genuinely solid, and the project feels promising if you want a modern scraping toolkit with adaptive selectors.

Core architecture: three fetcher types

Scrapling ships with three fetchers, each aimed at a different level of difficulty. They all share the same API, so swapping them out is usually a one-line change.

  • Fetcher — fast HTTP requests
    • When to use: pages that don't require JavaScript and/or advanced stealth tricks
    • What it offers: high-performance HTTP via curl_cffi, TLS fingerprinting, HTTP/3, custom headers, proxy support
    • Stealth level: moderate. Looks closer to a real browser than a basic requests script
    • Speed: fastest option since nothing browser-related is running
  • StealthyFetcher — stronger anti-bot resistance
    • When to use: sites that throw blocks at simple HTTP clients or use lighter anti-bot filters
    • What it offers: a Playwright-based, browser fetcher with extra stealth hardening (anti-detection patches + more realistic browser behavior)
    • Stealth level: high. Built to look less like automation, uses Patchright as an engine
    • Speed: medium. Slower than HTTP Fetcher, usually faster than "full custom automation scripts" (but still in the browser-cost tier)
  • DynamicFetcher — full browser execution
    • When to use: JavaScript-heavy pages or anything that needs real rendering and interaction
    • What it offers: Playwright-powered browser control and full DOM after JS execution
    • Stealth level: depends on your Playwright configuration
    • Speed: slowest. You're running an actual browser

The general workflow is simple: start with Fetcher for speed, switch to StealthyFetcher if the site starts blocking you, and use DynamicFetcher only when proper JavaScript execution is unavoidable.
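
To make the "one-line change" point concrete, here's a minimal sketch. It only uses calls that appear elsewhere in this post (Fetcher.get, StealthyFetcher.fetch, DynamicFetcher.fetch); defaults and extra options vary by version.

from scrapling.fetchers import DynamicFetcher, Fetcher, StealthyFetcher

URL = "https://quotes.toscrape.com/"

# 1) Start with the fast HTTP fetcher
page = Fetcher.get(URL)

# 2) If the site starts blocking you, swap the fetch line:
# page = StealthyFetcher.fetch(URL)

# 3) If the content only appears after JavaScript runs:
# page = DynamicFetcher.fetch(URL)

# The parsing code doesn't care which fetcher produced `page`
print(page.css_first(".quote .text::text"))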

Why it's different

Scrapling isn't just "another parser." It's built around the stuff that makes real-world scraping painful, and it tries to reduce that pain instead of piling on more tools you have to babysit.

  • Adaptive element tracking as its core idea
    Scrapling's similarity-based matching lets a selector survive small layout shifts by relocating the element when the original path fails. It's not magic, but it cuts down how often you're rewriting selectors after every redesign.
  • Reliable behavior, not a weekend experiment
    The project has solid test coverage and an active community behind it. The adaptive parsing and fetcher system are stable parts of the library, not experimental features bolted on as an afterthought.
  • Faster than the older generation of tools
    Scrapling's parsing layer and fetchers are built on modern components, which makes them noticeably quicker than many classic HTML parsing stacks. If you're scraping at scale or running bigger pipelines, the speed bump actually matters.

Quick demo

Here's a small example to show what Scrapling feels like in day-to-day Python. Nothing fancy, just fetching a page and pulling out a few fields.

from scrapling.fetchers import Fetcher

url = "https://quotes.toscrape.com/"

# Fetch the HTML using the lightweight HTTP fetcher
page = Fetcher.get(url)

quotes = []

# Iterate over each quote block on the page
for block in page.css(".quote"):
    # Extract text and author using familiar CSS selectors
    text = block.css_first(".text::text")
    author = block.css_first(".author::text")
    quotes.append({"text": text, "author": author})

print(quotes)

Save this as main.py, then install Scrapling and set up its fetchers:

pip install "scrapling[fetchers]"
scrapling install

Now run it:

python main.py

You'll get output along these lines:

[2026-02-09 15:39:49] INFO: Fetched (200) <GET https://quotes.toscrape.com/>
[{'text': '"The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."', 'author': 'Albert Einstein'}, {'text': '"It is our choices, Harry, that show what we truly are, far more than our abilities."', 'author': 'J.K. Rowling'}]

Key features deep dive

Scrapling keeps the API friendly, but under the hood it comes with tools that handle the parts of web scraping that usually turn into headaches: speed, session handling, anti-bot bypass quirks, JavaScript rendering, and keeping state across requests.

Advanced websites fetching with session support

Scrapling doesn't lock you into a single fetcher. Instead, you pick the tool that matches how stubborn the target site is.

Fast and stealthy HTTP requests

The basic Fetcher looks simple, but it's built on top of curl_cffi, so you're getting more than a plain HTTP client:

  • browser-like TLS fingerprints
  • realistic default headers
  • HTTP/3 support
  • full proxy and timeout control
  • low-overhead, high-speed requests

For static pages or anything that doesn't rely on JavaScript, this is usually the fastest and cleanest option.
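
Here's a rough sketch of what that tuning looks like in practice. The keyword names below (headers, proxy, timeout) are assumptions based on the feature list above rather than verified signatures, so double-check them against the docs for your version.

from scrapling.fetchers import Fetcher

page = Fetcher.get(
    "https://quotes.toscrape.com/",
    headers={"Accept-Language": "en-US"},  # custom headers on top of the realistic defaults
    timeout=15,                            # don't let a slow host hang the pipeline
    # proxy="http://user:pass@host:port",  # route through a proxy if you need one
)
print(page.css_first("title::text"))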

Dynamic loading with full browser automation

When the page depends on JavaScript, Scrapling switches gears with DynamicFetcher, which uses Playwright under the hood. With it you can:

  • run bundled Chromium
  • run a local Chrome installation
  • connect to remote browsers via CDP
  • wait for selectors, network states, or custom conditions
  • block images, fonts, or other unnecessary resources

The output still lands in a Scrapling Response, so you can parse it the same way you'd parse an HTTP fetch. The fetcher choice doesn't force you into a completely different API.
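
A minimal sketch of that flow, reusing the parameters shown in the full example later in this post (headless, wait_selector). Resource blocking and CDP options are version-dependent, so check the docs before relying on them.

from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch(
    "https://www.scrapingcourse.com/javascript-rendering",
    headless=True,
    wait_selector=".product-item",  # wait until JS has actually rendered the cards
)

# Same parsing API as the HTTP fetcher
print(len(page.css(".product-item")), "products rendered")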

Anti-bot bypass with stealthy browser fetcher

StealthyFetcher is the option you reach for when a site blocks plain HTTP clients or requires real browser behavior. It's a Playwright-based fetcher with extra stealth hardening, so you get a more "real browser session" feel without writing and maintaining a full Playwright script for every target.

It includes:

  • real browser execution via Playwright (you parse the resulting DOM with Scrapling's API)
  • stealth hardening (anti-detection patches + more realistic browser signals/behavior)
  • better resilience against common bot checks compared to a bare HTTP client
  • optional support for some challenge/interstitial flows (varies by site and changes over time)

If a site blocks your scraper before content loads (bot checks, interstitials, strict browser verification), StealthyFetcher is often the next step after Fetcher, but it's not a guaranteed unlock button.

Persistent session management

Each fetcher has a matching session class:

  • FetcherSession
  • StealthySession
  • DynamicSession

Sessions keep cookies, login state, and browser context around between requests. If the site requires authentication, stores preferences in cookies, or relies on multi-step navigation, using a session keeps everything consistent.
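
As a quick sketch, here's what session reuse looks like with the HTTP session class. This assumes FetcherSession also works as a regular (sync) context manager with a .get() method, mirroring the async usage shown later in this post.

from scrapling.fetchers import FetcherSession

with FetcherSession(impersonate="chrome") as session:
    # Cookies and connection state from the first request carry over to the second
    first = session.get("https://quotes.toscrape.com/")
    second = session.get("https://quotes.toscrape.com/page/2/")
    print(len(first.css(".quote")), len(second.css(".quote")))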

Full async support

Scrapling supports async fetching and sessions, so you can run many requests concurrently with asyncio:

  • AsyncFetcher (HTTP)
  • StealthyFetcher.async_fetch() / DynamicFetcher.async_fetch() (browser fetchers)
  • async sessions: FetcherSession, StealthySession, DynamicSession via async with
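
A tiny concurrency sketch using the async HTTP fetcher from the list above. Assumption: AsyncFetcher.get mirrors Fetcher.get but is awaitable; adjust if your version exposes it differently.

import asyncio

from scrapling.fetchers import AsyncFetcher

async def main() -> None:
    urls = [f"https://quotes.toscrape.com/page/{i}/" for i in range(1, 4)]
    pages = await asyncio.gather(*(AsyncFetcher.get(u) for u in urls))
    for page in pages:
        print(len(page.css(".quote")), "quotes from", getattr(page, "url", "?"))

if __name__ == "__main__":
    asyncio.run(main())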

Example: StealthyFetcher handling Cloudflare bypass

Here's a small example showing how StealthyFetcher can handle Cloudflare bypass without complex setup. This comes from the official usage guide, and the whole point is that you don't need to juggle headers, delays, or custom cookies; the stealth web scraping engine handles the rough parts.

from scrapling.fetchers import StealthyFetcher

# Page protected by Cloudflare
url = "https://nopecha.com/demo"

# Fetch the page using the stealth browser engine
page = StealthyFetcher.fetch(url)

# Pull a simple text element to confirm the page loaded normally
h3_text = page.css_first("h3::text")
print("H3 text:", h3_text)
# => CAPTCHA Demo

# Extract some links to verify the DOM is fully accessible
links = page.css("#padded_content a")
print("Links found:", len(links))
# => 11

This is the "one-shot" version: good when you just need to load a page and parse it.

As usual, I have to throw in the boring-but-important disclaimer: this isn't a guaranteed "Cloudflare off" switch. What works today might fail tomorrow, and results can vary a lot depending on the site, the specific challenge, your IP/proxy quality, and whatever new anti-bot tricks they roll out. Also: scrape like an adult. Respect the site's ToS and local laws. If a site offers an API, use it. Keep your request rate reasonable and don't be the guy DDOSing a website "by accident."

If you want persistent state (cookies, auth, multi-step navigation), sessions are the cleaner path. The session keeps the stealth browser open until you're done with it.

from scrapling.fetchers import StealthySession

# Keep the stealth browser alive across multiple requests
with StealthySession(headless=True, solve_cloudflare=True) as session:
    # Fetch a Cloudflare-protected demo page

    # Disclaimer on google_search: if enabled, the request/session simulates
    # arriving from Google Search (referrer context). Leave it disabled by default;
    # only enable it when you explicitly need to replicate search-traffic behavior
    # and you're permitted to do so.
    page = session.fetch("https://nopecha.com/demo/cloudflare", google_search=False)

    # Now parse as usual — session preserves cookies and context
    links = page.css("#padded_content a")
    print("Links found:", len(links))

So, the StealthyFetcher:

  • Runs a browser-based stealth fetch and tries to look more like a normal user session than a bare HTTP client (fingerprinting/cookies/session behavior).
  • It's good for sites that block simple clients or do lightweight bot checks, where full custom Playwright scripting feels like overkill.
  • Optional Cloudflare/Turnstile stuff can help with some challenge flows, but it's not universal and it changes over time.

However, keep in mind that:

  • It won't guarantee access. Some sites will still block you, rate-limit you, or require extra steps.
  • There are tradeoffs: it costs more CPU/RAM/time than plain HTTP fetching, because you're doing real browser work.
  • Reality check: anti-bot is an arms race, so expect occasional breakage and plan monitoring/fallbacks.
  • Legal/ToS: use it only where you're allowed to scrape.

Adaptive web scraping and AI integration

This is the second big area where Scrapling separates itself from the usual Python web scraping tools. It's not just about loading pages, it's about keeping your extraction stable when the layout changes and making that workflow play well with AI when you need it.

Smart element tracking

Scrapling's adaptive mode is built around "smart element tracking." Instead of assuming your selector will always work, it stores a lightweight fingerprint of the element and uses similarity scoring to relocate it if the HTML shifts. So you're not relying on something brittle like .some-class > div:nth-child(3) staying the same forever; Scrapling uses structure and context to keep things alive through minor redesigns.

Example: Adaptive element detection

Alright, let's actually prove this adaptive stuff works instead of just talking about it. Below is a tiny script with two hard-coded HTML versions (note that it's perfectly fine to feed raw HTML directly to the parser):

  • HTML_V1 is the "old" page where the selector works (#p1 exists). We grab the element once with auto_save=True, which stores its fingerprint.
  • HTML_V2 is the "new" page where the site has "changed" (the id is gone and wrappers/attributes moved). The plain selector fails... but with adaptive=True, Scrapling can relocate the same element by similarity and still pull the right data.

from scrapling import Selector

HTML_V1 = """
<div class="products">
  <article class="product" id="p1">
    <div class="product-info">
      <h3>Product 1</h3>
      <p class="description">Description 1</p>
    </div>
  </article>
</div>
"""

# Slightly changed: id is gone, wrappers changed, attributes moved
HTML_V2 = """
<div class="new-container">
  <div class="product-wrapper">
    <section class="products">
      <article class="product new-class" data-id="p1">
        <div class="product-info">
          <h3>Product 1</h3>
          <p class="new-description">Description 1</p>
        </div>
      </article>
    </section>
  </div>
</div>
"""

DOMAIN = "demo.local"  # important: same url => same adaptive namespace

# Run 1: selector exists, and we auto-save its fingerprint
page1 = Selector(HTML_V1, adaptive=True, url=DOMAIN)
el1 = page1.css_first("#p1", auto_save=True)
print("V1 hit:", bool(el1), "| h3:", el1.css_first("h3::text") if el1 else None)

# Run 2: selector fails normally, but adaptive=True relocates it
page2 = Selector(HTML_V2, adaptive=True, url=DOMAIN)

plain = page2.css_first("#p1")
print("V2 plain hit:", bool(plain))

adapted = page2.css_first("#p1", adaptive=True)
print("V2 adaptive hit:", bool(adapted), "| h3:", adapted.css_first("h3::text") if adapted else None)

If everything's working, you'll see:

V1 hit: True | h3: Product 1
V2 plain hit: False
V2 adaptive hit: True | h3: Product 1

Note: Adaptive mode saves element fingerprints to local storage (SQLite by default). Don't commit that DB to git, and avoid using auto_save on pages that might include personally identifiable information (emails, phone numbers, names+addresses, account IDs, session/auth tokens).

Flexible selection options

You still get the familiar ways of querying a page, and none of them are locked behind special syntax:

  • CSS selectors
  • XPath
  • filter-based queries
  • text search
  • regex search

If you've worked with BeautifulSoup, Scrapy, or Parsel, the experience feels natural, just more adaptable.

Finding similar elements

Once you've located a reliable "anchor" element, Scrapling can automatically find others that look similar. This helps on pages where repeated items aren't perfectly uniform or when you want to scale up from one match to a full list without writing extra logic.
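
In code, that "one anchor, then the rest" idea is short. A minimal sketch using find_similar(), which also shows up in the larger example below:

from scrapling.fetchers import Fetcher

page = Fetcher.get("https://quotes.toscrape.com/")

anchor = page.css_first(".author")      # one example of a repeated element
if anchor is not None:
    others = anchor.find_similar()      # elements that structurally resemble it
    print("Similar elements found:", len(others))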

MCP server for AI-assisted scraping

Scrapling also includes an MCP server designed for AI tooling. The idea is simple: let Scrapling extract the meaningful parts of the page first, then feed that smaller, cleaner result to an LLM. This avoids sending full HTML documents to the model, cuts down token usage, and makes the AI layer faster and more predictable.

Example: Advanced parsing and navigation

This is a small "kitchen sink" demo adapted from the project examples. The point is to show how Scrapling lets you mix selection styles (CSS, XPath, find-style queries), use text search, and then lean on its relationship/similarity helpers. It also includes a mini adaptive demo with two runs:

  • Run #1 fetches a real page, selects an element, and saves its fingerprint (auto_save=True).
  • Run #2 simulates a "site changed" scenario by tweaking the HTML (we rename a class so the original selector breaks). Then we show how adaptive=True can still relocate the element using the saved fingerprint.

This script was tested with Scrapling v0.3.14. Scrapling moves fast, so names/defaults can shift between releases. Treat this as a workflow demo you can learn from, not sacred production code.

from collections.abc import Sequence

from scrapling import Selector
from scrapling.fetchers import Fetcher

URL = "https://quotes.toscrape.com/"


def response_html(page: Selector) -> str:
    """Get decoded HTML out of a Scrapling Response/Selector."""
    body = getattr(page, "body", b"")
    encoding = getattr(page, "encoding", None) or "utf-8"
    if isinstance(body, (bytes, bytearray)):
        return bytes(body).decode(encoding, errors="replace")
    return str(body)


def main() -> None:
    # Enable adaptive parsing for THIS response.
    # Without adaptive=True, `auto_save` won't persist fingerprints
    # and `adaptive=True` lookups won't have anything to match against.
    page: Selector = Fetcher.get(URL, selector_config={"adaptive": True})

    # --- Multiple selection methods ---
    # Same page, three ways to query it. Pick whatever reads best for you.
    quotes_css: Sequence[Selector] = page.css(".quote")
    quotes_xpath: Sequence[Selector] = page.xpath('//div[@class="quote"]')
    quotes_find: Sequence[Selector] = page.find_all("div", class_="quote")

    print("CSS .quote:", len(quotes_css))
    print("XPath .quote:", len(quotes_xpath))
    print("find_all:", len(quotes_find))

    # --- Text search (defensive) ---
    # When markup is messy or classes are unstable, text search can be a nice fallback.
    found: Selector | None = page.find_by_text("life", first_match=True)
    print("find_by_text('life') found:", found is not None)

    snippet = (found.text or "").strip().replace("\n", " ")[:80] if found else ""
    print("Text match snippet:", snippet if snippet else "N/A")

    # --- Adaptive selector workflow (Run #1: save fingerprint) ---
    # We'll intentionally break this selector in "Run #2" by renaming class="author" → class="writer".
    selector = "small.author"

    # `auto_save=True` stores a fingerprint of the matched element (in local storage, SQLite by default),
    # so later we can say: "hey, if this selector stops working, try to relocate the same element".
    author_el: Selector | None = page.css_first(selector, auto_save=True)
    if author_el is None:
        raise RuntimeError("No author elements found; the page structure may have changed.")

    author_text = (author_el.text or "").strip()
    print("\nRun #1 author hit:", bool(author_el), "| author:", author_text)

    # --- Similarity + relationships ---
    # `find_similar()` is a quick way to say:
    # "this element looks like one item in a list — give me the other items that look like it."
    similar: Sequence[Selector] = author_el.find_similar()

    # Heads-up: `below_elements` means "descendants inside this node", not "visually below on the page".
    # `small.author` has basically no child elements (it's mostly text),
    # so we climb up to a bigger container that actually contains other nodes.
    #
    # On quotes.toscrape, the structure is roughly:
    # div.quote > span > small.author
    # We go up twice to land on div.quote's inner span container (or similar),
    # which does have real descendants.
    author_parent = author_el.parent.parent
    below: Sequence[Selector] = author_parent.below_elements

    print("Similar elements:", len(similar))
    print("Elements below saved author's parent:", len(below))

    # --- Find the quote container ("give me the whole card") ---
    # This is a common pattern:
    # 1) find a reliable sub-element (author),
    # 2) jump to the container (quote card),
    # 3) parse everything you need from inside that container.
    quote = author_el.find_ancestor(lambda a: a.has_class("quote"))

    # Now we can query within the quote block without re-searching the whole page.
    about_a = quote.css_first('a[href^="/author/"]') if quote else None

    # It's even possible to generate a full selector to a specific element.
    # Super handy for debugging / quick prototyping / "what selector would target this thing?" moments.
    # (Still: generated selectors can be brittle if the site changes a lot — use with common sense.)
    if about_a:
        print("CSS:", about_a.generate_css_selector)
        print("XPath:", about_a.generate_xpath_selector)

    print("\nSample similar elements (first 3):")
    for i, elem in enumerate(similar[:3], start=1):
        txt = (elem.text or "").strip()
        print(f"  {i}.", txt if txt else "(no text)")

    # --- Run #2: simulate a layout change that breaks the selector ---
    # In real life, "Run #2" is tomorrow/next deploy when the website changed.
    # For a self-contained demo, we just mutate the HTML to force the selector to fail.
    html_v1 = response_html(page)
    html_v2 = html_v1.replace('class="author"', 'class="writer"')

    # Important bit: same URL/domain namespace → Scrapling looks up the same saved fingerprint store.
    page2 = Selector(html_v2, adaptive=True, url=URL)

    # Plain selector: should fail because we renamed class="author".
    plain: Selector | None = page2.css_first(selector)
    print("\nRun #2 plain hit:", bool(plain))  # expected: False

    # Adaptive selector: should succeed by relocating via similarity against the saved fingerprint.
    adapted: Selector | None = page2.css_first(selector, adaptive=True)
    adapted_text = (adapted.text or "").strip() if adapted else ""
    print("Run #2 adaptive hit:", bool(adapted), "| author:", adapted_text)


if __name__ == "__main__":
    main()

After running this code, you'll see something like:

CSS .quote: 10
XPath .quote: 10
find_all: 10
find_by_text('life') found: True
Text match snippet: life

Run #1 author hit: True | author: Albert Einstein
Similar elements: 9
Elements below saved author's parent: 10
CSS: body > div > div:nth-of-type(2) > div > div > span:nth-of-type(2) > a
XPath: //body/div/div[2]/div/div/span[2]/a

Sample similar elements (first 3):
  1. J.K. Rowling
  2. Albert Einstein
  3. Jane Austen

Run #2 plain hit: False
Run #2 adaptive hit: True | author: Albert Einstein

What this demonstrates in practice:

  • You can mix CSS, XPath, and find-style queries on the same page and just use whatever reads best for that specific structure (quick CSS for lists, XPath when classes are annoying, find_all when you want tag+attrs style).
  • Defensive text search (find_by_text) is a nice fallback when markup is messy or class names are unstable. It's not your main strategy, but it's a solid "plan B" for locating a region of interest.
  • The adaptive flow is real and easy to reason about:
    • Run #1 selects small.author and saves its fingerprint with auto_save=True.
    • Run #2 simulates a "site changed" event by renaming the class in the HTML, so the plain selector fails.
    • Calling the same selector with adaptive=True makes Scrapling look up the saved fingerprint and relocate the closest match via similarity scoring.
  • Relationship helpers are useful, but they have specific meanings:
    • below_elements returns descendants inside the current node (not "stuff visually below"), so the demo climbs up to a parent container that actually contains other elements before using it.
    • find_ancestor(...) is the clean move when you've found a reliable inner element (like the author) and want the whole "card" (.quote) so you can parse everything inside it.
  • find_similar() shows the "one item → get the rest" workflow: once you've got one element that represents a repeated block, you can ask Scrapling for the other blocks that look like it.
  • Bonus: you can generate full selectors (generate_css_selector, generate_xpath_selector) for a matched element. Great for debugging and quick prototyping (just don't treat generated selectors as indestructible in production).

High-performance and battle-tested architecture

Scrapling isn't trying to be a full-blown framework. Its design is intentionally small, fast, and predictable — something you can drop into a scraper without dragging half the ecosystem with it. Most of the speed comes from the internals it's built on: efficient data structures, lazy evaluation where it makes sense, and a parsing layer that's noticeably quicker than the older Python web scraping stack. Memory usage stays low even on larger pages.

On the reliability side, Scrapling is no longer "experimental." It has solid test coverage, full type hints, and a growing user base that's been running it daily for real scraping work. It's still a young project, but it's stable enough that you're not dealing with code that falls apart the moment you scrape something non-trivial.

Developer and web-scraper friendly experience

Scrapling focuses on making day-to-day scraping smoother without dragging you into a big framework. A lot of the "developer comfort" features are things you'd otherwise have to build yourself: selector testing, navigation helpers, and tools for working with messy HTML.

  • Interactive scraping shell
    You can launch an IPython shell with Scrapling preloaded. It's handy for trying selectors, inspecting elements, converting curl commands, or opening fetched pages in your browser while you iterate.
  • Terminal-friendly usage
    Need to grab a page and check the output? You can fetch URLs directly from the terminal without writing a script first.
  • Rich navigation API
    The DOM model gives you clean access to parents, children, siblings, and previous/next elements, plus tree-based traversal when you need finer control.
  • Built-in text helpers
    Regex search, text cleanup, whitespace normalization, and other string utilities are included, so you don't have to chain multiple libraries for basic extraction tasks.
  • Selector generation tools
    You can generate CSS or XPath selectors for any element. Useful when you're mapping out pages and want stable entry points for a scraper.
  • Familiar API shape
    If you've used BeautifulSoup, Parsel, or Scrapy selectors, the syntax will feel natural. No big learning curve.
  • Full type hints
    Everything is typed end-to-end, which keeps IDEs happy and makes browsing the API much easier.
  • Ready-to-use Docker image
    Releases include a Docker image with the required browsers already installed, so you can skip the usual Playwright setup and dependency chasing.

Performance benchmarks

Speed comparison

Scrapling includes a small benchmark suite in the repo that measures how long different parsers take to walk through a large synthetic HTML document (about 5,000 nested elements). It's not an industry standard benchmark, but it's a decent sanity check for relative speed:

  • Scrapling (as of v0.3.14): 1.99 ms
  • Parsel/Scrapy: 2.01 ms (roughly the same)
  • PyQuery: 22.93 ms
  • Selectolax: 80.57 ms
  • BeautifulSoup with lxml: 1,541 ms

What this means in practice

If you're parsing a lot of pages, small per-page wins add up quickly. Using their synthetic test as a rough example, parsing 10,000 pages would look like:

  • BeautifulSoup: ~4 hours (1,541 ms × 10,000 ≈ 15,400 seconds)
  • Scrapling: ~20 seconds (1.99 ms × 10,000 ≈ 20 seconds)

Real-world performance will obviously bounce around depending on the HTML you're dealing with, your machine, and whether you're doing plain HTTP or dragging a browser into the mix. Also, these numbers come from Scrapling's own benchmark, so treat them as directional: handy for relative comparison, not a guarantee you'll see the same speed on real sites. Big grain of salt required.

That said, the general idea still stands: Scrapling's parser overhead is low enough that large batches usually don't feel like an all-day grind just to chew through HTML.

Adaptive feature performance

The adaptive matching logic is also tuned to stay lightweight. A benchmark for "find similar elements" (again from the repo):

  • Scrapling: 2.46 ms
  • AutoScraper: 13.3 ms

The similarity search doesn't suddenly slow down your pipeline; it stays in the same performance range as the rest of the parser.

Full benchmark details and methodology are available in the Scrapling repo if you want to review or replicate the tests.

When to use Scrapling

Ideal use cases

Scrapling is a strong fit when you want:

  • A smooth learning curve — the API feels familiar if you've used BeautifulSoup, Parsel, or Scrapy.
  • Fast prototyping — pick a fetcher, write a few selectors, and you're up and running.
  • Resilience to layout changes — adaptive element tracking reduces the "fix selectors after every redesign" loop.
  • Light anti-bot handling — StealthyFetcher helps with common protection layers like Cloudflare challenges.
  • High parsing performance — useful when you're processing large batches.
  • Full control over your stack — everything runs locally, open-source, and customizable.

Consider alternatives when...

Scrapling is great if you want to run everything yourself, but a managed service like ScrapingBee can be a better fit when:

  • You're scraping at scale — managing proxies, browsers, sessions, and queues becomes real work fast.
  • Your team doesn't want to run scraping infrastructure — keeping browser clusters healthy isn't everyone's favorite chore.
  • You need quick deployment — calling an API is faster than bootstrapping a scraper stack.
  • You rely heavily on proxy rotation — Scrapling doesn't provide proxies out of the box, so you'll need to handle that layer.
  • You need support or uptime guarantees — managed services can offer SLAs and dedicated help.

It's not about Scrapling being "better" or "worse"; it's about choosing between full DIY control and letting someone else handle the heavy operational parts.

Comparison with traditional libraries

Scrapling isn't meant to replace the whole Python web scraping ecosystem. Each tool has situations where it's the right choice. Here's how Scrapling fits in compared to the usual options.

vs. BeautifulSoup

  • Considerably faster (Scrapling's parser wins by a large margin in its benchmark tests)
  • Includes adaptive element tracking for handling layout changes
  • Offers stealthier fetching options out of the box
  • Keeps a similarly simple, approachable API shape

vs. Scrapy

  • Lighter and quicker to start with
  • A good fit for small and medium projects where a full crawling framework may be more than you need
  • Easier for fast iteration or one-off scripts

vs. Selenium / Playwright

  • More concise for most scraping tasks that don't require real browser interaction
  • Stealth web scraping features without spinning up a full browser
  • Better performance on non-JavaScript pages
  • StealthyFetcher fills the gap between plain HTTP and full browser automation

Each library solves a different part of the Python web scraping landscape. Scrapling covers the "fast, flexible, and lower maintenance" niche, especially when you want adaptive scraping without the overhead of a full automation stack.

Find the full Python web scraping tutorial in our blog.

Getting started with Scrapling

Installation

For basic HTTP scraping, the installation is simple:

pip install scrapling

Just make sure you're on Python 3.10+.

Also note that this command installs only parsers. If you want to use the browser-based fetchers (StealthyFetcher or DynamicFetcher), install the extra dependencies and let Scrapling set up the required browsers:

pip install "scrapling[fetchers]"
scrapling install

Your first scraper

Now let's write something real. If the page is JavaScript-rendered, basic HTTP fetching won't see the final content. In that case, jump straight to DynamicFetcher. It runs a real browser via Playwright, but you still parse the result with the same Scrapling selection API, so your Python web scraping code stays consistent.

Below is a "small but sane" scraper: it fetches a JS page, waits for product links to exist, extracts fields defensively, normalizes URLs, and writes JSON to disk.

import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from urllib.parse import urljoin, urlparse

from scrapling.fetchers import DynamicFetcher

URL = "https://www.scrapingcourse.com/javascript-rendering"
WAIT_SELECTOR = ".product-link:not([href=''])"


@dataclass(frozen=True, slots=True)
class Product:
    id: str          # stable-ish id derived from URL
    name: str
    price: str
    url: str
    image: str
    scraped_at: str


def utc_now_iso_z() -> str:
    """UTC timestamp in ISO format with Z suffix (seconds precision)."""
    # Produces e.g. "2026-02-09T14:18:14Z"
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def origin(url: str) -> str:
    """Return scheme+host origin with trailing slash."""
    p = urlparse(url)
    return f"{p.scheme}://{p.netloc}/"


def fetch_page(url: str, *, wait_selector: str, headless: bool = True):
    """
    Fetch a JS-rendered page using DynamicFetcher (Playwright under the hood).
    Returns a Scrapling Response-like object.
    """
    return DynamicFetcher.fetch(
        url,
        headless=headless,
        wait_selector=wait_selector,
    )


def stable_id_from_url(product_url: str) -> str:
    """
    Create a stable id from the product URL.
    (Slug-based; good enough for demos and dedupe.)
    """
    path = urlparse(product_url).path.rstrip("/")
    slug = path.split("/")[-1] if path else ""
    return slug or product_url


def extract_products(page) -> list[Product]:
    """
    Parse product cards from the page.
    Intentionally defensive: missing fields won't crash the run.
    """
    products: list[Product] = []
    scraped_at = utc_now_iso_z()

    # Best base: the final loaded URL (handles redirects/canonicals)
    base = origin(getattr(page, "url", URL))

    for item in page.css(".product-item"):
        name = (item.css_first(".product-name::text") or "").strip()
        price = (item.css_first(".product-price::text") or "").strip()
        link = (item.css_first(".product-link::attr(href)") or "").strip()
        img = (item.css_first(".product-image::attr(src)") or "").strip()

        # Skip broken cards instead of generating junk rows.
        if not name or not link:
            continue

        product_url = urljoin(base, link)
        image_url = urljoin(base, img) if img else ""

        products.append(
            Product(
                id=stable_id_from_url(product_url),
                name=name,
                price=price,
                url=product_url,
                image=image_url,
                scraped_at=scraped_at,
            )
        )

    return products


def save_products_json(products: list[Product], out_file: str) -> None:
    """Write results to disk as JSON."""
    payload = [asdict(p) for p in products]
    with open(out_file, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)


def main() -> None:
    page = fetch_page(URL, wait_selector=WAIT_SELECTOR, headless=True)
    products = extract_products(page)

    print(f"Scraped {len(products)} products")
    print("First product preview:", asdict(products[0]) if products else "N/A")

    out_file = "products.json"
    save_products_json(products, out_file)
    print(f"Saved to {out_file}")


if __name__ == "__main__":
    main()

What to notice:

  • wait_selector is doing real work here. Without it, you risk parsing an empty shell before JS finishes rendering.
  • All field extraction is guarded with or "" and .strip(), so one missing element won't crash the run.
  • urljoin() keeps URL normalization correct without hand-rolling string logic.

Here's the result:

[
  {
    "id": "chaz-kangeroo-hoodie",
    "name": "Chaz Kangeroo Hoodie",
    "price": "$52",
    "url": "https://scrapingcourse.com/ecommerce/product/chaz-kangeroo-hoodie",
    "image": "https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh01-gray_main.jpg",
    "scraped_at": "2026-02-09T15:39:42Z"
  },
  {
    "id": "teton-pullover-hoodie",
    "name": "Teton Pullover Hoodie",
    "price": "$70",
    "url": "https://scrapingcourse.com/ecommerce/product/teton-pullover-hoodie",
    "image": "https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh02-black_main.jpg",
    "scraped_at": "2026-02-09T15:39:42Z"
  }
]

Asyncio example

Here's an async version of a multi-page scraper. It uses FetcherSession in async mode so you keep cookies, headers, and connection state across requests while still running everything concurrently with asyncio. This is a good template for "parallel but friendly to the target site" scraping: batched requests, retries, timeouts, and defensive parsing.

import asyncio
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

from scrapling.fetchers import FetcherSession

BASE_URL = "https://quotes.toscrape.com/page/{}/"
MAX_PAGES = 12
BATCH_SIZE = 4
TIMEOUT_SECONDS = 15
RETRIES = 3


@dataclass(frozen=True, slots=True)
class Quote:
    text: str
    author: str
    tags: list[str]
    url: str


def utc_timestamp() -> str:
    """UTC timestamp string for filenames."""
    return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_page_urls(page_start: int, batch_size: int, max_pages: int) -> list[str]:
    """Build a bounded batch of page URLs (never exceeding max_pages)."""
    end = min(page_start + batch_size - 1, max_pages)
    return [BASE_URL.format(i) for i in range(page_start, end + 1)]


async def fetch_and_parse_quotes(session: FetcherSession, url: str) -> list[Quote]:
    """
    Fetch a single page and extract quotes.

    We keep this function small and pure-ish: one URL in, list of Quote out.
    Using a shared session keeps cookies + TLS state consistent across requests.
    """
    page = await session.get(url)

    results: list[Quote] = []
    for block in page.css(".quote"):
        text = (block.css_first(".text::text") or "").strip()
        author = (block.css_first(".author::text") or "").strip()
        tags = [(t.text or "").strip() for t in block.css(".tag")]

        # Skip empty entries so we don't write junk to JSON.
        if not text:
            continue

        results.append(Quote(text=text, author=author, tags=tags, url=url))

    print(f"[OK] {len(results)} quotes from {url}")
    return results


async def scrape_quotes(
    *,
    max_pages: int = MAX_PAGES,
    batch_size: int = BATCH_SIZE,
    out_file_prefix: str = "quotes",
) -> str:
    """
    Scrape multiple pages concurrently in bounded batches.

    Returns the output filename.
    """
    all_quotes: list[Quote] = []
    page_num = 1

    # Async session keeps persistent state (cookies, headers, TLS) across fetches.
    async with FetcherSession(
        impersonate="chrome",  # optional: more browser-like request profile
        retries=RETRIES,
        timeout=TIMEOUT_SECONDS,
    ) as session:
        while page_num <= max_pages:
            urls = build_page_urls(page_num, batch_size, max_pages)
            print(f"\n[Batch] Pages {page_num}{page_num + len(urls) - 1}")

            tasks = [fetch_and_parse_quotes(session, url) for url in urls]
            results = await asyncio.gather(*tasks, return_exceptions=True)

            for result in results:
                if isinstance(result, Exception):
                    print("[ERR] Skipped one page:", result)
                    continue
                all_quotes.extend(result)

            page_num += batch_size

    outfile = f"{out_file_prefix}_{utc_timestamp()}.json"
    payload = [asdict(q) for q in all_quotes]

    with open(outfile, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)

    print(f"\n[Done] Scraped {len(all_quotes)} quotes")
    print(f"[Saved] {outfile}")
    return outfile


def main() -> None:
    asyncio.run(scrape_quotes())


if __name__ == "__main__":
    main()

Key points to note:

  • FetcherSession keeps TLS and cookies stable, so many pages load more consistently than with raw parallel requests.
  • Batching prevents hitting a site with dozens of concurrent requests at once.
  • Using async fetchers keeps the scraper fast without spawning threads or processes.
  • The parsing API is identical to the sync version, which keeps mental overhead low.

Also, some "good citizen defaults" (so you don't become the guy who melts servers):

  • Cap concurrency: keep parallel requests low (e.g. 2–8) and ramp up only if the site clearly tolerates it.
  • Add jitter: randomize delays between requests/batches so you don't look like a metronome.
  • Back off on pain signals: on 429/503/520–529, do exponential backoff + jitter (a minimal sketch follows this list); stop retrying after a few attempts.
  • Respect robots + ToS: if you're not allowed, don't scrape. If there's an API, prefer it.
  • Cache when possible: avoid re-downloading the same pages during dev runs; it's faster for you and nicer for them.
  • Use timeouts everywhere: connect/read timeouts + overall timeout; never let requests hang forever.
  • Log + monitor blocks: track status codes and block rates so you notice protection changes before your pipeline silently dies.
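
Here's a minimal, library-agnostic sketch of the backoff-with-jitter idea from the list above, wrapped around Fetcher.get. Reading a status code off the response is an assumption (adjust to however your Scrapling version exposes it); the retry logic itself is plain Python.

import random
import time

from scrapling.fetchers import Fetcher

RETRY_STATUSES = {429, 503} | set(range(520, 530))

def polite_get(url: str, max_attempts: int = 4):
    for attempt in range(max_attempts):
        page = Fetcher.get(url)
        status = getattr(page, "status", 200)  # assumption: the response exposes a status code
        if status not in RETRY_STATUSES:
            return page
        # Exponential backoff (1s, 2s, 4s, ...) plus jitter so retries don't sync up
        delay = (2 ** attempt) + random.uniform(0, 1)
        print(f"Got {status}, sleeping {delay:.1f}s before retry #{attempt + 1}")
        time.sleep(delay)
    raise RuntimeError(f"Giving up on {url} after {max_attempts} attempts")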

Next steps

If you want to dig deeper, these are the official places worth keeping open:

  • Docs
  • GitHub repo
  • "Basic Usage" in the README for a quick jumpstart
  • Interactive shell: scrapling shell (great for trying selectors live)

A simple learning path that covers most of Scrapling's strengths:

  1. Start with Fetcher on a couple static pages.
  2. Practice CSS selectors and chained queries.
  3. Move to StealthyFetcher when a site starts blocking simple HTTP requests.
  4. Try adaptive element tracking on a page that changes often or has multiple variants.
  5. Skim the docs for the more advanced pieces: sessions, async fetchers, dynamic fetching, selector helpers, CLI tooling.

Running the examples locally and tweaking them is the fastest way to get a feel for the API, especially with the interactive shell. Here's a tiny shell example to show how it works. First, install the shell package and run it:

pip install "scrapling[shell]"
scrapling shell

Now within the shell you can run:

r = fetch("https://quotes.toscrape.com/")
r.css_first(".quote .text::text")
# Out[2]: '"The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."'

r.css(".tag")
# Out[3]: [<data='<a class="tag" href="/tag/change/page/1/...' parent='<div class="tags"> Tags: <meta class="ke...'>, ... ]

You can poke at real pages, try selectors, inspect results, and iterate without writing a full script: perfect for figuring out structure before building a scraper.

Scrapling vs. managed solutions

The DIY vs. managed decision

Scrapling is a strong option when you want to own the whole scraping automation setup:

  • Full control — open-source, self-hosted, and fully customizable
  • Fits DevOps-heavy teams — you can run your own proxies, monitoring, and scaling logic
  • Good for learning and experimentation — you see how the entire pipeline works
  • Cost flexibility — no per-request billing; it's just your infrastructure costs

But once scraping becomes a production workload, most of the effort shifts away from parsing HTML and toward the operational parts:

  • sourcing and rotating proxies
  • keeping IPs clean and non-blocked
  • running and monitoring browser clusters
  • scaling systems as datasets grow
  • dealing with evolving anti-bot protections
  • choosing whether engineering time should go to scraper maintenance or product work

So the real decision is simple: do you want to run your own scraping automation infrastructure, or do you want a managed service to handle that layer for you?

When managed services make sense

Some teams don't want to operate the scraping machinery; they just want structured data delivered reliably. That's where managed APIs like ScrapingBee take a different approach.

Operational benefits:

  • no servers, browsers, or proxies to deploy
  • managed proxy rotation with a large, clean IP pool
  • browser automation handled for you (no Playwright installs, no headless Chrome headaches)
  • automatic scaling through a pay-per-request model

A different feature approach:

  • Scrapling reduces maintenance through adaptive element tracking.
  • ScrapingBee tackles the same pain from another angle: AI-powered extraction, where you describe the data you want in plain English instead of writing selectors.

Both approaches make scraping less brittle, just at different layers of the stack.

When a managed service is the better fit:

  • production workloads where stability and uptime matter
  • no in-house DevOps capacity to run proxies or browser clusters
  • tight delivery timelines where infrastructure would slow you down
  • enterprise environments that need support, SLAs, and predictable behavior

It's not a question of which tool is "better." It's about priorities: Scrapling gives you control; ScrapingBee gives you convenience.

If you want to try the managed route, you can start a ScrapingBee account with 1,000 free credits today.

That's a wrap: Time to optimize your scraping strategy

When to delegate the heavy lifting

If you've explored Scrapling and the scraping workflow feels right but the infrastructure part doesn't, a managed option like ScrapingBee can take over the operational load:

  1. Managed proxy rotation
    Scrapling leaves proxy sourcing and rotation to you. ScrapingBee ships with a maintained, large proxy pool and automatic rotation built in.
  2. Zero maintenance overhead
    With Scrapling, you run every piece of the stack yourself: proxies, browsers, scaling, monitoring. ScrapingBee handles all of that for you.
  3. AI-powered extraction
    Instead of writing selectors, ScrapingBee lets you describe the data you want in plain English, and the API extracts it for you.

Next steps

To explore Scrapling: the official docs, the GitHub repo, and the interactive shell listed above are the best starting points.

To explore managed scraping: you can create a ScrapingBee account and test the API with the free credits mentioned earlier.

In the end, it's about what your team values more: Scrapling's adaptive, self-hosted control, or ScrapingBee's managed, low-overhead convenience. Both aim to make web scraping easier, but from different angles.

Scrapling for Python web scraping – FAQ

What is Scrapling?

Scrapling is an open-source Python web scraping library focused on fast parsing, adaptive element tracking, and flexible fetchers (HTTP, stealth, full browser). It aims to make scraping less brittle without turning into a heavy framework.

What is adaptive web scraping and why does it matter?

Adaptive web scraping lets Scrapling relocate elements when the page structure changes. It stores a lightweight fingerprint of an element (tag, attributes, neighbors, etc.) and uses similarity matching if the original selector stops working. Result: fewer broken scrapers after small layout updates.

How is Scrapling different from tools like BeautifulSoup, Scrapy, or Selenium?

  • Faster parser than older libraries
  • Optional stealth and browser-based fetchers
  • Adaptive element tracking to reduce maintenance
  • Familiar API shape (selectors, chaining, DOM navigation)

Each tool still has its place, and Scrapling targets the "fast, flexible, low-maintenance" niche.

Why not just build everything myself (full DIY)?

You can, but you'll end up maintaining proxies, browser pools, retries, scaling logic, monitoring, anti-bot handling, and infrastructure. Scrapling helps with parsing and fetching, but the ecosystem around scraping can get heavy fast.

When does a managed solution like ScrapingBee make sense?

A managed API is a better fit when you need:

  • automatic proxy rotation
  • fully managed browser automation
  • predictable scaling
  • no infrastructure overhead
  • AI-powered extraction instead of writing selectors

Basically: Scrapling gives you control; ScrapingBee handles the operations.

Can Scrapling and ScrapingBee be used together?

Absolutely. Scrapling can run your parsing logic, selectors, and adaptive extraction locally, while ScrapingBee handles the fetching side (proxies, browsers, anti-bot). Teams can take advantage of both: Scrapling for logic, ScrapingBee for infrastructure.

Ilya Krukowski

Ilya is an IT tutor and author, web developer, and ex-Microsoft/Cisco specialist. His primary programming languages are Ruby, JavaScript, Python, and Elixir. He enjoys coding, teaching people and learning new things. In his free time he writes educational posts, participates in OpenSource projects, tweets, goes in for sports and plays music.