Price scraping with Python is one of the easiest ways to keep track of product prices across websites without doing everything manually. Instead of checking the same pages again and again, a small script can collect pricing data, store the results, and highlight changes right away.
This approach works well for many cases: monitoring competitors, tracking discounts, or making sure a product isn't overpriced. And this isn't just for developers — anyone curious enough can pick this up and build something useful pretty quickly.
Python is usually the go-to choice here because the language is simple, readable, and backed by a strong ecosystem for web scraping. Libraries like requests, BeautifulSoup, and others make fetching pages, parsing content, and extracting the data you need straightforward.
In this guide, we'll break everything down step by step, go through real examples, and show working code samples so the whole process feels clear and practical.

Quick answer (TL;DR)
Price scraping with Python is about sending requests to product pages, parsing the returned HTML, and extracting price data into a structured format like CSV or a database. For simple static sites, a stack like requests and BeautifulSoup is usually enough.
When dealing with dynamic or protected websites, things get more complex. You'll need tools that can render JavaScript or handle blocks, such as headless browsers or scraping APIs, to reliably collect pricing data at scale.
Below, we'll walk through both approaches step by step, starting with a simple static example and then moving to a more realistic setup with dynamic pages and scaling.
👉 Learn more in this guide where price scrapers are explained
What is price scraping and when to use it
Collecting product prices from websites automatically is way easier than checking them manually like a caveman every day. A script pulls pricing data from public product pages and turns it into something actually useful. This starts making sense real quick when prices change often or when multiple stores need tracking at once. No one wants to open 20 tabs just to see which store lowered a price by a couple of bucks.
👉 Wanna go deeper into pricing rules and control? Check this guide on minimum advertised price monitoring.
Common price scraping use cases
This stuff shows up everywhere in e-commerce, especially where products are publicly listed. A few real-life scenarios:
- Competitor tracking — see what others are charging and react faster instead of guessing
- Market research — collect data over time and actually understand pricing trends
- Dynamic pricing — adjust your own prices based on what's happening out there
- Deal tracking — catch discounts and price drops without refreshing pages all day
- Marketplace comparison — scan platforms like Amazon or others and compare listings side by side
A lot of people start with big marketplaces since they're packed with data.
👉 If Amazon is the target, here's a practical guide on how to scrape Amazon product prices.
Is price scraping legal
Alright, let's briefly cover the legal side. In general, scraping publicly available data is on the safer side, especially when no login or private access is involved. But websites still have their own rules, and those are written in their terms of service.
Going full brute-force mode, bypassing protections, or hammering a site with requests is where problems start. That's how IPs get blocked and headaches begin. So stay chill, don't overload servers, don't touch private data, and respect the site you're pulling from.
If something feels sketchy, it probably is — better to double-check than deal with consequences later.
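One practical way to stay respectful is to enforce a minimum gap between consecutive requests. Here's a minimal sketch — the 2-second `MIN_DELAY` is an arbitrary assumption, so tune it per site:

```python
import time

MIN_DELAY = 2.0  # seconds between requests -- an assumed value, adjust per site

_last_request = 0.0  # monotonic timestamp of the previous request


def polite_wait():
    """Block until at least MIN_DELAY seconds have passed since the last call."""
    global _last_request
    elapsed = time.monotonic() - _last_request
    if elapsed < MIN_DELAY:
        time.sleep(MIN_DELAY - elapsed)
    _last_request = time.monotonic()
```

Calling `polite_wait()` before each request keeps the scraper from hammering the server, even when the rest of the code runs in a tight loop.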
How to scrape prices using Python
The basic idea behind price scraping with Python is pretty simple. First, a script sends a request to a page. Then it reads the HTML, finds the parts that hold the product data, and extracts the fields that matter, like title and price.
For static pages, this flow is enough:
- Send an HTTP request with requests
- Parse the HTML with BeautifulSoup
- Select the product elements from the page
- Extract the text you need
- Save the results somewhere useful, like a CSV file
This works nicely on pages where the HTML already contains the product data. Once JavaScript-heavy pages or anti-bot protections enter the chat, the setup gets a bit more involved. We'll get there later, but for now let's start with a static site.
👉 For another real-world example, here's a guide on how to scrape eBay prices with Python.
Set up a small uv project
Let's start fresh with a new project using uv:
uv init price-scraping-python
cd price-scraping-python
uv add requests beautifulsoup4 lxml
That gives us:
- requests for downloading the page
- beautifulsoup4 for parsing HTML
- lxml as a fast parser backend
A simple project structure is more than enough here:
price-scraping-python/
├── main.py
└── pyproject.toml
Pick a static target page
For this example, we'll use books.toscrape.com, which is perfect for practice.
Each book sits inside an article with the class .product_pod, and inside that block we can grab:
- the title from the h3 a tag
- the price from p.price_color
So the plan is dead simple: find all .product_pod elements, loop through them, extract the title and price, and write everything into a CSV file.
Scrape titles and prices
Here's a complete example with basic error handling:
```python
import csv
from pathlib import Path

import requests
from bs4 import BeautifulSoup

# Target page with books
URL = "https://books.toscrape.com/"

# Output file path
OUTPUT_FILE = Path("books_prices.csv")


def fetch_html(url):
    """Download HTML from the target page"""
    # Fake a real browser so we don't look like a bot
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (X11; Linux x86_64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/122.0.0.0 Safari/537.36"
        )
    }
    # Send GET request
    response = requests.get(url, headers=headers, timeout=30)
    # Raise error if request failed (status != 200)
    response.raise_for_status()
    # Decode bytes explicitly to avoid charset issues on this site
    return response.content.decode("utf-8")


def parse_books(html):
    """Extract book titles and prices from HTML"""
    # Parse HTML with lxml parser
    soup = BeautifulSoup(html, "lxml")
    books = []
    # Loop through each product card
    for product in soup.select("article.product_pod"):
        # Find title and price elements
        title_tag = product.select_one("h3 a")
        price_tag = product.select_one("p.price_color")
        # Skip if something is missing
        if not title_tag or not price_tag:
            continue
        # Extract title from attribute
        title = title_tag.get("title", "").strip()
        # Extract visible price text
        price = price_tag.get_text(strip=True)
        # Skip empty values just in case
        if not title or not price:
            continue
        # Store result as dict
        books.append(
            {
                "title": title,
                "price": price,
            }
        )
    return books


def save_to_csv(rows, output_file):
    """Save extracted data into a CSV file"""
    # Open file for writing
    with output_file.open("w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=["title", "price"])
        # Write header row
        writer.writeheader()
        # Write all collected rows
        writer.writerows(rows)


def main():
    """Main execution flow"""
    try:
        # Step 1: fetch page HTML
        html = fetch_html(URL)
        # Step 2: parse and extract data
        books = parse_books(html)
        # Handle empty result
        if not books:
            print("No books found on the page.")
            return
        # Step 3: save to CSV
        save_to_csv(books, OUTPUT_FILE)
        print(f"Saved {len(books)} rows to {OUTPUT_FILE}")
    except requests.RequestException as exc:
        # Network-related errors
        print(f"Request failed: {exc}")
    except OSError as exc:
        # File system errors
        print(f"File write failed: {exc}")
    except Exception as exc:
        # Catch-all for anything unexpected
        print(f"Unexpected error: {exc}")


if __name__ == "__main__":
    main()
```
Run the script:
uv run main.py
If all goes well, a file called books_prices.csv will show up in the project folder.
Example output:
title,price
A Light in the Attic,£51.77
Tipping the Velvet,£53.74
Soumission,£50.10
Sharp Objects,£47.82
This kind of page is the easiest place to start because the prices are already present in the HTML response. No browser automation, no JavaScript rendering, no extra headaches.
Handling dynamic websites
So far everything worked smoothly because the page was static. Dynamic websites are a different story.
Many modern sites load product data using JavaScript after the page is opened in a browser. When a simple requests call hits that same page, the response often comes back without the actual prices, or with empty placeholders. From the script's point of view, the data just isn't there.
That's why price scraping with Python gets trickier with these sites. The problem isn't parsing anymore — it's getting the fully rendered content.
There are two main ways to deal with this:
- Use headless browsers like Playwright or Selenium, which load the page the same way a real browser does and execute JavaScript
- Use a scraping API that handles rendering for you and returns the final HTML
The first option gives full control but adds complexity and overhead. The second one is usually faster to set up and easier to scale.
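To give a feel for the headless-browser route, here's a rough sketch of fetching a rendered page with Playwright's sync API. It assumes `uv add playwright` plus `uv run playwright install chromium` have been run, and it's a starting point rather than a drop-in for the examples below:

```python
def fetch_rendered_html(url):
    """Open a page in headless Chromium, let JavaScript run, return the final HTML."""
    # Import inside the function so the rest of a script still works without Playwright
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # "networkidle" waits until network activity settles -- good enough for a sketch
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html
```

The returned HTML can then be handed to BeautifulSoup exactly like the static example above.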
👉 If you want a deeper breakdown of how this works, check this guide on scraping dynamic content.
Scraping prices at scale
Scraping one page is easy. Scraping hundreds or thousands is where things start getting interesting. Once multiple pages or categories are involved, a few problems show up pretty quickly.
- The first is rate limiting. If too many requests hit a site in a short time, the server may slow things down, return errors, or block the IP completely. That's a normal protection mechanism, not some special anti-scraping magic.
- Then there's IP blocking. Repeated requests from the same address can get flagged, especially on larger e-commerce platforms. After that, requests might start failing or returning different content.
- And finally, reliability becomes a thing. Some requests fail randomly, connections drop, pages time out — all the usual network chaos.
So when scaling this kind of scraper, a few concepts become important:
- Proxies — rotate IP addresses to avoid getting blocked too quickly
- Rate limiting — slow things down and space out requests to look more natural
- Retries — handle temporary failures instead of losing data
- Parallelism — speed things up without overwhelming the target site
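The retries idea from the list above can be sketched in a few lines — a hypothetical `fetch_with_retries` helper that wraps requests with exponential backoff:

```python
import time

import requests


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff schedule: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(base * (2 ** attempt), cap)


def fetch_with_retries(url, max_attempts=4, **kwargs):
    """GET a URL, retrying transient failures with growing delays."""
    for attempt in range(max_attempts):
        try:
            response = requests.get(url, timeout=30, **kwargs)
            response.raise_for_status()
            return response
        except requests.RequestException:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(backoff_delay(attempt))
```

Growing delays give a struggling or rate-limiting server room to recover instead of hitting it again immediately.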
A common real-world example is monitoring prices across multiple product categories. Instead of scraping one page, the script loops through dozens or hundreds of category pages, collects product links, and then fetches each product page individually. That's where scaling challenges really kick in.
At this point, managing all of this manually can get messy. Between proxy rotation, retries, and handling blocks, the code starts growing fast.
👉 If you want to see how this can be handled with a ready-made solution, check out this Amazon keyword scraper API.
In the next part, we'll look at how to simplify this whole setup using a scraping API so you don't have to build all the infrastructure from scratch.
Start price scraping faster with an API
At some point, building everything yourself stops being fun. Handling proxies, retries, blocks, JavaScript rendering, and scaling across hundreds of pages can quickly turn a simple price scraping Python script into a full-time maintenance project.
That's where using a scraping API makes a lot of sense. Instead of dealing with all the moving parts, the API handles things like IP rotation, request retries, and rendering behind the scenes. The script stays clean and focused on what actually matters — extracting and using the data.
This approach saves time, reduces headaches, and makes the whole setup way more reliable, especially when scaling up.
👉 Check out the ScrapingBee web scraping API to see how you can simplify price scraping.
Get started with ScrapingBee
Now, we'll build a more robust price scraping setup using ScrapingBee to handle JavaScript rendering and proxies for us. This is especially useful when working with dynamic sites or when scaling beyond a few simple pages.
To get started, you'll need an API key. ScrapingBee offers a free plan with 1000 credits, which is enough to test things out and run a few real scraping tasks without paying upfront.
Once inside the dashboard, copy your API key. Instead of hardcoding it, drop it into a .env file:
SCRAPINGBEE_API_KEY=your_api_key_here
This keeps credentials out of the code and makes things easier to manage.
In the next step, we'll plug this into a working example and see how the scraping flow changes.
Scrape multiple Newegg categories with ScrapingBee
Let's move from a basic static example to something more practical. In this version, the scraper will fetch multiple Newegg category pages through ScrapingBee, let ScrapingBee render the page with JavaScript, and then parse the final HTML with BeautifulSoup like before. ScrapingBee supports JavaScript rendering and lets you wait for specific selectors before returning the page.
We'll scrape these three categories in parallel:
- Desktop computers
- Server and workstation systems
- Wireless routers
The script will extract product titles, prices, product URLs, and category names, then save everything into a single CSV file.
First, install the packages if you don't have them already:
uv add requests beautifulsoup4 lxml python-dotenv
And here is the script:
```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv

# Load environment variables from .env
load_dotenv()

# ScrapingBee API key from .env
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")

# ScrapingBee endpoint
API_URL = "https://app.scrapingbee.com/api/v1/"

# Output CSV file
OUTPUT_FILE = Path("newegg_prices.csv")

# Category pages we want to monitor
CATEGORIES = {
    "desktop_computers": "https://www.newegg.com/Desktop-Computer/SubCategory/ID-10",
    "servers_workstations": "https://www.newegg.com/Server-Workstation-System/SubCategory/ID-386",
    "wireless_routers": "https://www.newegg.com/Wireless-Routers/SubCategory/ID-145",
}


def fetch_category_page(category_name, url):
    """Fetch one Newegg category page through ScrapingBee."""
    if not API_KEY:
        raise ValueError("SCRAPINGBEE_API_KEY is missing from the environment.")
    # Ask ScrapingBee to:
    # - open the target URL
    # - render JavaScript
    # - use managed premium proxies
    # - wait until the product list container appears
    params = {
        "api_key": API_KEY,
        "url": url,
        "render_js": "true",
        "premium_proxy": "true",
        "wait_for": ".item-cells-wrap.items-list-view",
        # Optionally, add:
        # "wait": 2000,
    }
    response = requests.get(API_URL, params=params, timeout=90)
    response.raise_for_status()
    # Return raw bytes so BeautifulSoup can parse them directly
    return category_name, response.content


def parse_products(category_name, html_bytes, base_url="https://www.newegg.com"):
    """Parse product cards from one rendered category page."""
    soup = BeautifulSoup(html_bytes, "lxml")
    rows = []
    # Every product card lives in a div.item-cell
    for product in soup.select("div.item-cell"):
        # Main product link with the title text
        title_tag = product.select_one("a.item-title")
        # Current price wrapper
        price_wrap = product.select_one("li.price-current")
        # Skip broken or incomplete cards
        if not title_tag or not price_wrap:
            continue
        # Product title
        title = title_tag.get_text(" ", strip=True)
        # Product URL
        product_url = title_tag.get("href", "").strip()
        if product_url:
            product_url = urljoin(base_url, product_url)
        # Newegg often splits the price into:
        # $<strong>549</strong><sup>.99</sup>
        whole_tag = price_wrap.select_one("strong")
        fraction_tag = price_wrap.select_one("sup")
        whole = whole_tag.get_text(strip=True) if whole_tag else ""
        fraction = fraction_tag.get_text(strip=True) if fraction_tag else ""
        # Clean the decimal part:
        # ".99" -> "99"
        fraction = fraction.replace(".", "")
        # Build a normalized numeric price string like "549.99"
        if whole and fraction:
            price = f"{whole}.{fraction}"
        elif whole:
            price = whole
        else:
            price = ""
        # Skip rows with missing core fields
        if not title or not price:
            continue
        rows.append(
            {
                "category": category_name,
                "title": title,
                "price": price,
                "product_url": product_url,
            }
        )
    return rows


def save_to_csv(rows, output_file):
    """Save all scraped rows into a CSV file."""
    with output_file.open("w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(
            csv_file,
            fieldnames=["category", "title", "price", "product_url"],
        )
        writer.writeheader()
        writer.writerows(rows)


def main():
    """Run all category requests in parallel and save the final result."""
    all_rows = []
    try:
        # Fire 3 category requests in parallel
        with ThreadPoolExecutor(max_workers=3) as executor:
            futures = {
                executor.submit(fetch_category_page, name, url): name
                for name, url in CATEGORIES.items()
            }
            for future in as_completed(futures):
                category_name = futures[future]
                try:
                    fetched_category, html_bytes = future.result()
                    rows = parse_products(fetched_category, html_bytes)
                    all_rows.extend(rows)
                    print(f"Parsed {len(rows)} products from {category_name}")
                except requests.RequestException as exc:
                    print(f"Request failed for {category_name}: {exc}")
                except Exception as exc:
                    print(f"Unexpected error for {category_name}: {exc}")
        if not all_rows:
            print("No products were scraped.")
            return
        save_to_csv(all_rows, OUTPUT_FILE)
        print(f"Saved {len(all_rows)} rows to {OUTPUT_FILE}")
    except OSError as exc:
        print(f"File write failed: {exc}")
    except Exception as exc:
        print(f"Unexpected error: {exc}")


if __name__ == "__main__":
    main()
```
A few things are worth noting here:
- Newegg splits the current price across multiple HTML elements, so the parser has to join the whole and fractional parts manually.
- The requests also run in parallel, which helps when tracking multiple categories at once.
- And because ScrapingBee handles JavaScript rendering plus proxy infrastructure on its side, the scraping code stays pretty clean instead of turning into browser automation soup.
Run it with:
uv run main.py
The output file will look like this:
category,title,price,product_url
servers_workstations,"HPE ProLiant MicroServer Gen11 server with Intel Xeon E-2414 Processor, 16 GB (1x16 GB UDIMM) Single Rank Memory, dedicated iLO-M.2 port kit, Embedded Intel® VROC SATA for HPE ProLiant - P78521-005","1,928.00",https://www.newegg.com/hpe-proliant-microserver-gen11-p78521-005-ultra-micro-tower/p/2NS-0006-3HRT2
servers_workstations,"GIGABYTE AI TOP 100 Z890 Desktop PC, Intel Core Ultra 9 285K, GIGABYTE RTX 5090, 128GB DDR5, 2TB + 320GB SSD, Windows 11 Pro, Black","6,299.99",https://www.newegg.com/p/N82E16859252041
Clean and normalize scraped data
Raw scraped values often look fine at first, but they can break later when you try to analyze them. Prices may include commas or be split across multiple elements, and titles can contain messy spacing.
Here's an updated version of parse_products() that cleans things up:
```python
def parse_products(category_name, html_bytes, base_url="https://www.newegg.com"):
    """Parse product cards and normalize data."""
    soup = BeautifulSoup(html_bytes, "lxml")
    rows = []
    for product in soup.select("div.item-cell"):
        title_tag = product.select_one("a.item-title")
        price_wrap = product.select_one("li.price-current")
        if not title_tag or not price_wrap:
            continue
        # Clean title (remove weird spacing)
        title = " ".join(title_tag.get_text(" ", strip=True).split())
        # Normalize product URL
        product_url = title_tag.get("href", "").strip()
        if product_url:
            product_url = urljoin(base_url, product_url)
        # Extract and clean price parts
        whole_tag = price_wrap.select_one("strong")
        fraction_tag = price_wrap.select_one("sup")
        whole = whole_tag.get_text(strip=True) if whole_tag else ""
        fraction = fraction_tag.get_text(strip=True) if fraction_tag else ""
        # Remove commas and dots from parts
        whole = whole.replace(",", "")
        fraction = fraction.replace(".", "").replace(",", "")
        # Build clean numeric price
        if whole and fraction:
            price = f"{whole}.{fraction}"
        elif whole:
            price = whole
        else:
            price = ""
        if not title or not price:
            continue
        rows.append(
            {
                "category": category_name,
                "title": title,
                "price": price,
                "product_url": product_url,
            }
        )
    return rows
```
Key idea: always normalize prices into a consistent numeric format like 6299.99 and clean titles early, so the data stays usable later.
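The same idea can be packaged into a small hypothetical helper that turns raw price strings into `Decimal` values (safer than `float` for money) and returns `None` when parsing fails:

```python
from decimal import Decimal, InvalidOperation


def normalize_price(raw):
    """Convert strings like '$1,928.00' or '£51.77' into a Decimal, or None."""
    # Keep digits and the decimal point, drop currency symbols and thousand separators
    cleaned = "".join(ch for ch in raw if ch.isdigit() or ch == ".")
    try:
        return Decimal(cleaned) if cleaned else None
    except InvalidOperation:
        return None
```

Doing this once, right after parsing, means every downstream step — CSV files, comparisons, analysis — works with one consistent numeric format.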
Store daily snapshots for price tracking
If the goal is price tracking and not just one-off scraping, it helps to save a daily snapshot instead of overwriting the same CSV every time. The main idea: each run creates a new file for that day, and each row includes a stable product ID. That way, today's data can be compared with yesterday's later on.
For Newegg, the safest identifier to keep is the product ID from the product URL. A title can change a bit over time, but the product ID is much better for matching the same item across snapshots.
First, update the output file name. Replace this:
OUTPUT_FILE = Path("newegg_prices.csv")
with this:
from datetime import datetime
TODAY = datetime.now().strftime("%Y-%m-%d")
OUTPUT_FILE = Path(f"newegg_prices_{TODAY}.csv")
Now, here is an updated version of parse_products() that keeps the cleaned fields and also extracts product_id from the product URL:
```python
def parse_products(category_name, html_bytes, base_url="https://www.newegg.com"):
    """Parse product cards and normalize data."""
    soup = BeautifulSoup(html_bytes, "lxml")
    rows = []
    for product in soup.select("div.item-cell"):
        title_tag = product.select_one("a.item-title")
        price_wrap = product.select_one("li.price-current")
        if not title_tag or not price_wrap:
            continue
        # Clean title
        title = " ".join(title_tag.get_text(" ", strip=True).split())
        # Normalize product URL
        product_url = title_tag.get("href", "").strip()
        if product_url:
            product_url = urljoin(base_url, product_url)
        # Extract product ID from URLs like:
        # https://www.newegg.com/p/N82E16883101919
        product_id = product_url.rstrip("/").split("/")[-1] if product_url else ""
        # Extract and clean price parts
        whole_tag = price_wrap.select_one("strong")
        fraction_tag = price_wrap.select_one("sup")
        whole = whole_tag.get_text(strip=True) if whole_tag else ""
        fraction = fraction_tag.get_text(strip=True) if fraction_tag else ""
        whole = whole.replace(",", "")
        fraction = fraction.replace(".", "").replace(",", "")
        if whole and fraction:
            price = f"{whole}.{fraction}"
        elif whole:
            price = whole
        else:
            price = ""
        if not title or not price or not product_id:
            continue
        rows.append(
            {
                "category": category_name,
                "product_id": product_id,
                "title": title,
                "price": price,
                "product_url": product_url,
            }
        )
    return rows
```
Since we now store product IDs, update save_to_csv() like this:
```python
def save_to_csv(rows, output_file):
    """Save all scraped rows into a CSV file."""
    with output_file.open("w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(
            csv_file,
            fieldnames=["category", "product_id", "title", "price", "product_url"],
        )
        writer.writeheader()
        writer.writerows(rows)
```
Compare two daily snapshots
Once daily snapshots are in place, the next step is easy: compare two CSV files and see what changed.
A simple way to do that is with a second script. It asks for two file names, loads both snapshots, matches products by product_id, and then shows price changes, new products, and removed products.
Here's the code:
```python
import csv
from decimal import Decimal, InvalidOperation
from pathlib import Path


def load_snapshot(path):
    """Load one snapshot CSV into a dict keyed by product_id."""
    products = {}
    with path.open("r", newline="", encoding="utf-8") as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            product_id = row.get("product_id", "").strip()
            price_raw = row.get("price", "").strip()
            if not product_id or not price_raw:
                continue
            try:
                price = Decimal(price_raw)
            except InvalidOperation:
                continue
            products[product_id] = {
                "category": row.get("category", "").strip(),
                "title": row.get("title", "").strip(),
                "price": price,
                "product_url": row.get("product_url", "").strip(),
            }
    return products


def compare_snapshots(old_data, new_data):
    """Compare two snapshots and return the differences."""
    old_ids = set(old_data.keys())
    new_ids = set(new_data.keys())
    common_ids = old_ids & new_ids
    added_ids = new_ids - old_ids
    removed_ids = old_ids - new_ids
    price_changes = []
    for product_id in common_ids:
        old_price = old_data[product_id]["price"]
        new_price = new_data[product_id]["price"]
        if old_price != new_price:
            price_changes.append(
                {
                    "product_id": product_id,
                    "title": new_data[product_id]["title"] or old_data[product_id]["title"],
                    "old_price": old_price,
                    "new_price": new_price,
                    "product_url": new_data[product_id]["product_url"] or old_data[product_id]["product_url"],
                }
            )
    return price_changes, added_ids, removed_ids


def print_report(price_changes, added_ids, removed_ids, old_data, new_data):
    """Print a simple comparison report."""
    print("\n=== Price changes ===")
    if not price_changes:
        print("No price changes found.")
    else:
        for item in sorted(price_changes, key=lambda x: x["title"].lower()):
            direction = "dropped" if item["new_price"] < item["old_price"] else "increased"
            print(
                f"- {item['title']}\n"
                f"  ID: {item['product_id']}\n"
                f"  Price {direction}: {item['old_price']} -> {item['new_price']}\n"
                f"  URL: {item['product_url']}\n"
            )
    print("\n=== New products ===")
    if not added_ids:
        print("No new products found.")
    else:
        for product_id in sorted(added_ids):
            item = new_data[product_id]
            print(
                f"- {item['title']}\n"
                f"  ID: {product_id}\n"
                f"  Price: {item['price']}\n"
                f"  URL: {item['product_url']}\n"
            )
    print("\n=== Removed products ===")
    if not removed_ids:
        print("No removed products found.")
    else:
        for product_id in sorted(removed_ids):
            item = old_data[product_id]
            print(
                f"- {item['title']}\n"
                f"  ID: {product_id}\n"
                f"  Last seen price: {item['price']}\n"
                f"  URL: {item['product_url']}\n"
            )


def main():
    """Ask for two snapshot files and compare them."""
    old_file = Path(input("Enter the older snapshot CSV file: ").strip())
    new_file = Path(input("Enter the newer snapshot CSV file: ").strip())
    if not old_file.exists():
        print(f"File not found: {old_file}")
        return
    if not new_file.exists():
        print(f"File not found: {new_file}")
        return
    try:
        old_data = load_snapshot(old_file)
        new_data = load_snapshot(new_file)
        price_changes, added_ids, removed_ids = compare_snapshots(old_data, new_data)
        print_report(price_changes, added_ids, removed_ids, old_data, new_data)
    except OSError as exc:
        print(f"File read failed: {exc}")
    except Exception as exc:
        print(f"Unexpected error: {exc}")


if __name__ == "__main__":
    main()
```
Run it like this:
uv run compare_prices.py
Then enter two filenames, for example:
newegg_prices_2026-03-18.csv
newegg_prices_2026-03-19.csv
Sample output:
=== Price changes ===
- Dell Pro Micro QCM1250 Desktop - Intel Core Ultra 5 235T - 16GB - 256GB SSD - Micro PC - Intel Chip - Windows 11 Pro - IEEE 802.11ax - 90W V6TNK
  ID: N82E16883988059
  Price increased: 859.99 -> 879.99
  URL: https://www.newegg.com/p/N82E16883988059
- HP Pro Mini 400 Business Mini Desktop (Intel i5-14500T, Intel UHD 770 shared, 16GB DDR4, 512GB PCIe SSD, WiFi 6E, Bluetooth 5.3, 90W PSU, RJ-45, 2 Display Port, 1 x HDMI 2.1, Win 11 Pro)
  ID: 1VK-001E-4UFE9
  Price dropped: 599.99 -> 499.99
  URL: https://www.newegg.com/p/1VK-001E-4UFE9

=== New products ===
No new products found.

=== Removed products ===
No removed products found.
This is enough to build a simple price tracking workflow without adding a database yet.
Tips for scaling and improving your scraper
Once the basic setup works, there's a lot of room to make the scraper more solid and production-ready. A few practical things worth adding over time:
- Exponential backoff — instead of retrying instantly after a failed request, add delays that grow over time. This helps avoid getting blocked and makes retries more effective
- Better error handling — log failures, track which pages didn't load, and avoid silently skipping data
- Tune concurrency — running 3 parallel requests is fine for a demo, but in real scenarios you'll want to adjust this depending on limits and stability
- Pagination support — most category pages don't stop at page 1, so looping through pages is key for full coverage
- Data deduplication — avoid storing the same product multiple times when scraping repeatedly
- Structured storage — CSV works for quick tests, but for larger setups a database or data pipeline makes more sense
- Scheduling — run the scraper on a schedule to track price changes over time
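For pagination specifically, a helper that expands one category URL into a list of page URLs is a common starting point. The `?page=N` query parameter below is only a placeholder assumption — check how your target site actually paginates before using it:

```python
def category_page_urls(base_url, max_pages):
    """Build a list of paginated category URLs to fetch one by one."""
    # '?page=N' is an assumed pattern -- adjust to the real site's scheme
    return [f"{base_url}?page={page}" for page in range(1, max_pages + 1)]
```

Feeding these URLs through the same fetch-and-parse pipeline (with delays and retries in place) gives full category coverage instead of just the first page.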
Frequently asked questions (FAQs)
What Python libraries are used for price scraping?
Most price scraping Python setups use a simple stack: requests for fetching pages, BeautifulSoup or lxml for parsing HTML, and sometimes pandas or csv for storing results. For more advanced cases, tools like Playwright or APIs can help handle tougher sites.
👉 Check more options in these price scraper tools
How often can I scrape prices from a website?
There's no universal rule here. Frequency depends on the site, rate limits, and how often prices change. Scraping too aggressively can lead to blocks, so spacing requests out and staying consistent is usually safer for long-term monitoring.
👉 Learn more about scraping intervals in this MAP monitoring frequency guide
Why do price scraping scripts get blocked?
Scripts usually get blocked because of too many requests, missing headers, or trying to access dynamic content without proper rendering. Websites use protections to detect unusual behavior, so basic setups often fail without retries, proxies, or JavaScript support.
👉 See common issues in this guide on dynamic website scraping
Can I scrape prices from Amazon using Python?
Technically yes, but it's not straightforward. Amazon has strong anti-bot systems, so simple scripts often fail or get blocked quickly. Reliable scraping usually requires proxies, headers, and sometimes APIs designed specifically for handling Amazon pages.
👉 Here's a detailed guide on scraping Amazon prices

Ilya is an IT tutor and author, web developer, and ex-Microsoft/Cisco specialist. His primary programming languages are Ruby, JavaScript, Python, and Elixir. He enjoys coding, teaching people and learning new things. In his free time he writes educational posts, participates in OpenSource projects, tweets, goes in for sports and plays music.