Amazon API scraping is the most reliable way to pull product data without fighting Amazon's HTML, anti-bot rules, or constant layout changes. Instead of wrestling with proxies and brittle selectors, you call an endpoint and get clean Amazon product data ready for analysis: titles, prices, ratings, images, descriptions, reviews, availability, all structured in one JSON.
In this guide you'll see how to scrape Amazon product data with Python using an API-first workflow. We'll still touch on classic HTML concepts so you know what the API replaces, but the focus is on stable, low-maintenance Amazon product data scraping rather than building fragile scrapers.
As always, keep things ethical: scrape only public product information, respect Amazon's ToS, and follow local laws.
If you want more fundamentals on how to scrape Amazon in general, here's a good starting point: How to Web Scrape Amazon with Python.

Quick step-by-step walkthrough
Here's the short loop you'll use to scrape Amazon product data:
- Set up a small Python project with requests + python-dotenv and load your API key from .env.
- Get ASINs either from the Amazon Search endpoint (keywords → ASIN list) or directly if you already have them.
- For each ASIN, call the Amazon Product endpoint to fetch structured JSON instead of scraping HTML.
- Extract the fields you need and shape them into a clean dictionary or row.
- Export the results to whatever tool your workflow uses: JSON, CSV, Sheets, or a database.
- When scaling, add retries, rate limiting, and logging so the pipeline stays reliable.
That's the whole pattern for scraping Amazon product data with Python while keeping the workflow simple, stable, and API-first.
Before we begin
Before we dive in, here's the vibe: using an API is way more dependable than poking around Amazon's HTML and hoping their selectors don't change tomorrow. When you scrape straight from the page, one tiny layout tweak and your whole script face-plants. With an API, you just hit an endpoint and get the goods without brittle parsing or guessing.
Here's the setup we'll roll with:
- Grab a ScrapingBee API key and pick the Amazon Product endpoint.
- Pass the product ASIN as a query parameter.
- Get back clean JSON with fields like product_name, price, rating, images, product_details, category, stock, delivery.
- If you need ASINs first, use the Amazon Search endpoint or AI Web Scraping to turn keywords into ASIN lists.
- Drop the final data into CSV, a DB, or whatever tool you vibe with.
This gives you the full picture of how to scrape Amazon product data without drowning in markup, how the Amazon product API fits into the workflow, and how to keep things smooth instead of wrestling HTML all day.
Setting up your environment
Choose your scraping approach: Raw HTML vs product data API
When you scrape Amazon products you've basically got two roads to pick from:
- Raw HTML scraping. You send requests, spoof headers, rotate proxies, and hope Amazon doesn't get cranky. Then you dig through the page with BeautifulSoup or Parsel, trying to keep track of class names that seem to change just to spite you. It can work, but you'll be dealing with 503s, CAPTCHAs, throttling, and markup that mutates faster than you can update your selectors. Half the workflow becomes babysitting the scraper instead of actually getting data.
- Product data API. This sidesteps the whole circus. The API takes care of proxies, headers, rendering, retries — the ugly stuff. You pass an ASIN, it sends back structured JSON with all the product details. No parsing. No selector spelunking. And the best part: the workflow stays stable even when Amazon reshuffles its HTML again.
For this tutorial we're sticking with the API approach. No HTML digging, no hunting for fragile selectors; just clean requests, clean responses, and a much smoother ride.
Prerequisites: Python, HTTP client, and virtual environment
Before we spin up our Amazon product scraper Python setup, here's the bare-bones kit you need, nothing wild:
- Python 3.10+ installed.
- A virtual environment. I'm rolling with uv so we don't micromanage dependencies.
- The requests library for making HTTP calls. ScrapingBee has a Python client if you want it, but plain requests gets the job done.
- python-dotenv so we can load a proper .env instead of hard-coding secrets.
- Basic comfort with JSON since the API spits back structured data.
- Your ScrapingBee API key; free tier gives you 1000 credits, more than enough to mess around.
Spin up your project:
uv init amazon-api-demo
cd amazon-api-demo
uv add requests python-dotenv
Create a .env file in the project root:
SCRAPINGBEE_API_KEY=your_key_here
Load it inside the main.py file:
from dotenv import load_dotenv
import os
load_dotenv()
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")
Now your secrets stay out of the code, you stay out of trouble, and we can scrape Amazon product data Python-style without leaking keys like amateurs.
First test call to Amazon
Alright, picture this: you're "totally not" shopping for yourself. You're a responsible adult grabbing a cool LEGO set "for your kid." Yeah. Sure. We've all been there. Anyway, that gives us the perfect test target for our first Amazon API call.
Goal: pull clean product data for a LEGO Rivendell set using the Amazon product endpoint. Here's a minimal example using requests:
import os
import requests
from dotenv import load_dotenv
# Load API key from .env
load_dotenv()
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")
# Sample ASIN for the LEGO LOTR Rivendell set
ASIN = "B0BNWCWG7L"
# ScrapingBee Amazon Product endpoint
url = "https://app.scrapingbee.com/api/v1/amazon/product"
# Query params for the request
params = {
"api_key": API_KEY, # your API key
"query": ASIN, # the ASIN we want info for
"light_request": True, # faster response (without a real browser), still plenty of data
"domain": "com" # amazon.com
}
# Fire the request
resp = requests.get(url, params=params, timeout=60)
resp.raise_for_status() # throw if something went wrong
# Parse JSON
data = resp.json()
# Print a few fields to confirm everything works
print(data["product_name"])
print(data["price"])
print(data["rating"])
print(data["images"][:2]) # first two images
# Stock info from buybox:
buybox = (data.get("buybox") or [{}])[0]
stock_text = (buybox.get("stock") or "").strip()
print(stock_text)
# Full data
print(data)
Run your script:
uv run python main.py
If your setup is solid, you'll get a large structured JSON blob back. Amazon product name, price, rating, images, product details, stock, delivery, sometimes even review snippets. It's a full data dump without touching a single selector.
To keep things tidy, here's a trimmed-down preview of what the API returns:
{
"asin": "B0BNWCWG7L",
"product_name": "LEGO Icons The Lord of The Rings: Rivendell Building Set for Adults...",
"price": 499.99,
"rating": 4.8,
"images": [
"https://m.media-amazon.com/images/I/91Un4M87n2L._AC_SL1500_.jpg",
"https://m.media-amazon.com/images/I/51E5mAhLkTL._AC_.jpg"
],
"product_details": {
"asin": "B0BNWCWG7L",
"item_weight": "15.2 pounds",
"product_dimensions": "20.59 x 18.9 x 8.58 inches",
"release_date": "June 1, 2023"
},
"stock": "In Stock"
}
And this right here is why people go API-first for Amazon scraping. No headless browsers, no CAPTCHAs, no proxy gymnastics. You send one ASIN → you get clean structured data → you move on with your life.
Scrape Amazon product data (API-first, with HTML context)
Get product ASIN
"ASIN" means Amazon Standard Identification Number. It's the anchor: the thing that stays stable while prices shift, titles get edited, or sellers swap out images. If you want clean long-term tracking, you start every workflow by locking onto the ASIN.
There are two common ways to get it.
Extract the ASIN straight from a product URL
Most Amazon URLs contain /dp/<ASIN> or /gp/product/<ASIN>, so you can pull it out with a simple split or a tiny regex:
import re
def asin_from_url(url: str) -> str | None:
"""
Extract ASIN from common Amazon product URL formats.
Returns the ASIN string or None if no match is found.
"""
m = re.search(r"/dp/([A-Z0-9]{10})|/gp/product/([A-Z0-9]{10})", url)
if not m:
return None
return m.group(1) or m.group(2)
print(asin_from_url("https://www.amazon.com/dp/B0BNWCWG7L"))
This works for the vast majority of product URLs and is perfect when you already know what item you're targeting.
Use search to discover products and get ASINs
If you're starting from keywords ("lego castle", "noise cancelling headphones", etc.), then you'll want to search Amazon and pull the ASINs out of the results. This is where Amazon search data becomes super handy: keywords → product list → ASINs → full product API calls. ScrapingBee has an Amazon Search endpoint for exactly this.
Here's a curl example that searches for "Pride and Prejudice":
curl "https://app.scrapingbee.com/api/v1/amazon/search?api_key=$SCRAPINGBEE_API_KEY&query=Pride+and+prejudice&light_request=true&domain=com&sort_by=bestsellers&start_page=1&pages=1"
light_request=true means "don't fire up a full real browser." It's quicker, cheaper, and great when you just want ASINs.
A typical response looks like:
{
"page": 1,
"products": [
{
"asin": "B009CGCQPU",
"title": "Pride & Prejudice",
"price": 3.99,
"rating": 4.8,
"reviews_count": 37500
}
]
}
Once you have an ASIN, the product endpoint makes downstream tracking easy because it repeats identifiers in the response. You'll usually see both asin and parent_asin, which is handy when variations exist and you want to group related listings.
asin = data["asin"]
parent_asin = data.get("parent_asin")
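If you're tracking several variations of the same listing, a minimal grouping sketch by parent_asin could look like this (the sample items here are made up; in practice you'd build the list from your own Product endpoint responses):
from collections import defaultdict
# Hypothetical items already fetched from the Product endpoint
products = [
    {"asin": "B0BNWCWG7L", "parent_asin": "B0BNWCWG7L", "product_name": "LEGO Rivendell"},
    {"asin": "B0EXAMPLE1", "parent_asin": "B0BNWCWG7L", "product_name": "LEGO Rivendell (variant)"},
]
# Group variations under their parent ASIN (fall back to the item's own ASIN)
groups = defaultdict(list)
for item in products:
    key = item.get("parent_asin") or item.get("asin")
    groups[key].append(item["asin"])
for parent, children in groups.items():
    print(parent, "->", children)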
Extract product name
Once you have the ASIN, pulling the product title is basically one field lookup. The Amazon product data endpoint already gives you a clean product_name value, so you don't have to scrape markup or hope the selector stays the same.
If you look at the actual Amazon page (like amazon.com/dp/B0BNWCWG7L), the title sits under the #productTitle element. That's the classic HTML scraping target. But since Amazon likes to change layouts, the API route is way more stable: you get the same field every time without worrying about selectors breaking.
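Just for context, the raw-HTML version of that lookup would look roughly like this with BeautifulSoup. This is a toy snippet with a stand-in HTML string, not something you need for the API workflow:
from bs4 import BeautifulSoup
# Stand-in for page HTML you'd have fetched some other way
html = '<span id="productTitle">  LEGO Icons The Lord of The Rings: Rivendell  </span>'
soup = BeautifulSoup(html, "html.parser")
title_el = soup.select_one("#productTitle")
print(title_el.get_text(strip=True) if title_el else "Title not found")
One layout tweak and that selector is toast, which is exactly the maintenance burden the product_name field saves you from.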
Here's the request using the proper params and a tiny snippet to extract the product name:
import os
import requests
from dotenv import load_dotenv
# Load API key
load_dotenv()
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")
ASIN = "B0BNWCWG7L"
url = "https://app.scrapingbee.com/api/v1/amazon/product"
params = {
"api_key": API_KEY,
"query": ASIN,
"light_request": True,
"domain": "com"
}
# Make the API call
resp = requests.get(url, params=params, timeout=60)
resp.raise_for_status()
data = resp.json()
# Extract and print product title
print("Product name:", data.get("product_name"))
That's all you need to extract the product title from an Amazon page, without touching the page at all. The API normalizes the field for you, and your code stays simple even if Amazon tweaks their layout tomorrow.
Retrieve product description
Now let's pull the actual product description — the part that's usually a total pain if you scrape HTML directly. On the Amazon page, the description lives somewhere under #productDescription, sometimes in nested spans, sometimes in unpredictable UL/LI structures. It shifts often, and every scraper eventually ends up playing whack-a-mole with selectors.
The API route avoids all of that. Before we fetch the description, we upgrade our request params so the API returns data in the exact language, country, and currency we want. ScrapingBee lets you set these explicitly, instead of relying on whatever Amazon decides to serve.
Here's an updated request with the extra parameters:
import os
import requests
from dotenv import load_dotenv
# Load API key
load_dotenv()
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")
ASIN = "B0BNWCWG7L"
url = "https://app.scrapingbee.com/api/v1/amazon/product"
params = {
"api_key": API_KEY,
"query": ASIN,
"light_request": True, # keeps the response fast and lightweight
"domain": "com",
"country": "us",
"language": "en_US",
"currency": "USD"
}
resp = requests.get(url, params=params, timeout=60)
resp.raise_for_status()
data = resp.json()
The Amazon product data API gives you three description-related fields:
- description — the main long-form text block.
- product_overview — short structured details (when available).
- bullet_points — the feature bullets near the top of the page.
If this were raw HTML scraping, you'd hunt down #productDescription, hope the UL/LI tree didn't change again, and fight random nested tags. With the API, you just read whichever keys you need.
Here's a helper snippet to merge everything into one clean text blob:
def squish(x) -> str:
return " ".join(str(x).split()) if x else ""
def join_str_values(d: dict) -> str:
# best-effort: join all string-ish values in a dict
return " ".join(squish(v) for v in d.values() if isinstance(v, (str, int, float)) and squish(v))
def overview_text(o) -> str:
if isinstance(o, list):
parts = []
for row in o:
if isinstance(row, dict):
t = squish(row.get("title"))
d = squish(row.get("description"))
parts.append(f"{t}: {d}" if t and d else (t or d))
else:
parts.append(squish(row))
return " ".join(p for p in parts if p)
if isinstance(o, dict):
return join_str_values(o)
return squish(o)
def bullets_text(b) -> str:
if isinstance(b, list):
parts = []
for item in b:
if isinstance(item, dict):
parts.append(join_str_values(item))
else:
parts.append(squish(item))
return " ".join(p for p in parts if p)
return squish(b)
description = squish(data.get("description"))
overview = overview_text(data.get("product_overview"))
bullets = bullets_text(data.get("bullet_points"))
full_description = " ".join(x for x in (overview, bullets, description) if x)
print("Description preview:", full_description[:250], "...")
That's the entire workflow to scrape Amazon product descriptions without ever touching HTML. The API normalizes everything for you, the fields are stable, and you just decide how to combine the pieces.
Collect product specifications
Specs are usually the swamp of Amazon scraping. On the page they're scattered across tables, sidebars, "product details" blocks, half-random sections, and whatever else Amazon feels like trying this week. With the API, none of that matters: you just read structured fields like product_details (and sometimes product_overview) and call it a day.
Here's how to grab the usual buckets. Some specs (like brand) sit right at the top level, while the heavier stuff lives inside product_details. Depending on the listing, product_overview might show up too, so it's always worth checking.
# your code to send the API request ...
brand = data.get("brand")
details = data.get("product_details") or {}
overview = data.get("product_overview") or {}
print("Brand:", brand)
print("Weight:", details.get("item_weight"))
print("Dimensions:", details.get("product_dimensions"))
print("Release date:", details.get("release_date"))
For anything serious (analytics, enrichment, pushing data into a warehouse) it's best to flatten everything into one spec dictionary. That way you can store it, turn it into rows, or dump it to CSV without juggling nested structures.
specs = {}
# top-level fields that often behave like specs
for key in ["asin", "parent_asin", "brand", "manufacturer", "product_dimensions"]:
val = data.get(key)
if val not in (None, "", [], {}):
specs[key] = val
# product_details is already a nice spec map
for k, v in (data.get("product_details") or {}).items():
if v not in (None, "", [], {}):
specs[f"product_details.{k}"] = v
# product_overview may be missing; when present, flatten it too
if isinstance(overview, dict):
for k, v in overview.items():
if v not in (None, "", [], {}):
specs[f"product_overview.{k}"] = v
print("Flattened specs:")
for k, v in list(specs.items())[:10]:
print(f"- {k}: {v}")
This is where the API-first workflow really shines. You're still scraping Amazon product data, but you skip the whole "find the right table, parse the right row, pray Amazon doesn't redesign everything tomorrow" nonsense. You just get a clean spec map you can actually use.
Get product rating
Ratings are one of those things that feel small but turn into a mess if you scrape the raw page. Amazon uses star SVGs, aria-label text, nested spans, and a layout that shifts depending on the product category. With the API, you skip all of that and read clean numeric fields which is perfect for storing, comparing, and tracking Amazon product data over time.
From the same product response, you'll typically see fields like:
{
"rating": 4.8,
"rating_stars_distribution": [
{ "rating": 5, "percentage": 92 },
{ "rating": 4, "percentage": 4 },
{ "rating": 3, "percentage": 1 },
{ "rating": 2, "percentage": 1 },
{ "rating": 1, "percentage": 2 }
],
"reviews": [
{
"author": "Danil",
"rating": 5,
"title": "5.0 out of 5 stars Stunning set...",
"timestamp": "Reviewed in the United States July 15, 2025"
}
]
}
Here's how to grab the rating fields in Python:
rating = data.get("rating")
distribution = data.get("rating_stars_distribution") or []
reviews = data.get("reviews") or []
print("Rating:", rating)
print("Distribution:")
for row in distribution:
print(f"- {row['rating']} stars: {row['percentage']}%")
print("Total reviews fetched:", len(reviews))
And if you want clean numeric metrics for long-term tracking or analytics, you can flatten everything into a compact dictionary:
metrics = {
"asin": ASIN,
"rating": rating,
"five_star_pct": next((x["percentage"] for x in distribution if x["rating"] == 5), None),
"one_star_pct": next((x["percentage"] for x in distribution if x["rating"] == 1), None),
"review_count": len(reviews)
}
This is the easiest way to pull Amazon rating data without wading through HTML, guessing which span contains the real value, or parsing star icons. Structured JSON keeps everything clean and predictable, and you can track trends or build dashboards without worrying about markup changes.
Collect product reviews
Reviews come in two flavors, and mixing them up is how perfectly good projects turn into spaghetti.
First, you've got the summary stuff: average rating, star distribution, review count. That's what you use for rankings, snapshots, dashboards; the quick hit that tells you whether a product is loved, hated, or mid. If all you want is a clean way to extract product data from Amazon without building a monster pipeline, this is the lane.
Second, there's the real content: full review text, titles, timestamps, verified purchase flags, helpful votes, author info. This is what you use for sentiment analysis, topic extraction, duplicate detection, and all the nerdy text-processing magic.
Option 1: Use the product endpoint and turn off light mode when you need more
The Amazon Product endpoint can return review data, but light_request=True keeps things fast and sometimes trims deeper fields. If you start noticing missing reviews or skinny fields, just flip off light mode so ScrapingBee spins up a real browser and grabs the richer content.
params = {
"api_key": API_KEY,
"query": ASIN,
"light_request": False,
# ... other params ...
}
This lets you stay in the same workflow you already built, just with more review meat when you decide you need it.
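Once the heavier response comes back, a minimal sketch for pulling the review records apart might look like this. It continues from the same product response (data) and uses the field names shown in the sample earlier; treat them as best-effort, since not every listing returns every field:
reviews = data.get("reviews") or []
parsed = []
for r in reviews:
    parsed.append({
        "author": r.get("author"),
        "rating": r.get("rating"),
        "title": (r.get("title") or "").strip(),
        "timestamp": r.get("timestamp"),
    })
print("Reviews parsed:", len(parsed))
for row in parsed[:3]:
    print(f"- {row['rating']} stars by {row['author']}: {row['title'][:60]}")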
Option 2: Use a dedicated reviews scraper when you need reviews at scale
If you're after a lot of reviews (full text, metadata, timestamps, verified status, helpful votes) you'll get a smoother workflow with a dedicated review scraper. It's built for pagination, consistency, and treating reviews as structured records instead of trying to force them through the product endpoint.
This is usually where people end up when "how to scrape Amazon product data" starts drifting into NLP, long-term monitoring, fraud detection, or cases where you want to approximate "Amazon product sales data" by looking at review velocity and rating swings.
If you want full walkthroughs and endpoints, start here:
My general rule: begin with summary metrics (they're stable and cheap) and only fetch full review text when you actually need the deeper analysis.
Product price retrieval
Price is one of the messiest parts of scraping Amazon the old-school HTML way. Amazon loves switching price formats depending on seller, Prime status, coupons, variations, stock, or wherever on Earth they think you're browsing from. In markup that turns into scattered spans, random containers, duplicated numbers, and the eternal "bro, which one is the actual price?" headache.
The API cleans all that up. You get normalized fields, not a scavenger hunt. The key ones are:
- price and currency — the basic current price signal
- price_buybox — the Buy Box price (usually the real one you care about)
- price_shipping — shipping cost if it exists
- pricing_url — the offer listing page, useful when you want to dig into sellers
Here's a tiny snippet that grabs them, calculates a final price, and tells you if the shipping is free:
import re
from decimal import Decimal, InvalidOperation
def to_money(x) -> Decimal | None:
"""
Convert API price-ish values to Decimal.
Handles:
- numbers (int/float/Decimal)
- strings like "$12.99", "EUR 12,99", "12.99", "12,99"
- "FREE", "", None -> 0
Returns Decimal or None if it can't parse.
"""
if x is None:
return None
if isinstance(x, (int, float, Decimal)):
return Decimal(str(x))
if isinstance(x, str):
s = x.strip()
if not s:
return Decimal("0")
if s.upper() in {"FREE", "FREE SHIPPING"}:
return Decimal("0")
# keep digits, minus, dot, comma; drop currency symbols/text
s = re.sub(r"[^0-9,\.\-]", "", s)
# normalize commas:
# - "12,99" -> "12.99"
# - "1,234.56" -> "1234.56"
if s.count(",") and s.count("."):
s = s.replace(",", "")
elif s.count(",") and not s.count("."):
s = s.replace(",", ".")
try:
return Decimal(s)
except InvalidOperation:
return None
return None
price_raw = data.get("price_buybox") or data.get("price")
shipping_raw = data.get("price_shipping")
price = to_money(price_raw)
shipping = to_money(shipping_raw)
# treat missing shipping as 0 (common)
shipping = shipping if shipping is not None else Decimal("0")
final_price = (price + shipping) if price is not None else None
is_free_shipping = (shipping == Decimal("0"))
currency = (data.get("currency") or "").strip()
print("Base price:", price_raw, "->", price, currency)
print("Shipping:", shipping_raw, "->", shipping, currency, "(free)" if is_free_shipping else "")
print("Final price:", final_price, currency)
print("Offers URL:", data.get("pricing_url"))
And that's why API-first is the smooth way to scrape product data from Amazon. You're not digging through a forest of spans trying to guess which $49.99 is real. You're reading stable fields, which makes price tracking, alerts, comparisons, and even indirect "Amazon product sales data" modeling way easier, especially when you combine it with stock changes and review velocity.
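If you're logging prices over time, a tiny alerting sketch on top of the snippet above could look like this. The previous_price value is hypothetical; in practice you'd load it from whatever storage your last run wrote to:
from decimal import Decimal
# Hypothetical: the final price you stored on the previous run
previous_price = Decimal("519.98")
if final_price is not None and previous_price:
    change = final_price - previous_price
    pct = (change / previous_price * 100).quantize(Decimal("0.1"))
    if change < 0:
        print(f"Price drop: {previous_price} -> {final_price} {currency} ({pct}%)")
    elif change > 0:
        print(f"Price increase: {previous_price} -> {final_price} {currency} (+{pct}%)")
    else:
        print("No price change since last run.")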
If you want a deeper dive on pricing edge cases, here's a good reference: How to Scrape Amazon Prices with ScrapingBee.
Product image extraction
Images are one of those things that get ugly fast when you scrape the raw page. Normally you're poking at #landingImage, then trying to decode the carousel logic, hidden JSON, hover swaps, and variant-specific images: all of which break the moment Amazon sneezes. With the API, you skip the whole circus. The Amazon product data includes an images array, and that's pretty much the whole story.
Example:
image_urls = data.get("images") or []
print("Found", len(image_urls), "images")
for url in image_urls[:3]:
print("-", url)
You can store the URLs directly or download them later if you really need local copies. Most workflows just keep the URLs because it's faster, cheaper, and scales better when you scrape Amazon product data in bulk.
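If you do need local copies, a minimal download sketch building on the image_urls list above might look like this (one file per image, names derived from the URL):
import os
import requests
os.makedirs("images", exist_ok=True)
for img_url in image_urls[:3]:
    # Strip query params and keep the last path segment as the filename
    filename = os.path.join("images", img_url.split("/")[-1].split("?")[0])
    r = requests.get(img_url, timeout=60)
    r.raise_for_status()
    with open(filename, "wb") as f:
        f.write(r.content)
    print("Saved", filename)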
If you ever need a full-page screenshot of the product instead (maybe for a catalog, QA checks, or visual monitoring), ScrapingBee also exposes a screenshot API. It's a simple add-on to the same flow and can save you a lot of time compared to scripting a headless browser.
Getting product video URLs and 360-degree views
Rich media is where expectations need to stay sane. Titles, prices, ratings, images, specs — those come through clean. But videos and 360-degree / 3D product views are trickier because Amazon loads them via JavaScript, separate media endpoints, and dynamic players. They're not guaranteed in every response, even with full rendering.
Try a non-light request
If you think a listing should have richer media, the first move is turning off light mode so ScrapingBee fires up a real browser session. That's your best shot at pulling the dynamic pieces easily.
params = {
"api_key": API_KEY,
"query": ASIN,
"light_request": False,
"domain": "com",
"country": "us",
"language": "en_US",
"currency": "USD"
}
resp = requests.get(url, params=params, timeout=60)
resp.raise_for_status()
data = resp.json()
If the product response includes direct media fields, you're good: just store the URLs and use them however your workflow needs (playback, catalog previews, QA checks, etc.). That's the cleanest, least painful way to work with whatever extra media the API can surface.
Reality check: 3D views are usually not returned as a nice API field
Here's the honest truth: after researching this for a bit, I've come to the conclusion that Amazon (almost) never exposes 3D or 360-degree viewers as a clean JSON field. Those fancy spin-around models and interactive views are loaded through JavaScript, stitched from multiple files, and often depend on Amazon's internal viewers. If you were scraping the raw page, you'd usually end up poking through script tags, canvas, and embedded JSON blobs, and even then you'd still hit a mess of separate assets like masks, tiles, and segment files. Reconstructing all of it is extremely painful, and sometimes basically impossible.
If you still want to try your luck, the fallback method is checking the rendered HTML returned by the API. For that you must explicitly ask for raw HTML in the response:
params = {
"api_key": API_KEY,
"query": ASIN,
"add_html": True
# ... other params ...
}
Once you have the HTML, search for the 3D viewer entry point. A common pattern you'll see is something like:
data-src="/view-3d?asin=ASIN_HERE"
Not all products have this, and even when they do, it's often just the first domino in a long chain of assets. But detecting it is still useful because it lets you link users to Amazon's own viewer.
Here's a simple way to scan the HTML for it:
import re
html = data.get("html") or ""
m = re.search(r'data-src="(/view-3d\?asin=[A-Z0-9]{10}[^"]*)"', html)
if m:
view3d_path = m.group(1)
view3d_url = "https://www.amazon.com" + view3d_path
print("3D view URL:", view3d_url)
else:
print("No 3D view link found in HTML for this ASIN.")
If you do find a viewer URL, you can save it as metadata, or open it in a browser and inspect the Network tab. Just be aware that, from what I was able to deduce, Amazon serves the "3D" content as a collection of assets (tile sets, mask files, JSON descriptors, and incremental resources loaded on demand). Rebuilding the actual 3D experience yourself is a project on its own.
If you really wanted to scrape and process those assets manually, you'd need something like BeautifulSoup to navigate the HTML, plus a ton of custom logic to follow the network requests. For most workflows, it's just not worth it. So the practical rule is: use the API for the structured Amazon product data, and treat 3D/360 assets as optional extras you detect and link to when they exist, not something you try to fully reconstruct.
Extract product brand and category
Brand and category become really important once you stop doing single-item lookups and start building actual datasets. They're what let you slice Amazon product search data into useful segments: "all LEGO sets", "all running shoes", "everything in Electronics → Headphones", etc. If you're building dashboards or doing category-level analysis, these fields are the backbone.
With the API, brand is usually simple. It often appears as a top-level field like brand, and many listings also include manufacturer:
brand = data.get("brand") or ""
manufacturer = data.get("manufacturer") or ""
print("Brand:", brand)
print("Manufacturer:", manufacturer)
Category can show up in different shapes depending on the listing. Sometimes it's an empty list, sometimes a list of strings, sometimes a ladder of objects representing the full path through Amazon's taxonomy. When that ladder is available, it's perfect for downstream work; you can save all levels, or just the final "leaf" category.
category = data.get("category") or []
# category can be an empty list or a ladder of strings/objects depending on the listing
print("Category raw:", category)
leaf_category = None
if isinstance(category, list) and category:
last = category[-1]
leaf_category = last.get("name") if isinstance(last, dict) else str(last)
print("Leaf category:", leaf_category)
Some listings also expose ranking signals: fields like sales_rank or best_sellers_rank. They're not always present, but when they are, you can trend them over time within the same category ladder. That's incredibly helpful for competitive monitoring and lightweight "Amazon product sales data" modeling when you can't get real sales numbers.
sales_rank = data.get("sales_rank") or data.get("best_sellers_rank")
print("Sales rank:", sales_rank)
Why use the API for this? Because brand and category are notoriously annoying to scrape in HTML. Amazon shifts their layout constantly, runs A/B tests, and shows different structures by marketplace and language. With the API you avoid the chaos — you read stable fields, store them, and your segmentation logic stays the same even as you scale your Amazon product data extraction across thousands of items.
Export product availability and stock status
Availability is one of those signals that punches way above its weight. Even if you never get real sales numbers, watching stock flips + price swings gives you a surprisingly good feel for what's moving. It's basically the "budget version" of how to get Amazon product sales data when Amazon doesn't give you the real thing. For competitor tracking or reseller monitoring, this stuff is gold.
In the API response, the stock story usually shows up in a few spots:
buybox[0]["stock"]— the human-readable stock label ("In Stock", "Only 2 left", "Currently unavailable", etc.)max_quantity— the highest quantity Amazon will let you add to cart (super handy as a soft low-stock indicator)buyboxinfo — the whole offer context: seller name, condition, price, returns policy, all the meta
Here's an example that pulls the fields and turns them into simple status labels you can actually use downstream:
buybox = (data.get("buybox") or [{}])[0]
stock_text = (buybox.get("stock") or "").strip()
max_qty = data.get("max_quantity")
def stock_label(stock_text: str, max_qty) -> str:
s = stock_text.lower()
if not stock_text:
return "Unknown"
if "out of stock" in s or "currently unavailable" in s:
return "Out of stock"
if "in stock" in s:
# max_quantity is not perfect, but it's a handy "low stock-ish" hint
if isinstance(max_qty, int) and max_qty <= 3:
return "Low stock"
return "In stock"
return "Unknown"
label = stock_label(stock_text, max_qty)
print("Stock text:", stock_text)
print("Max quantity:", max_qty)
print("Status label:", label)
This simple labeling gets ridiculously useful once you log it over time. You start spotting patterns instantly:
- Products bouncing between "In stock" and "Out of stock"
- Max quantity suddenly dropping (classic low-stock pressure)
- Buy Box switching sellers at 2 a.m. (your competitor sneaking in)
- Variations running out one by one
And the best part: you're not scraping a jungle of page text or parsing weird "only 6 left in stock (more on the way)" strings. The API keeps it clean so you can track availability like a proper dataset instead of herding HTML cats.
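To start building that history, here's a minimal sketch that appends a timestamped snapshot to a CSV. It reuses ASIN, stock_text, max_qty, and label from the snippets above:
import csv
import os
from datetime import datetime, timezone
log_path = "stock_log.csv"
write_header = not os.path.exists(log_path)
with open(log_path, "a", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    if write_header:
        writer.writerow(["timestamp", "asin", "stock_text", "max_quantity", "status"])
    writer.writerow([
        datetime.now(timezone.utc).isoformat(),
        ASIN,
        stock_text,
        max_qty,
        label,
    ])
print("Logged stock snapshot to", log_path)
Run it on a schedule and the CSV becomes the raw material for the pattern-spotting described above.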
Get product delivery details
Delivery info is another area where the API saves you from a ton of headaches. On the raw Amazon page, delivery dates can appear in multiple spots, shift when you change ZIP codes, or hide behind dynamic widgets. With the API, everything shows up in a clean delivery array you can treat as structured data.
A typical response looks like:
"delivery": [
{
"type": "FREE delivery",
"date": { "by": "Tuesday, January 20" }
},
{
"type": "get FREE delivery Prime members",
"date": { "by": "Saturday, January 17" }
}
]
You can normalize this into a format that plays nicely with dashboards or databases:
delivery_entries = data.get("delivery") or []
normalized = []
for entry in delivery_entries:
dtype = entry.get("type") or ""
ddate = (entry.get("date") or {}).get("by") or ""
normalized.append({
"delivery_type": dtype,
"delivery_by": ddate
})
print("Delivery details:")
for row in normalized:
print(f"- {row['delivery_type']} -> {row['delivery_by']}")
If you need to extract product data from Amazon at scale, keeping delivery info normalized makes life much easier. You can track how delivery windows change, compare fulfillment speeds across sellers, and monitor whether Prime delivery stays consistent across regions or product types.
Extract product details from Amazon search
If you want to extract product data from Amazon starting from keywords, the Search endpoint is the fastest way to get moving. It returns multiple products in one call and gives you the essentials right in the results: ASINs, titles, prices, ratings, review counts, and even flags like Prime or sponsorship. That becomes your Amazon product search data layer, the discovery step before you dig deeper.
Here's the Search request, using the same locale controls as before:
curl "https://app.scrapingbee.com/api/v1/amazon/search?api_key=$SCRAPINGBEE_API_KEY&query=Pride+and+prejudice&light_request=true&country=us&domain=com&language=en_US¤cy=USD&sort_by=bestsellers&start_page=1&pages=1"
And here's the shape of the response:
{
"page": 1,
"products": [
{
"asin": "B009CGCQPU",
"title": "Pride & Prejudice",
"price": 3.99,
"currency": "USD",
"rating": 4.8,
"reviews_count": 37500,
"is_prime": false,
"organic_position": 1
}
]
}
This is perfect for discovery, but it's still just search-level fields. The standard flow for how to scrape Amazon product data looks like this:
- Search by keyword → get a bunch of ASINs
- Loop through those ASINs → call the Product endpoint for full details
Here's a minimal Python example that does exactly that:
import os
import requests
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")
search_url = "https://app.scrapingbee.com/api/v1/amazon/search"
product_url = "https://app.scrapingbee.com/api/v1/amazon/product"
# --- Step 1: Search ---
search_params = {
"api_key": API_KEY,
"query": "Pride and prejudice",
"light_request": True,
"country": "us",
"domain": "com",
"language": "en_US",
"currency": "USD",
"sort_by": "bestsellers",
"start_page": 1,
"pages": 1
}
search_resp = requests.get(search_url, params=search_params, timeout=60)
search_resp.raise_for_status()
search_data = search_resp.json()
asins = [p["asin"] for p in (search_data.get("products") or []) if p.get("asin")]
# --- Step 2: Fetch product details for each ASIN ---
for asin in asins[:5]:
product_params = {
"api_key": API_KEY,
"query": asin,
"light_request": True,
"country": "us",
"domain": "com",
"language": "en_US",
"currency": "USD"
}
product_resp = requests.get(product_url, params=product_params, timeout=60)
product_resp.raise_for_status()
product_data = product_resp.json()
print(asin, "-", product_data.get("product_name"))
That two-step pipeline is the most practical way to combine Amazon product search data with full product details while staying API-first, stable, and scalable.
If you want a focused reference for keyword-based discovery, check the ScrapingBee Amazon keyword scraper page.
Exporting your Amazon product data
Extract data using JSON
Since the API already hands you clean JSON, exporting is the easy part. No HTML scraping, no selectors waiting to betray you — just grab the fields you care about, shape them into a tidy dictionary, and save. That's the whole vibe of scraping Amazon product data with Python in a sane, API-first way.
Here's a tiny example that fetches a product by ASIN and writes a trimmed version into a .json file:
import os
import json
import requests
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")
ASIN = "B0BNWCWG7L"
url = "https://app.scrapingbee.com/api/v1/amazon/product"
params = {
"api_key": API_KEY,
"query": ASIN,
"light_request": True,
"domain": "com",
"country": "us",
"language": "en_US",
"currency": "USD"
}
resp = requests.get(url, params=params, timeout=60)
resp.raise_for_status()
data = resp.json()
# build a minimal export dict
export_item = {
"asin": data.get("asin"),
"product_name": data.get("product_name"),
"brand": data.get("brand"),
"price": data.get("price"),
"currency": data.get("currency"),
"rating": data.get("rating"),
"images": data.get("images") or [],
"stock": (data.get("buybox") or [{}])[0].get("stock"),
}
with open("product_export.json", "w", encoding="utf-8") as f:
json.dump(export_item, f, ensure_ascii=False, indent=2)
print("Saved product_export.json")
You end up with a clean, predictable JSON export you can drop into Pandas, push into Sheets, toss into Excel, or feed into your next pipeline step. No brittle markup, no guessing games, just structured data doing exactly what you want.
If you want more examples of working with JSON safely, here's a solid guide: How to read and parse JSON data with Python.
This is the core pattern for scraping Amazon product data Python-style: treat everything as structured JSON and push it where you need it.
Export scraped data to Google Sheets
If you're scraping Amazon product data for a whole crew, Google Sheets is usually the friendliest landing zone. Everyone can poke at it, filter stuff, leave comments, and build tiny dashboards without touching your code. Since we're already pulling clean JSON from the API, the only job left is: turn that JSON into rows.
You've basically got two ways to play it:
Option 1: Python + Google Sheets API
Same pattern every time:
- Fetch Amazon product data from the API
- Build a row dict or list with the fields you actually care about
- Append it into your Google Sheet
The workflow is easy: install a Google Sheets client library, authenticate with a service account (the sane way) or OAuth (the painful way), then call the "append values" endpoint.
A typical row you'd push in looks like:
- ASIN
- Product name
- Brand
- Price
- Currency
- Rating
- Stock status
- First image URL
Because all your Amazon product data is already normalized JSON, the "format row" step is literally just grabbing fields. No HTML cleanup, no weird selectors; pure data in, pure data out.
A rough sketch of the structure (not full code, just the idea):
row = [
data.get("asin"),
data.get("product_name"),
data.get("brand"),
data.get("price"),
data.get("currency"),
data.get("rating"),
(data.get("buybox") or [{}])[0].get("stock"),
(data.get("images") or [None])[0],
]
# sheets.append_row(row)
Once auth is set up, appending is painless.
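For example, with the gspread client library and a service account, the append step might look roughly like this. The credentials file and spreadsheet name are placeholders for your own setup, and row is the list from the sketch above:
import gspread
# Placeholder paths/names; swap in your own service account file and sheet title
gc = gspread.service_account(filename="service_account.json")
sh = gc.open("Amazon product tracking")
ws = sh.sheet1
ws.append_row(row, value_input_option="USER_ENTERED")
print("Row appended to Google Sheets")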
Option 2: No-code automation
If you don't feel like juggling service accounts or OAuth tokens, you can hand this off to no-code tools (Zapier, Make, n8n, etc.). They all follow the same logic: fetch product data → map JSON fields to columns → append row(s).
This is perfect when your stakeholders want a simple live sheet and you don't want to keep redeploying a backend just to update a spreadsheet.
Why Sheets works so well here
Since the API already gives you clean structured Amazon product data, not HTML, Sheets becomes a dead-simple reporting layer. You can:
- build price-tracking dashboards
- compare products by rating or brand
- detect stock changes
- log ASINs from search results
- share everything with non-technical teammates instantly
If you want a practical guide focused on Sheets workflows, this one is a good reference: How to scrape websites with Google Sheets.
Export scraped data to Excel
If your analysts live in Excel, the smoothest handoff is a CSV file. Since the API already gives you structured Amazon product data, exporting becomes a straight "select fields → write rows" kind of task. No parsing markup, no selector juggling.
Here's a minimal example that writes a single product into a CSV row using Python's built-in csv module. Expanding this to multiple products is just a matter of looping and appending rows.
import csv
export_row = {
"asin": data.get("asin"),
"product_name": data.get("product_name"),
"brand": data.get("brand"),
"price": data.get("price"),
"currency": data.get("currency"),
"rating": data.get("rating"),
"stock": (data.get("buybox") or [{}])[0].get("stock"),
}
with open("product_export.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=export_row.keys())
writer.writeheader()
writer.writerow(export_row)
print("Saved product_export.csv")
Once exported, Excel users can filter, pivot, chart trends, or blend your dataset with other sources. It's the simplest way to scrape product data from Amazon and hand it off in a format analysts already know.
If you want a deeper dive into spreadsheet workflows, here's a solid guide: How to scrape data from a website to Excel.
Tips for efficient and ethical Amazon scraping
Scraping Amazon products without code
If nobody on your team writes Python, you can still pull solid Amazon product data without touching a single line of code. No-code platforms (Make, Zapier, n8n, Airtable automations, etc.) can hit the ScrapingBee Amazon API just like a script would — they just wrap it in a click-and-drag UI.
The usual flow looks like this:
- Trigger — a schedule ("run every morning"), a button press, or a new keyword dropped into a sheet
- API call — fire the Amazon Product or Search endpoint
- Iterator — loop over each product in the results
- Storage — push rows into Google Sheets, Airtable, Notion, a DB, or wherever you want the data to live
This setup is great for analysts, PMs, and marketing folks who want structured Amazon product data without hassling developers every time they need a new report. It's perfect for lightweight monitoring, quick experiments, and internal dashboards. When you need real data cleaning, rate control, or custom logic, that's when you switch to Python.
If you want a breakdown aimed specifically at non-coders, here you go: Scrape Amazon products' price with no code.
Scraping Amazon products with AI
If you want to skip selectors entirely and let a model figure out the structure for you, the AI Web Scraping API is basically the "tell me what you want and I'll bring it" option. Instead of coding extraction logic, you describe the fields in plain language and the API handles the HTML, layout quirks, and dynamic widgets.
Here's a simple conceptual payload for grabbing Amazon search results:
{
"ai_query": "Return a list of products, their prices, ratings, and product links for DSLR cameras.",
"ai_extract_rules": {
"products[]": {
"title": "text of product title",
"price": "price value",
"rating": "star rating",
"url": "link to product page"
}
}
}
The API fetches the page, interprets all the shifting Amazon layouts, and hands you structured Amazon product data without relying on fixed selectors or brittle scraping rules. It's clutch when you want extraction that survives redesigns, or when you're building something quick and don't want to babysit a scraper forever.
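In Python, wiring that payload up might look roughly like the sketch below. I'm assuming here that the ai_query and ai_extract_rules parameters are sent to the standard ScrapingBee endpoint with the extraction rules serialized as a JSON string, and that the target URL is an ordinary Amazon search page; check the AI Web Scraping API docs for the exact endpoint and parameter names before relying on this:
import json
import os
import requests
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")
extract_rules = {
    "products[]": {
        "title": "text of product title",
        "price": "price value",
        "rating": "star rating",
        "url": "link to product page",
    }
}
params = {
    "api_key": API_KEY,
    "url": "https://www.amazon.com/s?k=dslr+camera",  # example search page
    "ai_query": "Return a list of products, their prices, ratings, and product links for DSLR cameras.",
    "ai_extract_rules": json.dumps(extract_rules),
}
resp = requests.get("https://app.scrapingbee.com/api/v1/", params=params, timeout=120)
resp.raise_for_status()
print(resp.json())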
If you want the full feature set, check out the AI Web Scraping API docs.
Food for thought: Scaling, compliance, and advanced use cases
Once you're comfortable scraping Amazon product data, it's worth stepping back and thinking about the bigger picture. Pulling one ASIN is nothing. Pulling thousands a day, tens of thousands a week, or a million a month? That's a whole different animal. At that point you're not running a script, you're running a pipeline.
When you scale, you'll want:
- Rate limiting so you don't hammer endpoints or trigger avoidable blocks
- Retries with backoff to smooth over temporary errors or timeouts (see the sketch after this list)
- Logging so you know exactly what ran, when, and what it returned
- Monitoring to detect anomalies or shifts in Amazon's behavior early
- A storage strategy to track historical changes (price, stock, rank, delivery, etc.)
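Here's a minimal sketch of the retry-with-backoff piece, plus a crude delay between calls; tune the status codes, delays, and attempt counts to your own volume:
import time
import requests
def get_with_retries(url: str, params: dict, max_attempts: int = 4, base_delay: float = 2.0) -> dict:
    """Fetch JSON with exponential backoff on timeouts, throttling (429), and 5xx errors."""
    retryable = {429, 500, 502, 503, 504}
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.get(url, params=params, timeout=60)
        except requests.exceptions.RequestException as exc:
            reason = str(exc)  # network error / timeout
        else:
            if resp.status_code not in retryable:
                resp.raise_for_status()  # non-retryable errors (bad params, etc.) raise immediately
                return resp.json()
            reason = f"HTTP {resp.status_code}"
        if attempt == max_attempts:
            raise RuntimeError(f"Giving up after {max_attempts} attempts: {reason}")
        delay = base_delay * (2 ** (attempt - 1))
        print(f"Attempt {attempt} failed ({reason}); retrying in {delay:.0f}s")
        time.sleep(delay)
# Usage idea: wrap your product calls and pause between ASINs
# data = get_with_retries(product_url, product_params)
# time.sleep(1)  # crude rate limiting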
Once you're collecting at scale, you're building a dataset, not a one-off export. Many teams blend Amazon product data with other e-commerce sources to analyze pricing, identify gaps, predict trends, detect arbitrage opportunities, or monitor competitors across marketplaces. If your architecture is clean from day one, adding new sources or scaling to new markets is painless.
On the compliance side, scrape responsibly:
- Respect Amazon's infrastructure — spread out requests, avoid unnecessary load
- Stay at product-level data — don't scrape personal information
- Review Amazon's Terms of Service, especially for commercial or enterprise use
- Get legal guidance if you're building something customer-facing at scale
- Use battle-tested tools (ScrapingBee) instead of rolling your own proxy farms
Long-term Amazon scraping is all about resilience. A quick script is fragile. A proper pipeline survives layout changes, API hiccups, and marketplace shifts without constant babysitting. Think ahead and design smart, and future you will thank you.
Ready to scrape Amazon product data with ScrapingBee?
By now you've seen the entire workflow: how Amazon structures product data, why DIY HTML scraping is a house of cards, and how the Amazon Product API turns that chaos into a clean, predictable JSON response. Titles, prices, ratings, reviews, images, specs, stock, delivery windows; all in one place, no proxies, no headless browsers.
You're fully equipped to build something real:
- Pass an ASIN into the Product endpoint → extract structured fields
- Push the results into Sheets, Excel, a database, or a pipeline
- Use the Search endpoint for keyword-based discovery
- Use the AI Web Scraping API when you want structure without selectors
And the entry cost is tiny: free credits, a simple request, and zero infrastructure. From there you can scale to thousands, tens of thousands, or millions of products without rewriting scrapers every time Amazon nudges its HTML.
If you want to dive deeper into the Amazon API itself, check the docs.
You've got the tooling, you've got the structure, so go build something!
Scraping Amazon product data FAQs
Is it legal to scrape Amazon product data with an API?
It depends on how you use it. Product-level data is generally less sensitive, but you should always read Amazon's Terms of Service and talk to legal counsel if you're doing commercial or high-volume scraping. ScrapingBee helps you stay ethical by handling infrastructure, rate limiting, and stable rendering, but compliance is ultimately on you.
What's the difference between scraping Amazon HTML and using an Amazon product data API?
HTML scraping is brittle: selectors break, layouts shift, widgets move, and CAPTCHAs bite. You spend more time fixing scrapers than analyzing data. The Amazon product data API returns clean JSON with titles, prices, ratings, images, specs, stock, and delivery info in one shot. No proxies, no headless browser, no markup archaeology.
How do I get an ASIN if I only have a product name or keyword?
Use the Amazon Search endpoint. Send your keyword, get a list of products with ASINs, titles, prices, ratings, and images. Then plug those ASINs into the Product endpoint to grab full details.
How can I avoid getting blocked while scraping Amazon?
Use rate limits, space out calls, and rely on tools that handle proxies, headers, retries, and browser simulation for you. ScrapingBee manages the heavy lifting so you don't have to wrestle Amazon's anti-bot systems by hand.
How do I get historical Amazon product prices?
Log API responses on a schedule — daily, hourly, whatever fits your use case. Store price, Buy Box info, stock labels, and delivery windows. Over time you build your own historical dataset, which is exactly what you need for trend analysis and forecasting.
Can I scrape Amazon product reviews at scale?
Yes, but don't rely on the product page alone. Use a dedicated review scraper or the Amazon Review API. They're designed for volume, expose full review text and metadata, and won't break every time Amazon rearranges their layout.
What's the best way to share scraped Amazon data with non-technical teams?
Push structured rows into Google Sheets, Excel, or a no-code dashboard. A simple pipeline is: scrape → map fields → append rows. Analysts, PMs, and marketers can explore the data instantly without waiting on engineering.

Ilya is an IT tutor and author, web developer, and ex-Microsoft/Cisco specialist. His primary programming languages are Ruby, JavaScript, Python, and Elixir. He enjoys coding, teaching people and learning new things. In his free time he writes educational posts, participates in OpenSource projects, tweets, goes in for sports and plays music.
