Mastering the Python curl request is one of the fastest ways to turn API docs or browser network calls into working code. Instead of rewriting everything by hand, you can drop curl straight into Python, or translate it into Requests or PycURL for cleaner, long-term projects.
In this guide, we'll show practical ways to run curl in Python, when to use each method (subprocess, PycURL, Requests), and how ScrapingBee improves reliability with proxies and optional JavaScript rendering, so you can ship scrapers that actually work.

Quick answer (TL;DR)
Here's the fastest way to run the Python equivalent of a curl request, powered by ScrapingBee proxies:
import requests
url = "https://app.scrapingbee.com/api/v1"
params = {
"api_key": "YOUR_API_KEY",
"url": "https://example.com"
}
response = requests.get(url, params=params, timeout=30)
print("Status:", response.status_code)
print("Body preview:", response.text[:200])
Replace `YOUR_API_KEY` with your ScrapingBee API key. The code fetches https://example.com and prints the response status plus the first 200 characters of the body.
Prefer literal curl? Use `subprocess`:
import subprocess
cmd = ["curl", "https://example.com"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)
This approach will be covered in greater detail later in this guide.
Understanding curl and its role in Python
What is curl and how it works
Curl is everywhere. It's the command-line tool devs reach for when they just want to poke a URL and see what happens. Type:
curl https://example.com
and boom: raw HTML floods your screen.
Tack on a few flags and it suddenly levels up from toy hammer to Swiss Army bulldozer. Need headers? JSON payloads? File uploads? Cookies? Binary downloads? Curl does them all, and usually faster than you can remember that one obscure flag (`-i`? `-v`? `-L`?).
At some point, every developer crosses paths with curl. Sometimes while debugging an API, sometimes while lazily copy-pasting a snippet from the docs, and sometimes just because a teammate yelled across the room: "Just curl it, bro!" That ubiquity is why curl has survived since the late 90s and why nearly every API reference still shows a curl command first. It's precise, copyable, and easy to share.
Why this matters for Python folks: most docs give you curl first, so being able to turn a curl request into Python (via subprocess, PycURL, or requests) is a daily superpower, especially when you want to run it through ScrapingBee for real-world scraping.
Fun trivia: the name literally stands for Client + URL. Nothing fancy — just "that thing that talks to URLs."
Common use cases: download, upload, API calls
So what do people actually use curl for in real life? A few classics:
- Download files: JSON, CSV, PDFs, images, and so on. If it lives on HTTP, curl can fetch it.
- Upload files: curl speaks multiple protocols (FTP, HTTP, you name it), so sending files up is fair game.
- API testing: whether it's REST, GraphQL, or even SOAP (yes, somehow still alive), curl makes it dead simple to replay the request exactly as the docs describe it.
- Web scraping basics: you can fetch a page's source directly, but if you need the heavy-duty stuff — rotating proxies, JavaScript rendering, and anti-bot countermeasures — plug curl-style calls into ScrapingBee's Web Scraping API.
Because curl is open-source and battle-tested, you'll find it everywhere: in shell scripts, inside IoT firmware, baked into enterprise apps, and even running inside cars.
Why use curl in Python scripts
So why even bother running curl inside Python?
- Quick wins: Already have a curl command from the docs or your browser devtools? Drop it into a Python script with `subprocess` and you're off to the races. No rewrites, no setup: just instant requests.
- Testing and one-offs: Great for quick checks — maybe hitting an API through ScrapingBee, or replaying that weird POST request you just copied from the Network tab. Perfect when you don't want to over-engineer.
- But for real projects... curl shows its limits fast. Larger workflows usually need retries, timeouts, persistent sessions, and HTML parsing. That's where Python libraries like Requests or PycURL shine: you get the same request logic, but with cleaner code, proper error handling, and smooth integration with tools like BeautifulSoup (see the sketch just below).
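For a taste of that "real project" plumbing, here's a minimal Requests sketch with a shared session and automatic retries. The retry settings and URL are just illustrative, not prescriptive:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient failures (values are illustrative, tune for your use case)
retries = Retry(
    total=3,
    backoff_factor=1,                      # roughly 1s, 2s, 4s between attempts
    status_forcelist=[429, 500, 502, 503, 504],
)

session = requests.Session()               # reuses connections across requests
session.mount("https://", HTTPAdapter(max_retries=retries))

response = session.get("https://example.com", timeout=30)
print(response.status_code)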
Three ways to use curl in Python
There isn't just one way to "do curl" in Python. You've got three main approaches, each with its own vibe: from quick-and-dirty hacks to clean production-ready code.
If you're holding a curl snippet from the docs or devtools, you can either:
- Run it as is with `subprocess`
- Go low-level and stick close to curl itself with PycURL
- Rewrite it the "Pythonic" way using Requests
Here's the cheat sheet before we dive deeper:
| Method | Pros | Cons | Best For |
|---|---|---|---|
| subprocess | Fastest to drop in, works with any existing curl snippet | Messy quoting, no real error handling | Quick hacks, one-off requests |
| PycURL | Close to curl's power, low-level options, solid performance | Verbose code, steeper learning curve | Heavy jobs, SSL control, fine-tuning |
| Requests | Clean, "Pythonic", integrates with BeautifulSoup | Not literally curl, may require translating flags | Most real-world projects and scraping |
All three approaches get the job done, so it just depends on what kind of job you're doing.
Using subprocess to call curl CLI
So, the simplest way to "use curl in Python" is... well, to literally call curl from Python. If you already have a working curl command (maybe copy-pasted from API docs), you can just drop it into `subprocess` and capture the output.
Here's the bare-bones version:
import subprocess
cmd = ["curl", "https://example.com"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)
What's happening here:
- `subprocess` is a built-in Python module (no install needed) that lets Python run other programs. Here it's just calling the system's `curl`.
- We build the command as a list (`["curl", "https://example.com"]`) to avoid messy quoting issues.
- `subprocess.run` executes the command and waits until it finishes, returning a result object.
- `capture_output=True` grabs whatever curl prints to stdout/stderr.
- `text=True` decodes the output into a regular Python string.
When to use this:
Perfect for quick experiments and one-offs:
- Replaying a curl snippet from API docs
- Testing an endpoint without extra setup
- Sanity-checking network calls straight from Python
Downsides: you don't get strong error handling (check `result.returncode`), and it's not the cleanest option for bigger projects.
For now, think of it as the fastest shortcut to run curl inside Python.
Using PycURL for native integration
If you want curl-level control but in native Python, PycURL is the way to go. It's a Python interface to `libcurl`, exposing almost everything curl can do: timeouts, SSL tweaks, POST bodies, headers, cookies, and more.
⚠️ Note: PycURL isn't built into Python. You'll need to install it separately, for example with `pip install pycurl`.
Let's check the following sample:
import pycurl
from io import BytesIO
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://example.com')
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
print(buffer.getvalue().decode('utf-8'))
What's happening here:
- We create a buffer (`BytesIO`) to capture the response data.
- `pycurl.Curl()` gives us a curl object we can configure with options (`setopt`).
- `c.setopt(c.URL, ...)` sets the target URL.
- `c.setopt(c.WRITEDATA, buffer)` tells curl to write the response into our buffer instead of printing it.
- `c.perform()` executes the request, and `c.close()` cleans up.
- Finally, we decode the buffer into a string for printing or parsing.
When to use this:
As you can see, PycURL is more verbose than `subprocess`, but the trade-off is power and stability. It shines when you need:
- Fine-grained control over retries or timeouts
- Working with SSL certificates
- Performance tuning for bigger jobs
Think of PycURL as the "pro" toolkit: closer to raw curl, but baked right into Python.
Using Requests as a curl alternative
Most Python devs eventually land here. The Requests library isn't curl, but it covers almost everything you'd actually need: query parameters, headers, cookies, POST bodies — all with much cleaner, more Pythonic syntax.
⚠️ Note: Requests is not part of the standard library, so don't forget to install it: `pip install requests`.
Here's a simple example:
import requests
url = "https://example.com"
response = requests.get(url, timeout=30)
print(response.text)
What's happening here:
- `requests.get(url)` sends a GET request to the URL.
- The returned response object holds everything you need:
  - `response.text` — the response body as a string
  - `response.status_code` — the HTTP status (e.g. 200, 404)
  - `response.headers` — the server's headers
When to use this:
If you're building something that goes beyond quick hacks (say an integration, a scraper, or a small internal tool) Requests is usually the sweet spot. It's:
- Readable: no quoting mess, no boilerplate.
- Flexible: easy to add headers, cookies, authentication.
- Extensible: plays nicely with libraries like BeautifulSoup for HTML parsing.
Think of Requests as the "Python-native curl." It may not be literally curl under the hood, but for day-to-day API calls and web scraping tasks, it's probably the friendliest option.
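As a quick illustration, here's roughly how headers, cookies, and auth look in Requests. The endpoint, header values, and credentials below are made-up placeholders:
import requests

headers = {"Accept": "application/json", "User-Agent": "MyScraper/1.0"}  # hypothetical values
cookies = {"sessionid": "abc123"}                                        # hypothetical cookie

response = requests.get(
    "https://example.com/api/items",   # placeholder URL
    headers=headers,
    cookies=cookies,
    auth=("user", "pass"),             # basic auth, if the endpoint needs it
    timeout=30,
)
print(response.status_code)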
Using ScrapingBee proxies
In the next sections we'll be sending HTTP requests to different services. That's cool, but here's the catch: many websites don't really like automated traffic. They might rate limit you, throw CAPTCHAs, or flat out block your IP.
The usual workaround is to set up and rotate proxies, but managing them yourself is a headache: finding fresh ones, dealing with expired IPs, configuring authentication... not exactly fun.
This is where ScrapingBee comes in. Instead of babysitting proxies, you just send your request to ScrapingBee and let it handle:
- IP rotation and geolocation
- Bypassing rate limits and bot checks
- Optional JavaScript rendering when the page needs it
To follow along with the examples, you can:
- Sign up for a free trial. You'll get 1000 free credits (enough for testing).
- Grab your API key from the dashboard.
- Use the HTML Request Builder to generate ready-to-go Python code if you want a jumpstart.
We'll plug ScrapingBee into our Python curl and requests examples shortly, so you'll see how it all fits together.
Using ScrapingBee Curl converter to quickly convert commands
Ever copy a curl snippet from API docs and then waste 15 minutes figuring out how to rewrite it in Python? That's exactly the pain ScrapingBee's Curl Converter solves.
You paste in your curl command, and it instantly gives you equivalent code in Python Requests, JS, Go, and more. No more guessing which curl flag maps to which method, no boilerplate hunting. If you bounce between curl-heavy docs and Python scripts, this tool is basically free productivity.
Example: Convert curl GET to requests.get()
Imagine you've got this curl snippet for ScrapingBee:
curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_KEY&url=https://example.com"
Drop it into the converter, and it spits out clean Python code you can use right away:
import requests
url = "https://app.scrapingbee.com/api/v1"
params = {
"api_key": "YOUR_KEY",
"url": "https://example.com"
}
response = requests.get(url, params=params)
print(response.text)
It's the same request, just instantly Pythonized. Yeah, Pythonized... Let's call that a word.
Handling headers and parameters in conversion
The real magic shows up when your curl command isn't just a bare URL but comes with extra flags and options. For example:
curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_KEY&url=https://example.com&render_js=True" \
-H "Accept: application/json"
Converted Python:
import requests
headers = {
"Accept": "application/json",
}
response = requests.get(
"https://app.scrapingbee.com/api/v1?api_key=YOUR_KEY&url=https://example.com&render_js=True",
headers=headers
)
print(response.text)
How curl flags map to Python:
- `-H` (headers) — becomes a `headers` dict.
- `--data` — turns into the `data` argument for POST requests.
- `--form` — becomes `files` for file uploads.
- Query params like `render_js=True` — can either stay inline in the URL or be moved into a `params` dict.
The converter handles all of this automatically, which makes moving from curl to working Python almost effortless. One paste, one click, and you've got ready-to-run code.
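For example, a curl POST such as `curl -H "Content-Type: application/json" --data '{"a": 1}' https://example.com/api` would translate to something like this (the endpoint and payload are placeholders):
import requests

# Roughly equivalent to:
# curl -H "Content-Type: application/json" --data '{"a": 1}' https://example.com/api
response = requests.post(
    "https://example.com/api",   # placeholder endpoint
    json={"a": 1},               # json= sets the body and the Content-Type header for you
    timeout=30,
)
print(response.status_code, response.text[:200])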
Making curl requests with subprocess
If you want to run curl in Python without installing anything extra, `subprocess` is the way to go. It's part of Python's standard library, so no `pip install` required:
import subprocess
Under the hood, `subprocess` spawns an external process and gives you access to its output. That means you can take any curl command from the docs and execute it directly inside a Python script.
This approach is quick and simple, but there's one important catch: `subprocess` runs real commands on your system. To keep it safe:
- Always pass arguments as a list instead of a raw string (e.g. `["curl", "https://example.com"]`). This avoids quoting issues and prevents injection bugs if a value comes from user input.
- Never run subprocess-based scripts you got from an unknown source without reviewing them first. If you don't understand the logic, don't execute it.
In short: if you want to make curl requests in Python with subprocess, it's a handy tool for quick scripts and testing, but you'll need to be mindful of how you pass arguments. Let's walk through some safe, practical patterns next.
Simple GET request using subprocess
Here's a minimal Python curl request that fetches a page via ScrapingBee using `subprocess`. Note how we pass args as a list:
import subprocess
cmd = [
"curl",
"https://app.scrapingbee.com/api/v1",
"-G", # treat -d as query params (GET)
"-d", "api_key=YOUR_KEY",
"-d", "url=https://example.com"
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)
- We call the system's `curl` from Python using `subprocess`.
- ScrapingBee handles the heavy lifting (proxies, anti-bot, JS rendering if needed), so you don't manage that manually.
- The `-G` flag (same as `--get`) tells curl to send `-d` key-value pairs as query parameters rather than a POST body.
Capturing stdout and stderr
You'll often want both the response body and any error messages. `capture_output=True` grabs both streams, and `text=True` gives you strings instead of bytes:
import subprocess
# create cmd here as before...
result = subprocess.run(cmd, capture_output=True, text=True)
print("Response:", result.stdout)
print("Errors:", result.stderr)
- `capture_output=True` — collects stdout (response body) and stderr (curl warnings/errors).
- `text=True` — decodes bytes to string, so you can print/parse easily.
Handling errors and return codes
By default, `subprocess.run` won't raise an exception when curl fails — you have to check it yourself.
import subprocess
result = subprocess.run(cmd, capture_output=True, text=True)
# Option A: manual check
if result.returncode != 0:
raise RuntimeError(f"curl failed with code {result.returncode}: {result.stderr}")
# Option B: built-in helper (raises CalledProcessError on non-zero)
result.check_returncode()
- `result.returncode` is the exit code from curl (0 = success, non-zero = failure).
- Option A: manually check the code and raise a `RuntimeError` with stderr for context.
- Option B: call `result.check_returncode()` to auto-raise `CalledProcessError` on failure. Of course, you can wrap this line in `try/except`.
If the endpoint returns JSON, parse it and handle API-level errors separately from curl failures:
import json
# Get the result ...
try:
data = json.loads(result.stdout)
except json.JSONDecodeError:
print("Response was not JSON:\n", result.stdout)
else:
if isinstance(data, dict) and "errors" in data:
print("API returned an error payload:")
print(json.dumps(data, indent=2))
else:
print("Success:")
print(json.dumps(data, indent=2))
This pattern ensures your curl in Python calls don't silently swallow failures and gives you a clean place to log, retry, or surface errors.
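If you do want retries on top of this, a small loop around `subprocess.run` is usually enough. Here's a rough sketch; the retry count and backoff values are arbitrary:
import subprocess
import time

cmd = ["curl", "--silent", "--show-error", "https://example.com"]

for attempt in range(3):                      # arbitrary retry budget
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        print(result.stdout[:200])
        break
    print(f"Attempt {attempt + 1} failed: {result.stderr.strip()}")
    time.sleep(2 ** attempt)                  # simple backoff: 1s, 2s, 4s
else:
    raise RuntimeError("curl kept failing after 3 attempts")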
Advanced PycURL usage in Python
PycURL gives you libcurl's power without leaving Python, so it's handy when you need low-level control (timeouts, SSL, cookies) and performance. It's a strong alternative to running curl in Python via `subprocess`, especially for long-running scrapers.
Install it:
pip install pycurl
If pip complains on Linux, you may need system packages for libcurl/OpenSSL, e.g.:
sudo apt-get update
sudo apt-get install -y libcurl4-openssl-dev libssl-dev
Now let's check some examples with PycURL.
Python curl GET request with PycURL
First, let's see how to send a GET request:
import pycurl
from io import BytesIO
from urllib.parse import urlencode
buf = BytesIO()
params = {
"api_key": "YOUR_KEY",
"url": "https://example.com",
"render_js": "True",
}
c = pycurl.Curl()
try:
c.setopt(c.URL, "https://app.scrapingbee.com/api/v1?" + urlencode(params))
c.setopt(c.WRITEFUNCTION, buf.write) # stream response bytes into buffer
c.perform()
status = c.getinfo(pycurl.RESPONSE_CODE)
finally:
c.close()
html = buf.getvalue().decode("utf-8", errors="replace")
print(status, html[:200])
What's happening:
- We build query parameters with `urlencode` and append them to the URL (equivalent to `curl -G -d ...`).
- `WRITEFUNCTION=buf.write` streams the response body into an in-memory buffer.
- `c.perform()` executes the HTTP request; `getinfo(RESPONSE_CODE)` returns the HTTP status.
- We call `close()` in a `finally` block to clean up even on errors.
- Decoding with `errors="replace"` avoids crashes on odd encodings; slicing with `[:200]` gives a quick preview.
Notes:
- If you're using ScrapingBee, toggle JS rendering via the `render_js` boolean parameter (`True` by default). Learn more in the JavaScript Web Scraping API article.
Python curl POST request with form data
Next, let's see how to send POST requests. Note that ScrapingBee proxies forward your HTTP method and body to the target site.
import pycurl
from io import BytesIO
from urllib.parse import urlencode
buf = BytesIO()
base = "https://app.scrapingbee.com/api/v1"
qs = urlencode({"api_key": "YOUR_KEY", "url": "https://httpbin.scrapingbee.com/post"})
c = pycurl.Curl()
c.setopt(c.URL, f"{base}?{qs}")
c.setopt(c.POST, True)
# classic form body (application/x-www-form-urlencoded)
c.setopt(c.POSTFIELDS, "a=1&b=two")
c.setopt(c.WRITEFUNCTION, buf.write)
c.perform()
print(buf.getvalue().decode("utf-8"))
c.close()
What's happening:
- We call ScrapingBee's endpoint and pass the target URL via the `url` query parameter.
- `c.setopt(c.POST, True)` sets the HTTP method to POST.
- `c.POSTFIELDS` sends a URL-encoded form body (like `curl -d "a=1&b=two"`).
- `WRITEFUNCTION=buf.write` captures the response for printing/parsing.
Notes:
- For JSON instead of form data, set headers and pass a JSON string:
import json
c.setopt(c.HTTPHEADER, ["Content-Type: application/json"])
c.setopt(c.POSTFIELDS, json.dumps({"a": 1, "b": "two"}))
Python curl request with headers and cookies
Next, let's cover sending requests with headers and cookies. If you're using ScrapingBee, you'll need to set `forward_headers=true` and prefix headers with `Spb-...` for forwarding to work correctly. Cookies can be sent directly as a normal `Cookie` header.
import pycurl
from io import BytesIO
from urllib.parse import urlencode
buf = BytesIO()
params = {
"api_key": "YOUR_KEY",
"url": "https://httpbin.scrapingbee.com/anything",
"forward_headers": "true",
}
c = pycurl.Curl()
c.setopt(c.URL, "https://app.scrapingbee.com/api/v1?" + urlencode(params))
c.setopt(c.HTTPHEADER, [
"Spb-User-Agent: MyScraper/1.0",
"Spb-X-Test: hello",
"Cookie: sessionid=abc123; theme=dark",
])
c.setopt(c.WRITEFUNCTION, buf.write)
c.perform()
print(buf.getvalue().decode("utf-8"))
c.close()
What's happening:
- `forward_headers=true` — instructs ScrapingBee to forward selected headers to the target.
- Any header starting with `Spb-` (ScrapingBee) is forwarded with the prefix removed (e.g., `Spb-User-Agent` becomes `User-Agent`).
- `Cookie: ...` — can be sent as-is (no `Spb-` needed).
- `HTTPHEADER=[...]` — sets headers in PycURL; `WRITEFUNCTION=buf.write` captures the response.
Notes:
- Only forward the headers you need; avoid leaking secrets.
- Some sites are picky about `User-Agent` and cookies — this pattern helps you mirror a browser-like request while still doing curl-style calls in Python.
Handling redirects with pycurl.FOLLOWLOCATION
Next, let's see how to follow redirects:
import pycurl
from io import BytesIO
from urllib.parse import urlencode
buf = BytesIO()
qs = urlencode({
"api_key": "YOUR_KEY",
"url": "https://httpbin.scrapingbee.com/redirect/2"
})
c = pycurl.Curl()
try:
c.setopt(c.URL, f"https://app.scrapingbee.com/api/v1?{qs}")
c.setopt(c.FOLLOWLOCATION, True) # follow 3xx redirects (like curl -L)
c.setopt(c.MAXREDIRS, 5) # safety cap
c.setopt(c.WRITEFUNCTION, buf.write)
c.perform()
final_status = c.getinfo(pycurl.RESPONSE_CODE)
print("Final status:", final_status)
finally:
c.close()
What's happening:
- `FOLLOWLOCATION=True` — automatically follows `3xx` redirects (same as `curl -L`).
- `MAXREDIRS=5` — prevents infinite redirect loops.
- We pass a target URL that deliberately redirects; ScrapingBee makes the hops and returns the final response.
- `WRITEFUNCTION=buf.write` captures the final body; `RESPONSE_CODE` gives you the last status code.
Downloading files using pycurl
Next, we'll see how to stream to disk and resume if needed (simple HTTP Range pattern):
import os
import pycurl
out_path = "image.png"
url = "https://app.scrapingbee.com/api/v1?api_key=YOUR_KEY&url=https://httpbin.scrapingbee.com/image/png"
# Resume from current size if file exists
resume_from = os.path.getsize(out_path) if os.path.exists(out_path) else 0
mode = "ab" if resume_from else "wb"
with open(out_path, mode) as f:
c = pycurl.Curl()
try:
c.setopt(c.URL, url)
c.setopt(c.FOLLOWLOCATION, True) # handle redirects just in case
if resume_from:
# CURLOPT_RANGE expects "start-" (libcurl adds "bytes=" header)
c.setopt(c.RANGE, f"{resume_from}-")
c.setopt(c.WRITEDATA, f) # stream binary bytes directly to file
c.perform()
status = c.getinfo(pycurl.RESPONSE_CODE)
finally:
c.close()
print("HTTP status:", status)
print("Saved:", out_path)
What's happening:
- We open the file in append mode if it already exists and compute the offset with `os.path.getsize`.
- `CURLOPT_RANGE` takes `"<start>-"` (e.g., `"1024-"`) — you don't include `bytes=`, `libcurl` does that for you.
- `WRITEDATA` streams the response directly to disk (no buffering in RAM, safe for large files).
- `FOLLOWLOCATION=True` is defensive if the URL redirects.
- Works just like `curl -C - -o image.png ...`, but stays fully in Python.
(Plain `curl -o` downloads work too; this pattern keeps everything inside Python. For basic curl downloads, see our guide.)
Sending JSON data with PycURL
Finally, let me show you how to send JSON data:
import json
import pycurl
from io import BytesIO
from urllib.parse import urlencode
buf = BytesIO()
endpoint = "https://app.scrapingbee.com/api/v1"
qs = urlencode({
"api_key": "YOUR_KEY",
"url": "https://httpbin.scrapingbee.com/post"
})
payload = {"name": "Tequila", "role": "Sunshine"}
headers = ["Content-Type: application/json"]
c = pycurl.Curl()
try:
c.setopt(c.URL, f"{endpoint}?{qs}")
c.setopt(c.HTTPHEADER, headers)
# POSTFIELDS sets method to POST and uses the provided string as the request body
c.setopt(c.POSTFIELDS, json.dumps(payload))
c.setopt(c.WRITEFUNCTION, buf.write)
c.perform()
status = c.getinfo(pycurl.RESPONSE_CODE)
finally:
c.close()
# Minimal error handling
body = buf.getvalue().decode("utf-8", errors="replace")
if 200 <= status < 300:
print("OK", status, body[:200])
else:
print("Error", status, body[:400])
What's happening:
- We set `Content-Type: application/json` and pass a JSON-encoded string via `POSTFIELDS` (equivalent to `curl -H "Content-Type: application/json" -d '{"..."}'`).
- ScrapingBee receives the POST and forwards the same method + body to the target URL.
- `WRITEFUNCTION=buf.write` captures the response; `RESPONSE_CODE` lets us branch on success vs. error.
- Basic handling prints a short preview of the response to keep logs readable.
Web scraping with curl and BeautifulSoup
Here's the end-to-end flow many scrapers follow: fetch HTML (we'll use ScrapingBee for stable proxies, optional JS rendering and anti-bot handling), parse with BeautifulSoup, extract structured data, and save as JSON/CSV.
Keep it boringly reliable: always check HTTP status codes, handle missing nodes, and never assume the page structure is stable.
Fetching HTML content using PycURL
This is a small GET helper that pulls down a page's HTML as a string. Unlike `subprocess`, you're not shelling out to an external curl binary; it's all Python via the PycURL bindings.
import pycurl
from io import BytesIO
from urllib.parse import urlencode
def fetch_html(target_url: str, api_key: str) -> str:
buf = BytesIO()
params = {
"api_key": api_key,
"url": target_url,
# tune as needed:
"render_js": "True", # render page JS, should be enabled by default
# "premium_proxy": "True",
# "country_code": "us",
# "timeout": "15000"
}
c = pycurl.Curl()
c.setopt(c.URL, "https://app.scrapingbee.com/api/v1?" + urlencode(params))
c.setopt(c.WRITEFUNCTION, buf.write)
c.perform()
status = c.getinfo(pycurl.RESPONSE_CODE)
c.close()
html = buf.getvalue().decode("utf-8", errors="replace")
if not (200 <= status < 300):
raise RuntimeError(f"Fetch failed: HTTP {status}\n{html[:400]}")
return html
html = fetch_html("https://example.com", "YOUR_KEY")
What's happening:
- We wrap the ScrapingBee request in a `fetch_html` function for reuse.
- Query params (`url`, `api_key`, `render_js`) are URL-encoded and sent with the request.
- `WRITEFUNCTION=buf.write` collects the body into an in-memory buffer.
- After `c.perform()`, we grab the HTTP status via `getinfo(RESPONSE_CODE)`.
- Non-2xx responses raise a `RuntimeError` with a short preview of the body.
- On success, the raw HTML is returned as a Python string, ready to parse with BeautifulSoup.
Parsing DOM with BeautifulSoup
Now that you've got HTML, the next step is turning it into something you can query. For that, you'll need to install BeautifulSoup first:
pip install beautifulsoup4
For large pages, installing lxml (`pip install lxml`) and using it as the parser can speed things up.
So, here's a simple script that finds the page title and some repeated items:
# other imports ...
from bs4 import BeautifulSoup
# fetch_html function ...
def parse_dom(html: str):
soup = BeautifulSoup(html, "html.parser")
# Example targets — adapt selectors to your site:
title_el = soup.select_one("h1, title")
title = title_el.get_text(strip=True) if title_el else None
# List items (e.g., product cards, articles)
cards = soup.select(".card, article, .product") # be flexible
return soup, title, cards
# provide your own URL here:
html = fetch_html("https://example.com", "YOUR_KEY")
soup, title, cards = parse_dom(html)
print("Title:", title)
print("Found cards:", len(cards))
Tips for scraping reliably:
- Prefer stable attributes (like `id` or `data-*`) over fragile class names that might change. Unfortunately, even this guarantees nothing by itself, since websites tend to change rapidly.
- `select_one(".price")` returns `None` if not found — always guard with `if ... else` before calling `.get_text()`.
- For collections, `select(".item")` always returns a list (empty if nothing matches). Safer for loops.
With this, you can safely extract text, attributes (`el["href"]`), or nested content from your ScrapingBee-fetched HTML.
Extracting structured data from HTML
Once you have DOM nodes, the goal is to turn them into structured data. Think of it as a safe loop that won't blow up if a field is missing.
def extract_items(cards):
items = []
for el in cards:
# defensive lookups
name_el = el.select_one(".name, h2, .title")
price_el = el.select_one(".price, [data-price]")
link_el = el.select_one("a[href]")
item = {
"name": name_el.get_text(strip=True) if name_el else None,
"price": price_el.get_text(strip=True) if price_el else None,
"url": link_el["href"] if link_el else None,
}
# only keep items that have at least one field
if any(v for v in item.values()):
items.append(item)
return items
items = extract_items(cards)
print(items[:3])
Now save the results as JSON or CSV with just a few lines:
import json, csv
def save_json(path, data):
with open(path, "w", encoding="utf-8") as f:
json.dump(data, f, ensure_ascii=False, indent=2)
def save_csv(path, rows):
if not rows:
return
with open(path, "w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)
save_json("data.json", items)
save_csv("data.csv", items)
Takeaways:
- Always guard your selectors — for example, don't assume name, price, or link exist for every card.
- Filter out "empty shells" so your dataset doesn't fill up with useless rows.
- The pipeline is simple but robust: fetch → parse → extract → save.
- If the site starts throwing CAPTCHAs or fingerprint checks, adjust your headers/cookies, add retries/delays, and check out ScrapingBee's Anti-Bot Evasion guide.
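Here's one way a retry-plus-delay wrapper might look, reusing the `fetch_html` helper from earlier. The attempt counts, delays, and page URLs are arbitrary placeholders:
import time

def fetch_with_retries(url, api_key, attempts=3, delay=2.0):
    # Retry transient fetch failures with a pause between attempts (values are arbitrary)
    for attempt in range(1, attempts + 1):
        try:
            return fetch_html(url, api_key)
        except RuntimeError as exc:
            print(f"Attempt {attempt} failed: {exc}")
            time.sleep(delay * attempt)        # back off a little more each time
    raise RuntimeError(f"Giving up on {url} after {attempts} attempts")

for page_url in ["https://example.com/page/1", "https://example.com/page/2"]:  # placeholder URLs
    html = fetch_with_retries(page_url, "YOUR_KEY")
    # ... parse_dom / extract_items as above ...
    time.sleep(1.0)                            # be polite between pages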
Start scraping with Python and curl using ScrapingBee
You've now got the full toolkit:
- Subprocess + curl for quick hacks
- PycURL for low-level power
- Requests for clean, maintainable code
All can flow right into BeautifulSoup for parsing.
ScrapingBee handles the hard parts of scraping — rotating proxies, stealth, JavaScript rendering, even screenshots — so you don’t waste time fighting CAPTCHAs or bot blocks. That frees you up to focus on the scraping logic itself: fetching, parsing, and extracting data.
👉 Grab your ScrapingBee free trial, paste your API key into one of the examples, and point it at a real page.
When you're ready to go bigger, check the ScrapingBee Pricing page and scale without headaches.
Conclusion
Using curl in Python isn't one-size-fits-all — you've got options depending on speed, control, and maintainability.
- `subprocess` is the fastest way to drop a curl snippet into your script and see results.
- PycURL gives you curl's full power in Python for more advanced scraping jobs.
- Requests is the Pythonic choice for building clean, reliable web scrapers.
Whichever path you pick, the workflow stays the same: fetch the page, parse with BeautifulSoup, and turn raw HTML into structured data. When websites start throwing up roadblocks (rate limits, CAPTCHAs, or aggressive bot detection) that's when ScrapingBee steps in to handle proxies, JavaScript rendering, and scaling so you don't have to.
With these tools, you can move from copy-pasting curl commands out of API docs to building full Python scrapers that are stable, maintainable, and production-ready.
Ready to try it out? Grab your ScrapingBee API key, run a quick GET example, and you'll have your first Python-powered curl request working in minutes. From there, it's just a short hop to extracting real data and automating the web.
Frequently asked questions
How can I make a curl request in Python?
Two quick approaches:
1. subprocess (fast drop-in)
import subprocess
result = subprocess.run(
["curl", "https://example.com"],
capture_output=True, text=True
)
print(result.stdout)
2. PycURL (native control)
import pycurl
from io import BytesIO
buf = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "https://example.com")
c.setopt(c.WRITEFUNCTION, buf.write)
c.perform()
c.close()
print(buf.getvalue().decode("utf-8"))
For long-term code, convert curl into `requests.get()`/`post()` with params, headers, and error handling.
What are the advantages of using PycURL over other Python HTTP libraries?
PycURL gives you curl's low-level knobs:
- timeouts
- SSL options
- redirects
- upload/download tuning
- efficient streaming
Perfect for heavier scraping jobs or when you need precise control. Trade-off: more verbose than Requests, which is cleaner but less granular.
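Here's a rough sketch of a few of those knobs in PycURL. The values are illustrative, and `certifi` is an extra install (`pip install certifi`) used here only to supply a CA bundle:
import pycurl
import certifi
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "https://example.com")
c.setopt(c.CONNECTTIMEOUT, 10)        # seconds allowed to establish the connection
c.setopt(c.TIMEOUT, 30)               # overall transfer timeout in seconds
c.setopt(c.FOLLOWLOCATION, True)      # follow redirects, like curl -L
c.setopt(c.MAXREDIRS, 5)
c.setopt(c.SSL_VERIFYPEER, 1)         # verify the server certificate
c.setopt(c.SSL_VERIFYHOST, 2)
c.setopt(c.CAINFO, certifi.where())   # point libcurl at a CA bundle
c.setopt(c.WRITEFUNCTION, buf.write)
c.perform()
print(c.getinfo(pycurl.RESPONSE_CODE))
c.close()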
How do I convert a curl command to Python code?
Use a converter like ScrapingBee's Curl Converter or map flags manually:
- `-G`/`-d` → `params`
- `-H` → `headers`
- `--data`/`--data-raw` → `data` (or `json`)
- `--form` → `files`
Keep the same endpoint; move query params into a `params` dict.
Can I use curl for web scraping in Python?
Yes. Fetch with curl (via subprocess or PycURL) and parse with BeautifulSoup:
from bs4 import BeautifulSoup
html = "<html><h1>Hello</h1></html>"
soup = BeautifulSoup(html, "html.parser")
print(soup.h1.text)
For tougher sites (blocks, JavaScript, proxies), use ScrapingBee, add retries/timeouts, and forward needed headers/cookies. Then extract structured data and save to JSON/CSV.
