
Mastering the Python curl request: A practical guide for developers

08 October 2025 | 22 min read

Mastering the Python curl request is one of the fastest ways to turn API docs or browser network calls into working code. Instead of rewriting everything by hand, you can drop curl straight into Python, or translate it into Requests or PycURL for cleaner, long-term projects.

In this guide, we'll show practical ways to run curl in Python, when to use each method (subprocess, PycURL, Requests), and how ScrapingBee improves reliability with proxies and optional JavaScript rendering, so you can ship scrapers that actually work.


Quick answer (TL;DR)

Here's the fastest way to run a Python curl request equivalent powered by ScrapingBee proxies:

import requests

url = "https://app.scrapingbee.com/api/v1"
params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com"
}

response = requests.get(url, params=params, timeout=30)

print("Status:", response.status_code)
print("Body preview:", response.text[:200])

Replace YOUR_API_KEY with your ScrapingBee API key. The code fetches https://example.com and prints the response status along with the first 200 characters of the body.

Prefer literal curl? Use subprocess:

import subprocess

cmd = ["curl", "https://example.com"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)

This approach will be covered in greater detail later in this guide.

Understanding curl and its role in Python

What is curl and how it works

Curl is everywhere. It's the command-line tool devs reach for when they just want to poke a URL and see what happens. Type:

curl https://example.com

and boom: raw HTML floods your screen.

Tack on a few flags and it suddenly levels up from toy hammer to Swiss Army bulldozer. Need headers? JSON payloads? File uploads? Cookies? Binary downloads? Curl does them all, and usually faster than you can remember that one obscure flag (-i? -v? -L?).

At some point, every developer crosses paths with curl. Sometimes while debugging an API, sometimes while lazily copy-pasting a snippet from the docs, and sometimes just because a teammate yelled across the room: "Just curl it, bro!" That ubiquity is why curl has survived since the late 90s and why nearly every API reference still shows a curl command first. It's precise, copyable, and easy to share.

Why this matters for Python folks: most docs give you curl first, so being able to turn a curl request into Python (via subprocess, PycURL, or requests) is a daily superpower, especially when you want to run it through ScrapingBee for real-world scraping.

Fun trivia: the name literally stands for Client + URL. Nothing fancy — just "that thing that talks to URLs."

Common use cases: download, upload, API calls

So what do people actually use curl for in real life? A few classics:

  • Download files: JSON, CSV, PDFs, images, and so on. If it lives on HTTP, curl can fetch it.
  • Upload files: curl speaks multiple protocols (FTP, HTTP, you name it), so sending files up is fair game.
  • API testing: whether it's REST, GraphQL, or even SOAP (yes, somehow still alive), curl makes it dead simple to replay the request exactly as the docs describe it.
  • Web scraping basics: you can fetch a page's source directly, but if you need the heavy-duty stuff — rotating proxies, JavaScript rendering, and anti-bot countermeasures — plug curl-style calls into ScrapingBee's Web Scraping API.

Because curl is open-source and battle-tested, you'll find it everywhere: in shell scripts, inside IoT firmware, baked into enterprise apps, and even running inside cars.

Why use curl in Python scripts

So why even bother running curl inside Python?

  • Quick wins: Already have a curl command from the docs or your browser devtools? Drop it into a Python script with subprocess and you're off to the races. No rewrites, no setup: just instant requests.
  • Testing and one-offs: Great for quick checks — maybe hitting an API through ScrapingBee, or replaying that weird POST request you just copied from the Network tab. Perfect when you don't want to over-engineer.
  • But for real projects... curl shows its limits fast. Larger workflows usually need retries, timeouts, persistent sessions, and HTML parsing. That's where Python libraries like Requests or PycURL shine: you get the same request logic, but with cleaner code, proper error handling, and smooth integration with tools like BeautifulSoup.
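
To make that last point concrete, here's a minimal sketch of a Requests session with automatic retries and a timeout (the retry settings are illustrative, so tune them to your own workload):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient failures a few times with exponential backoff
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[429, 500, 502, 503, 504])

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))

response = session.get("https://example.com", timeout=30)
print(response.status_code)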

Three ways to use curl in Python

There isn't just one way to "do curl" in Python. You've got three main approaches, each with its own vibe: from quick-and-dirty hacks to clean production-ready code.

If you're holding a curl snippet from the docs or devtools, you can either:

  • Run it as is with subprocess
  • Go low-level and stick close to curl itself with PycURL
  • Rewrite it the "Pythonic" way using Requests

Here's the cheat sheet before we dive deeper:

Method | Pros | Cons | Best For
subprocess | Fastest to drop in, works with any existing curl snippet | Messy quoting, no real error handling | Quick hacks, one-off requests
PycURL | Close to curl's power, low-level options, solid performance | Verbose code, steeper learning curve | Heavy jobs, SSL control, fine-tuning
Requests | Clean, "Pythonic", integrates with BeautifulSoup | Not literally curl, may require translating flags | Most real-world projects and scraping

All three approaches get the job done, so it just depends on what kind of job you're doing.

Using subprocess to call curl CLI

So, the simplest way to "use curl in Python" is... well, to literally call curl from Python. If you already have a working curl command (maybe copy-pasted from API docs), you can just drop it into subprocess and capture the output.

Here's the bare-bones version:

import subprocess

cmd = ["curl", "https://example.com"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)

What's happening here:

  • subprocess is a built-in Python module (no install needed) that lets Python run other programs. Here it's just calling the system's curl.
  • We build the command as a list (["curl", "https://example.com"]) to avoid messy quoting issues.
  • subprocess.run executes the command and waits until it finishes, returning a result object.
  • capture_output=True grabs whatever curl prints to stdout/stderr.
  • text=True decodes the output into a regular Python string.

When to use this:

Perfect for quick experiments and one-offs:

  • Replaying a curl snippet from API docs
  • Testing an endpoint without extra setup
  • Sanity-checking network calls straight from Python

Downsides: you don't get strong error handling (check result.returncode), and it's not the cleanest option for bigger projects.

For now, think of it as the fastest shortcut to run curl inside Python.

Using PycURL for native integration

If you want curl-level control but in native Python, PycURL is the way to go. It's a Python interface to libcurl, exposing almost everything curl can do: timeouts, SSL tweaks, POST bodies, headers, cookies, and more.

⚠️ Note: PycURL isn't built into Python. You'll need to install it separately, for example with: pip install pycurl.

Let's check the following sample:

import pycurl
from io import BytesIO

buffer = BytesIO()
c = pycurl.Curl()

c.setopt(c.URL, 'https://example.com')
c.setopt(c.WRITEDATA, buffer)

c.perform()
c.close()

print(buffer.getvalue().decode('utf-8'))

What's happening here:

  • We create a buffer (BytesIO) to capture the response data.
  • pycurl.Curl() gives us a curl object we can configure with options (setopt).
  • c.setopt(c.URL, ...) sets the target URL.
  • c.setopt(c.WRITEDATA, buffer) tells curl to write the response into our buffer instead of printing it.
  • c.perform() executes the request, and c.close() cleans up.
  • Finally, we decode the buffer into a string for printing or parsing.

When to use this:

As you can see, PycURL is more verbose than subprocess, but the trade-off is power and stability. It shines when you need:

  • Fine-grained control over retries or timeouts
  • Working with SSL certificates
  • Performance tuning for bigger jobs

Think of PycURL as the "pro" toolkit: closer to raw curl, but baked right into Python.
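
For instance, timeouts and SSL verification are just extra setopt calls. Here's a minimal sketch (the values are arbitrary):

import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()

c.setopt(c.URL, "https://example.com")
c.setopt(c.WRITEDATA, buf)
c.setopt(c.CONNECTTIMEOUT, 10)   # max seconds to establish the connection
c.setopt(c.TIMEOUT, 30)          # max seconds for the whole transfer
c.setopt(c.SSL_VERIFYPEER, 1)    # verify the server certificate (the default)
c.setopt(c.SSL_VERIFYHOST, 2)    # verify the certificate matches the hostname

c.perform()
c.close()

print(buf.getvalue().decode("utf-8")[:200])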

Using Requests as a curl alternative

Most Python devs eventually land here. The Requests library isn't curl, but it covers almost everything you'd actually need: query parameters, headers, cookies, POST bodies — all with much cleaner, more Pythonic syntax.

⚠️ Note: Requests is not part of the standard library, so don't forget to install it: pip install requests.

Here's a simple example:

import requests

url = "https://example.com"
response = requests.get(url, timeout=30)
print(response.text)

What's happening here:

  • requests.get(url) sends a GET request to the URL.
  • The returned response object holds everything you need:
    • response.text — the response body as a string
    • response.status_code — the HTTP status (e.g. 200, 404)
    • response.headers — the server's headers

When to use this:

If you're building something that goes beyond quick hacks (say an integration, a scraper, or a small internal tool) Requests is usually the sweet spot. It's:

  • Readable: no quoting mess, no boilerplate.
  • Flexible: easy to add headers, cookies, authentication.
  • Extensible: plays nicely with libraries like BeautifulSoup for HTML parsing.

Think of Requests as the "Python-native curl." It may not be literally curl under the hood, but for day-to-day API calls and web scraping tasks, it's probably the friendliest option.
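
As a slightly fuller sketch (the header and query values below are just placeholders), here's how headers, query params, a timeout, and basic error handling look with Requests:

import requests

headers = {"Accept": "text/html", "User-Agent": "MyScraper/1.0"}  # placeholder values
params = {"q": "demo"}                                            # placeholder query string

response = requests.get("https://example.com", headers=headers, params=params, timeout=30)
response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses

print(response.status_code)
print(response.text[:200])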

Using ScrapingBee proxies

In the next sections we'll be sending HTTP requests to different services. That's cool, but here's the catch: many websites don't really like automated traffic. They might rate limit you, throw CAPTCHAs, or flat out block your IP.

The usual workaround is to set up and rotate proxies, but managing them yourself is a headache: finding fresh ones, dealing with expired IPs, configuring authentication... not exactly fun.

This is where ScrapingBee comes in. Instead of babysitting proxies, you just send your request to ScrapingBee and let it handle:

  • IP rotation and geolocation
  • Bypassing rate limits and bot checks
  • Optional JavaScript rendering when the page needs it

To follow along with the examples, you can:

  • Sign up for a free trial. You'll get 1000 free credits (enough for testing).
  • Grab your API key from the dashboard.
  • Use the HTML Request Builder to generate ready-to-go Python code if you want a jumpstart.

We'll plug ScrapingBee into our Python curl and requests examples shortly, so you'll see how it all fits together.

Using ScrapingBee Curl converter to quickly convert commands

Ever copy a curl snippet from API docs and then waste 15 minutes figuring out how to rewrite it in Python? That's exactly the pain ScrapingBee's Curl Converter solves.

You paste in your curl command, and it instantly gives you equivalent code in Python Requests, JS, Go, and more. No more guessing which curl flag maps to which method, no boilerplate hunting. If you bounce between curl-heavy docs and Python scripts, this tool is basically free productivity.

Example: Convert curl GET to requests.get()

Imagine you've got this curl snippet for ScrapingBee:

curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_KEY&url=https://example.com"

Drop it into the converter, and it spits out clean Python code you can use right away:

import requests

url = "https://app.scrapingbee.com/api/v1"
params = {
  "api_key": "YOUR_KEY",
  "url": "https://example.com"
}
response = requests.get(url, params=params)
print(response.text)

It's the same request, just instantly Pythonized. Yeah, Pythonized... Let's call that a word.

Handling headers and parameters in conversion

The real magic shows up when your curl command isn't just a bare URL but comes with extra flags and options. For example:

curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_KEY&url=https://example.com&render_js=True" \
  -H "Accept: application/json"

Converted Python:

import requests

headers = {
    "Accept": "application/json",
}

response = requests.get(
    "https://app.scrapingbee.com/api/v1?api_key=YOUR_KEY&url=https://example.com&render_js=True",
    headers=headers
)

print(response.text)

How curl flags map to Python:

  • -H (headers) — becomes a headers dict.
  • --data — turns into the data argument for POST requests.
  • --form — becomes files for file uploads.
  • Query params like render_js=True — can either stay inline in the URL or be moved into a params dict.

The converter handles all of this automatically, which makes moving from curl to working Python almost effortless. One paste, one click, and you've got ready-to-run code.
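
If you ever need to do the mapping by hand, here's a rough Requests sketch of those flags (httpbin.org stands in as a test target, and the payload values are placeholders):

import requests

# -H "Accept: application/json"  ->  headers dict
headers = {"Accept": "application/json"}

# --data "a=1&b=two"  ->  data argument (form-encoded POST body)
form_response = requests.post("https://httpbin.org/post", headers=headers,
                              data={"a": "1", "b": "two"}, timeout=30)

# --form  ->  files argument (multipart upload); a small in-memory file stands in for a real one
files = {"file": ("hello.txt", b"hello world")}
upload_response = requests.post("https://httpbin.org/post", files=files, timeout=30)

print(form_response.status_code, upload_response.status_code)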

Making curl requests with subprocess

If you want to run curl in Python without installing anything extra, subprocess is the way to go. It's part of Python's standard library, so no pip install required:

import subprocess

Under the hood, subprocess spawns an external process and gives you access to its output. That means you can take any curl command from the docs and execute it directly inside a Python script.

This approach is quick and simple, but there's one important catch: subprocess executes real commands on your system (and, with shell=True, entire shell command lines). To keep it safe:

  • Always pass arguments as a list instead of a raw string (e.g. ["curl", "https://example.com"]). This avoids quoting issues and prevents injection bugs if a value comes from user input.
  • Never run subprocess-based scripts you got from an unknown source without reviewing them first. If you don't understand the logic, don't execute it.
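
To illustrate the first point, here's a minimal sketch of passing a user-supplied value safely as its own list element (the URL here is a stand-in for real user input):

import subprocess

user_url = "https://example.com"  # pretend this value came from user input

# Safe: each argument is its own list element, so nothing is interpreted by a shell.
# Building one big command string with shell=True would expose you to quoting and injection issues.
cmd = ["curl", "--silent", user_url]

result = subprocess.run(cmd, capture_output=True, text=True)
print(result.returncode)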

In short: if you want to make curl requests in Python with subprocess, it's a handy tool for quick scripts and testing, but you'll need to be mindful of how you pass arguments. Let's walk through some safe, practical patterns next.

Simple GET request using subprocess

Here's a minimal Python curl request that fetches a page via ScrapingBee using subprocess. Note how we pass args as a list:

import subprocess

cmd = [
    "curl",
    "https://app.scrapingbee.com/api/v1",
    "-G", # treat -d as query params (GET)
    "-d", "api_key=YOUR_KEY",
    "-d", "url=https://example.com"
]

result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)

  • We call the system's curl from Python using subprocess.
  • ScrapingBee handles the heavy lifting (proxies, anti-bot, JS rendering if needed), so you don't manage that manually.
  • The -G flag (same as --get) tells curl to send -d key-value pairs as query parameters rather than a POST body.

Capturing stdout and stderr

You'll often want both the response body and any error messages. capture_output=True grabs both streams, and text=True gives you strings instead of bytes:

import subprocess

# create cmd here as before...

result = subprocess.run(cmd, capture_output=True, text=True)

print("Response:", result.stdout)
print("Errors:", result.stderr)

  • capture_output=True — collects stdout (response body) and stderr (curl warnings/errors).
  • text=True — decodes bytes to string, so you can print/parse easily.

Handling errors and return codes

By default, subprocess.run won't raise an exception when curl fails — you have to check it yourself.

import subprocess

result = subprocess.run(cmd, capture_output=True, text=True)

# Option A: manual check
if result.returncode != 0:
    raise RuntimeError(f"curl failed with code {result.returncode}: {result.stderr}")

# Option B: built-in helper (raises CalledProcessError on non-zero)
result.check_returncode()

  • result.returncode is the exit code from curl (0 = success, non-zero = failure).
  • Option A: manually check the code and raise a RuntimeError with stderr for context.
  • Option B: call result.check_returncode() to auto-raise CalledProcessError on failure. Of course, you can wrap this call in try/except.
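
For completeness, here's a minimal sketch of that try/except wrapper (the curl command is just an example; --fail makes curl exit non-zero on HTTP errors):

import subprocess

cmd = ["curl", "--fail", "https://example.com"]

result = subprocess.run(cmd, capture_output=True, text=True)

try:
    result.check_returncode()
except subprocess.CalledProcessError as error:
    # error.returncode holds curl's exit code; stderr usually explains what went wrong
    print("curl failed:", error.returncode, result.stderr)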

If the endpoint returns JSON, parse it and handle API-level errors separately from curl failures:

import json

# Get the result ...

try:
    data = json.loads(result.stdout)
except json.JSONDecodeError:
    print("Response was not JSON:\n", result.stdout)
else:
    if isinstance(data, dict) and "errors" in data:
        print("API returned an error payload:")
        print(json.dumps(data, indent=2))
    else:
        print("Success:")
        print(json.dumps(data, indent=2))

This pattern ensures your curl in Python calls don't silently swallow failures and gives you a clean place to log, retry, or surface errors.

Advanced PycURL usage in Python

PycURL gives you libcurl's power without leaving Python so it's handy when you need low-level control (timeouts, SSL, cookies) and performance. It's a strong alternative to curl in Python via subprocess, especially for long-running scrapers.

Install it:

pip install pycurl

If pip complains on Linux, you may need system packages for libcurl/OpenSSL, e.g.:

sudo apt-get update
sudo apt-get install -y libcurl4-openssl-dev libssl-dev

Now let's check some examples with PycURL.

Python curl GET request with PycURL

First, let's see how to send a GET request:

import pycurl
from io import BytesIO
from urllib.parse import urlencode

buf = BytesIO()
params = {
    "api_key": "YOUR_KEY",
    "url": "https://example.com",
    "render_js": "True",
}

c = pycurl.Curl()
try:
    c.setopt(c.URL, "https://app.scrapingbee.com/api/v1?" + urlencode(params))
    c.setopt(c.WRITEFUNCTION, buf.write)   # stream response bytes into buffer
    c.perform()
    status = c.getinfo(pycurl.RESPONSE_CODE)
finally:
    c.close()

html = buf.getvalue().decode("utf-8", errors="replace")
print(status, html[:200])

What's happening:

  • We build query parameters with urlencode and append them to the URL (equivalent to curl -G -d ...).
  • WRITEFUNCTION=buf.write streams the response body into an in-memory buffer.
  • c.perform() executes the HTTP request; getinfo(RESPONSE_CODE) returns the HTTP status.
  • We call close() in a finally block to clean up even on errors.
  • Decoding with errors="replace" avoids crashes on odd encodings; slicing with [:200] gives a quick preview.

Notes:

  • If you're using ScrapingBee, toggle JS rendering via the render_js boolean parameter (True by default). Learn more in the JavaScript Web Scraping API article.

Python curl POST request with form data

Next, let's see how to send POST requests. Note that ScrapingBee proxies forward your HTTP method and body to the target site.

import pycurl
from io import BytesIO
from urllib.parse import urlencode

buf = BytesIO()
base = "https://app.scrapingbee.com/api/v1"
qs = urlencode({"api_key": "YOUR_KEY", "url": "https://httpbin.scrapingbee.com/post"})

c = pycurl.Curl()

c.setopt(c.URL, f"{base}?{qs}")
c.setopt(c.POST, True)
# classic form body (application/x-www-form-urlencoded)
c.setopt(c.POSTFIELDS, "a=1&b=two")
c.setopt(c.WRITEFUNCTION, buf.write)

c.perform()

print(buf.getvalue().decode("utf-8"))

c.close()

What's happening:

  • We call ScrapingBee's endpoint and pass the target URL via the url query parameter.
  • c.POST = True sets the HTTP method to POST.
  • c.POSTFIELDS sends a URL-encoded form body (like curl -d "a=1&b=two").
  • WRITEFUNCTION=buf.write captures the response for printing/parsing.

Notes:

  • For JSON instead of form data, set headers and pass a JSON string:
import json

c.setopt(c.HTTPHEADER, ["Content-Type: application/json"])
c.setopt(c.POSTFIELDS, json.dumps({"a": 1, "b": "two"}))

Python curl request with headers and cookies

Next, let's cover sending requests with headers and cookies. If you're using ScrapingBee, you'll need to set forward_headers=true and prefix headers with Spb-... for forwarding to work correctly. Cookies can be sent directly as a normal Cookie header.

import pycurl
from io import BytesIO
from urllib.parse import urlencode

buf = BytesIO()
params = {
    "api_key": "YOUR_KEY",
    "url": "https://httpbin.scrapingbee.com/anything",
    "forward_headers": "true",
}

c = pycurl.Curl()

c.setopt(c.URL, "https://app.scrapingbee.com/api/v1?" + urlencode(params))
c.setopt(c.HTTPHEADER, [
    "Spb-User-Agent: MyScraper/1.0",
    "Spb-X-Test: hello",
    "Cookie: sessionid=abc123; theme=dark",
])
c.setopt(c.WRITEFUNCTION, buf.write)

c.perform()

print(buf.getvalue().decode("utf-8"))
c.close()

What's happening:

  • forward_headers=true — instructs ScrapingBee to forward selected headers to the target.
  • Any header starting with Spb- (ScrapingBee) is forwarded with the prefix removed (e.g., Spb-User-Agent becomes User-Agent).
  • Cookie: ... — can be sent as-is (no Spb- needed).
  • HTTPHEADER=[...] — sets headers in PycURL; WRITEFUNCTION=buf.write captures the response.

Notes:

  • Only forward the headers you need; avoid leaking secrets.
  • Some sites are picky about User-Agent and cookies — this pattern helps you mirror a browser-like request while still doing curl-style calls in Python.

Handling redirects with pycurl.FOLLOWLOCATION

Next, let's see how to follow redirects:

import pycurl
from io import BytesIO
from urllib.parse import urlencode

buf = BytesIO()
qs = urlencode({
    "api_key": "YOUR_KEY",
    "url": "https://httpbin.scrapingbee.com/redirect/2"
})

c = pycurl.Curl()

try:
    c.setopt(c.URL, f"https://app.scrapingbee.com/api/v1?{qs}")
    c.setopt(c.FOLLOWLOCATION, True)   # follow 3xx redirects (like curl -L)
    c.setopt(c.MAXREDIRS, 5)           # safety cap
    c.setopt(c.WRITEFUNCTION, buf.write)

    c.perform()

    final_status = c.getinfo(pycurl.RESPONSE_CODE)
    print("Final status:", final_status)
finally:
    c.close()

What's happening:

  • FOLLOWLOCATION=True — automatically follows 3xx redirects (same as curl -L).
  • MAXREDIRS=5 — prevents infinite redirect loops.
  • We pass a target URL that deliberately redirects; ScrapingBee makes the hops and returns the final response.
  • WRITEFUNCTION=buf.write captures the final body; RESPONSE_CODE gives you the last status code.

Downloading files using pycurl

Next, we'll see how to stream to disk and resume if needed (simple HTTP Range pattern):

import os
import pycurl

out_path = "image.png"
url = "https://app.scrapingbee.com/api/v1?api_key=YOUR_KEY&url=https://httpbin.scrapingbee.com/image/png"

# Resume from current size if file exists
resume_from = os.path.getsize(out_path) if os.path.exists(out_path) else 0
mode = "ab" if resume_from else "wb"

with open(out_path, mode) as f:
    c = pycurl.Curl()
    try:
        c.setopt(c.URL, url)
        c.setopt(c.FOLLOWLOCATION, True)    # handle redirects just in case

        if resume_from:
            # CURLOPT_RANGE expects "start-" (libcurl adds "bytes=" header)
            c.setopt(c.RANGE, f"{resume_from}-")

        c.setopt(c.WRITEDATA, f)            # stream binary bytes directly to file

        c.perform()
        status = c.getinfo(pycurl.RESPONSE_CODE)
    finally:
        c.close()

print("HTTP status:", status)
print("Saved:", out_path)

What's happening:

  • We open the file in append mode if it already exists and compute the offset with os.path.getsize.
  • CURLOPT_RANGE takes "<start>-" (e.g., "1024-") — you don't include bytes=, libcurl does that for you.
  • WRITEDATA streams the response directly to disk (no buffering in RAM, safe for large files).
  • FOLLOWLOCATION=True is defensive if the URL redirects.
  • Works just like curl -C - -o image.png ..., but stays fully in Python.

(Plain curl -o downloads work too; this pattern keeps everything inside Python. For basic curl downloads, see our guide.)

Sending JSON data with PycURL

Finally, let me show you how to send JSON data:

import json
import pycurl
from io import BytesIO
from urllib.parse import urlencode

buf = BytesIO()
endpoint = "https://app.scrapingbee.com/api/v1"
qs = urlencode({
    "api_key": "YOUR_KEY",
    "url": "https://httpbin.scrapingbee.com/post"
})

payload = {"name": "Tequila", "role": "Sunshine"}
headers = ["Content-Type: application/json"]

c = pycurl.Curl()

try:
    c.setopt(c.URL, f"{endpoint}?{qs}")
    c.setopt(c.HTTPHEADER, headers)
    # POSTFIELDS sets method to POST and uses the provided string as the request body
    c.setopt(c.POSTFIELDS, json.dumps(payload))
    c.setopt(c.WRITEFUNCTION, buf.write)

    c.perform()
    status = c.getinfo(pycurl.RESPONSE_CODE)
finally:
    c.close()

# Minimal error handling
body = buf.getvalue().decode("utf-8", errors="replace")
if 200 <= status < 300:
    print("OK", status, body[:200])
else:
    print("Error", status, body[:400])

What's happening:

  • We set Content-Type: application/json and pass a JSON-encoded string via POSTFIELDS (equivalent to curl -H "Content-Type: application/json" -d '{"..."}').
  • ScrapingBee receives the POST and forwards the same method + body to the target URL.
  • WRITEFUNCTION=buf.write captures the response; RESPONSE_CODE lets us branch on success vs. error.
  • Basic handling prints a short preview of the response to keep logs readable.

Web scraping with curl and BeautifulSoup

Here's the end-to-end flow many scrapers follow: fetch HTML (we'll use ScrapingBee for stable proxies, optional JS rendering and anti-bot handling), parse with BeautifulSoup, extract structured data, and save as JSON/CSV.

Keep it boringly reliable: always check HTTP status codes, handle missing nodes, and never assume the page structure is stable.

Fetching HTML content using PycURL

This is a small GET helper that pulls down a page's HTML as a string. Unlike subprocess, you're not shelling out to an external curl binary; it's all Python via the PycURL bindings.

import pycurl
from io import BytesIO
from urllib.parse import urlencode

def fetch_html(target_url: str, api_key: str) -> str:
    buf = BytesIO()
    params = {
        "api_key": api_key,
        "url": target_url,
        # tune as needed:
        "render_js": "True", # render page JS, should be enabled by default
        # "premium_proxy": "True",
        # "country_code": "us",
        # "timeout": "15000"
    }

    c = pycurl.Curl()
    try:
        c.setopt(c.URL, "https://app.scrapingbee.com/api/v1?" + urlencode(params))
        c.setopt(c.WRITEFUNCTION, buf.write)
        c.perform()
        status = c.getinfo(pycurl.RESPONSE_CODE)
    finally:
        c.close()

    html = buf.getvalue().decode("utf-8", errors="replace")
    if not (200 <= status < 300):
        raise RuntimeError(f"Fetch failed: HTTP {status}\n{html[:400]}")

    return html


html = fetch_html("https://example.com", "YOUR_KEY")

What's happening:

  • We wrap the ScrapingBee request in a fetch_html function for reuse.
  • Query params (url, api_key, render_js) are URL-encoded and sent with the request.
  • WRITEFUNCTION=buf.write collects the body into an in-memory buffer.
  • After c.perform(), we grab the HTTP status via getinfo(RESPONSE_CODE).
  • Non-2xx responses raise a RuntimeError with a short preview of the body.
  • On success, the raw HTML is returned as a Python string, ready to parse with BeautifulSoup.

Parsing DOM with BeautifulSoup

Now that you've got HTML, the next step is turning it into something you can query. For that, you'll need to install BeautifulSoup first:

pip install beautifulsoup4

For large pages, installing lxml (pip install lxml) and using this parser can speed things up.
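
For example, assuming lxml is installed, you only need to swap the parser name when building the soup:

from bs4 import BeautifulSoup

html = "<html><body><h1>Hello</h1></body></html>"
soup = BeautifulSoup(html, "lxml")  # uses the faster lxml parser instead of html.parser

print(soup.h1.get_text())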

So, here's a simple script that finds the page title and some repeated items:

# other imports ...
from bs4 import BeautifulSoup

# fetch_html function ...

def parse_dom(html: str):
    soup = BeautifulSoup(html, "html.parser")

    # Example targets — adapt selectors to your site:
    title_el = soup.select_one("h1, title")
    title = title_el.get_text(strip=True) if title_el else None

    # List items (e.g., product cards, articles)
    cards = soup.select(".card, article, .product")  # be flexible
    return soup, title, cards

# provide your own URL here:
html = fetch_html("https://example.com", "YOUR_KEY")

soup, title, cards = parse_dom(html)

print("Title:", title)
print("Found cards:", len(cards))

Tips for scraping reliably:

  • Prefer stable attributes (like id or data-*) over fragile class names that might change. Even that guarantees nothing on its own, though, since websites change rapidly.
  • select_one(".price") returns None if not found — always guard with if ... else before calling .get_text().
  • For collections, select(".item") always returns a list (empty if nothing matches). Safer for loops.

With this, you can safely extract text, attributes (el["href"]), or nested content from your ScrapingBee-fetched HTML.

Extracting structured data from HTML

Once you have DOM nodes, the goal is to turn them into structured data. Think of it as a safe loop that won't blow up if a field is missing.

def extract_items(cards):
    items = []
    for el in cards:
        # defensive lookups
        name_el = el.select_one(".name, h2, .title")
        price_el = el.select_one(".price, [data-price]")
        link_el = el.select_one("a[href]")

        item = {
            "name":  name_el.get_text(strip=True) if name_el else None,
            "price": price_el.get_text(strip=True) if price_el else None,
            "url":   link_el["href"] if link_el else None,
        }

        # only keep items that have at least one field
        if any(v for v in item.values()):
            items.append(item)

    return items

items = extract_items(cards)
print(items[:3])

Now save the results as JSON or CSV with just a few lines:

import json, csv

def save_json(path, data):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

def save_csv(path, rows):
    if not rows: 
        return

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

save_json("data.json", items)
save_csv("data.csv", items)

Takeaways:

  • Always guard your selectors — for example, don't assume name, price, or link exist for every card.
  • Filter out "empty shells" so your dataset doesn't fill up with useless rows.
  • The pipeline is simple but robust: fetch → parse → extract → save.
  • If the site starts throwing CAPTCHAs or fingerprint checks, adjust your headers/cookies, add retries/delays, and check out ScrapingBee's Anti-Bot Evasion guide.
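
As a starting point for retries and delays, here's a minimal sketch that wraps the fetch_html helper defined earlier (the attempt count and delay are arbitrary):

import time

def fetch_with_retries(target_url: str, api_key: str, attempts: int = 3, delay: float = 2.0) -> str:
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch_html(target_url, api_key)
        except RuntimeError as error:
            last_error = error
            time.sleep(delay * attempt)  # simple linear backoff between attempts
    raise last_error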

Start scraping with Python and curl using ScrapingBee

You've now got the full toolkit:

  • Subprocess + curl for quick hacks
  • PycURL for low-level power
  • Requests for clean, maintainable code

All can flow right into BeautifulSoup for parsing.

ScrapingBee handles the hard parts of scraping — rotating proxies, stealth, JavaScript rendering, even screenshots — so you don’t waste time fighting CAPTCHAs or bot blocks. That frees you up to focus on the scraping logic itself: fetching, parsing, and extracting data.

👉 Grab your ScrapingBee free trial, paste your API key into one of the examples, and point it at a real page.

When you're ready to go bigger, check the ScrapingBee Pricing page and scale without headaches.

Conclusion

Using curl in Python isn't one-size-fits-all — you've got options depending on speed, control, and maintainability.

  • subprocess is the fastest way to drop a curl snippet into your script and see results.
  • PycURL gives you curl's full power in Python for more advanced scraping jobs.
  • Requests is the Pythonic choice for building clean, reliable web scrapers.

Whichever path you pick, the workflow stays the same: fetch the page, parse with BeautifulSoup, and turn raw HTML into structured data. When websites start throwing up roadblocks (rate limits, CAPTCHAs, or aggressive bot detection) that's when ScrapingBee steps in to handle proxies, JavaScript rendering, and scaling so you don't have to.

With these tools, you can move from copy-pasting curl commands out of API docs to building full Python scrapers that are stable, maintainable, and production-ready.

Ready to try it out? Grab your ScrapingBee API key, run a quick GET example, and you'll have your first Python-powered curl request working in minutes. From there, it's just a short hop to extracting real data and automating the web.

Frequently asked questions

How can I make a curl request in Python?

Two quick approaches:

1. subprocess (fast drop-in)

import subprocess

result = subprocess.run(
    ["curl", "https://example.com"],
    capture_output=True, text=True
)

print(result.stdout)

2. PycURL (native control)

import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()

c.setopt(c.URL, "https://example.com")
c.setopt(c.WRITEFUNCTION, buf.write)

c.perform()
c.close()

print(buf.getvalue().decode("utf-8"))

For long-term code, convert curl into requests.get()/post() with params, headers, and error handling.

What are the advantages of using PycURL over other Python HTTP libraries?

PycURL gives you curl's low-level knobs:

  • timeouts
  • SSL options
  • redirects
  • upload/download tuning
  • efficient streaming

Perfect for heavier scraping jobs or when you need precise control. Trade-off: more verbose than Requests, which is cleaner but less granular.

How do I convert a curl command to Python code?

Use a converter like ScrapingBee's Curl Converter or map flags manually:

  • -G / -d → params
  • -H → headers
  • --data / --data-raw → data (or json)
  • --form → files

Keep the same endpoint; move query params into a params dict.

Can I use curl for web scraping in Python?

Yes. Fetch with curl (via subprocess or PycURL) and parse with BeautifulSoup:

from bs4 import BeautifulSoup

html = "<html><h1>Hello</h1></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.h1.text)

For tougher sites (blocks, JavaScript, proxies), use ScrapingBee, add retries/timeouts, and forward needed headers/cookies. Then extract structured data and save to JSON/CSV.

Alexander M

Alexander is a software engineer and technical writer with a passion for everything network related.