Have you ever tried learning how to scrape Craigslist and run into a wall of CAPTCHAs and IP blocks? Trust me, my first web scraping attempt was just as rocky.
Craigslist is a gold mine of data, containing everything from job ads and housing to items for sale and services. But it's not an easy nut for scraping beginners to crack.
Just like in any other web scraping project, you won't get anywhere without proxy rotation, JavaScript rendering, and solving CAPTCHAs. Fortunately, ScrapingBee handles all of it on autopilot. I think of it as an automated scraping assistant that handles all the technicalities.
But here's the best part: to scrape data from Craigslist with ScrapingBee, all you need to do is set up your environment and write a little code to extract the data you want. I'll walk through every step of the process. So, shall we get started?
Quick Answer (TL;DR)
ScrapingBee lets you scrape Craigslist by fetching any URL with built-in proxy support and automatic JavaScript rendering. With the Craigslist Scraping API and just a few lines of code, you can extract listing titles, prices, and links from search results pages.
Here’s a quick example:
from scrapingbee import ScrapingBeeClient
from bs4 import BeautifulSoup

# Initialize ScrapingBee client
client = ScrapingBeeClient(api_key='YOUR_API_KEY')

# Define the target URL
url = 'https://newyork.craigslist.org/search/sss?query=laptop'

# Make the request
response = client.get(url, params={"render_js": "true"})

# Check for successful response
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract listing data
    listings = soup.select('.result-info')
    for i, listing in enumerate(listings[:3]):  # Limit to first 3 for demo
        title_elem = listing.select_one('.result-title')
        price_elem = listing.select_one('.result-price')

        title = title_elem.text.strip() if title_elem else 'No title'
        price = price_elem.text.strip() if price_elem else 'No price'
        link = title_elem['href'] if title_elem else 'No link'

        print(f"Listing {i+1}")
        print(f"Title: {title}")
        print(f"Price: {price}")
        print(f"Link: {link}")
        print('---')
else:
    print(f"Request failed with status code: {response.status_code}")
Don't forget to replace 'YOUR_API_KEY' with your actual API credentials, and you'll get back Craigslist HTML data.
That's it: with just one API call, you can extract structured data using extract_rules or parse the HTML manually with BeautifulSoup. This approach works reliably even when Craigslist would normally block traditional scrapers.
How to Scrape Craigslist Data with ScrapingBee
Before diving into the code, let’s understand what we’re trying to accomplish. We’ll build a scraper that extracts listing information from Craigslist search results. Meanwhile, our solution will handle the heavy lifting of making requests without getting blocked behind the scenes.
But first things first: before kicking off any Craigslist scraping project, you need to prepare your environment.
Step 1: Install Requests and Set API Key
First, install the necessary libraries and set up your ScrapingBee API key. Start by creating a free account, and you'll receive 1,000 API calls immediately.
Now, it's time to install the following libraries:
# Install required libraries
pip install scrapingbee requests beautifulsoup4

# Import libraries
from scrapingbee import ScrapingBeeClient
import json
from bs4 import BeautifulSoup
Next, initialize the client with your API key:
# Initialize ScrapingBee client
api_key = 'YOUR_API_KEY'
client = ScrapingBeeClient(api_key=api_key)
Replace 'YOUR_API_KEY' with your actual API key from the dashboard. This key authenticates your requests to the ScrapingBee API.
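As a side note, it's safer to keep the key out of your source code entirely. Here's a minimal sketch that reads it from an environment variable instead (the variable name SCRAPINGBEE_API_KEY is just an example; pick whatever fits your setup):

```python
import os

def load_api_key(env_var="SCRAPINGBEE_API_KEY"):
    """Read the API key from the environment instead of hardcoding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before running the scraper")
    return key
```

You'd then initialize the client with `ScrapingBeeClient(api_key=load_api_key())`, and the key never ends up in version control.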
In the next step, I'll show you how to make your first request.
Step 2: Make an API Call to Craigslist
Let’s make your first request to Craigslist. We’ll target a search results page for laptops in New York City:
# Target URL - Craigslist search for laptops in NYC
url = 'https://newyork.craigslist.org/search/sss?query=laptop'

# Make the request
response = client.get(url)

# Check if the request was successful
if response.status_code == 200:
    print('Request successful!')
    html_content = response.content
else:
    print(f'Request failed with status code: {response.status_code}')
In this code, we’re making an HTTP GET request through ScrapingBee to fetch the page’s HTML.
For more parameters and options, check the ScrapingBee documentation.
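For instance, a few commonly used options can be combined in the params dictionary. The values below are illustrative; check the documentation for the authoritative list and defaults:

```python
# Illustrative ScrapingBee request options
params = {
    "render_js": "true",      # execute JavaScript before returning the HTML
    "premium_proxy": "true",  # route the request through premium proxies
    "country_code": "us",     # choose the proxy's country
}

# Passing them along with the request (requires a valid API key):
# response = client.get(url, params=params)
```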
Step 3: Parse HTML Results with BeautifulSoup
Once we have the HTML content, we can use BeautifulSoup to parse it and extract the data we need:
# Parse HTML with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# Find all listing elements
listings = soup.select('.result-info')

# Extract data from each listing
extracted_data = []
for listing in listings:
    # Extract title
    title_element = listing.select_one('.result-title')
    title = title_element.text.strip() if title_element else 'No title'

    # Extract link
    link = title_element['href'] if title_element else 'No link'

    # Extract price
    price_element = listing.select_one('.result-price')
    price = price_element.text.strip() if price_element else 'No price'

    # Store the extracted data
    extracted_data.append({
        'title': title,
        'price': price,
        'link': link
    })

# Print the first few results
for i, data in enumerate(extracted_data[:3]):
    print(f"Listing {i+1}:")
    print(f"Title: {data['title']}")
    print(f"Price: {data['price']}")
    print(f"Link: {data['link']}")
    print("---")
In this code, we’re:
Creating a BeautifulSoup object from the HTML content
Selecting all elements with the class .result-info, which contain the listing information
Extracting the title, link, and price from each listing
Storing the extracted data in a list of dictionaries
Printing the first three results for verification
Understanding the HTML structure of Craigslist pages is essential for targeting the correct elements. The above code targets the typical structure of Craigslist search results, but you might need to adjust the selectors if the structure changes or if you’re targeting different types of pages.
For a more comprehensive understanding of HTML parsing, check out this Python web scraping tutorial.
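One defensive pattern when selectors may change is to try a fallback selector before giving up. The HTML below is a made-up fragment mimicking a result row, and .posting-title is a hypothetical alternative class, but the idea carries over:

```python
from bs4 import BeautifulSoup

# Made-up fragment mimicking a Craigslist result row
html = """
<div class="result-info">
  <a class="result-title" href="https://newyork.craigslist.org/lap/123.html">MacBook Pro</a>
  <span class="result-price">$750</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Try the primary selector first, then a hypothetical fallback, so a
# small markup change doesn't silently return nothing
title = soup.select_one(".result-title") or soup.select_one(".posting-title")
print(title.text if title is not None else "selectors need updating")
```

When the fallback fires (or both fail), that's your signal to revisit the page structure rather than silently collecting empty rows.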
Extract Clean JSON Using extract_rules
While BeautifulSoup works well for parsing HTML, ScrapingBee offers a more elegant solution with extract_rules. This feature allows you to define rules for data extraction and receive a clean, structured JSON response directly.
Here’s how to use extract_rules to scrape Craigslist listings and prepare data for a CSV file:
# Define extraction rules
extract_rules = {
    "listings": {
        "selector": ".result-info",
        "type": "list",
        "output": {
            "title": {
                "selector": ".result-title",
                "type": "text"
            },
            "price": {
                "selector": ".result-price",
                "type": "text"
            },
            "link": {
                "selector": ".result-title",
                "type": "attribute",
                "attribute": "href"
            },
            "date": {
                "selector": ".result-date",
                "type": "text"
            }
        }
    }
}

# Make request with extract_rules
response = client.get(
    url,
    params={
        'extract_rules': json.dumps(extract_rules)
    }
)

# Process the JSON response
if response.status_code == 200:
    try:
        data = json.loads(response.content)
        print(f"Extracted {len(data['listings'])} listings")

        # Print the first few results
        for i, listing in enumerate(data['listings'][:3]):
            print(f"Listing {i+1}:")
            print(f"Title: {listing['title']}")
            print(f"Price: {listing['price']}")
            print(f"Link: {listing['link']}")
            print(f"Date: {listing['date']}")
            print("---")
    except json.JSONDecodeError:
        print("Failed to parse JSON response")
else:
    print(f"Request failed with status code: {response.status_code}")
The script delivers a clean, structured representation of the data, making it easier to process, analyze, or store in a CSV file. This approach is beneficial for Craigslist web scraping projects where you need more control over the data format.
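Once the JSON comes back, writing the listings to a CSV takes only the standard library. The sample rows below are made-up placeholders shaped like the extract_rules output above:

```python
import csv

# Made-up sample rows shaped like the extract_rules response above
listings = [
    {"title": "MacBook Pro 2019", "price": "$750",
     "link": "https://newyork.craigslist.org/lap/123.html", "date": "Jun 12"},
    {"title": "Dell XPS 13", "price": "$420",
     "link": "https://newyork.craigslist.org/lap/456.html", "date": "Jun 11"},
]

# Write one CSV row per listing, with a header row
with open("listings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price", "link", "date"])
    writer.writeheader()
    writer.writerows(listings)

print(f"Wrote {len(listings)} rows to listings.csv")
```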
For more information on data extraction with ScrapingBee, check out our data extraction documentation.
Add Craigslist Filters and Pagination
Craigslist offers various filters to narrow down search results. You can incorporate these filters into your scraping by modifying the URL parameters. Additionally, you can paginate through results to collect more data.
Here’s how to add filters and handle pagination:
from urllib.parse import urlencode

def scrape_craigslist(query, location='newyork', category='sss', min_price=None, max_price=None, page=0):
    """
    Scrape Craigslist with filters and pagination

    Args:
        query (str): Search query
        location (str): Craigslist location subdomain
        category (str): Category code (sss=all, apa=apartments, etc.)
        min_price (int): Minimum price filter
        max_price (int): Maximum price filter
        page (int): Page number (0-based)

    Returns:
        list: Extracted listings
    """
    # Build URL with filters
    base_url = f'https://{location}.craigslist.org/search/{category}'
    params = {'query': query}

    if min_price is not None:
        params['min_price'] = min_price
    if max_price is not None:
        params['max_price'] = max_price
    if page > 0:
        params['s'] = page * 120

    # urlencode handles spaces and special characters in the query
    url = f"{base_url}?{urlencode(params)}"
    print(f"Scraping URL: {url}")

    response = client.get(
        url,
        params={
            'extract_rules': json.dumps(extract_rules)
        }
    )

    if response.status_code == 200:
        try:
            data = json.loads(response.content)
            return data['listings']
        except (json.JSONDecodeError, KeyError):
            print("Failed to parse response")
            return []
    else:
        print(f"Request failed with status code: {response.status_code}")
        return []
This script allows you to:
Specify a search query
Choose a location (Craigslist subdomain)
Select a category
Set price filters
Paginate through results
Craigslist uses the s parameter for pagination, with each page showing 120 results. By incrementing the page number, you can collect data from multiple pages.
When scraping multiple pages, be mindful of rate limits to avoid triggering Craigslist CAPTCHAs or IP bans. While ScrapingBee helps mitigate these issues, it’s still good practice to space out your requests.
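Putting pagination and throttling together, a multi-page loop might look like the sketch below. fetch_page is a stand-in for the scrape_craigslist helper above, and the delay value is an arbitrary example:

```python
import time

def fetch_page(page):
    # Stand-in for scrape_craigslist(query, page=page); pretend only
    # two pages of results exist
    return [{"title": f"demo listing {page}"}] if page < 2 else []

def scrape_pages(max_pages=5, delay=1.5):
    """Collect listings page by page, pausing between requests."""
    results = []
    for page in range(max_pages):
        listings = fetch_page(page)
        if not listings:
            break  # an empty page means we've run out of results
        results.extend(listings)
        time.sleep(delay)  # space out requests to stay polite
    return results
```

With a real fetcher plugged in, raising the delay (or adding random jitter) further reduces the chance of tripping rate limits.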
Why Scraping Craigslist Is Normally Difficult
Scraping Craigslist listings without the proper tools can quickly bring your data collection to a halt. Craigslist protects its data with several anti-scraping techniques that make direct scraping challenging.
Let's take a look at them in detail:
IP rate limiting: Craigslist tracks the number of requests from each IP address and blocks IPs that make too many requests in a short period. That's why it's challenging to scrape Craigslist data at scale without using multiple proxies.
CAPTCHA challenges: When the web page detects suspicious activity, it presents CAPTCHA challenges that automated scrapers struggle to solve.
User-agent monitoring: Craigslist may prevent scraping its public data by detecting suspicious or consistent user-agent strings. For this reason, reliable data scraping tools use rotating user-agents.
Session tracking: The web page may track session cookies and other identifiers to detect and prevent unauthorized scraping.
Temporary IP bans: If it detects scraping activities, the web page may temporarily ban the IP address, making it impossible to extract data from Craigslist.
Avoiding detection while scraping publicly available information requires complex infrastructure. You can't reliably access Craigslist, or any other well-defended website, without the right tooling.
ScrapingBee is built to address these challenges:
Automatic proxy rotation: ScrapingBee uses a pool of the best proxies for scraping and rotates them for each request, preventing IP blocks.
CAPTCHA handling: The service keeps the scraping process running by minimizing CAPTCHA screens.
User-agent rotation: Our Craigslist API rotates user-agent strings to mimic human behavior.
JavaScript rendering: For pages that require JavaScript, ScrapingBee can render the whole page before scraping.
Request throttling: The service manages request rates to avoid triggering anti-bot measures while you scrape.
Start Scraping Craigslist with ScrapingBee
Ready to start scraping Craigslist data? ScrapingBee makes it easy with their powerful API. Here’s how to get started:
Sign up for a free account that includes 1,000 API calls
Get your API key from the dashboard
Install the ScrapingBee client library for your programming language
Start making requests using the code examples in this tutorial
Whether you’re building a price monitoring tool, conducting market research, or need efficient lead generation, ScrapingBee provides a reliable solution for Craigslist scraping. Try ScrapingBee and see how easy it can be to extract data from Craigslist.
Your Path to Successful Craigslist Data Extraction
In this tutorial, I’ve shown you how to scrape using our Craigslist API. We’ve covered everything from making basic requests to extracting structured data. Scraping Craigslist data should no longer feel like a technical challenge.
Whether you’re collecting data for competitor analysis, lead generation, or market research, the techniques in this tutorial will help you scrape data from Craigslist.
Frequently Asked Questions (FAQs)
Can I legally scrape Craigslist?
Web scraping exists in a legal gray area. While accessing public data isn’t inherently illegal, Craigslist’s terms prohibit unauthorized scraping. Consider the purpose of your scraping: personal use is less likely to cause legal consequences than commercial use or harvesting contact details. For large-scale projects, consult professional legal counsel and review the website's terms first.
How often can I scrape Craigslist using ScrapingBee?
ScrapingBee handles rate limiting automatically, but it’s still advisable to implement reasonable delays between requests. For small web scraping projects, 1-2 requests per second is generally safe. For larger datasets, spread requests over time to mimic human behavior.
Does Craigslist have a public API?
No, Craigslist doesn’t offer a public API for developers. They previously had an experimental API but discontinued it. This is why web scraping remains the primary method for accessing Craigslist data. However, ScrapingBee provides a reliable alternative to direct API access.
How do I avoid being blocked when scraping Craigslist?
To scrape data from Craigslist without IP blocks, use residential proxies and automatic IP rotation. Implement reasonable delays between requests, and avoid gathering personal information such as phone numbers. If this seems too challenging, let ScrapingBee handle the technical hurdles for you.

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.