Learning how to scrape images from websites is a skill that can unlock real benefits. Whether you’re extracting product photos for competitive analysis, building datasets, or gathering visual content for machine learning projects, you need to know how to scrape.
In this article, I'll walk you through the process of building a website image scraper. But don't worry, you won't have to code everything from scratch. ScrapingBee’s web scraping API lets you automate content collection with minimal technical knowledge. The best part? The technical infrastructure is built in, so you don't need to think about proxies, JavaScript rendering, or other difficulties. Let me show you exactly how it works.
Quick Answer
Let me show you exactly how to scrape images from websites. First, make a request to ScrapingBee’s API with your target URL, using extract_rules to pull image URLs and alt text with CSS selectors. Then enable JavaScript rendering for lazy-loading images, and finish by normalizing the extracted URLs. At this point, you can also download the image files directly.
Here's the full code to get you started:
from scrapingbee import ScrapingBeeClient
import json
import csv
from urllib.parse import urljoin

class ImageScraper:
    def __init__(self, api_key):
        self.client = ScrapingBeeClient(api_key=api_key)

    def scrape_all_images_from_website(self, url):
        """Extract all images from a website page"""
        extract_rules = {
            "images": {
                "selector": "img",
                "type": "list",
                "output": {
                    "src": "@src",
                    "data_src": "@data-src",
                    "alt": "@alt",
                    "title": "@title",
                    "srcset": "@srcset"
                }
            }
        }
        response = self.client.get(
            url,
            params={
                'render_js': 'true',
                'wait': 3000,
                'extract': json.dumps(extract_rules)
            }
        )
        return response.json().get('images', [])

    def normalize_urls(self, base_url, images):
        """Clean and normalize image URLs"""
        normalized = []
        for img in images:
            # Get the best URL (prefer data-src for lazy loading)
            url = img.get('data_src') or img.get('src')
            if not url:
                continue
            # Convert protocol-relative and relative URLs to absolute
            if url.startswith('//'):
                url = 'https:' + url
            elif not url.startswith('http'):
                url = urljoin(base_url, url)
            img['normalized_url'] = url
            normalized.append(img)
        return normalized

    def export_results(self, images, filename='scraped_images.csv'):
        """Export results to CSV"""
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=['normalized_url', 'alt', 'title'])
            writer.writeheader()
            for img in images:
                writer.writerow({
                    'normalized_url': img.get('normalized_url', ''),
                    'alt': img.get('alt', ''),
                    'title': img.get('title', '')
                })

# Usage example
if __name__ == "__main__":
    scraper = ImageScraper('YOUR-API-KEY')
    # Scrape images
    target_url = 'https://example.com'
    raw_images = scraper.scrape_all_images_from_website(target_url)
    # Normalize URLs
    clean_images = scraper.normalize_urls(target_url, raw_images)
    # Export results
    scraper.export_results(clean_images)
    print(f"Scraped {len(clean_images)} images from {target_url}")
This website image scraper approach handles the complexity of modern websites while avoiding common blocking mechanisms that plague traditional scraping methods. If you want to understand exactly what this code does, keep reading and I'll explain every step.
Prerequisites
To create a working website image extractor that can pull images from website sources and organize alt text into structured data formats like JSON or CSV, you'll need:
A ScrapingBee account with API key (free trial available)
Python 3.7+ or Node.js 14+ installed
Basic understanding of CSS selectors
Text editor or IDE of your choice
The image web scraper we’ll build can handle both static and dynamic content, making it suitable for complex websites that load images asynchronously. By the end, you’ll be able to pull images from website pages regardless of their complexity.
How Website Image Scraping Works
Before diving into code, let’s understand how websites structure image content. Most images live inside <img> tags with several key attributes:
src: The primary image URL
alt: Alternative text for accessibility
srcset: Multiple image sizes for responsive design
data-src: Lazy-loading placeholder (common on modern sites)
When you extract images from website pages, you’re essentially parsing these HTML elements and extracting their attributes. The concept differs from downloading – extraction gives you the URLs and metadata, while downloading saves the actual image files.
Websites often use lazy-loading, where images only load when scrolled into view. These images might have placeholder URLs in the src attribute and the real URL in data-src. Understanding this distinction is crucial when you extract photos from website sources, as you might miss content without proper handling.
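To make this concrete, here’s a minimal sketch of how that preference plays out (the attribute names follow the common lazy-loading pattern described above; real sites vary):
# A lazy-loaded image often looks like this in the HTML:
# <img src="placeholder.gif" data-src="https://example.com/real-photo.jpg" alt="Product photo">

def best_image_url(img_attrs):
    """Prefer the lazy-loading data-src attribute, falling back to src."""
    return img_attrs.get('data-src') or img_attrs.get('src')

print(best_image_url({'src': 'placeholder.gif', 'data-src': 'https://example.com/real-photo.jpg'}))
# https://example.com/real-photo.jpg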
Set Up ScrapingBee and Your First Request
Getting started with ScrapingBee is straightforward. First, sign up for an account at their website and grab your API key from the dashboard.
The free tier includes 1,000 requests, perfect for testing.
Python setup
Install the ScrapingBee Python client and make your first request:
pip install scrapingbee
Here’s a minimal example to scrape all images from website pages:
from scrapingbee import ScrapingBeeClient
import json
client = ScrapingBeeClient(api_key='YOUR-API-KEY')
response = client.get(
    'https://example.com',
    params={'render_js': 'true'}
)
print(response.content)
Node setup
For Node.js developers, the setup is equally simple:
npm install scrapingbee
Here’s a minimal example:
const scrapingbee = require('scrapingbee');
const client = new scrapingbee.ScrapingBeeClient('YOUR-API-KEY');
client.get({
    url: 'https://example.com',
    params: {
        'render_js': 'true'
    }
}).then(function(response) {
    // The Node client returns raw bytes; decode before printing
    const text = new TextDecoder().decode(response.data);
    console.log(text);
});
Both examples explicitly enable JavaScript rendering (ScrapingBee turns it on by default), which is essential when you want to extract all images from website pages that load content dynamically.
Extract Image URLs with extract_rules
The web scraping API's extract_rules feature is where the magic happens. Instead of parsing HTML manually, you define JSON rules that specify exactly what data to extract.
Here’s how to extract images from website content:
extract_rules = {
    "images": {
        "selector": "img",
        "type": "list",
        "output": {
            "src": "@src",
            "alt": "@alt",
            "title": "@title"
        }
    }
}

response = client.get(
    'https://example.com',
    params={
        'extract': json.dumps(extract_rules),
        'render_js': 'true'
    }
)

images = response.json()['images']
The @ symbol tells ScrapingBee to extract an attribute value. You can also extract text content or HTML. When you scrape images from website pages this way, you get structured data that’s easy to process further.
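For instance, dropping the @ prefix switches you from attributes to content. A quick sketch, with the figcaption selector purely as an illustration:
caption_rules = {
    "captions": {
        "selector": "figure figcaption",
        "type": "list"
        # With no "output" key, extract_rules should return each element's text
    }
}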
For more complex scenarios, you might want to extract photos from website sections with specific CSS classes:
extract_rules = {
    "product_images": {
        "selector": ".product-gallery img",
        "type": "list",
        "output": "@src"
    },
    "thumbnail_images": {
        "selector": ".thumbnails img",
        "type": "list",
        "output": "@data-src"
    }
}
This method lets you categorize images based on their location in the HTML structure.
Handle Lazy Loading and Dynamic Images
Most websites don’t load all images immediately. They use lazy loading to improve performance, which means you need special handling to grab an image from a website that loads dynamically. Our platform makes this manageable with JavaScript rendering and wait parameters.
Enable JavaScript rendering and add wait conditions:
response = client.get(
    'https://example.com',
    params={
        'render_js': 'true',
        'wait': 3000,  # Wait 3 seconds
        'wait_for': 'img[data-src]',  # Wait for lazy images
        'extract': json.dumps(extract_rules)
    }
)
Sometimes you need to disable resource blocking to ensure all images load:
response = client.get(
    'https://example.com',
    params={
        'render_js': 'true',
        'block_resources': 'false',
        'wait_for': '.image-container img'
    }
)
This image web scraper configuration ensures that JavaScript-dependent images have time to load before extraction begins. The wait_for parameter is particularly useful - it tells ScrapingBee to wait until specific elements appear on the page.
Work with Infinite Scroll or “Load More”
Many image-heavy sites use infinite scroll or “Load More” buttons to display additional content. To extract images from website pages with these patterns, you’ll need to simulate user interactions using ScrapingBee’s js_scenario feature.
Here’s how to handle infinite scroll:
js_scenario = {
    "instructions": [
        {"scroll_y": 2000},
        {"wait": 2000},
        {"scroll_y": 4000},
        {"wait": 2000},
        {"scroll_y": 6000},
        {"wait": 3000}
    ]
}

response = client.get(
    'https://example.com',
    params={
        'render_js': 'true',
        'js_scenario': json.dumps(js_scenario),
        'extract': json.dumps(extract_rules)
    }
)
For “Load More” buttons, you can simulate clicks:
js_scenario = {
    "instructions": [
        {"click": ".load-more-btn"},
        {"wait": 3000},
        {"click": ".load-more-btn"},
        {"wait": 3000}
    ]
}
This approach helps you scrape all images from website pages that don’t show all content initially. The key is finding the right balance between waiting for content to load and not making requests too slow.
Normalize and Clean Extracted URLs
Raw extracted URLs often need cleaning before use. You might encounter relative URLs, multiple sizes in srcset attributes, or URLs with unnecessary parameters. Here’s a Python helper to normalize URLs when you extract images from website sources:
from urllib.parse import urljoin, urlparse
import re
def normalize_image_urls(base_url, raw_urls):
    cleaned_urls = []
    for url in raw_urls:
        if not url:
            continue
        # Handle protocol-relative and relative URLs
        if url.startswith('//'):
            url = 'https:' + url
        elif not url.startswith('http'):
            url = urljoin(base_url, url)
        # Strip query parameters (optional)
        parsed = urlparse(url)
        clean_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
        # Keep only URLs with recognizable image extensions
        if any(clean_url.lower().endswith(ext) for ext in ['.jpg', '.jpeg', '.png', '.gif', '.webp']):
            cleaned_urls.append(clean_url)
    return list(set(cleaned_urls))  # Remove duplicates
For srcset attributes that contain multiple image sizes, you might want to pick the largest:
def extract_best_srcset_url(srcset):
    if not srcset:
        return None
    # Parse srcset: "image1.jpg 300w, image2.jpg 600w, image3.jpg 1200w"
    candidates = []
    for item in srcset.split(','):
        parts = item.strip().split()
        if len(parts) >= 2:
            url = parts[0]
            match = re.search(r'(\d+)w', parts[1])
            width = int(match.group(1)) if match else 0
            candidates.append((url, width))
    # Return the URL with the highest width
    return max(candidates, key=lambda x: x[1])[0] if candidates else None
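Here’s how these helpers behave in practice, a short usage sketch with made-up URLs:
raw = ['//cdn.example.com/a.jpg', '/img/b.png', 'c.webp?v=2', None]
print(normalize_image_urls('https://example.com', raw))
# e.g. ['https://cdn.example.com/a.jpg', 'https://example.com/img/b.png', 'https://example.com/c.webp']
# (order may differ, since deduplication goes through a set)

srcset = "image1.jpg 300w, image2.jpg 600w, image3.jpg 1200w"
print(extract_best_srcset_url(srcset))  # image3.jpg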
This normalization step is crucial when you extract all images from website pages, as it ensures you get usable, absolute URLs.
Download Images with ScrapingBee
ScrapingBee can also function as an image downloader, fetching the actual image files rather than just URLs. This is useful when you need the images locally for processing or storage.
Here’s how to download images directly:
curl "https://app.scrapingbee.com/api/v1/?api_key=YOUR-API-KEY&url=IMAGE-URL&render_js=false" > image.jpg
In Python, you can create a download loop:
import os
from urllib.parse import urlparse

# Assumes `client` is the ScrapingBeeClient created earlier
def download_images(image_urls, download_folder='images'):
    os.makedirs(download_folder, exist_ok=True)
    for i, url in enumerate(image_urls):
        try:
            response = client.get(
                url,
                params={'render_js': 'false'}  # Faster for direct image downloads
            )
            # Generate a filename from the URL path
            parsed_url = urlparse(url)
            filename = os.path.basename(parsed_url.path) or f'image_{i}.jpg'
            filepath = os.path.join(download_folder, filename)
            # Save the binary image data
            with open(filepath, 'wb') as f:
                f.write(response.content)
            print(f"Downloaded: {filename}")
        except Exception as e:
            print(f"Failed to download {url}: {e}")
Keep in mind that the API has a 2MB per-request limit for downloads. For larger images, you might need to use direct HTTP requests after extracting the URLs.
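For those larger files, a plain requests call works once you have the URL. Here’s a minimal sketch that streams the download straight from the image host:
import requests

def download_direct(url, filepath):
    """Fetch an image directly from its host, streaming to disk."""
    response = requests.get(url, timeout=30, stream=True)
    response.raise_for_status()
    with open(filepath, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)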
This website image scraper approach gives you both the flexibility to extract photos from website pages and download them when needed.
Full Python Walkthrough
Let’s put everything together in a complete script that demonstrates how to scrape images from website pages end-to-end:
from scrapingbee import ScrapingBeeClient
import json
import csv
from urllib.parse import urljoin

class ImageScraper:
    def __init__(self, api_key):
        self.client = ScrapingBeeClient(api_key=api_key)

    def scrape_all_images_from_website(self, url):
        """Extract all images from a website page"""
        extract_rules = {
            "images": {
                "selector": "img",
                "type": "list",
                "output": {
                    "src": "@src",
                    "data_src": "@data-src",
                    "alt": "@alt",
                    "title": "@title",
                    "srcset": "@srcset"
                }
            }
        }
        response = self.client.get(
            url,
            params={
                'render_js': 'true',
                'wait': 3000,
                'extract': json.dumps(extract_rules)
            }
        )
        return response.json().get('images', [])

    def normalize_urls(self, base_url, images):
        """Clean and normalize image URLs"""
        normalized = []
        for img in images:
            # Get the best URL (prefer data-src for lazy loading)
            url = img.get('data_src') or img.get('src')
            if not url:
                continue
            # Convert protocol-relative and relative URLs to absolute
            if url.startswith('//'):
                url = 'https:' + url
            elif not url.startswith('http'):
                url = urljoin(base_url, url)
            img['normalized_url'] = url
            normalized.append(img)
        return normalized

    def export_results(self, images, filename='scraped_images.csv'):
        """Export results to CSV"""
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=['normalized_url', 'alt', 'title'])
            writer.writeheader()
            for img in images:
                writer.writerow({
                    'normalized_url': img.get('normalized_url', ''),
                    'alt': img.get('alt', ''),
                    'title': img.get('title', '')
                })

# Usage example
if __name__ == "__main__":
    scraper = ImageScraper('YOUR-API-KEY')
    # Scrape images
    target_url = 'https://example.com'
    raw_images = scraper.scrape_all_images_from_website(target_url)
    # Normalize URLs
    clean_images = scraper.normalize_urls(target_url, raw_images)
    # Export results
    scraper.export_results(clean_images)
    print(f"Scraped {len(clean_images)} images from {target_url}")
This complete solution shows you how to scrape all images from website pages, handle the common challenges, and export results in a usable format.
Node.js Walkthrough (Optional)
For JavaScript developers, here’s a concise async/await implementation of a website image scraper:
const scrapingbee = require('scrapingbee');
const fs = require('fs').promises;

class ImageScraper {
    constructor(apiKey) {
        this.client = new scrapingbee.ScrapingBeeClient(apiKey);
    }

    async extractImagesFromWebsite(url) {
        const extractRules = {
            images: {
                selector: 'img',
                type: 'list',
                output: {
                    src: '@src',
                    dataSrc: '@data-src',
                    alt: '@alt'
                }
            }
        };
        try {
            const response = await this.client.get({
                url: url,
                params: {
                    'render_js': 'true',
                    'wait': 3000,
                    'extract': JSON.stringify(extractRules)
                }
            });
            // The client returns raw bytes; decode and parse the JSON body
            const body = JSON.parse(new TextDecoder().decode(response.data));
            return body.images || [];
        } catch (error) {
            console.error('Scraping failed:', error);
            return [];
        }
    }

    async saveResults(images, filename = 'images.json') {
        await fs.writeFile(filename, JSON.stringify(images, null, 2));
        console.log(`Saved ${images.length} images to ${filename}`);
    }
}

// Usage
(async () => {
    const scraper = new ImageScraper('YOUR-API-KEY');
    const images = await scraper.extractImagesFromWebsite('https://example.com');
    await scraper.saveResults(images);
})();
This Node.js approach provides the same functionality with a more JavaScript-native feel, perfect for developers who want to extract images from website sources using familiar async patterns.
Troubleshooting & Optimization
Even with our platform handling most complexities, you’ll occasionally encounter issues. Here’s how to diagnose and fix common problems when building your image web scraper.
Missing images after render
If your scraper isn’t finding images that you can see in the browser, try these solutions (combined into a single request in the sketch after this list):
Increase the wait time: 'wait': 5000
Use specific wait conditions: 'wait_for': 'img[data-src]'
Check for different selectors: some sites use div elements with background images
Disable resource blocking: 'block_resources': 'false'
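Here’s what those fixes look like rolled into one request, a sketch assuming the client and extract_rules from earlier; trim it down once you know which tweak your target site actually needs:
response = client.get(
    url,
    params={
        'render_js': 'true',
        'wait': 5000,                 # Give slow pages extra time
        'wait_for': 'img[data-src]',  # Block until lazy images exist in the DOM
        'block_resources': 'false',   # Don't strip image and CSS requests
        'extract': json.dumps(extract_rules)
    }
)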
Handling blocks or geo-restrictions
When websites block your requests or show different content based on location:
Enable premium proxies: 'premium_proxy': 'true'
Use country targeting: 'country_code': 'US'
Add custom headers to mimic real browsers
Implement retry logic with exponential backoff (see the sketch below)
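A minimal retry sketch with exponential backoff, assuming client is the ScrapingBeeClient from earlier; tune max_retries and the base delay to your plan’s rate limits:
import time

def get_with_retries(client, url, params, max_retries=3):
    """Retry a request, doubling the delay after each failed attempt."""
    for attempt in range(max_retries):
        response = client.get(url, params=params)
        if response.status_code == 200:
            return response
        time.sleep(2 ** attempt)  # Back off: 1s, 2s, 4s, ...
    raise RuntimeError(f"Request failed after {max_retries} attempts: {url}")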
Broken/relative URLs
If extracted URLs don’t work when you try to pull images from website sources:
Always normalize URLs using urljoin()
Test URL resolution before downloading
Handle edge cases like protocol-relative URLs (//example.com/image.jpg)
Validate URLs before processing (a small helper is sketched below)
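A minimal validation helper, using only the standard library:
from urllib.parse import urlparse

def is_valid_image_url(url):
    """Accept only absolute http(s) URLs that have a host."""
    parsed = urlparse(url)
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc)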
Success Checklist
Here’s your checklist to ensure you can extract images from website sources successfully:
Extraction Setup
ScrapingBee API key configured
Extract rules properly formatted
CSS selectors targeting correct elements
Dynamic Content Handling
JavaScript rendering enabled (render_js: true)
Appropriate wait times configured
Lazy loading detection implemented
URL Processing
Relative URLs converted to absolute
Duplicate URLs removed
Invalid URLs filtered out
Data Export
Results saved in structured format (CSV/JSON)
Image metadata preserved (alt text, titles)
File paths organized logically
This website image scraper checklist ensures you’ve covered all the essential aspects of reliable image extraction.
Use Case: Scrape Google Images
When scraping images from a single site isn’t enough, ScrapingBee’s Google image scraper capabilities help automate image collection directly from Google Images by keyword. This opens up massive datasets for research and analysis, extending your scraper beyond individual websites to search results at scale.
AI and Machine Learning Datasets
Developers building computer vision models need thousands of labeled images for training and validation. A Google image scraper can collect diverse datasets efficiently:
# Assumes `client` is the ScrapingBeeClient created earlier
def scrape_training_data(keywords, images_per_keyword=100):
    """Collect labeled images for ML training"""
    dataset = {}
    for keyword in keywords:
        search_url = f"https://www.google.com/search?q={keyword}&tbm=isch"
        extract_rules = {
            "images": {
                "selector": "img[data-src]",
                "type": "list",
                "output": "@data-src"
            }
        }
        response = client.get(
            search_url,
            params={
                'render_js': 'true',
                'premium_proxy': 'true',
                'extract': json.dumps(extract_rules)
            }
        )
        dataset[keyword] = response.json().get('images', [])[:images_per_keyword]
    return dataset

# Collect animal images for a classification model
animal_keywords = ['cats', 'dogs', 'birds', 'fish', 'horses']
training_data = scrape_training_data(animal_keywords, 200)
This approach creates labeled datasets where the search keyword becomes the image category, perfect for supervised learning projects.
Market and Trend Research
Businesses can track visual trends in fashion, design, or consumer products by analyzing Google Images results. This reveals what visual styles are popular and how they evolve over time:
# scrape_google_images and analyze_image_characteristics are placeholder
# helpers you'd implement for your own analysis pipeline
def analyze_visual_trends(product_category, time_periods):
    """Track visual trends over time"""
    trends = {}
    for period in time_periods:
        query = f"{product_category} {period}"
        # Scrape images and analyze colors, styles, etc.
        images = scrape_google_images(query)
        trends[period] = analyze_image_characteristics(images)
    return trends

# Example: Track sneaker design trends
sneaker_trends = analyze_visual_trends('sneakers', ['2020', '2021', '2022', '2023', '2024'])
Content and Creative Enrichment
Marketers and creators can use scraped image sets for inspiration, mood boards, or non-commercial enrichment. However, always remember copyright compliance. Scraped images should be used for inspiration and analysis rather than direct republication without permission.
Now that you know how to grab images from a website, you have access to vast visual libraries. Responsible use, though, requires understanding intellectual property rights and fair use principles.
Ready to get started?
Building a robust image web scraper doesn’t have to be complicated when you have the right tools. ScrapingBee eliminates the technical headaches of proxy management, JavaScript rendering, and anti-bot measures, letting you focus on extracting the data you need. Whether you’re starting with a simple project or scaling to scrape images from website sources across thousands of pages, ScrapingBee’s infrastructure grows with your needs.
The platform’s flexible pricing starts at just $49/month for 150,000 API calls, with enterprise options available for larger projects. Check out ScrapingBee pricing to find the plan that fits your requirements. Beyond image scraping, explore their complete suite of web scrapers for comprehensive data extraction solutions.
Image Scraping with ScrapingBee - FAQs
Can I scrape all images from a website using ScrapingBee?
Yes, ScrapingBee can scrape all images from a website using extract_rules with CSS selectors like img to target all image elements. The platform handles both static and dynamically loaded images through JavaScript rendering, making it possible to capture comprehensive image datasets from modern websites.
Can I scrape images from any website with ScrapingBee?
ScrapingBee can scrape images from most websites, but success depends on the site’s anti-bot measures and terms of service. The platform’s premium proxies and JavaScript rendering handle most blocking mechanisms, though some sites with aggressive protection may require additional configuration or may prohibit scraping entirely.
How can I avoid being blocked when using a website image scraper?
To avoid blocks, use ScrapingBee’s premium proxies, enable JavaScript rendering, implement reasonable delays between requests, and rotate user agents. The platform automatically handles most anti-bot measures, but respecting rate limits and following website terms of service remains important for sustainable scraping.
How do I grab images from lazy-loading pages?
Enable JavaScript rendering with render_js: true and use wait parameters like wait: 3000 or wait_for: 'img[data-src]' to ensure lazy-loaded images have time to appear. ScrapingBee’s browser rendering executes the JavaScript that triggers lazy loading, making these images accessible for extraction.
Can ScrapingBee download image files directly?
Yes, ScrapingBee can download image files directly by making requests to image URLs with render_js: false for faster processing. The service returns the binary image data, which you can save locally. However, there’s a 2MB per-request limit for direct downloads.
Are there any image download limits?
ScrapingBee has a 2MB per-request limit for direct image downloads. For larger images or bulk downloads, extract the URLs first, then download them separately. Your API plan determines the total number of requests available, with plans ranging from 1,000 to millions of monthly API calls.
