How to Scrape IMDb: Step-by-Step with ScrapingBee

30 August 2025 | 9 min read

If you want to learn how to scrape IMDb data, you’re in the right place. This step-by-step tutorial shows you how to extract data, including movie details, ratings, actors, and review dates, using a Python script and ScrapingBee’s API. You’ll see how to set up the required libraries, process the HTML content, and store your results in a CSV file for further analysis.

Why ScrapingBee? Here's the thing: if you want to scrape IMDb data, you need an infrastructure of proxies, JavaScript rendering, and other tools to avoid IP blocks. IMDb is particularly challenging to scrape because of its strict anti-scraping measures. But setting up everything manually costs time and resources.

By using our solution, you gain access to residential proxies, IP rotation, and other tools to scrape IMDb data efficiently. This means you can focus on data extraction and movie analysis rather than worrying about infrastructure.

When I started building web scrapers in Python years ago, tools like that were still scarce and ineffective. Now, with a simple setup, you can quickly scrape IMDb data with no issues. Continue reading to see for yourself.

Quick Answer (TL;DR)

Extracting data from the Internet Movie Database with ScrapingBee is straightforward. You send a request with your API key, a target URL, and extraction rules. The IMDb Scraping API simplifies web scraping and returns clean JSON. You can then export the movie details to a CSV file for easier processing.

Here’s a complete Python script for scraping movie data, including extraction rules for clean JSON output:

import requests
import json

# Step 1: Set your ScrapingBee API Key
API_KEY = 'your_scrapingbee_api_key'

# Step 2: Target the movie URL
url = 'https://www.imdb.com/title/tt1375666/'

# Step 3: Define API parameters with extraction rules
params = {
    'api_key': API_KEY,
    'url': url,
    'premium_proxy': 'true',
    'extract_rules': json.dumps({
        "title": {"selector": "h1", "type": "text"},
        "rating": {"selector": "span[role='img']", "type": "text"},
        "genres": {"selector": "div[data-testid='genres'] a", "type": "list"},
        "summary": {"selector": "span[data-testid='plot-xl']", "type": "text"},
        "director": {"selector": "li[data-testid='title-pc-principal-credit']:first-child a", "type": "text"}
    })
}

# Step 4: Make the API call to ScrapingBee
response = requests.get('https://app.scrapingbee.com/api/v1', params=params)

# Step 5: Check for errors, then parse the response JSON
response.raise_for_status()
data = response.json()

# Step 6: Output results
print(json.dumps(data, indent=2))

# Optional: Save to file
with open('inception.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2)

This Python script extracts the movie title, rating, genre list, plot summary, and director name. The beauty of this approach is that you don’t need to parse HTML content or handle JavaScript yourself; our platform handles both for you.

Scraping IMDb with ScrapingBee

Let's start from the beginning – web scraping the Internet Movie Database can be challenging due to its anti-scraping measures. In my experience, scraping a website directly often results in IP blocks after just a few requests.

I once spent days building a complex web scraping solution with proxy rotation, only to have it break when IMDb updated its site layout. That’s why I now use ScrapingBee to handle JavaScript rendering and the other challenges described here.

Let’s walk through the process step by step:

Step 1: Sign Up for a ScrapingBee Account

First things first, you’ll need to:

  1. Go to ScrapingBee.com

  2. Click Get Started

  3. Register your account and verify your email

  4. Navigate to your dashboard and copy your API key


When I first signed up, I was impressed by how quickly I could get started. The free tier gives you 1,000 API calls to test things out, which is plenty for experimenting with movie database scraping.
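
Rather than pasting the key into every script, one option is to read it from an environment variable. This is a minimal sketch: the variable name SCRAPINGBEE_API_KEY and the load_api_key helper are illustrative choices for this example, not something ScrapingBee requires.

```python
import os


def load_api_key(env_var="SCRAPINGBEE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption for this sketch; any name works.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

After exporting the variable in your shell, you could write API_KEY = load_api_key() in place of the hard-coded string used in the examples here.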

Step 2: Install Required Libraries

You’ll need Python and the requests library, which sends HTTP requests to the ScrapingBee API and returns the response for us to process:

pip install requests

If you’re starting fresh, you can create a new virtual environment:

python -m venv imdb-scraper
source imdb-scraper/bin/activate  # Use `imdb-scraper\Scripts\activate` on Windows
pip install requests

Now, go ahead and create a new file named scrape_imdb.py.

I always recommend using virtual environments for projects. It keeps your dependencies organized and prevents conflicts between different projects.

The ScrapingBee documentation provides excellent examples if you need more guidance on setting up your environment.

Step 3: Scraping a Movie Page

When you want to scrape data, planning what to extract before writing any code saves a lot of time. For movie data, you might need the movie name, rating, genres, director, cast, release date, and plot summary.

I’ll use Inception as an example, since it’s one of my favorites.

We'll start by defining the target URL:

https://www.imdb.com/title/tt1375666/

Typically, when scraping dynamic pages, such as a movie database, you need to handle JavaScript-rendered content; however, ScrapingBee does this automatically.

Now, let's write a basic Python script that connects to the webpage. This gives us the HTML content we need to extract data from:

import requests

API_KEY = 'your_scrapingbee_api_key'
url = 'https://www.imdb.com/title/tt1375666/'

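params = {
    'api_key': API_KEY,
    'url': url,
}

response = requests.get('https://app.scrapingbee.com/api/v1', params=params)
response.raise_for_status()  # Stop here if the API returned an error status

# Save the raw HTML so you can inspect it locally
with open('inception_raw.html', 'w', encoding='utf-8') as f:
    f.write(response.text)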

This script uses the requests library to make an HTTP request to ScrapingBee’s API, which then fetches the IMDb page for us.
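
To see what that call sends over the wire, the sketch below builds the same request URL by hand with the standard library. requests does this encoding for you when you pass params, so build_request_url is a hypothetical helper used only for illustration.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://app.scrapingbee.com/api/v1"


def build_request_url(api_key, target_url):
    """Return the full API request URL with a percent-encoded query string."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(build_request_url("your_scrapingbee_api_key",
                            "https://www.imdb.com/title/tt1375666/"))
```

The IMDb address travels as a single query parameter, which is why its colons and slashes are percent-encoded rather than appended to the API path.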

It's time to extract specific movie details. Let's identify the elements we want to capture:

  • Name: Found in the <h1> tag

  • Rating: Look for span[role='img']

  • Genres: Inside div[data-testid='genres']

  • Runtime

  • Release Date

  • Director(s), Actors, Writers

Step 4: Using Extraction Rules for Clean JSON

One of my favorite features is extraction rules. Instead of parsing HTML content yourself, you can tell the web scraping solution exactly what data you want, and it returns clean JSON. This saves so much time and makes your code much simpler.

The following script uses extraction rules to turn HTML content into structured JSON:

import requests
import json

API_KEY = 'your_scrapingbee_api_key'
url = 'https://www.imdb.com/title/tt1375666/'

params = {
    'api_key': API_KEY,
    'url': url,
    'premium_proxy': 'true',
    'extract_rules': json.dumps({
        "title": {"selector": "h1", "type": "text"},
        "rating": {"selector": "span[role='img']", "type": "text"},
        "genres": {"selector": "div[data-testid='genres'] a", "type": "list"},
        "summary": {"selector": "span[data-testid='plot-xl']", "type": "text"},
        "director": {"selector": "li[data-testid='title-pc-principal-credit']:first-child a", "type": "text"}
    })
}

response = requests.get('https://app.scrapingbee.com/api/v1', params=params)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses
data = response.json()

print(json.dumps(data, indent=2))

The extract_rules parameter is where the magic happens. You define CSS selectors for each piece of data you want, and ScrapingBee extracts it for you. The output looks something like this:

{
  "title": "Inception",
  "rating": "8.8/10",
  "genres": ["Action", "Adventure", "Sci-Fi"],
  "summary": "A thief who steals corporate secrets...",
  "director": "Christopher Nolan"
}
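
The rating comes back as a string, so you may want a number before doing any analysis. Here is a small sketch that assumes the score is the first number in the string, as in "8.8/10"; parse_rating is a hypothetical helper, not part of the API.

```python
import re


def parse_rating(raw):
    """Convert a rating string such as '8.8/10' to a float, or None if absent."""
    if not raw:
        return None
    # Take the first integer or decimal number that appears in the string.
    match = re.search(r"\d+(?:\.\d+)?", raw)
    return float(match.group()) if match else None
```

Returning None for a missing rating keeps unreleased or unrated titles from crashing a batch run.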

You can save this data to a JSON file for analysis:

with open('inception.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2)

If you prefer a CSV file for your data, you can convert the JSON to CSV with Python’s built-in csv module. I often use pandas instead when I need to analyze movie data or perform research.
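
As a rough sketch of the CSV route using only the standard library, the helper below turns a list of extracted movie dictionaries into CSV text. The function name and the choice to join genres with "|" are assumptions made for this example.

```python
import csv
import io

FIELDS = ["title", "rating", "genres", "summary", "director"]


def movies_to_csv_text(movies):
    """Return CSV text with one header row and one row per movie dict."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for movie in movies:
        row = dict(movie)
        # A list does not fit in a single CSV cell, so join genres with "|".
        row["genres"] = "|".join(movie.get("genres") or [])
        writer.writerow(row)
    return buffer.getvalue()
```

To save the result, open the output file with newline='' and encoding='utf-8', then write movies_to_csv_text([data]) to it.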

Final Code Example

Now, let’s put everything together with a complete sample:

import requests
import json

# Replace with your actual ScrapingBee API Key
API_KEY = 'YOUR_SCRAPINGBEE_API_KEY'

url = 'https://www.imdb.com/title/tt1375666/'

params = {
    'api_key': API_KEY,
    'url': url,
    'premium_proxy': 'true',
    'extract_rules': json.dumps({
        "title": {"selector": "h1", "type": "text"},
        "rating": {"selector": "span[role='img']", "type": "text"},
        "genres": {"selector": "div[data-testid='genres'] a", "type": "list"},
        "summary": {"selector": "span[data-testid='plot-xl']", "type": "text"},
        "director": {"selector": "li[data-testid='title-pc-principal-credit']:first-child a", "type": "text"}
    })
}

response = requests.get('https://app.scrapingbee.com/api/v1', params=params)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses
data = response.json()

print(json.dumps(data, indent=2))

# Optional: Save to file
with open('inception.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2)

Example output:

{
  "title": "Inception",
  "rating": "8.8/10",
  "genres": ["Action", "Adventure", "Sci-Fi"],
  "summary": "A thief who steals corporate secrets through the use of dream-sharing technology is given the inverse task of planting an idea into the mind of a CEO.",
  "director": "Christopher Nolan"
}

Hopefully, after following this tutorial, you will get the exact results you need. Now, let's take a look at common challenges of extracting movie data at scale.

Common Issues with IMDb Scraping

In my years of web scraping, I’ve encountered numerous challenges with sites like IMDb. Let’s talk about some common issues and how our platform helps solve them.

First, IMDb heavily relies on JavaScript to render content. If you’re using basic HTTP requests or even Beautiful Soup, you’ll miss a lot of data. ScrapingBee handles JavaScript rendering automatically, so you get the complete page as if you were viewing it in a browser.

Second, IMDb implements anti-scraping measures like IP blocks and CAPTCHAs. I once had a project where I needed to collect data on thousands of movies for a research paper, and my IP got blocked after just 50 requests. Our platform's proxy rotation feature solves this by using different IP addresses for each request.

Third, IMDb’s layout changes frequently. When you’re parsing HTML content directly, these changes can break your scraper. The extraction rules feature is more resilient because each rule targets one specific element with a CSS selector, so a layout change usually means updating a selector rather than rewriting parsing code.
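
Selectors can still stop matching after a redesign, so it may help to check each response for empty fields. The sketch below assumes that a selector which matches nothing yields a missing or empty value in the JSON; find_missing_fields is a hypothetical helper.

```python
REQUIRED_FIELDS = ("title", "rating", "genres", "summary", "director")


def find_missing_fields(data, required=REQUIRED_FIELDS):
    """Return the names of required fields that are absent or empty."""
    return [name for name in required if not data.get(name)]
```

If the returned list is not empty, that is a hint to re-inspect the page and update the matching selectors.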

For more complex data extraction tasks, the platform offers JS Scenarios, which let you automate interactions like clicking buttons or scrolling. This is useful for gathering user reviews or accessing content that loads dynamically as you scroll.
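
As a hedged illustration, the sketch below builds request parameters with a js_scenario that scrolls, clicks a button, and waits. The instruction names (scroll_y, click, wait) reflect my understanding of the JS Scenario format, so check them against the ScrapingBee documentation; the button selector is hypothetical and build_scenario_params is a helper invented for this example.

```python
import json


def build_scenario_params(api_key, url, button_selector):
    """Build request parameters that scroll, click a button, and wait."""
    scenario = {
        "instructions": [
            {"scroll_y": 1000},          # scroll down 1000 pixels
            {"click": button_selector},  # hypothetical "load more" button
            {"wait": 1000},              # pause 1000 ms for new content
        ]
    }
    return {
        "api_key": api_key,
        "url": url,
        "js_scenario": json.dumps(scenario),
    }
```

You would pass the returned dictionary as params to requests.get, just like in the earlier examples.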

Start Scraping IMDb Now with ScrapingBee

Ready to extract data from the IMDb webpage? Try ScrapingBee with 1,000 free API calls – no credit card needed. Skip the proxy and headless browser setup and focus on what matters: the data.

I’ve used the platform for several projects, from analyzing movie trends and searching for new TV series to watch to comparing IMDb ratings with Rotten Tomatoes scores. The time and cost saved on infrastructure alone made it worth it for me.

Whether you’re conducting market research, building a movie recommendation system, or simply exploring movie trends, our platform makes web scraping IMDb accessible and reliable.

Frequently Asked Questions (FAQs)

Is it legal to scrape IMDb?

Web scraping IMDb is generally legal, but you should always check the terms of service. For personal projects and research, it’s usually fine, but commercial web scraping might have restrictions. Our platform helps you scrape responsibly by respecting robots.txt and rate limits.

How do I extract IMDb ratings with ScrapingBee?

Use the extraction rule "rating": {"selector": "span[role='img']", "type": "text"} in your API call when web scraping IMDb. This targets the element containing the rating. You can then process this data for further analysis or store it in a database.

Can ScrapingBee bypass IMDb CAPTCHA or JavaScript?

Yes, ScrapingBee can extract information by handling both CAPTCHAs and JavaScript rendering automatically. Its premium proxies and browser rendering capabilities ensure you get the complete page content without being blocked.

What happens if IMDb blocks my scraper?

With ScrapingBee, this is rarely an issue, as it rotates IPs and utilizes premium proxies. If you do encounter blocks, try reducing your request frequency or contact ScrapingBee support for assistance with optimizing your parameters.
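
One way to reduce request frequency is to retry with a growing pause between attempts. This is a minimal sketch, not an official recommendation: fetch_with_backoff, the default delays, and the choice to treat any non-200 status as retryable are all assumptions.

```python
import time


def fetch_with_backoff(fetch, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns a 200 response or attempts run out.

    fetch is any zero-argument callable returning an object with a
    status_code attribute, such as a wrapped requests.get call.
    """
    response = None
    for attempt in range(max_attempts):
        response = fetch()
        if response.status_code == 200:
            return response
        if attempt < max_attempts - 1:
            # Double the pause each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return response
```

For example, you could pass lambda: requests.get('https://app.scrapingbee.com/api/v1', params=params) as fetch and check the returned status code afterwards.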

Kevin Sahin

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.