
How to scrape emails from a website with Python and ScrapingBee

14 December 2025 | 25 min read

If you've ever tried to scrape emails from website pages by hand, you know how messy it can get. Some sites hide emails in mailto: links, others bury them in JavaScript, and a few try to obfuscate them entirely. Still, email remains one of the most reliable ways to reach partners, leads, or customers, and having a clean, targeted list can make a huge difference.

The good news: scraping emails doesn't have to be painful. With a bit of Python and ScrapingBee handling the heavy lifting (HTML fetching, JS rendering, anti-bot stuff), you can pull contact info from real pages without juggling proxies or browser automation. And if coding isn't your thing, ScrapingBee also offers no-code and low-code options to get the job done.

In this guide we'll walk through the whole process: how to scrape responsibly, how to fetch and parse pages, how to combine CSS selectors with regex, and how to scale from a single page to a full list of domains. By the end, you'll have a workflow you can reuse anywhere.

💡 Free trial
ScrapingBee gives you 1,000 free API credits to play with when you sign up — perfect for testing your first email-scraping script without any setup stress.

Quick answer (TL;DR)

If you just want a copy-paste script to scrape emails from a website (single page) with Python and ScrapingBee, here you go. Later sections walk through legal checks, CSS selectors, regex, obfuscation, and scaling to many URLs.

# Quick setup (or use uv instead of pip):
# pip install scrapingbee beautifulsoup4 python-dotenv

import os
import re
from dotenv import load_dotenv
from scrapingbee import ScrapingBeeClient
from bs4 import BeautifulSoup

# Load SCRAPINGBEE_API_KEY from .env
load_dotenv()
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")

if not API_KEY:
    raise RuntimeError("Missing SCRAPINGBEE_API_KEY in .env")

# Initialize ScrapingBee client
client = ScrapingBeeClient(api_key=API_KEY)

# Simple email-matching pattern for fallback extraction
EMAIL_REGEX = re.compile(
    r"[a-zA-Z0-9._%+-]+"
    r"@"
    r"[a-zA-Z0-9.-]+"
    r"\.[a-zA-Z]{2,}"
)

def fetch_html(url: str) -> str:
    """Fetch page HTML via ScrapingBee and return decoded text."""
    resp = client.get(
        url,
        params={
            # Enable this if the page loads data with JavaScript:
            # "render_js": "true",
        },
    )
    resp.raise_for_status()
    return resp.content.decode("utf-8", errors="ignore")

def extract_mailto_emails(html: str) -> set[str]:
    """Extract emails from <a href='mailto:...'> links."""
    soup = BeautifulSoup(html, "html.parser")
    emails: set[str] = set()

    for link in soup.select('a[href^="mailto:"]'):
        href = link.get("href", "")

        # Strip "mailto:" prefix
        if href.lower().startswith("mailto:"):
            href = href[len("mailto:"):]

        # Remove query params, e.g. ?subject=Hello
        if "?" in href:
            href = href.split("?", 1)[0]

        email = href.strip().lower()
        if email:
            emails.add(email)

    return emails

def extract_regex_emails(html: str) -> set[str]:
    """Fallback: scan raw HTML with a regex for any email-looking strings."""
    candidates = re.findall(EMAIL_REGEX, html)
    return {email.strip().lower() for email in candidates}

if __name__ == "__main__":
    TARGET_URL = "https://news.ycombinator.com/"

    # Fetch HTML through ScrapingBee
    html = fetch_html(TARGET_URL)

    # Two-layer extraction
    mailto_emails = extract_mailto_emails(html)
    regex_emails = extract_regex_emails(html)

    # Combine results
    all_emails = mailto_emails | regex_emails
    print(all_emails)

This gets you a quick set of emails from one page. From here you can add obfuscation handling, CSV export, and loops over many URLs. If you'd rather let ScrapingBee do most of the contact parsing for you, check out the Contact Scraper API.

Understand the basics of email scraping

Before we touch any code, let's get the idea straight in plain terms. Scraping emails from a website just means grabbing the page's HTML and spotting anything that looks like an email address. Nothing fancy, nothing sneaky. Still, it helps to keep a bit of common sense around purpose, consent, and local laws.

If you ever mix up scraping and crawling, here's a quick refresher: Web scraping vs Web crawling.

What is email scraping?

Email scraping is basically reading a page's HTML and pulling out strings that match an email pattern. That's the whole trick. We're not talking about buying sketchy "email lists" or doing anything shady. Just finding the contact info that's already sitting on a site.

People use this for small, legit tasks—finding a contact for a potential partner, filling in missing fields in a CRM, or pulling a few addresses for targeted outreach.

In the script we'll build later, ScrapingBee grabs the HTML for you, and Python does the pattern-matching. Nice and predictable.

Is email scraping legal?

Short answer: it depends on where you live and what you're doing. Laws like GDPR, CAN-SPAM, and local privacy rules come into play. Site terms of service matter too. Some sites simply don't allow automated collection, and you should respect that.

This isn't legal advice, so treat it as a friendly reminder: scrape emails only when you have a lawful basis and a good reason. Stick to opt-in lists, follow unsubscribe rules, and don't spam people. Keep it clean — this is bowling, not Vietnam, so there are rules.

When is scraping useful for businesses?

There are good use cases. Maybe you want to enrich your CRM with emails that are publicly listed. Maybe you're doing partner research and need the right contact. Maybe your outreach list is small, focused, and tied to a niche you actually work in.

That's fine. What you don't want is scraping the whole internet and blasting random emails. That's how you ruin your domain and annoy half the planet. So:

  • Targeted, relevant lists = good.
  • Huge random spam lists = nope.

Setup and preparations

Before we scrape emails from a website, let's set up the basics. You only need modern Python 3, a project folder, a couple of libraries, and your ScrapingBee API key. Nothing heavy.

If you're new to selecting elements in Python, our guide on How to use CSS Selectors in Python? is a nice warm-up.

Install required libraries

We'll use three things:

  • scrapingbee — talks to the ScrapingBee API for fetching HTML.
  • beautifulsoup4 — helps us parse the HTML.
  • re — built-in Python regex engine for catching email patterns.

You can set this up two ways:

Using uv (recommended):

mkdir email-scraper
cd email-scraper
uv init
uv add scrapingbee beautifulsoup4

Using pip (also fine):

pip install scrapingbee beautifulsoup4

Python 3.9+ is a good baseline here, since the built-in set[str] and list[str] type hints used below need 3.9 or newer.

Create your ScrapingBee account and get your API key

If you're not signed in yet, head to the ScrapingBee sign up page and create an account. You'll get 1,000 free trial credits right away, so you can start testing without any setup drama.

What you really need is your API key. This key is how you authenticate with the API — you drop it into your requests, and ScrapingBee handles the rest. You'll find the key in your dashboard:

ScrapingBee API key

Instead of hardcoding your API key, drop it into a .env file in your project folder. Keeps things clean and out of version control.

Create a file called .env:

SCRAPINGBEE_API_KEY=your_key_here

Now install python-dotenv (uv or pip — both work):

uv add python-dotenv
# or
pip install python-dotenv

In your main.py, read the key like this:

from dotenv import load_dotenv
import os

load_dotenv()  # reads .env in the current folder

API_KEY = os.getenv("SCRAPINGBEE_API_KEY")

if not API_KEY:
    raise RuntimeError("Missing SCRAPINGBEE_API_KEY in .env")

Choose your target websites

Start small. Pick one site that actually shows a public email address: usually a "Contact," "Team," or "About" page. Since you're scraping emails from website content, the niche matters: tighter niche → more relevant results → less noise.

Once you're confident the script works, you can scale up:

  • a short list of URLs in a CSV
  • a handful of domains from your research
  • or even results from a Google Search API workflow

But first, nail one domain. It keeps everything simple.

Understand site structure and email patterns

Before you scrape emails from website pages, take a quick look in DevTools and figure out what you're actually dealing with. You don't need a deep audit, just enough to spot the patterns.

Check for a few things:

  • mailto: links
    Easiest case. You'll grab these with a CSS selector in 1 line.

  • Plain text emails
    Some sites just drop the email into a footer or paragraph. That means regex will do most of the work.

  • Obfuscated formats
    Stuff like info [at] example [dot] com or team(at)domain.com. These need simple replacements before regex can catch them.

  • Hidden or dynamic content
    If the email shows in the browser but not in the page source, the page is rendered with JavaScript. That's your cue to enable render_js=true in ScrapingBee.

  • Multiple emails on one page
    Look for patterns — same class, same container, same section. That helps you write one selector that catches them all cleanly.

A 20-second inspection like this tells you exactly which extraction method to use, so you don't waste time debugging code that was tracking the wrong thing from the start.

If you want a deeper dive into finding specific elements, check out our guide on How to find HTML elements by attribute using DOM Crawler?

Scrape emails from website using Python

Now let's walk through the actual scraping flow step by step. The idea is simple:

  1. Use ScrapingBee to fetch the HTML for a page.
  2. Use CSS selectors to grab obvious mailto: links.
  3. Run a regex on the HTML to catch emails that aren't in links.
  4. Optionally handle simple obfuscation tricks.
  5. Do a tiny bit of validation/cleanup.

To keep things concrete, we'll use https://news.ycombinator.com/ as an example. On the front page, there's a span.yclinks block with a "Contact" link:

<span class="yclinks">
  ...
  <a href="mailto:hn@ycombinator.com">Contact</a>
</span>

We'll show how to grab that email using both CSS selectors and regex. If you want a deeper dive into selectors in general, check out: How to use CSS Selectors in Python?

First layer: grab all mailto: links. This is usually the cleanest source of emails, because the page is literally telling you "this is an email address".

With BeautifulSoup, CSS selectors are straightforward. For the Hacker News example, the "Contact" link lives in span.yclinks, and the email is inside an <a> whose href starts with mailto:. The selector looks like this:

span.yclinks a[href^="mailto:"]

Here's a small helper that:

  • fetches HTML with ScrapingBee
  • parses it with BeautifulSoup
  • extracts and cleans mailto: links

import os
from scrapingbee import ScrapingBeeClient
from bs4 import BeautifulSoup
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")

client = ScrapingBeeClient(api_key=API_KEY)

def fetch_html(url: str) -> str:
    """
    Fetch page HTML via ScrapingBee and return it as a decoded string.
    """
    try:
        response = client.get(
            url,
            params={
                # Uncomment if the page is JS-heavy:
                # "render_js": "true",
            },
        )
        response.raise_for_status()
    except Exception as exc:
        raise RuntimeError(f"Failed to fetch {url}: {exc}") from exc

    return response.content.decode("utf-8", errors="ignore")


def extract_mailto_emails(html: str) -> set[str]:
    """
    Extract emails from mailto: links using CSS selectors.
    """
    soup = BeautifulSoup(html, "html.parser")
    emails: set[str] = set()

    # Grab all <a> tags whose href starts with "mailto:"
    # span.yclinks is specific to Hacker News; the plain a[href^="mailto:"] part works anywhere.
    for link in soup.select('span.yclinks a[href^="mailto:"], a[href^="mailto:"]'):
        href = link.get("href", "")

        # href looks like: "mailto:hn@ycombinator.com?subject=Hello"
        if href.lower().startswith("mailto:"):
            href = href[len("mailto:") :]

        # Strip query string if present
        if "?" in href:
            href = href.split("?", 1)[0]

        email = href.strip()
        if email:
            emails.add(email)

    return emails


if __name__ == "__main__":
    html = fetch_html("https://news.ycombinator.com/")
    mailto_emails = extract_mailto_emails(html)
    print(mailto_emails)

That already covers a lot of real-world pages, since many sites use mailto: links on contact or footer sections.

Using ScrapingBee extract_rules for emails

ScrapingBee can also extract data server-side before the HTML even reaches your Python script. This is super handy when you already know the structure of the page you're scraping. For our Hacker News example, the email lives inside:

<span class="yclinks">
  <a href="mailto:hn@ycombinator.com">Contact</a>
</span>

So we can set up an extract rule that grabs every href starting with mailto: inside span.yclinks.

Let's expand this properly and show how the rule plugs into the params of the ScrapingBee request.

Here's an extract rule that:

  • targets span.yclinks a[href^="mailto:"]
  • returns HTML

{
  "emails": {
    "selector": "span.yclinks a[href^=\"mailto:\"]",
    "output": "html"
  }
}

Now let's use this rule directly in Python inside the request:

# ... imports and token fetching ...

def fetch_email_with_extract_rules(url: str) -> list[str]:
    # Build extract rules for ScrapingBee
    extract_rules = {
        "emails": {
            "selector": 'span.yclinks a[href^="mailto:"]',
            "output": "html",
        }
    }

    try:
        response = client.get(
            url,
            params={
                "extract_rules": extract_rules,
            },
        )
        response.raise_for_status()
    except Exception as exc:
        # You can also log instead of printing
        print(f"[!] Failed to fetch extract_rules for {url}: {exc}")
        return []

    data = response.json()
    # print(data)  # uncomment to inspect the raw extract_rules response
    emails_html = data.get("emails", "")

    if not emails_html:
        return []

    soup = BeautifulSoup(emails_html, "html.parser")
    cleaned: list[str] = []

    for link in soup.select('a[href^="mailto:"]'):
        href = link.get("href", "")

        # mailto:hn@ycombinator.com?subject=Hello
        if href.lower().startswith("mailto:"):
            href = href[len("mailto:") :]

        if "?" in href:
            href = href.split("?", 1)[0]

        email = href.strip()
        if email:
            cleaned.append(email)

    return cleaned


if __name__ == "__main__":
    emails = fetch_email_with_extract_rules("https://news.ycombinator.com/")
    print(emails)

This approach is great when the structure of the site is stable. ScrapingBee does most of the heavy lifting, and your Python code only needs to tidy up the final strings before merging them with regex or obfuscation results.

Use regular expressions to find hidden emails

Not every email sits inside a mailto: link. Some are just plain text in a paragraph or a footer. Others might be in a random div. In those cases, scanning the full HTML with a regex is a solid second layer.

Learn about email regex in our tutorial.

Here's a simple, fairly standard pattern:

import re

EMAIL_REGEX = re.compile(
    r"[a-zA-Z0-9._%+-]+"
    r"@"
    r"[a-zA-Z0-9.-]+"
    r"\.[a-zA-Z]{2,}"
)

In plain language:

  • left side: username part (john.doe+test)
  • @ sign in the middle
  • right side: domain and TLD (example.co.uk, news.ycombinator.com, etc.)

You can make this stricter or looser depending on your use case. Looser patterns find more stuff but also more garbage. Stricter patterns miss edge cases but keep things clean.
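
Because the regex scans raw HTML, it will also match strings that only look like emails. The most common false positive is an asset filename like logo@2x.png (the .png ending satisfies the TLD part of the pattern). Here's a small, optional post-filter you could bolt on; the suffix list is just an assumption, so extend it as you spot new noise in your own results:

# Optional post-filter for regex matches. The blocked suffixes are an
# assumption -- add more as you discover new false positives.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")

def looks_like_real_email(candidate: str) -> bool:
    """Reject strings that match EMAIL_REGEX but are clearly not emails."""
    candidate = candidate.lower()
    # Asset names like "logo@2x.png" match the pattern but aren't addresses
    if candidate.endswith(IMAGE_SUFFIXES):
        return False
    # A plausible address has a non-empty local part and a dot in the domain
    local, _, domain = candidate.partition("@")
    return bool(local) and "." in domain

def filter_email_candidates(candidates: set[str]) -> set[str]:
    """Keep only candidates that pass the sanity checks above."""
    return {c for c in candidates if looks_like_real_email(c)}

# Example: filter_email_candidates({"hn@ycombinator.com", "logo@2x.png"})
# returns {"hn@ycombinator.com"}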

Here's how you can use it on the HTML returned by ScrapingBee:

import re
# ... other imports ...

# ... load API key ...

# ... other code ...

EMAIL_REGEX = re.compile(
    r"[a-zA-Z0-9._%+-]+"
    r"@"
    r"[a-zA-Z0-9.-]+"
    r"\.[a-zA-Z]{2,}"
)

def extract_regex_emails(html: str) -> set[str]:
    """
    Scan raw HTML with a regex as a second layer.
    """
    candidates = re.findall(EMAIL_REGEX, html)
    return {email.strip().lower() for email in candidates}


if __name__ == "__main__":
    html = fetch_html("https://news.ycombinator.com/")
    regex_emails = extract_regex_emails(html)
    print(regex_emails)

Combine this with the mailto: approach:

all_emails = extract_mailto_emails(html) | extract_regex_emails(html)
print(all_emails)

On some sites, regex will find extra emails that aren't linked, for example in footer text or in inline copy.

Handle obfuscated email formats

Some sites try to block bots by obfuscating emails on purpose. Classic examples:

  • info [at] example [dot] com
  • support (at) example.com
  • name at example dot org

We can handle simple versions of this by:

  • Normalizing text: replacing common obfuscation strings
  • Running our email regex again on the cleaned text

Here's a small helper for that:

def normalize_obfuscated(text: str) -> str:
    """
    Replace simple obfuscation patterns like [at] and [dot].
    Adjust rules for your own niche if needed.
    """
    replacements = [
        ("[at]", "@"),
        ("(at)", "@"),
        (" at ", "@"),
        ("[dot]", "."),
        ("(dot)", "."),
        (" dot ", "."),
    ]

    normalized = text
    for old, new in replacements:
        normalized = normalized.replace(old, new)

    return normalized


def extract_obfuscated_emails(html: str) -> set[str]:
    """
    Handle simple obfuscated emails in the HTML text.
    """
    normalized = normalize_obfuscated(html.lower())
    return {email.strip() for email in re.findall(EMAIL_REGEX, normalized)}

And tying everything together:

if __name__ == "__main__":
    html = fetch_html("https://news.ycombinator.com/")

    mailto_emails = extract_mailto_emails(html)
    regex_emails = extract_regex_emails(html)
    obfuscated_emails = extract_obfuscated_emails(html)

    all_emails = mailto_emails | regex_emails | obfuscated_emails
    print(all_emails)

Real talk: if a site goes heavy on obfuscation (weird JavaScript, images instead of text, crazy encodings), that's usually a signal that they really don't want automated collection. Don't brute-force it. Respect the intent, keep it simple, and stay on the right side of "responsible scraping".

Deduplicate and save emails to CSV

Once you glue everything together, it's easy to end up with the same email found three different ways. Let's clean that up and save the result somewhere useful.

First, a tiny helper to normalize and deduplicate emails:

# ... other imports ...
from typing import Iterable, List

# ... load api token and all other code ...

def deduplicate_emails(emails: Iterable[str]) -> List[str]:
    """
    Normalize, deduplicate, and sort emails for stable output.
    """
    normalized = {
        email.strip().lower()
        for email in emails
        if email and "@" in email
    }
    return sorted(normalized)

Now a helper to save them into a CSV file:

# ... other imports ...
import csv
from pathlib import Path

# ... load api token and all other code ...

def save_emails_to_csv(emails: Iterable[str], filename: str = "emails.csv") -> None:
    """
    Save emails into a simple one-column CSV file.
    """
    path = Path(filename)
    emails_list = list(emails)

    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["email"])
        for email in emails_list:
            writer.writerow([email])

    print(f"Saved {len(emails_list)} emails to {path.resolve()}")

And now we wire these into the main block:

if __name__ == "__main__":
    html = fetch_html("https://news.ycombinator.com/")

    mailto_emails = extract_mailto_emails(html)
    regex_emails = extract_regex_emails(html)
    obfuscated_emails = extract_obfuscated_emails(html)

    # Combine everything into one big bag
    combined = (
        list(mailto_emails)
        + list(regex_emails)
        + list(obfuscated_emails)
    )

    # Normalize + dedup
    unique_emails = deduplicate_emails(combined)
    print(unique_emails)

    # Save to CSV
    save_emails_to_csv(unique_emails, "emails_hn.csv")

This way you:

  • grab emails via multiple strategies
  • collapse them into a normalized, sorted, unique list
  • and drop that list into a simple email column in a CSV file you can open in anything (Excel, Sheets, your CRM import, whatever).

Final code version

So, here's the final version of our code:

import os
import csv
import re
from pathlib import Path
from typing import Iterable, List

from bs4 import BeautifulSoup
from dotenv import load_dotenv
from scrapingbee import ScrapingBeeClient

# Load SCRAPINGBEE_API_KEY from .env or env variables
load_dotenv()
API_KEY = os.getenv("SCRAPINGBEE_API_KEY")

if not API_KEY:
    raise RuntimeError(
        "Missing SCRAPINGBEE_API_KEY. "
        "Set it in your .env file or environment variables."
    )

client = ScrapingBeeClient(api_key=API_KEY)

# Basic email regex used across all extraction functions
EMAIL_REGEX = re.compile(
    r"[a-zA-Z0-9._%+-]+"
    r"@"
    r"[a-zA-Z0-9.-]+"
    r"\.[a-zA-Z]{2,}"
)


def fetch_html(url: str) -> str:
    """
    Fetch page HTML via ScrapingBee and return it as a decoded string.
    """
    try:
        response = client.get(
            url,
            params={
                # Uncomment if the page is JS-heavy:
                # "render_js": "true",
            },
        )
        response.raise_for_status()
    except Exception as exc:
        raise RuntimeError(f"Failed to fetch {url}: {exc}") from exc

    return response.content.decode("utf-8", errors="ignore")


def deduplicate_emails(emails: Iterable[str]) -> List[str]:
    """
    Normalize, deduplicate, and sort emails for stable output.
    """
    normalized = {
        email.strip().lower()
        for email in emails
        if email and "@" in email
    }
    return sorted(normalized)


def save_emails_to_csv(emails: Iterable[str], filename: str = "emails.csv") -> None:
    """
    Save emails into a simple one-column CSV file.
    """
    path = Path(filename)
    emails_list = list(emails)

    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["email"])
        for email in emails_list:
            writer.writerow([email])

    print(f"Saved {len(emails_list)} emails to {path.resolve()}")


def extract_mailto_emails(html: str) -> set[str]:
    """
    Extract emails from mailto: links using CSS selectors.
    """
    soup = BeautifulSoup(html, "html.parser")
    emails: set[str] = set()

    # Grab all <a> tags whose href starts with "mailto:"
    # span.yclinks is specific to Hacker News; the plain a[href^="mailto:"] part works anywhere.
    for link in soup.select('span.yclinks a[href^="mailto:"], a[href^="mailto:"]'):
        href = link.get("href", "")

        # href looks like: "mailto:hn@ycombinator.com?subject=Hello"
        if href.lower().startswith("mailto:"):
            href = href[len("mailto:") :]

        # Strip query string if present
        if "?" in href:
            href = href.split("?", 1)[0]

        email = href.strip()
        if email:
            emails.add(email)

    return emails


def extract_regex_emails(html: str) -> set[str]:
    """
    Scan raw HTML with a regex as a second layer.
    """
    candidates = re.findall(EMAIL_REGEX, html)
    return {email.strip().lower() for email in candidates}


def fetch_email_with_extract_rules(url: str) -> List[str]:
    """
    Example: use ScrapingBee extract_rules to grab mailto links server-side,
    then clean the results with BeautifulSoup.
    """
    extract_rules = {
        "emails": {
            "selector": 'span.yclinks a[href^="mailto:"]',
            "output": "html",
        }
    }

    response = client.get(
        url,
        params={
            "extract_rules": extract_rules,
        },
    )

    data = response.json()
    emails_html = data.get("emails", "")

    if not emails_html:
        return []

    soup = BeautifulSoup(emails_html, "html.parser")
    cleaned: List[str] = []

    for link in soup.select('a[href^="mailto:"]'):
        href = link.get("href", "")

        # mailto:hn@ycombinator.com?subject=Hello
        if href.lower().startswith("mailto:"):
            href = href[len("mailto:") :]

        if "?" in href:
            href = href.split("?", 1)[0]

        email = href.strip()
        if email:
            cleaned.append(email)

    return cleaned


def normalize_obfuscated(text: str) -> str:
    """
    Replace simple obfuscation patterns like [at] and [dot].
    Adjust rules for your own niche if needed.
    """
    replacements = [
        ("[at]", "@"),
        ("(at)", "@"),
        (" at ", "@"),
        ("[dot]", "."),
        ("(dot)", "."),
        (" dot ", "."),
    ]

    normalized = text
    for old, new in replacements:
        normalized = normalized.replace(old, new)

    return normalized


def extract_obfuscated_emails(html: str) -> set[str]:
    """
    Handle simple obfuscated emails in the HTML text.
    """
    normalized = normalize_obfuscated(html.lower())
    return {email.strip() for email in re.findall(EMAIL_REGEX, normalized)}


if __name__ == "__main__":
    TARGET_URL = "https://news.ycombinator.com/"

    html = fetch_html(TARGET_URL)

    mailto_emails = extract_mailto_emails(html)
    regex_emails = extract_regex_emails(html)
    obfuscated_emails = extract_obfuscated_emails(html)

    # Combine everything into one big bag
    combined = (
        list(mailto_emails)
        + list(regex_emails)
        + list(obfuscated_emails)
    )

    # Normalize + dedup
    unique_emails = deduplicate_emails(combined)
    print("Found emails:", unique_emails)

    # Save to CSV
    save_emails_to_csv(unique_emails, "emails_hn.csv")
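
Once this single-page version works, scaling up is mostly a loop around the same helpers. Here's a minimal sketch that reuses fetch_html, the extraction helpers, deduplicate_emails, and save_emails_to_csv from the script above; the urls.txt filename, its one-URL-per-line format, and the two-second delay are all assumptions you can swap for your own setup:

import time

def scrape_many(url_file: str = "urls.txt", delay_seconds: float = 2.0) -> None:
    """Scrape every URL listed in url_file and save one combined CSV."""
    urls = [
        line.strip()
        for line in Path(url_file).read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]

    collected: list[str] = []
    for url in urls:
        try:
            html = fetch_html(url)
        except RuntimeError as exc:
            print(f"[!] Skipping {url}: {exc}")
            continue

        collected += extract_mailto_emails(html)
        collected += extract_regex_emails(html)
        collected += extract_obfuscated_emails(html)

        # Small pause between requests to stay polite
        time.sleep(delay_seconds)

    save_emails_to_csv(deduplicate_emails(collected), "emails_all.csv")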

Handle challenges while scraping

Even a simple email-scraping workflow can run into a few bumps once you start testing different sites. Some hide data behind JavaScript. Some trigger bot protection. Some don't want automated collection at all. Let's walk through the common issues and how to stay on the clean, polite side of scraping.

Dealing with JavaScript-rendered pages

Sometimes your script can't find an email that's clearly visible in the browser. That usually means the page loads content after the initial HTML, using JavaScript. Your Python code only sees the raw source, not the version the browser builds.

ScrapingBee can handle this for you. Just ask it to render JavaScript server-side:

html = client.get(
    "https://example.com/contact",
    params={
        "render_js": "true",   # wait for JS to run
    },
).content.decode("utf-8", errors="ignore")

Now your extraction functions will see what your browser sees.
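
Since JS rendering costs extra credits, one possible pattern (just a sketch, not an official recommendation) is to try a plain fetch first and only retry with render_js when nothing email-like shows up. This assumes the client and EMAIL_REGEX defined earlier in this guide:

def fetch_html_smart(url: str) -> str:
    """Fetch without JS rendering first; retry with it only when needed."""
    response = client.get(url, params={})
    response.raise_for_status()
    html = response.content.decode("utf-8", errors="ignore")

    # If the cheap fetch already contains something email-like, stop here
    if EMAIL_REGEX.search(html):
        return html

    # Otherwise pay for a JS-rendered fetch and use that instead
    response = client.get(url, params={"render_js": "true"})
    response.raise_for_status()
    return response.content.decode("utf-8", errors="ignore")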

If you want more detail on this kind of mismatch, we have a quick guide: Scraper doesn't see the data I see in the browser

Avoiding bot detection and CAPTCHAs

Most websites don't love aggressive bots. If you hit them too fast, too often, or from an IP that screams "scraper," you can run into all kinds of annoying stuff:

  • temporary IP blocks
  • rate limiting
  • CAPTCHAs
  • "Access denied" or "Are you human?" pages
  • random redirects to nowhere
  • pages that look empty even though the browser shows the data

When you scrape emails from website pages at any real scale, these blockers become the main headache. The scraping logic is the easy part — staying undetected is what usually eats your time.

ScrapingBee handles most of this for you out of the box:

  • rotating residential and datacenter proxies
  • browser-like fingerprints
  • correct headers and user-agents
  • session handling
  • automatic retries under the hood
  • reliable "looks like a real browser" behavior

All of this means you don't have to build your own proxy pools, tune request headers, or debug why a page suddenly returns a CAPTCHA at midnight.
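
If one particular site still pushes back, you can escalate on a per-request basis. The snippet below is a sketch: premium_proxy and country_code are documented ScrapingBee parameters at the time of writing, but confirm the current names and their credit costs in the ScrapingBee docs before relying on them:

# Sketch: turn on heavier anti-blocking options for a stubborn site.
response = client.get(
    "https://example.com/contact",   # placeholder URL
    params={
        "render_js": "true",      # run JavaScript like a real browser
        "premium_proxy": "true",  # route the request through premium proxies
        "country_code": "us",     # pick the proxy country if it matters
    },
)
response.raise_for_status()
html = response.content.decode("utf-8", errors="ignore")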

Check out our tutorial Web Scraping without getting blocked to learn more.

Respecting robots.txt and rate limits

Good scraping is polite scraping. Before you scrape emails from a website, always check the site's robots.txt — usually at a location like https://example.com/robots.txt. It tells you which sections automated agents should or shouldn't touch, and some sites explicitly forbid harvesting contact details.

A few simple habits go a long way:

  • Stay within the allowed paths in robots.txt.
  • Slow down your request rate — even a short delay shows respect.
  • Don't blast a site with thousands of URLs if it's a tiny blog or portfolio.
  • Avoid scraping login areas, internal dashboards, or anything marked as "no-go."
  • If a site hides emails behind images, JavaScript puzzles, or heavy obfuscation, take the hint and don't try to brute-force your way through.

Following these rules keeps you on the safe, friendly side of scraping — and helps ensure that the sites you rely on remain accessible and unblocked.
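
If you'd rather bake the robots.txt check and the delay into your script, Python's standard library covers both. A minimal sketch, with placeholder URLs and an arbitrary two-second delay:

import time
from urllib import robotparser

# Check robots.txt once, then skip any URL it disallows for generic agents.
parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # placeholder site
parser.read()

urls = [
    "https://example.com/contact",
    "https://example.com/about",
]

for url in urls:
    if not parser.can_fetch("*", url):
        print(f"[robots.txt] Skipping disallowed URL: {url}")
        continue

    html = fetch_html(url)  # the ScrapingBee helper from earlier
    # ... run your extraction functions on html here ...
    time.sleep(2)  # polite pause between requests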

If you want a refresher on the difference between crawling and scraping (and why robots.txt matters), check out: Web Scraping vs Web Crawling.

Explore alternative tools and methods

Once you understand the basic "Scrape → Extract → Clean → Save" cycle in Python, you can branch out into other workflows too. Some are no-code, some are low-code, and some let you scale without touching regex ever again. The main idea: ScrapingBee stays the engine behind the scenes, but you choose the interface that fits your workflow.

Using APIs like ScrapingBee

ScrapingBee already includes purpose-built flows for extracting contact information — emails, social links, phone numbers, you name it. You don't always need to hand-roll CSS selectors or regex if you don't want to.

Two ways to shortcut everything:

  • Contact Info API — a dedicated endpoint that tries to extract emails and other contact data directly from the page.
    More info here: Contact Scraper API
  • Dashboard Request Builder — the fastest way to test things without writing code.
    There's a ready template called "extract email addresses".
    Drop the URL in, tweak the options, hit Run, and you get JSON with the contacts already parsed.

This is perfect if you want results quickly without maintaining your own selectors or extraction logic.

Scraping emails with Google Sheets

If your workflow lives in spreadsheets (small lead lists, quick research tasks, internal ops), you can absolutely scrape emails straight from Google Sheets — no Python required. It's a good fit when you only have a few dozen or a few hundred URLs and want something quick and shareable.

There are two main ways to do it.

Use the ScrapingBee API with Google Apps Script

This is the cleanest approach because ScrapingBee handles all the heavy lifting: JavaScript rendering, anti-bot challenges, and consistent HTML fetching. You just wire up a tiny Apps Script function that calls the API and returns the extracted emails.

The flow looks like this:

  1. Put your target URLs in column A.
  2. Write a short Apps Script function that sends each URL to ScrapingBee (using your API key).
  3. ScrapingBee returns JSON — either your raw HTML, extracted contact info, or both.
  4. Parse that JSON into cells (one row per URL, or one email per row).

This gives non-developers a painless way to scrape emails from website pages without worrying about CSS selectors or browser issues. And because ScrapingBee handles rendering, it works even on dynamic sites where Google Sheets' native functions would fail.

It's suitable for small campaigns, quick research tasks, and internal ops teams that prefer Sheets over code. For thousands of URLs or automated refreshes, switching to Python will be easier to scale.

Learn more in the How to scrape websites with Google Sheets article.

Use built-in Google Sheets scraping functions (IMPORTXML, IMPORTDATA, REGEXEXTRACT)

Google Sheets also has native scraping formulas you can use when the page is simple and fully static. This works best when the email is publicly listed in plain HTML and doesn't require JavaScript rendering.

Using IMPORTXML() to extract mailto: links:

=IMPORTXML("https://www.pinchchinese.com/", "//a[starts-with(@href, 'mailto:')]/@href")

If you're new to XPath, this guide helps: Practical XPath for Web Scraping.

Using regex with Sheets:

You can also extract emails using REGEXEXTRACT() combined with IMPORTDATA():

=REGEXEXTRACT(
  CONCATENATE(IMPORTDATA("https://www.pinchchinese.com/")),
  "[a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+"
)

This loads the page, merges it into one long string, and then pulls the first matching email.

A few limitations to keep in mind:

  • REGEXEXTRACT() only returns the first match — no multi-email support.
  • IMPORTXML() breaks easily if a site blocks automated requests or changes its HTML.
  • Neither function works on pages that require JavaScript rendering.
  • Google Sheets becomes noticeably slow when you throw large URL lists at it.

When to choose Sheets vs Python

Use Google Sheets if:

  • your list is small
  • you need something simple and shareable for a non-technical team
  • the target pages are static, public, and easy to scrape
  • you just want a quick one-off pull without setting up a full script

Use Python + ScrapingBee if:

  • you're scraping dynamic or JavaScript-heavy sites
  • you need reliability and real scale
  • you want to loop through hundreds or thousands of URLs
  • you need advanced extraction logic (regex, obfuscation cleanup, CSV export, etc.)

Sheets gives you quick wins. ScrapingBee + Python is the workflow you can actually depend on when things get serious.

Using browser extensions and no-code tools

For quick, one-off lookups (like grabbing the email from a single contact page) browser extensions or lightweight no-code scrapers do the job. They're simple and fast.

But they come with limits:

  • they don't scale well
  • they usually struggle with dynamic pages
  • you can't automate large batches
  • you can't build proper pipelines or cleaning logic around them

So they're fine for "hey, what email is on this page?", but not great for outreach workflows, research automation, or anything where consistency matters.

For serious scraping, either use Python + ScrapingBee or ScrapingBee's built-in templates and APIs. Those give you real control, real scale, and predictable results.

Start scraping emails from websites with ScrapingBee

By now you've seen the whole flow end to end: understand the legal side, set up Python with uv or pip, plug in your ScrapingBee API key, scrape the HTML, extract emails with selectors + regex, handle simple obfuscation, and deal with the usual blockers like JavaScript rendering or bot protection. Once this works for one site, scaling to dozens or hundreds of domains is just a loop away.

If you haven't already, grab a ScrapingBee account — the free trial gives you 1,000 credits, which is more than enough for testing. A basic request usually costs about 5 credits (more if you enable things like JS rendering or extra features), so you can easily scrape a bunch of pages, try different selectors, and run the full TL;DR script without worrying about burning through your balance.

You don't have to deal with proxies, rotating IPs, fingerprints, or headless browsers. ScrapingBee handles all of that behind the scenes so you can focus on building clean, targeted email lists without the chaos. Go for it.

Conclusion

Today we've discussed how to scrape emails from a website. With a small Python script and ScrapingBee doing the heavy lifting, you can pull clean, usable contact info from real pages without wrestling with proxies, CAPTCHAs, or half-broken browser automations. You learned how to fetch HTML, extract emails with CSS selectors, catch hidden ones with regex, deal with simple obfuscation, and save everything in a tidy CSV. You also saw how to stay responsible — respect laws, respect sites, and keep your scraping focused and intentional.

From here, you can scale up to bigger lists, plug the results into your outreach tools, or switch to ScrapingBee's built-in contact extraction if you want even less code. Whether you're building a workflow for a team or just exploring, the path forward is the same: start small, keep it clean, and let ScrapingBee handle the messy parts.

I'd highly recommend the guides linked throughout this post (CSS selectors, XPath, and scraping without getting blocked) as follow-up reads.

Frequently asked questions (FAQs)

Is it safe to scrape emails from any website I find?

Not always. Some sites forbid automated collection, and privacy laws vary by country. Check robots.txt, terms of service, and your legal basis before scraping. Stick to pages that clearly publish contact info and use the data responsibly to avoid compliance issues.

How do I know if a page needs JavaScript rendering to get emails?

If your script can't find an email that's visible in the browser, the page is likely rendered by JavaScript. View the page source—if the email isn't there but appears in DevTools, you need render_js=true with ScrapingBee to load the dynamic content server-side.

Can I scrape other contact details besides emails?

Yes. The same flow works for phone numbers, LinkedIn profiles, Twitter handles, or other contact fields. Just adjust your CSS selectors and regex patterns. For a simpler path, ScrapingBee's Contact Info API can return structured contact data with minimal setup.

What if I just want a no-code way to collect contact emails?

You can use ScrapingBee's Request Builder templates or Google Sheets integrations to extract emails without writing Python. These options are great for small lists or one-off tasks. For larger batches or automated workflows, the Python approach scales better.

Alexander M

Alexander is a software engineer and technical writer with a passion for everything network related.