Pyppeteer is a handy way to let a browser do the repetitive work for you. The web is packed with useful data, but collecting it manually takes forever. Web scraping speeds things up by letting your code gather information on its own, and browser automation goes further by handling things like clicking, scrolling, and navigating just like a real user.
Python already has plenty of scraping tools, but sometimes you need the power of a real browser without the extra weight or complexity. Pyppeteer fills that gap. It gives you a straightforward way to control a headless (or full) Chrome instance from Python, making it easier to scrape dynamic sites, load JavaScript-heavy pages, and automate tasks that simple HTTP requests can't handle.

Food for thought
Pyppeteer is, at its core, the Python port of Puppeteer. It's a way to automate a real browser and scrape dynamic pages.
But here's the sad 2025 reality: the Pyppeteer Python project is only lightly maintained these days, and the maintainers say so directly on the GitHub page. That can make it risky for long-term or large-scale use. If you need something sturdier or easier to run in production, a service like a web scraping API from ScrapingBee is usually a more reliable option.
What is Pyppeteer?
Pyppeteer is basically the Python sidekick of the Puppeteer library from the Node world. It fills a similar niche to Selenium: you can spin up a real browser, run it in normal or headless mode, and automate whatever you need. Just keep in mind that it's built around Chromium and JavaScript-style workflows.
Learn how to use Selenium with Python in our tutorial.
Headless mode just means the browser runs quietly in the background without opening a window. For scraping, automation, or testing, this is usually the way to go as it loads faster, uses fewer resources, and keeps everything running behind the scenes without getting in your way.
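To make that concrete, here's a minimal sketch of how the headless flag is toggled when launching the browser (assuming the calls run inside an async function, like the examples later in this article):
import asyncio
from pyppeteer import launch

# Headless (default): no visible window, faster and lighter
browser = await launch(headless=True)

# Headful: opens a real Chrome window, handy for debugging
browser = await launch(headless=False)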
Why Pyppeteer?
Tools like requests and BeautifulSoup are great for simple, static pages — but the moment a site leans on heavy JavaScript from React, Angular, Vue, or anything similar, they hit a wall. Those libraries never see the content the browser actually builds on the fly.
Pyppeteer steps in by letting you control a real browser instead of poking at pages through HTTP libraries. Since you're driving the actual page — scripts, DOM updates, all of it — you get way more flexibility. Some common things people use Pyppeteer for:
- Taking screenshots or generating PDFs of web pages
- Automating actions like typing, clicking, filling out forms, and doing basic UI tests
- Crawling single-page apps to get fully rendered HTML for prerendering or SEO
- Running tests in a real, up-to-date Chrome environment with full JS support
🤖 Check out how Pyppeteer performs vs other headless browsers when trying to go undetected by browser fingerprinting technology in our How to Bypass CreepJS and Spoof Browser Fingerprinting guide.
Pyppeteer vs Puppeteer — Key differences
If you've used Puppeteer before, Pyppeteer will feel pretty familiar. It basically ports Puppeteer's API into Python, so a lot of the concepts stay the same: you launch a browser, open a page, wait for selectors, and run evaluate calls to interact with the DOM.
The main shift is the syntax: you're writing Python instead of JavaScript, but you still live in the async/await world. Selectors, page.evaluate, and event handling all behave similarly, just with Python's coroutine style.
That said, Pyppeteer isn't a perfect mirror. Some newer Puppeteer features never made it over, and as of 2025 the Pyppeteer Python project is no longer actively maintained. If you want the freshest features and updates, running Puppeteer in Python through workarounds is tricky, and sticking to Node might be the smoother path.
You can learn how Puppeteer works in its native environment in our tutorial.
Implementing Pyppeteer
Now that you've got a feel for what Pyppeteer does, let's walk through how to actually use it in a clean Python setup.
Setting up your virtual environment
You'll need Python 3 for everything that follows. To keep things tidy, it's a good idea to work inside a virtual environment so your project has its own isolated packages.
I recommend using uv, a super-fast package manager and venv tool. Spin up a new project like this:
uv init pyppeteer_bee_demo
cd pyppeteer_bee_demo
It'll set up a virtual environment automatically the first time we run a script inside the project. We'll install Pyppeteer next.
Installing Pyppeteer
Let's add Pyppeteer into our project:
uv add pyppeteer
A quick warning for M1/M2 Mac users: Pyppeteer can be finicky on arm64. If Chromium refuses to launch or crashes immediately, running your terminal under Rosetta usually solves the issue.
How to use Pyppeteer (step-by-step tutorial)
Time for some hands-on stuff. Let's walk through a minimal Pyppeteer tutorial: launch a browser, open a page, visit a URL, and close everything cleanly. This will give you a base you can reuse in your own Python Pyppeteer scripts.
Here's a simple example using asyncio.run() (works great on Python 3.11+). Paste it into main.py in your project root:
import asyncio
from pyppeteer import launch

async def main():
    # 1. Start the browser (headless by default)
    browser = await launch(
        executablePath=r"C:\Program Files\Google\Chrome\Application\chrome.exe"
    )

    # 2. Open a new tab
    page = await browser.newPage()

    # 3. Go to a page
    await page.goto("https://example.com")

    # 4. Print page title
    title = await page.title()
    print("Page title:", title)

    # 5. Print part of the HTML body
    content = await page.content()
    print("Page content starts with:", content[:200], "...")

    # 6. Close the browser
    await browser.close()

if __name__ == "__main__":
    asyncio.run(main())
This is the simplest of Pyppeteer examples, but it already shows the core workflow.
- launch() starts a headless Chrome instance
  - In the example, executablePath is set manually because Pyppeteer often fails to auto-detect Chrome and instead tries to download its own Chromium build (which fails, since the download URL is outdated and points to a 404 page).
  - On macOS, the usual path is /Applications/Google Chrome.app/Contents/MacOS/Google Chrome.
  - On Linux, common paths are /usr/bin/google-chrome or /usr/bin/chromium.
  - You can also set the PYPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 environment variable to skip the Chromium download entirely.
- newPage() opens a fresh tab
- goto() loads the target URL
- The example prints the page title so you know navigation worked
- It also shows the first ~200 characters of the HTML so you can confirm Pyppeteer actually rendered the page
- Finally, the browser shuts down cleanly with browser.close()
This tiny script is enough to confirm your environment is working. In the next sections, we'll build on this pattern and move toward real scraping scenarios.
Run it with:
uv run python main.py
You should see something like this:
Page title: Example Domain
Page content starts with: <!DOCTYPE html><html lang="en"><head><title>Example Domain</title><meta name="viewport" content="width=device-width, initial-scale=1"><style>body{background:#eee;width:60vw;margin:15vh auto;font-famil ...
Capturing screenshots with Pyppeteer
To capture a page screenshot with Pyppeteer, use the page.screenshot() function. Let's update the script from the previous section:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(
        executablePath=r"C:\Program Files\Google\Chrome\Application\chrome.exe"
    )
    page = await browser.newPage()
    await page.goto("https://example.com")

    title = await page.title()
    print("Page title:", title)

    # Capture screenshot:
    await page.screenshot(path="example.png")
    print("Screenshot saved as example.png")

    await browser.close()

if __name__ == "__main__":
    asyncio.run(main())
The key part here is page.screenshot(). By default, it captures the visible viewport and saves it to the file you specify with path. If the file doesn't exist yet, Pyppeteer creates it automatically.
You can also customize screenshots with extra options:
- fullPage=True — capture the entire scrollable page
- type="jpeg" — save as JPEG instead of PNG
- quality=80 — set JPEG quality (only works for JPEG, duh!)
Example:
await page.screenshot(path="full.png", fullPage=True)
This makes Pyppeteer capture the full scrollable height of the page in one tall screenshot. Handy for long docs or dashboards.
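For completeness, a quick sketch combining the JPEG options from the list above (the file name is just an example):
await page.screenshot(path="page.jpg", type="jpeg", quality=80)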
Pyppeteer examples – From basics to advanced
Now that screenshots are working, let's push a bit further. Pyppeteer can handle most day-to-day automation tasks: generating PDFs, pretending to be a real browser with a custom user agent, clicking buttons, filling out forms, dealing with infinite scroll, and even managing cookies or sessions. Here's a quick tour of the most useful tricks you'll need.
1. Saving a page as PDF
Pyppeteer also supports PDF generation through page.pdf(). This works best on pages with clean layouts.
await page.pdf(path="page.pdf", format="A4")
Common tweaks:
await page.pdf(
    path="page.pdf",
    format="A4",
    printBackground=True,
    margin={"top": "1cm", "bottom": "1cm"}
)
2. Setting viewport and user agent
Useful when a site behaves differently for mobile/desktop or checks your browser identity.
await page.setUserAgent("MyCustomAgent/1.0")
await page.setViewport({"width": 1280, "height": 800})
Switch to a mobile-like viewport:
await page.setViewport({"width": 390, "height": 844, "isMobile": True})
3. Clicking buttons and filling forms
Interaction is straightforward:
await page.click("#login-btn")
await page.type("#email", "test@example.com")
await page.type("#password", "hunter2")
await page.click("#submit")
If you need to wait for navigation afterward:
await page.waitForNavigation()
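One caveat worth knowing: if you only start waitForNavigation() after the click, the navigation can occasionally finish before the wait begins. A common workaround (a sketch, reusing the #submit selector from above) is to start both at the same time with asyncio.gather():
import asyncio

# Start waiting for the navigation before the click resolves
await asyncio.gather(
    page.waitForNavigation(),
    page.click("#submit"),
)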
4. Handling infinite scroll
A classic scraping trick. Use page.evaluate() to run JavaScript inside the browser:
async def scroll_to_bottom(page):
    await page.evaluate("""
        () => new Promise(resolve => {
            let total = 0;
            const dist = 300;
            const timer = setInterval(() => {
                window.scrollBy(0, dist);
                total += dist;
                if (total >= document.body.scrollHeight) {
                    clearInterval(timer);
                    resolve();
                }
            }, 200);
        })
    """)
Call it like:
await scroll_to_bottom(page)
This forces the page to load dynamic content that appears on scroll (social feeds, product lists, etc.).
5. Working with cookies and sessions
You can load or save cookies to persist login or session data across runs.
Get cookies:
cookies = await page.cookies()
print(cookies)
Set cookies:
await page.setCookie({
    "name": "sessionid",
    "value": "abc123",
    "domain": "example.com"
})
This is handy when scraping dashboards or logged-in areas without re-authenticating every time.
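To actually persist a session across runs, you can dump the cookies to a file and load them back later. A minimal sketch (the cookies.json file name is just a choice for this example):
import json

# After logging in: save the current cookies to disk
cookies = await page.cookies()
with open("cookies.json", "w") as f:
    json.dump(cookies, f)

# On a later run: load them back before visiting the protected page
with open("cookies.json") as f:
    for cookie in json.load(f):
        await page.setCookie(cookie)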
These Pyppeteer examples should give you a solid set of tools for real-world scraping and automation: PDFs, screenshots, user agents, scrolling, forms, and sessions.
Scraping complex page content with Pyppeteer
Here's a more realistic example. Imagine you need to collect article ideas for a list of topics from educative.io/answers. The site updates results on the fly as you type, so you can't just fetch the HTML: the browser has to actually interact with the page.

Looking at how the interface behaves, you can map out the basic steps your script needs to follow:
- Find the search box on the page
- Type the topic you're looking for
- Wait for the dynamic results to load
- Extract all the article titles that appear
- Clear the search box
- Repeat the process for each topic in your list
Pyppeteer is a good fit here because it can mimic exactly what a user would do, making it possible to scrape content that only appears after JavaScript runs.
Setting up
Before we start coding the actual scraping logic, a quick reminder: Pyppeteer launches Chromium in headless mode by default. That's great for speed, but when you're building a script that clicks around and reacts to UI changes, it's often much easier to debug with a visible browser window.
Here's how to launch Pyppeteer in non-headless mode:
from pyppeteer import launch
# Launch browser in non-headless mode
browser = await launch(headless=False)
If you want the window to start maximized (handy for complex layouts), you can add a CLI flag via args:
from pyppeteer import launch
browser = await launch(
    headless=False,
    args=["--start-maximized"],
)
You can combine this with an executablePath if you're pointing to a custom Chrome/Chromium binary. With this in place, you'll actually see what the page is doing while you wire up the rest of the script.
1. Locating the search box
After opening the page, the first step is to target the search box. On the Educative page, the search field sits inside #__next and .ed-grid-main, but at the end of the day it's just a regular input[type="text"].
Here's the updated code with a clean selector, a printed page title, and a try/finally block to ensure the browser closes properly:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(
        headless=False,
        args=["--start-maximized"]
    )
    try:
        page = await browser.newPage()
        await page.setViewport({"width": 1600, "height": 900})
        await page.goto("https://www.educative.io/answers")

        # Print page title for confirmation
        print("Page title:", await page.title())

        # Locate the search input
        search_box = await page.querySelector("input[type='text']")
        if not search_box:
            raise RuntimeError("Search box not found. The selector may have changed.")
    finally:
        await browser.close()

if __name__ == "__main__":
    asyncio.run(main())
Don't forget to place your code inside an async function!
2. Typing a keyword into the search box
Once we have the search input, the next step is simple: type a topic into it so the page loads the related article suggestions.
Pyppeteer handles typing with page.type(), which simulates real keyboard input (including delays between keystrokes). This is important because many interactive sites won't react properly if you try to set the value directly through JavaScript.
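If you want the typing to look even more human, page.type() also accepts a delay option (in milliseconds) between keystrokes. A quick sketch:
# Type with a 50 ms pause between each keystroke
await page.type("input[type='text']", "binary trees", {"delay": 50})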
Here's how the code looks when we type a keyword like "binary trees":
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(
        headless=False,
        args=["--start-maximized"],
        executablePath=r"C:\Program Files\Google\Chrome\Application\chrome.exe"
    )
    try:
        page = await browser.newPage()
        await page.setViewport({"width": 1600, "height": 900})
        await page.goto("https://www.educative.io/answers")
        print("Page title:", await page.title())

        search_box = await page.querySelector("input[type='text']")
        if not search_box:
            raise RuntimeError("Search box not found. The selector may have changed.")

        # Fill in the search box:
        keyword = "binary trees"
        await page.type("input[type='text']", keyword)
        print(f"Typed keyword: {keyword}")
    finally:
        await browser.close()

if __name__ == "__main__":
    asyncio.run(main())
3. Waiting for the results to load
After typing the keyword, the page needs a moment to fetch and render the matching articles. Educative loads items dynamically, so there isn't a guaranteed "ready" signal; in simple cases, the easiest method is to wait a short, fixed amount of time for the UI to update.
Using await asyncio.sleep(5) (5 seconds) is enough for this demo.
# ... import ...

async def main():
    # ... prepare browser ...
    try:
        # ... open page and locate search box ...

        # Fill in the search box:
        keyword = "binary trees"
        await page.type("input[type='text']", keyword)
        print(f"Typed keyword: {keyword}")

        print("Waiting for results...")
        await asyncio.sleep(5)  # wait 5 seconds
    finally:
        await browser.close()

# ... run the main() function ...
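If the fixed sleep feels too fragile, a sturdier option is to wait for the results selector we'll use in the next step. A sketch of that alternative:
# Wait until at least one result title exists in the DOM (up to 10 seconds)
await page.waitForSelector(
    'span[class*="dropdownSearch_answer-search-title-hit"]',
    {"timeout": 10000},
)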
4. Extracting the article titles
Now that the results have had a few seconds to load, we can grab the titles from the list. Each result item lives somewhere inside .ed-grid, and the title itself is stored in a <span> whose class looks like this:
dropdownSearch_answer-search-title-hit__FYS9n
That last part is an auto-generated hash, so the exact class name can change at any time. Instead of hard-coding the whole ugly mess, we can use substring matching on the class attribute.
Here's how we extract all the titles:
# ... import ...

async def main():
    # ... prepare browser ...
    try:
        # ... open page and locate search box ...

        # Fill in the search box:
        keyword = "binary trees"
        await page.type("input[type='text']", keyword)
        print(f"Typed keyword: {keyword}")

        print("Waiting for results...")
        await asyncio.sleep(5)  # wait 5 seconds

        titles = await page.querySelectorAllEval(
            'span[class*="dropdownSearch_answer-search-title-hit"]',
            "nodes => nodes.map(n => n.textContent.trim())",
        )
        print(f"Found {len(titles)} results for '{keyword}':")
        for t in titles:
            print(" -", t)
    finally:
        await browser.close()

# ... run the main() function ...
So here we use span[class*="dropdownSearch_answer-search-title-hit"] to match every span whose class attribute contains that substring, regardless of the hashed suffix.
5. Clearing the search box
Before searching for the next keyword, you need to clear whatever you typed previously. Since the input is a normal text field, the simplest and most reliable way is to:
- Click inside the input
- Select all text
- Press Backspace (or just repeatedly press Backspace if you prefer brute force)
Pyppeteer lets you simulate real keyboard actions, so clearing the field looks like this:
# ... import ... (this step also needs "import sys")

async def main():
    # ... prepare browser ...
    try:
        # ... open page and locate search box ...
        # ... fill in the search box ...

        titles = await page.querySelectorAllEval(
            'span[class*="dropdownSearch_answer-search-title-hit"]',
            "nodes => nodes.map(n => n.textContent.trim())",
        )
        print(f"Found {len(titles)} results for '{keyword}':")
        for t in titles:
            print(" -", t)

        # Click on the search box:
        await search_box.click()

        # Select all text: Ctrl+A (Windows/Linux) or Cmd+A via the "Meta" key (macOS)
        mod = "Meta" if sys.platform == "darwin" else "Control"
        await page.keyboard.down(mod)
        await page.keyboard.press("A")
        await page.keyboard.up(mod)

        # Delete the selected text
        await page.keyboard.press("Backspace")
        print("Search box cleared.")
    finally:
        await browser.close()

# ... run the main() function ...
6. Looping through multiple keywords
Right now the script handles only one keyword.
To scrape several topics in a row, all we need is a simple loop that repeats the same steps:
- Type the keyword
- Wait for results
- Extract titles
- Clear the search box
- Move to the next keyword
Here's an updated version of your script with a clean loop for multiple searches:
import asyncio
import sys
from pyppeteer import launch

async def main():
    browser = await launch(
        headless=False,
        args=["--start-maximized"],
        executablePath=r"C:\Program Files\Google\Chrome\Application\chrome.exe"
    )
    try:
        page = await browser.newPage()
        await page.setViewport({"width": 1600, "height": 900})
        await page.goto("https://www.educative.io/answers")
        print("Page title:", await page.title())

        # Keywords to search for
        keywords = ["binary trees", "linked list", "heaps"]

        # Locate the search input once
        search_box = await page.querySelector("input[type='text']")
        if not search_box:
            raise RuntimeError("Search box not found.")

        for kw in keywords:
            print(f"\n=== Searching for: {kw} ===")

            # Type the keyword
            await page.type("input[type='text']", kw)
            print(f"Typed keyword: {kw}")

            # Wait for results to load
            await asyncio.sleep(5)

            # Extract titles
            titles = await page.querySelectorAllEval(
                'span[class*="dropdownSearch_answer-search-title-hit"]',
                "nodes => nodes.map(n => n.textContent.trim())",
            )
            print(f"Found {len(titles)} results:")
            for t in titles:
                print(" -", t)

            # Clear the search box
            await search_box.click()
            mod = "Meta" if sys.platform == "darwin" else "Control"
            await page.keyboard.down(mod)
            await page.keyboard.press("A")
            await page.keyboard.up(mod)
            await page.keyboard.press("Backspace")
            print("Search box cleared.")

            await asyncio.sleep(1)  # small pause between iterations
    finally:
        await browser.close()

if __name__ == "__main__":
    asyncio.run(main())
What this loop does:
- Reuses the same browser and page (much faster)
- Types each keyword one by one
- Scrapes the dynamic results
- Properly clears the input before the next search
- Prints everything in a readable format
- Lets you add or remove keywords freely; the script handles them in sequence
Now you can run this script and observe how Pyppeteer searches for the given terms. Nice!
Using Proxies with Pyppeteer
If you're scraping anything even mildly protected, running Pyppeteer behind a proxy is pretty much a must. Sites love rate-limits, CAPTCHAs, and other fun blockers, so changing your IP helps you stay under the radar. Here's how to set up a Pyppeteer proxy, authenticate it, and keep things rotating so you don't get shut down mid-run.
Launching Pyppeteer with a proxy
You can pass proxy settings directly through pyppeteer launch:
browser = await launch(
    executablePath="/path/to/chrome",
    args=[
        "--proxy-server=http://123.45.67.89:8000"
    ]
)
This forwards all browser traffic through the proxy.
Authenticating your proxy
If your proxy needs a username and password, you can authenticate at the page level:
await page.authenticate({
    "username": "myuser",
    "password": "mypassword"
})
Make sure you call this before the first request is made on that page.
Rotating proxies
Even with a single proxy, you'll eventually hit rate limits. Rotating IPs (manually or through a provider) keeps your scraping steady.
You can rotate by:
- cycling through a list of proxy servers
- restarting the browser with a new proxy IP
- using a managed rotating-proxy service
This helps avoid blocks, session resets, and fingerprint checks.
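Here's a rough sketch of the first approach: cycle through a list of proxies and relaunch the browser with a new one whenever a request fails. The proxy addresses and the fetch_with_rotation helper are placeholders for this example:
import asyncio
from pyppeteer import launch

# Placeholder proxy addresses; swap in your own
PROXIES = [
    "http://123.45.67.89:8000",
    "http://98.76.54.32:8000",
]

async def fetch_with_rotation(url):
    for proxy in PROXIES:
        # Launch a fresh browser routed through the current proxy
        browser = await launch(args=[f"--proxy-server={proxy}"])
        try:
            page = await browser.newPage()
            await page.goto(url)
            return await page.content()
        except Exception as e:
            print(f"Proxy {proxy} failed ({e}), trying the next one...")
        finally:
            await browser.close()
    raise RuntimeError("All proxies failed")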
Extra stealth options
Pyppeteer doesn't include Stealth Mode natively like Puppeteer Stealth, but you can still reduce fingerprints by:
- setting user agents
- spoofing viewport sizes
- clearing cookies between runs
- rotating proxies regularly
If you need an easier way to handle fully stealthy JavaScript scraping without maintaining proxies yourself, a hosted solution like a JavaScript web scraper can take care of that automatically.
Using a Pyppeteer proxy setup is one of the simplest ways to make your browser automation survive longer in the wild. Combine it with good fingerprint hygiene, basic stealth tricks, and rotating IPs, and you'll dodge most soft blocks without too much hassle.
Best practices for Pyppeteer in 2025
Pyppeteer still works fine for many workflows, but using it smoothly in 2025 means sticking to a few habits that keep your scripts stable and predictable. These tips come up a lot in real-world usage, and they fill in the gaps that aren't always obvious in the Pyppeteer documentation or basic Pyppeteer examples.
Use asyncio.run() instead of older loop patterns
Old tutorials often show asyncio.get_event_loop() or loop.run_until_complete(). Forget those because asyncio.run() is the clean, modern, less-buggy way to run your async main function.
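A quick before/after for reference:
# Old pattern (avoid):
# loop = asyncio.get_event_loop()
# loop.run_until_complete(main())

# Modern pattern:
asyncio.run(main())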
Prefer waitForSelector() over waitFor()
waitFor() is vague and can cause flaky behavior since it waits for "something" to happen. waitForSelector() is explicit: wait until the element you need actually exists in the DOM.
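A small sketch of the difference (the #results selector is just a hypothetical element you'd be waiting for):
# Flaky: waits a fixed, arbitrary amount of time
await page.waitFor(3000)

# Better: waits until the element you need actually appears (up to 10 seconds)
await page.waitForSelector("#results", {"timeout": 10000})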
Always close the browser (even on errors)
Wrap your browser in a try/finally so it doesn't get stuck running in the background.
browser = await launch(...)
try:
    page = await browser.newPage()
    await page.goto("https://example.com")
finally:
    await browser.close()
Throttle your requests and respect robots.txt
Even with a headless browser, firing too many page loads too fast can get you rate-limited or blocked. Add small pauses, rotate IPs, and check robots.txt before scraping anything sensitive.
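Throttling can be as simple as a randomized pause between page loads. A minimal sketch (urls is a placeholder list of pages, and page an already-opened Pyppeteer page):
import asyncio
import random

for url in urls:
    await page.goto(url)
    # ... scrape the page ...
    await asyncio.sleep(random.uniform(2, 5))  # 2-5 second pause between loads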
Use headless mode for speed and stability
Headless mode is faster, lighter, and more reliable for scraping and automation. Running the full browser UI slows everything down for no benefit unless you're debugging.
Ready to scrape without the overhead?
Pyppeteer is great when you need hands-on browser control, but it also comes with the usual browser-automation baggage: downloads breaking, proxies failing, headless quirks, Chrome paths, and all the other moving parts that slow you down. If your goal is reliable Python web scraping at scale, you might not want to babysit a full browser at all.
That's where a scraping API like ScrapingBee steps in. You get:
- No browser setup — no Chrome installs, no crashes, no patches
- Built-in proxies and automatic IP rotation
- Fast rendering for JavaScript-heavy sites
- Higher throughput without managing dozens of headless instances
Basically: all the power of a browser, none of the maintenance. If you're ready to skip the overhead and scrape at full speed, you can get started now!
Conclusion
Pyppeteer gives Python developers a direct line into full browser automation, making it possible to scrape modern, JavaScript-heavy websites that simple HTTP libraries can't handle. You've now seen how to launch Chrome, interact with dynamic elements, wait for UI updates, extract real rendered data, loop through multiple queries, take screenshots, generate PDFs, use proxies, and follow best practices for stable automation in 2025.
That said, Pyppeteer comes with its own overhead: browser binaries, async quirks, maintenance gaps, and the occasional broken download path. For small projects or hands-on scripting, it's still a useful tool. But when you need something production-ready, scalable, and easier to operate, a dedicated scraping API like ScrapingBee handles dynamic rendering, rotating IPs, proxies, and scaling without any of the browser maintenance.
Either way, you now have everything you need to choose the right tool for the job — and to approach Python web scraping with a lot more confidence.
Pyppeteer FAQs
How do I install Pyppeteer in Python?
Install it inside a virtual environment using your package manager. With uv: uv add pyppeteer.
Pyppeteer will attempt to download Chromium, but you can skip that by providing your own Chrome path.
What can I use Pyppeteer for?
You can automate real browsers: scraping dynamic sites, clicking elements, filling forms, capturing screenshots, generating PDFs, testing UI flows, and interacting with JavaScript-heavy pages. It's essentially Puppeteer-style automation but written for Python users.
Can Pyppeteer take screenshots and PDFs?
Yes. Use page.screenshot(path="file.png") for PNG/JPEG screenshots and page.pdf(path="file.pdf") for PDFs. You can enable fullPage=True, set formats, margins, and backgrounds depending on your needs.
How do I use proxies with Pyppeteer?
Launch Chrome with proxy arguments: launch(args=["--proxy-server=http://ip:port"]).
If authentication is required, call page.authenticate({"username": "...", "password": "..."}) before the first request. Rotating proxies helps avoid rate limits and blocks.
Is Pyppeteer still maintained in 2025?
Not really. The maintainers clearly state on GitHub that the project is minimally maintained. It still works for many tasks, but newer Puppeteer features rarely get ported, and Chromium downloads can break.
What's the best alternative to Pyppeteer for scraping?
For browser-style scraping without the infrastructure hassle, a hosted solution like a scraping API (ScrapingBee) is the most reliable option. It renders JavaScript, rotates IPs, handles proxies, and scales without requiring you to manage Chrome instances.



