Lightpanda is a headless browser built for a problem every automation engineer knows too well: Chrome works, until you have to run a lot of it. At scale, Chromium-based tools eat RAM, burn CPU, and turn headless browser automation for scraping and testing into an infrastructure problem.
That is where the Lightpanda browser stands out. Built from scratch in Zig, this headless browser for AI is designed for high-performance automation with a much smaller footprint than traditional Chrome-based stacks. It supports CDP for Playwright and Puppeteer, includes native MCP support, and ships with CLI tools for fetching and converting pages.
In this guide, we will look at how Lightpanda works, what makes it different, and where it fits in the broader headless browser landscape.

TL;DR
Lightpanda is worth a look if you need a lighter alternative to Chrome for headless browser automation. It is especially relevant for teams building AI agents, scrapers, or other server-side workflows where memory usage, startup time, and concurrency matter.
In practice, Lightpanda sits in an interesting middle ground: more capable than an HTTP client, but far leaner than a full Chromium stack. It gives you JavaScript execution, DOM access, and compatibility with existing automation tooling, while keeping the browser layer much closer to what many machine-driven workloads actually need.
Interested in AI scraping tools beyond Lightpanda? Check out our guide to Crawl4AI.
The headless browser problem: Why Chrome isn't always the answer
Traditional headless browsers get expensive fast
Headless Chrome is powerful, but it was never built to be a lightweight server tool. Once you run it at scale, memory usage climbs fast, CPU usage follows, and infrastructure costs start to pile up. What works fine in a small project can quickly become an operational burden when you need high concurrency.
There is also the maintenance side of it. Running browser clusters means dealing with updates, crashes, dependencies, and the usual edge cases that come with shipping a full browser engine in production. For many teams, that overhead becomes almost as painful as the automation task itself.
To learn more about headless browsers, check out our guide: What is a headless browser.
The current headless browser landscape still has a gap
Most headless browser options fall into one of three buckets:
- Chromium-based tools are the default, but they are heavy.
- WebKit-based options can be lighter, but they still carry much of the complexity of full browser engines.
- On the other end, HTTP clients like cURL or Python Requests are fast and inexpensive, but they cannot execute JavaScript, which makes them ineffective on a large part of the modern web.
That leaves a clear gap in the market: developers need something that can handle JavaScript and the DOM like a real browser, but without the cost and bulk of running Chrome on the server for every task.
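To make the JavaScript gap concrete, here is a small self-contained Python sketch (no network involved) of what an HTTP client "sees" on a JavaScript-rendered page: the raw source is an empty shell, and the text injected by the script never shows up, because nothing ever executes it.

```python
from html.parser import HTMLParser

# A toy page where the visible text is injected by JavaScript.
# An HTTP client downloads this source but never runs the script,
# so from its point of view the "app" div stays empty.
PAGE_SOURCE = """
<html><body>
  <div id="app"></div>
  <script>
    document.getElementById("app").textContent = "Loaded by JavaScript";
  </script>
</body></html>
"""

class TextCollector(HTMLParser):
    """Collect visible text the way a non-JS client would see it."""

    def __init__(self) -> None:
        super().__init__()
        self.in_script = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Skip script bodies; collect only text a user-facing DOM would show.
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())

parser = TextCollector()
parser.feed(PAGE_SOURCE)
print(parser.chunks)  # [] -- the JS-injected text never appears
```

A real browser engine (Lightpanda included) would execute the script and expose "Loaded by JavaScript" in the DOM, which is exactly the capability plain HTTP clients lack.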
Enter Lightpanda
Lightpanda is built for exactly that gap: it focuses on headless browser automation, agents, and server-side performance.
Many automation workloads do not need tabs, UI rendering, or the rest of the desktop-browser baggage. They just need fast page loading, JavaScript execution, DOM access, and a straightforward way to plug into agent or scraping workflows.
In other words, Lightpanda is a browser for machines, not humans. It fits modern automation infrastructure better: lighter than Chrome-based tools, more capable than raw HTTP clients, and better suited to AI-native workflows.
What is Lightpanda? Architecture and philosophy
Overview
Lightpanda is an open-source headless browser built from scratch in Zig for AI agents and web automation. The idea is straightforward: give developers a browser that can execute JavaScript, work with the DOM, and handle modern websites without dragging a full desktop browser stack into server workloads.
Built from scratch in Zig
At the core of Lightpanda is Zig, a low-level systems language chosen for tight control over memory and performance. That is a major part of the project's philosophy: if you want a truly lightweight headless browser, memory usage and startup time cannot be afterthoughts.
For JavaScript execution, Lightpanda uses V8, which gives it compatibility with the modern web instead of falling back to static HTML-only scraping. Around that, it relies on battle-tested components like libcurl for HTTP loading and html5ever for HTML parsing. In other words, the project is not trying to reinvent every wheel, only the parts of the browser stack that matter for machine-first automation.
The other major architectural choice is what Lightpanda leaves out: there is no full graphical rendering engine. That means less CPU work, lower memory pressure, and less browser overhead for server-side tasks that only need DOM access and JavaScript support.
Why it is different from the usual browser stack
Lightpanda is not another Chromium fork with a few features stripped out. It is a new browser built specifically for machine workflows, which is why it feels closer to an automation engine than a hidden desktop browser.
That also helps explain why Lightpanda fits AI workflows so naturally. It supports CDP for existing tools like Puppeteer and Playwright, but it also includes native MCP support and CLI-based page extraction for more agent-oriented use cases. The result is a headless browser designed around automation from day one, not adapted to it later.
Quick demo: Fetch a page and dump Markdown
One of the easiest ways to understand Lightpanda is to use its CLI. You can install the binary, point it at a URL, and dump the rendered page as Markdown. That is especially useful for AI agents, since you get JavaScript-processed content in a format that is much easier to pass into RAG pipelines, prompts, or downstream extraction steps.
Install the binary:
curl -L -o lightpanda \
https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-x86_64-linux && \
chmod a+x ./lightpanda
Fetch a page and dump it as Markdown:
./lightpanda fetch --obey-robots --dump markdown --log-level info https://example.com
Here's the result:
# Example Domain
This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
[More information...](https://www.iana.org/domains/example)
This small command already shows the core appeal of Lightpanda: you get a lightweight headless browser that can load a page, execute JS, and return structured output without spinning up a full Chrome-based stack. If you prefer, you can also dump the page as raw HTML instead of Markdown by switching --dump markdown to --dump html.
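If you want to drive the CLI from code rather than the terminal, a thin subprocess wrapper around the same command works. This is an illustrative sketch that assumes the lightpanda binary sits in the current directory, as in the install step above:

```python
import subprocess

def fetch_markdown_cmd(url: str, binary: str = "./lightpanda") -> list[str]:
    """Build the argv for a Markdown dump; flags mirror the CLI call above."""
    return [binary, "fetch", "--obey-robots", "--dump", "markdown", url]

def fetch_markdown(url: str) -> str:
    """Run the binary and return the rendered Markdown from stdout."""
    result = subprocess.run(
        fetch_markdown_cmd(url),
        capture_output=True,
        text=True,
        check=True,   # raise if lightpanda exits non-zero
        timeout=60,
    )
    return result.stdout

# Usage (requires the binary):
# markdown = fetch_markdown("https://example.com")
```

From there, the returned Markdown string can go straight into a prompt, an embedding step, or a file on disk.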
Key features: Powering AI agents and scalable automation
- Lightpanda is built for headless usage, with a small memory footprint, fast startup, and published benchmark claims of 16x lower memory use and 9x faster execution than Chrome.
- It includes a native MCP server over stdio, which makes it easier to plug into AI agent workflows.
- It supports CDP, so you can connect existing tools like Puppeteer, Playwright, and other CDP clients with minimal changes.
- The CLI includes commands like fetch, serve, and mcp, which cover page extraction, CDP server mode, and agent integrations.
- It executes JavaScript and supports core Web APIs, so it can handle dynamic websites that plain HTTP clients cannot.
- It can dump pages as HTML, Markdown, or simplified semantic trees, which is useful for scraping pipelines and AI workflows.
- It includes network controls such as proxies, timeouts, concurrency limits, custom user-agent suffixes, and an option to obey robots.txt.
- Installation is straightforward through a one-line installer, official Docker images, or nightly binaries for Linux and macOS.
- The project also has a growing open-source community, with active documentation, a public GitHub repository, and a Discord server.
Use Lightpanda with existing automation tools
If you already use CDP-based tooling, Lightpanda is easy to drop in. The main change is that instead of launching a local Chromium instance, you start Lightpanda as a CDP server and point your client to its WebSocket endpoint.
Start the Lightpanda CDP server:
./lightpanda serve --host 127.0.0.1 --port 9222
JavaScript example with Puppeteer
In Puppeteer, the main change is connecting to Lightpanda through browserWSEndpoint instead of launching a local Chromium instance:
import puppeteer from "puppeteer-core";
const WS_ENDPOINT = "ws://127.0.0.1:9222";
const TARGET_URL = "https://wikipedia.com/";
async function main() {
let browser;
let context;
let page;
try {
// Connect to the running Lightpanda CDP server.
browser = await puppeteer.connect({
browserWSEndpoint: WS_ENDPOINT,
});
// Create an isolated browser context and page.
context = await browser.createBrowserContext();
page = await context.newPage();
// Open the target page and wait for the DOM to be ready.
await page.goto(TARGET_URL, {
waitUntil: "domcontentloaded",
timeout: 30_000,
});
// Extract all link href values from the page.
const links = await page.evaluate(() => {
return Array.from(document.querySelectorAll("a"))
.map((a) => a.getAttribute("href"))
.filter(Boolean);
});
console.log(links);
} catch (error) {
console.error("Puppeteer + Lightpanda error:", error);
process.exitCode = 1;
} finally {
// Clean up in reverse order.
if (page) {
await page.close().catch(() => {});
}
if (context) {
await context.close().catch(() => {});
}
if (browser) {
await browser.disconnect().catch(() => {});
}
}
}
await main();
If you already have a Puppeteer workflow, that endpoint swap is usually the main thing you need. The rest of the script can stay largely the same.
Python example with Playwright over CDP
If your stack is Python-based, you can do the same thing with Playwright by connecting over CDP:
from __future__ import annotations
from playwright.sync_api import Browser, BrowserContext, Error, Page, sync_playwright
CDP_ENDPOINT = "ws://127.0.0.1:9222"
TARGET_URL = "https://wikipedia.com/"
def main() -> None:
browser: Browser | None = None
context: BrowserContext | None = None
page: Page | None = None
# Keep cleanup inside the sync_playwright() context manager.
# Otherwise, you can hit "Event loop is closed" during teardown.
with sync_playwright() as playwright:
try:
# Connect to a running Lightpanda CDP server.
browser = playwright.chromium.connect_over_cdp(CDP_ENDPOINT)
context = browser.new_context()
page = context.new_page()
# Open the page and wait for the DOM to be ready.
page.goto(TARGET_URL, wait_until="domcontentloaded", timeout=30_000)
# Read the main heading text.
title = page.locator("h1").text_content()
print(title)
except Error as exc:
# Playwright-specific errors: connection issues, timeouts, etc.
print(f"Playwright error: {exc}")
except Exception as exc:
# Fallback for anything unexpected.
print(f"Unexpected error: {exc}")
finally:
# Close resources before leaving sync_playwright().
# Closing the context will close its pages as well.
if context is not None:
context.close()
if browser is not None:
browser.close()
if __name__ == "__main__":
main()
This is one of the nicest parts of Lightpanda: you get a lighter browser backend without having to rewrite your entire automation stack from scratch.
Performance benchmarks: The numbers don't lie
Lightpanda vs. Headless Chrome
In Lightpanda's published benchmark summary, the gap versus headless Chrome is substantial.
In the 25-process crawler benchmark, the project reports a peak memory footprint of 123 MB for Lightpanda versus 2.0 GB for Chrome, along with a crawl time of 4.81 seconds versus 46.70 seconds.
The benchmark itself crawls 933 URLs from the Amiibo demo site on an AWS m5.xlarge test machine, and the full methodology is documented in the project's benchmark details.
What this means in practice
For headless browser automation, those numbers are not just marketing. Lower memory usage can reduce cloud costs and let you run more concurrent jobs on the same hardware. Faster execution can shorten scraping runs, speed up automation pipelines, and make AI agents feel more responsive when browser interaction is part of the loop.
If your current setup is bottlenecked by Chromium overhead, a lighter browser backend can materially reduce both infrastructure cost and operational drag.
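A quick back-of-the-envelope calculation with the published figures shows what this means for concurrency. Assuming a hypothetical 8 GB worker and taking the peak per-process numbers at face value (real headroom for the OS and spikes would lower both counts):

```python
# Rough capacity math using the published benchmark numbers:
# 123 MB peak per Lightpanda crawler vs. 2.0 GB (2000 MB) for Chrome.
LIGHTPANDA_MB = 123
CHROME_MB = 2000
HOST_RAM_MB = 8192  # hypothetical 8 GB worker

lightpanda_jobs = HOST_RAM_MB // LIGHTPANDA_MB
chrome_jobs = HOST_RAM_MB // CHROME_MB

print(f"Lightpanda workers per host: {lightpanda_jobs}")  # 66
print(f"Chrome workers per host:     {chrome_jobs}")      # 4
```

Even as a rough estimate, that is an order-of-magnitude difference in how many concurrent browser jobs one machine can host.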
Reliability and testing
Performance only matters if the browser is stable enough to trust in real workloads. Lightpanda's repository documents several layers of testing, including unit tests, end-to-end tests that use the demo repository, and runs against standardized Web Platform Tests. That is a useful sign that the project is not only chasing speed, but also validating browser behavior against broader test suites and real automation scenarios.
When to choose Lightpanda
Ideal use cases
Lightpanda makes the most sense when you need a real browser engine without the usual Chrome overhead. It is a good fit for:
- AI agents that need fast, scriptable web access
- high-volume scraping of JavaScript-heavy sites
- cost-sensitive cloud workloads where memory efficiency matters
- custom headless browser automation pipelines that need more than an HTTP client
- teams that want to control and customize their own browser automation stack
Consider alternatives when...
Lightpanda is not the right fit for every browser job. You may want something else when:
- you need visual or pixel-perfect UI testing
- you must match the behavior of a specific Chrome or Firefox build as closely as possible
- you want a fully managed platform that handles proxies, anti-bot protection, scaling, and browser operations for you
Lightpanda vs. other headless options
- Headless Chrome / Chromium: more mature and broadly compatible, but much heavier in memory and CPU. Lightpanda is aimed at leaner server-side automation.
- Playwright / Puppeteer: these are automation frameworks, not browser engines. Lightpanda can act as the browser backend through CDP, so they are often complementary rather than direct alternatives.
- Selenium: better suited for broad browser testing and WebDriver-based workflows. Lightpanda is more focused on lightweight automation and AI-agent use cases.
- Lightweight DOM / JS runtimes such as jsdom, Happy DOM, or LinkeDOM: much lighter than a real browser and often good enough for tests, scraping, SSR, or DOM manipulation. But they do not offer the same level of browser compatibility or behavior as a dedicated browser engine, so they are closer to emulation than to full browser automation.
- HTTP clients like cURL or Requests: much faster and simpler, but they cannot execute JavaScript or interact with the DOM on modern sites.
- Managed browser or scraping APIs: easier operationally because they handle infrastructure for you, but with less control than running your own browser stack.
Lightpanda's sweet spot is fairly clear: it sits between heavy browser stacks and simple HTTP clients, giving you real browser capabilities in a much lighter package.
Check out our guide on scraping with Puppeteer!
Getting started with Lightpanda
Installation
You can get started with Lightpanda in a few different ways.
Use the one-line installer on Linux or macOS:
curl -fsSL https://pkg.lightpanda.io/install.sh | bash
Download a nightly build manually:
curl -L -o lightpanda \
https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-x86_64-linux && \
chmod a+x ./lightpanda
Run it with Docker:
docker run -d --name lightpanda -p 127.0.0.1:9222:9222 lightpanda/browser:nightly
On Windows, install it inside WSL2, then use the Linux binary there.
Core CLI usage
Once installed, the CLI is straightforward.
Dump a page as HTML:
./lightpanda fetch --dump html https://example.com
Dump a page as Markdown:
./lightpanda fetch --obey-robots --dump markdown https://example.com
Start a CDP server for Puppeteer or Playwright:
./lightpanda serve --host 127.0.0.1 --port 9222
Start MCP mode for AI agent integrations:
./lightpanda mcp
When a managed solution like ScrapingBee makes more sense
You want browser results without browser operations
Lightpanda solves the browser engine problem. But many teams do not actually want to run their own browser layer in production. They want the outcome: reliable page access, JavaScript rendering, proxy handling, and structured data extraction without maintaining browser instances themselves.
That is where a managed solution like ScrapingBee makes more sense.
What ScrapingBee handles for you
Instead of self-hosting the full scraping stack, ScrapingBee gives you a managed layer on top of it:
- headless browser rendering for JavaScript-heavy pages
- rotating proxies and geotargeting
- anti-bot handling
- API-based access instead of browser fleet management
This is the part many teams underestimate. Running a fast browser is only one piece of the puzzle. Avoiding blocks, scaling reliably, and keeping pipelines stable is often the harder part.
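As a rough sketch of the API-based approach, here is a minimal Python helper for ScrapingBee's HTML API. The render_js and premium_proxy parameters are part of the public API; the helper names and defaults here are just illustrative:

```python
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"

def build_params(api_key: str, url: str, render_js: bool = True,
                 premium_proxy: bool = False) -> dict[str, str]:
    """Assemble query parameters for a ScrapingBee HTML API request."""
    return {
        "api_key": api_key,
        "url": url,
        # The API expects lowercase string booleans.
        "render_js": "true" if render_js else "false",
        "premium_proxy": "true" if premium_proxy else "false",
    }

def scrape(api_key: str, url: str) -> str:
    """Fetch a page through the managed API and return the response body."""
    import requests  # imported here so the helper above stays dependency-free

    response = requests.get(API_ENDPOINT, params=build_params(api_key, url), timeout=60)
    response.raise_for_status()
    return response.text
```

The point of the comparison: with this model, rendering, proxies, and retries all live behind a single HTTP call instead of a browser fleet you operate yourself.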
The CLI makes large workflows easier
This is also where the ScrapingBee CLI becomes interesting. Instead of wiring everything together yourself, you can run higher-level workflows from the terminal:
- scrape pages in batches
- crawl sites starting from one or more URLs
- export crawl or batch results to NDJSON, CSV, or text
- schedule recurring jobs with cron-style intervals
So if Lightpanda gives you a lean browser engine, ScrapingBee covers more of the operational layer around it.
Better fit for AI extraction workflows
ScrapingBee also goes beyond raw page retrieval. It supports AI-powered extraction with plain-English queries and can return structured output, which is useful when you need data that is ready for downstream AI workflows instead of raw HTML.
That makes it a better fit when your goal is not just to access a page, but to turn web content into usable JSON, Markdown, or other structured formats for agents, pipelines, or RAG systems.
Not a replacement, but a different layer
In short:
- Lightpanda is a lightweight browser engine you run yourself.
- ScrapingBee is a managed scraping and extraction platform.
This is not really an "either-or" decision. If you want full control over the browser layer, Lightpanda is compelling. If you want to skip the operational burden and move closer to ready-to-use data extraction, ScrapingBee is often the better fit.
Try ScrapingBee today! Sign up with no credit card required and get 1,000 free scraping credits.
Demo: Extract data with Lightpanda in Python
Here is a simple Python example that connects to a running Lightpanda instance, opens books.toscrape.com, collects the books on the page, and prints structured data.
Start Lightpanda first:
./lightpanda serve --host 127.0.0.1 --port 9222
Install dependencies:
pip install playwright beautifulsoup4
And then write your script:
from __future__ import annotations
import json
from typing import TypedDict
from urllib.parse import urljoin
from bs4 import BeautifulSoup, Tag
from playwright.sync_api import Browser, BrowserContext, Error, Page, sync_playwright
BASE_URL = "https://books.toscrape.com/"
CDP_ENDPOINT = "ws://127.0.0.1:9222"
class Book(TypedDict):
title: str | None
url: str | None
price: str | None
availability: str | None
rating: str | None
def extract_rating(rating_element: Tag | None) -> str | None:
"""Return the rating name from classes like 'star-rating Three'."""
if rating_element is None:
return None
classes = rating_element.get("class", [])
if not isinstance(classes, list):
return None
return next((cls for cls in classes if cls != "star-rating"), None)
def parse_books(html: str) -> list[Book]:
"""Parse the HTML and return structured book data."""
soup = BeautifulSoup(html, "html.parser")
books: list[Book] = []
for book in soup.select("article.product_pod"):
book_tag = book if isinstance(book, Tag) else None
if book_tag is None:
continue
link = book_tag.select_one("h3 a")
price = book_tag.select_one(".price_color")
availability = book_tag.select_one(".availability")
rating_element = book_tag.select_one(".star-rating")
href = link.get("href") if isinstance(link, Tag) else None
title = link.get("title") if isinstance(link, Tag) else None
books.append(
{
"title": title,
"url": urljoin(BASE_URL, href) if href else None,
"price": price.get_text(strip=True) if isinstance(price, Tag) else None,
"availability": (
availability.get_text(" ", strip=True)
if isinstance(availability, Tag)
else None
),
"rating": extract_rating(rating_element if isinstance(rating_element, Tag) else None),
}
)
return books
def main() -> None:
browser: Browser | None = None
context: BrowserContext | None = None
page: Page | None = None
with sync_playwright() as playwright:
try:
# Connect to a running Lightpanda CDP server.
browser = playwright.chromium.connect_over_cdp(CDP_ENDPOINT)
context = browser.new_context()
page = context.new_page()
# Load the page and wait until the product cards are present.
page.goto(BASE_URL, wait_until="domcontentloaded", timeout=30_000)
page.wait_for_selector("article.product_pod", timeout=10_000)
# Get the rendered HTML and parse it with BeautifulSoup.
html = page.content()
books = parse_books(html)
print(json.dumps(books, indent=2, ensure_ascii=False))
except Error as exc:
print(f"Playwright error: {exc}")
except Exception as exc:
print(f"Unexpected error: {exc}")
finally:
# Close resources before sync_playwright() exits.
if context is not None:
context.close()
if browser is not None:
browser.close()
if __name__ == "__main__":
main()
This script:
- Connects to a running Lightpanda instance over CDP instead of launching a local browser directly.
- Opens books.toscrape.com and waits for the book cards to appear in the page HTML.
- Grabs the rendered page content, then uses BeautifulSoup to extract each book's title, URL, price, availability, and rating.
- Prints the results as structured JSON, which makes the output easy to reuse in scraping pipelines or AI workflows.
Here's the sample result:
[
{
"title": "A Light in the Attic",
"url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
"price": "£51.77",
"availability": "In stock",
"rating": "Three"
},
{
"title": "Tipping the Velvet",
"url": "https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html",
"price": "£53.74",
"availability": "In stock",
"rating": "One"
}
]
Route Lightpanda through a proxy
Lightpanda also supports custom proxies, which is useful if you want IP rotation or managed proxy infrastructure without changing your extraction logic.
One option is to use ScrapingBee proxy mode. In that setup, Lightpanda stays the browser engine, while ScrapingBee handles the proxy layer.
First, register at ScrapingBee and proceed to your dashboard. You'll find an API key there.
To launch Lightpanda through ScrapingBee, pass the proxy when starting the CDP server:
./lightpanda serve \
--host 127.0.0.1 \
--port 9222 \
--http-proxy "https://YOUR_API_KEY:render_js=False&premium_proxy=True@proxy.scrapingbee.com:8887" \
--insecure-disable-tls-host-verification
A few notes:
- YOUR_API_KEY is your ScrapingBee API key.
- render_js=False is a sensible default here, since Lightpanda is already rendering the page.
- You can add other ScrapingBee parameters the same way, separated with &.
- The --insecure-disable-tls-host-verification flag disables strict TLS certificate checks. This is sometimes required when routing traffic through proxies like ScrapingBee, where certificate validation may fail due to how the proxy handles HTTPS connections.
Once Lightpanda is started with that proxy, your Python script does not need to change. All outgoing requests from that Lightpanda instance will go through the configured proxy automatically.
If you prefer, you can also configure the proxy from your Playwright script at the browser context level instead of passing it on startup. But for most examples, setting the proxy once on the Lightpanda side is simpler and keeps the extraction script unchanged.
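As a sketch of that per-context approach, here is a hypothetical helper that builds a Playwright-style proxy dict for ScrapingBee proxy mode, mirroring the credentials format used on the command line above. Whether connect_over_cdp honors per-context proxies depends on the browser backend, so verify this against your Lightpanda version before relying on it:

```python
def scrapingbee_proxy(api_key: str) -> dict[str, str]:
    """Build a Playwright proxy dict for ScrapingBee proxy mode.

    Assumption: same host/port and username:password convention as the
    --http-proxy example above (API key as the user, parameters as the
    password).
    """
    return {
        "server": "https://proxy.scrapingbee.com:8887",
        "username": api_key,
        "password": "render_js=False&premium_proxy=True",
    }

# Hypothetical usage inside a Playwright script:
# context = browser.new_context(proxy=scrapingbee_proxy("YOUR_API_KEY"))
```

The trade-off is flexibility versus simplicity: per-context proxies let different jobs use different exits, while the startup flag applies one proxy to every request the instance makes.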
Conclusion: Choosing your path to web automation
For teams that need more than just a browser engine
Lightpanda is one of the most compelling new headless browser projects to watch. Designed with AI agents and automation in mind, it gives developers a faster, lighter way to run browser workloads without carrying the full overhead of Chromium.
But for many teams, the real challenge is not just running a browser. It is everything around it: proxies, anti-bot handling, scaling, scheduling, and turning raw page content into structured data for downstream systems. If that is where your time goes, a managed solution like ScrapingBee may be the better fit. It gives you browser access and data extraction capabilities without forcing you to operate the entire stack yourself.
Next steps and resources
If you want full control over the browser layer, start here:
If you want managed, production-ready scraping and AI-friendly extraction instead:
Whichever path you choose, the goal is the same: reliable, scalable web access for automation and AI agents. If you want to skip the browser infrastructure overhead and move faster, try ScrapingBee.
Lightpanda browser: FAQ
What is Lightpanda?
Lightpanda is an open-source headless browser built from scratch in Zig for AI agents and automation. Rather than relying on a Chromium fork, it focuses on lightweight browser execution, DOM access, and JavaScript support for server-side workloads where performance and memory efficiency matter more than graphical rendering.
Why is Lightpanda getting so much attention?
Because it takes a different approach to headless browser automation. Lightpanda is designed specifically for headless use, with a strong focus on speed, low memory usage, and AI-friendly workflows. For developers tired of running heavy Chrome-based stacks, that makes it one of the more interesting browser projects to watch.
What kinds of projects is Lightpanda best suited for?
Lightpanda is a strong fit for AI agents, JavaScript-heavy scraping, large-scale automation, and cloud workloads where efficiency matters. It works well when you need real browser capabilities but want something leaner than Chromium. That makes it especially appealing for high-concurrency systems and custom automation pipelines.
Does Lightpanda work with Puppeteer or Playwright?
Yes. Lightpanda supports CDP, which means you can connect tools like Puppeteer and Playwright to it without rebuilding your entire workflow. That lowers the barrier to trying the Lightpanda browser in an existing stack, since the main change is often just pointing your automation client to a different browser endpoint.
Is Lightpanda always the best choice?
Not always. Lightpanda is a good fit when you want control over your own browser layer, but some teams do not want to manage proxies, anti-bot handling, scaling, and extraction pipelines themselves. In those cases, a scraping API like ScrapingBee can be a better fit because it handles more of the operational layer for you.

Ilya is an IT tutor and author, web developer, and ex-Microsoft/Cisco specialist. His primary programming languages are Ruby, JavaScript, Python, and Elixir. He enjoys coding, teaching, and learning new things. In his free time he writes educational posts, contributes to open-source projects, tweets, plays sports, and makes music.

