How to Scrape Infinite Scroll Websites Using Puppeteer

Q: How do I make Puppeteer scroll to the bottom of a page?

You can use page.evaluate() to execute window.scrollTo(0, document.body.scrollHeight) within the browser context. In practice, evaluate runs code in the page context and can be combined with incremental scroll steps. Placing this in a loop allows you to continuously hit the bottom of the page to trigger new content loads.

Maxine Meurer | 22 June 2026 (updated) | 13 min read

Table of contents

Web scraping methods must adapt based on how a website displays its data. Today, many single-page applications (SPAs) and social media platforms rely on infinite scrolling, loading new content continuously as the user scrolls down the page.

While this provides a seamless user experience, it makes extracting data much more complicated because traditional static HTML parsers cannot capture content that hasn't rendered yet. Creating a Puppeteer infinite scroll script allows you to bypass these limitations by emulating real user behavior. By the end of this article, you will know exactly how to automate this process and scrape dynamic data seamlessly.

How to Scrape Infinite Scroll Websites Using Puppeteer

TL;DR

To scrape infinite scroll websites, you cannot rely on simple HTTP requests. You need to use a headless browser like Puppeteer to programmatically scroll to the bottom of the page, wait for the DOM to load new elements, and extract the required data as it appears.

Key Takeaways

Dynamic loading breaks traditional scrapers: Infinite scroll relies on JavaScript to fetch data, meaning static HTML parsers (like Cheerio alone) will miss the content.
Puppeteer emulates human behavior: By using Puppeteer, you can automate a real browser to scroll down, tricking the site into loading more items.
Timing is everything: You must use built-in functions like waitForFunction or set specific timeouts to allow network requests to finish before trying to extract the newly loaded data.

Prerequisites & Goals

To fully benefit from this post, you should have the following:

Some experience with writing ES6 JavaScript.
A proper understanding of promises and some experience with async/await.
Node.js installed on your development machine.

How to Scrape Data from Infinite Scrolling Websites Using Puppeteer

Presuming you already have npm installed, create a folder to store your Puppeteer project and install the library:

mkdir infinite-scroll
cd infinite-scroll
npm install --save puppeteer

Using npm installs both Puppeteer and a compatible version of the Chromium browser. Note: On Linux machines, Puppeteer might require some additional dependencies.

Now open your favorite text editor and create a file named scrape-infinite-scroll.js. Add the following boilerplate to load the necessary modules. Alongside Puppeteer, you also import the built-in fs module for file output.

// Puppeteer will not run without these lines
const fs = require('fs');
const puppeteer = require('puppeteer');

Next, create a function that defines exactly what data you want to extract. On e-commerce layouts, that often means targeting selectors for fields such as product name.

Tip: Open up your browser console and examine the page HTML to determine the correct tags and classes for your extractedElements constant.

function extractItems() {
  const extractedElements = document.querySelectorAll('#container > div.blog-post');
  const items = [];
  for (let element of extractedElements) {
    items.push(element.innerText);
  }
  return items;
}

The next function is scrapeItems, the core logic of the script. Here is how it works:

Controls scrolling and extraction: Uses page.evaluate as the method that executes code in the page context and repeatedly runs your extractItems function.
Loops until the goal is met: Keeps scrolling and extracting until at least itemCount items have been scraped.
Handles Promises cleanly: Since Puppeteer's methods are Promise-based, wrapping the logic in an async function and using await lets you write the code as if it were executing synchronously.
A while loop handles the repeated scrolling cycle until the target count is reached.

async function scrapeItems(
  page,
  extractItems,
  itemCount,
  scrollDelay = 800,
) {
  let items = [];
  try {
    let previousHeight;
    while (items.length < itemCount) {
      items = await page.evaluate(extractItems);
      previousHeight = await page.evaluate('document.body.scrollHeight');
      await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
      await page.waitForFunction(`document.body.scrollHeight > ${previousHeight}`);
      await page.waitForTimeout(scrollDelay);
    }
  } catch(e) { }
  return items;
}

If an unexpected interruption occurs, add recovery actions such as retrying the step or using a log entry so you can trace what failed.

Finally, you need a block of code to run the process. This final chunk handles the following:

Sets up the browser: Puppeteer launch starts the browser session, opens a new page with a specific viewport, and setting headless: false lets you watch the browser work.
Navigating to the target web page: Sends the browser to the target URL.
Executes the scraper: Calls your scrapeItems function and tells it how many items to grab (in this case, 10).
Saves the data: Takes the extracted array and writes it to a new text file (items.txt).
Cleans up: Closes the browser once the task is complete.

(async () => {
  // Set up Chromium browser and page.
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  const page = await browser.newPage();
  page.setViewport({ width: 1280, height: 926 });

  // Navigate to the example page.
  await page.goto('https://mmeurer00.github.io/infinite-scroll-example/');

  // Auto-scroll and extract desired items from the page. Currently set to extract ten items.
  const items = await scrapeItems(page, extractItems, 10);

  // Save extracted items to a new text file.
  fs.writeFileSync('./items.txt', items.join('\n') + '\n');

  // Close the browser.
  await browser.close();
})();

Important: Everything you need for item extraction must live inside the extractItems function definition. When you call items = await page.evaluate(extractItems), Puppeteer serializes the function and executes it as an injected extractItems method inside the browser context. Any variables defined in your Node.js environment will be unavailable during execution.

When finished, your complete file should look like this:

// Puppeteer will not run without these lines
const fs = require('fs');
const puppeteer = require('puppeteer');

function extractItems() {
  const extractedElements = document.querySelectorAll('#container > div.blog-post');
  const items = [];
  for (let element of extractedElements) {
    items.push(element.innerText);
  }
  return items;
}

async function scrapeItems(
  page,
  extractItems,
  itemCount,
  scrollDelay = 800,
) {
  let items = [];
  try {
    let previousHeight;
    while (items.length < itemCount) {
      items = await page.evaluate(extractItems);
      previousHeight = await page.evaluate('document.body.scrollHeight');
      await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
      await page.waitForFunction(`document.body.scrollHeight > ${previousHeight}`);
      await page.waitForTimeout(scrollDelay);
    }
  } catch(e) { }
  return items;
}

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  const page = await browser.newPage();
  page.setViewport({ width: 1280, height: 926 });

  await page.goto('https://mmeurer00.github.io/infinite-scroll-example/');

  const items = await scrapeItems(page, extractItems, 10);

  fs.writeFileSync('./items.txt', items.join('\n') + '\n');

  await browser.close();
})();

Run the script with:

node scrape-infinite-scroll.js

This opens the demo page in the headless browser and scrolls until ten #container > div.blog-post items are loaded, saving the extracted text to ./items.txt. If a site uses a Load More pattern, first locate the button and use page.click() with a short delay before resuming scroll-based extraction. To view the results:

open ./items.txt

You can also run tail ./items.txt to see the last 10 scraped items in your terminal.

If the script cannot reach the target item count, page.waitForFunction will wait up to 30 seconds (customizable) for the page height to increase after each scroll. By making Puppeteer scroll to bottom, the function waits for the page height to increase after each scroll, ensuring new items are rendered. page.waitForTimeout(scrollDelay) applies a delay in milliseconds, and logging each iteration can help debug progress. If the height stops changing within that window, the loop breaks and the script exits cleanly.

What is Infinite Scrolling?

Before you attempt to scrape data from a never-ending timeline, it is essential to ask yourself, what exactly is infinite scrolling?

Infinite scrolling is a web-design technique that loads content continuously as the user scrolls down the page. There is an Infinite Scroll JavaScript plugin that automatically adds the next page, preventing a full page load. The first version was created in 2008 by Paul Irish and was a breakthrough in web development. The plugin uses AJAX to pre-fetch content from a subsequent page and then adds it directly to the current page. There are many other ways to produce infinite scrolling content, such as data delivered through API endpoints to incrementally deliver more data, processing data from multiple endpoints before injecting something into a webpage, or data delivery in real-time through WebSockets.

Advantages

Discovery applications: Ideal for interfaces where users browse without knowing exactly what they're looking for.
Mobile devices: Creates a smoother UX on smaller screens with limited space.
User engagement: Continuously loading results keeps users in the application longer.

Disadvantages

Poor page performance: More content loading on a single page means progressively slower performance as users scroll deeper.
Poor item search and location: Users cannot bookmark a specific position in the stream. Leaving and returning means losing their place.
Loss of footers: Footer content, often containing important links and information, becomes inaccessible.

What is Puppeteer?

Puppeteer is a headless Chrome Node API that lets you emulate scrolling on the page and retrieve the desired data from rendered elements, so it has a clear role to play in Node-based scraping workflows. It behaves almost exactly like a regular browser, but programmatically and without a user interface, which is useful when interacting with the page while navigating dynamic content.

Some of what you can do with Puppeteer:

Generate screenshots and PDFs of pages.
Crawl a SPA and generate pre-rendered content.
Automate form submission, UI testing, and keyboard input.
Run tests directly in the latest version of Chrome using the latest JavaScript and browser features.
Capture a timeline trace of your site to help diagnose performance issues.
Test Chrome Extensions.

Intercepting Network Requests for Infinite Scroll Data

Modern sites like Twitter and Instagram don't just load HTML when you scroll, they fire background XHR or Fetch requests to hidden API endpoints that return clean JSON data. Instead of scraping the rendered HTML, you can intercept these calls directly using Puppeteer's page.on('response', ...) event listener, which also makes it easier to preserve the correct order of records than scraping changing DOM nodes.

This approach is often faster and more reliable than parsing DOM elements, especially when class names are dynamic or the HTML structure changes frequently, and you can parse the JSON response directly instead of parsing HTML. Here's a basic example:

await page.setRequestInterception(true);

page.on('response', async (response) => {
  const url = response.url();
  if (url.includes('/api/feed') || url.includes('/timeline')) {
    const data = await response.json();
    console.log(data); // Clean JSON payload from the API
  }
});

await page.goto('https://example.com/feed');
await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');

Here, console.log(data) is useful to log the response output during debugging.

This pattern works well when each response contains structured product data rather than generic page markup.

To find the right endpoint, open your browser's Network tab, scroll the target page, and filter for XHR or Fetch requests. Once you identify the API call pattern, you can often replicate it with a simple HTTP client, no headless browser required.

Managing Memory Limits During Infinite Scrolling

Infinite scrolling loads thousands of DOM elements over time. Left unchecked, this can cause the headless browser to run out of memory and crash.

Two practical ways to handle this:

1. Clear already-scraped elements from the DOM periodically

await page.evaluate(() => {
  const items = document.querySelectorAll('#container > div.blog-post');
  // Remove all but the last few to keep the DOM lean
  items.forEach((el, i) => {
    if (i < items.length - 5) el.remove();
  });
});

2. Save data in batches instead of accumulating a massive array

Instead of collecting everything in memory and writing once at the end, write to disk in chunks:

const BATCH_SIZE = 50;
let batch = [];

while (/* scroll condition */) {
  const newItems = await page.evaluate(extractItems);
  batch.push(...newItems);

  if (batch.length >= BATCH_SIZE) {
    fs.appendFileSync('./items.txt', batch.join('\n') + '\n');
    batch = [];
  }
}

Both strategies reduce the memory footprint of long-running scrapes and keep the browser stable across hundreds or thousands of scroll iterations.

Infinite Scroll vs. Pagination: Scraping Differences

Understanding the structural difference between these two patterns determines which scraping approach to use.

Pagination changes the URL with each page load (e.g., ?page=2, ?page=3). This makes it easy to scrape with simple, fast HTTP requests using lightweight tools like Cheerio, no browser needed.

Infinite scroll keeps the URL static while content loads dynamically via JavaScript. You can't just request page=2 because there is no page 2. The content only exists once the browser has scrolled and triggered the JavaScript fetch. This is why infinite scrolling requires emulating life-like scrolling using a headless browser like Puppeteer.

	Pagination	Infinite Scroll
URL changes	Yes	No
JavaScript required	No	Yes
Best tool	Cheerio / HTTP client	Puppeteer
Speed	Fast	Slower

Improving Infinite Scrolling Performance

As more content loads, the browser must manage thousands of DOM elements. This can cause slowdowns when scraping large infinite scrolling feeds.

One way to improve performance is to break scrolling into smaller increments, and adding a small delay between scroll steps can improve stability and trigger lazy loading more reliably:

await page.evaluate(() => {
  window.scrollBy(0, 500);
});

This works on that line because window.scrollBy moves the current page incrementally instead of jumping straight to the bottom, which lets the browser process smaller chunks of content while maintaining more stable memory usage.

Challenges of Web Scraping Infinite Scrolling Websites

A few things can break an infinite scroll scraper that wouldn't affect a standard one:

Dynamic class names: React and Vue apps often generate class names at build time (e.g., div.sc-abc123). These change between deployments, breaking selectors that rely on them. Use stable attributes like data-testid, aria-label, or structural selectors instead.

Bot detection: Many sites actively detect and block automated scrolling. Jumping instantly to the bottom of a page is a clear signal. Adding randomized delays between scroll increments, varying scroll distances, and using realistic viewport sizes all help reduce detection risk. Waiting for key resources to finish loading before the next action can also reduce flaky extraction on dynamic pages. For sites with aggressive protection, consider bot detection bypass techniques with Puppeteer Stealth.

A robust scraping script should not make you worry about every site-specific front-end detail, but you still need retries and defensive checks for edge cases.

Alternative Scraping Methods for Infinite Scrolling

Puppeteer is powerful but not always the right tool. If the site fires XHR/Fetch requests when scrolling (see the Network Interception section above), you may be able to hit the API endpoint directly using a lightweight HTTP client. Cheerio works well for static pages where the full HTML is returned in the initial request; you typically import the library first and load the HTML into it to parse static responses. Check out the Web Scraping with node-fetch article for more on that approach.

Puppeteer Infinite Scroll Scraping — FAQs

How do I make Puppeteer scroll to the bottom of a page? You can use page.evaluate() to execute window.scrollTo(0, document.body.scrollHeight) within the browser context. In practice, evaluate runs code in the page context and can be combined with incremental scroll steps. Placing this in a loop allows you to continuously hit the bottom of the page to trigger new content loads.

Why is my Puppeteer script not loading infinite scroll content? This usually happens because the script doesn't wait long enough for the network request to finish fetching new data. Always implement a timeout or use page.waitForFunction to verify the page height has increased before extracting data, and if content loads slowly, use page.waitForTimeout with a delay measured in milliseconds.

Can I scrape an infinite scroll page without Puppeteer? Yes, if you can intercept the background network requests (XHR/Fetch) the page makes as you scroll. By finding the hidden API endpoint, you can often request the JSON data directly using a simpler HTTP client.

Is emulating life-like scrolling necessary for scraping? On heavily protected websites, yes. Instantaneously jumping to the bottom of the page can trigger bot detection systems. Adding random delays and scrolling in smaller increments helps mimic real human behavior.

How do I stop Puppeteer from scrolling forever? Define an exit condition in your loop, a maximum number of items to extract (itemCount), a maximum number of scroll attempts, or a check to see if document.body.scrollHeight has stopped increasing.

What causes infinite scrolling scrapers to stop unexpectedly? Infinite scrolling scripts may stop when network requests fail, page structures change, or memory usage becomes excessive. Using try-catch blocks, taking recovery actions like retries, and adding a log for failed iterations can improve scraper reliability.

Conclusion

Thanks to Puppeteer, you can extract data from infinite scrolling applications quickly and efficiently. The script in this article should serve as a starting point for emulating human-like scrolling on any dynamic application to avoid basic bot detection.

If you enjoyed this article, give ScrapingBee a try and get your first 1,000 requests free. Check out the getting started guide here!

Scraping the web is challenging, anti-scraping mechanisms grow by the day. ScrapingBee lets you skip the noise and focus on what matters most: the data.

Before you go, check out these related reads:

Maxine Meurer

Maxine is a software engineer and passionate technical writer, who enjoys spending her free time incorporating her knowledge of environmental technologies into web development.