Websites that render using JavaScript work in many different ways. Hence, waiting for the page to load might mean different things based on what we're looking to do. Sometimes the elements we need will appear on the first render, and sometimes an app shell will load first and then the content. Sometimes we may even have to interact with the page (click or scroll) before the content appears. Let's look at the different methods to wait in Playwright, so you can use the one that works best for your task.
1. Waiting For Selector In Playwright
You can wait for the page to load in Playwright using the wait_for_selector method of the Page object. By default, page.goto returns once the page has fired its load event, but this does not take into account any XHR or AJAX requests triggered after the page load. You can account for those by calling wait_for_selector and waiting for an element that confirms the content you need has rendered.
Here is some sample code that searches for the hottest sneakers on Pinterest and then waits for the pins to show up before saving a screenshot of the page:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    # Go to Pinterest
    page.goto("https://www.pinterest.com/search/pins/?rs=ac&len=2&q=hottest%20sneakers")
    # Wait for the pins to show up
    page.wait_for_selector("div[data-grid-item=true]")
    # Save the screenshot
    page.screenshot(path="pinterest.png")
    # Close the browser when done
    browser.close()
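wait_for_selector also accepts a state and a timeout argument (the default timeout is 30 seconds). Here is a minimal sketch of how that might look, reusing the Pinterest selector from the example above. The helper name wait_for_pins and the 15-second timeout are illustrative assumptions, not something Playwright provides:

```python
def wait_for_pins(page, selector="div[data-grid-item=true]", timeout_ms=15_000):
    """Block until an element matching `selector` is visible.

    Returns the number of matching elements once the wait succeeds.
    Playwright raises its own TimeoutError if nothing shows up in time.
    """
    # state="visible" is the default; it is spelled out here for clarity
    page.wait_for_selector(selector, state="visible", timeout=timeout_ms)
    return page.locator(selector).count()


def main():
    # Imported inside main() so wait_for_pins works with any page-like object
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://www.pinterest.com/search/pins/?rs=ac&len=2&q=hottest%20sneakers")
        print("pins visible:", wait_for_pins(page))
        page.screenshot(path="pinterest.png")
        browser.close()
```

Call main() to run the full flow. A shorter timeout makes the script fail fast when the selector has changed, which tends to happen often on sites like Pinterest.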
2. Playwright Auto-Wait
Unlike many other web browser automation frameworks, Playwright has this cool feature where it will automatically wait for the element behind a locator when you perform an action on that locator (like clicking or taking a screenshot). Let's get the Pinterest screenshot using this method:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    # Go to Pinterest
    page.goto("https://www.pinterest.com/search/pins/?rs=ac&len=2&q=hottest%20sneakers")
    # Auto-wait for the pins and screenshot when they load
    page.locator("div[data-test-id=search-feed]").screenshot(path="pinterest.png")
    # Close the browser when done
    browser.close()
When we perform an action on a locator created from a CSS selector, Playwright actually waits for that element to load instead of immediately complaining that it couldn't find it. It will raise a timeout error if it doesn't find the element within the timeout period (30 seconds by default), though.
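The auto-wait timeout can be set per action through the timeout argument. A minimal sketch, assuming a page object like the one created above; the helper name screenshot_element and the 10-second value are hypothetical choices for illustration:

```python
def screenshot_element(page, selector, path, timeout_ms=10_000):
    """Screenshot the element matching `selector` and return the file path.

    The screenshot action auto-waits for the element; `timeout` caps that
    wait for this one call instead of relying on the 30-second default.
    """
    page.locator(selector).screenshot(path=path, timeout=timeout_ms)
    return path
```

For example, screenshot_element(page, "div[data-test-id=search-feed]", "pinterest.png") would give up after 10 seconds. To change the limit for every action on a page instead, page.set_default_timeout takes a value in milliseconds.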
3. Playwright Wait For Timeout
You can use the page.wait_for_timeout method to simply wait for a specified number of milliseconds. This is a straightforward method for pages that have a complicated load mechanism or HTML structure. Let's see the code:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    # Go to Pinterest
    page.goto("https://www.pinterest.com/search/pins/?rs=ac&len=2&q=hottest%20sneakers")
    # Wait for 10 seconds
    page.wait_for_timeout(10_000)
    # Save the screenshot
    page.screenshot(path="pinterest.png")
    # Close the browser when done
    browser.close()
A fixed timeout is a trade-off: too long wastes time on every run, too short misses content on slow loads. A practical strategy is to pick a timeout that works the majority of the time and have the code retry a few times only if something goes wrong.
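The retry idea can be sketched with a small generic helper. This is one possible approach, not a Playwright feature: the names retry and screenshot_after_timeout, the check selector, and the delay values are all assumptions made for illustration.

```python
import time


def retry(action, attempts=3, delay_s=2.0):
    """Call `action()` until it succeeds, at most `attempts` times.

    Returns whatever `action()` returns; re-raises the last error
    if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # narrow this to the errors you expect
            last_error = error
            if attempt < attempts:
                time.sleep(delay_s)
    raise last_error


def screenshot_after_timeout(page, url, path, wait_ms=10_000,
                             check_selector="div[data-grid-item=true]"):
    """Load `url`, wait a fixed time, and fail if the content is missing."""
    page.goto(url)
    page.wait_for_timeout(wait_ms)
    # A fixed wait gives no guarantee, so verify before saving
    if page.locator(check_selector).count() == 0:
        raise RuntimeError("content did not load within the timeout")
    page.screenshot(path=path)
    return path
```

With a page object from the example above, retry(lambda: screenshot_after_timeout(page, url, "pinterest.png")) would reload and wait again up to three times before giving up.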
4. Playwright Network-Based Waiting
Playwright offers several methods to wait based on certain network conditions. For scraping, the most commonly used one is page.wait_for_load_state("networkidle"). This method waits until there have been no network connections for at least 500 milliseconds, meaning the requests triggered by the page load have completed and the network is back to an idle state. It is also very useful to wait for some data to load after triggering a scroll or a click. Let's see some demo code below:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    # Go to target page
    page.goto("https://scrapingbee.com")
    # Wait until network is idle
    page.wait_for_load_state("networkidle")
    # Save the screenshot
    page.screenshot(path="scrapingbee.png")
    # Close the browser when done
    browser.close()
You could also wait for load or domcontentloaded instead of networkidle. In addition, Playwright can wait for specific network traffic: in the Python API, the expect_request and expect_response context managers (waitForRequest and waitForResponse in the JavaScript API) wait for the page to make a certain network request and for a certain response to be received, respectively.
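In the Python sync API, waiting for a specific response is done with the page.expect_response context manager: you start waiting first, then trigger the request inside the with block. A minimal sketch follows; the helper name capture_response and the idea of matching on a URL fragment are assumptions for illustration.

```python
def capture_response(page, url_part, trigger, timeout_ms=10_000):
    """Run `trigger()` and return the first response whose URL contains `url_part`.

    `trigger` is any zero-argument callable that causes the request,
    such as a click or a scroll.
    """
    with page.expect_response(
        lambda response: url_part in response.url,
        timeout=timeout_ms,
    ) as response_info:
        trigger()
    # .value holds the matched Response once the with block has exited
    return response_info.value
```

For example, with a hypothetical "Load more" button, capture_response(page, "/api/", lambda: page.click("text=Load more")) would return the response object, whose status and json() can then be inspected. page.expect_request works the same way for outgoing requests.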