How to web scrape with Python Selenium?

Using Python with the Requests library lets you scrape data from static websites, meaning websites whose content is already present in the server's original HTML response. However, you will not be able to get data from websites that load information dynamically, using JavaScript that gets executed after the server's initial response. For that, we have to use tools that allow us to mimic a typical user's behavior, like Selenium.

Selenium is a set of different open-source projects used for browser automation. It supports bindings for all major programming languages, including Python. The Selenium API uses the WebDriver protocol to control a web browser, like Chrome, Firefox or Safari.

So Selenium will not only allow us to control a regular web browser and fetch data that loads dynamically, it will also let us perform actions a regular user could (a short sketch follows the list below), such as:

  • Clicking on buttons
  • Filling forms
  • Scrolling
  • Executing custom JS code
  • etc...
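
To make that concrete, here is a rough sketch of what those actions look like in code. The URL, element IDs and selectors are hypothetical placeholders, not taken from a real page, and it assumes chromedriver is available on your PATH:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # Assumes chromedriver is on your PATH
driver.get("https://example.com")  # Hypothetical page

# Filling a form field (assumes an input with id="search" exists)
search_box = driver.find_element(By.ID, "search")
search_box.send_keys("web scraping")

# Clicking a button (assumes a button with id="submit" exists)
driver.find_element(By.ID, "submit").click()

# Scrolling to the bottom of the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# Executing custom JS code and reading its return value
page_title = driver.execute_script("return document.title;")
print(page_title)

driver.quit()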

Let's take a simple example! This webpage loads the text "This is content" after 5 seconds.

Scraping it in Python with Requests will only result in an empty div element:

<!DOCTYPE html>
<html>
...

<div id="content"></div>

...
</html>
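
For reference, the Requests call behind that empty result might look like this (a minimal sketch):

import requests

response = requests.get("https://demo.scrapingbee.com/content_loads_after_5s.html")
print(response.text)  # The JavaScript never runs, so the #content div stays empty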

However, scraping the web page using Python with Selenium while adding some waiting time:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.headless = True

driver = webdriver.Chrome(options=options, executable_path="PATH_TO_CHROMEDRIVER") # Setting up the Chrome driver
driver.get("https://demo.scrapingbee.com/content_loads_after_5s.html")
time.sleep(6) # Sleep for 6 seconds
print(driver.page_source)
driver.quit()

Will result in the page we're looking for:

<!DOCTYPE html>
<html>
...

<div id="content">This is content</div>

...
</html>
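
A fixed time.sleep() works, but it is the least robust kind of wait. Selenium also supports explicit waits that block only until a condition is met. Here is a sketch of the same scrape using WebDriverWait, with the same driver setup as above and an arbitrary 10-second timeout:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.headless = True

driver = webdriver.Chrome(options=options, executable_path="PATH_TO_CHROMEDRIVER")
driver.get("https://demo.scrapingbee.com/content_loads_after_5s.html")

# Wait (at most 10 seconds) until the div actually contains the expected text
WebDriverWait(driver, 10).until(
    EC.text_to_be_present_in_element((By.ID, "content"), "This is content")
)
print(driver.find_element(By.ID, "content").text)
driver.quit()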

For more information about Python & Selenium, make sure to check out this thorough blog article: Web Scraping using Selenium and Python
