How to use CSS Selectors in Python?

What Are CSS Selectors?

CSS selectors are patterns that are used to reference HTML elements, primarily for the purpose of styling them using CSS. Over the years, they've evolved into one of the key ways to select and manipulate HTML elements using in-browser JavaScript and other programming languages such as Python.

Why Use CSS Selectors in Python?

In Python, CSS selectors are primarily used to select one or more HTML elements while working with web pages, usually for scraping and browser automation.

CSS Selector Examples For Scraping

  • Getting the heading of the page, using the h1 selector.
  • Getting all links in the page matching a particular pattern, using a[href*="pattern"].
  • Getting a span with the class price inside a div with the class product-info: div.product-info>span.price
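
The three selectors above can be tried against a small HTML snippet. What follows is a minimal sketch, assuming BeautifulSoup (the bs4 package) is installed; the markup, the link targets, and the "/blog/" pattern are all invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example
html = """
<html><body>
  <h1>Product catalogue</h1>
  <a href="/blog/first-post">First post</a>
  <a href="/about">About</a>
  <a href="/blog/second-post">Second post</a>
  <div class="product-info">
    <span class="price">19.99</span>
    <p><span class="price">nested, not a direct child</span></p>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# h1: the page heading
heading = soup.select_one("h1").text

# a[href*="/blog/"]: every link whose href contains "/blog/"
blog_links = [a["href"] for a in soup.select('a[href*="/blog/"]')]

# div.product-info>span.price: only spans that are direct children of the div
prices = [s.text for s in soup.select("div.product-info>span.price")]

print(heading)
print(blog_links)
print(prices)
```

The child combinator (>) is why the nested span is skipped: it sits inside a p element rather than directly inside the div.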

CSS Selector Examples For Browser Automation

  • Selecting an email input: input[type=email]
  • Selecting a form with the class login: form.login
  • Selecting a button that follows the email input as a sibling (typically the submit button of the same form): input[type=email] ~ button
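
Selectors like these can be checked offline before they go into an automation script. The sketch below assumes BeautifulSoup is installed and uses an invented page with two forms. Note that ~ is the general sibling combinator, so the button matches only because it comes after the email input under the same parent.

```python
from bs4 import BeautifulSoup

# Hypothetical page with a login form and a search form
html = """
<form class="login">
  <input type="email" name="email">
  <input type="password" name="password">
  <button type="submit">Log in</button>
</form>
<form class="search">
  <input type="text" name="q">
  <button type="submit">Search</button>
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# input[type=email]: the email field
email_names = [i["name"] for i in soup.select("input[type=email]")]

# form.login: the form carrying the class "login"
login_forms = soup.select("form.login")

# input[type=email] ~ button: buttons that are later siblings of the email field
buttons = [b.text for b in soup.select("input[type=email] ~ button")]

print(email_names, len(login_forms), buttons)
```

The search form's button is not returned because that form contains no email input for it to be a sibling of.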

Using CSS Selectors In Python

There are multiple ways to use CSS selectors in Python, and the method you choose depends on the library you are using. Popular libraries that support CSS selectors include BeautifulSoup, Selenium, and cssselect.

CSS Selectors With BeautifulSoup

With BeautifulSoup, you can use CSS selectors by calling its select() or select_one() methods. You pass the CSS selector string as an argument to these methods.

  • The select() method searches for all HTML elements that match the provided CSS selector and returns them as a list of Tag objects. If no elements match, it returns an empty list.

  • Conversely, the select_one() method searches for the first HTML element that matches the CSS selector and returns that single Tag object. If no element matches, it returns None.

Here's some example code demonstrating their usage:

import requests
from bs4 import BeautifulSoup

html = requests.get("https://scrapingbee.com").text
# Name the parser explicitly so results are consistent across machines
soup = BeautifulSoup(html, "html.parser")

# Select all h2 elements
select_all_elements = soup.select("h2")

for h in select_all_elements:
    print(h.text)

"""
#Output: 
Render your web page as if it were a real browser.
Render JavaScript to scrape any website.
Rotate proxies to bypass rate limiting.
Use the power of AI web scraping.
Simple, transparent pricing.
Developers are asking...
Who are we?
Contact us
Ready to get started?
"""

# Select the first h2 element
print(soup.select_one("h2").text)

#Output: Render your web page as if it were a real browser.
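
The no-match behavior described earlier matters in practice, because calling .text on None raises an AttributeError. Here is a minimal offline sketch, assuming BeautifulSoup is installed; the one-line HTML document and the fallback string are made up for the example.

```python
from bs4 import BeautifulSoup

# Tiny invented document that contains an h2 but no h3
soup = BeautifulSoup("<div><h2>Only heading</h2></div>", "html.parser")

missing_all = soup.select("h3")      # no match: empty list
missing_one = soup.select_one("h3")  # no match: None

# Guard before reading .text, since None has no such attribute
title = missing_one.text if missing_one is not None else "(no h3 found)"

print(missing_all, missing_one, title)
```

Looping over the result of select() needs no such guard, since iterating an empty list simply does nothing.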

Using CSS Selectors With Selenium

Alternatively, you can use CSS selectors in Selenium to do the same thing. Here is some sample code:

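from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

DRIVER_PATH = '/path/to/chromedriver'
# Selenium 4 takes the driver path through a Service object;
# the older executable_path argument is no longer accepted
driver = webdriver.Chrome(service=Service(DRIVER_PATH))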

# Open Scrapingbee's website
driver.get("http://www.scrapingbee.com")

# Get the first h1 element using find_element
h1 = driver.find_element(By.CSS_SELECTOR, "h1")

print(h1.text)
# Output: 'Tired of getting blocked while scraping the web?'

# Get the first h1 element with the class of mb-33
h1 = driver.find_element(By.CSS_SELECTOR, "h1.mb-33")

print(h1.text)
# Output: 'Tired of getting blocked while scraping the web?'

# Close the browser when done
driver.quit()

Using The cssselect Python Library

Finally, there is the cssselect library, which was originally part of lxml. lxml is a lower-level, lightweight alternative to BeautifulSoup. cssselect can be used with lxml as follows:

from cssselect import GenericTranslator
from lxml.etree import fromstring

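# Parse a small, well-formed document
html = fromstring('<div><h1>Heading 1</h1><h2>Heading 2</h2></div>')
# Translate the CSS selector 'h1' into an equivalent XPath expression
selector = GenericTranslator().css_to_xpath('h1')
# Run the XPath query and take the first match
h1 = html.xpath(selector)[0]
print(h1.text)
# Output: Heading 1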

This approach has some limitations. The lxml.etree parser used here is a strict XML parser, so it rejects malformed HTML (the lxml.html module is more forgiving), and the cssselect library only translates CSS selectors into XPath expressions for lxml to evaluate. This makes the whole approach slightly more cumbersome, but it pays off when you need speed.
