What Are CSS Selectors?
CSS selectors are patterns that are used to reference HTML elements, primarily for the purpose of styling them using CSS. Over the years, they've evolved into one of the key ways to select and manipulate HTML elements using in-browser JavaScript and other programming languages such as Python.
Why Use CSS Selectors in Python?
In Python, CSS selectors are primarily used to select one or more HTML elements while working with web pages, usually for scraping and browser automation.
CSS Selector Examples For Scraping
- Getting the heading of the page, using the `h1` selector.
- Getting all links on the page matching a particular pattern, using `a[href*="pattern"]`.
- Getting a span with the class `price` inside a div with the class `product-info`: `div.product-info > span.price`
CSS Selector Examples For Browser Automation
- Selecting an email input: `input[type=email]`
- Selecting a form with the class `login`: `form.login`
- Selecting the button of the form containing the email input: `input[type=email] ~ button`
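As a quick sanity check, these selectors can be tried against a small HTML snippet with BeautifulSoup, whose `select_one()` method (covered below) accepts the same syntax. The login-form markup here is invented for illustration:

```python
from bs4 import BeautifulSoup

# Made-up login form, purely for illustration
html = """
<form class="login">
  <input type="email" name="email">
  <button type="submit">Log in</button>
</form>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.select_one("input[type=email]")["name"])        # name attribute of the email input
print(soup.select_one("form.login")["class"])              # class list of the matched form
print(soup.select_one("input[type=email] ~ button").text)  # text of the button following the input
```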
Using CSS Selectors In Python
There are multiple ways to use CSS selectors in Python; the method you choose depends on the library you are using. Some of the most popular libraries that support CSS selectors are BeautifulSoup, Selenium, and cssselect.
CSS Selectors With BeautifulSoup
With BeautifulSoup, you can use CSS selectors by calling its select() or select_one() methods. You pass the CSS selector string as an argument to these methods.
The select() method searches for all HTML elements that match the provided CSS selector and returns them as a list of Tag objects. If no elements match, it returns an empty list.
Conversely, the select_one() method searches for the first HTML element that matches the CSS selector and returns that single Tag object. If no element matches, it returns None.
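A minimal sketch of the two return conventions, using an inline snippet invented for the example:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p class='intro'>Hello</p></div>", "html.parser")

print(soup.select("p"))         # list containing one Tag
print(soup.select("span"))      # no match: an empty list
print(soup.select_one("span"))  # no match: None
```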
Here's some example code demonstrating their usage:
import requests
from bs4 import BeautifulSoup
html = requests.get("https://scrapingbee.com").text
soup = BeautifulSoup(html, "html.parser")
# Selects all h2 elements
select_all_elements = soup.select("h2")
for h in select_all_elements:
print(h.text)
"""
# Output:
Render your web page as if it were a real browser.
Render JavaScript to scrape any website.
Rotate proxies to bypass rate limiting.
Use the power of AI web scraping.
Simple, transparent pricing.
Developers are asking...
Who are we?
Contact us
Ready to get started?
"""
# Selects the first h2 element
print(soup.select_one("h2").text)
# Output: Render your web page as if it were a real browser.
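Both methods accept full combinator syntax as well. For instance, the `div.product-info > span.price` selector from earlier can be applied to an inline snippet (the markup is invented for the example):

```python
from bs4 import BeautifulSoup

html = '<div class="product-info"><span class="price">$10.99</span></div>'
soup = BeautifulSoup(html, "html.parser")

# The child combinator (>) restricts the match to direct children
price = soup.select_one("div.product-info > span.price")
print(price.text)
```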
Using CSS Selectors With Selenium
Alternatively, you can use CSS selectors in Selenium to do the same thing. Here is some sample code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
DRIVER_PATH = '/path/to/chromedriver'
# Selenium 4 removed executable_path; pass a Service object instead
driver = webdriver.Chrome(service=Service(DRIVER_PATH))
# Open ScrapingBee's website
driver.get("https://www.scrapingbee.com")
# Get the first h1 element using find_element
h1 = driver.find_element(By.CSS_SELECTOR, "h1")
print(h1.text)
# Output: 'Tired of getting blocked while scraping the web?'
# Get the first h1 element with the class of mb-33
h1 = driver.find_element(By.CSS_SELECTOR, "h1.mb-33")
print(h1.text)
# Output: 'Tired of getting blocked while scraping the web?'
Using The cssselect Python Library
Finally, there is also the cssselect
library, which was originally a part of the lxml
library. LXML is a low-level, lightweight alternative to BeautifulSoup. This library can be used with LXML as follows:
from cssselect import GenericTranslator
from lxml.etree import fromstring
html = fromstring('<div><h1>Heading 1</h1><h2>Heading 2</h2></div>')
selector = GenericTranslator().css_to_xpath('h1')  # 'h1' is the CSS selector
h1 = html.xpath(selector)[0]
print(h1.text)
The lxml library has some limitations of its own: its strict XML parser (lxml.etree) cannot handle malformed markup, and the cssselect library only translates CSS selectors into XPath expressions for lxml to evaluate. This makes the whole approach slightly cumbersome, but it pays off when you need efficiency.