Data extraction in Python

One of the most important features of ScrapingBee, is the ability to extract exact data without need to post-process the request’s content using external libraries.

We can use this feature by specifying an additional parameter with the name extract_rules. We specify the label of elements we want to extract, their CSS Selectors and ScrapingBee will do the rest!

Let’s say that we want to extract the title & the subtitle of the data extraction documentation page. Their CSS selectors are h1 and span.text-20 respectively. To make sure that they’re the correct ones, you can use the JavaScript function: document.querySelector("CSS_SELECTOR") in that page’s developer tool’s console.

The full code will look like this:

from scrapingbee import ScrapingBeeClient # Importing SPB's client
client = ScrapingBeeClient(api_key='YOUR-API-KEY') # Initialize the client with your API Key, and using screenshot_full_page parameter to take a screenshot!

response = client.get("https://www.scrapingbee.com/documentation/data-extraction/", params={
'extract_rules':{
                 "title": "h1",
                 "subtitle": "span.text-20"
                }
}) # Scrape!
if response.ok:
    print(response.content)

And as you can see, the result is:

b'{"title": "Documentation - Data Extraction", "subtitle": "Extract data with CSS selector"}'

You can find more about this feature in our documentation: Data Extraction. And more about CSS selectors in W3Schools - CSS Selectors page.

Go back to tutorials