How to extract a table's content in Python

Data is published online in many formats, and tables are among the most common because they present information in a structured, well-organized layout. Being able to extract data from tables with ease is therefore very useful.

This is one of the most important features of ScrapingBee's data extraction tool: you can scrape data from tables without having to do any post-processing of the HTML response. To use it, specify the table's CSS selector within a set of extract_rules and let ScrapingBee do the rest!

In this example, we're going to scrape NASDAQ's top 100 stock prices from this demo page.

The CSS selector of the table that contains the information we need is .BasicTable-table.

So, our code will look like this:

from scrapingbee import ScrapingBeeClient # Importing ScrapingBee's client
client = ScrapingBeeClient(api_key='YOUR-API-KEY') # Initialize the client with your API key

response = client.get("https://www.cnbc.com/nasdaq-100/", params={
    'extract_rules': {
        "table_json": {
            "selector": ".BasicTable-table",
            "output": "table_json" # Extracting data in JSON representation
        },
        "table_array": {
            "selector": ".BasicTable-table",
            "output": "table_array" # Extracting data in Array representation
        },
    }
}) # Scrape!
if response.ok:
    print(response.content)
    # Further data manipulation can be done here

And the result will look like this:

{"table_json": [{"SYMBOL ": "AMD", "NAME ": "Advanced Micro Devices Inc", "PRICE ": "94.82", "CHANGE ": "-3.98", "%CHANGE ": "-4.03"},...], "table_array": [["AMD", "Advanced Micro Devices Inc", "94.82", "-3.98", "-4.03"],...]}
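The column names in the result above carry a trailing space ("SYMBOL ", "PRICE "), so a little cleanup makes the rows easier to work with. Here is a minimal sketch of that step, using a one-row copy of the sample result instead of a live request. The clean_rows helper is a hypothetical name for illustration, not part of ScrapingBee's client.

```python
import json

# One-row sample copied from the result shown above (not a live response).
sample = (
    b'{"table_json": [{"SYMBOL ": "AMD", "NAME ": "Advanced Micro Devices Inc", '
    b'"PRICE ": "94.82", "CHANGE ": "-3.98", "%CHANGE ": "-4.03"}], '
    b'"table_array": [["AMD", "Advanced Micro Devices Inc", "94.82", "-3.98", "-4.03"]]}'
)


def clean_rows(payload):
    """Parse the response body and strip whitespace from the column names.

    Hypothetical helper: `payload` is the raw body, e.g. response.content.
    """
    data = json.loads(payload)
    return [
        {key.strip(): value for key, value in row.items()}
        for row in data["table_json"]
    ]


rows = clean_rows(sample)
# Values arrive as strings, so convert them before doing arithmetic.
print(rows[0]["SYMBOL"], float(rows[0]["PRICE"]))  # AMD 94.82
```

In a real script you would pass response.content to the helper in place of the sample bytes.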

You can find more details about the differences between the JSON representation and the Array representation on our Data Extraction documentation page.
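Judging only by the sample result in this tutorial, the Array representation holds the same cell values as the JSON representation but without the column names. As a rough sketch of how the two relate, the snippet below pairs each array row with a list of column names to rebuild dictionary records. The column names are typed by hand here, and array_to_records is a hypothetical helper, so treat both as assumptions.

```python
# Column names written by hand for illustration (an assumption, not API output).
column_names = ["SYMBOL", "NAME", "PRICE", "CHANGE", "%CHANGE"]

# One row in Array representation, copied from the sample result.
table_array = [["AMD", "Advanced Micro Devices Inc", "94.82", "-3.98", "-4.03"]]


def array_to_records(rows, names):
    """Pair every row with the column names to build one dict per row."""
    return [dict(zip(names, row)) for row in rows]


records = array_to_records(table_array, column_names)
print(records[0]["NAME"])  # Advanced Micro Devices Inc
```

The Array representation is the more compact of the two when you already know the column order, while the JSON representation is self-describing.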

 
