How to extract a table's content in Ruby

Data is published online in many formats, and tables are among the most popular because they present information in a structured, well-organized layout. Being able to extract data from tables easily is therefore very useful.

This is one of the most important features of ScrapingBee's data extraction tool: you can scrape data from tables without any post-processing of the HTML response. To use this feature, specify the table's CSS selector within a set of extract_rules and let ScrapingBee do the rest!

In this example, we're going to scrape NASDAQ's top 100 stock prices from this demo page.

The CSS selector of the table that contains the information we need is .BasicTable-table.

So, our code will look like this:

require 'net/http'
require 'net/https'
require 'addressable/uri'
require 'json'

# Send a GET request to the ScrapingBee API and return the HTTP response
def scrape_table(user_url, rules)

    uri = Addressable::URI.parse("https://app.scrapingbee.com/api/v1/")
    api_key = "YOUR-API-KEY"
    uri.query_values = {
      'api_key'  => api_key,
      'url' => user_url,
      'extract_rules' => rules
    }
    uri = URI(uri)

    # Create client
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER

    # Create Request
    req =  Net::HTTP::Get.new(uri)

    # Fetch Request
    res = http.request(req)

    # Return the response (the caller prints its body)
    return res
rescue StandardError => e
    puts "HTTP Request failed (#{ e.message })"
end

url = "https://demo.scrapingbee.com/table_content.html"
rules = {
    "table_json": {
        "selector": ".table",
        "output": "table_json" # Extracting data in JSON representation
    },
    "table_array": {
        "selector": ".table",
        "output": "table_array" # Extracting data in Array representation
    }
}
rules = rules.to_json # Convert the hash object into JSON format
response = scrape_table(url, rules)

# scrape_table returns nil when the request fails, so guard before printing
puts response.body if response

The result will look like this:

{"table_json": [{"SYMBOL ": "AMD", "NAME ": "Advanced Micro Devices Inc", "PRICE ": "94.82", "CHANGE ": "-3.98", "%CHANGE ": "-4.03"},...], "table_array": [["AMD", "Advanced Micro Devices Inc", "94.82", "-3.98", "-4.03"],...]}
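Note that the header keys in the sample output carry a trailing space ("SYMBOL ", "PRICE "). As a minimal sketch of how the response body could be consumed, the snippet below parses it and strips the keys. It works on a hard-coded sample trimmed to a single row rather than a live response, and `clean_rows` is a hypothetical helper name, not part of ScrapingBee's API.

```ruby
require 'json'

# Sample body copied from the result above, trimmed to one row
# (the elided "..." entries are left out so the JSON stays valid).
body = '{"table_json": [{"SYMBOL ": "AMD", "NAME ": "Advanced Micro Devices Inc", ' \
       '"PRICE ": "94.82", "CHANGE ": "-3.98", "%CHANGE ": "-4.03"}], ' \
       '"table_array": [["AMD", "Advanced Micro Devices Inc", "94.82", "-3.98", "-4.03"]]}'

# Hypothetical helper: remove surrounding whitespace from every header key.
def clean_rows(rows)
  rows.map { |row| row.transform_keys(&:strip) }
end

data = JSON.parse(body)
stocks = clean_rows(data['table_json'])

# Values arrive as strings, so convert them before doing arithmetic.
stocks.each do |stock|
  puts "#{stock['SYMBOL']}: #{stock['PRICE'].to_f}"
end
```

In a real script, `body` would be the `body` of the response returned by `scrape_table`.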

You can find more details about the differences between the JSON representation and the array representation on our Data Extraction documentation page.
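As a rough illustration of how the two representations relate in the sample output, the sketch below rebuilds JSON-style rows from array-style rows. It assumes the column names and their order match the sample result shown earlier; `HEADERS` and `rows_to_hashes` are hypothetical names introduced for this example.

```ruby
# Column names taken from the sample output (assumed to be in table order).
HEADERS = ['SYMBOL', 'NAME', 'PRICE', 'CHANGE', '%CHANGE'].freeze

# Hypothetical helper: pair each array-style row with the header names.
def rows_to_hashes(rows, headers = HEADERS)
  rows.map { |row| headers.zip(row).to_h }
end

# One row in array representation, copied from the sample output.
table_array = [['AMD', 'Advanced Micro Devices Inc', '94.82', '-3.98', '-4.03']]

records = rows_to_hashes(table_array)
puts records.first['NAME']
```

The array representation is more compact, while the JSON representation is self-describing because each value carries its column name.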
