Data extraction in NodeJS

One of the most important features of ScrapingBee is the ability to extract exactly the data you need, without post-processing the response content with external libraries.

We can use this feature by passing an additional parameter named extract_rules. In it, we specify a label for each element we want to extract along with its CSS selector, and ScrapingBee will do the rest!

Let’s say that we want to extract the title and the subtitle of the data extraction documentation page. Their CSS selectors are h1 and span.text-20 respectively. To make sure that they’re the correct ones, you can run the JavaScript function document.querySelector("CSS_SELECTOR") on that page, in the console of your browser’s developer tools.


const scrapingbee = require('scrapingbee'); // Import ScrapingBee's SDK

async function get_title_and_subtitle(url) {
  const client = new scrapingbee.ScrapingBeeClient('YOUR-API-KEY'); // New ScrapingBee client
  const response = await client.get({
    url: url,
    params: {
      // Data extraction: each label maps to the CSS selector of the element to extract
      extract_rules: {
        title: 'h1',
        subtitle: 'span.text-20',
      },
    },
  });
  return response;
}

get_title_and_subtitle('https://www.scrapingbee.com/documentation/data-extraction/')
  .then(function (response) {
    const decoder = new TextDecoder();
    const text = decoder.decode(response.data); // Decode the raw response bytes into a string
    console.log(text);
  })
  // e.response is missing when the request never reached the API (e.g. a network error)
  .catch((e) => console.log('A problem occurred: ' + (e.response ? e.response.data : e.message)));

Running this script prints the extracted data as a JSON string:

'{"title": "Documentation - Data Extraction", "subtitle": "Extract data with CSS selector"}'
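Since the result is a JSON string, you will usually want to parse it before working with the individual fields. The snippet below is a minimal sketch of that step. The helper name parseExtractedData is hypothetical (it is not part of the ScrapingBee SDK), and sampleData is a stand-in for response.data built from the output shown above, so the snippet runs without an API key.

```javascript
// Hypothetical helper (not part of the ScrapingBee SDK): decodes the raw
// response bytes and parses the JSON produced by extract_rules.
function parseExtractedData(data) {
  const text = new TextDecoder().decode(data);
  return JSON.parse(text);
}

// Stand-in for response.data, built from the sample output shown above.
const sampleData = new TextEncoder().encode(
  '{"title": "Documentation - Data Extraction", "subtitle": "Extract data with CSS selector"}'
);

const extracted = parseExtractedData(sampleData);
console.log(extracted.title);    // the text matched by the "title" rule
console.log(extracted.subtitle); // the text matched by the "subtitle" rule
```

In a real script you would call parseExtractedData(response.data) inside the .then() callback instead of using the sample bytes.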

You can find more about this feature in our Data Extraction documentation, and more about CSS selectors on the W3Schools CSS Selectors page.

 
