Data extraction in Go

One of ScrapingBee’s most useful features is the ability to extract exact data from a page without having to post-process the response’s content with external libraries.

We can use this feature by specifying an additional parameter named extract_rules. We pass a label for each element we want to extract along with its CSS selector, and ScrapingBee does the rest!

Let’s say we want to extract the title and the subtitle of the data extraction documentation page. Their CSS selectors are h1 and span.text-20 respectively. To make sure they’re the correct ones, you can run the JavaScript function document.querySelector("CSS_SELECTOR") in the page’s developer tools console.
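Under the hood, the extract_rules parameter is just a small JSON object mapping each label to its selector. For our two selectors, the encoded rules look like this:

```json
{"title": "h1", "subtitle": "span.text-20"}
```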

The full code will look like this:

package main

import (
  "encoding/json"
  "fmt"
  "io"
  "log"
  "net/http"
)

const apiKey = "YOUR-API-KEY"
const scrapingBeeURL = "https://app.scrapingbee.com/api/v1"

// extract sends the target URL and the extraction rules to ScrapingBee
// and returns the raw JSON body of the response.
func extract(targetURL string, rules interface{}) ([]byte, error) {
  // The extract_rules parameter must be JSON-encoded.
  rawRules, err := json.Marshal(rules)
  if err != nil {
    return nil, fmt.Errorf("failed to encode rules: %w", err)
  }

  req, err := http.NewRequest(http.MethodGet, scrapingBeeURL, nil)
  if err != nil {
    return nil, fmt.Errorf("failed to build the request: %w", err)
  }

  // Pass the API key, the target URL and the rules as query parameters.
  q := req.URL.Query()
  q.Add("api_key", apiKey)
  q.Add("url", targetURL)
  q.Add("extract_rules", string(rawRules))
  req.URL.RawQuery = q.Encode()

  client := &http.Client{}
  resp, err := client.Do(req)
  if err != nil {
    return nil, fmt.Errorf("failed to request ScrapingBee: %w", err)
  }
  defer resp.Body.Close()

  if resp.StatusCode != http.StatusOK {
    return nil, fmt.Errorf("request failed with status code %d", resp.StatusCode)
  }

  bodyBytes, err := io.ReadAll(resp.Body)
  if err != nil {
    return nil, fmt.Errorf("failed to read the response body: %w", err)
  }

  return bodyBytes, nil
}

func main() {
  targetURL := "https://www.scrapingbee.com/documentation/data-extraction"
  rules := map[string]interface{}{
    "title":    "h1",
    "subtitle": "span.text-20",
  }

  rawJSON, err := extract(targetURL, rules)
  if err != nil {
    log.Fatal(err)
  }

  fmt.Println(string(rawJSON))
}

As you can see, the result is:

{"title": "Documentation - Data Extraction", "subtitle": "Extract data with CSS selector"}

You can find more about this feature in our documentation: Data Extraction. For more about CSS selectors, see the W3Schools - CSS Selectors page.
