Documentation

How to use ScrapingBee

Basic Usage

ScrapingBee is meant to be the easiest scraping API available on the web.

To crawl a web page, you only need two things: your API key (available here) and the URL of the web page you want to crawl.

Then, simply do this:

curl -L "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&url=YOUR_URL"

You'll directly receive HTML in your terminal:

<html>
  <head>
     ...
  </head>
  <body>
     ...
  </body>
</html>

Please note that every failed URL is retried as many times as possible for up to 60 seconds.

Keep this maximum delay in mind when setting timeouts in your own code.
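The same call can be sketched in Python with only the standard library. The helper names build_request_url and fetch_html and the 70-second client timeout are illustrative assumptions, not part of the API; the sketch also assumes the API decodes percent-encoded query values, as standard HTTP servers do.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1"


def build_request_url(api_key: str, target_url: str) -> str:
    """Return the API URL with api_key first and the target URL last."""
    query = urllib.parse.urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


def fetch_html(api_key: str, target_url: str, timeout: float = 70.0) -> str:
    """Fetch a page through the API and return its HTML as text.

    The default timeout (an assumption) is set above 60 seconds so the
    client does not give up while failed URLs are still being retried.
    """
    request_url = build_request_url(api_key, target_url)
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_html("YOUR_APIKEY", "YOUR_URL") would perform the request; a client timeout shorter than 60 seconds may abort a call that would otherwise have succeeded on a later retry.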

Advanced Usage

Warning: If you use one or more of the parameters described below, always make sure that the url parameter is the LAST parameter in the API call! This is extremely important to avoid mixing your API parameters with any query parameters of the URL you want to scrape.
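One way to enforce this ordering is to build the query string in code. The sketch below is a hypothetical helper (build_api_url is not part of the API) that always serializes url last and percent-encodes the target URL, assuming the API decodes standard percent-encoding.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl examples in this page.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1"


def build_api_url(api_key: str, target_url: str, **options: str) -> str:
    """Build a request URL whose final query parameter is always `url`."""
    # Ignore stray `api_key` or `url` options so both stay where we put them.
    extras = {k: v for k, v in options.items() if k not in ("api_key", "url")}
    # Dicts keep insertion order, so `url` is serialized last.
    params = {"api_key": api_key, **extras, "url": target_url}
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_api_url("YOUR_APIKEY", "http://example.com/?page=2", render_js="False"))
```

Because the target URL's own ? and = characters are percent-encoded, its query parameters cannot be confused with the API parameters.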

Javascript rendering

You can ask ScrapingBee to fetch the URL you want to scrape either directly or through a headless browser that executes the Javascript code on the target page. Fetching through the headless browser is the default behavior.

This can be very useful if you are scraping a Single Page Application built with frameworks or libraries such as React, AngularJS, jQuery, or Vue.

To fetch the URL directly, use render_js=False in your GET request.

To fetch the URL through a headless Chrome browser, use render_js=True in the GET request.

Keep in mind that requests using render_js=True will cost 5 API credits, and that this is the default behavior. Use render_js=False if you don't need it.

Example with a dummy Single Page Application (SPA):

render_js=True (default behavior)

curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&render_js=True&url=YOUR_URL"

<html>
  <head>
     ...
  </head>
  <body>
     <content>
     </content>
     <content>
     </content>
     <content>
     </content>
     <content>
     </content>
     <content>
     </content>
  </body>
</html>

render_js=False

curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&render_js=False&url=YOUR_URL"

<html>
  <head>
  </head>
  <body>
  </body>
</html>

Javascript Execution

You can ask ScrapingBee API to execute arbitrary Javascript code inside our Headless Chrome instance.

This can be useful, for example, if you need to scroll an infinite-scroll page that triggers Ajax requests to load more elements, or if you need to click a button before specific information is displayed.

To do so, you need to add the parameter js_snippet with your code encoded to base64.

The snippet must be encoded with standard base64, which most languages provide in their standard library.
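As a minimal sketch, the encoding step could look like this in Python. The function name encode_js_snippet and the sample scroll snippet are illustrative only.

```python
import base64


def encode_js_snippet(js_code: str) -> str:
    """Return the standard base64 encoding of a JavaScript snippet as text."""
    return base64.b64encode(js_code.encode("utf-8")).decode("ascii")


# Example snippet: scroll to the bottom of the page.
snippet = "window.scrollTo(0, document.body.scrollHeight);"
print(encode_js_snippet(snippet))
```

The returned string is what goes into the js_snippet parameter, after URL encoding.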

Warning: Do not forget to URL-encode your parameters before calling the API, because the base64 string is likely to contain special characters such as +.

Most languages provide a URL-encoding (percent-encoding) function in their standard library that you can use for this.

If your code needs some time to execute, you can also add an optional wait parameter with a value in milliseconds between 0 and 10000. The browser will then wait for this value before returning the page's HTML.
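Putting the pieces together, the sketch below base64-encodes a snippet, URL-encodes every parameter, and validates the wait value. The helper name build_js_request_url is hypothetical, and the parameter order (other than url being last) is an arbitrary choice.

```python
import base64
from urllib.parse import urlencode

# Endpoint taken from the curl examples in this page.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1"


def build_js_request_url(api_key: str, target_url: str, js_code: str, wait_ms: int = 0) -> str:
    """Build a request URL carrying a base64 js_snippet and an optional wait."""
    if not 0 <= wait_ms <= 10000:
        raise ValueError("wait must be between 0 and 10000 milliseconds")
    snippet_b64 = base64.b64encode(js_code.encode("utf-8")).decode("ascii")
    params = {"api_key": api_key, "js_snippet": snippet_b64}
    if wait_ms:
        params["wait"] = str(wait_ms)
    params["url"] = target_url  # keep the target URL as the last parameter
    # urlencode percent-encodes the "+", "/" and "=" that base64 text may contain.
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_js_request_url("YOUR_APIKEY", "http://example.com", "alert(1)", wait_ms=2000))
```

Without the URL-encoding step, a + in the base64 string would typically be decoded as a space on the server side and corrupt the snippet.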

If you need some help setting all this up, do not hesitate to contact us :).

Header Forwarding

You might need to forward specific headers to the website you'd like to scrape.

In order to do so, you first have to set forward_headers to True and then pass your custom headers.

Warning: Every time you make a request, many headers are set behind the scenes. To mark the headers you want to forward to the website, you have to prefix them with "Scn-". This prefix is trimmed by our servers before the request actually hits the web page you want to scrape.

Example:

If you want to send the header Accept-Language: En-US to the target website, send the header Scn-Accept-Language: En-US when you make the request to ScrapingBee. The call below forwards a Locale header in the same way:

curl --header "Scn-locale: En-US" "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&forward_headers=True&url=http://httpbin.org/headers"

{
  "headers": {
    ...
    "Locale": "En-US", 
    ...
  }
}
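If you build requests in code, the prefixing can be done in one place. This is a small sketch; the helper name prefix_forwarded_headers is an assumption and not part of the API.

```python
# Prefix described above; the API servers strip it before forwarding.
FORWARD_PREFIX = "Scn-"


def prefix_forwarded_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` with every name prefixed for forwarding."""
    return {f"{FORWARD_PREFIX}{name}": value for name, value in headers.items()}


print(prefix_forwarded_headers({"Accept-Language": "En-US"}))
```

The resulting dictionary can be passed as the headers of your HTTP client's request, together with forward_headers=True in the query string.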

Custom Cookies

You can pass custom cookies to the webpages you want to crawl.

To do this, just pass the cookie string in the cookies parameter.

We currently handle only the name and value of custom cookies. To set multiple cookies, separate them with ;.

Example:

    cookies = "cookie_name_1=cookie_value_1;cookie_name_2=cookie_value_2"

Warning: Do not forget to URL-encode your cookies parameter; ; and = are special characters that need to be URL-encoded.
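A minimal sketch of building and encoding the cookie string in Python follows. The function name build_cookies_param is hypothetical.

```python
from urllib.parse import quote


def build_cookies_param(cookies: dict[str, str]) -> str:
    """Join cookies as name=value pairs separated by ';', then URL-encode them."""
    cookie_string = ";".join(f"{name}={value}" for name, value in cookies.items())
    # safe="" makes quote() percent-encode ';' and '=' as well.
    return quote(cookie_string, safe="")


print(build_cookies_param({"cookie_name_1": "cookie_value_1", "cookie_name_2": "cookie_value_2"}))
```

The encoded value can then be appended to the request as the cookies parameter, before the url parameter.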

Premium Proxy

For some difficult websites, you may need to use premium proxies (also called residential proxies). These proxies are rarely blocked, so try them when you receive error codes or when scraping difficult websites such as search engines, social networks, or hard-to-scrape e-commerce websites.

To do so, add the parameter premium_proxy=True.

Each request with this parameter will count as 100 API credits.

curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&premium_proxy=True&url=YOUR_URL"

Note that these requests are always rendered with Javascript, even if you omit the render_js parameter.

Geolocation

In addition to premium proxies, you can choose the proxy country with the parameter country_code=COUNTRY_CODE.

Here is the list of supported country codes (in ISO 3166-1 format, see https://en.wikipedia.org/wiki/ISO_3166-1):

country_code  Country Name
br            Brazil
ca            Canada
fr            France
de            Germany
gr            Greece
il            Israel
it            Italy
mx            Mexico
nl            Netherlands
ru            Russia
es            Spain
se            Sweden
us            United States
gb            United Kingdom
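Checking the country code before sending a request avoids a wasted call. The sketch below assumes that country_code is sent together with premium_proxy=True, as the wording above suggests; the helper name build_geolocated_url and the lowercasing of the input are illustrative choices.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl examples in this page.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1"

# Country codes copied from the list above.
SUPPORTED_COUNTRY_CODES = frozenset(
    {"br", "ca", "fr", "de", "gr", "il", "it", "mx", "nl", "ru", "es", "se", "us", "gb"}
)


def build_geolocated_url(api_key: str, target_url: str, country_code: str) -> str:
    """Build a premium-proxy request URL for one of the supported countries."""
    code = country_code.lower()
    if code not in SUPPORTED_COUNTRY_CODES:
        raise ValueError(f"unsupported country_code: {country_code!r}")
    params = {
        "api_key": api_key,
        "premium_proxy": "True",
        "country_code": code,
        "url": target_url,  # target URL stays last
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_geolocated_url("YOUR_APIKEY", "http://example.com", "de"))
```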

Session (coming soon)

Returned Code

Here is the list of HTTP status codes returned by ScrapingBee.

Code  Billed?  Meaning                       Solution
200   Yes      Successful call
401   No       No more credit available      Please upgrade your plan or contact sales
404   Yes      Requested URL not found       Pass a valid URL
429   No       Too many concurrent requests  Please upgrade your plan or contact sales
500   No       Misc error                    Please retry
503   No       Timeout                       Please retry
504   No       IP ban                        Please retry
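Client code can use this table to decide what to do with a response. The following is a sketch only: the names STATUS_TABLE, should_retry and is_billed are hypothetical, and treating exactly the "Please retry" rows as retryable is an interpretation of the table, not documented API behavior.

```python
from typing import Optional

# Status codes from the table above, mapped to (billed, meaning).
STATUS_TABLE = {
    200: (True, "Successful call"),
    401: (False, "No more credit available"),
    404: (True, "Requested URL not found"),
    429: (False, "Too many concurrent requests"),
    500: (False, "Misc error"),
    503: (False, "Timeout"),
    504: (False, "IP ban"),
}

# Rows whose suggested solution is "Please retry".
RETRYABLE_CODES = frozenset({500, 503, 504})


def should_retry(status_code: int) -> bool:
    """Return True when the table suggests simply retrying the call."""
    return status_code in RETRYABLE_CODES


def is_billed(status_code: int) -> Optional[bool]:
    """Return whether the call is billed, or None for a code not in the table."""
    entry = STATUS_TABLE.get(status_code)
    return None if entry is None else entry[0]
```

For 401 and 429, retrying immediately is unlikely to help, since the suggested solution is a plan upgrade rather than a retry.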