How to use ScrapingBee
ScrapingBee is meant to be the easiest scraping API available on the web.
To crawl a web page, you only need two things: your API key, available here, and the URL of the webpage you want to scrape.
Then, simply do this:
curl -L "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&url=YOUR_URL"
You'll directly receive HTML in your terminal:
<html>
<head>
...
</head>
<body>
...
</body>
</html>
Please note that every URL that fails will be retried as many times as possible for up to 60 seconds.
So please be aware of this maximum timeout when writing your own code.
Warning: If you use one or more of these parameters, always make sure that the URL parameter is the LAST parameter of the API call! This is extremely important to avoid mixing your API parameters with the query-string parameters of the URL you want to scrape.
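For example, here is a sketch of a call where the target page (a hypothetical https://example.com/search?q=shoes&page=2) has its own query string; the target URL is URL-encoded and passed as the last parameter:
curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dshoes%26page%3D2"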
You can ask ScrapingBee to fetch the URL you want to scrape directly, or through a headless browser that will execute the JavaScript code on the target page. The latter is the default behavior.
This can be very useful if you are scraping a Single Page Application built with frameworks like React.js, Angular.js, jQuery or Vue.
To fetch the URL directly, use render_js=False in your GET request.
To fetch the URL through a headless Chrome browser, use render_js=True in the GET request.
Keep in mind that requests using render_js=True will cost 5 API credits, and that this is the default behavior. Use render_js=False if you don't need it.
render_js=True (default behavior)
curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&render_js=True&url=YOUR_URL"
<html>
<head>
...
</head>
<body>
<content>
</content>
<content>
</content>
<content>
</content>
<content>
</content>
<content>
</content>
</body>
</html>
render_js=False
curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&render_js=False&url=YOUR_URL"
<html>
<head>
</head>
<body>
</body>
</html>
You can ask the ScrapingBee API to execute arbitrary JavaScript code inside our headless Chrome instance.
This can be useful, for example, if you need to scroll an infinite-scroll web page that triggers Ajax requests to load more elements.
Or if you need to click a button before specific information is displayed.
To do so, you need to add the js_snippet parameter with your code encoded in base64.
If you need help encoding your JS snippet in base64, you can find below how to do it:
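If you are on macOS or Linux, one way to do it (a sketch; window.scrollTo(0, document.body.scrollHeight) is just an illustrative snippet) is the base64 command-line tool:
echo -n 'window.scrollTo(0, document.body.scrollHeight);' | base64
# prints d2luZG93LnNjcm9sbFRvKDAsIGRvY3VtZW50LmJvZHkuc2Nyb2xsSGVpZ2h0KTs=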
Warning: Do not forget to correctly encode your URL before calling the API, because there is a good chance that special characters such as + are in your base64 string.
If you need help encoding your URL, you can find below how to do it:
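One way to do it (a sketch that relies on curl's built-in encoding, not on a ScrapingBee feature) is to let curl encode every parameter for you with -G and --data-urlencode, keeping url as the last parameter; the js_snippet value below is the base64 string produced above:
curl -G "https://app.scrapingbee.com/api/v1" \
  --data-urlencode "api_key=YOUR_APIKEY" \
  --data-urlencode "js_snippet=d2luZG93LnNjcm9sbFRvKDAsIGRvY3VtZW50LmJvZHkuc2Nyb2xsSGVpZ2h0KTs=" \
  --data-urlencode "url=YOUR_URL"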
If your code needs some time to execute, you can also add an optional wait parameter with a value in milliseconds between 0 and 10000. The browser will then wait for this value before returning the page's HTML.
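For example, a call combining js_snippet and a 5-second wait could look like this (BASE64_SNIPPET is a placeholder for your encoded and URL-encoded snippet):
curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&wait=5000&js_snippet=BASE64_SNIPPET&url=YOUR_URL"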
If you need some help setting all this up, do not hesitate to contact us :).
You might need to forward specific headers to the website you'd like to scrape.
In order to do so, you first have to set forward_headers to True and then pass your custom headers.
Warning: Every time you make a request, many headers are set behind the scenes. To target the headers you want to forward to the website, prefix them with "Scn-". This prefix will be trimmed by our servers before the request actually hits the webpage you want to scrape.
Example: If you want to send the header Accept-Language: En-US when you make the request to ScrapingBee, send the header Scn-Accept-Language: En-US.
curl --header "Scn-locale: En-US" "https://app.scrapingbee.com/api/v1?api_key=<API_KEY>&forward_headers=True&url=http://httpbin.org/headers"
{
"headers": {
...
"Locale": "En-US",
...
}
}
You can pass custom cookies to the webpages you want to crawl.
To do this, just pass the cookie string in the cookies parameter.
We currently only handle the name and value of custom cookies. If you want to set multiple cookies, just separate them with ;.
Example:
cookies = "cookie_name_1=cookie_value1;cookie_name_2=cookie_value_2"
Warning: Do not forget to URL-encode your cookies parameter; ; and = are special characters that need to be URL-encoded.
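For example, here is a minimal sketch that lets curl URL-encode the cookies value for you (the cookie names and values are the placeholders from the example above):
curl -G "https://app.scrapingbee.com/api/v1" \
  --data-urlencode "api_key=YOUR_APIKEY" \
  --data-urlencode "cookies=cookie_name_1=cookie_value1;cookie_name_2=cookie_value_2" \
  --data-urlencode "url=YOUR_URL"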
For some difficult websites, you may need to use premium proxies (also called residential proxies). These proxies almost never get blocked, and you should definitely try them if you receive error codes or are scraping difficult targets such as search engines, social networks or hard-to-scrape e-commerce websites.
To do so, you need to add the parameter premium_proxy=True.
Each request with this parameter will count as 100 API credits.
curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&premium_proxy=True&url=YOUR_URL"
Note that these requests are always rendered with JavaScript, even if you omit the render_js parameter.
In addition to premium proxies, you can also choose the proxy country from the following countries with the parameter country_code=COUNTRY_CODE (see the example after the table below).
Here is the list of supported country codes (in ISO 3166-1 format, https://en.wikipedia.org/wiki/ISO_3166-1):
country_code | Country Name |
---|---|
br | Brazil |
ca | Canada |
fr | France |
de | Germany |
gr | Greece |
il | Israel |
it | Italy |
mx | Mexico |
nl | Netherlands |
ru | Russia |
es | Spain |
se | Sweden |
us | United States |
gb | United Kingdom |
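For example, to route a request through a German premium proxy (de, taken from the table above):
curl "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&premium_proxy=True&country_code=de&url=YOUR_URL"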
Please find here the list of HTTP codes returned by ScrapingBee.
Code | Billed? | Meaning | Solution |
---|---|---|---|
200 | Yes | Successful call | |
401 | No | No more credit available | Please upgrade your plan or contact sales |
404 | Yes | Requested URL not found | Pass a valid URL |
429 | No | Too many concurrent requests | Please upgrade your plan or contact sales |
500 | No | Misc error | Please retry |
503 | No | Timeout | Please retry |
504 | No | IP ban | Please retry |
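As a sketch of how you might handle the retryable codes above (500, 503 and 504) from a shell script, using the same placeholders as the earlier examples:
# Retry up to 3 times when ScrapingBee returns a retryable status code.
for attempt in 1 2 3; do
  status=$(curl -s -o response.html -w "%{http_code}" \
    "https://app.scrapingbee.com/api/v1?api_key=YOUR_APIKEY&url=YOUR_URL")
  case "$status" in
    500|503|504) sleep 5 ;;   # retryable: wait, then try again
    *) break ;;               # success or a non-retryable error
  esac
done
echo "Final HTTP status: $status"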