
How to use a proxy with node-fetch?

Pierre de Wulf

Pierre is a data engineer who worked at several high-growth startups before co-founding ScrapingBee. He is an expert in data processing and web scraping.


Why node-fetch?

Node-fetch is a popular HTTP client package with around twenty million downloads per week; according to npm, it is also one of the most used npm packages of all time.

Node-fetch's primary motivation was to implement a server-side API similar to window.fetch, its client-side counterpart implemented in the browser.

In the browser, this API is primarily used to make asynchronous requests that load content into the current page. On the server side, however, there are many more use cases.
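As a quick illustration, a server-side request with node-fetch looks just like its browser counterpart. Here is a minimal sketch against httpbin.org, the test service we will also use later in this article:

// npm install node-fetch@2   (node-fetch v3 is ESM-only and cannot be require()'d)

const fetch = require('node-fetch');

(async () => {
    // Same call shape as window.fetch in the browser
    const response = await fetch('https://httpbin.org/get');
    const data = await response.json();
    console.log(data.url); // "https://httpbin.org/get"
})();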

One of those use cases is web scraping. Web scraping means programmatically requesting page content without opening a real browser. Most websites don't like being scraped, and using proxies is one of the many tools that let you scrape the web without getting blocked.

And as you will see, this is not as straightforward as it seems.

Solution

Unfortunately, node-fetch does not natively support proxies. But there is a workaround: with the https-proxy-agent package, you can easily forward all your requests through an HTTPS proxy.

Here is how to do it:

// proxy_test.js

// npm install node-fetch@2   (node-fetch v3 is ESM-only and cannot be require()'d)
// npm install https-proxy-agent

const fetch = require('node-fetch');
// Since https-proxy-agent v6, the class is a named export;
// on v5 and older, use: const HttpsProxyAgent = require('https-proxy-agent');
const { HttpsProxyAgent } = require('https-proxy-agent');

(async () => {
    // Every request made with this agent is forwarded through the proxy
    const proxyAgent = new HttpsProxyAgent('http://46.250.171.31:8080');
    const response = await fetch('https://httpbin.org/ip', { agent: proxyAgent });
    const body = await response.text();
    console.log(body); // httpbin returns the IP it sees: the proxy's, not ours
})();

As you can see, this is rather simple.

To test this script, we will use a free proxy service that publicly provides HTTPS proxies. Be careful when using such services: because they intercept all the HTTP traffic you send through them, they can do anything with it. You can learn more about free proxies, their risks, and a benchmark here.

That being said, let's now check that it works. If I run node proxy_test.js, the response is {"origin": "46.250.171.31"}. Bingo!

Things you should know about proxies.

Now that you know how to use web proxies with node-fetch, there are a couple of things to know before using them.

Firstly, not all proxy providers are equal. Some proxies, such as the one used in this example, are free. Use those with caution: such services, by definition, have access to all the HTTP traffic you send through them. Most free proxy services also publicly display the IP addresses they use, making it trivial for any website to block them. I suggest you only use those kinds of proxies for small jobs, one-off scripts, or university homework, and only if you don't mind someone seeing all your HTTP traffic.

But using a paid proxy provider won't guarantee that your HTTP requests succeed either. Some providers use what are called "data-center" IPs, which are easily blocked by websites. Others might sell you "residential" IPs, which are more reliable but very expensive. You will also notice that some providers are much faster than others.
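Most paid providers, for example, authenticate you with a username and password rather than by IP address. https-proxy-agent accepts credentials directly in the proxy URL, so an authenticated setup looks like this sketch (the host and credentials below are placeholders, not a real provider):

const fetch = require('node-fetch');
const { HttpsProxyAgent } = require('https-proxy-agent');

(async () => {
    // Placeholder credentials and host: substitute your provider's values
    const proxyAgent = new HttpsProxyAgent('http://username:password@proxy.example.com:8080');
    const response = await fetch('https://httpbin.org/ip', { agent: proxyAgent });
    console.log(await response.text());
})();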

If you do some research, you will see that you can also buy 4G proxies: proxies that route your traffic through the IP address of a real phone. We have written an extensive benchmark of those providers.

Proxies are (probably) not enough.

Using proxies is just one technique that could allow you to web scrape without getting blocked. But there are many more things you should be aware of.

For example, you should definitely learn about the User-Agent and other HTTP headers: if you don't set them, it is trivial for a website to detect that your traffic is not coming from a real browser. You should also consider using a real browser, called a headless browser, if you are looking to scrape websites that rely heavily on JavaScript. More details here.
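Setting those headers with node-fetch is a one-line change, since the headers option works just like it does in window.fetch. Here is a minimal sketch (the User-Agent string is just an example of a real Chrome value):

const fetch = require('node-fetch');

(async () => {
    const response = await fetch('https://httpbin.org/headers', {
        headers: {
            // Without this, node-fetch announces itself with its own User-Agent
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Accept-Language': 'en-US,en;q=0.9',
        },
    });
    console.log(await response.text()); // httpbin echoes back the headers it received
})();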

Conclusion

All of this to say that choosing a proxy provider is not easy, and several criteria must be considered: success rate, IP quality, speed, the shadiness of the provider, etc. This is why, last year, Kevin and I built ScrapingBee, a web scraping API that takes away the burden of choosing a proxy provider and allows you to scrape any web page with a simple API call. Here is how you would do it with node-fetch, for example:

// npm install node-fetch@2

const fetch = require('node-fetch');

(async () => {
    // The target URL is passed as a query parameter, so it must be URL-encoded
    const target = encodeURIComponent('https://httpbin.org/ip');
    const response = await fetch(`https://app.scrapingbee.com/api/v1?api_key=<YOUR_API_KEY>&url=${target}`);
    const body = await response.text();
    console.log(body);
})();

Easy, isn't it?

I hope you learned something new reading this article. If you wish to learn more about web scraping in JavaScript, we really recommend you take a look at this web scraping with NodeJS guide we have written.

Happy scraping.
