How to use a proxy with node-fetch?

09 November 2020 | 4 min read

Why node-fetch?

Node-fetch is a popular HTTP client library with around twenty million downloads per week; according to NPM, it is also one of the most downloaded packages of all time.

Node-fetch's primary motivation was to bring window.fetch, the client-side API implemented in browsers, to server-side Node.js.
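On the server, the interface is the same promise-based one you know from the browser. A minimal request looks like this:

// npm install node-fetch

const fetch = require('node-fetch');


(async () => {
    // Fetch a page and print its raw body, just like window.fetch would
    const response = await fetch('https://httpbin.org/html');
    const body = await response.text();
    console.log(body);
})();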

This API is primarily used to make asynchronous requests to load content on the browser side. However, on the server side, there are many more use cases.

One of those use cases is web scraping. Web scraping means programmatically fetching web pages without opening a real browser. Most websites don't like being scraped, and using proxies is one of the many tools you can use to scrape the web without getting blocked.

And as you will see, this is not as straightforward as it seems.


Solution

Unfortunately, node-fetch does not natively support proxies. But there is a workaround: by using the https-proxy-agent package, you can easily forward all your requests through an HTTPS proxy.

Here is how to do it:

// proxy_test.js


// npm install node-fetch
// npm install https-proxy-agent

const fetch = require('node-fetch');
const HttpsProxyAgent = require('https-proxy-agent');


(async () => {
    // Every request made with this agent will be forwarded through the proxy
    const proxyAgent = new HttpsProxyAgent('http://46.250.171.31:8080');
    const response = await fetch('https://httpbin.org/ip?json', { agent: proxyAgent });
    const body = await response.text();
    console.log(body);
})();

As you can see, this is easy.

To test this script, we will use a free proxy service that publicly provides HTTPS proxies. Be careful when using such a service: it intercepts all the HTTP traffic you send through it and can do anything with that traffic. You can learn more about free proxies, their risks, and our benchmark here.
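Because free proxies are often slow or simply dead, it is also worth adding a timeout and some basic error handling. Here is a minimal sketch using node-fetch's timeout option (in milliseconds):

// npm install node-fetch
// npm install https-proxy-agent

const fetch = require('node-fetch');
const HttpsProxyAgent = require('https-proxy-agent');


(async () => {
    const proxyAgent = new HttpsProxyAgent('http://46.250.171.31:8080');
    try {
        // Abort the request if the proxy takes more than 10 seconds to answer
        const response = await fetch('https://httpbin.org/ip?json', {
            agent: proxyAgent,
            timeout: 10000,
        });
        console.log(await response.text());
    } catch (err) {
        // A dead or overloaded proxy typically shows up as a network error or timeout
        console.error(`Request through proxy failed: ${err.message}`);
    }
})();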

That being said, let's now check that it works. If I run node proxy_test.js, the response is {"origin": "46.250.171.31"}. Bingo!
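One more thing worth knowing: many paid proxies require authentication. With https-proxy-agent, you can usually embed the credentials directly in the proxy URL (the username, password, and host below are placeholders):

const proxyAgent = new HttpsProxyAgent('http://username:password@my-proxy-host.example.com:8080');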

Things you should know about proxies.

Now that you know how to use proxies with node-fetch, here are a couple of things to keep in mind before using them.

First, not all proxy providers are equal. Some proxies, such as the one used in this example, are free. You should use those with caution: by definition, they have access to all the HTTP traffic you send through them. Most free proxy services also publicly display the IP addresses they use, making it trivial for any website to block them. I suggest you only use these proxies for small jobs, one-off scripts, or university homework, and only if you don't mind someone seeing all your HTTP traffic.

But using a paid proxy provider won't guarantee that your HTTP requests will succeed either. Some providers use so-called "data-center" IPs, which are easily blocked by websites. Others sell "residential" IPs, which are more reliable but much more expensive. You will also notice that some providers are far faster than others.
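Whichever provider you choose, you will usually get a whole list of proxies rather than a single IP. Here is a minimal sketch of rotating requests across such a list, round-robin style (the proxy URLs are placeholders to replace with your provider's list):

// npm install node-fetch
// npm install https-proxy-agent

const fetch = require('node-fetch');
const HttpsProxyAgent = require('https-proxy-agent');

// Placeholder proxy URLs: replace them with the list from your provider
const proxies = [
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
    'http://proxy3.example.com:8080',
];

let current = 0;

// Return an agent for the next proxy in the list
function nextProxyAgent() {
    const proxy = proxies[current % proxies.length];
    current += 1;
    return new HttpsProxyAgent(proxy);
}

(async () => {
    for (let i = 0; i < 3; i++) {
        const response = await fetch('https://httpbin.org/ip?json', { agent: nextProxyAgent() });
        console.log(await response.text());
    }
})();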

If you do some research, you will also find 4G proxies, which route traffic through the IP addresses of real mobile devices. We have written an extensive benchmark of those providers.

Proxies are (probably) not enough.

Using proxies is just one technique that can help you scrape the web without getting blocked. But there are many more things you should be aware of.

For example, you should definitely learn about User-Agent and other HTTP headers: it is trivial for a website to detect traffic that does not come from a real browser if you don't set them. You should also consider using a real browser, run in so-called headless mode, if you are looking to scrape websites with a lot of JavaScript. More details here.
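Setting those headers with node-fetch is a one-line change. For example (the User-Agent string below is only an illustration; copy an up-to-date one from a real browser):

// npm install node-fetch

const fetch = require('node-fetch');


(async () => {
    const response = await fetch('https://httpbin.org/headers', {
        headers: {
            // Browser-like headers; use a current User-Agent string from your own browser
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
            'Accept-Language': 'en-US,en;q=0.9',
        },
    });
    // httpbin.org/headers echoes back the headers it received
    console.log(await response.text());
})();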

Conclusion

All of this to say that choosing a proxy provider is not easy, and several criteria have to be considered: success rate, IP quality, speed, the provider's reputation, etc. This is why, last year, Kevin and I built ScrapingBee, a web scraping API that takes away the whole burden of choosing a proxy provider and lets you scrape any web page with a simple API call. Here is how you would do it with node-fetch:

// npm install node-fetch

const fetch = require('node-fetch');


(async () => {
    // Encode the target URL so it can safely be passed as a query parameter
    const targetUrl = encodeURIComponent('https://httpbin.org/ip');
    const response = await fetch(`https://app.scrapingbee.com/api/v1?api_key=<YOUR_API_KEY>&url=${targetUrl}`);
    const body = await response.text();
    console.log(body);
})();

Easy, isn't it?

I hope you learned something new reading this article. If you wish to learn more about web scraping in JavaScript, I really recommend you take a look at this web scraping with NodeJS guide.

You could also take a look at this article about file downloading with Puppeteer.

Happy scraping.

Pierre de Wulf

Pierre is a data engineer who worked in several high-growth startups before co-founding ScrapingBee. He is an expert in data processing and web scraping.