
HTTP headers with axios

Kevin Sahin

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.

Axios for web scraping

The JavaScript community has long debated which HTTP client is the easiest to use, and for many developers axios ranks among the top three. This article shows you how to use axios to make HTTP requests and pass HTTP headers with them. We will also take a close look at how HTTP headers work and why they are important.

Why would you want to use axios over node-fetch?

node-fetch brings the Fetch API to Node.js. The Fetch API is a specification that standardizes what it means to make an HTTP request and defines the terms involved. axios is an HTTP client like node-fetch, but because it has been around for a long time and is widely used, many developers reach for axios instead. A few reasons you might want to use axios over node-fetch:

  • Helpful utilities like interceptors and reusable instances: you can write your own interceptors or a reusable instance on top of node-fetch, but that is extra effort compared to using the features built into axios.
  • Aborting requests and timeouts: node-fetch and the browser's fetch abort requests (rejecting the pending promise) through an AbortController, which is more verbose than the API axios provides. Timeouts are also typically implemented by hand with node-fetch (an AbortController plus a timer), whereas axios accepts a timeout option.
  • Automatic data transformation: axios serializes a plain JavaScript object passed as the request body to a JSON string (setting Content-Type to application/json) and parses JSON responses without being told to. With node-fetch you call JSON.stringify() and response.json() yourself.

Sending HTTP headers with axios

Sending HTTP headers along with a request is a very common task, and axios handles it much like node-fetch does: you put them in a headers object. There are two main ways to make HTTP requests in axios. The first is to pass a config object to axios(). The second is to use the request method aliases that axios provides, which follow the general syntax axios.<method>() and take a URL and a config object as arguments (plus the request body for methods like POST). The latter tends to be used more often because it is more intuitive.

For the purpose of this tutorial, we will use the JSONPlaceholder API.

const axios = require("axios");

const config = {
	headers: {
		"Referer": "https://www.scrapingbee.com/",
		"Referrer-Policy": "strict-origin-when-cross-origin"
	},
};

axios
	.get("https://jsonplaceholder.typicode.com/todos/1", config)
	.then((response) => console.log(response.data))
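	// error.response only exists if the server replied; otherwise log the error message
	.catch((error) => console.log(error.response || error.message));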

Calling axios.get() or axios() returns a promise that resolves to a response object. data is one of its properties and contains, quite literally, the data the server responded with; the others include status, statusText, headers, and config (the axios documentation lists the full schema). If the request fails, the promise rejects with an error object. When the server did respond, for example with a 404 status, that error has a response property that follows the same schema.
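Because a request can fail at different stages, a common pattern (it mirrors the error-handling example in the axios README) is to check which of error.response and error.request is set. The helper name describeAxiosError is hypothetical; it only reads properties that axios sets on its errors, so it can be exercised with plain objects.

```javascript
// Hypothetical helper: turn an axios rejection into a readable message
function describeAxiosError(error) {
	if (error.response) {
		// The server replied, but with a status that failed validateStatus
		// (by default, anything outside the 2xx range)
		return `HTTP ${error.response.status}: ${JSON.stringify(error.response.data)}`;
	}
	if (error.request) {
		// The request went out but no response came back (network failure, timeout, ...)
		return `No response received: ${error.message}`;
	}
	// The request could not be built or sent in the first place
	return `Request setup failed: ${error.message}`;
}
```

In the GET example, it would be used as .catch((error) => console.log(describeAxiosError(error))).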

The config object (its full schema is also in the axios documentation) lets you do far more than send HTTP headers. In the example above we used the Referer and Referrer-Policy headers. Servers rely on headers like Referer to implement features that depend on knowing where traffic comes from, such as logging or analytics.

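Be aware, though, that Referer (yes, with a single r, a misspelling that has stuck in the HTTP specification) can introduce privacy risks when the referring URL contains sensitive data, such as a token in a query string. A good way to counter this is a Referrer-Policy, along with sensible application design. Note that Referrer-Policy is meant to be sent by servers in responses, where it tells browsers how much of the URL to put into Referer; sending it with a request, as the example above does, has no standard effect. Read more about Referer and Referrer-Policy here.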
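To see how a policy such as strict-origin-when-cross-origin limits what Referer leaks, here is a simplified sketch of its rules using the URL class built into Node.js: the full URL (minus fragment and credentials) for same-origin requests, only the origin for cross-origin requests, and nothing when going from HTTPS to HTTP. The function name refererFor is hypothetical, and the sketch treats only https: as secure, whereas the specification checks whether a URL is "potentially trustworthy".

```javascript
// Simplified model of the strict-origin-when-cross-origin Referrer-Policy
function refererFor(pageUrl, targetUrl) {
	const from = new URL(pageUrl);
	const to = new URL(targetUrl);

	// HTTPS -> HTTP downgrade: send no Referer at all
	if (from.protocol === "https:" && to.protocol === "http:") {
		return undefined;
	}

	// Credentials and fragments are never part of a Referer
	from.username = "";
	from.password = "";
	from.hash = "";

	// Same origin: the full URL, including path and query string
	if (from.origin === to.origin) {
		return from.href;
	}

	// Cross origin: only the origin, so paths and query strings do not leak
	return `${from.origin}/`;
}

console.log(refererFor("https://www.scrapingbee.com/blog/?token=secret#top", "https://example.com/"));
// https://www.scrapingbee.com/
```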

Sending a POST request works much like sending a GET request: either call the axios.post() alias with the URL, your POST data, and a config object, or pass a single config object to axios() with its method property set to "post":

const axios = require("axios");

const newPost = {
	title: "ScrapingBee is awesome!",
	body: "Learning to set HTTP headers...",
	userId: 1,
};

const config = {
	headers: {
		"Content-type": "application/json; charset=UTF-8",
	},
};

axios
	.post(
		"https://jsonplaceholder.typicode.com/posts",
		newPost,
		config
	)
	.then((response) => console.log(response.data))
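	// error.response only exists if the server replied; otherwise log the error message
	.catch((error) => console.log(error.response || error.message));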

Sending the appropriate HTTP headers with a request is important, and at times a security requirement: they tell the server what is being sent, so it can identify and extract the information in the request correctly.

One such example is sending the correct Content-Type header. Browsers, for instance, read the Content-Type header to decide what exactly to do with the data they have received. You can read more about the importance of Content-Type and how browsers evolved to counter related security issues here.
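Servers make the same kind of decision when a request arrives. The sketch below shows, in plain JavaScript without any framework, how a receiver might pick a parser based on the media type in Content-Type. parseBody is a hypothetical helper; real servers and frameworks handle many more media types and edge cases.

```javascript
// Hypothetical helper: choose how to interpret a raw request body
// based on its Content-Type header
function parseBody(contentType, rawBody) {
	// Only the media type drives the decision: parameters such as
	// "; charset=UTF-8" are dropped, and media types are case-insensitive
	const mediaType = (contentType || "").split(";")[0].trim().toLowerCase();

	if (mediaType === "application/json") {
		return JSON.parse(rawBody);
	}
	if (mediaType === "application/x-www-form-urlencoded") {
		return Object.fromEntries(new URLSearchParams(rawBody));
	}
	// Unknown or missing type: leave the body as raw text
	return rawBody;
}

console.log(parseBody("application/json; charset=UTF-8", '{"userId":1}'));
// { userId: 1 }
```

The same body sent with the wrong Content-Type would be handed to the wrong parser, or not parsed at all, which is why the header matters.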

What exactly are HTTP headers?

HTTP headers are simply additional information passed along with an HTTP request or response so that the client or server receiving it can handle it properly. There are different types of HTTP headers, and the most common way to group them is by their intended context:

  • General Headers - Headers that apply to both requests and responses and have nothing to do with the data being sent or received.
  • Request Headers - Contain information about the resource being requested and about the client making the request.
  • Response Headers - Contain additional information about the response, such as its location or the server providing it.
  • Entity Headers - Contain information about the body of the resource, such as its length or media type (newer specifications call these representation headers).

You can read more about HTTP headers, other ways to classify them, and the full list of available headers in the MDN documentation.

Making your HTTP request look browser-like

When scraping the web, it is important to make your requests look as authentic and non-bot-like as possible. One of the easiest and most effective ways to do so is to set the User-Agent header. The User-Agent header is a string that identifies the application, operating system, vendor, and version of the user agent making the request. The user agent can simply be thought of as the application you use to perform an HTTP request.

Some common User-Agent strings (these come from older browser releases, so current versions will differ):

  • Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36
  • Mozilla/5.0 (iPhone; CPU iPhone OS 13_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Mobile/15E148 Safari/604.1
  • Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0

You can find a longer list of User-Agent strings here. Please don't scrape that list, or if you do, make sure not to abuse the service: the website notes that it has had issues with badly written bots.
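One simple approach is to pick a User-Agent from a small pool for each request. The sketch below is plain JavaScript that builds the config object you would pass to axios.get(), so it runs without axios installed. USER_AGENTS reuses the strings listed above, and pickUserAgent and configWithUserAgent are hypothetical helper names. The random source is injectable so the choice can be tested deterministically.

```javascript
// Pool of User-Agent strings (the examples listed above)
const USER_AGENTS = [
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
	"Mozilla/5.0 (iPhone; CPU iPhone OS 13_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Mobile/15E148 Safari/604.1",
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
];

// Pick one entry; `random` defaults to Math.random and must return a number in [0, 1)
function pickUserAgent(random = Math.random) {
	return USER_AGENTS[Math.floor(random() * USER_AGENTS.length)];
}

// Build a config object to pass as the last argument to axios.get()
function configWithUserAgent(random) {
	return { headers: { "User-Agent": pickUserAgent(random) } };
}

// Usage with axios:
// axios.get("https://jsonplaceholder.typicode.com/todos/1", configWithUserAgent());
```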

While the User-Agent is one of the easiest ways to mimic an actual browser, passing as one usually takes more work. Sending a couple more headers can really improve your chances:

  • Accept-Language header - declares which natural languages the client prefers to receive content in, for example en-US.
  • Accept-Encoding header - declares which compression algorithms, such as gzip, the client can decode in the response.
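Putting these together, a header set modelled on what a desktop browser sends might look like the sketch below. The exact values are illustrative assumptions rather than a recommended fingerprint, and browserLikeHeaders is a hypothetical helper. Only advertise encodings your HTTP client can decompress: axios in Node.js decompresses gzip and deflate responses, so the sketch lists just those.

```javascript
// Hypothetical helper: headers resembling a desktop browser's page request
function browserLikeHeaders(userAgent) {
	return {
		"User-Agent": userAgent,
		// Content types the client accepts, in order of preference (q-values)
		"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
		// Preferred natural languages: US English first, then any English
		"Accept-Language": "en-US,en;q=0.9",
		// Compression formats the client can decode
		"Accept-Encoding": "gzip, deflate",
	};
}

const config = {
	headers: browserLikeHeaders(
		"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
	),
};

// Usage with axios:
// axios.get("https://jsonplaceholder.typicode.com/todos/1", config);
console.log(config.headers);
```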

Sending headers like these with your HTTP request makes it more likely to pass as a real request made by an authentic browser. However, as techniques like browser fingerprinting become more widely adopted, your efforts may not suffice. Browser fingerprinting does rely on headers to some extent, but it also makes use of cookies and of drawing to an HTML canvas (called canvas fingerprinting) to uniquely identify a user.

Conclusion

Understanding how to use the right combination of the available headers is critical to sending browser-like HTTP requests with an HTTP client like axios. However, there are many other ways a service may thwart your attempts at scraping its resources, such as rate-limiting your requests, making you solve CAPTCHAs, or blocking your IP address.

ScrapingBee takes all of that out of your hands, and most importantly it's free to try for up to 1000 requests. Give it a go and enjoy an unstoppable web scraping experience!

Resources

  • Axios GitHub Repo - Contains documentation and a plethora of examples on how to make good use of the various APIs that axios provides to make your life easier.
  • JSONPlaceholder Documentation - JSONPlaceholder is a perfect service to test your HTTP requests against without having to set up a server of your own.