Python is one of the most widely used programming languages for web scraping, and a large chunk of any web scraping task is sending HTTP requests. urllib3 and Requests are the most commonly used packages for this purpose. Naturally, the next question is which one do you use?
In this blog, we briefly introduce both packages, highlighting the differences between urllib3 and Requests, and discuss which one of them is best suited for different scenarios.
Requests
Requests is a hugely popular Python package for sending HTTP requests. Its tagline, “HTTP for Humans”, sums up its philosophy: it positions itself as a user-friendly, quick, and easy package. It handles common tasks such as encoding query strings, sending form data, using proxies, and managing sessions, which makes everyday scraping work very convenient.
Installing and Using Requests
Requests is available on PyPI and can be installed using pip or any other package management tool:
$ pip install requests
Let’s look at some basic examples of using requests:
>>> import requests
>>> print(requests.get('https://httpbin.org/ip').text)
{
  "origin": "171.76.86.73"
}
>>> # Just use a dict to add URL query params
>>> params = {'p1': 'hello world'}
>>> r = requests.get('https://httpbin.org/response-headers', params=params)
>>> print(r.text)
{
  "Content-Length": "92",
  "Content-Type": "application/json",
  "p1": "hello world"
}
>>> # Adding request headers is equally easy
>>> headers = {'User-Agent': 'Mozilla/5.0'}
>>> r = requests.get('https://httpbin.org/headers', headers=headers)
>>> print(r.text)
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0",
    "X-Amzn-Trace-Id": "Root=1-683d2362-03edd15a7d18a9f2011775ac"
  }
}
Pros and Cons of Requests
Pros:
- Easy to use: As the examples above show, adding query parameters and headers is trivial, and the same goes for form data and proxies. For parsing responses, Requests provides helper methods such as automatically converting JSON responses to dict objects (see the sketch after this list).
- Less verbosity: Because Requests has conveniences such as authentication and timeouts baked in, code written with it tends to be shorter and more readable than the equivalent urllib3 code.
- Battle-tested, with a large community: Requests has a large community (50,000+ stars on GitHub) and has been in use for more than a decade, so it is easy to find help online when necessary.
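To illustrate those conveniences, here is a short sketch; the proxy address is a placeholder you would replace with a real proxy, and httpbin.org simply echoes back what it receives:
>>> # JSON responses become dicts with a single call
>>> r = requests.post('https://httpbin.org/post', data={'name': 'value'}, timeout=5)
>>> r.json()['form']
{'name': 'value'}
>>> # Routing through a proxy is one extra keyword argument
>>> proxies = {'https': 'http://127.0.0.1:8080'}  # placeholder; point this at an actual proxy
>>> r = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=5)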
Cons:
- Abstracted low-level functions: Many low-level details, especially those involving SSL certificates, are abstracted away by Requests and are slightly harder to configure than with urllib3.
- No async support: Requests cannot send HTTP requests in a non-blocking way with async/await, which makes it harder to scrape multiple URLs in parallel (a common thread-pool workaround is sketched below).
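Requests itself has no async API, but a common workaround for parallel scraping is to spread the blocking calls over a thread pool from the standard library:
>>> import requests
>>> from concurrent.futures import ThreadPoolExecutor
>>> urls = ['https://httpbin.org/ip', 'https://httpbin.org/headers']
>>> # Each URL is fetched in its own worker thread
>>> with ThreadPoolExecutor(max_workers=2) as pool:
...     responses = list(pool.map(requests.get, urls))
...
>>> [r.status_code for r in responses]
[200, 200]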
urllib3
urllib3 is a lower-level package for sending HTTP requests in Python. It is different from urllib (and the legacy Python 2 urllib2), which ships with Python and needs no installation; urllib3 is a third-party package that has to be installed before use. The main goal of the project is to add features that are missing from the standard library, such as connection pooling, proxy support, and gzip encoding.
FUN FACT: Requests actually uses urllib3 under the hood.
Installing and Using urllib3
urllib3 can also be installed from PyPI as follows:
$ pip install urllib3
Let’s try repeating the examples we used for Requests with urllib3:
>>> import urllib3
>>> http = urllib3.PoolManager()
>>> r = http.request('GET', 'https://httpbin.org/ip')
>>> print(r.data.decode('utf-8'))
{
  "origin": "171.76.86.73"
}
>>> # here too, adding query params is easy
>>> params = {'p1': 'hello world'}
>>> r = http.request('GET', 'https://httpbin.org/response-headers', fields=params)
>>> print(r.data.decode('utf-8'))
{
  "Content-Length": "92",
  "Content-Type": "application/json",
  "p1": "hello world"
}
>>> # passing headers is similar to requests too
>>> headers = {'User-Agent': 'Mozilla/5.0'}
>>> r = http.request('GET', 'https://httpbin.org/headers', headers=headers)
>>> print(r.data.decode('utf-8'))
{
  "headers": {
    "Accept-Encoding": "identity",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0",
    "X-Amzn-Trace-Id": "Root=1-683d3874-64427ce4315bf06b2c5e5b7e"
  }
}
The code is very similar, with slightly more verbosity, especially the step where we have to create the PoolManager. The documentation on PyPI does mention a snippet where no PoolManager initialization is needed:
>>> import urllib3
>>> resp = urllib3.request("GET", "https://httpbin.org/robots.txt")
>>> print(resp.data.decode('utf-8'))
User-agent: *
Disallow: /deny
The above example uses a global pool manager instance provided by the module.
Pros and Cons of urllib3
Pros:
- Low-level access: urllib3 provides a lower-level interface, such as using a pool manager to control a pool of HTTP connections (see the sketch below).
- Adds features over the built-in libraries: urllib3 bundles features such as proxy support and alternate encodings, which are not present in the standard library.
Cons:
- More verbose and less user-friendly: urllib3 is more verbose; for example, we have to decode the response bytes into UTF-8 strings ourselves. Requests wraps these minor chores so we can focus on the task at hand.
- No async support: urllib3 does not support non-blocking HTTP requests using asyncio, and this shortcoming trickles down to Requests too, because Requests uses urllib3 under the hood.
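As a sketch of the low-level control mentioned in the pros, a PoolManager can be configured with an explicit pool size, retry policy, and timeouts (the specific values below are arbitrary):
>>> import urllib3
>>> from urllib3.util import Retry, Timeout
>>> http = urllib3.PoolManager(
...     maxsize=10,                                   # connections kept open per host
...     retries=Retry(total=3, backoff_factor=0.5),   # retry failed requests with backoff
...     timeout=Timeout(connect=2.0, read=5.0),       # fail fast on slow servers
... )
>>> r = http.request('GET', 'https://httpbin.org/ip')
>>> r.status
200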
Requests Vs. urllib3: Which One Should You Use?
After briefly going over both libraries, it’s time to address the main question: which one should you use? Let’s consider the choice from a few different angles:
Performance
Since Requests uses urllib3 under the hood, there is not much of a performance difference between the two libraries. Hence, performance is not a major factor in choosing one of the two.
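If you want to verify this on your own machine, a rough (and admittedly unscientific) comparison with timeit might look like the following; both sides reuse a connection pool, so the numbers mostly reflect network latency rather than library overhead:
>>> import timeit
>>> timeit.timeit("s.get('https://httpbin.org/ip')",
...               setup="import requests; s = requests.Session()", number=10)
>>> timeit.timeit("http.request('GET', 'https://httpbin.org/ip')",
...               setup="import urllib3; http = urllib3.PoolManager()", number=10)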
Features
Requests bundles many high-level features, such as authentication, session management, and cookies out of the box, while urllib3 offers access to low-level features, such as connection pool management and custom TLS handling. For most scraping use-cases, Requests is more convenient.
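For example, a Requests Session keeps cookies and default headers across calls, and basic authentication is just a keyword argument:
>>> import requests
>>> s = requests.Session()
>>> s.headers.update({'User-Agent': 'Mozilla/5.0'})   # sent with every request on this session
>>> s.get('https://httpbin.org/cookies/set/token/abc123')
<Response [200]>
>>> s.cookies.get('token')                            # the cookie set above is remembered
'abc123'
>>> requests.get('https://httpbin.org/basic-auth/user/passwd', auth=('user', 'passwd')).status_code
200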
Community
Requests, which depends on urllib3, has over 50,000 stars on GitHub, while urllib3 itself has just short of 4,000. This is only a rough indicator of which library is more widely used, but picking the more popular Requests makes it easier to find help online, as there is more discussion around the tool.
User-friendliness
Requests provides a high-level and more user-friendly experience compared to urllib3. By abstracting away common low-level HTTP operations, it is much easier and faster to get started with Requests, and the code is more readable.
HTTP Version Support And Asynchronous Operation
The urllib3 library does not support asynchronous operation using async/await, and it supports neither HTTP/2 nor HTTP/3. Since Requests depends on urllib3, these limitations carry over to Requests too. In the context of web scraping, it is useful to send requests asynchronously in parallel, and sometimes you may need HTTP/2 or HTTP/3 to mimic what browsers do, so this is a concern with either package.
While the developers of urllib3 have asked for 40,000 USD in funding just to add HTTP/2 support, a fork of urllib3 named urllib3-future has already added support for HTTP/2, HTTP/3, and async/await. It acts as a drop-in replacement for urllib3 and offers enhancements on top of it. Similarly, there is a fork of Requests called niquests that builds on urllib3-future (a minimal sketch follows). With both forks available for newer HTTP versions and asynchronous operation, the choice comes back to the earlier considerations.
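As an illustration (assuming niquests is installed and, being a fork of Requests, keeps the same API), switching over synchronous code can be as small as changing the import:
>>> import niquests as requests  # assumes: pip install niquests
>>> r = requests.get('https://httpbin.org/ip')
>>> r.status_code
200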
Comparison Table
| Factor | urllib3 | requests |
|---|---|---|
| Performance | Good | Good |
| GitHub Stars (popularity measure) | 3.9k | 52.9k |
| User-friendliness | Workable | Excellent |
| Low-level interface | Available | Not great |
| HTTP/2 & HTTP/3 support | Unavailable | Unavailable |
| Async support | Unavailable | Unavailable |
Conclusion
In this blog, we looked at two commonly used Python packages for sending HTTP requests: urllib3 and Requests. While each of them has its pros and cons, Requests is more suitable for most scraping operations. The popularity of the package also makes it easier to get support when needed.
Since urllib3 is a dependency of Requests, some of urllib3’s shortcomings, such as the lack of HTTP/2, HTTP/3, and async support, carry over to Requests. Both packages have been forked to add these missing features, and we can use the forks to overcome these limitations.