How to Use a Proxy with Python Requests?

22 April 2024 (updated) | 9 min read

In this article, we examine how to use the Python Requests library behind a proxy server. Developers use proxies for anonymity and security, and sometimes even use more than one to prevent websites from banning their IP addresses. Proxies also carry several other benefits, such as bypassing filters and censorship. Feel free to learn more about rotating proxies before continuing, but let's get started!


πŸ’‘ ScrapingBee and proxies

Did you know that ScrapingBee has a native proxy mode? You just authenticate with your API key and optional request parameters and ScrapingBee takes care of everything else. Sign up for a free account and enjoy the first 1,000 scraping requests completely on the house.

Prerequisites & Installation

This article is intended for those who would like to scrape behind a proxy in Python. To get the most out of the material, it is beneficial to:

βœ… Have experience with Python 3 🐍.

βœ… Have Python 3 installed on your local machine.

Check if the python-requests package is installed by opening the terminal and typing:

$ pip freeze

pip freeze will display all your currently installed Python packages and their versions, so go ahead and check whether requests is among them. If not, install it by running:

$ pip install requests

How to Use a Proxy with Requests

Routing your requests via a proxy is quite straightforward with Requests. You simply pass a Python dictionary with the relevant proxy addresses to the usual request methods, and Requests does all the rest:

  1. To use a proxy in Python, first import the requests package.

  2. Next, create a proxies dictionary that defines the HTTP and HTTPS connections. This variable should be a dictionary that maps a protocol to the proxy URL. Additionally, declare a url variable set to the webpage you're scraping from.

Notice that in the example below, the dictionary defines a proxy URL for two separate protocols: HTTP and HTTPS. Each protocol gets its own entry, but both entries can point to the same proxy address if you like.

  3. Lastly, run your Requests call and save the response in the response variable. The important part here is to pass the proxies dictionary to the request method.

import requests

proxies = {
   'http': 'http://proxy.example.com:8080',
   'https': 'http://secureproxy.example.com:8090',
}

url = 'http://mywebsite.com/example'

response = requests.post(url, proxies=proxies)

With this code, a POST request is sent to the specified URL, using the proxy provided in the proxies dictionary matching the protocol scheme of our URL. As we passed an HTTP URL, http://proxy.example.com:8080 will be picked.
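To make the scheme matching concrete: with the same proxies dictionary, requesting an https:// URL (a placeholder below) would make Requests pick the https entry instead.

secure_url = 'https://mywebsite.com/example'

# The URL scheme is https, so Requests routes through the 'https' entry:
# http://secureproxy.example.com:8090
secure_response = requests.get(secure_url, proxies=proxies)
print(secure_response.status_code)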

Setting Proxies with Environment Variables 🌱

In addition to configuring proxies with a Python dictionary, Requests also supports the standard proxy environment variables HTTP_PROXY and HTTPS_PROXY. This comes in particularly handy when you have a number of different Python scripts and want to set a proxy globally, without having to touch each script individually.

Simply set the following environment variables (like with the dictionary setup, HTTP and HTTPS are configured separately) and Python will route requests automatically via these proxies.

HTTP_PROXY='http://10.10.10.10:8000'
HTTPS_PROXY='http://10.10.10.10:1212'

Don't forget to use export when on Unix.
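If you prefer to stay in Python rather than the shell, you can achieve the same effect by setting these variables through os.environ before issuing any requests; a minimal sketch with placeholder addresses:

import os

import requests

# Standard proxy variables; Requests picks them up automatically (placeholder addresses)
os.environ['HTTP_PROXY'] = 'http://10.10.10.10:8000'
os.environ['HTTPS_PROXY'] = 'http://10.10.10.10:1212'

# No proxies argument needed any more
response = requests.get('http://mywebsite.com/example')
print(response.status_code)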

Proxy Authentication πŸ‘©β€πŸ’»

Some proxies (especially paid services) require you to provide proxy authentication credentials using Basic Authentication. As indicated by RFC 1738, you simply specify the relevant credentials in the URL before the hostname:

http://[USERNAME]:[PASSWORD]@[HOST]

For example, extending our previous Python dictionary to authenticate as "bob" with the password "alice" would give us this:

proxies = {
   'http': 'http://bob:alice@proxy.example.com:8080',
   'https': 'http://bob:alice@secureproxy.example.com:8090',
}

The same syntax also applies to environment variables:

HTTP_PROXY='http://bob:alice@10.10.10.10:8000'
HTTPS_PROXY='http://bob:alice@10.10.10.10:1212'
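One caveat: if the username or password contains characters that are special in URLs (such as @, :, or /), they need to be percent-encoded first, otherwise the proxy URL won't parse correctly. urllib.parse.quote from the standard library handles that; a minimal sketch with placeholder credentials:

from urllib.parse import quote

username = quote('bob', safe='')
password = quote('p@ss:word', safe='')  # placeholder password with special characters

proxies = {
    'http': f'http://{username}:{password}@proxy.example.com:8080',
    'https': f'http://{username}:{password}@secureproxy.example.com:8090',
}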

Reading Responses πŸ“–

While not specific to proxies, it's also always a good thing to know how to obtain the data you actually requested.

Getting the response as text is rather straightforward with the text property of the response object:

response = requests.get(url)
text_resp = response.text

If you have a JSON response, you can also use the json() method to get a pre-parsed Python object with the data from the response:

response = requests.get(url)
json_resp = response.json()

If it's binary data, you can use content to get hold of the body as a byte stream:

response = requests.get(url)
response.content
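Whichever of the three you use, it's usually worth confirming that the request actually succeeded before parsing the body. Continuing with the url and proxies from before, raise_for_status() raises an exception for 4xx and 5xx responses:

response = requests.get(url, proxies=proxies)
response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
data = response.json()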

Requests Session with Proxies πŸ•’

Individual web requests alone often don't cut it, and you also need to keep track of state across requests (especially when a site requires a login). That's where Requests' Session class comes to the rescue, allowing you to reuse network connections and retain session cookies.

Enabling a session object to use proxies is quite straightforward: simply set its proxies attribute, as the following example shows:

import requests
import time

HN_USER = 'YOUR_USERNAME'
HN_PASS = 'YOUR_PASSWORD'

session = requests.Session()

# Route every request made with this session through the configured proxies
session.proxies = {
   'http': 'http://proxy.example.com:8080',
   'https': 'http://secureproxy.example.com:8090',
}

response = session.get('https://news.ycombinator.com/submit')
print("Login prompt present: {}".format("logged in" in response.text))

session.post('https://news.ycombinator.com/login', data={'goto': 'news', 'acct': HN_USER, 'pw': HN_PASS})

# wait a few seconds
time.sleep(5)

response = session.get('https://news.ycombinator.com/submit')
print("Login prompt present: {}".format("logged in" in response.text))

Here, we once more use our favorite site, Hacker News, for a login example and first request the submission page (which requires a valid user session). Because we are not logged in yet, it tells us to do so and shows a sign-in screen. That's exactly what we address with the next call to the session's post() method, where we provide the credentials we previously configured. Once post() returns, we should have a valid session cookie in our session object; when we request the submission page again, we should not be prompted any more.

When running our code we should get the following output:

Login prompt present: True
Login prompt present: False
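One more detail worth knowing: the proxies set on the session act as a default, and an individual call can still override them by passing its own proxies argument. A small sketch with a placeholder address:

# Uses the session-level proxies configured above
response = session.get('https://news.ycombinator.com/')

# Overrides them for this single request only (address is a placeholder)
response = session.get(
    'https://news.ycombinator.com/',
    proxies={'https': 'http://otherproxy.example.com:9000'},
)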

Rotating Proxies with Requests

Remember how we said some developers use more than one proxy? Well, now you can too!

Anytime you find yourself scraping a webpage repeatedly, it's good practice to use more than one proxy: should one proxy get blocked, you'd be back to square one, facing the same fundamental issue as if you hadn't used a proxy to begin with. The scraping cancel culture is real! So, to avoid being canceled, it's best to rotate your list of proxies regularly.

To rotate proxies, you first need to have a pool of proxy IPs available. You can use free proxies found on the internet or commercial solutions, though if your service relies on scraped data, free proxies will most likely not be enough.

How to Rotate IPs with Requests

In order to start rotating proxy IP addresses, you need a pool of proxies to draw from. In the case where free proxies do fit your scraping needs, here you can find a list of free proxies. In this section you'll write a script that picks a proxy from a pool and rotates to another one whenever a request fails.

  1. First, import the requests and random libraries.

  2. Next, define a list of proxy addresses, ip_addresses, to serve as your pool. In the example it is hardcoded with placeholder addresses, but you could just as well populate it by scraping a free proxy list (see the BeautifulSoup sketch after the example below).

  3. Create a proxy_request function that takes three arguments: the request_type, the url, and **kwargs. Inside this function, pick a random address from the pool, build a proxies dictionary from it, and pass everything on to requests.request(). If the chosen proxy fails (for instance because it has been blocked or has timed out), the loop simply picks another one and tries again.

import requests
import random

ip_addresses = [ "http://mysuperproxy.com:5000", "http://mysuperproxy.com:5001", "http://mysuperproxy.com:5100", "http://mysuperproxy.com:5010", "http://mysuperproxy.com:5050", "http://mysuperproxy.com:8080", "http://mysuperproxy.com:8001", "http://mysuperproxy.com:8000", "http://mysuperproxy.com:8050" ]

def proxy_request(request_type, url, **kwargs):
   while True:
      try:
         # Pick a random proxy from the pool for this attempt
         proxy = random.choice(ip_addresses)
         proxies = {"http": proxy, "https": proxy}

         response = requests.request(request_type, url, proxies=proxies, timeout=5, **kwargs)
         print(f"Proxy currently being used: {proxy}")
         break
      except Exception as err:
         print(f"Error ({err}), looking for another proxy")

   return response

proxy_request('GET', 'http://example.com')
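If you would rather not hardcode the pool, you can also build ip_addresses dynamically by scraping one of the free proxy lists with BeautifulSoup. The sketch below is only an illustration under assumptions: the URL is a placeholder, and it presumes the page lists proxies in an HTML table with the IP address in the first column and the port in the second, so adapt the URL and selectors to whichever list you actually use.

import random

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def get_proxy(list_url='https://free-proxy-list.example.com/'):  # placeholder URL
    """Fetch a proxy list page and return one random http://ip:port entry."""
    response = requests.get(list_url)
    soup = BeautifulSoup(response.text, 'html.parser')

    pool = []
    # Assumed structure: each table row holds <td>ip</td><td>port</td>...
    for row in soup.select('table tbody tr'):
        cells = row.find_all('td')
        if len(cells) >= 2:
            pool.append(f"http://{cells[0].text.strip()}:{cells[1].text.strip()}")

    return random.choice(pool)

You could then call get_proxy() inside proxy_request() instead of drawing from the hardcoded list.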

You can now scrape and rotate all at once!πŸŒ€

Use ScrapingBee's Proxy Mode

Believe it or not, there is another free alternative that makes scraping behind a proxy even easier! That alternative is ScrapingBee's Proxy Mode, a proxy interface to the ScrapingBee scraping API. 🐝

  1. Create a free account on ScrapingBee. Once logged in, you can see your account information, including your API key. Not to mention 1,000 free API credits! 🍯😍

  2. Run the following script, passing your API key as the proxy username and the API parameters as the proxy password. You can omit the proxy password if the default API parameters suit your needs:

# Install the Python Requests library:
# pip install requests
import requests

def send_request():
    proxies = {
        "http": "http://YOUR_SCRAPINGBEE_API_KEY:render_js=False&premium_proxy=True@proxy.scrapingbee.com:8886",
        "https": "https://YOUR_SCRAPINGBEE_API_KEY:render_js=False&premium_proxy=True@proxy.scrapingbee.com:8887"
    }

    response = requests.get(
        url="http://httpbin.org/headers?json",
        proxies=proxies,
        verify=False
    )
    print('Response HTTP Status Code: ', response.status_code)
    print('Response HTTP Response Body: ', response.content)

send_request()

Remember that if you want to use Proxy Mode, your code must be configured not to verify SSL certificates. With Python Requests, that means passing verify=False, as shown above.
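Since verify=False makes urllib3 emit an InsecureRequestWarning on every call, you may also want to silence that specific warning explicitly; a minimal sketch:

import urllib3

# Suppress the warning urllib3 emits when certificate verification is disabled
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)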

That's all there is to sending successful HTTP requests! When you use ScrapingBee's Proxy Mode, you no longer need to deal with proxy rotation manually; we take care of everything for you. 😎

Conclusion

While it might be tempting to start scraping right away with your fancy new proxies, there are still a few key things you should know. For starters, not all proxies are the same. There are actually different types, with the three main ones being:

  • transparent proxies
  • anonymous proxies
  • elite proxies

The difference between these proxy types is basically how well they shield the fact that you are using a proxy, or whether they are transparent about it. As such, a transparent proxy will be very upfront and forward your IP address to the site you are scraping. That's of course not ideal 😨. Anonymous proxies are already a notch stealthier and do not divulge the original address, but they still send proxy headers to the site, which makes it obvious that a proxy is involved. Elite proxies provide the highest level of abstraction and completely hide the fact that you are using a proxy.

ℹ️ Proxies can help a lot with possible IP restrictions, but you still need to pay attention to request throttling, user agent management, and anti-bot measures. If you'd rather focus on the content than on that bureaucracy, please also check out ScrapingBee's scraping platform, as we designed it with all these obstacles in mind and strive to provide as seamless a scraping experience as possible. The first 1,000 requests are always free.

Now that we have that all cleared up, it's time to start web scraping with a proxy in Python. So, get on out there and make all the requests you can dream up!πŸ’­
