How To Set Up A Rotating Proxy in Selenium with Python

08 April 2024 | 10 min read

Selenium is a popular browser automation library that lets you control browsers, including headless ones, programmatically. However, even with Selenium, your script can still be identified as a bot, and your IP address can be blocked. This is where Selenium proxies come in.

A proxy acts as a middleman between the client and server. When a client makes a request through a proxy, the proxy forwards it to the server. This makes detecting and blocking your IP harder for the target site.

This article explores how to set up a rotating proxy in your Python Selenium script, configure authentication, and handle errors and timeouts. We'll also discuss the best alternatives to rotating proxies in Python Selenium.

Without further ado, let's get started!


TL;DR Selenium Rotating Proxy quick start code

If you're in a hurry, here's the code we'll be writing in this article. To follow along smoothly, first install selenium, selenium-wire, and webdriver-manager (which the code uses to download a matching ChromeDriver) using the following commands.

pip install selenium==4.17.2
pip install selenium-wire==5.1.0
pip install webdriver-manager

Ensure that you replace the IP address and port number with your own and fill in your credentials correctly if required by your proxy server.

from selenium.webdriver.common.by import By
from seleniumwire import webdriver as wiredriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.common.exceptions import TimeoutException
from urllib3.exceptions import ProtocolError
import random
import time

def rotate_proxy():
    # List of proxy IP addresses and ports
    proxy_pool = ["191.96.100.33:3155",
                  "167.86.115.218:8888", "20.205.61.143:80"]

    # Chrome options for headless browsing
    chrome_options = Options()
    chrome_options.add_argument("--headless")

    # Maximum number of attempts, each with a freshly chosen proxy
    max_attempts = 3
    for attempt in range(1, max_attempts + 1):
        random_proxy = random.choice(proxy_pool)

        # Set up proxy authentication
        proxy_username = "xyz"
        proxy_password = "<secret-password>"

        # Proxy options for both HTTP and HTTPS connections
        proxy_options = {
            "http": f"http://{proxy_username}:{proxy_password}@{random_proxy}",
            "https": f"https://{proxy_username}:{proxy_password}@{random_proxy}",
        }
        driver = None  # Reset so the finally block only closes this attempt's driver
        try:
            # Initialize Chrome driver with Selenium-Wire, using the random proxy
            driver = wiredriver.Chrome(
                service=ChromeService(ChromeDriverManager().install()),
                seleniumwire_options={"proxy": proxy_options},
                options=chrome_options,
            )
            driver.set_page_load_timeout(10)  # Set a timeout for page loading
            # Visit a test site to verify the proxy connection
            driver.get("http://httpbin.org/ip")
            print(driver.find_element(By.TAG_NAME, "body").text)
            break  # Proxy connection successful, exit loop
        except (TimeoutException, ProtocolError) as e:
            # Handle timeout or protocol error
            print(f"Error occurred: {e}")
            remaining = max_attempts - attempt
            if remaining == 0:
                print("Maximum retries reached. Exiting...")
            else:
                print(f"Retrying... ({remaining} retries left)")
                time.sleep(1)
        finally:
            # Ensure the driver is closed whether or not an exception occurred
            if driver is not None:
                driver.quit()

if __name__ == "__main__":
    rotate_proxy()

How to Set up Rotating Proxies in Selenium

During web scraping, I discovered that a single proxy's IP address can be blocked after a certain amount of activity. To avoid this and ensure efficient large-scale scraping, using a pool of proxies and continuously switching between them is necessary.

This constant change in IP address makes it hard for the server to identify and block you. By appearing like a new user each time, you can bypass restrictions and continue scraping effectively. This is the power of proxy rotation!

Prerequisites

Before you start, make sure you meet all the following requirements:

  1. Download the latest version of Python from the official website. For this blog, we’re using Python 3.12.2.
  2. Choose a code editor like Visual Studio Code, PyCharm, or Jupyter Notebook.
  3. Install the selenium, selenium-wire, and webdriver-manager libraries. Selenium helps you automate web browser interaction, selenium-wire makes it very easy to use proxies with Selenium, and webdriver-manager downloads a ChromeDriver that matches your browser.

Steps for Rotating Proxy

Proxy rotation is an excellent choice for scenarios where you need to avoid frequent IP-based restrictions by changing your IP address. Follow the steps below to set up rotating proxies.

Step 1. Choose a Reliable Proxy Provider

Select a reliable proxy provider that offers a list of rotating proxies. These proxies will assign a new IP address for each request or after a certain time interval.

Step 2. Obtain Credentials from the Proxy Provider

Obtain the necessary credentials from your chosen proxy provider. These credentials include the IP address, port, username (if applicable), and password to connect to the proxy server.

Step 3. Verify Connection (Optional)

Once you have the credentials, you can use tools like cURL or Python libraries such as requests to verify that you can connect to the proxy server and receive responses.
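
As a rough illustration of this check, the sketch below sends one request through a proxy using only Python's standard library (urllib.request) rather than requests, so it needs no extra packages. The function name check_proxy, the default test URL, and the placeholder address in the docstring are assumptions for illustration; substitute the values from your provider.

```python
import http.client
import urllib.request


def check_proxy(proxy_url, test_url="http://httpbin.org/ip", timeout=5):
    """Return the response body fetched through proxy_url, or None on failure.

    proxy_url is a full URL such as "http://user:pass@203.0.113.10:8080"
    (a placeholder address, not a real proxy).
    """
    # Route both HTTP and HTTPS traffic through the same proxy.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS failures, and HTTP errors.
        return None
```

If the proxy works, the returned text should contain the proxy's IP address rather than your own; a return value of None means the proxy could not be reached or rejected the request.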

Step 4. Configure Chrome Options

Set up Chrome options by adding the --proxy-server argument and passing a random proxy from your working proxies list to this argument. After this, initialize the Chrome WebDriver with the configured options.

Step 5. Access the URL

Visit a test URL to verify that the WebDriver is using the proxy correctly. Print out the response to confirm the IP address associated with the proxy.

If these steps seem abstract, don't worry. The following sections show them in action.

Adding Rotating Proxy to Selenium

When using Selenium, you can add a rotating proxy by defining a list of proxies and randomly selecting one for each web page visit. However, be cautious with free proxies, as they are often unreliable and short-lived. If you choose to proceed with free proxies, you'll need to test them individually to identify working ones for your web scraping tasks.

This is exactly what the following code does. The getProxies function scrapes free-proxy-list.net and keeps only the entries labeled "elite proxy", and the testProxy function sends a request to httpbin.org/ip through each candidate with a 5-second timeout, keeping the ones that respond. The rotateProxy function then passes a random proxy from the list of working proxies to Chrome's --proxy-server argument. The code also needs the requests and beautifulsoup4 packages, which you can install with pip.

Here’s the code:

from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
import random
from selenium.webdriver.common.by import By
import concurrent.futures
import requests
from bs4 import BeautifulSoup

def getProxies():
    r = requests.get("https://free-proxy-list.net/")
    soup = BeautifulSoup(r.content, "html.parser")
    table = soup.find("tbody")

    proxies = []
    for row in table.find_all("tr"):
        columns = row.find_all("td")
        if columns[4].text.strip() == "elite proxy":
            proxy = f"{columns[0].text}:{columns[1].text}"
            proxies.append(proxy)
    return proxies

def testProxy(proxy):
    try:
        r = requests.get(
            "https://httpbin.org/ip", proxies={"http": proxy, "https": proxy}, timeout=5
        )
        r.raise_for_status()  # Raises HTTPError if the response status code is >= 400
        return proxy
    except requests.exceptions.RequestException:
        return None

def rotateProxy(working_proxies):
    if not working_proxies:
        print("No working proxies found.")
        return
    random_proxy = random.choice(working_proxies)
    print(f"Rotating to proxy: {random_proxy}")

    options = Options()
    options.add_argument("--headless")
    options.add_argument(f"--proxy-server={random_proxy}")

    driver = webdriver.Chrome(
        service=ChromeService(ChromeDriverManager().install()),
        options=options,
    )

    driver.get("http://httpbin.org/ip")
    print(driver.find_element(By.TAG_NAME, "body").text)

    driver.quit()

def main():
    proxies = getProxies()
    working_proxies = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = executor.map(testProxy, proxies)
        for result in results:
            if result is not None:
                working_proxies.append(result)
    num_working_proxies = len(working_proxies)
    print(f"Found {num_working_proxies} working proxies.")

    rotateProxy(working_proxies)

if __name__ == "__main__":
    main()

Here’s the result:

[Image: proxy rotated successfully]

We successfully confirmed that Selenium is using a random proxy by matching the rotated IP address with the one from the web page response.
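
This comparison can also be automated. The sketch below is one possible way to do it, assuming the page body is the JSON object that httpbin.org/ip returns (an "origin" field holding one or more comma-separated addresses). The helper name proxy_ip_in_response is hypothetical.

```python
import json


def proxy_ip_in_response(body_text, proxy):
    """Return True if the IP reported by httpbin.org/ip matches the proxy's IP.

    body_text is the page text, e.g. '{"origin": "191.96.100.33"}'.
    proxy is an "ip:port" string like the entries in the proxy list.
    """
    try:
        origin = json.loads(body_text)["origin"]
    except (ValueError, KeyError, TypeError):
        return False  # Not the JSON object we expected (e.g. an error page)
    proxy_ip = proxy.rsplit(":", 1)[0]
    # The origin field may list several comma-separated addresses.
    seen_ips = [ip.strip() for ip in str(origin).split(",")]
    return proxy_ip in seen_ips
```

Keep in mind that some proxy services exit through a different IP address than the one you connect to, so a False result does not always mean the proxy was bypassed.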

By rotating proxies, I was able to scrape data from various websites at scale without getting IP banned.

Here are some guidelines to help you determine how often you should rotate your proxies:

  1. Adapting to Anti-Scraping Measures: If a website has more aggressive anti-scraping measures, you should rotate your proxies more frequently, ideally with every request or every few requests. However, you can rotate your proxies every few minutes for websites with less strict anti-scraping measures.
  2. Proxy Pool Size: The size of your proxy pool also determines how often you should rotate your proxies. If you have a larger pool of high-quality proxies, you can rotate them less frequently. But if you have a smaller pool, you'll need to rotate them more often to avoid reusing the same ones.
  3. Data Volume and Complexity: If you're scraping a large amount of data or performing complex tasks, you may need to rotate your proxies more frequently to avoid triggering anti-scraping mechanisms.
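
To make these guidelines concrete, here is a minimal sketch of a rotation policy that switches proxies after a set number of requests or a set amount of time, whichever comes first. The class name ProxyRotator and its parameters are hypothetical, and it cycles through the pool in order instead of picking at random as the code above does.

```python
import itertools
import time


class ProxyRotator:
    """Cycle through a proxy pool, switching after N requests or T seconds."""

    def __init__(self, proxies, max_requests=5, max_age_seconds=60.0,
                 clock=time.monotonic):
        if not proxies:
            raise ValueError("proxy pool is empty")
        self._cycle = itertools.cycle(proxies)
        self._max_requests = max_requests
        self._max_age = max_age_seconds
        self._clock = clock  # Injectable so the policy can be tested without sleeping
        self._rotate()

    def _rotate(self):
        # Move to the next proxy and reset the usage counters.
        self._current = next(self._cycle)
        self._used = 0
        self._since = self._clock()

    def get(self):
        """Return the proxy to use for the next request, rotating if needed."""
        too_many = self._used >= self._max_requests
        too_old = self._clock() - self._since >= self._max_age
        if too_many or too_old:
            self._rotate()
        self._used += 1
        return self._current
```

For a site with aggressive anti-scraping measures, max_requests=1 rotates on every request; for a more lenient site, a larger max_requests combined with max_age_seconds of a few minutes matches the second guideline.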

Configuring Authentication

Some proxy servers require authentication, restricting access only to users with valid credentials. This ensures that only authorized users can connect to the server. This is typically the case with commercial proxy services or premium proxies. The proxy URL in these cases will look something like this:

<PROXY_PROTOCOL>://<USERNAME>:<PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
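
As a small illustration of this format, the sketch below assembles such a URL and percent-encodes the credentials, which matters when a password contains characters like "@" or ":". The helper name build_proxy_url and the sample password are made up for this example, and whether a given client decodes the encoded credentials is something to confirm against that client's documentation.

```python
from urllib.parse import quote, urlsplit


def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Assemble <scheme>://<username>:<password>@<host>:<port>."""
    if username is None:
        return f"{scheme}://{host}:{port}"
    # Percent-encode credentials so characters like "@" or ":" don't break the URL.
    user = quote(username, safe="")
    secret = quote(password or "", safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


# "p@ss:word" is an invented password chosen to show the encoding.
url = build_proxy_url("191.96.100.33", 3155, "xyz", "p@ss:word")
parts = urlsplit(url)
print(url)  # http://xyz:p%40ss%3Aword@191.96.100.33:3155
print(parts.hostname, parts.port, parts.username)  # 191.96.100.33 3155 xyz
```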

Note that when you pass such a URL to the --proxy-server argument, Chrome ignores the username and password. A third-party library called Selenium Wire solves this problem: it extends Selenium with proxy management (including authentication) as well as request interception and modification.

First, you need to install Selenium Wire using pip:

pip install selenium-wire

Update your scraper to use the seleniumwire webdriver instead of the default selenium webdriver.

from seleniumwire import webdriver as wiredriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import random

def rotateProxy():
    proxy_pool = [
        "191.96.100.33:3155",
        "146.190.53.175:32782",
        "167.86.115.218:8888",
        "20.205.61.143:80",
    ]
    chrome_options = Options()
    chrome_options.add_argument("--headless")

    random_proxy = random.choice(proxy_pool)

    # Set up selenium-wire with the proxy

    proxy_username = "xyz"
    proxy_password = "<secret-password>"
    proxy_options = {
        "http": f"http://{proxy_username}:{proxy_password}@{random_proxy}",
        "https": f"https://{proxy_username}:{proxy_password}@{random_proxy}",
    }
    driver = wiredriver.Chrome(
        service=ChromeService(ChromeDriverManager().install()),
        seleniumwire_options={"proxy": proxy_options},
        options=chrome_options,
    )

    driver.get("http://httpbin.org/ip")
    print(driver.find_element(By.TAG_NAME, "body").text)

    driver.quit()

if __name__ == "__main__":
    rotateProxy()

When you first run the code using Selenium Wire, you might encounter an error similar to the one shown below.

[Image: chrome certificate error]

To resolve this, you need to install the Selenium Wire certificate in Chrome. You can extract the certificate using the following command. The Selenium Wire documentation covers this in more detail.

python -m seleniumwire extractcert

Note: If your credentials are invalid, the proxy server will respond with a [407: Proxy Authentication Required](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/407) error, and your Python Selenium script will fail with an ERR_HTTP_RESPONSE_CODE_FAILURE error. Ensure you use valid username and password credentials.
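
The code above picks one proxy when the driver starts. Selenium Wire's documentation also describes a proxy attribute on the driver for changing proxy settings on an existing instance, which would let you rotate without restarting Chrome. The sketch below assumes that attribute behaves as documented; the helper name switch_proxy and its parameters are hypothetical.

```python
import random


def switch_proxy(driver, proxy_pool, username, password):
    """Point an existing Selenium Wire driver at a randomly chosen proxy.

    Assumes `driver` is a seleniumwire webdriver whose `proxy` attribute
    accepts the same dictionary as the "proxy" key of seleniumwire_options.
    """
    chosen = random.choice(proxy_pool)
    driver.proxy = {
        "http": f"http://{username}:{password}@{chosen}",
        "https": f"https://{username}:{password}@{chosen}",
    }
    return chosen
```

You could call switch_proxy(driver, proxy_pool, proxy_username, proxy_password) before each driver.get() call to send every page visit through a different proxy.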

Handling Errors and Timeouts

While scraping web pages using rotating proxies, you need to handle invalid proxies and timeouts to prevent your program from crashing. Here are the steps to follow:

  1. Wrap your scraping code in a try-except block. This catches any errors that might occur during the scraping process, including those related to invalid proxies or timeouts.
  2. Implement a retry mechanism for failed proxy connections. If a connection attempt fails, try with another proxy from your pool up to a certain number of attempts.
  3. Set appropriate timeouts for different operations, like page navigation and HTTP requests.

Here's the code for handling errors and retrying failed proxy connections (up to 3 times).

from selenium.webdriver.common.by import By
from seleniumwire import webdriver as wiredriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.common.exceptions import TimeoutException
from urllib3.exceptions import ProtocolError
import random
import time

def rotate_proxy():
    proxy_pool = [
        "191.96.100.33:3155",
        "146.190.53.175:32782",
        "167.86.115.218:8888",
        "20.205.61.143:80",
    ]
    chrome_options = Options()
    chrome_options.add_argument("--headless")

    max_attempts = 3
    for attempt in range(1, max_attempts + 1):
        random_proxy = random.choice(proxy_pool)

        # Set up selenium-wire with the proxy
        proxy_username = "xyz"
        proxy_password = "<secret-password>"
        proxy_options = {
            "http": f"http://{proxy_username}:{proxy_password}@{random_proxy}",
            "https": f"https://{proxy_username}:{proxy_password}@{random_proxy}",
        }
        driver = None  # Reset so the finally block only closes this attempt's driver
        try:
            driver = wiredriver.Chrome(
                service=ChromeService(ChromeDriverManager().install()),
                seleniumwire_options={"proxy": proxy_options},
                options=chrome_options,
            )
            driver.set_page_load_timeout(10)  # Set a timeout for page loading
            driver.get("http://httpbin.org/ip")
            print(driver.find_element(By.TAG_NAME, "body").text)
            break  # Proxy connection successful, exit loop
        except (TimeoutException, ProtocolError) as e:
            print(f"Error occurred: {e}")
            remaining = max_attempts - attempt
            if remaining == 0:
                print("Maximum retries reached. Exiting...")
            else:
                print(f"Retrying... ({remaining} retries left)")
                time.sleep(1)
        finally:
            # Close the driver whether or not an exception occurred
            if driver is not None:
                driver.quit()

if __name__ == "__main__":
    rotate_proxy()
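
The retry logic above can also be factored into a reusable helper that is independent of Selenium. The following is a minimal sketch, not a definitive implementation: the name run_with_proxy_retries and its parameters are hypothetical, and unlike random.choice it draws proxies without replacement so a failed proxy is not retried.

```python
import random
import time


def run_with_proxy_retries(proxy_pool, action, max_attempts=3, delay=1.0,
                           retry_on=(Exception,)):
    """Call action(proxy) with a different proxy per attempt until it succeeds.

    action: any callable taking an "ip:port" string, for example a function
    that opens a driver through that proxy and returns the scraped text.
    retry_on: exception types that should trigger a retry.
    """
    # Never try more proxies than the pool holds, and never the same one twice.
    candidates = random.sample(proxy_pool, k=min(max_attempts, len(proxy_pool)))
    last_error = None
    for attempt, proxy in enumerate(candidates, start=1):
        try:
            return action(proxy)
        except retry_on as error:
            last_error = error
            print(f"Attempt {attempt} via {proxy} failed: {error}")
            if attempt < len(candidates):
                time.sleep(delay)  # Brief pause before trying the next proxy
    raise RuntimeError("All proxy attempts failed") from last_error
```

With Selenium, you would pass retry_on=(TimeoutException, ProtocolError) and an action that creates the driver, loads the page, and quits the driver in a finally block.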

Note: The free proxies used in this blog are unreliable, short-lived, and can quickly become outdated. The next section explores a better alternative.

Alternatives to rotating proxies in Python Selenium

To simplify your web scraper and achieve scalability, you might want to get rid of the infrastructure headaches and just focus on the data extraction. ScrapingBee API offers a solution that allows you to scrape the target page with just one API call.

ScrapingBee offers a fresh pool of proxies that can handle even the most challenging websites. To use this pool, you simply need to add stealth_proxy=True to your API calls. The ScrapingBee Python SDK makes it easier to interact with ScrapingBee's API.

Don't forget to replace "Your_ScrapingBee_API_Key" with your actual API key, which you can retrieve from your ScrapingBee account.

Before using the SDK, install it with this command:

pip install scrapingbee

Here’s the quick start code:

from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(
    api_key="Your_ScrapingBee_API_Key"
)

response = client.get(
    "https://author.today/",
    params={
        "stealth_proxy": "True",
    },
)

print(response.status_code)

If the request succeeds, the snippet prints a status code of 200. Fantastic!

💡 Interested in rotating proxies in other languages? Check out our guide on rotating proxies in Puppeteer.

Wrapping Up

You've learned about Selenium proxies, how to rotate them with Selenium, how to handle authenticated proxies, and how to deal with invalid proxies and timeouts. Finally, you've explored why free proxies are a bad idea and learned about alternative solutions.

Satyam Tripathi

Satyam is a senior technical writer who is passionate about web scraping, automation, and data engineering. He has delivered over 130 blog posts since 2021.