Step-by-step guide on building a custom proxy rotator with Python

Be it web scraping or website testing, the necessity of proxies is undeniable; however, you need many different IP addresses to avoid bans, and you won’t want to switch between proxies manually. A proxy rotator is the answer. In this article, we build a custom proxy rotator in Python – from scratch. 

Why is it essential to build a good proxy rotator 

A rotator automatically cycles through a proxy pool, assigning a different IP address to each request or at a set time interval. The target server doesn’t see that requests come from the same source; traffic appears to originate from different addresses and locations, so you can go unnoticed by anti-bot systems. That’s why a proxy rotator is at the heart of scraping and multiaccounting. The rotator is responsible for preventing bans and captchas, avoiding anti-bot systems, and overcoming geo-based limitations, especially in large-scale projects with multiple connection threads. 

There are also some not-so-obvious advantages of using a proxy rotator. For example, it enhances your privacy and anonymity: with your IP constantly changing, it is harder for websites and trackers to profile your activity. It also keeps you a step further from hackers: with your real address hidden, it’s harder for scammers who target particular devices to single you out. 

Moreover, many websites serve different results based on your location or preferences. By using multiple IPs, you get more diverse and accurate data. Rotation also helps you avoid personalized content when you don’t need it.

Reliable Proxy Rotator: Its Components 

Besides simply using a new IP for every request, you should include a few more features so that a proxy rotator really benefits you and doesn’t become a source of problems:

  • Retry logic – if one proxy doesn’t work, the rotator tries another one;
  • Rotating User-Agent headers, so your requests don’t look like they come from the same client;
  • Handling “bad” HTTP status codes like 403 (“Forbidden”), 429 (“Too many requests”), and 500 (“Internal server error”); 
  • Disabling proxies that keep failing;
  • Adding small delays between requests to avoid triggering anti-bot systems;
  • Logging – so that, in case of a retry, you can see the reason for it and which proxy and User-Agent were used;
  • A health check before scraping – to test all proxies up front and make sure they work;
  • A circuit breaker that disables dead proxies for some time;
  • Sticky sessions per host, so that within one session the domain sees the same IP – useful for multiaccounting or any task where you need to stick to the same session.

We will keep those points in mind while creating our proxy rotator. 

Building a proxy rotator using Python

We will create the proxy rotator in a single file; in our example, it is named proxy_rotator.py. Below, you will find each piece of code along with an explanation of what it does, ready to copy and paste. The code avoids hardcoding data like credentials, so you won’t need to replace anything later. 

Getting started 

  1. Install Python from its official website (or upgrade it). You need version 3.8 or higher. 
  2. Run the following command to install the requests library. We will need it to make HTTP requests. 

pip install requests

  3. Get a list of proxies from your plan’s page. You can manually copy and paste them into a .txt file from your DataImpulse dashboard or download them. We named our file proxies.txt. It should look like this:

http://login:password@host:port
https://login:password@host:port

You can change the proxy format if necessary. 
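To double-check that a line follows this format, you can parse it with Python’s standard library (a quick sketch – the credentials and hostname below are placeholders, not real values):

```python
from urllib.parse import urlparse

# Sanity-check a proxies.txt entry: it should expose the scheme,
# credentials, host, and port that the requests library expects.
line = "http://login:password@proxy.example.com:8080"  # sample entry
parsed = urlparse(line)
print(parsed.scheme, parsed.username, parsed.hostname, parsed.port)
```

If any of these fields comes back as None, the line is malformed and the proxy will not work.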

Imports and setup 

First, we must bring in tools called “modules” that we will use. 


import argparse
import logging
import os
import random
import threading
import time
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

import requests
from requests import Response

requests sends web requests, random picks proxies and headers, and threading is responsible for handling multiple connection threads. 

Default settings 

The next part defines the defaults behind several crucial functions: the User-Agent strings the rotator cycles through, the HTTP status codes that make it switch to another proxy, and the URL used to check whether a proxy works. 


DEFAULT_USER_AGENTS: List[str] = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5 Safari/605.1.15",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5 Mobile/15E148 Safari/604.1",
    "Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Mobile Safari/537.36",
]

DEFAULT_ROTATE_ON_STATUS = {403, 407, 409, 418, 421, 425, 429, 500, 502, 503, 504}
DEFAULT_HEALTHCHECK_URL = "https://httpbin.org/ip"

You can replace or add User Agents and use another website for a health check if you want. In our case, we use httpbin.org/ip. Other decent options are https://httpbin.org/get (shows request headers as well as IP), https://api.ipify.org (or https://api.ipify.org?format=json to return your IP in JSON), https://ifconfig.me or https://ifconfig.me/all.json (returns headers + IP in JSON), or http://ip-api.com/json/ (shows country, city, ISP, and IP). 
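All of these endpoints return small, easy-to-parse bodies. As an illustration (the IP below is a made-up documentation-range address, not a real one), the JSON that httpbin.org/ip returns can be read like this:

```python
import json

# httpbin.org/ip responds with a tiny JSON object whose "origin" field
# holds the IP the server saw - i.e., your proxy's exit address.
sample_body = '{"origin": "203.0.113.7"}'  # illustrative payload
data = json.loads(sample_body)
print(data["origin"])
```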

Proxy object 

This part defines how proxies are stored.


@dataclass(frozen=True)
class Proxy:
    url: str
    label: str = ""
    def to_requests_proxies(self) -> Dict[str, str]:
        return {"http": self.url, "https": self.url}

Again, you do not have to replace anything here. 

Reading a proxy file 

This part reads proxies from a .txt file, skipping blank lines and comments. 


def parse_proxy_file(path: str) -> List[Proxy]:
    proxies: List[Proxy] = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            raw = line.strip()
            if not raw or raw.startswith("#"):
                continue
            raw = raw.split(" #", 1)[0].split("\t#", 1)[0].strip()
            proxies.append(Proxy(url=raw))
    if not proxies:
        raise ValueError(f"No proxies found in file: {path}")
    return proxies
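To see what the parser keeps and drops without touching a real file, here is a standalone sketch that re-states the same filtering rules on an in-memory sample (the proxy URLs are placeholders):

```python
# Standalone re-statement of parse_proxy_file's filtering rules,
# applied to a list of strings instead of a file.
def filter_proxy_lines(lines):
    out = []
    for line in lines:
        raw = line.strip()
        if not raw or raw.startswith("#"):
            continue  # skip blanks and full-line comments
        # drop trailing comments after the proxy URL
        raw = raw.split(" #", 1)[0].split("\t#", 1)[0].strip()
        out.append(raw)
    return out

sample = [
    "# main pool",
    "",
    "http://login:password@host1:8080  # backup",
    "https://login:password@host2:8080",
]
print(filter_proxy_lines(sample))  # two clean proxy URLs survive
```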

Retry delay 

This piece of code implements the retry delay. With it, the rotator waits longer after every failed request (exponential backoff with a bit of random jitter) to prevent throttling and bans.


def _exp_backoff_sleep(base: float, attempt: int, max_sleep: float) -> None:
    sleep_s = min(base * (2 ** attempt) + random.random() * base, max_sleep)
    time.sleep(sleep_s)
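To get a feel for the delays this produces, here is the same formula evaluated for a base of 0.5 seconds and a 5-second cap, with the random jitter left out so the numbers are deterministic:

```python
# Worst-case backoff schedule (jitter excluded) for base=0.5s, cap=5s -
# the same min(base * 2**attempt, cap) formula as _exp_backoff_sleep,
# computed without actually sleeping.
base, max_sleep = 0.5, 5.0
schedule = [min(base * (2 ** attempt), max_sleep) for attempt in range(5)]
print(schedule)  # [0.5, 1.0, 2.0, 4.0, 5.0]
```

In the real function, a random value up to `base` is added on top of each delay so concurrent threads don’t retry in lockstep.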

The proxy rotator class 

The heart of your rotator. It picks a proxy, retries failed requests, and tracks failures. It also applies the chosen strategy – using proxies randomly or cycling through the list – sets timeouts and headers, and can keep using the same proxy for a given website. It sends requests, marks proxies as healthy or temporarily disabled, and runs health checks. 


class ProxyRotator:
    def __init__(
        self,
        proxies: Iterable[Proxy],
        strategy: str = "round_robin",  
        max_retries: int = 3,
        timeout: Tuple[float, float] = (10.0, 30.0),
        rotate_on_status: Optional[Iterable[int]] = None,
        user_agents: Optional[List[str]] = None,
        sticky_per_host: bool = True,
        circuit_threshold: int = 2, 
        circuit_cooldown: float = 60.0,  
        backoff_base: float = 0.5,
        backoff_max: float = 5.0,
        healthcheck_url: Optional[str] = None,
        logger: Optional[logging.Logger] = None,
    ) -> None:
        self.proxies: List[Proxy] = list(proxies)
        if not self.proxies:
            raise ValueError("ProxyRotator requires at least one proxy")
        if strategy not in {"round_robin", "random"}:
            raise ValueError("strategy must be 'round_robin' or 'random'")
        self.strategy = strategy

        self.max_retries = max_retries
        self.timeout = timeout
        self.rotate_on_status = set(rotate_on_status or DEFAULT_ROTATE_ON_STATUS)
        self.user_agents = user_agents or DEFAULT_USER_AGENTS
        self.sticky_per_host = sticky_per_host

        self.circuit_threshold = circuit_threshold
        self.circuit_cooldown = circuit_cooldown
        self.failure_counts: Dict[Proxy, int] = {p: 0 for p in self.proxies}
        self.disabled_until: Dict[Proxy, float] = {p: 0.0 for p in self.proxies}

        self.backoff_base = backoff_base
        self.backoff_max = backoff_max

        self.healthcheck_url = healthcheck_url

        self._local = threading.local()

        self._sticky_map: Dict[str, Proxy] = {}

        self._idx = 0

        self._lock = threading.Lock()

        self.log = logger or logging.getLogger("proxy_rotator")

    def get(self, url: str, **kwargs) -> Response:
        return self.request("GET", url, **kwargs)

    def post(self, url: str, **kwargs) -> Response:
        return self.request("POST", url, **kwargs)

    def request(self, method: str, url: str, **kwargs) -> Response:
        session = getattr(self._local, "session", None)
        if session is None:
            session = requests.Session()
            self._local.session = session

        headers = kwargs.pop("headers", {}) or {}
        headers.setdefault("User-Agent", random.choice(self.user_agents))

        timeout = kwargs.pop("timeout", self.timeout)

        last_exc: Optional[Exception] = None
        last_response: Optional[Response] = None

        for attempt in range(self.max_retries + 1):
            proxy = self._choose_proxy(url)
            if proxy is None:
                raise RuntimeError("No healthy proxies currently available")

            try:
                self.log.debug("Using proxy %s for %s %s", proxy.url, method, url)
                response = session.request(
                    method,
                    url,
                    headers=headers,
                    proxies=proxy.to_requests_proxies(),
                    timeout=timeout,
                    **kwargs,
                )

                last_response = response
                if response.status_code in self.rotate_on_status:
                    self._mark_failure(proxy, f"HTTP {response.status_code}")
                    self._maybe_unstick_host(url, proxy)
                    if attempt < self.max_retries:
                        _exp_backoff_sleep(self.backoff_base, attempt, self.backoff_max)
                        continue
                else:
                    self._mark_success(proxy)
                return response

            except requests.RequestException as exc:  
                last_exc = exc
                self._mark_failure(proxy, repr(exc))
                self._maybe_unstick_host(url, proxy)
                if attempt < self.max_retries:
                    _exp_backoff_sleep(self.backoff_base, attempt, self.backoff_max)
                    continue
                break

        if last_exc:
            raise last_exc
        assert last_response is not None 
        return last_response

    def health_check(self, url: Optional[str] = None, sample: Optional[int] = None) -> Dict[str, bool]:
        check_url = url or self.healthcheck_url or DEFAULT_HEALTHCHECK_URL
        to_test = list(self.proxies)
        if sample is not None:
            to_test = random.sample(to_test, k=min(sample, len(to_test)))

        results: Dict[str, bool] = {}
        for p in to_test:
            try:
                r = requests.get(check_url, proxies=p.to_requests_proxies(), timeout=(5, 10))
                healthy = r.ok
            except Exception:
                healthy = False
            results[p.url] = healthy
            if healthy:
                self._mark_success(p)
            else:
                self._mark_failure(p, "healthcheck")
        return results

    def _choose_proxy(self, url: str) -> Optional[Proxy]:
        host = requests.utils.urlparse(url).hostname or ""
        now = time.time()

        with self._lock:
            if self.sticky_per_host:
                sticky = self._sticky_map.get(host)
                if sticky and self.disabled_until.get(sticky, 0.0) <= now:
                    return sticky

            enabled = [p for p in self.proxies if self.disabled_until.get(p, 0.0) <= now]
            if not enabled:
                return None

            if self.strategy == "round_robin":
                proxy = enabled[self._idx % len(enabled)]
                self._idx = (self._idx + 1) % len(enabled)
            else:  
                proxy = random.choice(enabled)

            if self.sticky_per_host:
                self._sticky_map[host] = proxy
            return proxy

    def _mark_failure(self, proxy: Proxy, reason: str) -> None:
        with self._lock:
            cnt = self.failure_counts.get(proxy, 0) + 1
            self.failure_counts[proxy] = cnt
            if cnt >= self.circuit_threshold:
                self.disabled_until[proxy] = time.time() + self.circuit_cooldown
                self.log.warning("Disabling proxy %s for %.1fs (reason: %s)", proxy.url, self.circuit_cooldown, reason)

    def _mark_success(self, proxy: Proxy) -> None:
        with self._lock:
            self.failure_counts[proxy] = 0
            self.disabled_until[proxy] = 0.0

    def _maybe_unstick_host(self, url: str, proxy: Proxy) -> None:
        if not self.sticky_per_host:
            return
        host = requests.utils.urlparse(url).hostname or ""
        with self._lock:
            if self._sticky_map.get(host) == proxy:
                self._sticky_map.pop(host, None)
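To build intuition for the round-robin strategy, here is a tiny standalone sketch of the index-cycling logic `_choose_proxy` uses over the currently enabled proxies (the proxy names are placeholders):

```python
# Standalone illustration of the round_robin branch of _choose_proxy:
# cycle an index over the enabled proxies so load spreads evenly.
enabled = ["proxy-a", "proxy-b", "proxy-c"]
idx = 0
picks = []
for _ in range(5):
    picks.append(enabled[idx % len(enabled)])
    idx = (idx + 1) % len(enabled)
print(picks)  # ['proxy-a', 'proxy-b', 'proxy-c', 'proxy-a', 'proxy-b']
```

The `random` strategy simply replaces this indexing with `random.choice(enabled)`.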

Command-Line Options

You need this piece to run the rotator with a single command from the terminal.


def build_arg_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="HTTP proxy rotator")
    p.add_argument("--url", required=False, help="Target URL to request (for quick testing)")
    p.add_argument("--method", default="GET", help="HTTP method to use (default: GET)")
    p.add_argument(
        "--proxies",
        help="Path to proxies.txt file (one proxy URL per line). If omitted, reads PROXY_LIST env (comma-separated)",
    )
    p.add_argument("--strategy", choices=["round_robin", "random"], default="round_robin")
    p.add_argument("--retries", type=int, default=3, help="Max retry attempts across proxies")
    p.add_argument(
        "--timeout",
        type=float,
        default=15.0,
        help="Read timeout in seconds (connect timeout is fixed at 10s in this simple CLI)",
    )
    p.add_argument(
        "--rotate-on",
        default=",".join(str(s) for s in sorted(DEFAULT_ROTATE_ON_STATUS)),
        help="Comma-separated status codes that trigger rotation",
    )
    p.add_argument(
        "--no-sticky",
        action="store_true",
        help="Disable sticky-per-host behavior",
    )
    p.add_argument(
        "--healthcheck",
        action="store_true",
        help="Run a healthcheck across proxies and print a summary before the request",
    )
    p.add_argument(
        "--debug",
        action="store_true",
        help="Enable verbose logging",
    )
    return p


def _load_proxies_from_args(args: argparse.Namespace) -> List[Proxy]:
    if args.proxies:
        return parse_proxy_file(args.proxies)
    env = os.getenv("PROXY_LIST", "").strip()
    if env:
        return [Proxy(url=p.strip()) for p in env.split(",") if p.strip()]
    raise SystemExit("No proxies provided. Use --proxies or set PROXY_LIST env.")


def main_cli() -> None:
    parser = build_arg_parser()
    args = parser.parse_args()

    logging.basicConfig(
        level=logging.DEBUG if args.debug else logging.INFO,
        format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
    )
    log = logging.getLogger("proxy_rotator")

    proxies = _load_proxies_from_args(args)

    rotate_on_set = {int(x.strip()) for x in args.rotate_on.split(",") if x.strip()}

    rotator = ProxyRotator(
        proxies=proxies,
        strategy=args.strategy,
        max_retries=args.retries,
        timeout=(10.0, args.timeout),
        rotate_on_status=rotate_on_set,
        user_agents=DEFAULT_USER_AGENTS,
        sticky_per_host=not args.no_sticky,
        circuit_threshold=2,
        circuit_cooldown=60.0,
        backoff_base=0.5,
        backoff_max=5.0,
        healthcheck_url=DEFAULT_HEALTHCHECK_URL,
        logger=log,
    )

    if args.healthcheck:
        results = rotator.health_check()
        total = len(results)
        healthy = sum(1 for ok in results.values() if ok)
        log.info("Healthcheck: %s/%s proxies OK", healthy, total)
        for url, ok in results.items():
            log.info("  %-40s %s", url, "OK" if ok else "BAD")

    if args.url:
        try:
            resp = rotator.request(args.method.upper(), args.url)
            print(f"Status: {resp.status_code}")
            text = resp.text
            preview = text[:500].replace("\n", " ")
            print(f"Body preview (first 500 chars):\n{preview}")
        except Exception as e:
            log.error("Request failed: %s", e)
            raise
    else:
        log.info("No --url provided; nothing else to do. Add --url to test a request.")
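As the help text mentions, the CLI can also read proxies from a comma-separated PROXY_LIST environment variable when --proxies is omitted. Here is a standalone sketch of that parsing (the proxy URLs are placeholders):

```python
import os

# The same comma-splitting logic _load_proxies_from_args applies to
# the PROXY_LIST environment variable.
os.environ["PROXY_LIST"] = "http://u:p@host1:8080, http://u:p@host2:8080"
env = os.getenv("PROXY_LIST", "").strip()
urls = [p.strip() for p in env.split(",") if p.strip()]
print(urls)  # two cleaned-up proxy URLs
```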

Entry point

This is what starts your rotator. It loads proxies, checks them (if you ask it to), and makes requests. 


if __name__ == "__main__":
    main_cli()

Running a proxy rotator 

You can run your rotator from the command line using the following command:


python proxy_rotator.py --proxies ./proxies.txt --url https://httpbin.org/ip

Change ./proxies.txt to the actual name of your file with proxies and the --url value to the URL of the website you want to scrape or test. 

Also, you can add some more flags:

--strategy random – the rotator will select a proxy from the pool at random

--retries 5 – set a different number of retries if necessary

--healthcheck – test all proxies beforehand

--no-sticky – disable sticky sessions. Use it only if you don’t need to hold on to the same session

--debug – print detailed logs, which will help you troubleshoot. 

Conclusion 

Building a proxy rotator isn’t long or hard; however, it helps you get the most out of a proxy pool. A good rotator featuring retry logic, health checks, and delays between requests, paired with trustworthy proxies, can take your web scraping or multiaccounting to the next level. As for reliable, ethically sourced IPs, DataImpulse is here to help you. With us, you can get over 90 million IPs from 195 locations and forget about bans and geo-based limits. Contact us at [email protected] or press the “Try now” button to start. 

Jennifer R.

Content Editor

Content Manager at DataImpulse. Jennifer's degree in philology and translation and several years of experience in content writing help her create easy-to-understand copies, even on tangled tech topics. While writing every text, her goal is to provide an in-depth look at the given topic and give answers to all possible questions. Subscribe to our newsletter and always be updated on the best technologies for your business.