
Whether you are scraping the web or testing websites, proxies are a necessity; however, you need many different addresses to avoid IP bans, and nobody wants to switch between proxies by hand. A proxy rotator is the answer. In this article, we build a custom proxy rotator in Python from scratch.
Why building a good proxy rotator is essential
A rotator automatically cycles through a proxy pool, assigning a different IP address to every request or at a set time interval. The target server doesn’t see that requests come from the same source; traffic seems to originate from different addresses and locations, so you can go unnoticed by anti-bot systems. That’s why a proxy rotator is at the heart of scraping and multiaccounting: it prevents bans and captchas, avoids anti-bot systems, and overcomes geo-based limitations, especially in large-scale projects with multiple connection threads.
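Conceptually, rotation is nothing more than walking through a list of addresses. Here is a minimal sketch of the idea using Python’s built-in itertools.cycle (the proxy addresses are placeholders); the rotator we build below is far more robust:
import itertools

# Placeholder proxy URLs, for illustration only
proxy_pool = itertools.cycle([
    "http://proxy-a:8000",
    "http://proxy-b:8000",
    "http://proxy-c:8000",
])

for request_number in range(5):
    print(request_number, next(proxy_pool))  # each request gets the next proxy in the cycle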
There are also some not-so-obvious advantages of using a proxy rotator. It enhances your privacy and anonymity: with your IP constantly changing, it is harder for websites and trackers to profile your activities. It also keeps hackers another step away from you: with your real address hidden, scammers who target particular devices have a harder time singling you out.
Moreover, many websites serve different results based on your location or preferences. By using multiple IPs, you get more diverse and accurate data. Rotation also keeps personalized content out of your results when you don’t need it.
Reliable Proxy Rotator: Its Components
Besides simply using a new IP for every request, a rotator should include a few more features so that it really benefits you instead of becoming a source of problems:
- Retry logic – if one proxy doesn’t work, the rotator uses another one;
- Rotating User-Agent headers, so your requests don’t look like they come from the same source;
- Handling “bad” HTTP status codes like 403 (“Forbidden”), 429 (“Too Many Requests”), and 500 (“Internal Server Error”);
- Disabling proxies that keep failing;
- Adding small delays between requests to avoid triggering anti-bot systems;
- Logging – so that, in case of a retry, you can see the reason for it and which proxy and User-Agent were used;
- A health check before scraping – to test all proxies before you start and make sure they work;
- A circuit breaker to disable dead proxies for some time;
- Sticky sessions per host, so that within one session the domain keeps seeing the same IP – useful for multiaccounting and other cases where you need to stay in the same session.
We will keep those points in mind while creating our proxy rotator.
Building a proxy rotator using Python
We will create the proxy rotator in a single file; in our example, we named it proxy_rotator.py. Below, you will find all the pieces of code along with explanations of what they do, so you can copy and paste them. The code avoids hardcoding data like credentials, so you won’t need to replace anything later.
Getting started
- Install Python from its official website (or upgrade it). You need version 3.8 or higher.
- Run the following command to install the requests library. We will need it to make HTTP requests.
pip install requests
- Get a list of proxies from your plan’s page. You can manually copy and paste them into a .txt file from your DataImpulse dashboard or download them. We named our file proxies.txt. It should look like this:
http://login:password@host:port
https://login:password@host:port
You can change the proxy format if necessary.
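For reference, the requests library accepts proxy URLs in the form scheme://login:password@host:port. If your plan includes SOCKS proxies, they should work the same way once you install the optional SOCKS support (pip install requests[socks]):
http://login:password@host:port
socks5://login:password@host:port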
Imports and setup
First, we must bring in tools called “modules” that we will use.
import argparse
import logging
import os
import random
import threading
import time
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple
import requests
from requests import Response
requests sends web requests, random picks random proxies and User-Agent headers, and threading lets the rotator safely handle multiple connection threads.
Default settings
The next part defines the defaults behind several crucial functions: the User-Agent headers the rotator cycles through, the error status codes that make it switch to another proxy, and the URL it uses to check whether a proxy works.
DEFAULT_USER_AGENTS: List[str] = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5 Safari/605.1.15",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5 Mobile/15E148 Safari/604.1",
    "Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Mobile Safari/537.36",
]
DEFAULT_ROTATE_ON_STATUS = {403, 407, 409, 418, 421, 425, 429, 500, 502, 503, 504}
DEFAULT_HEALTHCHECK_URL = "https://httpbin.org/ip"
You can replace or add User Agents and use another website for a health check if you want. In our case, we use httpbin.org/ip. Other decent options are https://httpbin.org/get (shows request headers as well as IP), https://api.ipify.org (or https://api.ipify.org?format=json to return your IP in JSON), https://ifconfig.me or https://ifconfig.me/all.json (returns headers + IP in JSON), or http://ip-api.com/json/ (shows country, city, ISP, and IP).
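If you want to verify a single proxy by hand before building anything, a quick check might look like this (the proxy URL is a placeholder; use a real one from your list):
import requests

proxy_url = "http://login:password@host:port"  # placeholder
r = requests.get(
    "https://httpbin.org/ip",
    proxies={"http": proxy_url, "https": proxy_url},
    timeout=10,
)
print(r.status_code, r.text)  # the origin IP should be the proxy's, not yours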
Proxy object
This part defines how proxies are stored.
@dataclass(frozen=True)
class Proxy:
    url: str
    label: str = ""

    def to_requests_proxies(self) -> Dict[str, str]:
        return {"http": self.url, "https": self.url}
Again, you do not have to replace anything here.
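For example, here is what the helper returns for a placeholder proxy URL – exactly the dictionary format the requests library expects:
proxy = Proxy(url="http://login:password@host:port")  # placeholder URL
print(proxy.to_requests_proxies())
# {'http': 'http://login:password@host:port', 'https': 'http://login:password@host:port'}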
Reading a proxy file
This part reads proxies from a .txt file.
def parse_proxy_file(path: str) -> List[Proxy]:
    proxies: List[Proxy] = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            raw = line.strip()
            if not raw or raw.startswith("#"):
                continue
            raw = raw.split(" #", 1)[0].split("\t#", 1)[0].strip()
            proxies.append(Proxy(url=raw))
    if not proxies:
        raise ValueError(f"No proxies found in file: {path}")
    return proxies
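A quick way to confirm the file is parsed as expected:
proxies = parse_proxy_file("proxies.txt")
print(f"Loaded {len(proxies)} proxies; first one: {proxies[0].url}")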
Retry delay
This piece of code implements the retry delay: the rotator waits a little longer after every failed attempt (exponential backoff with random jitter) to prevent throttling and bans.
def _exp_backoff_sleep(base: float, attempt: int, max_sleep: float) -> None:
    sleep_s = min(base * (2 ** attempt) + random.random() * base, max_sleep)
    time.sleep(sleep_s)
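With the defaults we pass in later (base 0.5, cap 5.0 seconds), the delays grow roughly like this; the jitter adds up to one extra base interval so that parallel retries don’t all fire at once:
# attempt 0 -> sleeps 0.5-1.0 s
# attempt 1 -> sleeps 1.0-1.5 s
# attempt 2 -> sleeps 2.0-2.5 s
# attempt 3 -> sleeps 4.0-4.5 s
# attempt 4 -> capped at 5.0 s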
The proxy rotator class
The heart of your rotator. It picks a proxy, retries failed requests, and tracks failures. It also applies the chosen strategy – picking proxies randomly or cycling through the list – sets timeouts and headers, and can keep using the same proxy for a given website. It sends requests, marks proxies as healthy or temporarily disables failing ones, and runs health checks.
class ProxyRotator:
    def __init__(
        self,
        proxies: Iterable[Proxy],
        strategy: str = "round_robin",
        max_retries: int = 3,
        timeout: Tuple[float, float] = (10.0, 30.0),
        rotate_on_status: Optional[Iterable[int]] = None,
        user_agents: Optional[List[str]] = None,
        sticky_per_host: bool = True,
        circuit_threshold: int = 2,
        circuit_cooldown: float = 60.0,
        backoff_base: float = 0.5,
        backoff_max: float = 5.0,
        healthcheck_url: Optional[str] = None,
        logger: Optional[logging.Logger] = None,
    ) -> None:
        self.proxies: List[Proxy] = list(proxies)
        if not self.proxies:
            raise ValueError("ProxyRotator requires at least one proxy")
        if strategy not in {"round_robin", "random"}:
            raise ValueError("strategy must be 'round_robin' or 'random'")
        self.strategy = strategy
        self.max_retries = max_retries
        self.timeout = timeout
        self.rotate_on_status = set(rotate_on_status or DEFAULT_ROTATE_ON_STATUS)
        self.user_agents = user_agents or DEFAULT_USER_AGENTS
        self.sticky_per_host = sticky_per_host
        self.circuit_threshold = circuit_threshold
        self.circuit_cooldown = circuit_cooldown
        # Circuit-breaker state: consecutive failures and cool-down deadline per proxy
        self.failure_counts: Dict[Proxy, int] = {p: 0 for p in self.proxies}
        self.disabled_until: Dict[Proxy, float] = {p: 0.0 for p in self.proxies}
        self.backoff_base = backoff_base
        self.backoff_max = backoff_max
        self.healthcheck_url = healthcheck_url
        self._local = threading.local()  # one requests.Session per thread
        self._sticky_map: Dict[str, Proxy] = {}  # host -> proxy for sticky sessions
        self._idx = 0
        self._lock = threading.Lock()
        self.log = logger or logging.getLogger("proxy_rotator")

    def get(self, url: str, **kwargs) -> Response:
        return self.request("GET", url, **kwargs)

    def post(self, url: str, **kwargs) -> Response:
        return self.request("POST", url, **kwargs)

    def request(self, method: str, url: str, **kwargs) -> Response:
        session = getattr(self._local, "session", None)
        if session is None:
            session = requests.Session()
            self._local.session = session
        headers = kwargs.pop("headers", {}) or {}
        headers.setdefault("User-Agent", random.choice(self.user_agents))
        timeout = kwargs.pop("timeout", self.timeout)
        last_exc: Optional[Exception] = None
        last_response: Optional[Response] = None
        for attempt in range(self.max_retries + 1):
            proxy = self._choose_proxy(url)
            if proxy is None:
                raise RuntimeError("No healthy proxies currently available")
            try:
                self.log.debug("Using proxy %s for %s %s", proxy.url, method, url)
                response = session.request(
                    method,
                    url,
                    headers=headers,
                    proxies=proxy.to_requests_proxies(),
                    timeout=timeout,
                    **kwargs,
                )
                last_response = response
                if response.status_code in self.rotate_on_status:
                    # "Bad" status: count the failure, drop stickiness, back off, retry
                    self._mark_failure(proxy, f"HTTP {response.status_code}")
                    self._maybe_unstick_host(url, proxy)
                    if attempt < self.max_retries:
                        _exp_backoff_sleep(self.backoff_base, attempt, self.backoff_max)
                        continue
                else:
                    self._mark_success(proxy)
                    return response
            except requests.RequestException as exc:
                last_exc = exc
                self._mark_failure(proxy, repr(exc))
                self._maybe_unstick_host(url, proxy)
                if attempt < self.max_retries:
                    _exp_backoff_sleep(self.backoff_base, attempt, self.backoff_max)
                    continue
                break
        if last_exc:
            raise last_exc
        assert last_response is not None
        return last_response

    def health_check(self, url: Optional[str] = None, sample: Optional[int] = None) -> Dict[str, bool]:
        check_url = url or self.healthcheck_url or DEFAULT_HEALTHCHECK_URL
        to_test = list(self.proxies)
        if sample is not None:
            to_test = random.sample(to_test, k=min(sample, len(to_test)))
        results: Dict[str, bool] = {}
        for p in to_test:
            try:
                r = requests.get(check_url, proxies=p.to_requests_proxies(), timeout=(5, 10))
                healthy = r.ok
            except Exception:
                healthy = False
            results[p.url] = healthy
            if healthy:
                self._mark_success(p)
            else:
                self._mark_failure(p, "healthcheck")
        return results

    def _choose_proxy(self, url: str) -> Optional[Proxy]:
        host = requests.utils.urlparse(url).hostname or ""
        now = time.time()
        with self._lock:
            # Reuse the sticky proxy for this host if it is still healthy
            if self.sticky_per_host:
                sticky = self._sticky_map.get(host)
                if sticky and self.disabled_until.get(sticky, 0.0) <= now:
                    return sticky
            enabled = [p for p in self.proxies if self.disabled_until.get(p, 0.0) <= now]
            if not enabled:
                return None
            if self.strategy == "round_robin":
                proxy = enabled[self._idx % len(enabled)]
                self._idx = (self._idx + 1) % len(enabled)
            else:
                proxy = random.choice(enabled)
            if self.sticky_per_host:
                self._sticky_map[host] = proxy
            return proxy

    def _mark_failure(self, proxy: Proxy, reason: str) -> None:
        with self._lock:
            cnt = self.failure_counts.get(proxy, 0) + 1
            self.failure_counts[proxy] = cnt
            if cnt >= self.circuit_threshold:
                # Circuit breaker: take the proxy out of rotation for the cool-down period
                self.disabled_until[proxy] = time.time() + self.circuit_cooldown
                self.log.warning("Disabling proxy %s for %.1fs (reason: %s)", proxy.url, self.circuit_cooldown, reason)

    def _mark_success(self, proxy: Proxy) -> None:
        with self._lock:
            self.failure_counts[proxy] = 0
            self.disabled_until[proxy] = 0.0

    def _maybe_unstick_host(self, url: str, proxy: Proxy) -> None:
        if not self.sticky_per_host:
            return
        host = requests.utils.urlparse(url).hostname or ""
        with self._lock:
            if self._sticky_map.get(host) == proxy:
                self._sticky_map.pop(host, None)
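Although the rest of this article wires the class to the command line, you can also use it directly from another script. A minimal sketch, assuming proxy_rotator.py and proxies.txt sit in the same directory:
from proxy_rotator import ProxyRotator, parse_proxy_file

rotator = ProxyRotator(parse_proxy_file("proxies.txt"), strategy="random")
response = rotator.get("https://httpbin.org/ip")  # rotates proxies and User-Agents for you
print(response.status_code, response.text)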
Command-line options
You need this piece to run the rotator from the terminal with a single command.
def build_arg_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="HTTP proxy rotator")
    p.add_argument("--url", required=False, help="Target URL to request (for quick testing)")
    p.add_argument("--method", default="GET", help="HTTP method to use (default: GET)")
    p.add_argument(
        "--proxies",
        help="Path to proxies.txt file (one proxy URL per line). If omitted, reads PROXY_LIST env (comma-separated)",
    )
    p.add_argument("--strategy", choices=["round_robin", "random"], default="round_robin")
    p.add_argument("--retries", type=int, default=3, help="Max retry attempts across proxies")
    p.add_argument(
        "--timeout",
        type=float,
        default=15.0,
        help="Read timeout in seconds (connect timeout is fixed at 10s in this simple CLI)",
    )
    p.add_argument(
        "--rotate-on",
        default=",".join(str(s) for s in sorted(DEFAULT_ROTATE_ON_STATUS)),
        help="Comma-separated status codes that trigger rotation",
    )
    p.add_argument(
        "--no-sticky",
        action="store_true",
        help="Disable sticky-per-host behavior",
    )
    p.add_argument(
        "--healthcheck",
        action="store_true",
        help="Run a healthcheck across proxies and print a summary before the request",
    )
    p.add_argument(
        "--debug",
        action="store_true",
        help="Enable verbose logging",
    )
    return p
def _load_proxies_from_args(args: argparse.Namespace) -> List[Proxy]:
    if args.proxies:
        return parse_proxy_file(args.proxies)
    env = os.getenv("PROXY_LIST", "").strip()
    if env:
        return [Proxy(url=p.strip()) for p in env.split(",") if p.strip()]
    raise SystemExit("No proxies provided. Use --proxies or set PROXY_LIST env.")
def main_cli() -> None:
    parser = build_arg_parser()
    args = parser.parse_args()
    logging.basicConfig(
        level=logging.DEBUG if args.debug else logging.INFO,
        format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
    )
    log = logging.getLogger("proxy_rotator")
    proxies = _load_proxies_from_args(args)
    rotate_on_set = {int(x.strip()) for x in args.rotate_on.split(",") if x.strip()}
    rotator = ProxyRotator(
        proxies=proxies,
        strategy=args.strategy,
        max_retries=args.retries,
        timeout=(10.0, args.timeout),
        rotate_on_status=rotate_on_set,
        user_agents=DEFAULT_USER_AGENTS,
        sticky_per_host=not args.no_sticky,
        circuit_threshold=2,
        circuit_cooldown=60.0,
        backoff_base=0.5,
        backoff_max=5.0,
        healthcheck_url=DEFAULT_HEALTHCHECK_URL,
        logger=log,
    )
    if args.healthcheck:
        results = rotator.health_check()
        total = len(results)
        healthy = sum(1 for ok in results.values() if ok)
        log.info("Healthcheck: %s/%s proxies OK", healthy, total)
        for url, ok in results.items():
            log.info(" %-40s %s", url, "OK" if ok else "BAD")
    if args.url:
        try:
            resp = rotator.request(args.method.upper(), args.url)
            print(f"Status: {resp.status_code}")
            text = resp.text
            preview = text[:500].replace("\n", " ")
            print(f"Body preview (first 500 chars):\n{preview}")
        except Exception as e:
            log.error("Request failed: %s", e)
            raise
    else:
        log.info("No --url provided; nothing else to do. Add --url to test a request.")
Entry point
This is what starts your rotator: it loads proxies, checks them (if you ask for it), and makes requests.
if __name__ == "__main__":
    main_cli()
Running a proxy rotator
You can run your rotator from the command line using the following command:
python proxy_rotator.py --proxies ./proxies.txt --url https://httpbin.org/ip
Change ./proxies.txt to the actual name of your file with proxies and the --url value to the URL of the website you want to scrape or test.
Also, you can add some more flags:
--strategy random – the rotator will select a proxy from the pool at random;
--retries 5 – set a different number of retries if necessary;
--healthcheck – test proxies beforehand;
--no-sticky – disable sticky proxies; use it only if you don’t need to hold on to the same session;
--debug – print detailed logs, which helps with troubleshooting.
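For example, a command combining several of these flags could look like this:
python proxy_rotator.py --proxies ./proxies.txt --url https://httpbin.org/ip --strategy random --retries 5 --healthcheck --debug
And if you prefer not to keep a proxies file around, the script also reads a comma-separated PROXY_LIST environment variable (the proxy URL below is a placeholder):
export PROXY_LIST="http://login:password@host:port"
python proxy_rotator.py --url https://httpbin.org/ip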
Conclusion
Building a proxy rotator doesn’t take long and isn’t hard; however, it helps you get the most out of a proxy pool. A good rotator featuring retry logic, health checks, and pauses between requests, paired with trustworthy proxies, can take your web scraping or multiaccounting to the next level. As for reliable, ethically sourced IPs, DataImpulse is here to help you. With us, you get over 90 million IPs from 195 locations, so you can forget about bans and geo-based limits. Contact us at [email protected] or press the “Try now” button to start.