February 15, 2020
Writing a Proxy Server Rotation Script in Python
Learn how to rotate proxy servers in Python to avoid IP blocking and rate limits when web scraping.
When performing web scraping at scale, you often encounter IP blocking and rate limiting. One effective solution is to rotate through multiple proxy servers to distribute requests across different IP addresses.
Why Use Proxy Rotation?
- Avoid IP blocking: Distribute requests across multiple IPs
- Bypass rate limits: Keep each IP's request volume below per-IP thresholds
- Geographic targeting: Use proxies from specific locations
- Anonymity: Hide your real IP address
Basic Proxy Rotation Script
Here’s a simple Python script to rotate through a list of proxies:
import requests
from itertools import cycle

# List of proxy servers (note: the proxy URL scheme is usually http://
# even for the 'https' key; the key names the target scheme, not the proxy's)
proxies = [
    {'http': 'http://proxy1:port', 'https': 'http://proxy1:port'},
    {'http': 'http://proxy2:port', 'https': 'http://proxy2:port'},
    {'http': 'http://proxy3:port', 'https': 'http://proxy3:port'},
]

# Create a cycle iterator for round-robin rotation
proxy_pool = cycle(proxies)

def get_session():
    """Get a requests session configured with the next proxy in the rotation."""
    session = requests.Session()
    proxy = next(proxy_pool)
    session.proxies.update(proxy)
    return session

# Usage
for i in range(10):
    session = get_session()
    try:
        response = session.get('https://example.com', timeout=10)
        print(f"Request {i+1} successful with proxy {session.proxies['http']}")
    except Exception as e:
        print(f"Request {i+1} failed: {e}")
Advanced Proxy Rotation with Error Handling
import requests
import time
from itertools import cycle

class ProxyRotator:
    def __init__(self, proxies):
        self.proxies = proxies
        self.proxy_pool = cycle(proxies)
        self.failed_proxies = []

    def get_proxy(self):
        """Get the next proxy from the pool, skipping any marked as failed."""
        for _ in range(len(self.proxies)):
            proxy = next(self.proxy_pool)
            if proxy not in self.failed_proxies:
                return proxy
        raise Exception("All proxies have been marked as failed")

    def mark_failed(self, proxy):
        """Mark a proxy as failed so it is skipped on future rotations."""
        if proxy not in self.failed_proxies:
            self.failed_proxies.append(proxy)

    def make_request(self, url, max_retries=3):
        """Make a request with proxy rotation and retry logic."""
        for attempt in range(max_retries):
            proxy = self.get_proxy()
            try:
                response = requests.get(
                    url,
                    proxies=proxy,
                    timeout=10,
                    headers={'User-Agent': 'Mozilla/5.0'}
                )
                if response.status_code == 200:
                    return response
            except Exception as e:
                print(f"Proxy {proxy} failed: {e}")
                self.mark_failed(proxy)
            time.sleep(1)  # brief pause before retrying with the next proxy
        raise Exception(f"Request failed after {max_retries} attempts")

# Usage
proxies = [
    {'http': 'http://proxy1:port', 'https': 'http://proxy1:port'},
    {'http': 'http://proxy2:port', 'https': 'http://proxy2:port'},
]

rotator = ProxyRotator(proxies)
response = rotator.make_request('https://example.com')
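Because failed proxies are skipped rather than removed, the pool never shrinks out from under the cycle iterator. For long-running scrapers, consider periodically retesting failed proxies and returning them to the rotation, since proxies often recover.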
Random Proxy Selection
Instead of round-robin, you can randomly select proxies:
import requests
import random

proxies = [
    {'http': 'http://proxy1:port', 'https': 'http://proxy1:port'},
    {'http': 'http://proxy2:port', 'https': 'http://proxy2:port'},
]

def get_random_proxy():
    """Pick a proxy at random instead of cycling in order."""
    return random.choice(proxies)

# Usage
proxy = get_random_proxy()
response = requests.get('https://example.com', proxies=proxy, timeout=10)
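Random selection makes your request pattern harder to fingerprint than strict round-robin, at the cost of occasionally reusing the same proxy for consecutive requests.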
Proxy Authentication
If your proxies require authentication:
import requests

# Embed credentials in the proxy URL (again, http:// scheme for both keys)
proxy = {
    'http': 'http://username:password@proxy:port',
    'https': 'http://username:password@proxy:port'
}

response = requests.get('https://example.com', proxies=proxy, timeout=10)
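If the username or password contains reserved characters such as @ or :, percent-encode them before embedding them in the proxy URL. A minimal sketch using the standard library (the credentials below are placeholders):

from urllib.parse import quote

# Placeholder credentials; quote() percent-encodes reserved characters
username = quote('user@example.com', safe='')
password = quote('p@ss:word', safe='')

proxy_url = f'http://{username}:{password}@proxy:port'
proxy = {'http': proxy_url, 'https': proxy_url}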
Best Practices
- Test proxies before use: Verify proxies are working (see the sketch after this list)
- Handle failures gracefully: Remove dead proxies from rotation
- Add delays: Don’t overwhelm target servers
- Use user agents: Rotate user agents along with proxies
- Monitor success rates: Track which proxies work best
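To make the first and fourth points concrete, here is a minimal sketch, assuming httpbin.org/ip as a test endpoint (any URL that reliably returns 200 works) and a hypothetical USER_AGENTS pool:

import random
import requests

proxies = [
    {'http': 'http://proxy1:port', 'https': 'http://proxy1:port'},
    {'http': 'http://proxy2:port', 'https': 'http://proxy2:port'},
]

# Hypothetical pool of user agents to rotate alongside proxies
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
]

def check_proxy(proxy, test_url='https://httpbin.org/ip', timeout=5):
    """Return True if the proxy can complete a simple request."""
    try:
        response = requests.get(test_url, proxies=proxy, timeout=timeout)
        return response.status_code == 200
    except requests.RequestException:
        return False

# Keep only proxies that currently work
working_proxies = [p for p in proxies if check_proxy(p)]
print(f"{len(working_proxies)} of {len(proxies)} proxies are usable")

# Rotate the user agent along with the proxy on each request
headers = {'User-Agent': random.choice(USER_AGENTS)}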
Conclusion
This proxy rotation approach is essential for large-scale web scraping projects where you need to avoid detection and rate limiting.