A proxy checker in Python verifies the functionality, speed, anonymity, and geographical location of a proxy server by making HTTP requests through it to a known target URL and analyzing the response.
## Fundamentals of Proxy Checking
The core of a proxy checker involves sending an HTTP request through a specified proxy and evaluating the outcome. This process typically uses the requests library, which simplifies HTTP interactions in Python.
### Basic HTTP Proxy Check
To check a basic HTTP or HTTPS proxy, pass a `proxies` dictionary to `requests.get()`. A reliable target URL, such as `httpbin.org/ip` or `ipinfo.io/json`, is essential: these services return the public IP address from which the request originated, allowing verification.
```python
import requests
import time

def check_http_proxy(proxy_address: str, timeout: float = 10.0) -> dict:
    """
    Checks an HTTP/HTTPS proxy for basic connectivity and response time.

    Args:
        proxy_address: The proxy string (e.g., "http://user:pass@ip:port").
        timeout: Maximum time in seconds to wait for a response.

    Returns:
        A dictionary with proxy status, response time, and error message if any.
    """
    proxies = {
        "http": proxy_address,
        "https": proxy_address,
    }
    target_url = "http://httpbin.org/ip"  # Use http for httpbin, or https for ipinfo
    start_time = time.time()
    try:
        response = requests.get(target_url, proxies=proxies, timeout=timeout)
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
        end_time = time.time()
        latency = round((end_time - start_time) * 1000, 2)  # Latency in ms
        origin_ip = response.json().get('origin')
        return {
            "proxy": proxy_address,
            "status": "Alive",
            "latency_ms": latency,
            "origin_ip": origin_ip,
            "error": None
        }
    except requests.exceptions.Timeout:
        return {"proxy": proxy_address, "status": "Timeout", "latency_ms": None, "origin_ip": None, "error": "Request timed out"}
    except requests.exceptions.ConnectionError:
        return {"proxy": proxy_address, "status": "Dead", "latency_ms": None, "origin_ip": None, "error": "Connection failed"}
    except requests.exceptions.HTTPError as e:
        return {"proxy": proxy_address, "status": "Error", "latency_ms": None, "origin_ip": None, "error": f"HTTP Error: {e}"}
    except Exception as e:
        return {"proxy": proxy_address, "status": "Error", "latency_ms": None, "origin_ip": None, "error": f"An unexpected error occurred: {e}"}

# Example usage:
# print(check_http_proxy("http://203.0.113.45:8080"))
# print(check_http_proxy("https://198.51.100.22:443"))
```
### Handling Different Proxy Types
The `requests` library natively supports HTTP and HTTPS proxies. For SOCKS proxies (SOCKS4 and SOCKS5), an additional dependency is required; install it with `pip install "requests[socks]"` (the quotes prevent shell globbing of the brackets). The proxy URL scheme then changes to `socks5://` or `socks4://`.
| Proxy Type | Scheme for `requests` | Description |
|---|---|---|
| HTTP | `http://` | Standard HTTP proxy. |
| HTTPS | `https://` | Standard HTTPS proxy. |
| SOCKS4 | `socks4://` | SOCKS protocol version 4. |
| SOCKS5 | `socks5://` | SOCKS protocol version 5; supports UDP and authentication. |
```python
import requests
import time

# Example for a SOCKS5 proxy
def check_socks_proxy(proxy_address: str, timeout: float = 10.0) -> dict:
    """
    Checks a SOCKS proxy. Requires 'requests[socks]' to be installed.
    """
    proxies = {
        "http": f"socks5://{proxy_address}",
        "https": f"socks5://{proxy_address}",
    }
    target_url = "http://httpbin.org/ip"
    start_time = time.time()
    try:
        response = requests.get(target_url, proxies=proxies, timeout=timeout)
        response.raise_for_status()
        end_time = time.time()
        latency = round((end_time - start_time) * 1000, 2)
        origin_ip = response.json().get('origin')
        return {
            "proxy": proxy_address,
            "status": "Alive",
            "latency_ms": latency,
            "origin_ip": origin_ip,
            "error": None
        }
    except requests.exceptions.Timeout:
        return {"proxy": proxy_address, "status": "Timeout", "latency_ms": None, "origin_ip": None, "error": "Request timed out"}
    except requests.exceptions.ConnectionError:
        return {"proxy": proxy_address, "status": "Dead", "latency_ms": None, "origin_ip": None, "error": "Connection failed"}
    except Exception as e:
        return {"proxy": proxy_address, "status": "Error", "latency_ms": None, "origin_ip": None, "error": f"SOCKS error: {e}"}

# Example usage:
# print(check_socks_proxy("user:pass@192.0.2.10:1080"))
```
## Assessing Proxy Quality
Beyond basic connectivity, a proxy's quality is defined by its speed, anonymity, and geographical location.
### Speed and Latency
Speed is assessed by measuring the time elapsed between sending the request and receiving the complete response. Capturing timestamps before and after the call with `time.time()` and taking the difference yields the round-trip latency, typically reported in milliseconds.
```python
# The latency calculation is already included in check_http_proxy and check_socks_proxy:
# latency = round((end_time - start_time) * 1000, 2)  # Latency in ms
```
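As a side note, `time.time()` follows the system clock and can jump if the clock is adjusted mid-measurement; `time.perf_counter()` is monotonic and better suited to interval timing. A minimal helper (a sketch only; `measure_ms` is not part of the functions above) could wrap any call:

```python
import time

def measure_ms(func, *args, **kwargs):
    """Call func and return (result, elapsed milliseconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = round((time.perf_counter() - start) * 1000, 2)
    return result, elapsed_ms
```

It could then be used as `response, latency = measure_ms(requests.get, target_url, proxies=proxies, timeout=timeout)`.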
### Anonymity Levels
Proxy anonymity refers to how effectively a proxy conceals the client's original IP address and identity. This is determined by analyzing specific HTTP headers that the proxy might add or modify. Services like httpbin.org/headers or ipinfo.io/json can be used to inspect these headers.
- Elite/High Anonymity: The proxy hides the client's IP address and does not send identifying headers such as `X-Forwarded-For`, `Via`, or `Proxy-Connection`. The target server sees only the proxy's IP.
- Anonymous: The proxy hides the client's IP address but may send headers like `Via` or `X-Forwarded-For` containing the proxy's IP, indicating that a proxy is in use. The client's original IP is not directly exposed.
- Transparent: The proxy forwards the client's IP address in headers like `X-Forwarded-For`. The target server knows the request came through a proxy and can identify the original client's IP.
To check anonymity, send a request to a URL that echoes back the request headers, then inspect the response. You also need to know your own public IP address (without a proxy) for comparison.
```python
import requests

def get_my_ip():
    """Returns the client's public IP address without a proxy."""
    try:
        response = requests.get("http://httpbin.org/ip", timeout=5)
        response.raise_for_status()
        return response.json().get('origin')
    except requests.exceptions.RequestException:
        return None

def check_anonymity(proxy_address: str, my_ip: str, timeout: float = 10.0) -> str:
    """
    Checks the anonymity level of a proxy.
    """
    proxies = {
        "http": proxy_address,
        "https": proxy_address,
    }
    target_url = "http://httpbin.org/headers"  # Echoes back all request headers
    try:
        response = requests.get(target_url, proxies=proxies, timeout=timeout)
        response.raise_for_status()
        headers = response.json().get('headers', {})
        # X-Forwarded-For may carry a comma-separated chain of IPs, so look
        # for the client's IP anywhere in the header value.
        if my_ip and my_ip in headers.get('X-Forwarded-For', ''):
            return "Transparent"
        if 'Via' in headers or 'X-Forwarded-For' in headers:
            return "Anonymous"
        # No identifying headers present: classify as Elite. A stricter check
        # would also fetch httpbin.org/ip through the proxy and confirm the
        # reported origin differs from my_ip.
        return "Elite"
    except requests.exceptions.RequestException:
        return "Unknown (Proxy Dead or Error)"

# Example usage:
# my_public_ip = get_my_ip()
# if my_public_ip:
#     print(f"My IP: {my_public_ip}")
#     print(check_anonymity("http://203.0.113.45:8080", my_public_ip))
```
| Anonymity Level | `X-Forwarded-For` | `Via` | `Proxy-Connection` |
|---|---|---|---|
| Elite | Not present | Not present | Not present |
| Anonymous | May be present (proxy's IP) | May be present | May be present |
| Transparent | Present (client's IP) | May be present | May be present |
### Geolocation
Geolocation identifies the physical location (country, city, region) of the proxy server. Services like ipinfo.io/json provide this information in a JSON response based on the request's origin IP.
```python
import requests

def get_proxy_geolocation(proxy_address: str, timeout: float = 10.0) -> dict:
    """
    Retrieves geolocation data for a proxy.
    """
    proxies = {
        "http": proxy_address,
        "https": proxy_address,
    }
    target_url = "https://ipinfo.io/json"  # Returns geolocation for the request's origin IP
    try:
        response = requests.get(target_url, proxies=proxies, timeout=timeout)
        response.raise_for_status()
        data = response.json()
        return {
            "country": data.get('country'),
            "region": data.get('region'),
            "city": data.get('city'),
            "org": data.get('org'),
            "timezone": data.get('timezone')
        }
    except requests.exceptions.RequestException:
        return {"country": None, "region": None, "city": None, "org": None, "timezone": None}

# Example usage:
# geo_data = get_proxy_geolocation("http://203.0.113.45:8080")
# print(geo_data)
```
## Building a Concurrent Checker
Checking proxies sequentially is inefficient for large lists. Concurrency, using multithreading, allows checking multiple proxies simultaneously. Python's concurrent.futures.ThreadPoolExecutor simplifies this.
```python
import concurrent.futures
import csv

# Assumes check_http_proxy, check_anonymity, get_my_ip, and
# get_proxy_geolocation are defined as above.

def comprehensive_proxy_check(proxy_address: str, my_ip: str, timeout: float = 10.0) -> dict:
    """
    Performs a comprehensive check for a single proxy.
    """
    result = check_http_proxy(proxy_address, timeout)  # Basic connectivity and latency
    if result["status"] == "Alive":
        # Only run anonymity and geolocation checks if the proxy is alive
        anonymity = check_anonymity(proxy_address, my_ip, timeout)
        geolocation = get_proxy_geolocation(proxy_address, timeout)
        result.update({"anonymity": anonymity})
        result.update(geolocation)
    else:
        result.update({"anonymity": None, "country": None, "region": None,
                       "city": None, "org": None, "timezone": None})
    return result

def run_concurrent_checker(proxy_list_file: str, output_file: str, max_workers: int = 10, timeout: float = 10.0):
    """
    Reads proxies from a file, checks them concurrently, and writes results to CSV.
    """
    proxies_to_check = []
    try:
        with open(proxy_list_file, 'r') as f:
            for line in f:
                proxy = line.strip()
                if proxy:
                    proxies_to_check.append(proxy)
    except FileNotFoundError:
        print(f"Error: Proxy list file '{proxy_list_file}' not found.")
        return

    my_ip = get_my_ip()
    if not my_ip:
        print("Could not determine client's public IP. Anonymity checks might be inaccurate.")

    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_proxy = {executor.submit(comprehensive_proxy_check, proxy, my_ip, timeout): proxy
                           for proxy in proxies_to_check}
        for future in concurrent.futures.as_completed(future_to_proxy):
            proxy_address = future_to_proxy[future]
            try:
                result = future.result()
                results.append(result)
                print(f"Checked {proxy_address}: Status={result['status']}, "
                      f"Latency={result['latency_ms']}ms, Anonymity={result['anonymity']}")
            except Exception as exc:
                print(f"Proxy {proxy_address} generated an exception: {exc}")
                results.append({"proxy": proxy_address, "status": "Error", "error": str(exc)})

    # Write results to CSV. Rows can have different keys (error rows are shorter),
    # so build fieldnames from the union of all keys and fill gaps with ''.
    if results:
        fieldnames = []
        for row in results:
            for key in row:
                if key not in fieldnames:
                    fieldnames.append(key)
        with open(output_file, 'w', newline='') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames, restval='')
            writer.writeheader()
            writer.writerows(results)
        print(f"Results written to {output_file}")
    else:
        print("No proxies checked or no results to write.")

# Example usage:
# Create a file named 'proxies.txt' with one proxy per line, e.g.:
#   http://user:pass@192.0.2.1:8080
#   socks5://user:pass@198.51.100.2:1080
#   http://203.0.113.3:80
# run_concurrent_checker('proxies.txt', 'proxy_results.csv', max_workers=20, timeout=15)
```
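If JSON output is preferred over CSV, the write step can be swapped for something like the following sketch (`write_results_json` is a hypothetical helper, not part of the checker above):

```python
import json

def write_results_json(results: list, output_file: str) -> None:
    """Serialize the list of per-proxy result dicts to a JSON file."""
    with open(output_file, "w") as f:
        json.dump(results, f, indent=2)
```

Unlike CSV, JSON handles rows with differing keys without any fieldname bookkeeping.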
## Robust Error Handling and Best Practices
- Specific exceptions: Catch specific `requests.exceptions` classes (e.g., `Timeout`, `ConnectionError`, `HTTPError`) to provide granular feedback. A general `Exception` catch should be a last resort.
- User-Agent: Include a `User-Agent` header to mimic a browser and avoid being blocked by target services, e.g. `headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}`.
- Target URL selection: Choose reliable, fast, publicly available services that provide the necessary information (IP, headers, geolocation). Avoid hammering a single service to prevent IP bans.
- Input/output: Read proxy lists from text files (one proxy per line) and write results to structured formats like CSV or JSON for easy analysis and integration.
- Proxy authentication: Ensure proxy URLs are correctly formatted for authentication (e.g., `http://user:pass@ip:port`).
- Retries: For production-grade checkers, consider implementing retry logic for transient errors. The `requests` library can be extended with `requests.adapters.HTTPAdapter` and `urllib3.util.retry.Retry` for this purpose.
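The retry hint can be sketched as follows. This assumes urllib3 ≥ 1.26 (older versions name the `allowed_methods` parameter `method_whitelist`), and `make_retrying_session` is an illustrative name, not a library function:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_retrying_session(retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Build a Session that retries transient failures with exponential backoff."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,                 # waits 0.5s, 1s, 2s, ... between attempts
        status_forcelist=[500, 502, 503, 504],  # retry on these server errors
        allowed_methods=["GET", "HEAD"],        # only retry idempotent methods
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

Proxy checks would then call `session.get(...)` in place of `requests.get(...)`, keeping the same `proxies` and `timeout` arguments.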