Using Proxies in Ruby: Net::HTTP, Mechanize & GProxy

Net::HTTP and Mechanize both let a Ruby application route requests through an intermediary proxy server. Net::HTTP accepts the proxy host, port, and credentials as arguments when a connection object is created or started, and Mechanize accepts them through its set_proxy method. Typical use cases include IP rotation, geo-targeting, and circumventing rate limits.

Net::HTTP Proxy Configuration

Net::HTTP is Ruby's standard library for making HTTP requests. It provides direct control over connection parameters, including proxy settings.

Direct Proxy Parameters

To configure a proxy for a Net::HTTP request, specify the proxy host, port, username, and password when creating the Net::HTTP object or when starting the connection.

require 'net/http'
require 'uri'

# Proxy details
proxy_host = 'your_proxy_host.com'
proxy_port = 8080
proxy_user = 'proxy_username'
proxy_pass = 'proxy_password'

# Target URL
uri = URI('http://example.com/data')

# Method 1: Specify proxy parameters in Net::HTTP.new
http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port, proxy_user, proxy_pass)
http.use_ssl = (uri.scheme == 'https') # Required for HTTPS

request = Net::HTTP::Get.new(uri.request_uri)

begin
  response = http.request(request)
  puts "Method 1 Response Status: #{response.code}"
  # puts response.body
rescue Net::HTTPClientException => e
  puts "HTTP Error: #{e.message}"
rescue Net::ReadTimeout, Net::OpenTimeout => e
  puts "Timeout Error: #{e.message}"
rescue StandardError => e
  puts "An error occurred: #{e.message}"
ensure
  http.finish if http.started? # Ensure connection is closed
end

# Method 2: Specify proxy parameters in Net::HTTP.start block
uri_https = URI('https://api.ipify.org?format=json') # A simple endpoint to check IP

Net::HTTP.start(uri_https.host, uri_https.port,
                proxy_host, proxy_port, proxy_user, proxy_pass,
                use_ssl: uri_https.scheme == 'https') do |http_with_proxy|

  request_https = Net::HTTP::Get.new(uri_https.request_uri)
  response_https = http_with_proxy.request(request_https)
  puts "Method 2 Response Status: #{response_https.code}"
  puts "External IP via proxy: #{response_https.body}" # Should show proxy's IP
rescue Net::HTTPClientException => e
  puts "HTTP Error: #{e.message}"
rescue Net::ReadTimeout, Net::OpenTimeout => e
  puts "Timeout Error: #{e.message}"
rescue StandardError => e
  puts "An error occurred: #{e.message}"
end
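
Proxy details are often supplied as a single URL rather than as separate fields. The following is a minimal sketch, not part of Net::HTTP itself, of splitting such a URL into the positional arguments used above. The helper name proxy_args_from_url and the proxy URL are assumptions made for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: splits a proxy URL into the [host, port, user, password]
# values that Net::HTTP.new and Net::HTTP.start accept after the target host
# and port. Percent-encoded characters in the credentials are decoded; a
# literal '+' would have to be written as %2B.
def proxy_args_from_url(proxy_url)
  proxy = URI.parse(proxy_url)
  user = proxy.user && URI.decode_www_form_component(proxy.user)
  pass = proxy.password && URI.decode_www_form_component(proxy.password)
  [proxy.host, proxy.port, user, pass]
end

# Placeholder proxy URL; the password "p@ss" is written percent-encoded
args = proxy_args_from_url('http://proxy_username:p%40ss@proxy.example.com:8080')
http = Net::HTTP.new('example.com', 80, *args)

# No request is sent here; the accessors only show what was configured
puts http.proxy_address
puts http.proxy_port
puts http.proxy_user
```

Keeping the URL-to-arguments step in one place means the same proxy string can feed both Net::HTTP.new and Net::HTTP.start without repeating the parsing.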

Environment Variables

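Net::HTTP can pick up proxy settings from environment variables when no proxy arguments are passed. This approach suits system-wide or application-wide proxy configuration without code changes.

http_proxy: The variable that Net::HTTP's documentation describes for environment-based proxying (e.g., http://user:pass@proxy.example.com:8080).
https_proxy: Read by many HTTP clients for HTTPS requests (e.g., http://user:pass@proxy.example.com:8080). Net::HTTP's documentation covers only http_proxy and no_proxy, so set http_proxy as well when relying on Net::HTTP.
no_proxy: A comma-separated list of hostnames that should bypass the proxy.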

require 'net/http'
require 'uri'

# Set environment variables (example, typically done outside the script)
# ENV['http_proxy'] = 'http://proxy_username:proxy_password@your_proxy_host.com:8080'
# ENV['https_proxy'] = 'http://proxy_username:proxy_password@your_proxy_host.com:8080' # Note: https_proxy can also be an http proxy
# ENV['no_proxy'] = 'localhost,127.0.0.1'

uri = URI('http://example.com/data')
http = Net::HTTP.new(uri.host, uri.port) # No explicit proxy parameters
request = Net::HTTP::Get.new(uri.request_uri)

# If http_proxy is set (and the host is not listed in no_proxy), Net::HTTP uses it for this request.
begin
  response = http.request(request)
  puts "Env Var Response Status: #{response.code}"
rescue StandardError => e
  puts "An error occurred: #{e.message}"
end

# Clear environment variables after use if set programmatically
# ENV.delete('http_proxy')
# ENV.delete('https_proxy')
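
Because environment detection is the default, it helps to be able to check it and to opt out of it. The sketch below assumes a placeholder proxy URL and sends no request. It only inspects the connection objects through Net::HTTP's proxy_from_env? and proxy? accessors.

```ruby
require 'net/http'

# Placeholder proxy URL, set programmatically only for this demonstration
ENV['http_proxy'] = 'http://proxy.example.com:8080'

# Default: the proxy argument is :ENV, so the environment is consulted
from_env = Net::HTTP.new('example.com', 80)
puts "reads environment: #{from_env.proxy_from_env?}"

# Passing nil as the proxy address opts this connection out of any proxy,
# even though http_proxy is set
direct = Net::HTTP.new('example.com', 80, nil)
puts "reads environment: #{direct.proxy_from_env?}"
puts "uses a proxy: #{direct.proxy?}"

ENV.delete('http_proxy')
```

Passing nil explicitly is useful for health checks or internal endpoints that must never go through the proxy configured for the rest of the process.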

Mechanize Proxy Configuration

Mechanize is a Ruby gem that simplifies web scraping by emulating a web browser. It builds upon Net::HTTP and offers a higher-level API for handling proxies.

Direct Proxy Parameters

Mechanize sets proxy details through the set_proxy method, which can be called inside the block passed to Mechanize.new or on the agent at any later point.

require 'mechanize'

# Proxy details
proxy_host = 'your_proxy_host.com'
proxy_port = 8080
proxy_user = 'proxy_username'
proxy_pass = 'proxy_password'

# Method 1: Initialize Mechanize with proxy parameters
agent = Mechanize.new do |a|
  a.set_proxy(proxy_host, proxy_port, proxy_user, proxy_pass)
  a.user_agent_alias = 'Mac Safari' # Recommended for web scraping
end

begin
  page = agent.get('http://example.com/')
  puts "Method 1 Mechanize Page Title: #{page.title}"
rescue Mechanize::ResponseCodeError => e
  puts "Mechanize HTTP Error: #{e.response_code} - #{e.page.uri}"
rescue Mechanize::Error => e
  puts "Mechanize Error: #{e.message}"
rescue StandardError => e
  puts "An error occurred: #{e.message}"
end

# Method 2: Create the agent first, then call set_proxy on it (simpler for one-off use)
# Mechanize.new does not take proxy keyword options; its only positional
# argument is a connection name.
agent_direct = Mechanize.new
agent_direct.set_proxy(proxy_host, proxy_port, proxy_user, proxy_pass)
agent_direct.user_agent_alias = 'Linux Firefox'

begin
  page_direct = agent_direct.get('https://api.ipify.org?format=json')
  puts "Method 2 Mechanize External IP via proxy: #{page_direct.body}"
rescue Mechanize::ResponseCodeError => e
  puts "Mechanize HTTP Error: #{e.response_code} - #{e.page.uri}"
rescue Mechanize::Error => e
  puts "Mechanize Error: #{e.message}"
rescue StandardError => e
  puts "An error occurred: #{e.message}"
end

Proxy Rotation with Mechanize

For scenarios requiring frequent IP changes, such as large-scale data collection, proxy rotation is essential. Mechanize's set_proxy method can be called multiple times to change the proxy during an agent's lifecycle.

require 'mechanize'

proxies = [
  { host: 'proxy1.example.com', port: 8080, user: 'user1', pass: 'pass1' },
  { host: 'proxy2.example.com', port: 8080, user: 'user2', pass: 'pass2' },
  # ... more proxies
]

agent = Mechanize.new
agent.user_agent_alias = 'Windows Chrome'

proxies.each_with_index do |p, i|
  puts "Using proxy #{i+1}: #{p[:host]}"
  agent.set_proxy(p[:host], p[:port], p[:user], p[:pass])
  begin
    page = agent.get('https://api.ipify.org?format=json')
    puts "Current IP: #{page.body.strip}"
    sleep(2) # Pause to avoid overwhelming the target
  rescue Mechanize::ResponseCodeError => e
    puts "Error with proxy #{p[:host]}: #{e.response_code}"
  rescue Mechanize::Error, StandardError => e
    puts "Connection error with proxy #{p[:host]}: #{e.message}"
  end
end
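
The loop above visits each proxy once. For longer jobs it is common to cycle through the list repeatedly and to stop handing out proxies that keep failing. The following is a minimal sketch of that idea. ProxyPool and its method names are hypothetical and not part of Mechanize.

```ruby
# Hypothetical helper: hands out proxies round-robin and skips any proxy
# that has failed max_failures times in a row.
class ProxyPool
  def initialize(proxies, max_failures: 3)
    @proxies = proxies
    @max_failures = max_failures
    @failures = Hash.new(0)
    @index = 0
  end

  # Returns the next usable proxy hash, or nil when every proxy is exhausted.
  def next_proxy
    @proxies.size.times do
      proxy = @proxies[@index % @proxies.size]
      @index += 1
      return proxy if @failures[key(proxy)] < @max_failures
    end
    nil
  end

  def report_failure(proxy)
    @failures[key(proxy)] += 1
  end

  # A success resets the consecutive-failure count for that proxy.
  def report_success(proxy)
    @failures[key(proxy)] = 0
  end

  private

  def key(proxy)
    proxy.values_at(:host, :port)
  end
end

pool = ProxyPool.new([
  { host: 'proxy1.example.com', port: 8080 },
  { host: 'proxy2.example.com', port: 8080 }
], max_failures: 1)

first = pool.next_proxy
pool.report_failure(first) # proxy1 has reached the failure limit
puts pool.next_proxy[:host]
puts pool.next_proxy[:host]
```

With Mechanize, the hash returned by next_proxy would be passed to agent.set_proxy before each request, and report_failure or report_success called from the rescue and success paths respectively.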

Comparison: Net::HTTP vs. Mechanize for Proxies

Feature	Net::HTTP	Mechanize
Level of Abstraction	Low-level HTTP client with explicit control over connections and headers.	High-level web scraping library. Emulates browser.
Proxy Configuration	Arguments to `Net::HTTP.new` or `Net::HTTP.start`, or the `http_proxy` environment variable.	`set_proxy` method, called on the agent or inside the `Mechanize.new` block.
Ease of Use	More verbose for complex tasks.	Simpler for navigating websites, form submission.
Automatic Features	None beyond basic HTTP.	Cookie handling, redirects, form parsing, user-agent management. Does not execute JavaScript.
Error Handling	`Net::HTTP` exceptions (e.g., `Net::OpenTimeout`).	`Mechanize::Error`, `Mechanize::ResponseCodeError`.
Best for	Simple API calls, specific HTTP/S requests.	Web scraping, form submission, and multi-page navigation on sites that work without JavaScript.
Dependency	Standard library.	Gem (`mechanize`).

Proxy Types and Limitations

Net::HTTP (and by extension Mechanize) primarily supports HTTP and HTTPS proxy types. These proxies forward HTTP/HTTPS requests.

HTTP Proxies: Used for unencrypted HTTP traffic.
HTTPS Proxies (CONNECT): Used for encrypted HTTPS traffic. Net::HTTP establishes a CONNECT tunnel through the proxy to the target host.
SOCKS Proxies: Net::HTTP does not natively support SOCKS proxies (SOCKS4, SOCKS5). To use one from Ruby, an external gem such as socksify is required. It patches TCPSocket so that connections opened by Net::HTTP are routed through the SOCKS server.

# Sketch of SOCKS5 proxy usage with the socksify gem
# gem install socksify
# The TCPSocket.socks_* setters below are socksify's process-wide settings;
# confirm the names against the version of the gem you install.
require 'net/http'
require 'socksify'
require 'uri'

socks_proxy_host = 'your_socks_proxy.com'
socks_proxy_port = 1080
socks_proxy_user = 'socks_user'
socks_proxy_pass = 'socks_pass'

uri = URI('https://api.ipify.org?format=json')

# Route every TCPSocket opened by this process through the SOCKS server
TCPSocket.socks_server = socks_proxy_host
TCPSocket.socks_port = socks_proxy_port
TCPSocket.socks_username = socks_proxy_user
TCPSocket.socks_password = socks_proxy_pass

# Pass nil as the proxy address so no HTTP proxy from the environment is layered on top
http = Net::HTTP.new(uri.host, uri.port, nil)
http.use_ssl = true # Essential for HTTPS
http.verify_mode = OpenSSL::SSL::VERIFY_PEER # Recommended for security

request = Net::HTTP::Get.new(uri.request_uri)

begin
  response = http.request(request)
  puts "SOCKS Proxy Response Status: #{response.code}"
  puts "External IP via SOCKS proxy: #{response.body}"
rescue StandardError => e
  puts "SOCKS Proxy Error: #{e.message}"
ensure
  # Clean up: later connections go direct again
  TCPSocket.socks_server = nil
  TCPSocket.socks_port = nil
end

Error Handling and Timeouts

When using proxies, network issues, proxy misconfigurations, or proxy server failures are common. Robust error handling is crucial.

Common Errors

Net::OpenTimeout: The connection to the proxy or target server timed out before being established.
Net::ReadTimeout: The server (proxy or target) did not send data within the specified timeout period.
Errno::ECONNREFUSED: The proxy server actively refused the connection.
Net::HTTPBadResponse: The proxy server or target returned an invalid HTTP response.
Mechanize::ResponseCodeError: Mechanize-specific error when the target server returns a non-2xx HTTP status code (e.g., 403 Forbidden, 404 Not Found, 500 Internal Server Error).

Timeout Configuration

Setting appropriate timeouts prevents scripts from hanging indefinitely.

require 'net/http'
require 'uri'

uri = URI('http://slow-api.example.com')
proxy_host = 'your_proxy_host.com'
proxy_port = 8080

http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
http.open_timeout = 5 # Timeout for establishing the connection (seconds)
http.read_timeout = 10 # Timeout for reading data from the connection (seconds)
http.use_ssl = (uri.scheme == 'https')

request = Net::HTTP::Get.new(uri.request_uri)

begin
  response = http.request(request)
  puts "Response Status: #{response.code}"
rescue Net::OpenTimeout
  puts "Connection to proxy or target timed out."
rescue Net::ReadTimeout
  puts "Reading data from proxy or target timed out."
rescue Errno::ECONNREFUSED
  puts "Connection refused by proxy or target server."
rescue StandardError => e
  puts "An unexpected error occurred: #{e.message}"
end

# Mechanize also supports timeouts
# agent = Mechanize.new
# agent.set_proxy(proxy_host, proxy_port)
# agent.open_timeout = 5
# agent.read_timeout = 10

Best Practices

User-Agent Strings: When scraping, always set a realistic User-Agent header. Many websites block requests without one or with generic ones. Mechanize's user_agent_alias simplifies this.
Referer Headers: For some sites, a valid Referer header is necessary to mimic legitimate browser behavior.
Cookie Management: Mechanize handles cookies automatically, which is crucial for maintaining sessions through a proxy. With Net::HTTP, manual cookie management is required.
Error Handling and Retries: Implement retry logic with exponential backoff for transient network errors or proxy issues.
Proxy Health Checks: Before using a proxy, consider a quick check against a known endpoint (e.g., https://api.ipify.org) to verify its functionality and IP address.
Resource Management: Ensure Net::HTTP connections are properly closed, especially when not using the start block, to prevent resource leaks. Mechanize manages connections internally.
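
The retry advice above can be sketched as follows. The helper name with_retries, its parameters, and the list of errors treated as transient are assumptions for illustration. The failure is simulated so the example runs without a network or a real proxy.

```ruby
require 'net/http'

# Hypothetical helper: runs the block and retries transient failures with
# exponentially growing delays (base_delay, then 2x, 4x, ...). The sleeper
# argument exists so the waiting can be replaced, for example in tests.
def with_retries(max_attempts: 4, base_delay: 0.5,
                 retry_on: [Net::OpenTimeout, Net::ReadTimeout,
                            Errno::ECONNREFUSED, Errno::ECONNRESET],
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *retry_on
    raise if attempt >= max_attempts # Out of attempts: let the caller see the error
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Simulated request that times out twice before succeeding
delays = []
result = with_retries(sleeper: ->(seconds) { delays << seconds }) do |attempt|
  raise Net::OpenTimeout, 'simulated timeout' if attempt < 3
  "succeeded on attempt #{attempt}"
end

puts result
puts "delays used: #{delays.inspect}"
```

In real use the block would contain the http.request or agent.get call, and switching to a different proxy between attempts is a natural extension when the failure points at the proxy rather than the target.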
