Set Up Proxies with AIOHTTP
Setting up proxies with AIOHTTP is essential for enhancing security and bypassing IP restrictions when making HTTP requests or performing web scraping tasks. AIOHTTP is a powerful Python library that not only supports proxy integration but also enables the development of efficient and scalable asynchronous web services and applications.
Before getting started with setting up proxies in AIOHTTP, make sure you have the following prerequisites:
- Python 3.6 or above installed on your system.
- DataImpulse proxy plan credentials, which will be used for proxy authentication.
Installing AIOHTTP
To install the aiohttp library, open your terminal and execute the following pip command:
pip install aiohttp
On Windows, open the command prompt and run the following command instead:
python -m pip install aiohttp
The asyncio package is part of the Python standard library, so you do not need to install it separately.
Now that the aiohttp library is installed, you can proceed to import the necessary packages in your Python file. This will allow you to send web requests and receive web responses using aiohttp.
import aiohttp
import asyncio
To handle HTTP requests using aiohttp, you can refer to the following example. In this example, we’ll be sending an HTTP GET request to the “https://ip-api.com/” web page. This web page will then respond with the IP address of the requester.
Let’s begin by creating the get_response() asynchronous function:
async def get_response():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            'https://ip-api.com/'
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())
When you execute the get_response() function, it first creates a session using the ClientSession() method. Then, it sends an HTTP GET request to the specified URL. Once the response to this GET request is received, the function prints the status code value and the response text.
To see the HTTP request in action, let’s call the get_response() function:
# asyncio.run() (Python 3.7+) creates the event loop, runs the coroutine, and closes the loop
asyncio.run(get_response())
The output will show the success status and the requester’s IP address.
Integrating AIOHTTP proxies
When performing web scraping, it is common to encounter IP blocking from websites that have implemented anti-scraping measures. This occurs when we access the website repeatedly with the same IP, resulting in the IP being blocked and us being restricted from accessing the website.
To avoid such problems, you can integrate DataImpulse proxies with your HTTP requests. Simply provide your proxy server address and the credentials for proxy authentication in the GET request.
The integration of proxies involves three steps. We will reuse the code from the previous section and add additional lines of code to enable proxy integration:
Step 1: Import the required packages:
import asyncio
import aiohttp
Step 2: Create variables to store the proxy address, proxy plan username and password. We will use these variables in the GET request later.
PROXY_END_POINT = 'proxy_address'
USERNAME = 'YourProxyPlanUsername'
PASSWORD = 'YourProxyPlanPassword'
Make sure to replace the username and password with your DataImpulse proxy plan credentials, and replace the proxy_address with the address of the proxy server you wish to use.
Residential proxies
Proxy type: HTTP
IP address: gw.dataimpulse.com
Port: 823
PROXY_END_POINT = 'gw.dataimpulse.com:823'
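If your username or password contains characters such as @, : or /, placing them directly into a proxy URL can make the URL ambiguous. The following is a minimal sketch of one way to guard against that, using only the standard library. The helper name build_proxy_url and the sample credentials are hypothetical and used for illustration only; the sketch assumes that the HTTP client decodes percent-encoded credentials in the URL, which is standard URL behavior.

```python
from urllib.parse import quote


def build_proxy_url(username: str, password: str, endpoint: str) -> str:
    """Return an http:// proxy URL with percent-encoded credentials."""
    # safe='' makes quote() encode '/', ':' and '@' as well, so characters in
    # the credentials cannot be mistaken for URL delimiters.
    user = quote(username, safe='')
    pwd = quote(password, safe='')
    return f'http://{user}:{pwd}@{endpoint}'


# Placeholder credentials for illustration only.
print(build_proxy_url('user', 'p@ss:word', 'gw.dataimpulse.com:823'))
# http://user:p%40ss%3Aword@gw.dataimpulse.com:823
```

The resulting string can be passed as the proxy argument of a GET request in the same way as a hand-written proxy URL.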
Step 3: Finally, send an HTTP request with the target URL and the proxy URL using the aiohttp library.
The following code integrates the specified Residential Proxy server setup from step 2 into aiohttp to send a GET request to https://ip-api.com/. It also prints the status code and the response text in the output.
async def get_response_using_proxy():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            'https://ip-api.com/',
            # USERNAME and PASSWORD are the variables defined in step 2
            proxy=f'http://{USERNAME}:{PASSWORD}@{PROXY_END_POINT}'
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())

asyncio.run(get_response_using_proxy())
Testing proxy connection
In the previous section, we provided a step-by-step guide on how to integrate proxies with Python and the aiohttp library. Now, let’s test the code to see the output and verify that the proxy integration is functioning as expected.
import aiohttp
import asyncio
PROXY_END_POINT = 'proxy_address'
USERNAME = 'YourProxyPlanUsername'
PASSWORD = 'YourProxyPlanPassword'
async def get_response_using_proxy():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            'https://ip-api.com/',
            # Build the proxy URL from the variables defined above
            proxy=f'http://{USERNAME}:{PASSWORD}@{PROXY_END_POINT}'
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())

asyncio.run(get_response_using_proxy())
Proxy integration using basic authentication
In the previous example, we demonstrated how to pass the username, password, and proxy address as a single string with the GET request. However, aiohttp also offers an alternative method for user authentication called BasicAuth. The following example showcases the usage of BasicAuth to include the proxy plan username and password in the request:
import asyncio
import aiohttp
PROXY_END_POINT = 'proxy_address'
USERNAME = 'YourProxyPlanUsername'
PASSWORD = 'YourProxyPlanPassword'
async def get_response_using_proxy():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            'https://ip-api.com/',
            proxy=f'http://{PROXY_END_POINT}',
            # Credentials are passed separately instead of inside the proxy URL
            proxy_auth=aiohttp.BasicAuth(USERNAME, PASSWORD)
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())

asyncio.run(get_response_using_proxy())
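For context, HTTP Basic authentication sends the credentials as a Base64-encoded username:password pair, and for a proxy this value travels in the Proxy-Authorization header. The sketch below reproduces that encoding with the standard library only, to show what is sent on the wire. The function basic_auth_header is a hypothetical helper written for illustration and is not part of aiohttp; the UTF-8 encoding is also an assumption of this sketch.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of a Basic authentication header."""
    # Join the credentials with a colon, then Base64-encode the bytes.
    raw = f'{username}:{password}'.encode('utf-8')
    token = base64.b64encode(raw).decode('ascii')
    return f'Basic {token}'


# Placeholder credentials for illustration only.
print(basic_auth_header('user', 'pass'))  # Basic dXNlcjpwYXNz
```

Note that Base64 is an encoding, not encryption: anyone who can read the header can recover the credentials.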
How to rotate proxies with Python and AIOHTTP
When performing web scraping, certain websites may impose restrictions and block IP addresses that are suspected of scraping activities. This can lead to your proxy IP getting blocked if you repeatedly use the same proxy server IP.
DataImpulse’s Residential Proxies offer the option to either use different IP addresses for a maximum of 30 minutes or randomly change the proxy server address with each request.
Alternatively, you can achieve proxy rotation using Python’s aiohttp library. While the library itself does not have a built-in rotation functionality, you can employ the following two techniques to rotate proxy servers.
Selecting a random proxy from the list
One simple approach to rotate proxies is by maintaining a list of proxies and randomly selecting one from the list for each web request.
If you have a list of proxies, you can use the following code to rotate the proxies for every web request:
import asyncio
import random
import aiohttp
proxy_list = [
    'http://YourProxyPlanUsername:YourProxyPlanPassword@PROXY_ADDRESS_1:10000',
    'http://YourProxyPlanUsername:YourProxyPlanPassword@PROXY_ADDRESS_2:10001',
    # ... more proxies ...
    'http://YourProxyPlanUsername:YourProxyPlanPassword@PROXY_ADDRESS_N:20000'
]

async def get_response_using_proxy():
    # Pick a random proxy each time a request is sent
    proxy = random.choice(proxy_list)
    async with aiohttp.ClientSession() as session:
        async with session.get(
            'https://ip-api.com/',
            proxy=proxy
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())

asyncio.run(get_response_using_proxy())
The code provided above creates a list of proxy endpoints along with their corresponding IP addresses, usernames, and passwords. It then uses the random.choice() method to randomly select one proxy endpoint from the proxy_list. Once a proxy endpoint is selected, the request is sent using that specific proxy server.
It’s important to note that the random.choice() method can select the same proxy endpoint multiple times, which means there is a possibility of using a particular proxy address multiple times in the rotation process.
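If back-to-back reuse of the same proxy is a concern, one option is to exclude the previously used proxy from the random draw. The following is a minimal sketch of that idea; the function name pick_proxy and the shortened placeholder addresses are hypothetical and used for illustration only.

```python
import random


def pick_proxy(proxies, previous=None):
    """Pick a random proxy, avoiding `previous` when another option exists."""
    # Drop the previously used proxy; fall back to the full list if that
    # would leave nothing to choose from (for example, a one-item list).
    candidates = [p for p in proxies if p != previous] or list(proxies)
    return random.choice(candidates)


# Placeholder addresses for illustration only.
proxy_list = [
    'http://PROXY_ADDRESS_1:10000',
    'http://PROXY_ADDRESS_2:10001',
    'http://PROXY_ADDRESS_3:10002',
]

previous = None
for _ in range(5):
    previous = pick_proxy(proxy_list, previous)
    print(previous)
```

The selection is still random, so a proxy can reappear after one or more other proxies have been used; only immediate repeats are avoided.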
The previous approach is non-deterministic: you cannot predict which proxy will handle a given request. To achieve a predictable rotation, we can use a round-robin-style strategy.
In this method, we create a list of proxy endpoints and iterate over the list indices in a cyclical manner. Modular arithmetic maps the loop counter i to a valid list index, so we cycle back to index 0 once we reach the end of the list. This process continues until all iterations of the for loop are completed.
The following code illustrates this concept:
import asyncio
import aiohttp
proxy_list = [
    'http://YourProxyPlanUsername:YourProxyPlanPassword@PROXY_ADDRESS_1:10000',
    'http://YourProxyPlanUsername:YourProxyPlanPassword@PROXY_ADDRESS_2:10001',
    # ... more proxies ...
    'http://YourProxyPlanUsername:YourProxyPlanPassword@PROXY_ADDRESS_N:20000'
]
async def get_response_using_proxy(target_url, proxy):
    async with aiohttp.ClientSession() as session:
        async with session.get(
            target_url,
            proxy=proxy,
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())
async def main():
    number_of_requests = 10
    length = len(proxy_list)
    for i in range(number_of_requests):
        index = i % length  # wrap around to the start of the list
        await get_response_using_proxy('https://ip-api.com/', proxy_list[index])

asyncio.run(main())
The for loop sends HTTP requests based on the value of number_of_requests. Notice how the index is determined within each cycle. To ensure that the index stays within the boundaries of the proxy list, we use the expression i % length. This expression maps every value of i to an index in the range 0 to length - 1, so the rotation wraps back to the first proxy after the last one is used.
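The same round-robin order can also be produced without manual index arithmetic by using itertools.cycle from the standard library, which repeats an iterable endlessly. The sketch below only shows the order in which proxies would be handed out; the shortened placeholder addresses and the variable names are assumptions made for illustration.

```python
from itertools import cycle

# Placeholder addresses for illustration only.
proxy_list = [
    'http://PROXY_ADDRESS_1:10000',
    'http://PROXY_ADDRESS_2:10001',
    'http://PROXY_ADDRESS_3:10002',
]

# cycle() yields the items in order and starts over after the last one.
proxy_cycle = cycle(proxy_list)

# Take the proxies for the first five requests.
first_five = [next(proxy_cycle) for _ in range(5)]
for proxy in first_five:
    print(proxy)
```

Each call to next(proxy_cycle) returns the proxy for the next request, so the fourth request goes back to the first proxy in the list.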
How to reuse proxies
With proxy rotation, a different proxy is selected for each request sent by aiohttp. Alternatively, we can reuse the same proxy until it gets blocked by the website, and only then switch to another one.
Let’s take a look at an example:
import asyncio
import aiohttp
proxy_list = [
    'http://YourProxyPlanUsername:YourProxyPlanPassword@PROXY_ADDRESS_1:10000',
    'http://YourProxyPlanUsername:YourProxyPlanPassword@PROXY_ADDRESS_2:10001',
    # ... more proxies ...
    'http://YourProxyPlanUsername:YourProxyPlanPassword@PROXY_ADDRESS_N:20000'
]
async def get_response_using_proxy(target_url, proxy):
    async with aiohttp.ClientSession() as session:
        async with session.get(
            target_url,
            proxy=proxy
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())
            return response.status
async def main():
    index = 0
    number_of_requests = 10
    length = len(proxy_list)
    for _ in range(number_of_requests):
        status_code = await get_response_using_proxy('https://ip-api.com/', proxy_list[index])
        if status_code != 200:
            index = index + 1       # select the next proxy index
            index = index % length  # keep the index within the proxy list size
        # otherwise, reuse the same proxy for the next request

asyncio.run(main())
In this code, the for loop keeps sending requests through the same proxy as long as the response status code is 200 (success). If any other status code is returned, the code advances the index and switches to the next proxy in the list, wrapping around to the first proxy when it reaches the end.
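To see the switching logic in isolation, the sketch below runs the same reuse-until-blocked loop against a stand-in coroutine instead of a real HTTP request. The names fetch_with_failover and fake_fetch are hypothetical, the placeholder addresses are shortened, and the 403 status used to simulate a blocked proxy is an assumption made for illustration. In real use, any coroutine that takes a target URL and a proxy and returns a status code could be passed in place of fake_fetch.

```python
import asyncio


async def fetch_with_failover(fetch, target_url, proxies, number_of_requests):
    """Reuse one proxy until `fetch` returns a non-200 status, then move on.

    Returns the list of proxies used, one entry per request.
    """
    index = 0
    used = []
    for _ in range(number_of_requests):
        proxy = proxies[index]
        used.append(proxy)
        status = await fetch(target_url, proxy)
        if status != 200:
            # Switch to the next proxy, wrapping around at the end of the list.
            index = (index + 1) % len(proxies)
    return used


async def fake_fetch(target_url, proxy):
    # Stand-in for a real request: pretend the first proxy is blocked.
    return 403 if proxy == 'http://PROXY_ADDRESS_1:10000' else 200


proxies = ['http://PROXY_ADDRESS_1:10000', 'http://PROXY_ADDRESS_2:10001']
used = asyncio.run(
    fetch_with_failover(fake_fetch, 'https://ip-api.com/', proxies, 3)
)
print(used)
```

Here the first request goes through the first proxy and fails, so the remaining two requests both reuse the second proxy.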