How do you set a proxy IP for crawler requests?
When writing a crawler, sooner or later you will run into IPs being banned or rate-limited, and that is where proxy IPs come in. So how does proxying actually work in request? Let's walk through it with 天啟HTTP.
1. Setting a proxy with a single IP
import urllib.request

def create_handler():
    url = 'http://httpbin.org/ip'
    # Define the proxy
    proxy = {
        # 'http': 'http://119.5.72.6:4226'  # the officially documented form
        'http': '119.5.72.6:4226'
    }
    # Proxy handler
    proxy_handler = urllib.request.ProxyHandler(proxy)
    # Build our own opener
    opener = urllib.request.build_opener(proxy_handler)
    # Send the request through the proxy IP
    data = opener.open(url).read()
    print(data)

if __name__ == '__main__':
    create_handler()
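If every request in the program should go through the proxy, the opener can also be installed globally with urllib.request.install_opener, so that plain urllib.request.urlopen calls use it too. A minimal sketch (the proxy address is a placeholder, and install_proxy is a helper name of our own):

```python
import urllib.request

def install_proxy(proxy_address):
    """Install a global opener that routes HTTP traffic through proxy_address."""
    proxy_handler = urllib.request.ProxyHandler({'http': proxy_address})
    opener = urllib.request.build_opener(proxy_handler)
    # After install_opener, every urllib.request.urlopen call uses this opener
    urllib.request.install_opener(opener)
    return opener

opener = install_proxy('119.5.72.6:4226')
print(type(opener).__name__)  # → OpenerDirector
```

This is convenient for small scripts; larger crawlers usually keep the opener explicit, as in the example above, so different requests can use different proxies.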
2. Setting proxies with multiple IPs
import urllib.request

def proxy_handler():
    proxy_list = [
        {'http': '125.111.149.163:4205'},
        {'http': '106.46.136.93:4225'},
        {'http': '114.230.18.38:4228'},
        {'http': '115.151.50.141:4273'},
        {'http': '182.105.201.153:4275'},
    ]
    for proxy in proxy_list:
        print(proxy)
        # Build the handler for this proxy
        proxy_head = urllib.request.ProxyHandler(proxy)
        # Build the opener
        opener = urllib.request.build_opener(proxy_head)
        try:
            print(opener.open('http://httpbin.org/ip', timeout=1).read())
            print('==' * 20)
        except Exception as e:
            print(e)

if __name__ == '__main__':
    proxy_handler()
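Looping over the list in order tries each proxy exactly once. In a real crawler, a common pattern is instead to pick a proxy at random for each request and fall back to another on failure. A minimal sketch of that rotation logic, under the assumption that the proxy addresses are placeholders like the ones above (random_opener and fetch_with_retry are illustrative helper names, not library functions):

```python
import random
import urllib.request

PROXY_LIST = [
    {'http': '125.111.149.163:4205'},
    {'http': '106.46.136.93:4225'},
    {'http': '114.230.18.38:4228'},
]

def random_opener(proxy_list):
    """Build an opener around a randomly chosen proxy from the pool."""
    proxy = random.choice(proxy_list)
    handler = urllib.request.ProxyHandler(proxy)
    return urllib.request.build_opener(handler), proxy

def fetch_with_retry(url, proxy_list, retries=3, timeout=1):
    """Try up to `retries` random proxies before giving up."""
    for _ in range(retries):
        opener, proxy = random_opener(proxy_list)
        try:
            return opener.open(url, timeout=timeout).read()
        except Exception as e:
            print(f'{proxy} failed: {e}')
    raise RuntimeError('all proxies failed')

# Selecting a proxy does not require any network access:
opener, proxy = random_opener(PROXY_LIST)
print(proxy)
```

Separating proxy selection from the actual fetch keeps the rotation logic testable offline; only fetch_with_retry touches the network.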