5. Web scraping: the requests library explained, advanced usage

Posted by 那是個好男孩 on 2019-04-09

0. File upload

import requests

# Open the file in a with block so the handle is closed after the upload
with open('favicon.ico', 'rb') as f:
    files = {'file': f}
    response = requests.post("http://httpbin.org/post", files=files)
print(response.text)
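
To see what the files argument turns into without touching the network, the request can be prepared but not sent. This is only a sketch: the file name hello.txt, its contents, and the text/plain content type are made up for illustration, and the (filename, content, content_type) tuple is used in place of an open file handle.

```python
from requests import Request

# Hypothetical in-memory "file": (filename, content, content type)
files = {'file': ('hello.txt', b'hello upload', 'text/plain')}

# Build the request without sending it, so the encoded body can be inspected
prepped = Request('POST', 'http://httpbin.org/post', files=files).prepare()

# requests encodes the files dict as a multipart/form-data body
print(prepped.headers['Content-Type'])
print(prepped.body.decode('utf-8', errors='replace'))
```

The printed body shows one multipart section per entry in files, with the dict key as the form field name.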

 

1. Getting cookies

import requests

response = requests.get("https://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)

 

2. Session persistence

import requests

requests.get('http://httpbin.org/cookies/set/number/123456789')
response = requests.get('http://httpbin.org/cookies')
print(response.text)

* Requesting http://httpbin.org/cookies/set/number/123456789 sets a cookie named number for that site.

The output is:

 

{
  "cookies": {}
}

 

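Empty?! Each bare requests.get() call is independent, so the cookie set by the first request is not sent with the second. Let's try Session() instead.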

import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
response = s.get('http://httpbin.org/cookies')
print(response.text)

The output is:

{
  "cookies": {
    "number": "123456789"
  }
}
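
The difference can also be seen offline by preparing requests without sending them. In this sketch the cookie is placed in the session's jar by hand (an assumption made to avoid a network call; in the example above it arrives via the server's Set-Cookie response instead).

```python
from requests import Request, Session

url = "http://httpbin.org/cookies"

# A Session owns a cookie jar that is merged into every request it prepares
session = Session()
session.cookies.set("number", "123456789")  # stored by hand for this demo
with_session = session.prepare_request(Request("GET", url))

# A request prepared on its own has no jar to draw from
standalone = Request("GET", url).prepare()

print(with_session.headers.get("Cookie"))  # number=123456789
print(standalone.headers.get("Cookie"))    # None
```

Only the request prepared through the session carries a Cookie header, which is why the second requests.get() call earlier came back with an empty cookies object.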

 

3. Certificate verification

import requests

response = requests.get('https://www.12306.cn')
print(response.status_code)
# An SSLError here means certificate verification failed
#######################
# Suppress the warning
import requests
from requests.packages import urllib3

urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
#######################
# Specify a local certificate to use as the client certificate
import requests

response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)

 

4. Proxy settings

# Without a password
import requests

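proxies = {
  "http": "http://127.0.0.1:9743",
  # With recent urllib3 versions, an "https://" proxy URL means the proxy
  # itself is contacted over TLS; for a plain HTTP proxy, use
  # "http://127.0.0.1:9743" for this entry as well.
  "https": "https://127.0.0.1:9743",
}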

response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

##############################

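# With a username and password
import requests

proxies = {
    "http": "http://user:password@127.0.0.1:9743/",
    # The target URL below is https, so an "https" entry is needed
    # for the proxy to be used at all
    "https": "http://user:password@127.0.0.1:9743/",
}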
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

##############################

# The proxy does not support HTTP but does support SOCKS
#pip3 install 'requests[socks]'
import requests

proxies = {
    'http': 'socks5://127.0.0.1:9742',
    'https': 'socks5://127.0.0.1:9742'
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

 

5. Timeout settings

import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get("http://httpbin.org/get", timeout = 0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')

* timeout=(5, 30): 5 is the connect timeout and 30 is the read timeout, both in seconds.
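
The two values can be told apart with a small local experiment. In this sketch (the helper name fetch and the throwaway local socket are assumptions for illustration), a socket accepts TCP connections but never answers, so the connect phase succeeds and only the read timeout fires.

```python
import socket

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout


def fetch(url, connect_timeout, read_timeout):
    """Return a short label describing how the request ended."""
    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables for this local demo
    try:
        response = session.get(url, timeout=(connect_timeout, read_timeout))
        return f"status {response.status_code}"
    except ConnectTimeout:
        return "connect timeout"
    except ReadTimeout:
        return "read timeout"
    finally:
        session.close()


# A listening socket that completes the TCP handshake but never sends a response
server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

result = fetch(f"http://127.0.0.1:{port}/", 5, 0.3)
server.close()
print(result)
```

Catching only ReadTimeout, as in the snippet above, would let a ConnectTimeout propagate; both are subclasses of requests.exceptions.Timeout, which can be caught when the distinction does not matter.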

 

6. Authentication settings

import requests
from requests.auth import HTTPBasicAuth

r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
# It can also be written more simply, as below (a tuple is handled with HTTPBasicAuth by default; note that this URL is not actually reachable)
#r = requests.get('http://120.27.34.24:9001', auth=('user', '123'))
print(r.status_code)
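
Since that address is not reachable, the equivalence of the two forms can be checked by preparing the requests without sending them. A minimal sketch, comparing the Authorization header each form produces:

```python
import base64

from requests import Request
from requests.auth import HTTPBasicAuth

url = 'http://120.27.34.24:9001'

# Prepare (but do not send) the same request with both auth styles
explicit = Request('GET', url, auth=HTTPBasicAuth('user', '123')).prepare()
shorthand = Request('GET', url, auth=('user', '123')).prepare()

# Basic auth is "Basic " followed by base64("username:password")
expected = 'Basic ' + base64.b64encode(b'user:123').decode('ascii')

print(explicit.headers['Authorization'])
print(shorthand.headers['Authorization'] == explicit.headers['Authorization'])
```

Because the credentials are only base64-encoded, not encrypted, Basic auth is best used over HTTPS.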

 

7. Exception handling

import requests
from requests import ReadTimeout, ConnectionError, RequestException
try:
    response = requests.get("http://httpbin.org/get", timeout = 0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection error')
except RequestException:
    print('Error')

* See the API section of the official requests documentation for the full list of exceptions.
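
The order of the except clauses above matters because the exceptions form a hierarchy, and Python runs the first clause that matches. The sketch below prints that hierarchy; the helper name requests_ancestry is made up for illustration.

```python
from requests.exceptions import (
    ConnectionError,
    ConnectTimeout,
    ReadTimeout,
    RequestException,
    Timeout,
)


def requests_ancestry(exc_type):
    """List exc_type and its requests base classes, most specific first."""
    return [cls.__name__ for cls in exc_type.__mro__
            if issubclass(cls, RequestException)]


for exc_type in (ReadTimeout, ConnectTimeout, ConnectionError):
    print(exc_type.__name__, "->", requests_ancestry(exc_type))
```

RequestException sits at the root, so it belongs in the last except clause as the catch-all; placing it first would hide the more specific handlers.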

 

8. Prepared Request

* In urllib, a request can be represented as a data structure, with each of its parameters expressed through a Request object.

* In requests, a Prepared Request does the same job.

from requests import Request,Session
url = "..."
data = {'...':'...'}
headers = {'User-Agent':'...'}
s = Session()
req = Request('POST',url,data = data,headers = headers)
prepped = s.prepare_request(req)
r = s.send(prepped)
print(r.text)

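* Here we import Request and build a Request object from the url, data, and headers arguments, then call the Session's prepare_request() method to turn it into a PreparedRequest object, and finally call send() to send it.

* With the request captured as an object like this, each request can be treated as an independent item, which is very convenient for queue-based scheduling.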
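
As a rough sketch of that queue-based scheduling idea, the example below prepares a few requests, puts them on a standard-library queue, and then drains it. The httpbin.org URL, the page parameter, and the demo-crawler User-Agent are illustrative assumptions, and the actual send call is left commented out so the sketch runs offline.

```python
from queue import Queue

from requests import Request, Session

session = Session()
pending = Queue()

# Producer side: build and prepare the requests, then enqueue them
for page in (1, 2, 3):
    req = Request('GET', 'http://httpbin.org/get',
                  params={'page': page},
                  headers={'User-Agent': 'demo-crawler'})
    pending.put(session.prepare_request(req))

# Consumer side: take each prepared request off the queue
scheduled_urls = []
while not pending.empty():
    prepped = pending.get()
    scheduled_urls.append(prepped.url)
    # response = session.send(prepped, timeout=5)  # uncomment to really send

print(scheduled_urls)
```

Each queued item already carries its final URL, headers, and body, so a worker only needs a session and a call to send().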

 
