# Note: check whether you are using `import urllib.request` or `from urllib import request`
0. urlopen()
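Syntax: urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
- Example 0: (in practice this function is usually called with just three arguments: url, data, timeout)
* The data argument must be converted with bytes() into a byte stream (a type distinct from str: a raw sequence of bits such as 010010010), i.e. the bytes type. When data is given, the request is sent as a POST.
* response.read() returns bytes, so the result needs to be decoded with decode().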
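    import socket
    import urllib.error
    import urllib.parse
    import urllib.request

    url = 'http://httpbin.org/post'
    # urlencode() builds 'word=hello'; bytes() turns it into the bytes that data requires
    data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf8')
    try:
        response = urllib.request.urlopen(url, data=data, timeout=1)
    except urllib.error.URLError as e:
        # e.reason carries the underlying socket error when the connection times out
        if isinstance(e.reason, socket.timeout):
            print('TIME OUT')
    else:
        # runs only when urlopen() raised no exception
        print(response.read().decode('utf-8'))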
Output:

    {
      "args": {},
      "data": "",
      "files": {},
      "form": {
        "word": "hello"
      },
      "headers": {
        "Accept-Encoding": "identity",
        "Content-Length": "10",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "httpbin.org",
        "User-Agent": "Python-urllib/3.6"
      },
      "json": null,
      "origin": "101.206.170.234, 101.206.170.234",
      "url": "https://httpbin.org/post"
    }
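
The effect of the data argument can be checked without sending anything over the network. The following is only a minimal sketch: it builds `Request` objects (covered in section 1) and asks each one which HTTP method it would use. The helper name `build_form_data` is hypothetical and not part of urllib.

```python
import urllib.parse
import urllib.request


def build_form_data(fields):
    """Hypothetical helper: encode a dict as form data in bytes form."""
    # urlencode() returns a str such as 'word=hello'; urlopen() needs bytes.
    return urllib.parse.urlencode(fields).encode('utf-8')


url = 'http://httpbin.org/post'
data = build_form_data({'word': 'hello'})

# Without data the request is a GET; with data it becomes a POST.
get_request = urllib.request.Request(url)
post_request = urllib.request.Request(url, data=data)

print(data)                        # b'word=hello'
print(get_request.get_method())    # GET
print(post_request.get_method())   # POST
```

The 10 bytes of `word=hello` correspond to the `Content-Length` of 10 reported in the output above.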
- Example 1: inspect the status code, the response headers, and the Server field of the response headers
    import urllib.request

    response = urllib.request.urlopen('https://www.python.org')
    print(response.status)               # HTTP status code
    print(response.getheaders())         # all response headers as (name, value) pairs
    print(response.getheader('Server'))  # a single header field
Output:

    200
    [('Server', 'nginx'), ('Content-Type', 'text/html; charset=utf-8'), ('X-Frame-Options', 'DENY'), ('Via', '1.1 vegur'), ('Via', '1.1 varnish'), ('Content-Length', '48410'), ('Accept-Ranges', 'bytes'), ('Date', 'Tue, 09 Apr 2019 02:32:34 GMT'), ('Via', '1.1 varnish'), ('Age', '722'), ('Connection', 'close'), ('X-Served-By', 'cache-iad2126-IAD, cache-hnd18751-HND'), ('X-Cache', 'MISS, HIT'), ('X-Cache-Hits', '0, 1223'), ('X-Timer', 'S1554777154.210361,VS0,VE0'), ('Vary', 'Cookie'), ('Strict-Transport-Security', 'max-age=63072000; includeSubDomains')]
    nginx
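
The same three calls can be tried without touching an external site. The sketch below is an illustration under stated assumptions, not part of the original example: it starts a throwaway server on the loopback interface and reads the status and headers from it. `HelloHandler` and `fetch_summary` are hypothetical names, and an opener with an empty `ProxyHandler` is used instead of plain `urlopen()` only so that a configured proxy cannot intercept the loopback request; the response object it returns offers the same `status`, `getheaders()` and `getheader()`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b'hello'
        self.send_response(200)  # also adds the Server and Date headers
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request log line


def fetch_summary():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(('127.0.0.1', 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    # Empty ProxyHandler: never route the loopback request through a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = 'http://127.0.0.1:%d/' % server.server_address[1]
        with opener.open(url, timeout=5) as response:
            return (response.status,
                    response.getheader('Server'),
                    response.getheaders(),
                    response.read().decode('utf-8'))
    finally:
        server.shutdown()
        server.server_close()


status, server_field, headers, body = fetch_summary()
print(status)
print(server_field)
print(headers)
print(body)
```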
Calling urlopen() with a bare URL is quite limited: for example, you cannot set the request headers. That is why the Request class is needed.
1. Request()
- Example 0: (the two approaches below produce the same result)
    import urllib.request

    response = urllib.request.urlopen('https://www.python.org')
    print(response.read().decode('utf-8'))

    ######################################

    import urllib.request

    # wrap the URL in a Request object first, then pass it to urlopen()
    req = urllib.request.Request('https://python.org')
    response = urllib.request.urlopen(req)
    print(response.read().decode('utf-8'))
The following shows how to use Request() to make GET and POST requests and how to set their parameters.
- Example 1: (POST request)
    from urllib import request, parse

    url = 'http://httpbin.org/post'
    headers = {
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
        'Host': 'httpbin.org'
    }
    form = {'name': 'Germey'}  # form fields to send
    data = bytes(parse.urlencode(form), encoding='utf8')
    req = request.Request(url=url, data=data, headers=headers, method='POST')
    response = request.urlopen(req)
    print(response.read().decode('utf-8'))
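Addendum: headers can also be modified with build_opener(). It is not covered in detail here, since the approaches shown are enough for most uses.
You can also add headers with the add_header() method to imitate a browser, and the data argument can be built with encode() instead of bytes(), as follows: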
    from urllib import request, parse

    url = 'http://httpbin.org/post'
    form = {'name': 'Germey'}  # form fields to send
    data = parse.urlencode(form).encode('utf-8')
    req = request.Request(url=url, data=data, method='POST')
    req.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
    response = request.urlopen(req)
    print(response.read().decode('utf-8'))
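
To compare the ways of setting a header side by side, the sketch below builds the objects and inspects them without sending a request. It is only an illustration: the constant name `USER_AGENT` is an assumption introduced here, and the `build_opener()` part shows just one simple use of it (replacing the opener's default headers).

```python
from urllib import request

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
url = 'http://httpbin.org/post'

# Variant 1: headers passed to the Request constructor.
req_a = request.Request(url, headers={'User-Agent': USER_AGENT})

# Variant 2: header added afterwards with add_header().
req_b = request.Request(url)
req_b.add_header('User-Agent', USER_AGENT)

# Request stores header names capitalized: 'User-Agent' becomes 'User-agent'.
print(req_a.header_items())
print(req_a.get_header('User-agent') == req_b.get_header('User-agent'))

# Variant 3: an opener whose default headers apply to every request it sends.
opener = request.build_opener()
opener.addheaders = [('User-Agent', USER_AGENT)]
# opener.open(url) would now send that User-Agent; no request is made here.
```

Variants 1 and 2 leave the Request in the same state, so the choice between them is a matter of style. The opener is more convenient when many requests should share the same headers.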
- Example 2: (GET request) a Baidu keyword search
    from urllib import request, parse

    url = 'http://www.baidu.com/s?wd='
    key = '路飛'
    key_code = parse.quote(key)  # percent-encode the keyword
    url_all = url + key_code

    """
    # Second way to write it
    url = 'http://www.baidu.com/s'
    key = '路飛'
    wd = parse.urlencode({'wd': key})
    url_all = url + '?' + wd
    """

    req = request.Request(url_all)
    response = request.urlopen(req)
    print(response.read().decode('utf-8'))
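
When a query has more than one parameter, the urlencode() form is easier to extend than string concatenation. The sketch below assembles such a URL and parses it back to confirm nothing was lost; it makes no network request. The second parameter `pn` is an assumed name used purely for illustration.

```python
from urllib import parse

base_url = 'http://www.baidu.com/s'
key = '路飛'

# urlencode() percent-encodes each value and joins the pairs with '&'.
# 'pn' is an assumed extra parameter, included only to show a second pair.
query = parse.urlencode({'wd': key, 'pn': 10})
url_all = base_url + '?' + query

# Parsing the URL back recovers the original values (always as strings).
parts = parse.urlsplit(url_all)
recovered = parse.parse_qs(parts.query)

print(url_all)
print(recovered)
```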
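At this point, questions come up about decode() and about the quote() and urlencode() functions in the parse module, so here are some notes:
- parse.quote: percent-encodes a str (using UTF-8 by default)
- parse.urlencode: turns the key: value pairs of a dict into a query string of the form key=encoded value, with the pairs joined by &
- parse.unquote: converts percent-encoded data back to the original text
- decode: decodes bytes into a str; decode("utf-8") pairs well with read()
- encode: encodes a str into bytes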
    >>> str0 = '我爱你'
    >>> str1 = str0.encode('gb2312')
    >>> str1
    b'\xce\xd2\xb0\xae\xc4\xe3'
    >>> str2 = str0.encode('gbk')
    >>> str2
    b'\xce\xd2\xb0\xae\xc4\xe3'
    >>> str3 = str0.encode('utf-8')
    >>> str3
    b'\xe6\x88\x91\xe7\x88\xb1\xe4\xbd\xa0'
    >>> str00 = str1.decode('gb2312')
    >>> str00
    '我爱你'
    >>> str11 = str1.decode('utf-8')  # raises an error because str1 is GB2312-encoded
    Traceback (most recent call last):
      File "<pyshell#9>", line 1, in <module>
        str11 = str1.decode('utf-8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte
* The encoding argument specifies which character encoding to use.
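
quote() and unquote() accept an encoding argument as well, which ties the two topics together. The sketch below is a small illustration that reuses the simplified-character string `'我爱你'`, whose bytes are the ones shown in the interactive session above; the variable names are arbitrary.

```python
from urllib import parse

text = '我爱你'

# quote() percent-encodes the UTF-8 bytes by default.
quoted_utf8 = parse.quote(text)
# The encoding parameter switches to another codec, here GB2312.
quoted_gb = parse.quote(text, encoding='gb2312')

print(quoted_utf8)   # %E6%88%91%E7%88%B1%E4%BD%A0
print(quoted_gb)     # %CE%D2%B0%AE%C4%E3

# unquote() needs the same encoding to return the original text.
restored = parse.unquote(quoted_gb, encoding='gb2312')
print(restored)      # 我爱你

# With the wrong encoding, unquote() does not raise (its default is
# errors='replace'); it returns garbled text instead.
mismatch = parse.unquote(quoted_gb)
print(mismatch == text)   # False
```

Unlike bytes.decode(), which raised UnicodeDecodeError in the session above, unquote() fails silently on a mismatch, so it is worth checking the result when the source encoding is uncertain.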
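Another question comes up here: what is the difference between read(), readline(), and readlines()?
- read(): the whole content as a single object (bytes for an HTTP response, str for a file opened in text mode)
- readline(): one line
- readlines(): the whole content, as a list of lines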
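
The three methods can be compared on any binary stream. In the sketch below, `io.BytesIO` stands in for the response object, which is an assumption made so the example runs without a network connection; the sample data is invented for illustration.

```python
import io

raw = b'line one\nline two\nline three\n'

# Each call gets a fresh stream, because reading consumes the data.
whole = io.BytesIO(raw).read()        # everything, as one bytes object
first = io.BytesIO(raw).readline()    # up to and including the first newline
lines = io.BytesIO(raw).readlines()   # everything, as a list of bytes lines

print(whole.decode('utf-8'))
print(first)    # b'line one\n'
print(lines)    # [b'line one\n', b'line two\n', b'line three\n']
```

All three return bytes here, so decode() is still needed to obtain str values.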