How to Fetch Internet Resources Using the urllib Package

Posted by 鴨梨山大 on 2015-05-10

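Introduction

You may also find the following article useful for fetching web resources with Python:

Basic Authentication: a tutorial on Basic Authentication, with examples in Python.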

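urllib.request is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the urlopen function, which can fetch URLs using a variety of different protocols. It also offers a slightly more complex interface for handling common situations such as basic authentication, cookies and proxies. These are provided by objects called handlers and openers.

urllib.request supports fetching URLs for many "URL schemes" (identified by the string before the ":" in the URL; for example, "ftp" is the URL scheme of "ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP). This tutorial focuses on the most common case, HTTP.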
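A small sketch of what "the string before the colon" means in practice, using urllib.parse.urlsplit from the standard library (a function this article does not otherwise discuss). Nothing is fetched, and the file: URL is just an illustrative path.

```python
from urllib.parse import urlsplit

urls = ['ftp://python.org/', 'http://python.org/', 'file:///etc/hosts']

# The scheme is the part of the URL before the first ':'.
schemes = [urlsplit(url).scheme for url in urls]
print(schemes)  # ['ftp', 'http', 'file']
```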

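For straightforward situations urlopen is very easy to use. But as soon as you encounter errors or non-trivial cases when opening HTTP URLs, you will need some understanding of the HyperText Transfer Protocol. The most comprehensive and authoritative reference to HTTP is RFC 2616, but it is a technical document and not intended to be easy to read. This HOWTO aims to illustrate using urllib, with enough detail about HTTP to help you through. It is not intended to replace the urllib.request documentation, but to supplement it.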

Fetching URLs

The simplest way to use urllib.request is as follows:

import urllib.request

with urllib.request.urlopen('http://python.org/') as response:
    html = response.read()
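To try urlopen without network access, a data: URL can stand in for http://python.org/; the default opener handles it through DataHandler (Python 3.4 and later). The payload below is an arbitrary example string.

```python
import urllib.request

# A data: URL carries its payload inline, so no network access is needed.
url = 'data:text/plain;charset=utf-8,Hello%2C%20World%21'

with urllib.request.urlopen(url) as response:
    html = response.read()  # bytes, exactly as with an http:// URL

print(html)  # b'Hello, World!'
```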

If you wish to retrieve a resource via URL and store it in a temporary location, you can do so via the urlretrieve() function:

import urllib.request

local_filename, headers = urllib.request.urlretrieve('http://python.org/')
# Open in binary mode and close the file when done.
with open(local_filename, 'rb') as f:
    html = f.read()
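urlretrieve() belongs to the legacy interface that the urllib.request documentation says might become deprecated. A minimal sketch of the same idea built from urlopen(), shutil.copyfileobj() and tempfile.NamedTemporaryFile(), assuming a data: URL in place of a real web address so it runs offline:

```python
import os
import shutil
import tempfile
import urllib.request

url = 'data:text/plain,hello%20from%20urllib'  # stand-in for e.g. 'http://python.org/'

# Stream the response body into a named temporary file.
with urllib.request.urlopen(url) as response:
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        shutil.copyfileobj(response, tmp_file)

local_filename = tmp_file.name
with open(local_filename, 'rb') as f:
    content = f.read()

os.remove(local_filename)  # delete=False means cleanup is up to us
print(content)  # b'hello from urllib'
```

Unlike urlretrieve(), this keeps control of the file object in your own code, so you decide when the temporary file is removed.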

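Many uses of urllib will be that simple (note that instead of an "http:" URL we could have used a URL starting with "ftp:", "file:", etc.). However, the purpose of this tutorial is to explain the more complicated cases, concentrating on HTTP.

HTTP is based on requests and responses: the client makes requests and servers send responses. urllib.request mirrors this with a Request object which represents the HTTP request you are making. In its simplest form you create a Request object that specifies the URL you want to fetch. Calling urlopen with this Request object returns a response object for the URL requested. This response is a file-like object, which means you can, for example, call .read() on it: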

import urllib.request

req = urllib.request.Request('http://www.voidspace.org.uk')
with urllib.request.urlopen(req) as response:
    the_page = response.read()
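Because the response is file-like, other file methods such as readline() work as well. This sketch assumes a throwaway local file opened through a file: URL (built with pathlib.Path.as_uri()) so that it runs without a network; an http:// response supports the same calls.

```python
import os
import pathlib
import tempfile
import urllib.request

# Write a small local file so the example needs no network access.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as f:
    f.write(b'first line\nsecond line\n')

url = pathlib.Path(f.name).as_uri()   # e.g. 'file:///tmp/tmpab12cd.txt'
req = urllib.request.Request(url)

with urllib.request.urlopen(req) as response:
    first = response.readline()   # up to and including the first newline
    rest = response.read()        # whatever is left

os.remove(f.name)
print(first, rest)
```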

Note that urllib.request makes use of the same Request interface to handle all URL schemes. For example, you can make an FTP request like so:

req = urllib.request.Request('ftp://example.com/')
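Constructing a Request does not contact the server, so you can inspect how each URL was parsed. This sketch uses the documented Request.type, Request.host and Request.full_url attributes on the two URLs from the examples above:

```python
import urllib.request

ftp_req = urllib.request.Request('ftp://example.com/')
http_req = urllib.request.Request('http://www.voidspace.org.uk')

# Same class, different schemes: the scheme decides which handler opens it.
for req in (ftp_req, http_req):
    print(req.type, req.host, req.full_url)
```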

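In the case of HTTP, there are two extra things that Request objects allow you to do. First, you can pass data to be sent to the server. Second, you can pass extra information ("metadata") about the data or about the request itself to the server; this information is sent as HTTP "headers". Let's look at each of these in turn.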

Data

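Sometimes you want to send data to a URL (often the URL will refer to a CGI (Common Gateway Interface) script or other web application). With HTTP, this is often done using what's known as a POST request. This is often what your browser does when you submit an HTML form that you filled in on the web. Not all POSTs have to come from forms: you can use a POST to transmit arbitrary data to your own application. In the common case of HTML forms, the data needs to be encoded in a standard way, and then passed to the Request object as the data argument. The encoding is done using a function from the urllib.parse library.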

import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

data = urllib.parse.urlencode(values)
data = data.encode('utf-8')  # data should be bytes
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
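Whether a Request is sent as GET or POST follows from the data argument, and Request.get_method() reports which one urllib will use. A sketch reusing the placeholder URL above (it is never contacted); the method= keyword shown last exists since Python 3.3:

```python
import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'  # not contacted here
body = urllib.parse.urlencode({'name': 'Michael Foord'}).encode('utf-8')

get_req = urllib.request.Request(url)          # no data: GET
post_req = urllib.request.Request(url, body)   # data supplied: POST
# The method can also be chosen explicitly.
put_req = urllib.request.Request(url, body, method='PUT')

for req in (get_req, post_req, put_req):
    print(req.get_method())  # prints GET, then POST, then PUT
```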

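Note that other encodings are sometimes required (e.g. for file upload from HTML forms; see HTML Specification, Form Submission for more details).

If you do not pass the data argument, urllib uses a GET request. One way in which GET and POST requests differ is that POST requests often have "side-effects": they change the state of the system in some way (for example by placing an order with the website for a hundredweight of tinned spam to be delivered to your door). Though the HTTP standard makes it clear that POSTs are intended to always cause side-effects, and GET requests never to cause side-effects, nothing prevents a GET request from having side-effects, nor a POST request from having no side-effects. Data can also be passed in an HTTP GET request by encoding it in the URL itself.

This is done as follows: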

>>> import urllib.request
>>> import urllib.parse
>>> data = {}
>>> data['name'] = 'Somebody Here'
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
>>> url_values = urllib.parse.urlencode(data)
>>> print(url_values)  # The order may differ from below.
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
>>> data = urllib.request.urlopen(full_url)

Notice that the full URL is created by adding a ? to the URL, followed by the encoded values.
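To see what the server receives, the query string can be split off and decoded again with urllib.parse.urlsplit and urllib.parse.parse_qs (standard-library functions not covered above). A sketch reusing the values from the session:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

data = {'name': 'Somebody Here', 'location': 'Northampton', 'language': 'Python'}
full_url = 'http://www.example.com/example.cgi' + '?' + urlencode(data)

query = urlsplit(full_url).query   # everything after the '?'
decoded = parse_qs(query)          # each value comes back as a list

print(query)     # spaces are encoded as '+'
print(decoded)
```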

Headers

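We'll discuss here one particular HTTP header, to illustrate how to add headers to your HTTP request.

Some websites [1] dislike being browsed by programs, or send different versions to different browsers [2]. By default urllib identifies itself as Python-urllib/x.y (where x and y are the major and minor version numbers of the Python release, e.g. Python-urllib/2.5), which may confuse the site, or just plain not work. The way a browser identifies itself is through the User-Agent header [3]. When you create a Request object you can pass a dictionary of headers in. The following example makes the same request as above, but identifies itself as a version of Internet Explorer [4].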

import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}

data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url, data, headers)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
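A Request's headers can also be inspected or extended before anything is sent. Note that CPython's Request stores header names via str.capitalize() (so "User-Agent" is kept as "User-agent"), which matters when looking them up; the "Accept-Language" header is an arbitrary addition for illustration. The last lines read the default Python-urllib/x.y value from the addheaders list of a freshly built opener.

```python
import urllib.request

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
req = urllib.request.Request('http://www.someserver.com/cgi-bin/register.cgi',
                             headers={'User-Agent': user_agent})
req.add_header('Accept-Language', 'en')   # headers can also be added later

# Names are stored capitalized: 'User-Agent' becomes 'User-agent'.
print(req.get_header('User-agent'))   # the MSIE string
print(req.get_header('User-Agent'))   # None: lookups need the stored form
print(req.header_items())

# What urllib sends when you do not override it: 'Python-urllib/x.y'.
default_ua = dict(urllib.request.build_opener().addheaders)['User-agent']
print(default_ua)
```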

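The response also has two useful methods. See the section on info and geturl, which comes after we have a look at what happens when things go wrong.

Handling Exceptions

urlopen raises URLError when it cannot handle a response (though as usual with Python APIs, built-in exceptions such as ValueError, TypeError etc. may also be raised). HTTPError is the subclass of URLError raised in the specific case of HTTP URLs. The exception classes are exported from the urllib.error module.

Often, URLError is raised because there is no network connection (no route to the specified server), or the specified server doesn't exist. In this case, the exception raised will have a "reason" attribute containing an error code and a text error message (in Python 3 this is typically the underlying OSError instance rather than a plain tuple, so the printed form varies by platform). For example: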

>>> req = urllib.request.Request('http://www.pretend_server.org')
>>> try: urllib.request.urlopen(req)
... except urllib.error.URLError as e:
...     print(e.reason)
...
(4, 'getaddrinfo failed')
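The session above needs a real DNS lookup. An offline sketch of the same pattern, assuming a file: URL for a file that does not exist (the file name is made up) so that urlopen fails before any server is involved:

```python
import pathlib
import tempfile
import urllib.error
import urllib.request

# A file: URL for a path that (almost certainly) does not exist.
missing = pathlib.Path(tempfile.gettempdir(), 'urllib-howto-no-such-file.html')

caught = None
try:
    urllib.request.urlopen(missing.as_uri())
except urllib.error.URLError as e:
    caught = e  # keep a reference; 'e' is cleared when the except block ends

print(type(caught.reason).__name__)  # FileNotFoundError on a typical system
print(caught.reason)
```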

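HTTPError

Every HTTP response from the server contains a numeric "status code". Sometimes the status code indicates that the server is unable to fulfil the request. The default handlers will handle some of these responses for you (for example, if the response is a "redirection" that requests the client fetch the document from a different URL, urllib will handle that for you). For those it can't handle, urlopen will raise an HTTPError. Typical errors include "404" (page not found), "403" (request forbidden), and "401" (authentication required).

See section 10 of RFC 2616 for a reference on all the HTTP error codes.

The HTTPError instance raised will have an integer "code" attribute, which corresponds to the error sent by the server.

Error Codes

Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range. http.server.BaseHTTPRequestHandler.responses is a useful dictionary of response codes that shows all the response codes used by RFC 2616. The dictionary is reproduced here for convenience: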

# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
responses = {
    100: ('Continue', 'Request received, please continue'),
    101: ('Switching Protocols',
          'Switching to new protocol; obey Upgrade header'),

    200: ('OK', 'Request fulfilled, document follows'),
    201: ('Created', 'Document created, URL follows'),
    202: ('Accepted',
          'Request accepted, processing continues off-line'),
    203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
    204: ('No Content', 'Request fulfilled, nothing follows'),
    205: ('Reset Content', 'Clear input form for further input.'),
    206: ('Partial Content', 'Partial content follows.'),

    300: ('Multiple Choices',
          'Object has several resources -- see URI list'),
    301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
    302: ('Found', 'Object moved temporarily -- see URI list'),
    303: ('See Other', 'Object moved -- see Method and URL list'),
    304: ('Not Modified',
          'Document has not changed since given time'),
    305: ('Use Proxy',
          'You must use proxy specified in Location to access this '
          'resource.'),
    307: ('Temporary Redirect',
          'Object moved temporarily -- see URI list'),

    400: ('Bad Request',
          'Bad request syntax or unsupported method'),
    401: ('Unauthorized',
          'No permission -- see authorization schemes'),
    402: ('Payment Required',
          'No payment -- see charging schemes'),
    403: ('Forbidden',
          'Request forbidden -- authorization will not help'),
    404: ('Not Found', 'Nothing matches the given URI'),
    405: ('Method Not Allowed',
          'Specified method is invalid for this server.'),
    406: ('Not Acceptable', 'URI not available in preferred format.'),
    407: ('Proxy Authentication Required', 'You must authenticate with '
          'this proxy before proceeding.'),
    408: ('Request Timeout', 'Request timed out; try again later.'),
    409: ('Conflict', 'Request conflict.'),
    410: ('Gone',
          'URI no longer exists and has been permanently removed.'),
    411: ('Length Required', 'Client must specify Content-Length.'),
    412: ('Precondition Failed', 'Precondition in headers is false.'),
    413: ('Request Entity Too Large', 'Entity is too large.'),
    414: ('Request-URI Too Long', 'URI is too long.'),
    415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
    416: ('Requested Range Not Satisfiable',
          'Cannot satisfy request range.'),
    417: ('Expectation Failed',
          'Expect condition could not be satisfied.'),

    500: ('Internal Server Error', 'Server got itself in trouble'),
    501: ('Not Implemented',
          'Server does not support this operation'),
    502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
    503: ('Service Unavailable',
          'The server cannot process the request due to a high load'),
    504: ('Gateway Timeout',
          'The gateway server did not receive a timely response'),
    505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
    }
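In current Python 3 versions that dictionary is generated from the http.HTTPStatus enum, so its entries may differ slightly from the reproduction above. A sketch of looking a code up both ways:

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# (short message, long message) pair for one status code.
short_msg, long_msg = BaseHTTPRequestHandler.responses[404]
print(short_msg, '-', long_msg)

# The same information via the enum (Python 3.5 and later).
status = HTTPStatus(404)
print(status.value, status.phrase, status.description)
```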


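When an error is raised, the server responds by returning an HTTP error code and an error page. You can use the HTTPError instance as a response for the page returned. This means that as well as the code attribute, it also has read, geturl, and info methods, as returned by the urllib.response module: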

>>> req = urllib.request.Request('http://www.python.org/fish.html')
>>> try:
...     urllib.request.urlopen(req)
... except urllib.error.HTTPError as e:
...     print(e.code)
...     print(e.read())
...
404
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
  ...
  <title>Page Not Found</title>\n
  ...
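The session above depends on python.org still answering 404 for that path. A self-contained sketch, assuming a throwaway local server built with http.server that returns 404 for every request, so you can see code, read(), info() and geturl() on the HTTPError:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class NotFoundHandler(BaseHTTPRequestHandler):
    """Answers every GET with status 404 and a small HTML error page."""
    def do_GET(self):
        body = b'<html><head><title>Page Not Found</title></head></html>'
        self.send_response(404)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

# Port 0 asks the OS for any free port.
server = HTTPServer(('127.0.0.1', 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/fish.html' % server.server_port

status = page = content_type = final_url = None
try:
    urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    status = e.code                           # integer status code
    page = e.read()                           # body of the error page
    content_type = e.info()['Content-Type']   # headers, as on a normal response
    final_url = e.geturl()
finally:
    server.shutdown()
    server.server_close()

print(status, content_type, final_url)
print(page)
```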

Wrapping it Up

So if you want to be prepared for HTTPError or URLError, there are two basic approaches. I prefer the second approach.

Number 1

from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request(someurl)  # someurl: the URL you want to fetch
try:
    response = urlopen(req)
except HTTPError as e:
    print("The server couldn't fulfill the request.")
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    # everything is fine
    pass

Note: except HTTPError must come first, otherwise except URLError will also catch an HTTPError.

Number 2

from urllib.request import Request, urlopen
from urllib.error import URLError

req = Request(someurl)  # someurl: the URL you want to fetch
try:
    response = urlopen(req)
except URLError as e:
    # Test 'code' first: in Python 3 an HTTPError also has a 'reason'
    # attribute, so testing 'reason' first would never reach this branch.
    if hasattr(e, 'code'):
        print("The server couldn't fulfill the request.")
        print('Error code: ', e.code)
    elif hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason: ', e.reason)
else:
    # everything is fine
    pass

info and geturl

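The response returned by urlopen (or the HTTPError instance) has two useful methods, info() and geturl(), and is defined in the module urllib.response.

geturl: this returns the real URL of the page fetched. This is useful because urlopen (or the opener object used) may have followed a redirect, so the URL of the page fetched may not be the same as the URL requested.

info: this returns a dictionary-like object that describes the page fetched, particularly the headers sent by the server. It is currently an http.client.HTTPMessage instance. Typical headers include "Content-length", "Content-type", and so on. See the Quick Reference to HTTP Headers for a useful listing of HTTP headers with brief explanations of their meaning and use.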
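A sketch of geturl() and info() after a redirect. It assumes a local http.server instance with hypothetical /old and /new paths, where /old answers with a 301 pointing to /new; urlopen follows the redirect, and geturl() reports the final URL:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class MovedHandler(BaseHTTPRequestHandler):
    """Redirects /old to /new and serves a short text body everywhere else."""
    def do_GET(self):
        if self.path == '/old':
            self.send_response(301)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
            return
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

server = HTTPServer(('127.0.0.1', 0), MovedHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

with urllib.request.urlopen(base + '/old') as response:
    final_url = response.geturl()                   # the URL after the redirect
    content_type = response.info()['Content-Type']  # a header sent by the server
    body = response.read()

server.shutdown()
server.server_close()
print(final_url, content_type, body)
```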

Openers and Handlers

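When you fetch a URL you use an opener (an instance of the perhaps confusingly named urllib.request.OpenerDirector). Normally we have been using the default opener, via urlopen, but you can create custom openers. Openers use handlers, and all the "heavy lifting" is done by the handlers. Each handler knows how to open URLs for a particular URL scheme (http, ftp, etc.), or how to handle an aspect of URL opening, for example HTTP redirections or HTTP cookies.

You will want to create openers if you want to fetch URLs with specific handlers installed, for example to get an opener that handles cookies, or to get an opener that does not handle redirections.

To create an opener, instantiate an OpenerDirector, and then call .add_handler(some_handler_instance) repeatedly. Alternatively, you can use build_opener, a convenience function for creating opener objects with a single function call. build_opener adds several handlers by default, but provides a quick way to add more and/or override the default handlers.

Other sorts of handlers you might want can handle proxies, authentication, and other common but slightly specialised situations.

install_opener can be used to make an opener object the (global) default opener. This means that calls to urlopen will use the opener you have installed.

Opener objects have an open method, which can be called directly to fetch URLs in the same way as the urlopen function: there's no need to call install_opener, except as a convenience.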
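As an example of a custom opener, here is a sketch of an opener that does not handle redirections. It assumes a local test server that always answers with a 302, and a hypothetical HTTPRedirectHandler subclass whose redirect_request() returns None. Passing that subclass to build_opener replaces the default redirect handler, and the opener is used directly through its open method:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class AlwaysRedirect(BaseHTTPRequestHandler):
    """Answers every request with a 302 redirect to /new."""
    def do_GET(self):
        self.send_response(302)
        self.send_header('Location', '/new')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, format, *args):
        pass

class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Declines every redirect, so the 3xx response surfaces as an HTTPError."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

server = HTTPServer(('127.0.0.1', 0), AlwaysRedirect)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/old' % server.server_port

# A subclass of HTTPRedirectHandler makes build_opener skip the default one.
opener = urllib.request.build_opener(NoRedirectHandler)

redirect_status = location = None
try:
    opener.open(url)   # no install_opener needed
except urllib.error.HTTPError as e:
    redirect_status = e.code              # the unfollowed 302
    location = e.headers['Location']      # where it wanted to send us
finally:
    server.shutdown()
    server.server_close()

print(redirect_status, location)
```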

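Basic Authentication

To illustrate creating and installing a handler we will use the HTTPBasicAuthHandler. For a more detailed discussion of this subject, including an explanation of how Basic Authentication works, see the Basic Authentication Tutorial. When authentication is required, the server sends a header (as well as the 401 error code) requesting authentication. This specifies the authentication scheme and a "realm". The header looks like this: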

WWW-Authenticate: Basic realm="cPanel Users"

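The client should then retry the request with the appropriate name and password for the realm included as a header in the request. This is "basic authentication". In order to simplify this process we can create an instance of HTTPBasicAuthHandler and an opener to use this handler.

The HTTPBasicAuthHandler uses an object called a password manager to handle the mapping of URLs and realms to passwords and usernames. If you know what the realm is (from the authentication header sent by the server), then you can use an HTTPPasswordMgr. Frequently one doesn't care what the realm is. In that case, it is convenient to use HTTPPasswordMgrWithDefaultRealm. This allows you to specify a default username and password for a URL, which will be supplied unless you provide an alternative combination for a specific realm. We indicate this by providing None as the realm argument to the add_password method.

The top-level URL is the first URL that requires authentication. URLs "deeper" than the URL you pass to .add_password() will also match.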

import urllib.request

# Hypothetical placeholder values: substitute your own.
username = 'user'
password = 'secret'
a_url = 'http://example.com/foo/bar.html'

# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# use the opener to fetch a URL
opener.open(a_url)

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)

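Note that in the above example we only supplied our HTTPBasicAuthHandler to build_opener. By default, openers have the handlers for normal situations: ProxyHandler (if a proxy setting such as an http_proxy environment variable is set), UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, DataHandler, HTTPErrorProcessor.

top_level_url is in fact either a full URL (including the "http:" scheme component and the hostname and optionally the port number), e.g. "http://example.com/", or an "authority" (i.e. the hostname, optionally including the port number), e.g. "example.com" or "example.com:8080" (the latter example includes a port number). The authority, if present, must NOT contain the "userinfo" component; for example, "joe:password@example.com" is not correct.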
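The matching rules for the password manager can be checked without a server. A sketch using made-up credentials: one entry uses a full top-level URL, the other the authority form, and find_user_password() returns (None, None) when nothing matches:

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# realm=None: use these credentials whatever realm the server announces.
mgr.add_password(None, 'http://example.com/foo/', 'joe', 'secret')
# The authority form: host plus optional port, no scheme and no userinfo.
mgr.add_password(None, 'example.com:8080', 'ann', 'pw2')

# A URL "deeper" than the top-level URL matches ...
deeper = mgr.find_user_password('cPanel Users', 'http://example.com/foo/bar.html')
# ... a sibling path does not ...
sibling = mgr.find_user_password('cPanel Users', 'http://example.com/other/')
# ... and any path on the authority-form entry matches.
on_port = mgr.find_user_password('cPanel Users', 'http://example.com:8080/any/path')

print(deeper, sibling, on_port)
```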

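Proxies

urllib will auto-detect your proxy settings and use those. This is through the ProxyHandler, which is part of the normal handler chain when a proxy setting is detected. Normally that's a good thing, but there are occasions when it may not be helpful [5]. One way to avoid the proxy is to set up our own ProxyHandler, with no proxies defined. This is done using similar steps to setting up a Basic Authentication handler: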

>>> proxy_support = urllib.request.ProxyHandler({})
>>> opener = urllib.request.build_opener(proxy_support)
>>> urllib.request.install_opener(opener)

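Note: at the time of writing, urllib.request did not support fetching https locations through a proxy; this could be enabled by extending urllib.request as shown in the recipe [6]. (Current Python 3 releases can tunnel HTTPS requests through a proxy using the CONNECT method.)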
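To see what a ProxyHandler changes, this sketch assumes a tiny local server that simply echoes back the request target it received. With an empty mapping the request goes straight to the server; with the server configured as the "http" proxy, the request line carries the absolute URL of a host (the reserved name example.invalid) that is never contacted. It also assumes no no_proxy environment setting covers example.invalid.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoPathHandler(BaseHTTPRequestHandler):
    """Replies with the request target it received (self.path)."""
    def do_GET(self):
        body = self.path.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass

server = HTTPServer(('127.0.0.1', 0), EchoPathHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
local = 'http://127.0.0.1:%d' % server.server_port

# Empty mapping: no proxy at all, even if http_proxy is set in the environment.
direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with direct.open(local + '/page') as resp:
    direct_target = resp.read().decode()

# Explicit mapping: treat the local server as the HTTP proxy.
via_proxy = urllib.request.build_opener(urllib.request.ProxyHandler({'http': local}))
with via_proxy.open('http://example.invalid/page') as resp:
    proxy_target = resp.read().decode()

server.shutdown()
server.server_close()
print(direct_target)  # path only
print(proxy_target)   # absolute URL, as a proxy expects
```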

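Sockets and Layers

The Python support for fetching resources from the web is layered. urllib uses the http.client library, which in turn uses the socket library. As of Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications which have to fetch web pages. By default the socket module has no timeout and can hang. In Python 3, urlopen() also accepts a timeout argument (in seconds) for an individual call; alternatively, you can set the default timeout globally for all sockets using: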

import socket
import urllib.request

# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)

# this call to urllib.request.urlopen now uses the default timeout
# we have set in the socket module
req = urllib.request.Request('http://www.voidspace.org.uk')
response = urllib.request.urlopen(req)
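A sketch of the per-call timeout argument of urlopen(), assuming a local server that deliberately waits one second before replying. Depending on where the timeout hits, urllib may raise socket.timeout (an alias of TimeoutError since Python 3.10) directly or wrapped in a URLError, so the sketch checks both:

```python
import socket
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlowHandler(BaseHTTPRequestHandler):
    """Waits one second before answering, to provoke a client timeout."""
    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
        except OSError:
            pass  # the client may already have closed the connection

    def log_message(self, format, *args):
        pass

server = HTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

try:
    urllib.request.urlopen(url, timeout=0.2)  # applies to this call only
    timed_out = False
except OSError as e:  # URLError and socket.timeout are both OSError subclasses
    timed_out = (isinstance(e, socket.timeout)
                 or isinstance(getattr(e, 'reason', None), socket.timeout))

server.shutdown()
server.server_close()
print(timed_out)
```

The per-call argument avoids changing the process-wide socket default, which would also affect unrelated code.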

Footnotes

This document was reviewed and revised by John Lee.

[1]	Like Google for example. The proper way to use google from a program is to use PyGoogle of course.

[2]	Browser sniffing is a very bad practise for website design – building sites using web standards is much more sensible. Unfortunately a lot of sites still send different versions to different browsers.

[3]	The user agent for MSIE 6 is ‘Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)’

[4]	For details of more HTTP request headers, see Quick Reference to HTTP Headers.

[5]	In my case I have to use a proxy to access the internet at work. If you attempt to fetch localhost URLs through this proxy it blocks them. IE is set to use the proxy, which urllib picks up on. In order to test scripts with a localhost server, I have to prevent urllib from using the proxy.

[6]	urllib opener for SSL proxy (CONNECT method): ASPN Cookbook Recipe.
