Learning Python Web Scraping (1): Using urllib

Posted by Amei1314 on 2016-10-10

1.urllib.urlopen

Opens a URL and returns a file-like object, which can then be used much like a regular file object.

In [1]: import urllib

In [2]: file = urllib.urlopen("http://www.baidu.com")

In [3]: file.readline()
Out[3]: '<!DOCTYPE html><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title>\n'
In [4]: file.getcode()
Out[4]: 200
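On Python 3, `urllib.urlopen` became `urllib.request.urlopen`, and the body comes back as bytes. The sketch below shows the same file-like reading in Python 3. It assumes a `data:` URL (handled by `urllib.request` since Python 3.4) in place of http://www.baidu.com, so it runs without network access. The URL contents are made up for illustration.

```python
from urllib.request import urlopen

# Assumption: a made-up data: URL stands in for http://www.baidu.com so this runs offline.
URL = "data:text/html;charset=utf-8,line%201%0Aline%202%0Aline%203%0A"


def read_lines(url):
    """Read the first line, then the remaining lines, with file-style methods."""
    with urlopen(url) as resp:      # the response object is also a context manager
        first = resp.readline()     # same as file.readline(), but returns bytes
        rest = resp.readlines()     # same as file.readlines()
    return first, rest


first, rest = read_lines(URL)
print(first)  # b'line 1\n'
print(rest)   # [b'line 2\n', b'line 3\n']
```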

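The object returned by urlopen provides these methods:

- read(), readline(), readlines(), fileno(), close(): used exactly like the corresponding methods of a file object

- info(): returns an httplib.HTTPMessage object holding the headers sent by the remote server

- getcode(): returns the HTTP status code of the response, e.g. 200 if the request succeeded or 404 if the URL was not found; for non-HTTP URLs it returns None

- geturl(): returns the URL that was actually retrieved, which can differ from the requested one if a redirect was followed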
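Here is a Python 3 sketch of getcode(), geturl() and info(). It assumes a throwaway local HTTP server instead of a real website; the `Handler` class and the `/ok` path are hypothetical.

One difference from Python 2's urllib.urlopen is worth noting. For a 404, `urllib.request` raises `urllib.error.HTTPError` instead of returning a response, but the exception still carries the status code and headers.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener


class Handler(BaseHTTPRequestHandler):
    """Hypothetical test server: /ok answers 200, every other path answers 404."""

    def do_GET(self):
        status = 200 if self.path == "/ok" else 404
        body = b"hello\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


opener = build_opener(ProxyHandler({}))  # ignore any *_proxy environment variables


def describe(url):
    """Return (status code, retrieved URL, Content-Type header) for url."""
    try:
        with opener.open(url) as resp:
            return resp.getcode(), resp.geturl(), resp.info()["Content-Type"]
    except HTTPError as err:
        # Python 3 raises on 4xx/5xx, but the error still exposes code, URL and headers.
        return err.code, err.geturl(), err.headers["Content-Type"]


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

ok = describe(base + "/ok")
missing = describe(base + "/missing")
server.shutdown()
server.server_close()
print(ok)       # status 200, the /ok URL, 'text/plain'
print(missing)  # status 404, the /missing URL, 'text/plain'
```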

2.urllib.urlretrieve

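urlretrieve downloads the resource the URL points to (for example an HTML page) to your local disk. If no filename is given, it is saved to a temporary file.

urlretrieve() returns a 2-tuple (filename, headers), where headers is the same header object that info() returns.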

Saving to a local file:

In [12]: file = urllib.urlretrieve("http://www.baidu.com","/tmp/baidu.html")

In [13]: ls /tmp/baidu.html
/tmp/baidu.html
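In Python 3, urlretrieve is still available as `urllib.request.urlretrieve`, although the docs list it under the legacy interface. The sketch below shows the (filename, headers) tuple it returns. It assumes a made-up `data:` URL in place of the real page and a temporary directory in place of /tmp.

```python
import os
import tempfile
from urllib.request import urlretrieve

# Assumption: a made-up data: URL stands in for http://www.baidu.com so this runs offline.
URL = "data:text/html,%3Chtml%3Ehello%3C/html%3E"

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "baidu.html")   # hypothetical local file name
    filename, headers = urlretrieve(URL, target)  # returns (filename, headers)
    with open(filename, "rb") as f:
        saved = f.read()

print(filename == target)        # True: the path passed in is returned
print(headers["Content-Type"])   # text/html
print(saved)                     # b'<html>hello</html>'
```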

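4.urllib.quote(url) and urllib.unquote(url), urllib.quote_plus(url) and urllib.unquote_plus(url)

urllib.quote(url): percent-encodes the string. Of the URL reserved characters, reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ",", every one except "/" is encoded (spaces become %20).

urllib.unquote(url): decodes a string encoded by quote, turning %xx escapes back into characters.

urllib.quote_plus(url): like quote, but "/" is encoded as well and spaces become "+".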

In [18]: urllib.quote("http://neeao.com/index.php?id=1") 
Out[18]: 'http%3A//neeao.com/index.php%3Fid%3D1'

In [19]: urllib.unquote("http%3A//neeao.com/index.php%3Fid%3D1")
Out[19]: 'http://neeao.com/index.php?id=1'

In [20]: urllib.quote_plus("http://neeao.com/index.php?id=1")
Out[20]: 'http%3A%2F%2Fneeao.com%2Findex.php%3Fid%3D1'

In [21]: urllib.unquote_plus("http%3A%2F%2Fneeao.com%2Findex.php%3Fid%3D1")
Out[21]: 'http://neeao.com/index.php?id=1'

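urllib.unquote_plus(url) is the inverse of quote_plus: it decodes %xx escapes and also turns "+" back into spaces.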
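The difference between the two pairs shows up with spaces, "/" and "+". Here is a short Python 3 sketch using a made-up input string; in Python 3 these functions live in `urllib.parse`.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

s = "a b/c+d"  # made-up input containing a space, "/" and "+"

q = quote(s)        # space -> %20, "/" kept (default safe="/"), "+" -> %2B
qp = quote_plus(s)  # space -> "+", "/" -> %2F, "+" -> %2B

print(q)                 # a%20b/c%2Bd
print(qp)                # a+b%2Fc%2Bd
print(unquote(q))        # a b/c+d  (matching pair round-trips)
print(unquote_plus(qp))  # a b/c+d  (matching pair round-trips)
print(unquote(qp))       # a+b/c+d  (mismatched pair: "+" stays a literal plus)

# The URL from the transcript above encodes the same way in Python 3.
print(quote("http://neeao.com/index.php?id=1"))  # http%3A//neeao.com/index.php%3Fid%3D1
```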

5.urllib.urlencode(query)

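Encodes a mapping (or a sequence of two-element tuples) of key-value pairs as a percent-encoded query string, joining the pairs with &.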
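In Python 3, urlencode lives in `urllib.parse`. The sketch below reuses the parameters from the transcripts and adds two made-up values. It shows three things: values are escaped with quote_plus, doseq=True expands a list into repeated keys, and parse_qs reverses the encoding. Because dicts keep insertion order from Python 3.7 on, the pair order differs from the Python 2 output shown in the transcripts.

```python
from urllib.parse import parse_qs, urlencode

# Same parameters as the transcripts; Python 3.7+ dicts keep insertion order.
params = urlencode({"spam": 1, "eggs": 2, "bacon": 0})
print(params)    # spam=1&eggs=2&bacon=0

# A sequence of 2-tuples also works; values are escaped with quote_plus.
escaped = urlencode([("q", "fish & chips")])
print(escaped)   # q=fish+%26+chips

# doseq=True turns a list value into repeated keys.
repeated = urlencode({"tag": ["a", "b"]}, doseq=True)
print(repeated)  # tag=a&tag=b

# parse_qs goes the other way, mapping each key to a list of values.
print(parse_qs(params))  # {'spam': ['1'], 'eggs': ['2'], 'bacon': ['0']}
```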

Combined with urlopen, it can be used to send GET and POST requests:

GET:

>>> import urllib
>>> params=urllib.urlencode({'spam':1,'eggs':2,'bacon':0})
>>> params
'eggs=2&bacon=0&spam=1'
>>> f=urllib.urlopen("http://python.org/query?%s" % params)
>>> print f.read()

POST (passing the encoded data as the second argument makes urlopen send a POST request):

>>> import urllib
>>> params = urllib.urlencode({'spam':1,'eggs':2,'bacon':0})
>>> f=urllib.urlopen("http://python.org/query",params)
>>> f.read()
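Here is a Python 3 sketch of both requests. It assumes a local echo server in place of http://python.org/query; the `EchoHandler` class is hypothetical. It differs from the Python 2 transcripts in two ways:

- The data argument must be bytes, so the urlencode result is encoded first.
- Supplying data is what switches the request from GET to POST.

The opener is built with an empty ProxyHandler so that proxy settings in the environment do not intercept the local address.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlencode
from urllib.request import ProxyHandler, build_opener


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for http://python.org/query: echoes method, path and body."""

    def _reply(self, payload):
        body = ("%s %s [%s]" % (self.command, self.path, payload)).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._reply("")

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._reply(self.rfile.read(length).decode())

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/query" % server.server_port
opener = build_opener(ProxyHandler({}))  # ignore any *_proxy environment variables

params = urlencode({"spam": 1, "eggs": 2, "bacon": 0})

# GET: the encoded parameters go into the URL's query string.
with opener.open(url + "?" + params) as f:
    get_reply = f.read().decode()

# POST: the encoded parameters go into the request body, which must be bytes.
with opener.open(url, data=params.encode("ascii")) as f:
    post_reply = f.read().decode()

server.shutdown()
server.server_close()
print(get_reply)   # GET /query?spam=1&eggs=2&bacon=0 []
print(post_reply)  # POST /query [spam=1&eggs=2&bacon=0]
```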
