2. Python Crawler Basics: the urllib Library

Posted by wsc449 on 2018-02-07

Original URL: https://flycode.co/archives/168669

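# Hands-on with the urllib library in Python
# A systematic tour of the urllib module, starting from the basics: urlretrieve(), urlcleanup(), info(), getcode(), geturl()
import urllib.request
# urlretrieve() downloads a web page straight to a local file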
urllib.request.urlretrieve("http://www.hellobi.com",filename="/Users/xubin/myapp/pythonfile/urlretrieve.html")

# urlcleanup() clears the temporary files (cache) left behind by urlretrieve()
urllib.request.urlcleanup()
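
To see what urlretrieve() hands back, here is a minimal sketch. It assumes a local file:// URL as a stand-in for the website so that it can run without network access; the file names, the `report` callback, and the `progress` list are made up for illustration. Note also that the Python documentation lists urlretrieve() among its legacy interfaces.

```python
import tempfile
import urllib.request
from pathlib import Path

progress = []  # hypothetical log of the reporthook calls


def report(block_num, block_size, total_size):
    # urlretrieve() calls this once before the first block and once per block read
    progress.append((block_num, block_size, total_size))


with tempfile.TemporaryDirectory() as tmp:
    # Assumption: a local file plays the role of the remote page
    source = Path(tmp) / "source.html"
    source.write_bytes(b"<html>hello</html>")
    target = Path(tmp) / "copy.html"

    # The return value is a (local_filename, headers) tuple
    saved_path, headers = urllib.request.urlretrieve(
        source.as_uri(), filename=str(target), reporthook=report
    )
    copied = target.read_bytes()

    # Remove temporary files left behind by urlretrieve(), if any
    urllib.request.urlcleanup()

print(saved_path)
print(len(progress), "reporthook calls")
```

When `filename` is omitted for a remote URL, urlretrieve() writes to a temporary file instead, and those temporary files are what urlcleanup() removes.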

# info() returns the response headers (basic metadata about the response)
file=urllib.request.urlopen("http://www.hellobi.com")
print(file.info())

# getcode() returns the HTTP status code of the response; 200 means success
print(file.getcode())

# geturl() returns the URL that was actually fetched
print(file.geturl())
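
The three calls above can be tried without touching a real site. The sketch below assumes a throwaway HTTP server on 127.0.0.1 (the `HelloHandler` class and its response body are invented for this example) and reads the same information through the `status`, `url`, and `headers` attributes. The Python documentation marks getcode(), geturl(), and info() as deprecated since Python 3.9 in favour of those attributes.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a tiny HTML page."""

    def do_GET(self):
        payload = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the example output quiet


# Port 0 lets the operating system pick a free port
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# An empty ProxyHandler keeps any configured proxy out of the local request
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url, timeout=5) as resp:
    status = resp.status                         # same value as getcode()
    final_url = resp.url                         # same value as geturl()
    content_type = resp.headers["Content-Type"]  # headers is what info() returns
    body = resp.read()

server.shutdown()
server.server_close()

print(status, final_url, content_type, len(body))
```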

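# Setting a timeout
# Fetching a page takes time. If the server does not respond within the timeout, the request is treated as timed out and the page cannot be opened.
# Choose the timeout to match the server: for a fast server a short limit such as 2 seconds is enough; for a slow server allow longer, for example 100 seconds. Pass the limit in seconds as timeout=.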
file=urllib.request.urlopen("http://www.hellobi.com",timeout=1)

for i in range(0,10):
    try:
        file=urllib.request.urlopen("http://yum.iqianyue.com",timeout=0.1)
        data=file.read()
        print(len(data))
    except Exception as e:
        print("An exception occurred: "+str(e))
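
The loop above catches every exception the same way. As a rough sketch of how a timeout could be told apart from other failures, the helpers below are hypothetical names introduced here, not part of urllib. They rely on the fact that a timeout can surface either as a bare `socket.timeout` (for example while reading the body) or wrapped in the `reason` of a `urllib.error.URLError` (for example while connecting).

```python
import socket
import urllib.error
import urllib.request


def is_timeout(exc):
    """Return True if exc looks like a timeout from urlopen() or read()."""
    if isinstance(exc, socket.timeout):
        return True
    return (isinstance(exc, urllib.error.URLError)
            and isinstance(exc.reason, socket.timeout))


def fetch_with_retries(url, timeout=2.0, attempts=3):
    """Hypothetical helper: retry only on timeouts, re-raise anything else."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, socket.timeout) as exc:
            # Give up on non-timeout errors, or when the last attempt timed out
            if not is_timeout(exc) or attempt == attempts:
                raise
            print(f"attempt {attempt} timed out, retrying")
```

A call such as `fetch_with_retries("http://yum.iqianyue.com", timeout=0.1)` would then retry up to three times before letting the timeout propagate.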

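# Simulating HTTP requests automatically
# A client talks to a server through HTTP requests, and there are several request methods.
# The two used most are POST and GET, for example when logging in or searching.
# Logging in to a site usually takes a POST request.
# Searching for something usually takes a GET request.

# Search Baidu for a keyword from Python. This needs a GET request; in a GET request the parameters follow a ? in the URL.
# https://www.baidu.com/s?wd=python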
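import urllib.request
import urllib.parse
keywd="徐彬"
# Percent-encode the non-ASCII keyword so it can be placed in the URL
keywd=urllib.parse.quote(keywd)
url="http://www.baidu.com/s?wd="+keywd    # note: the author reports that https does not work here
req=urllib.request.Request(url)
data=urllib.request.urlopen(req).read()
# Save the raw response bytes to a local file
fh=open("/Users/xubin/myapp/pythonfile/百度python.html","wb")
fh.write(data)
fh.close()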
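
The URL-building step can be checked on its own, without sending a request. In this small sketch, `build_search_url` is a hypothetical helper name, and it uses `urlencode()` where the code above concatenates the result of `quote()`:

```python
from urllib.parse import quote, unquote, urlencode


def build_search_url(keyword, base="http://www.baidu.com/s"):
    """Hypothetical helper: append the keyword as the wd query parameter."""
    return base + "?" + urlencode({"wd": keyword})


ascii_url = build_search_url("python")
encoded = quote("徐彬")  # non-ASCII text becomes %XX escapes of its UTF-8 bytes

print(ascii_url)
print(encoded)
print(unquote(encoded))  # decoding restores the original keyword
```

`urlencode()` escapes each value itself and joins several parameters with `&`, which may be more convenient once a query has more than one field.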

# POST request: for example, logging a user in requires submitting a POST request
# http://passport.csdn.net/account/login    form fields: username for the user name, password for the password
import urllib.request
import urllib.parse
url="https://passport.csdn.net/account/login"
mydata=urllib.parse.urlencode({"username":"bingoxubin","password":"19900127LLBingo"}).encode("utf-8")
req=urllib.request.Request(url,mydata)
data=urllib.request.urlopen(req).read()
fh=open("/Users/xubin/myapp/pythonfile/csdn登入介面.html","wb")
fh.write(data)
fh.close()
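
What turns the request above into a POST is the second argument to `Request`: when `data` is supplied, urllib sends a POST, otherwise a GET. The sketch below shows this without contacting any server; the example.com URL and the form values are placeholders, not real credentials.

```python
import urllib.parse
import urllib.request

# Placeholder form fields; urlencode() yields a str, which must be encoded to bytes
form = {"username": "alice", "password": "not-a-real-password"}
body = urllib.parse.urlencode(form).encode("utf-8")

get_req = urllib.request.Request("http://example.com/login")
post_req = urllib.request.Request("http://example.com/login", data=body)

print(get_req.get_method())   # no data attached
print(post_req.get_method())  # data attached
print(post_req.data)
```

Nothing is sent until the request object is passed to `urlopen()`, so building and inspecting it this way is a cheap check before running against a real login form.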


```
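# Scrape all photos from the OA site and save them to OA照片.docx
# Known limitation: with what has been covered so far, only a single page can be scraped
import re
import urllib.request

# urlopen() needs a full URL with a scheme; "http://" is assumed here
data=urllib.request.urlopen("http://oa.epoint.com.cn").read()
data=data.decode("utf-8")
pat=""    # the regular expression was left empty in the original
mydata=re.compile(pat).findall(data)
# Note: this writes plain text lines; the result is not a real Word document
fh=open("/Users/xubin/myapp/pythonfile/OA照片.docx","w")
for i in range(0,len(mydata)):
    fh.write(mydata[i]+"\n")
fh.close()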
```
