Python: the requests module

Posted by 爾玉先生 on 2020-10-18

Original article: https://blog.csdn.net/weixin_44330955/article/details/109149415


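1. Web Crawler Overview

A web crawler is a program that simulates a browser going online and then fetches (scrapes) data from the Internet.

Types of crawlers:

General-purpose crawlers: an important component of crawling systems (such as search engines); they fetch the data of an entire page.
Focused crawlers: built on top of general-purpose crawlers; they extract only specific portions of a page's content.
Incremental crawlers: monitor a website for data updates and fetch only the data that has been newly updated.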
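The idea behind an incremental crawler can be sketched as a fingerprint check. The crawler hashes each item it fetches and processes only items whose fingerprint it has not recorded yet. This is a minimal in-memory sketch. The class name `SeenStore` is hypothetical, SHA-256 is just one reasonable choice of fingerprint, and the example URLs are made up. A real incremental crawler would persist the fingerprints between runs, for instance in a database.

```python
import hashlib


class SeenStore:
    """Hypothetical in-memory record of items an incremental crawler has already fetched."""

    def __init__(self):
        self._fingerprints = set()

    @staticmethod
    def fingerprint(item: str) -> str:
        # The SHA-256 digest of the item (e.g. a URL or a page's text) identifies it compactly
        return hashlib.sha256(item.encode("utf-8")).hexdigest()

    def is_new(self, item: str) -> bool:
        """Return True the first time an item is seen, False on every later call."""
        fp = self.fingerprint(item)
        if fp in self._fingerprints:
            return False
        self._fingerprints.add(fp)
        return True


store = SeenStore()
urls = [
    "https://example.com/news/1",
    "https://example.com/news/2",
    "https://example.com/news/1",  # already crawled earlier, so it is skipped
]
new_urls = [u for u in urls if store.is_new(u)]
print(new_urls)  # ['https://example.com/news/1', 'https://example.com/news/2']
```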

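Anti-crawling mechanisms: websites can adopt strategies or technical measures to prevent crawler programs from scraping their data.

Anti-anti-crawling strategies: crawler programs can in turn adopt strategies or technical measures to get around a website's anti-crawling mechanisms and obtain its data.

robots.txt protocol: a "gentleman's agreement" that specifies which data on a website crawlers may fetch and which they may not. Nothing enforces it technically; crawlers are expected to honor it voluntarily.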
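Python's standard library can check robots.txt rules through `urllib.robotparser`. The robots.txt content below is made up for illustration. Against a real site you would call `set_url("https://<site>/robots.txt")` and then `read()`, which downloads the file instead of parsing a string. Whether the crawler actually obeys the answer is still up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every crawler may fetch anything except /private/
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse the rules from a list of lines

allowed = parser.can_fetch("MyCrawler", "https://example.com/index.html")
blocked = parser.can_fetch("MyCrawler", "https://example.com/private/data")
print(allowed)  # True
print(blocked)  # False
```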

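HTTP: the protocol a server and a client use to exchange data. HTTPS (HTTP Secure) is the secure version of HTTP; it encrypts the traffic using certificate-based key encryption.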

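Common request headers:

User-Agent: identifies the client (the "carrier") that sends the request
Connection: whether to close the connection or keep it alive after the request completes

Common response headers:

Content-Type: the type of data the server returns to the client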
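The headers above can be inspected in requests without touching the network. `requests.utils.default_headers()` shows what requests sends by default. Preparing a request with `Request(...).prepare()` shows the final headers without sending anything. The `is_html` helper is a hypothetical example of acting on `Content-Type`. The hand-built `Response` object exists only so the sketch runs offline; in real code the response would come from `requests.get`.

```python
import requests

# requests sends "Connection: keep-alive" by default (one of its default headers)
print(requests.utils.default_headers()["Connection"])  # keep-alive

# Build (but do not send) a request that asks the server to close the connection
prepared = requests.Request(
    "GET", "https://www.sogou.com/", headers={"Connection": "close"}
).prepare()
print(prepared.headers["Connection"])  # close


def is_html(response: requests.Response) -> bool:
    """Hypothetical helper: True if the Content-Type header says the body is HTML."""
    content_type = response.headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" before comparing the media type
    return content_type.split(";")[0].strip().lower() == "text/html"


# Offline demonstration with a hand-built Response object
fake = requests.Response()
fake.headers["Content-Type"] = "text/html; charset=utf-8"
print(is_html(fake))  # True
```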

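2. The requests module

2.1 Overview of the requests module

requests is a Python module for sending network requests. It is powerful, simple, convenient, and efficient to use.

Its job is to simulate a browser sending requests.

How to use it (the coding workflow with requests):

Specify the URL (web address)
Send the request
Get the response data
Persist (save) the data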

2.2 Usage

Case 1: Scrape the page data of the Sogou homepage

#Goal: scrape the page data of the Sogou homepage

import requests

#1. Specify the URL
url = 'https://www.sogou.com/'
#2. Send the request
#get() returns a Response object
response = requests.get(url=url)
#3. Get the response data; .text returns the response body as a string
page_text = response.text
print(page_text)
#4. Persist the data
with open('./sogou.html','w',encoding='utf-8') as fp:
    fp.write(page_text)

print('Finished scraping')

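Case 2: Web page collector

UA: User-Agent (the identifier of the client that sends the request)

UA detection: a website's server checks the User-Agent of each request. If the User-Agent identifies a browser, the server treats the request as normal. If it does not identify any browser, the server treats the request as abnormal (that is, as coming from a crawler) and is likely to reject it.

UA spoofing: to get past UA detection, make the crawler's requests carry a User-Agent that identifies a browser.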
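The sketch below shows why UA spoofing is needed and one way to do it with a `Session`. Without spoofing, requests identifies itself as `python-requests/<version>`, and that is exactly the kind of User-Agent that UA detection rejects. The browser string is the one this article uses. Setting it on a `Session` is one design choice among several, since passing `headers=` on each call works too. The request is only prepared, not sent, so the result can be inspected offline.

```python
import requests

# Without spoofing, requests identifies itself as "python-requests/<version>"
print(requests.utils.default_user_agent())

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
)

with requests.Session() as session:
    # UA spoofing: every request made through this session reuses the browser UA
    session.headers.update({"User-Agent": BROWSER_UA})
    # Prepare (but do not send) a request to see the headers the server would receive
    prepared = session.prepare_request(requests.Request("GET", "https://www.sogou.com/"))

print(prepared.headers["User-Agent"] == BROWSER_UA)  # True
```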

#Case 2: Web page collector

import requests

#UA spoofing: put the browser User-Agent into a dict
header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

#Baidu's search results page; the search keyword goes in the 'wd' parameter
url = 'https://www.baidu.com/s'

#Handle the parameters carried by the URL: wrap them in a dict
kw = input('enter a word: ')
param = {'wd':kw}

#Send the request; requests appends params to the URL as a query string
#url: the page to request
#params: the search keyword
#headers: the spoofed UA
response = requests.get(url=url,params=param,headers=header)

page_text = response.text
fileName = kw + '.html'
with open(fileName,'w',encoding='utf-8') as fp:
    fp.write(page_text)

print(fileName,'saved successfully!!!')
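To see what `params=` does, you can prepare a search request without sending it and print its final URL. The Baidu search URL and the `wd` key follow Baidu's public search page. The keywords are arbitrary examples. Non-ASCII keywords are percent-encoded as UTF-8 automatically, so no manual encoding is needed.

```python
import requests

# Prepare (but do not send) a search request to see how params is encoded into the URL
prepared = requests.Request(
    "GET", "https://www.baidu.com/s", params={"wd": "python"}
).prepare()
print(prepared.url)  # https://www.baidu.com/s?wd=python

# Non-ASCII keywords are percent-encoded as UTF-8
prepared_zh = requests.Request(
    "GET", "https://www.baidu.com/s", params={"wd": "爬蟲"}
).prepare()
print(prepared_zh.url)  # https://www.baidu.com/s?wd=%E7%88%AC%E8%9F%B2
```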

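Case 3: Cracking Baidu Translate

#Case 3: Cracking Baidu Translate
import requests
import json

#Specify the URL (the AJAX endpoint that returns word suggestions)
post_url = 'https://fanyi.baidu.com/sug'

#UA spoofing
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

#Handle the POST parameters (same as for GET: wrap them in a dict)
word = input('enter a word: ')
data = {'kw':word}

#Send the request
#POST sends data in the request body; GET sends parameters in the URL query string
response = requests.post(url=post_url,data=data,headers=headers)

#Get the response data: .json() parses the body into a Python object (dict/list);
#only use it when you are sure the response is JSON
dic_obj = response.json()

#Persist the data
fileName = word + '.json'
with open(fileName,'w',encoding='utf-8') as fp:
    json.dump(dic_obj,fp=fp,ensure_ascii=False)

print('over!!!')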
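The difference between POST and GET shows up when both requests are prepared offline and their URL and body are compared. With `data=`, the pair travels in a form-encoded request body. With `params=`, the same pair goes into the URL instead. The `json=` variant is shown for comparison only; the Baidu endpoint used above expects form data. The URL and the word "dog" are example values.

```python
import json

import requests

URL = "https://fanyi.baidu.com/sug"

# POST with data=: the pair goes into a form-encoded request body
post_req = requests.Request("POST", URL, data={"kw": "dog"}).prepare()
print(post_req.url)                      # https://fanyi.baidu.com/sug
print(post_req.body)                     # kw=dog
print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded

# GET with params=: the same pair goes into the URL query string, with no body
get_req = requests.Request("GET", URL, params={"kw": "dog"}).prepare()
print(get_req.url)   # https://fanyi.baidu.com/sug?kw=dog
print(get_req.body)  # None

# POST with json=: a JSON body with a different Content-Type
json_req = requests.Request("POST", URL, json={"kw": "dog"}).prepare()
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # {'kw': 'dog'}
```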

Case 4: Scrape Douban movie data

#Case 4: Scrape Douban movie data
import requests
import json

#AJAX endpoint behind Douban's movie list page
url = 'https://movie.douban.com/j/search_subjects'
param = {
    'type':'movie',
    'tag':'经典', #tag to filter by ("classic"); Douban tags are in Simplified Chinese
    'sort':'recommend',
    'page_limit':'20', #number of movies to fetch per request
    'page_start':'0', #offset: index of the first movie to fetch
}

#UA spoofing
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

response = requests.get(url=url,params=param,headers=headers)

json_data = response.json()
with open('./douban.json','w',encoding='utf-8') as fp:
    json.dump(json_data,fp=fp,ensure_ascii=False)

print('over!!!')
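To fetch more than one batch of Douban results, you change the offset on each request. The sketch assumes, as the parameter names suggest, that `page_start` is the index of the first movie and `page_limit` is the batch size. Under that assumption, 0-based page n starts at n × page_limit. `douban_page_params` is a hypothetical helper. Each dict it returns would be passed as `params=` to `requests.get` along with the spoofed headers, ideally with a pause between requests.

```python
def douban_page_params(page: int, tag: str, page_size: int = 20) -> dict:
    """Hypothetical helper: query parameters for a 0-based page of Douban results.

    Assumes page_start is an offset, so page n starts at n * page_size.
    """
    return {
        "type": "movie",
        "tag": tag,
        "sort": "recommend",
        "page_limit": str(page_size),         # how many movies per request
        "page_start": str(page * page_size),  # offset of the first movie
    }


for page in range(3):
    p = douban_page_params(page, tag="经典")
    print(p["page_start"], p["page_limit"])
# 0 20
# 20 20
# 40 20
```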
