Python Web Scraping in Practice (Part 2): Using requests-html

Posted by 吳小龍同學 on 2018-03-14

Original URL: https://juejin.im/post/5aa91dd35188255588051a63

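In "Python Web Scraping in Practice (Part 1): Using requests and BeautifulSoup", we used requests to make the network requests and BeautifulSoup to parse the pages that came back. Not long ago, kennethreitz, the author of requests, released a new library, requests-html ("Pythonic HTML Parsing for Humans™"), for parsing HTML documents. requests-html is a wrapper around existing libraries such as PyQuery, Requests, and lxml, which makes them more convenient to call.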

Installation

Mac:

pip3 install requests-html

Windows:

pip install requests-html

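Example

After writing so much code, let's take a break and look at some pictures of girls. The site I chose to scrape is http://www.win4000.com/zt/xinggan.html . Opening it shows a list page whose images are thumbnails. To save images locally we want the high-resolution versions, so we have to follow each list entry to its detail page and parse that as well. The complete code is as follows: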

from requests_html import HTMLSession
import requests
import time

session = HTMLSession()


# Parse the image list page
def get_girl_list():
    # session.get returns a response object
    response = session.get('http://www.win4000.com/zt/xinggan.html')

    content = response.html.find('div.Left_bar', first=True)

    li_list = content.find('li')

    for li in li_list:
        url = li.find('a', first=True).attrs['href']
        get_girl_detail(url)


# Parse an image detail page
def get_girl_detail(url):
    # session.get returns a response object
    response = session.get(url)
    content = response.html.find('div.scroll-img-cont', first=True)
    li_list = content.find('li')
    for li in li_list:
        img_url = li.find('img', first=True).attrs['data-original']
        # Cut the URL at the first '_' and re-append '.jpg' to get the full-size image URL
        img_url = img_url[0:img_url.find('_')] + '.jpg'
        print(img_url)
        save_image(img_url)


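# Save the full-size image
def save_image(img_url):
    img_response = requests.get(img_url)
    t = int(round(time.time() * 1000))  # millisecond timestamp, used as the file name
    # Images are binary files, so open in binary mode ('b');
    # 'wb' rather than 'ab' so a name collision cannot append two images into one file
    with open('/Users/wuxiaolong/Desktop/Girl/%d.jpg' % t, 'wb') as f:
        f.write(img_response.content)  # .content holds the raw bytes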


if __name__ == '__main__':
    get_girl_list()


That's all the code. Pretty simple, isn't it?

Notes:
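The two small steps buried in get_girl_detail and save_image, rewriting the thumbnail URL and choosing a local file name, can be pulled out into pure functions that are easy to check without touching the network. What follows is only a sketch under assumptions: the names full_size_url and target_path, the sample URL, and the idea of reusing the URL's base name as the file name are illustrative and not part of the original script. Unlike the script, this version only looks for '_' in the file-name part of the URL and leaves URLs without '_' unchanged.

```python
from pathlib import Path
from urllib.parse import urlsplit


def full_size_url(thumb_url):
    """Cut the URL's file name at its first '_' and append '.jpg'.

    Assumption: thumbnails are named like 'abc_250_350.jpg' and the
    full-size image is 'abc.jpg'. URLs without '_' are returned unchanged.
    """
    head, sep, name = thumb_url.rpartition('/')
    if '_' not in name:
        return thumb_url
    return head + sep + name.split('_', 1)[0] + '.jpg'


def target_path(img_url, out_dir):
    """Build a local path inside out_dir from the URL's base name."""
    name = Path(urlsplit(img_url).path).name
    return Path(out_dir) / name


# Hypothetical thumbnail URL, for illustration only
thumb = 'http://example.com/pic/2018/abc123_250_350.jpg'
full = full_size_url(thumb)
print(full)
print(target_path(full, 'Girl'))
```

Naming files after the image rather than after a timestamp also avoids two downloads in the same millisecond ending up with the same file name.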

1. Unlike BeautifulSoup, requests-html lets you call find directly with a selector. Typical forms are: tag, tag.someClass, tag#someID, tag[target=_blank]. Passing first=True returns only the first matching Element. For more usage, see http://html.python-requests.org/ .

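2. The local save path /Users/wuxiaolong/Desktop/Girl/ is hard-coded here, so readers need to change it to their own. The directory must already exist, since open() does not create it. If you pass only a file name, the image is saved relative to the current working directory, which is normally the project directory.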

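Open Issues

The example site is paginated, and the script does not handle that. You could loop over the pages on a timer to scrape the rest; interested readers can try it themselves.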
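One possible shape for that loop is sketched below. It is not the author's solution, and the URL pattern is an assumption that has not been checked against the site: page N is guessed to live at the first page's URL with '_N' inserted before '.html'. The helpers page_urls and crawl are hypothetical names. The page handler is passed in as a function, so the loop can be dry-run without any network access.

```python
import time


def page_urls(first_page, pages):
    """Yield list-page URLs for pages 1..pages.

    Assumption (not verified against the site): page N > 1 is reached by
    inserting '_N' before the file extension of the first page's URL.
    """
    stem, dot, ext = first_page.rpartition('.')
    for n in range(1, pages + 1):
        yield first_page if n == 1 else f'{stem}_{n}{dot}{ext}'


def crawl(urls, handle_page, delay=1.0, sleep=time.sleep):
    """Call handle_page(url) for each URL, pausing between requests."""
    count = 0
    for url in urls:
        if count:
            sleep(delay)  # be polite: wait before every request after the first
        handle_page(url)
        count += 1
    return count


# Dry run: collect the URLs instead of fetching them
seen = []
crawl(page_urls('http://www.win4000.com/zt/xinggan.html', 3), seen.append, delay=0)
print(seen)
```

To use this with the script above, get_girl_list would have to accept the page URL as a parameter so that it can be passed as handle_page.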

References

requests-html

Trying out the Requests-HTML library today (Python scraping)

WeChat Official Account

My WeChat official account: 吳小龍同學. Feel free to get in touch.

