Scraping images from the Baidu Tieba 美女 forum with Python

Posted by 艾利金德 on 2018-04-01

Original URL: https://juejin.im/post/5ac0cdf76fb9a028b617a718

# coding=utf-8
import requests
from lxml import etree
import os
import re


class TieBa(object):
    """Scrape images from the Baidu Tieba 美女 forum."""
    def __init__(self, word):
        self.url = 'https://tieba.baidu.com/f?kw={}'.format(word)
        self.headers = {
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; TUCOWS) '
        }

    def get_data(self, url):
        # Send the request
        response = requests.get(url, headers=self.headers)
        data = response.content
        # print(data)
        return data

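    def parse_page(self, data):
        """Parse a thread-list page."""
        # Build the lxml element tree used for XPath queries
        html = etree.HTML(data)
        # Extract the title and URL of each thread on the current page
        node_list = html.xpath('//*[@id="thread_list"]/li/div/div[2]/div[1]/div[1]/a')
        detail_list = []
        for node in node_list:
            temp = dict()
            temp['title'] = node.xpath('./text()')[0]
            temp['url'] = 'https://tieba.baidu.com' + node.xpath('./@href')[0]
            detail_list.append(temp)
            # print(temp)
        # Extract the next-page link. The result is an empty list on the last
        # page, so check it before indexing. Both the Simplified and the
        # Traditional spelling of the link label are matched.
        next_links = html.xpath(
            '//*[@id="frs_list_pager"]/a[contains(text(), "下一页") or contains(text(), "下一頁")]/@href')
        next_url = 'http:' + next_links[0] if next_links else None
        # print(next_url)
        return detail_list, next_url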

    def parse_detail(self, detail_list):
        """Collect the detail-page URLs."""
        data_url = []
        for detail in detail_list:
            data_url.append(detail['url'])
        return data_url

    def save_data(self, url):
        """Download and save the first image of a thread."""
        # Request the thread page
        data = self.get_data(url)
        # Build the lxml element tree used for XPath queries
        html = etree.HTML(data)
        # print(html)
        # print(url)
        # Get the image URL; skip threads that have no image or an empty page
        try:
            image_url = html.xpath('//*[contains(@id,"post_content")]/img[1]/@src')[0]
        except (IndexError, AttributeError):
            return
        print(image_url)
        # Only handle image URLs that end in .jpg
        if re.match(r'.*\.jpg$', image_url):
            # Request the image URL and fetch the image bytes
            image_data = self.get_data(image_url)
            filename = 'image/' + image_url.split('/')[-1]
            # print(filename)
            # Save the image
            with open(filename, 'wb') as f:
                f.write(image_data)

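    def run(self):
        # Create the image folder if it does not exist yet
        if not os.path.exists('image'):
            os.mkdir('image')
        next_url = self.url
        # Request the first page of the forum
        data = self.get_data(next_url)
        # Save the first page to a file to check whether it contains the data we need
        with open('tieba.json', 'wb') as f:
            f.write(data)
        # Keep going while there is a next page
        while next_url:
            # Get the titles and thread URLs on this page
            detail_list, next_url = self.parse_page(data)
            # Extract the detail-page URLs for this page
            data_url = self.parse_detail(detail_list)
            # Visit each URL
            for url in data_url:
                # Save the image
                self.save_data(url)
            # Request the next page only if there is one
            if next_url:
                data = self.get_data(next_url)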


if __name__ == '__main__':
    tb = TieBa('美女')
    tb.run()
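
The script above builds the next-page URL by prefixing 'http:' and derives the file name by splitting the image URL on '/'. As a minimal sketch of a slightly more defensive alternative, the two helpers below use only the standard library. The names `absolute_url` and `image_filename` are hypothetical and do not appear in the original post; treat this as one possible approach rather than the author's method.

```python
import os
from urllib.parse import urljoin, urlsplit


def absolute_url(base_url, href):
    """Resolve a relative or protocol-relative href against the page URL.

    Hypothetical helper: it reuses the scheme of base_url instead of
    hard-coding 'http:'.
    """
    return urljoin(base_url, href)


def image_filename(image_url, folder='image'):
    """Build a local path from the last path segment of the image URL.

    Hypothetical helper: the query string is ignored so it does not end
    up in the file name.
    """
    name = os.path.basename(urlsplit(image_url).path)
    return os.path.join(folder, name)
```

In `parse_page`, `absolute_url(self.url, href)` could stand in for the string concatenation, and in `save_data`, `image_filename(image_url)` could stand in for the `'image/' + ...` expression.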
