Python web crawling -- project practice -- embedding Selenium in Scrapy to scrape cascaded comments on Xinpianchang (6)

Posted by 太原浪子 on 2020-10-23

Original URL: https://blog.csdn.net/u010671028/article/details/109234797


1. Goal

Scrape the comments on a film's cascaded page on Xinpianchang (xinpianchang.com).

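2. Analysis

2.1 Page analysis

Inspecting the page shows that its comments are loaded dynamically, so a plain HTTP download does not contain them. This time we therefore use Selenium to render the page. We only fetch the data here and do not store it.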
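One way to confirm that the comments are loaded dynamically is to count the comment items in the raw HTML and compare that with the rendered page. The sketch below uses only the standard library. The two HTML strings are hypothetical stand-ins, not the real Xinpianchang markup, and `CommentItemCounter` and `count_comment_items` are names invented for this illustration.

```python
from html.parser import HTMLParser


class CommentItemCounter(HTMLParser):
    """Count <li> elements directly inside a <ul> whose class contains 'comment-list'."""

    def __init__(self):
        super().__init__()
        self.ul_depth = 0  # > 0 while we are inside a matching <ul>
        self.items = 0

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        if tag == "ul" and (self.ul_depth or "comment-list" in classes):
            # Track nested lists too, so their <li> items are not counted
            self.ul_depth += 1
        elif tag == "li" and self.ul_depth == 1:
            self.items += 1

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth:
            self.ul_depth -= 1


def count_comment_items(html):
    parser = CommentItemCounter()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical markup: a plain HTTP fetch vs. the browser-rendered DOM
raw_html = '<div id="comments"><ul class="comment-list"></ul></div>'
rendered_html = (
    '<ul class="comment-list clearfix">'
    '<li><div><div><i class="text">Great film</i></div></div></li>'
    '<li><div><div><i class="text">Nice shots</i></div></div></li>'
    '</ul>'
)

print(count_comment_items(raw_html))       # 0
print(count_comment_items(rendered_html))  # 2
```

In practice, `raw_html` would come from a plain download (for example `response.text` inside `scrapy shell`) and `rendered_html` from Selenium's `driver.page_source`. A count of zero in the first and a non-zero count in the second suggests the list is filled in by JavaScript.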

3. Complete code

xpc.py

import scrapy


class XpcSpider(scrapy.Spider):
    name = 'xpc'
    allowed_domains = ['www.xinpianchang.com']
    start_urls = ['https://www.xinpianchang.com/a10975710?from=ArticleList']

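    def parse(self, response):
        # The response is the Selenium-rendered page built in the downloader middleware,
        # so the dynamically loaded comment list is present in the HTML.
        results = response.xpath("//ul[contains(@class, 'comment-list')]/li/div/div/i[@class='text']/text()").extract()
        # Only print the comment texts; nothing is stored
        print(results)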
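To make the XPath in `parse` easier to follow, here is a rough standard-library approximation of what it selects, run against a small hand-written sample. The sample markup is hypothetical and much simpler than the real page. `xml.etree.ElementTree` supports only a subset of XPath (no `contains()`), so the class check is done in Python. This illustrates the selection logic and is not a replacement for Scrapy's selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (must be well-formed XML for ElementTree)
sample = """
<div>
  <ul class="comment-list clearfix">
    <li><div><div><i class="text">Great film</i><i class="time">1 day ago</i></div></div></li>
    <li><div><div><i class="text">Nice shots</i></div></div></li>
  </ul>
  <ul class="related">
    <li><div><div><i class="text">not a comment</i></div></div></li>
  </ul>
</div>
"""


def extract_comments(markup):
    root = ET.fromstring(markup.strip())
    comments = []
    for ul in root.iter("ul"):
        # Counterpart of //ul[contains(@class, 'comment-list')]
        if "comment-list" not in ul.get("class", ""):
            continue
        # Counterpart of /li/div/div/i[@class='text']/text()
        for node in ul.findall("./li/div/div/i[@class='text']"):
            if node.text:
                comments.append(node.text)
    return comments


print(extract_comments(sample))  # ['Great film', 'Nice shots']
```

Only `<i class="text">` nodes under the comment list are returned. The timestamp node and the unrelated list are skipped.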

middlewares.py

In this file, only the process_request function needs to be changed.
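A downloader middleware only runs if it is enabled in settings.py, where the Scrapy project template leaves `DOWNLOADER_MIDDLEWARES` commented out. A minimal sketch of that setting follows. The package name `scrapyadvanced` is an assumption inferred from the middleware class name, so adjust it to the actual project. 543 is the priority the project template suggests.

```python
# settings.py (fragment)
# The dotted path is an assumption: it presumes the project package is named
# "scrapyadvanced", which is guessed from the middleware class name.
DOWNLOADER_MIDDLEWARES = {
    "scrapyadvanced.middlewares.ScrapyadvancedDownloaderMiddleware": 543,
}
```

The middleware class itself is shown next.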

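from time import sleep

from scrapy import signals
from scrapy.http import HtmlResponse
from selenium.webdriver.chrome.webdriver import WebDriver

# Assumes the default Scrapy project layout, with xpc.py in the spiders/ package
from .spiders.xpc import XpcSpider


class ScrapyadvancedDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called

        if isinstance(spider, XpcSpider):
            # This is a convenient place to add a random User-Agent, cookies, or a proxy
            print("Reached the middleware hook:", request.url)

            # Launch Chrome and let it load the page
            driver = WebDriver()
            try:
                driver.get(request.url)
                # Fixed pause to give the dynamically loaded comments time to appear
                sleep(2)
                # Grab the rendered page source
                content = driver.page_source
            finally:
                # Always close the browser, otherwise each request leaves a Chrome process behind
                driver.quit()

            # Build a Response from the rendered HTML; returning it skips Scrapy's own download
            return HtmlResponse(request.url, body=content.encode("utf-8"), encoding="utf-8", request=request)

        # Any other spider: let Scrapy download the request as usual
        return None

    # spider_opened and the other template methods are unchanged and omitted here.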
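The fixed `sleep(2)` in `process_request` either wastes time when the comments load quickly or returns too early when they load slowly. One alternative is to poll the page source until the comments show up or a timeout expires. The sketch below shows that idea with the standard library only. `wait_for_page` and `FakeDriver` are hypothetical names introduced here, and `FakeDriver` merely imitates the `page_source` attribute of a Selenium driver so that the example runs without a browser.

```python
import time


def wait_for_page(driver, is_ready, timeout=10.0, poll=0.5):
    """Poll driver.page_source until is_ready(html) is true or the timeout expires.

    Returns the last HTML seen either way, so the caller can still build a response.
    """
    deadline = time.monotonic() + timeout
    html = driver.page_source
    while not is_ready(html) and time.monotonic() < deadline:
        time.sleep(poll)
        html = driver.page_source
    return html


class FakeDriver:
    """Stand-in for a Selenium driver: the comments 'appear' on the third read."""

    def __init__(self):
        self.reads = 0

    @property
    def page_source(self):
        self.reads += 1
        if self.reads < 3:
            return "<ul class='comment-list'></ul>"
        return "<ul class='comment-list'><li>hello</li></ul>"


driver = FakeDriver()
html = wait_for_page(driver, lambda h: "<li" in h, timeout=2.0, poll=0.01)
print(driver.reads, "<li" in html)  # 3 True
```

In the middleware, a call such as `content = wait_for_page(driver, some_check)` could take the place of `sleep(2)` followed by `driver.page_source`, where `some_check` tests for the comment markup. Selenium also ships its own polling helper, `WebDriverWait` in `selenium.webdriver.support.ui`, which is usually the more idiomatic choice with a real driver.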
