scrapy（2）

Posted by 下雨天的眼睛 on 2024-05-22

Original URL: https://www.cnblogs.com/tudoot/p/18206846

import requests
from lxml import etree

class Houst(object):
    def __init__(self):
        self.url = "https://yibin.lianjia.com/ershoufang/pg{}/"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0"
        }

    def get_url_list(self):
        url_list = []
        for num in range(1, 11):
            url_list.append(self.url.format(num))
        return url_list

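    def get_data_index(self, url):
        # Fetch a page and return its HTML text, or None on any failure.
        try:
            # A timeout keeps one stalled request from hanging the whole crawl.
            response = requests.get(url, headers=self.headers, timeout=10)
        except requests.RequestException:
            return None
        response.encoding = "utf-8"
        if response.status_code == 200:
            return response.text
        return None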

    def parse_data_index(self, response):
        # Skip index pages that could not be downloaded.
        if not response:
            return
        html = etree.HTML(response)
        # Select every listing <li> under the result <ul>.
        data_list = html.xpath('//ul[@class="sellListContent"]//li[@class="clear LOGVIEWDATA LOGCLICKDATA"]')
        for data in data_list:
            title = data.xpath('./div[1]/div[1]/a/text()')[0]
            info = data.xpath("./div[1]/div[3]/div[1]/text()")[0]
            number = data.xpath("./div[1]/div[4]/text()")[0]
            # Extract the detail-page URL.
            detail_url = data.xpath('./div[1]/div[1]/a/@href')[0]
            # Request the detail page; skip this listing if the request failed.
            detail_resp = self.get_data_index(detail_url)
            if not detail_resp:
                continue
            # Parse the detail page.
            detail_html = etree.HTML(detail_resp)
            data_time = detail_html.xpath('//*[@id="introduction"]/div/div/div[2]/div[2]/ul/li[1]/span[2]/text()')[0]
            print(title, info, number, data_time)
        print("*-*-*-*-"*20)

    def main(self):
        for url in self.get_url_list():
            response = self.get_data_index(url)
            self.parse_data_index(response)


if __name__ == '__main__':
    spider = Houst()
    spider.main()
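
The `[0]` lookups in `parse_data_index` raise `IndexError` whenever an XPath expression matches nothing, which can happen if the page layout changes or the site returns a verification page instead of listings. One way to guard against that is sketched below. The helper name `first` and its `default` parameter are assumptions made for illustration and are not part of the original post; the sketch relies only on the fact that lxml's `xpath()` returns a plain list for node-set queries.

```python
def first(nodes, default=""):
    """Return the first XPath result, or `default` when nothing matched.

    `first` is a hypothetical helper, not part of the original script.
    String results are stripped of surrounding whitespace; other results
    (for example element nodes) are returned unchanged.
    """
    if not nodes:
        return default
    value = nodes[0]
    return value.strip() if isinstance(value, str) else value


# Plain lists stand in for xpath() results here.
print(first(["  Two-bedroom flat  "]))
print(first([], default="N/A"))
```

In the spider, a call such as `first(data.xpath('./div[1]/div[1]/a/text()'))` could replace the direct `[0]` index, so a listing with a missing field yields an empty string instead of stopping the crawl.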
