Using Scrapy

Posted by HelloJacker on 2024-04-12

Original URL: https://www.cnblogs.com/hellojacker/p/18119345

Creating a project
cd xx
scrapy startproject <project_name> [dir]
cd <project_dir>
scrapy genspider <spider_name> <domain>
scrapy crawl <spider_name>    # the spider's name attribute, not its file name
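spiders
Define the detailed crawling rules
items
Define the structure of the scraped data
middlewares
Middleware (spider and downloader middlewares)
pipelines
Item pipelines, responsible for cleaning and persistent storage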

Example: storing items in MongoDB

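import pymongo


class MongoDBPipeline:
    def __init__(self):
        # Create the client connection
        self.client = pymongo.MongoClient(host='localhost', port=27017)
        # Select the database
        self.db = self.client["test"]
        # Select the collection
        self.col = self.db["j"]

    def process_item(self, item, spider):
        # Insert the item as one document (convert the Item to a plain dict first)
        self.col.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        # Release the connection when the spider finishes
        self.client.close()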

Example: saving items to MySQL

import pymysql.cursors


class BookschinaPipeline:
    def __init__(self):
        # database: name of the MySQL database to write to
        self.conn = pymysql.connect(
            host='localhost',
            port=3306,
            user='root',
            password='',
            database='spiders',
            charset='utf8mb4',
            cursorclass=pymysql.cursors.DictCursor
        )
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # Parameterized query: the driver escapes the values (no SQL injection)
        sql = """insert into bookschina_goods (name, price, author, out_date, publisher)
                 values (%s, %s, %s, %s, %s)"""
        self.cursor.execute(sql, (
            item.get('name', ''),
            item.get('price', ''),
            item.get('author', ''),
            item.get('out_date', ''),
            item.get('publisher', '')
        ))
        # Commit after every item
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()
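The pipeline above commits once per item. When a crawl produces many rows, one option is to buffer items and write them in batches with `executemany()`. The sketch below is a hypothetical `BatchedSQLPipeline` for any DB-API 2.0 connection. To run without a MySQL server, it is shown with the standard-library `sqlite3` module, whose placeholder is `?`. With `pymysql` the placeholders would be `%s`, as in the example above. The default batch size of 100 is an arbitrary assumption.

```python
import sqlite3

# Column order used for every row (same columns as bookschina_goods above)
FIELDS = ("name", "price", "author", "out_date", "publisher")


class BatchedSQLPipeline:
    """Hypothetical pipeline: buffers rows and flushes them with executemany()."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []
        # sqlite3 uses "?" placeholders; pymysql would need "%s" instead
        self.sql = (
            f"insert into bookschina_goods ({', '.join(FIELDS)}) "
            f"values ({', '.join('?' for _ in FIELDS)})"
        )

    def process_item(self, item, spider):
        # Missing fields default to an empty string, as in the original pipeline
        self.buffer.append(tuple(item.get(f, "") for f in FIELDS))
        if len(self.buffer) >= self.batch_size:
            self.flush()
        return item

    def flush(self):
        # Write all buffered rows in one executemany() call and one commit
        if self.buffer:
            cursor = self.conn.cursor()
            cursor.executemany(self.sql, self.buffer)
            self.conn.commit()
            cursor.close()
            self.buffer.clear()

    def close_spider(self, spider):
        # Write whatever is left over, then close the connection
        self.flush()
        self.conn.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table bookschina_goods (name, price, author, out_date, publisher)")
    pipeline = BatchedSQLPipeline(conn, batch_size=2)
    for i in range(3):
        pipeline.process_item({"name": f"book {i}", "price": "9.9"}, spider=None)
    pipeline.close_spider(spider=None)
```

Inside a Scrapy project the connection would normally be created in `open_spider` or `from_crawler`. It is passed in here only to keep the sketch testable. The trade-off is that up to `batch_size - 1` rows sit in memory and are lost if the process dies before `close_spider` runs.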

settings
Configuration file (settings.py)
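Pipelines only run after they are enabled in `settings.py`. Below is a minimal fragment. It assumes the project is called `myproject` (a hypothetical name) and that both pipeline classes live in `myproject/pipelines.py`. The integers, in the 0-1000 range, set the order, and lower values run first. The `DOWNLOAD_DELAY` value is only an example.

```python
# settings.py (fragment); "myproject" is a placeholder project name
BOT_NAME = "myproject"

# Respect robots.txt rules (newly generated projects set this to True)
ROBOTSTXT_OBEY = True

# Seconds to wait between requests to the same site (example value)
DOWNLOAD_DELAY = 1

# Enabled item pipelines: lower numbers run first
ITEM_PIPELINES = {
    "myproject.pipelines.MongoDBPipeline": 300,
    "myproject.pipelines.BookschinaPipeline": 400,
}
```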
Shell command
scrapy shell <url>
