Crawler practice: scraping Dangdang's programming-book listings with Scrapy and saving them to MySQL

Posted by weixin_33860722 on 2016-12-28

Scraping target

Programming-book listings on Dangdang, at:
http://category.dangdang.com/cp01.54.06.00.00.00.html

Development environment

Python 3.5 / MySQL 5.6 / Scrapy 1.3
Python runs on Windows; MySQL runs on CentOS 6.7.
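Both scrapy and pymysql have to be installed on the Windows side; assuming pip is available, pip install scrapy pymysql covers both.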

Source code

items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class Dangdang01Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()    # book title
    link = scrapy.Field()     # detail-page link
    comment = scrapy.Field()  # review count
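Scrapy items support dict-style assignment, which is how the spider below fills in each field (dd["title"] = ...).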

dangdang.py (the spider)

# -*- coding: utf-8 -*-
import scrapy
from dangdang01.items import Dangdang01Item
from scrapy.http import Request

class DangdangSpider(scrapy.Spider):
    name = "dangdang"
    allowed_domains = ["dangdang.com"]
    start_urls = [
        'http://category.dangdang.com/cp01.54.06.00.00.00-srsort_score_desc-f0%7C0%7C0%7C0%7C0%7C1%7C0%7C0%7C0%7C0%7C0%7C0%7C0-shlist.html']

    def parse(self, response):
        # Each book on the list page carries its title in p.name, its
        # detail-page link on the cover anchor (a.pic), and its review
        # count in the a[name='P_pl'] anchor.
        title = response.xpath("//p[@class='name']/a/text()").extract()
        link = response.xpath("//a[@class='pic']/@href").extract()
        comment = response.xpath("//a[@name='P_pl']/text()").extract()

        # The three lists are parallel: one entry per book on the page.
        for t, lnk, cmt in zip(title, link, comment):
            dd = Dangdang01Item()
            dd["title"] = t
            dd["link"] = lnk
            dd["comment"] = cmt
            yield dd

        # Queue listing pages 1-100. The loop runs on every response and so
        # re-yields the same URLs, but Scrapy's duplicate filter drops any
        # request it has already seen.
        for i in range(1, 101):
            url = ("http://category.dangdang.com/pg" + str(i) +
                   "-cp01.54.06.00.00.00-srsort_score_desc-f0%7C0%7C0%7C0%7C0%7C1%7C0%7C0%7C0%7C0%7C0%7C0%7C0-shlist.html")
            yield Request(url, callback=self.parse)

Create the database and table

(Screenshot in the original post: creating the dangdang database and the book table in the MySQL client.)
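Since the screenshot is not reproduced here, what follows is a minimal sketch of an equivalent setup, run as a one-off script. The database name (dangdang), the table name (book) and the column names (title, link, commit) are taken from pipelines.py below; the VARCHAR sizes are assumptions, because the original column types are only visible in the screenshot.

import pymysql

# Connect to the MySQL host without selecting a database yet.
conn = pymysql.connect(host="192.168.1.188", user="root",
                       password="654321", charset="utf8")
cursor = conn.cursor()
cursor.execute("CREATE DATABASE IF NOT EXISTS dangdang DEFAULT CHARACTER SET utf8")
cursor.execute("USE dangdang")
# Column names match the INSERT in pipelines.py; the sizes are guesses.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS book (
        title    VARCHAR(255),
        link     VARCHAR(255),
        `commit` VARCHAR(64)
    )
""")
conn.commit()
conn.close()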

Check the privilege grants

(Screenshot in the original post: the privilege grants for the root user.)
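Because Scrapy runs on Windows while MySQL sits on the CentOS box, the root account has to be allowed to connect from a remote host. The exact grant used here is only visible in the screenshot; on MySQL 5.6 it would typically look like GRANT ALL PRIVILEGES ON dangdang.* TO 'root'@'%' IDENTIFIED BY '654321'; followed by FLUSH PRIVILEGES;.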

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import pymysql


class Dangdang01Pipeline(object):
    def __init__(self):
        # Connect to the remote MySQL server set up above.
        self.conn = pymysql.connect(host="192.168.1.188", user="root",
                                    password="654321", db="dangdang",
                                    charset="utf8")

    def process_item(self, item, spider):
        # Use a parameterized query so a quote inside a book title cannot
        # break the statement (the review-count column is named `commit`).
        sql = "insert into book(title, link, `commit`) values (%s, %s, %s)"
        cursor = self.conn.cursor()
        try:
            cursor.execute(sql, (item["title"], item["link"], item["comment"]))
            self.conn.commit()
        except Exception as e:
            print(e)
            self.conn.rollback()
        return item

    def close_spider(self, spider):
        # Scrapy passes the spider instance to this hook; the original
        # def close_spider(self) would raise a TypeError when called.
        self.conn.close()
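As the comment at the top of pipelines.py says, the pipeline only takes effect once it is registered in the project's settings.py, which the original post does not show. A minimal sketch (300 is just the conventional example priority):

ITEM_PIPELINES = {
    'dangdang01.pipelines.Dangdang01Pipeline': 300,
}

# Scrapy 1.x obeys robots.txt by default in new projects; if the category
# pages get filtered because of it, this (at your own risk) disables the check:
# ROBOTSTXT_OBEY = False

With the table created and the pipeline enabled, the crawl is started from the project root with scrapy crawl dangdang.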
