Commonly used Scrapy pipelines (file output, image download)

Posted by cms5 on 2020-09-28

These are the pipelines I commonly use when scraping with Scrapy, saved here as a backup.

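import json

import pymysql  # unused by the pipelines shown here
from scrapy import Request
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline
from twisted.enterprise import adbapi  # unused by the pipelines shown here


class YwnamePipeline:
    # Default pass-through pipeline: returns the item unchanged.
    def process_item(self, item, spider):
        return item


class myEncoder(json.JSONEncoder):
    # Decode bytes values as UTF-8 so that json.dumps can serialize them.
    def default(self, obj):
        if isinstance(obj, bytes):
            return str(obj, encoding='utf-8')
        return json.JSONEncoder.default(self, obj)


class FilePipeline(object):
    # Write every item to ywname.json as a single JSON array.
    def __init__(self):
        self.f = open("ywname.json", "w", encoding='utf-8')
        self.f.write("[")
        self.first = True  # no separator is needed before the first item

    def process_item(self, item, spider):
        # Convert the item to a dict, then serialize it to JSON.
        text = json.dumps(dict(item), ensure_ascii=False, cls=myEncoder)
        # text = json.dumps(dict(item), ensure_ascii=False)  # without myEncoder
        # Write the comma before every item except the first, so the array
        # has no trailing comma and the file stays valid JSON.
        if not self.first:
            self.f.write(",\n")
        self.first = False
        self.f.write(text)
        return item

    def close_spider(self, spider):
        self.f.write("]")
        self.f.close()


class DownimagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None, *, item=None):
        # Name the stored file after the last segment of the image URL.
        # The prefix is kept from the original listing; Scrapy uses the
        # return value as a path relative to IMAGES_STORE.
        url = request.url
        file_name = "https://haotingde.com/" + url.split('/')[-1]
        return file_name

    def item_completed(self, results, item, info):
        # results is a list of (ok, info) tuples, one per requested image.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem('Image download failed')
        return item

    def get_media_requests(self, item, info):
        # Request the image whose URL is stored in the item's 'url' field.
        yield Request(item['url'])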
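
The pipelines above only run once they are enabled in the project settings. Below is a minimal sketch of what that might look like. The module path `ywname.pipelines`, the priority numbers, and the `images` directory are assumptions for illustration and do not come from the original post; adjust them to your own project.

```python
# settings.py (sketch; the module path and the values below are assumptions)

# Lower numbers run first. The image pipeline is placed before the file
# pipeline so that an item dropped for a failed download is never written
# to the JSON file.
ITEM_PIPELINES = {
    "ywname.pipelines.DownimagesPipeline": 200,
    "ywname.pipelines.FilePipeline": 300,
}

# Directory where ImagesPipeline stores downloaded files; the value returned
# by file_path() is interpreted relative to this directory.
IMAGES_STORE = "images"
```

Note that Scrapy's `ImagesPipeline` requires the Pillow package, and that a pipeline raising `DropItem` stops the item from reaching any later pipeline.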

Reposted from: https://912616.com/app/python/307.html

From "ITPUB Blog", link: http://blog.itpub.net/29699285/viewspace-2724859/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
