What Are the Functional Components of a Crawler Framework?

1. The scheduler

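What does a crawler framework need to provide? Scrapy and pyspider ship with an HTTP request library, HTML parsing tools, database storage, and so on, but their real core is the scheduler: the part that decides how your fetching, parsing, and persistence steps work together.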
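To make the scheduler's role concrete, here is a minimal sketch of one. It is not how Scrapy, pyspider, or spidery are implemented. The class name `MiniScheduler`, its `persist` decorator, and the convention that the parser yields `(item, next_url)` pairs are all assumptions made for illustration (the pair shape is loosely modeled on the `yield question, None` in the example in section 3). A dictionary stands in for the network so the sketch runs offline.

```python
from collections import deque


class MiniScheduler:
    """Hypothetical minimal scheduler: owns the URL queue and the call order."""

    def __init__(self, urls):
        self.queue = deque(urls)
        self.seen = set(urls)
        self._fetch = self._parse = self._persist = None

    # Decorators that register the three pluggable stages.
    def fetch(self, func):
        self._fetch = func
        return func

    def parse(self, func):
        self._parse = func
        return func

    def persist(self, func):
        self._persist = func
        return func

    def run(self):
        # Drain the queue: fetch -> parse -> persist, enqueueing unseen URLs.
        while self.queue:
            url = self.queue.popleft()
            response = self._fetch(url)
            for item, next_url in self._parse(response):
                if item is not None:
                    self._persist(item)
                if next_url is not None and next_url not in self.seen:
                    self.seen.add(next_url)
                    self.queue.append(next_url)


# Stand-in "site": a dict instead of real HTTP (assumption for the demo).
PAGES = {"page1": ("first", "page2"), "page2": ("second", None)}
stored = []

scheduler = MiniScheduler(["page1"])


@scheduler.fetch
def fetch(url):
    return PAGES[url]


@scheduler.parse
def parse(response):
    text, next_url = response
    yield text.upper(), next_url


@scheduler.persist
def persist(item):
    stored.append(item)


scheduler.run()
print(stored)
```

The scheduler itself never touches HTTP, HTML, or files. It only decides which stage runs next and which URLs are still pending, which is why the three stages can be swapped freely.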

2. Fetching, parsing, and persistence

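A minimal crawler framework needs only a scheduler. Fetching, parsing, and persistence can all be plugged in as extensions, as in gaoxinge/spidery. And because such a minimal framework consists of nothing but a scheduler, it can also be used for work that has nothing to do with crawling.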
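As a rough illustration of the non-crawler case, the sketch below runs a scheduler loop over an in-memory dependency tree instead of web pages. The function `run_pipeline`, the `DEPENDENCIES` data, and the `(item, new_task)` pair convention are hypothetical and are not taken from spidery.

```python
from collections import deque


def run_pipeline(seeds, fetch, parse, persist):
    """Hypothetical scheduler loop; nothing in it is specific to HTTP or HTML."""
    queue = deque(seeds)
    while queue:
        task = queue.popleft()
        for item, new_task in parse(fetch(task)):
            if item is not None:
                persist(item)
            if new_task is not None:
                queue.append(new_task)


# Non-crawler job: walk an in-memory dependency tree breadth-first.
DEPENDENCIES = {
    "app": ["web", "db"],
    "web": ["http"],
    "db": [],
    "http": [],
}
visited = []


def fetch(name):
    # "Fetching" is just a dictionary lookup here.
    return name, DEPENDENCIES[name]


def parse(node):
    name, children = node
    yield name, None        # emit the node itself as an item
    for child in children:
        yield None, child   # schedule each dependency as a new task


run_pipeline(["app"], fetch, parse, visited.append)
print(visited)
```

The three callbacks carry all the domain knowledge. The loop only manages the queue, so the same loop could drive file processing, graph traversal, or any other job that produces results and follow-up tasks.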

3. Example

# -*- coding: utf-8 -*-
"""
url: 
fetch: requests
parse: lxml
presist: txt
"""
import requests
from lxml import etree
from spidery import Spider

spider = Spider(
    # The URL prefix is empty in this listing; pages 1-3 are requested, sorted by votes.
    urls=['' + str(i) + '&sort=votes' for i in range(1, 4)],
)


@spider.fetch
def fetch(url):
    # Download the page and hand the response to the parse step.
    response = requests.get(url)
    return response


@spider.parse
def parse(response):
    root = etree.HTML(response.text)
    # One <div class="question-summary"> per question on the listing page.
    results = root.xpath('//div[@class="question-summary"]')
    for result in results:
        question = {}
        # The first two <strong> texts in the stats container are votes and answers.
        question['votes']   = result.xpath('div[@class="statscontainer"]//strong/text()')[0]
        question['answers'] = result.xpath('div[@class="statscontainer"]//strong/text()')[1]
        question['views']   = result.xpath('div[@class="statscontainer"]/div[@class="views supernova"]/text()')[0].strip()
        question['title']   = result.xpath('div[@class="summary"]/h3/a/text()')[0]
        question['link']    = result.xpath('div[@class="summary"]/h3/a/@href')[0]
        yield question, None


@spider.presist
def presist(item):
    # Write one item per line.
    f.write(str(item) + '\n')


# Open in text mode, since str values are written.
f = open('stackoverflow.txt', 'w', encoding='utf-8')
spider.consume_all()
f.close()

That covers the functional components of a crawler framework. Once you are familiar with the usage shown here, you can start practicing on your own. Pairing the crawler with proxy IPs is recommended.

Recommended environment: Windows 7, Python 3.9.1, DELL G3 computer.
