A small bonus: crawling large amounts of data efficiently with gevent coroutines

Posted by littlespider889 on 2020-10-18

Original article: https://blog.csdn.net/littlespider889/article/details/109148067

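Hi everyone, this is 天空之城. Today's small bonus: using gevent coroutines to crawl a large amount of data efficiently.
Without further ado, here is the code.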

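# Patch the standard library so that blocking I/O (including the sockets used
# by requests) cooperates with gevent. Do this as early as possible, before
# requests is imported.
from gevent import monkey
monkey.patch_all()
import gevent, time, requests
from bs4 import BeautifulSoup
from gevent.queue import Queue
start = time.time()  # record the start time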

# Request headers sent with every page request
header = {
    'Referer': 'https://movie.douban.com/top250?start=1&filter=',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:46.0) Gecko/20100101 Firefox/46.0'}

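# Build the page URLs. Each page lists 25 films, so the Top 250 needs the
# offsets 0, 25, ..., 225 (range(0, 225, 25) would stop at 200 and skip the last page).
url_list = []
for i in range(0, 250, 25):
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i)
    url_list.append(url)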

# Put every URL into a queue shared by all coroutines
work = Queue()
for url in url_list:
    work.put_nowait(url)

def crawler():
    # Each coroutine keeps taking URLs until the queue is empty
    while not work.empty():
        url = work.get_nowait()
        res = requests.get(url, headers=header)
        film = res.text
        # res.text is the full HTML source of the page; parse it with the 'html.parser' parser
        soup = BeautifulSoup(film, 'html.parser')
        # Collect every <div class="item"> found under the <li> tags into one list
        items = soup.find_all("div", class_="item")
        # Loop over the list and extract the details of each film
        for item in items:
            xuhao = item.find('em').text  # rank
            title = item.find(class_="title").text  # film title
            pingfen = item.find(class_="rating_num").text  # rating
            comment = item.find(class_="inq")  # short comment; missing for some films
            comment = comment.text if comment is not None else ''

            link = item.find('a')['href']  # link to the film's page
            # Print the information we extracted
            print(xuhao, title, pingfen, comment, link)


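# Spawn 5 coroutines that all run crawler(), then wait for them to finish
task_list = []
for x in range(5):
    task = gevent.spawn(crawler)
    task_list.append(task)
gevent.joinall(task_list)
print(time.time() - start)  # total elapsed time in seconds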
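
To see the queue-plus-workers pattern on its own, without touching the network, here is a minimal sketch. The names `run_pool` and `fake_fetch` and the `example.invalid` URLs are made up for illustration and are not part of the original script. `fake_fetch` stands in for `requests.get` by calling `gevent.sleep`, which hands control to the other coroutines much as a monkey-patched network call would while waiting. The sketch also catches `gevent.queue.Empty` instead of checking `work.empty()` first, which is just one possible way to end a worker.

```python
import time

import gevent
from gevent.queue import Queue, Empty


def run_pool(urls, fetch, workers=5):
    """Drain `urls` with `workers` coroutines and return {url: fetch(url)}."""
    work = Queue()
    for url in urls:
        work.put_nowait(url)

    results = {}

    def worker():
        while True:
            try:
                url = work.get_nowait()
            except Empty:
                return  # queue drained, this coroutine is done
            results[url] = fetch(url)

    tasks = [gevent.spawn(worker) for _ in range(workers)]
    gevent.joinall(tasks)
    return results


def fake_fetch(url):
    # Hypothetical stand-in for requests.get(): sleeping yields to the
    # other coroutines, so the waits overlap instead of adding up.
    gevent.sleep(0.1)
    return len(url)


if __name__ == "__main__":
    # Placeholder URLs, never actually requested
    urls = ['https://example.invalid/page/{}'.format(i) for i in range(10)]
    start = time.time()
    out = run_pool(urls, fake_fetch)
    print(len(out), 'pages in', round(time.time() - start, 2), 'seconds')
```

With 10 simulated pages and 5 workers, the elapsed time should be close to two sleep intervals rather than ten, which is the whole point of running the requests in coroutines.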

A screenshot of the scraped data is shown below.
[Image: screenshot of the scraped output]
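
A quick way to check that the page URLs cover the whole list is to generate them and look at the first and last one. This is a small sketch that assumes 250 films at 25 per page, as the list's name and the step of 25 in the script suggest. The helper name `page_urls` is made up for illustration.

```python
BASE = 'https://movie.douban.com/top250?start={}&filter='


def page_urls(total=250, per_page=25):
    """Build one URL per page; `start` is the zero-based offset of the first film."""
    return [BASE.format(offset) for offset in range(0, total, per_page)]


urls = page_urls()
print(len(urls))   # number of pages
print(urls[0])     # first page, start=0
print(urls[-1])    # last page, start=225
```

Because `range` excludes its stop value, a stop of 225 would leave out the page that starts at offset 225, so the stop has to be 250 to reach the last 25 films.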
