Scraping Marvel Superhero Movie Posters with a Node.js Crawler

Posted by 木子昭 on 2018-05-12

Original URL: https://flycode.co/archives/168301

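Yesterday I went to the premiere of Avengers: Infinity War. When I walked into the cinema 15 minutes early and saw the long line of fans collecting their tickets, it suddenly felt like New Year's Eve…
I have recently been reading up on writing crawlers in Node, so here I use Node to scrape the movie posters from the official Marvel site!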


// https://marvel.com/movies/all
const request = require(`superagent`)
const cheerio = require(`cheerio`)
const fs = require(`fs-extra`)
const path = require(`path`)

let url = `https://marvel.com/movies/all`

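// Collect each poster's image URL and movie name
async function getUrlAndName(){
    // Accumulates the [imageUrl, movieName] pairs to return
    let imgAddrArray = []
    // Request the listing page
    const res = await request.get(url)
    // Parse the fetched HTML into a queryable $ object, much like building an etree before running XPath queries in Python
    const $ = cheerio.load(res.text)
    // Locate each poster link, then push its image URL and movie name onto the array returned to the caller
    $(`.row-item-image a`).each(function(i, elem){
        let movieName = $(this).attr(`href`).split(`/`).pop()
        let imgAddr = $(this).find(`img`).attr(`src`)
        imgAddrArray.push([imgAddr, movieName])
    })
    return imgAddrArray
}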
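// Download one poster
async function download(imgAndName){
    // Build the file name for this poster
    let filename = imgAndName[1] + `.jpg`
    console.log("Scraping poster:", filename);
    // Request the binary image data
    const req = request.get(imgAndName[0]);
    const out = fs.createWriteStream(path.join(__dirname, `images`, filename));
    // Save the image. pipe() returns the destination stream, not a promise,
    // so wait for the write stream's finish event before returning
    await new Promise((resolve, reject) => {
        req.on(`error`, reject);
        out.on(`error`, reject);
        out.on(`finish`, resolve);
        req.pipe(out);
    });
}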

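// Create the output folder and drive the overall flow
async function init(){
    let imgAddrArray = await getUrlAndName()
    // Create the images folder; ensureDir (from fs-extra) does nothing if it already exists
    try{
        await fs.ensureDir(path.join(__dirname, `images`));
    }
    catch(err){
        console.log("==>", err);
    }
    // Download the posters one at a time
    for (let imgAddr of imgAddrArray){
        await download(imgAddr);
    }
}

// Log any request or download failure instead of leaving the rejection unhandled
init().catch(err => console.log("==>", err))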
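
A note on the download step: `pipe()` returns the destination stream rather than a promise, so an `await` placed in front of it does not wait for the file to be written. One way to wait is `pipeline` from Node's built-in `stream/promises` module. The sketch below is only an illustration under assumptions: `saveStream` and `demoSave` are hypothetical names, and an in-memory stream stands in for the image response so the snippet runs without network access. Whether a superagent request object can be handed to `pipeline` directly has not been verified here.

```javascript
const { pipeline } = require('stream/promises')
const { Readable } = require('stream')
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper: write any readable stream to a file and
// resolve only after the write stream has finished
async function saveStream(source, targetPath) {
    // pipeline() returns a promise that settles when the transfer is done
    // and rejects if either stream reports an error
    await pipeline(source, fs.createWriteStream(targetPath))
}

// Hypothetical demo: an in-memory stream stands in for the image response
async function demoSave() {
    const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'posters-'))
    const target = path.join(dir, 'demo.jpg')
    await saveStream(Readable.from([Buffer.from('fake image bytes')]), target)
    // The file is complete at this point, so its size can be read safely
    const { size } = await fs.promises.stat(target)
    return size
}
```

Because `saveStream` resolves only after the file is complete, a caller that awaits it can safely move on to the next poster.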

Output

Summary:

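My first impression is that a Node crawler is not as convenient as a Python one. In addition, because of the browser's same-origin policy, running this kind of JavaScript crawler in the browser is somewhat awkward. The advantage of a Node crawler: in theory, Node's default asynchronous model can achieve an effect similar to a multi-threaded Python crawler.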
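
The point about Node's asynchronous model can be illustrated with a small sketch. Instead of awaiting each download in turn, several are kept in flight at once, with a cap so the target site is not flooded. Everything here is an assumption for illustration: `runWithLimit`, `fakeDownload`, `demoConcurrent`, the movie names, and the limit of 3 are hypothetical, and a timer stands in for the network request.

```javascript
// Hypothetical helper: run async tasks with at most `limit` in flight at once
async function runWithLimit(items, limit, worker) {
    const results = new Array(items.length)
    let next = 0
    // Each runner repeatedly claims the next unprocessed index until none remain
    async function runner() {
        while (next < items.length) {
            const index = next++
            results[index] = await worker(items[index])
        }
    }
    const runners = []
    for (let i = 0; i < Math.min(limit, items.length); i++) {
        runners.push(runner())
    }
    await Promise.all(runners)
    // Results keep the same order as the input items
    return results
}

// Stand-in for a real download: waits 50 ms, then reports the file name
function fakeDownload(name) {
    return new Promise(resolve => setTimeout(() => resolve(name + '.jpg'), 50))
}

// Hypothetical demo: six simulated downloads, three at a time
async function demoConcurrent() {
    const names = ['movie-a', 'movie-b', 'movie-c', 'movie-d', 'movie-e', 'movie-f']
    return runWithLimit(names, 3, fakeDownload)
}
```

With a real download function in place of `fakeDownload`, the loop in `init` could become something like `await runWithLimit(imgAddrArray, 3, download)`.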
For writing crawlers, it is still better to stick with Python!
