Scraping a News List with Node.js

Posted by davidtim on 2021-09-09

Original article: http://blog.itpub.net/3349/viewspace-2813017/

Target URL

Libraries used

superagent (downloads the page)
cheerio (parses the page)
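The code below uses superagent's `.end()` callback to download the page. As a minimal sketch, assuming a superagent version whose requests can be awaited as promises (they expose `.then()`), the download step could also be written with `async`/`await` and a simple retry. The names `fetchHtml`, `retries`, `delayMs` and `request` are hypothetical; the request function is injectable so the retry logic can be exercised without network access, and superagent is only loaded when the default is actually used.

```javascript
// Hypothetical helper: download a page's HTML, retrying a few times on failure.
// `request` defaults to superagent (required lazily); any function returning a
// promise that resolves to an object with a `text` field can be passed instead.
async function fetchHtml(url, {
  retries = 3,
  delayMs = 500,
  request = (u) => require('superagent')
    .get(u)
    .timeout({ response: 5000, deadline: 10000 }), // give up on slow servers
} = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await request(url); // awaiting a superagent request sends it
      return res.text;               // raw HTML, like res.text in app.js
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // back off a little longer after each failed attempt
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError; // all attempts failed: surface the last error
}
```

Inside an async function, `const postList = getFilterHtml(await fetchHtml(url));` would then take the place of the callback. Rethrowing the last error after the final attempt keeps the original's fail-loudly behavior while tolerating transient network errors.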

Code

app.js

// Import dependencies
const superagent = require('superagent'); // superagent: a very convenient HTTP client module for Node.js
const cheerio = require('cheerio'); // think of it as a Node.js version of jQuery

// Target URL
const url = '';

// Download the page
superagent.get(url).end((err, res) => {
  if (err) throw err; // rethrow the original error instead of wrapping it
  const postList = getFilterHtml(res.text);
  // Save to the database...
});

// Filter the data
function getFilterHtml(html) {
  const $ = cheerio.load(html); // load the HTML into cheerio
  const postList = []; // array holding the news items

  // Nodes identified with the browser's F12 DevTools, filtered and extracted with jQuery syntax
  $('#listContent .news_li').each((index, item) => {
    const elem = $(item);
    const post = {
      icon: elem.find('.tiptitleImg img').attr('src'),
      title: elem.find('h2 a').text(),
      intro: elem.find('p').text(),
      link: elem.find('h2 a').attr('href'),
      target: elem.find('.pdtt_trbs a').text(),
      hot: elem.find('.pdtt_trbs .trbszan').text(),
    };
    postList.push(post);
  });
  return postList;
}
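Before the database-saving step, the scraped fields usually need tidying: cheerio's `.text()` returns whitespace exactly as it appears in the markup, and `href`/`src` attributes are often relative. A minimal sketch, assuming the absolute URL of the list page is known and that `hot` contains a single plain integer (possibly with extra text or thousands separators); `normalizePosts` and `pageUrl` are hypothetical names, and the built-in WHATWG `URL` class does the link resolution.

```javascript
// Hypothetical post-processing for the objects returned by getFilterHtml().
// pageUrl must be an absolute URL (new URL() throws on an empty base).
function normalizePosts(posts, pageUrl) {
  const seen = new Set(); // links already kept, used to drop duplicates
  const result = [];
  for (const post of posts) {
    if (!post.link) continue; // skip items that have no link
    // resolve relative hrefs such as "/news/1.html" against the list page
    const link = new URL(post.link, pageUrl).href;
    if (seen.has(link)) continue;
    seen.add(link);
    // assumption: the hot count is one integer, e.g. "128" or "1,024 likes"
    const hotDigits = (post.hot || '').replace(/\D/g, '');
    result.push({
      ...post,
      title: (post.title || '').trim(),
      intro: (post.intro || '').replace(/\s+/g, ' ').trim(), // collapse whitespace
      icon: post.icon ? new URL(post.icon, pageUrl).href : null,
      link,
      target: (post.target || '').trim(),
      hot: hotDigits ? Number(hotDigits) : 0,
    });
  }
  return result;
}
```

In the callback, `normalizePosts(getFilterHtml(res.text), url)` would then yield records ready to insert. Deduplicating on the resolved link, rather than on the title, is a design choice: two items pointing at the same article are treated as one even if their displayed titles differ.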

Author: daydreammoon

From the ITPUB Blog, link: http://blog.itpub.net/3349/viewspace-2813017/. Please credit the source when reposting; otherwise legal liability will be pursued.
