Web Scraper sitemaps for 4 sites
I. WeChat Official Account: scrape post titles, dates, and content links
{"_id":"gongzhonghao","startUrl":["https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=MzIxODUxMDM5MQ==&scene=124&#wechat_redirect"],"selectors":[{"id":"total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.weui_msg_card:nth-of-type(n+2)","multiple":true,"delay":"1000"},{"id":"title","type":"SelectorText","parentSelectors":["total"],"selector":"h4.weui_media_title","multiple":false,"regex":"","delay":0},{"id":"date","type":"SelectorText","parentSelectors":["total"],"selector":"p.weui_media_extra_info","multiple":false,"regex":"","delay":0},{"id":"link","type":"SelectorElementAttribute","parentSelectors":["total"],"selector":"h4.weui_media_title","multiple":false,"extractAttribute":"hrefs","delay":0}]}
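In a Web Scraper sitemap each selector lists its parents in parentSelectors, and _root stands for the start page, so the selectors form a tree. In the sitemap above, total scrolls the profile page and yields one element per article card, and title, date and link are then read inside each card; link takes the hrefs attribute of the title, which appears to be where this page kept the article URL at the time. The helper below is a hypothetical sketch (not part of Web Scraper) that prints that tree from exported sitemap JSON, which can make an imported sitemap easier to check. The embedded string is a trimmed copy of the sitemap above containing only the fields the helper reads.

```python
import json


def selector_tree(sitemap_json: str) -> list[str]:
    """Return one indented line per selector, children listed under their parent."""
    selectors = json.loads(sitemap_json)["selectors"]
    lines: list[str] = []

    def walk(parent_id: str, depth: int, path: frozenset) -> None:
        for sel in selectors:
            sid = sel["id"]
            # Skip selectors already on the current path so a selector that lists
            # itself as a parent (pagination style) cannot recurse forever.
            if parent_id in sel["parentSelectors"] and sid not in path:
                lines.append("  " * depth + f"{sid} ({sel['type']}): {sel['selector']}")
                walk(sid, depth + 1, path | {sid})

    walk("_root", 0, frozenset())
    return lines


# Trimmed copy of the WeChat sitemap above (only the fields the helper reads).
WECHAT = """{"_id": "gongzhonghao", "selectors": [
  {"id": "total", "type": "SelectorElementScroll", "parentSelectors": ["_root"],
   "selector": "div.weui_msg_card:nth-of-type(n+2)"},
  {"id": "title", "type": "SelectorText", "parentSelectors": ["total"],
   "selector": "h4.weui_media_title"},
  {"id": "date", "type": "SelectorText", "parentSelectors": ["total"],
   "selector": "p.weui_media_extra_info"},
  {"id": "link", "type": "SelectorElementAttribute", "parentSelectors": ["total"],
   "selector": "h4.weui_media_title"}]}"""

if __name__ == "__main__":
    print("\n".join(selector_tree(WECHAT)))
```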
II. Zhihu
1. A Zhihu influencer ("Big V"): titles, links, like counts, and comment counts of all articles
{"_id":"zhihu-article","startUrl":["https://www.zhihu.com/people/zhang-jia-wei/posts?page=[1-44]"],"selectors":[{"id":"aaa","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.List-item","multiple":true,"delay":"2000"},{"id":"title","type":"SelectorLink","parentSelectors":["aaa"],"selector":"h2.ContentItem-title a","multiple":false,"delay":0},{"id":"like","type":"SelectorText","parentSelectors":["aaa"],"selector":"button.Button.VoteButton--up","multiple":false,"regex":"","delay":0},{"id":"comments","type":"SelectorText","parentSelectors":["aaa"],"selector":"button.Button.ContentItem-action:nth-of-type(1)","multiple":false,"regex":"","delay":0}]}
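The page=[1-44] part of startUrl is Web Scraper's range syntax: the extension opens one start URL per number in the range, so this sitemap visits pages 1 through 44 of the author's posts (the upper bound presumably matched the page count when the sitemap was written and would need updating for another account). The hypothetical helper below mimics that expansion for a single plain numeric range only; it ignores any other range forms the extension may support.

```python
import re

# One plain numeric range such as [1-44]; step and zero-padding forms are not handled.
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_start_url(url: str) -> list[str]:
    """Expand the first [start-end] range in a start URL into concrete page URLs."""
    match = RANGE.search(url)
    if match is None:
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    head, tail = url[: match.start()], url[match.end():]
    return [f"{head}{n}{tail}" for n in range(start, end + 1)]


if __name__ == "__main__":
    urls = expand_start_url("https://www.zhihu.com/people/zhang-jia-wei/posts?page=[1-44]")
    print(len(urls), urls[0], urls[-1])
```

The same syntax drives the answers sitemap (page=[1-169]) and the Weibo posts sitemap (page=[1-60]); any text after the range, such as #feedtop, is kept on every generated URL.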
2. A Zhihu influencer: all answers with question links, like counts, and comment counts
{"_id":"zhihu-questions","startUrl":["https://www.zhihu.com/people/zhang-jia-wei/answers?page=[1-169]"],"selectors":[{"id":"total","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.List-item","multiple":true,"delay":"2000"},{"id":"questions","type":"SelectorLink","parentSelectors":["total"],"selector":"h2.ContentItem-title a","multiple":false,"delay":0},{"id":"likes","type":"SelectorText","parentSelectors":["total"],"selector":"button.Button.VoteButton--up","multiple":false,"regex":"","delay":0},{"id":"comments","type":"SelectorText","parentSelectors":["total"],"selector":"button.Button.ContentItem-action:nth-of-type(1)","multiple":false,"regex":"","delay":0}]}
3. Zhihu keyword search: title, link, like count, and comment count of every result
{"_id":"zhihu-search","startUrl":["https://www.zhihu.com/search?q=%E8%B5%9A%E9%92%B1&type=content"],"selectors":[{"id":"total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.List div div.List-item","multiple":true,"delay":"3000"},{"id":"link","type":"SelectorLink","parentSelectors":["total"],"selector":"h2.ContentItem-title a","multiple":false,"delay":0},{"id":"likes","type":"SelectorText","parentSelectors":["total"],"selector":"button.Button.VoteButton--up","multiple":false,"regex":"","delay":0},{"id":"comments","type":"SelectorText","parentSelectors":["total"],"selector":"button.Button.ContentItem-action","multiple":false,"regex":"","delay":0}]}
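The likes and comments columns in all three Zhihu sitemaps are the raw text of the buttons, not numbers. Assuming labels such as "赞同 1.2 万" (万 = 10,000) or "35 条评论", and that a button without any digits means zero, a post-processing function along these lines could turn the exported strings into integers. The label formats are an assumption about what Zhihu displayed, so check them against an actual export first.

```python
import re

# A number optionally followed by 万 (ten thousand); commas are stripped before matching.
COUNT = re.compile(r"(\d+(?:\.\d+)?)\s*(万)?")


def parse_count(text: str) -> int:
    """Convert button text such as '赞同 1.2 万' or '35 条评论' to an int.

    Text with no digits (for example a bare '添加评论' label) is treated as 0.
    """
    match = COUNT.search(text.replace(",", ""))
    if match is None:
        return 0
    value = float(match.group(1))
    if match.group(2):
        value *= 10_000
    return int(round(value))


if __name__ == "__main__":
    for label in ["赞同 1.2 万", "35 条评论", "赞同 1,024", "添加评论"]:
        print(label, "->", parse_count(label))
```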
III. Toutiao trending feed: article title, source, comment count, and publish time
{"_id":"toutiao","startUrl":["https://www.toutiao.com/ch/news_hot/"],"selectors":[{"id":"total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.item-inner","multiple":true,"delay":"4000"},{"id":"link","type":"SelectorLink","parentSelectors":["total"],"selector":"a.link","multiple":false,"delay":0},{"id":"source","type":"SelectorText","parentSelectors":["total"],"selector":"a.lbtn.source","multiple":false,"regex":"","delay":0},{"id":"comments","type":"SelectorText","parentSelectors":["total"],"selector":"a.lbtn.comment","multiple":false,"regex":"","delay":0},{"id":"time","type":"SelectorText","parentSelectors":["total"],"selector":"span.lbtn","multiple":false,"regex":"","delay":0}]}
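The time column scraped from a feed like this is typically a relative label rather than a date. Assuming labels such as "刚刚" (just now), "20分钟前", "3小时前" or "2天前", the sketch below converts them into absolute datetimes measured back from the moment of scraping; anything it does not recognise is returned as None so it can be inspected by hand. The label set is an assumption, not a documented Toutiao format, and the scrape time in the example is hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Maps the Chinese unit in "N<unit>前" ("N <unit> ago") to a timedelta keyword.
UNITS = {"分钟": "minutes", "小时": "hours", "天": "days"}
RELATIVE = re.compile(r"(\d+)\s*(分钟|小时|天)前")


def absolute_time(label: str, scraped_at: datetime) -> Optional[datetime]:
    """Turn a relative label like '3小时前' into a datetime, or None if unrecognised."""
    if "刚刚" in label:  # "just now"
        return scraped_at
    match = RELATIVE.search(label)  # search() tolerates separators around the label
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return scraped_at - timedelta(**{UNITS[unit]: amount})


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)  # hypothetical scrape time
    for label in ["刚刚", "20分钟前", "3小时前", "2天前", "01-01 08:00"]:
        print(label, "->", absolute_time(label, now))
```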
IV. Weibo
1. Weibo posts: content, repost link, repost count, comment count, like count, and publish time
{"_id":"weibo","startUrl":["https://weibo.com/bylixiaolai?is_search=0&visible=0&is_hot=1&is_tag=0&profile_ftype=1&page=[1-60]#feedtop"],"selectors":[{"id":"total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.WB_cardwrap.WB_feed_type:nth-of-type(n+2)","multiple":true,"delay":"1000"},{"id":"click","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.WB_cardwrap:nth-of-type(2) div.WB_text","multiple":true,"delay":"2000","clickElementSelector":"div.WB_text.W_f14 a.WB_text_opt","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"real-total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.WB_cardwrap.WB_feed_type:nth-of-type(n+2)","multiple":true,"delay":"10000"},{"id":"content","type":"SelectorText","parentSelectors":["real-total"],"selector":"div.WB_text","multiple":false,"regex":"","delay":0},{"id":"forward","type":"SelectorLink","parentSelectors":["real-total"],"selector":"a.S_func1.W_autocut","multiple":false,"delay":0},{"id":"shares","type":"SelectorText","parentSelectors":["real-total"],"selector":"li:nth-of-type(2) em:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"comments","type":"SelectorText","parentSelectors":["real-total"],"selector":"li:nth-of-type(3) em:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"likes","type":"SelectorText","parentSelectors":["real-total"],"selector":"li:nth-of-type(4) em:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"time","type":"SelectorText","parentSelectors":["real-total"],"selector":"div.WB_detail > div.WB_from a.S_txt2:nth-of-type(1)","multiple":false,"regex":"","delay":0}]}
2. All comments on a Weibo post
{"_id":"weibo-comment","startUrl":["https://weibo.com/1576218000/Gqjfh0VYa?filter=hot&root_comment_id=0&type=comment"],"selectors":[{"id":"scroll","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.list_box > div.list_ul > div.list_li:nth-of-type(1) > div.list_con > div.WB_text","multiple":true,"delay":"1000"},{"id":"click","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.list_box > div.list_ul > div.list_li > div.list_con > div.WB_text","multiple":true,"delay":"3000","clickElementSelector":"span.more_txt","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"},{"id":"content","type":"SelectorText","parentSelectors":["click"],"selector":"_parent_","multiple":false,"regex":"","delay":0}]}
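The content selector reads the text of each comment element that the click selector collects, so every exported row holds one whole comment line. Assuming that text has the form "username：comment", with a full-width colon after the commenter's name, a hypothetical post-processing step can split it into two columns. Only the first full-width colon is used, so colons inside the comment itself (including half-width ones in replies) are kept.

```python
def split_comment(text: str) -> tuple[str, str]:
    """Split 'username：comment' at the first full-width colon.

    Returns ("", text) when no full-width colon is present.
    """
    name, sep, body = text.strip().partition("：")
    if not sep:
        return "", text.strip()
    return name.strip(), body.strip()


if __name__ == "__main__":
    rows = ["张三：写得好", "李四：回复@王五:同意", "没有冒号的文本"]
    for row in rows:
        print(split_comment(row))
```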