Four Web Scraper Sitemaps
I. Scrape WeChat official account article titles, dates, and content links
{"_id":"gongzhonghao","startUrl":["https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=MzIxODUxMDM5MQ==&scene=124&#wechat_redirect"],"selectors":[{"id":"total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.weui_msg_card:nth-of-type(n+2)","multiple":true,"delay":"1000"},{"id":"title","type":"SelectorText","parentSelectors":["total"],"selector":"h4.weui_media_title","multiple":false,"regex":"","delay":0},{"id":"date","type":"SelectorText","parentSelectors":["total"],"selector":"p.weui_media_extra_info","multiple":false,"regex":"","delay":0},{"id":"link","type":"SelectorElementAttribute","parentSelectors":["total"],"selector":"h4.weui_media_title","multiple":false,"extractAttribute":"hrefs","delay":0}]}
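Every sitemap in this post follows the same pattern: one element selector under `_root` picks each list item, and the field selectors are nested beneath it through `parentSelectors`. The sketch below is only an illustration of that nesting. It uses an abbreviated copy of the sitemap above, and the helper names `selector_tree` and `format_tree` are hypothetical, not part of Web Scraper.

```python
import json
from collections import defaultdict

# Abbreviated copy of the "gongzhonghao" sitemap above (only the fields used here).
SITEMAP_JSON = """
{"_id":"gongzhonghao","selectors":[
 {"id":"total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.weui_msg_card:nth-of-type(n+2)"},
 {"id":"title","type":"SelectorText","parentSelectors":["total"],"selector":"h4.weui_media_title"},
 {"id":"date","type":"SelectorText","parentSelectors":["total"],"selector":"p.weui_media_extra_info"},
 {"id":"link","type":"SelectorElementAttribute","parentSelectors":["total"],"selector":"h4.weui_media_title"}
]}
"""


def selector_tree(sitemap):
    """Map each parent id to the ids of the selectors nested under it."""
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel["id"])
    return dict(children)


def format_tree(children, node="_root", depth=0, path=()):
    """Return indented lines describing the selector tree, depth first."""
    lines = ["  " * depth + node]
    if node in path:
        return lines  # selector is its own ancestor; stop to avoid looping
    for child in children.get(node, []):
        lines.extend(format_tree(children, child, depth + 1, path + (node,)))
    return lines


if __name__ == "__main__":
    tree = selector_tree(json.loads(SITEMAP_JSON))
    print("\n".join(format_tree(tree)))
```

Running the same helpers over any of the other sitemaps is a quick way to check that each field selector hangs off the intended parent before importing it.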
II. Zhihu
1. All articles by a Zhihu influencer ("big V"): titles, links, upvote counts, comment counts
{"_id":"zhihu-article","startUrl":["https://www.zhihu.com/people/zhang-jia-wei/posts?page=[1-44]"],"selectors":[{"id":"aaa","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.List-item","multiple":true,"delay":"2000"},{"id":"title","type":"SelectorLink","parentSelectors":["aaa"],"selector":"h2.ContentItem-title a","multiple":false,"delay":0},{"id":"like","type":"SelectorText","parentSelectors":["aaa"],"selector":"button.Button.VoteButton--up","multiple":false,"regex":"","delay":0},{"id":"comments","type":"SelectorText","parentSelectors":["aaa"],"selector":"button.Button.ContentItem-action:nth-of-type(1)","multiple":false,"regex":"","delay":0}]}
2. All answers by a Zhihu influencer ("big V"): question titles, links, upvote counts, comment counts
{"_id":"zhihu-questions","startUrl":["https://www.zhihu.com/people/zhang-jia-wei/answers?page=[1-169]"],"selectors":[{"id":"total","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.List-item","multiple":true,"delay":"2000"},{"id":"questions","type":"SelectorLink","parentSelectors":["total"],"selector":"h2.ContentItem-title a","multiple":false,"delay":0},{"id":"likes","type":"SelectorText","parentSelectors":["total"],"selector":"button.Button.VoteButton--up","multiple":false,"regex":"","delay":0},{"id":"comments","type":"SelectorText","parentSelectors":["total"],"selector":"button.Button.ContentItem-action:nth-of-type(1)","multiple":false,"regex":"","delay":0}]}
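The two Zhihu sitemaps above put a page range such as `[1-44]` or `[1-169]` in `startUrl`, which Web Scraper expands into one URL per page. As a rough sketch of that expansion (not Web Scraper's actual implementation), the hypothetical helper `expand_start_url` below handles the plain, zero-padded, and stepped range forms as I understand them from the Web Scraper documentation.

```python
import re

# Matches a range such as [1-44], [001-100] or [0-100:10].
RANGE_RE = re.compile(r"\[(\d+)-(\d+)(?::(\d+))?\]")


def expand_start_url(url):
    """Expand the first [start-end] or [start-end:step] range in a start URL."""
    match = RANGE_RE.search(url)
    if match is None:
        return [url]  # no range present, so there is only one page
    start_text, end_text, step_text = match.groups()
    start, end = int(start_text), int(end_text)
    step = int(step_text) if step_text else 1
    # Keep zero padding when the lower bound is written like 001.
    padded = len(start_text) > 1 and start_text.startswith("0")
    width = len(start_text) if padded else 0
    return [
        url[: match.start()] + str(n).zfill(width) + url[match.end():]
        for n in range(start, end + 1, step)
    ]


if __name__ == "__main__":
    urls = expand_start_url(
        "https://www.zhihu.com/people/zhang-jia-wei/posts?page=[1-44]"
    )
    print(len(urls), urls[0], urls[-1])
```

The upper bound has to be read off the profile page by hand (44 article pages and 169 answer pages at the time the sitemaps were written), so it needs updating when the author publishes more.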
3. Zhihu keyword search: titles, links, upvote counts, and comment counts for all results
{"_id":"zhihu-search","startUrl":["https://www.zhihu.com/search?q=%E8%B5%9A%E9%92%B1&type=content"],"selectors":[{"id":"total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.List div div.List-item","multiple":true,"delay":"3000"},{"id":"link","type":"SelectorLink","parentSelectors":["total"],"selector":"h2.ContentItem-title a","multiple":false,"delay":0},{"id":"likes","type":"SelectorText","parentSelectors":["total"],"selector":"button.Button.VoteButton--up","multiple":false,"regex":"","delay":0},{"id":"comments","type":"SelectorText","parentSelectors":["total"],"selector":"button.Button.ContentItem-action","multiple":false,"regex":"","delay":0}]}
III. Scrape Toutiao hot articles: titles, sources, comment counts, publication times
{"_id":"toutiao","startUrl":["https://www.toutiao.com/ch/news_hot/"],"selectors":[{"id":"total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.item-inner","multiple":true,"delay":"4000"},{"id":"link","type":"SelectorLink","parentSelectors":["total"],"selector":"a.link","multiple":false,"delay":0},{"id":"source","type":"SelectorText","parentSelectors":["total"],"selector":"a.lbtn.source","multiple":false,"regex":"","delay":0},{"id":"comments","type":"SelectorText","parentSelectors":["total"],"selector":"a.lbtn.comment","multiple":false,"regex":"","delay":0},{"id":"time","type":"SelectorText","parentSelectors":["total"],"selector":"span.lbtn","multiple":false,"regex":"","delay":0}]}
IV. Weibo
1. Scrape Weibo posts: content, repost links, repost counts, comment counts, like counts, post times
{"_id":"weibo","startUrl":["https://weibo.com/bylixiaolai?is_search=0&visible=0&is_hot=1&is_tag=0&profile_ftype=1&page=[1-60]#feedtop"],"selectors":[{"id":"total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.WB_cardwrap.WB_feed_type:nth-of-type(n+2)","multiple":true,"delay":"1000"},{"id":"click","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.WB_cardwrap:nth-of-type(2) div.WB_text","multiple":true,"delay":"2000","clickElementSelector":"div.WB_text.W_f14 a.WB_text_opt","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"real-total","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.WB_cardwrap.WB_feed_type:nth-of-type(n+2)","multiple":true,"delay":"10000"},{"id":"content","type":"SelectorText","parentSelectors":["real-total"],"selector":"div.WB_text","multiple":false,"regex":"","delay":0},{"id":"forward","type":"SelectorLink","parentSelectors":["real-total"],"selector":"a.S_func1.W_autocut","multiple":false,"delay":0},{"id":"shares","type":"SelectorText","parentSelectors":["real-total"],"selector":"li:nth-of-type(2) em:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"comments","type":"SelectorText","parentSelectors":["real-total"],"selector":"li:nth-of-type(3) em:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"likes","type":"SelectorText","parentSelectors":["real-total"],"selector":"li:nth-of-type(4) em:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"time","type":"SelectorText","parentSelectors":["real-total"],"selector":"div.WB_detail > div.WB_from a.S_txt2:nth-of-type(1)","multiple":false,"regex":"","delay":0}]}
2. Scrape all comments on a Weibo post
{"_id":"weibo-comment","startUrl":["https://weibo.com/1576218000/Gqjfh0VYa?filter=hot&root_comment_id=0&type=comment"],"selectors":[{"id":"scroll","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.list_box > div.list_ul > div.list_li:nth-of-type(1) > div.list_con > div.WB_text","multiple":true,"delay":"1000"},{"id":"click","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.list_box > div.list_ul > div.list_li > div.list_con > div.WB_text","multiple":true,"delay":"3000","clickElementSelector":"span.more_txt","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"},{"id":"content","type":"SelectorText","parentSelectors":["click"],"selector":"parent","multiple":false,"regex":"","delay":0}]}