Python Web Scraping in Practice, Part 3: Collecting Programming News from 今日BBNews

Posted by Python魔法师 on 2024-03-15

Original article: https://www.cnblogs.com/meet/p/18068021

Tags: Python, web scraping, programming

1. Analyzing the page

Open 今日BBNews at https://news.bicido.com and choose the 【程式設計】 (Programming) column from the drop-down menu.

[Screenshot: home page]

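1.1 Analyzing the requests

Press F12 to open the developer tools, switch to the Network tab, and click any request. Press Ctrl+F to open the search box, enter a headline shown on the page (here, "Apache Doris 2.1.0 版本釋出"), and run the search.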

[Screenshot: request analysis]
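The same check can be repeated in code once a response body has been copied out of the Network tab. The sketch below is only an illustration: `find_string` is a hypothetical helper, and the sample body is made up apart from the `title` field, which is the one field the article's code reads.

```python
import json


def find_string(node, needle, path="$"):
    """Yield the JSON path of every string value that contains `needle`."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_string(value, needle, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_string(value, needle, f"{path}[{index}]")
    elif isinstance(node, str) and needle in node:
        yield path


# Hypothetical response body; the real API may return more fields.
body = '[{"title": "Apache Doris 2.1.0 版本釋出"}, {"title": "Another headline"}]'
matches = list(find_string(json.loads(body), "Apache Doris"))
print(matches)  # ['$[0].title']
```

A non-empty result means the headline is present in that response, which is what the Ctrl+F search in the developer tools confirms by eye.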

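The search result shows that the headline comes back directly in a JSON response, so the rest is easy: copy the request as curl, convert the curl command to Python code, and run it.

A handy online tool for converting curl to Python code: https://curlconverter.com/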
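To make the conversion step less of a black box, here is a rough sketch of what such a converter does in the simplest case. It is not how curlconverter is implemented: `parse_curl` is a hypothetical helper that only understands a URL plus `-H`/`--header` options, and the `user-agent` value in the sample command is a placeholder.

```python
import shlex


def parse_curl(command):
    """Extract the URL and headers from a simple curl command line.

    Only a bare URL and -H/--header options are handled. Other flags are
    skipped, so flags that take an argument (such as --data) would be
    misread by this sketch.
    """
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "curl":
        raise ValueError("not a curl command")
    url, headers = None, {}
    i = 1
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header"):
            if i + 1 >= len(tokens):
                raise ValueError("header option without a value")
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip().lower()] = value.strip()
            i += 2
        elif token.startswith("-"):
            i += 1  # ignore flags this sketch does not understand
        else:
            url = token
            i += 1
    return url, headers


command = (
    "curl 'https://news.bicido.com/api/config/news_group/' "
    "-H 'referer: https://news.bicido.com/' "
    "-H 'user-agent: Mozilla/5.0' --compressed"
)
url, headers = parse_curl(command)
print(url)
print(headers)
```

The resulting pair maps directly onto a call of the form `requests.get(url, headers=headers)`, which is the shape of code the article starts from.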

2. Implementation

Take the Python code converted from curl, make a few changes, then debug and run it.

Complete code

# -*- coding: utf-8 -*-
import os
import sys
from datetime import datetime

import requests

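# Put the project root on sys.path so the shared config module can be imported
opd = os.path.dirname
curr_path = opd(os.path.realpath(__file__))
proj_path = opd(opd(opd(curr_path)))
sys.path.insert(0, proj_path)

try:
    from app.conf.conf_base import USERAGENT
except ImportError:
    # Placeholder so the script also runs outside the author's project layout
    USERAGENT = 'Mozilla/5.0'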

spider_config = {
    "name_en": "https://news.bicido.com",
    "name_cn": "今日BBNews"
}


class Bbnews:
    def __init__(self):
        self.headers = {
            'referer': 'https://news.bicido.com/',
            'user-agent': USERAGENT
        }

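    def get_group(self):
        """Fetch the list of news groups (columns) offered by the site."""
        url = 'https://news.bicido.com/api/config/news_group/'
        response = requests.get(url=url, headers=self.headers, timeout=10)
        response.raise_for_status()  # fail early on HTTP errors
        return response.json()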

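    def get_news(self):
        """Collect the headlines of every news type in the programming group."""
        groups = self.get_group()
        news_types = []
        for group in groups:
            # Group name exactly as it appears in the site's column list
            if group['name'] == '程式設計':
                news_types = group['news_types']
        result = []
        # A separate loop variable avoids shadowing the list being iterated
        for news_type in news_types:
            type_id = news_type['id']
            url = f'https://news.bicido.com/api/news/?type_id={type_id}'
            response = requests.get(url, headers=self.headers, timeout=10)
            response.raise_for_status()
            news_list = response.json()
            for news in news_list:
                result.append({
                    "news_title": str(news['title']),
                    # Collection time, not the article's publication time
                    "news_date": datetime.now(),
                    "source_en": spider_config['name_en'],
                    "source_cn": spider_config['name_cn'],
                })
        return result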


def main():
    bbnews = Bbnews()
    results = bbnews.get_news()
    print(results)


if __name__ == '__main__':
    main()
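The two data-handling steps in `get_news` can also be tried without touching the network. The sketch below pulls them into two hypothetical helpers, `select_news_types` and `to_records`, and feeds them made-up payloads that contain only the fields the code above reads (`name`, `news_types`, `id`, `title`). The real API responses may carry more fields. Unlike the loop above, `select_news_types` returns the first matching group rather than the last.

```python
from datetime import datetime


def select_news_types(groups, group_name):
    """Return the news types of the first group whose name matches."""
    for group in groups:
        if group.get("name") == group_name:
            return group.get("news_types", [])
    return []


def to_records(news_list, source_en, source_cn, collected_at=None):
    """Convert raw news items into the record format used by the spider."""
    collected_at = collected_at or datetime.now()
    return [
        {
            "news_title": str(item["title"]),
            "news_date": collected_at,
            "source_en": source_en,
            "source_cn": source_cn,
        }
        for item in news_list
    ]


# Hypothetical payloads shaped like the fields the spider reads.
sample_groups = [
    {"name": "程式設計", "news_types": [{"id": 1}, {"id": 2}]},
    {"name": "other", "news_types": [{"id": 9}]},
]
sample_news = [{"title": "Apache Doris 2.1.0 版本釋出"}]

type_ids = [t["id"] for t in select_news_types(sample_groups, "程式設計")]
records = to_records(sample_news, "https://news.bicido.com", "今日BBNews")
print(type_ids)                   # [1, 2]
print(records[0]["news_title"])   # Apache Doris 2.1.0 版本釋出
```

Keeping the parsing separate from the HTTP calls makes it possible to check the record format against saved responses before running the spider against the live site.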

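Summary

The 今日BBNews site has no anti-scraping measures in place, so collecting from it is fairly simple and the API can be used as is.
This article also introduced a curl-to-Python conversion tool, which is convenient and easy to use.

The code in this article is for learning and discussion only; the author accepts no legal liability arising from its use.

If you found this helpful, feel free to like, bookmark, and share it, and follow the official account 【Python魔法師】 for more Python magic.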
