因為我的python版本為3.12
所以安裝一些軟體包命令 與之前有些許不同
pip install beautifulsoup4
pip install demjson3
pip install requests
話不多說 程式碼奉上
"""Scrape reply letters from www.beijing.gov.cn and append them to yijian.txt.

Flow: POST the paginated listing endpoint, decode its (non-strict) JSON with
demjson3, then GET each letter's detail page and extract fields with
BeautifulSoup. One pipe-delimited record per letter is appended to yijian.txt.
"""
import json

import demjson3
import requests
from bs4 import BeautifulSoup
import csv  # kept from the original file; not used by this script

# Headers for the listing API. NOTE: the original hard-coded
# 'Content-Length': '155', which is wrong for the 2-byte '{}' body actually
# sent — requests computes Content-Length itself, so it is omitted here.
HEADERS = {
    'Host': 'www.beijing.gov.cn',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'text/json',
    'X-Requested-With': 'XMLHttpRequest',
    'Origin': 'http://www.beijing.gov.cn',
    'Connection': 'keep-alive',
    'Referer': 'http://www.beijing.gov.cn/hudong/hdjl/',
}

LIST_URL_TMPL = (
    "https://www.beijing.gov.cn/hudong/hdjl/sindex/"
    "bjah-index-hdjl!replyLetterListJson.action"
    "?page.pageNo={page}&page.pageSize=6&orgtitleLength=26"
)

OUTPUT_FILE = "yijian.txt"
LAST_PAGE = 174  # original loop condition: while page < 175


def _first_text(soup, css_class, prefix="", collapse=False):
    """Return the text of the first <div> whose class attribute is *css_class*.

    The *prefix* (e.g. a field label) is removed with str.removeprefix — the
    original code used str.lstrip, which strips any of the given CHARACTERS
    and can eat leading content characters, not just the label. Leading and
    trailing whitespace is trimmed; with collapse=True, embedded CR/LF are
    removed too. Returns "" when no matching <div> exists.
    """
    divs = soup.find_all("div", {"class": css_class})
    if not divs:
        return ""
    text = divs[0].get_text().removeprefix(prefix).strip()
    if collapse:
        text = text.replace("\r", "").replace("\n", "")
    return text


def _detail_url(letter_type_name, original_id):
    """Build the detail-page URL for one letter.

    '諮詢' (consult) letters use the consult flow; everything else uses the
    suggest flow. NOTE(review): the site may return Simplified Chinese type
    names — confirm the comparison string matches what the API actually sends,
    otherwise the consult branch is never taken.
    """
    if letter_type_name == '諮詢':
        return ("http://www.beijing.gov.cn/hudong/hdjl/com.web.consult."
                f"consultDetail.flow?originalId={original_id}")
    return ("http://www.beijing.gov.cn/hudong/hdjl/com.web.suggest."
            f"suggesDetail.flow?originalId={original_id}")


def _scrape_detail(original_id, letter_type_name):
    """Fetch one detail page and return its pipe-joined record.

    Returns None when the HTTP status is not 200. Missing fields become "".
    """
    r = requests.get(_detail_url(letter_type_name, original_id),
                     headers={'user-agent': 'Mozilla/5.0'})
    if r.status_code != 200:
        return None

    soup = BeautifulSoup(r.text, "html.parser")

    strong = soup.find("strong")
    title = strong.get_text().replace("\n", "") if strong else ""

    from_people = _first_text(
        soup, "col-xs-10 col-lg-3 col-sm-3 col-md-4 text-muted", prefix='來信人:')
    from_time = _first_text(
        soup, "col-xs-5 col-lg-3 col-sm-3 col-md-3 text-muted", prefix='時間:')
    # BUGFIX: the original guarded this field with {"class", "..."} — a SET,
    # not a dict — so the attrs filter was broken for the letter body.
    problem = _first_text(
        soup, "col-xs-12 col-md-12 column p-2 text-muted mx-2", collapse=True)
    office = _first_text(
        soup, "col-xs-9 col-sm-7 col-md-5 o-font4 my-2", collapse=True)
    answer_time = _first_text(
        soup, "col-xs-12 col-sm-3 col-md-3 my-2", prefix='答覆時間:')
    answer = _first_text(
        soup, "col-xs-12 col-md-12 column p-4 text-muted my-3", collapse=True)

    return "|".join([str(original_id), str(letter_type_name), title,
                     from_people, from_time, problem, office,
                     answer_time, answer])


def main():
    """Walk listing pages 1..LAST_PAGE and append one record per letter."""
    payload = json.dumps({})  # API expects an (empty) JSON body
    # Open the output file once instead of reopening it per record.
    with open(OUTPUT_FILE, 'a', encoding='utf-8') as fp:
        for page in range(1, LAST_PAGE + 1):
            print(page)
            resp = requests.post(LIST_URL_TMPL.format(page=page),
                                 data=payload, headers=HEADERS)
            # demjson3 tolerates the API's non-strict JSON.
            for item in demjson3.decode(resp.text).get("result", []):
                original_id = item.get("originalId")      # letter id
                letter_type = item.get("letterTypeName")  # letter type
                record = _scrape_detail(original_id, letter_type)
                if record is None:
                    print(f"Failed to retrieve details for ID: {original_id}")
                else:
                    fp.write(record + '\n')


if __name__ == "__main__":
    main()