Scraping novel coronavirus pneumonia data from DXY (丁香醫生) with Python and saving it locally, with latitude and longitude

Posted by 新月清光 on 2020-02-07

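The novel coronavirus pneumonia is spreading across the country. It is foreseeable that over the next year or two, some geography papers will study the spatial distribution of the outbreak and its correlation with other indicators, so data on the distribution of cases is useful for research. I therefore used Python to scrape the real-time data from DXY (丁香醫生) and packaged the script as an exe file that runs locally without any environment setup. The scraped data carries no latitude or longitude, so I use the Baidu Maps developer platform to look up coordinates by city name.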

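Software download links:

Without coordinates (working)
Link: https://pan.baidu.com/s/1ffcGv7CsaKPPDohFd03pww
Extraction code: ibql

With coordinates (limited by the Baidu Maps API call quota)
Link: https://pan.baidu.com/s/1zgPIre_39eG9iQfxTxq1Fg
Extraction code: 1tmi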

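The program output looks like this:

[Image: program output]

For comparison, the data shown on DXY:

[Images: the corresponding data on DXY]

The complete code follows.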

Code without coordinates:

import csv
import json
import time

import requests

# JSON endpoint the script reads the DXY data from
url = 'https://service-f9fjwngp-1252021671.bj.apigw.tencentcs.com/release/pneumonia'
html = requests.get(url).text
unicodestr = json.loads(html)  # parse the JSON string into a dict
# modifyTime is treated as a Unix timestamp in milliseconds, hence the division by 1000
dat = unicodestr["data"].get("statistics")["modifyTime"]
timeArray = time.localtime(dat/1000)
formatTime = time.strftime("%Y-%m-%d %H:%M", timeArray)


new_list = unicodestr.get("data").get("listByArea")  # list with one dict per province

print("###############"
      "Copyright: 殷宗敏   &"
      "& Data source: DXY (丁香醫生) "
      "###############")

header = ['Time', 'City', 'Confirmed', 'Suspected', 'Deaths', 'Cured']

for province in new_list:
    a = province["cities"]
    s = province["provinceName"]

    # One CSV file per province. The utf-8-sig encoding writes a BOM so that
    # Chinese text in the CSV is not garbled when opened in Excel.
    with open('./'+s+'.csv', encoding='utf-8-sig', mode='w', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerow(header)
        for city in a:
            # One row per city, in the same order as the header
            f_csv.writerow((formatTime, city['cityName'], city['confirmed'],
                            city['suspected'], city['dead'], city['cured']))

    print(s+" download finished!")
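
The row-building step of the script above can be tried without any network access. The following is a minimal sketch, not the author's code: the helper names `format_modify_time` and `rows_for_province` are hypothetical, the sample province is made up in the shape the script appears to expect (`provinceName`, `cities`, `cityName`, `confirmed`, `suspected`, `dead`, `cured`), and the fixed UTC+8 offset is an assumption made so that the timestamp does not depend on the local machine's time zone, as `time.localtime` does.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps should be displayed in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def format_modify_time(ms):
    """Convert a millisecond Unix timestamp to 'YYYY-MM-DD HH:MM' in UTC+8."""
    return datetime.fromtimestamp(ms / 1000, tz=CST).strftime("%Y-%m-%d %H:%M")


def rows_for_province(province, timestamp):
    """Return one tuple per city, in the column order used by the script."""
    return [
        (timestamp, c["cityName"], c["confirmed"], c["suspected"], c["dead"], c["cured"])
        for c in province["cities"]
    ]


# Made-up sample in the assumed payload shape; the numbers are not real data.
sample_province = {
    "provinceName": "湖北省",
    "cities": [
        {"cityName": "武漢", "confirmed": 10, "suspected": 0, "dead": 1, "cured": 2},
        {"cityName": "孝感", "confirmed": 5, "suspected": 0, "dead": 0, "cured": 1},
    ],
}

stamp = format_modify_time(1581033600000)  # 2020-02-07 00:00 UTC
rows = rows_for_province(sample_province, stamp)
for row in rows:
    print(row)
```

Keeping the parsing in small functions like these makes it possible to check the column order against a saved response before anything is written to disk.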

Code with coordinates:

import csv
import json
import time
from urllib.parse import quote
from urllib.request import urlopen

import requests

# JSON endpoint the script reads the DXY data from
url = 'https://service-f9fjwngp-1252021671.bj.apigw.tencentcs.com/release/pneumonia'
html = requests.get(url).text
unicodestr = json.loads(html)  # parse the JSON string into a dict
# modifyTime is treated as a Unix timestamp in milliseconds, hence the division by 1000
dat = unicodestr["data"].get("statistics")["modifyTime"]
timeArray = time.localtime(dat/1000)
formatTime = time.strftime("%Y-%m-%d %H:%M", timeArray)

# From here on, url refers to the Baidu geocoding endpoint
url = 'http://api.map.baidu.com/geocoder/v2/'
output = 'json'
ak = 'XeCfCY777qDMTKSqyc3LTiGPnMA7fqzy'  # your Baidu Maps AK (API key)

new_list = unicodestr.get("data").get("listByArea")  # list with one dict per province

print("###############"
      " Copyright: 殷宗敏   &"
      "&   Data source: DXY (丁香醫生) "
      "###############")

header = ['Time', 'City', 'Confirmed', 'Suspected', 'Deaths', 'Cured', 'Longitude', 'Latitude']

for province in new_list:
    a = province["cities"]
    s = province["provinceName"]

    # One CSV file per province. The utf-8-sig encoding writes a BOM so that
    # Chinese text in the CSV is not garbled when opened in Excel.
    with open('./'+s+'.csv', encoding='utf-8-sig', mode='w', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerow(header)
        for city in a:
            # Look up the coordinates by city name with the Baidu geocoding API
            add = quote(city['cityName'])
            uri = url + '?' + 'address=' + add + '&output=' + output + '&ak=' + ak
            res = urlopen(uri).read().decode()
            temp = json.loads(res)

            # Fall back to (0, 0) on any non-zero status, so that a failed
            # lookup (the original checked only status 1) cannot raise a KeyError.
            if temp['status'] != 0:
                temp["result"] = {'location': {'lng': 0, 'lat': 0}}

            lon = temp['result']['location']['lng']
            lat = temp['result']['location']['lat']

            f_csv.writerow((formatTime, city['cityName'], city['confirmed'],
                            city['suspected'], city['dead'], city['cured'], lon, lat))

    print(s+" download finished!")
print("########## All data downloaded #########")
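
Because the geocoding version is limited by the Baidu Maps API call quota, one way to spend fewer calls is to remember coordinates that were already looked up. The sketch below is an illustration under stated assumptions, not the author's method: `build_geocode_url`, `parse_location`, `fetch_location` and `make_cached_lookup` are hypothetical helper names, a `status` of 0 is assumed to mean success, and the demo uses a stand-in lookup with made-up coordinates so that it runs without a network connection or an AK.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODER_URL = 'http://api.map.baidu.com/geocoder/v2/'  # endpoint used by the script


def build_geocode_url(city, ak):
    """Build the request URL; urlencode percent-encodes the city name."""
    return GEOCODER_URL + '?' + urlencode({'address': city, 'output': 'json', 'ak': ak})


def parse_location(payload):
    """Return (lng, lat), or (0, 0) when the status is not 0 (assumed to mean failure)."""
    if payload.get('status') != 0:
        return (0, 0)
    loc = payload['result']['location']
    return (loc['lng'], loc['lat'])


def fetch_location(city, ak):
    """Query the service once. Needs network access and a valid AK."""
    with urlopen(build_geocode_url(city, ak)) as resp:
        return parse_location(json.loads(resp.read().decode()))


def make_cached_lookup(lookup):
    """Wrap a lookup function so that each city is requested at most once."""
    cache = {}

    def cached(city):
        if city not in cache:
            cache[city] = lookup(city)
        return cache[city]

    return cached


# Offline demo: a stand-in lookup that records how often it is called.
calls = []


def fake_lookup(city):
    calls.append(city)
    return (114.3, 30.6)  # made-up coordinates, for illustration only


geocode = make_cached_lookup(fake_lookup)
geocode('武漢')
geocode('武漢')
print(len(calls))
```

In real use the wrapped function would be something like `lambda city: fetch_location(city, ak)`. Note that this simple cache also remembers the (0, 0) fallback, so a city whose lookup failed is not retried within the same run.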
