Data Import and Preprocessing, Experiment 2: JSON File Format Conversion

Posted by chaRon522 on 2020-09-26

Original URL: https://blog.csdn.net/weixin_43588190/article/details/108738462

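I. Experiment overview
[Objectives]

Gain an initial grasp of data collection methods;
Gain an initial grasp of using a web crawler to scrape data from the web;
Master methods for converting between different data formats.

[Environment] (materials, equipment, and software used) Linux or Windows operating system, MySQL database, Python or another high-level language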

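II. Experiment content
Task 1: Scrape web data
[Requirements]

Scrape the song title, artist, and duration of the top 500 songs on the chart of the Kugou Music website (https://www.kugou.com/).
Save the scraped data as a JSON file.
Read data back from the JSON file to check that the file format is correct.

[Procedure] (steps, records, data, programs, etc.)
Provide the steps and screenshots as evidence.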

from bs4 import BeautifulSoup
import requests
import time
import re
import json

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}

nameList = []
singerList = []
timeList = []
keys = ['songName', 'singer', 'time']

def get_info(url, encoding):
    res = requests.get(url, headers=headers)
    res.encoding = encoding  # decode the response with the site's own encoding
    soup = BeautifulSoup(res.text, 'lxml')
    titles = soup.select('a.pc_temp_songname')
    times = soup.select('span.pc_temp_time')
    for title, duration in zip(titles, times):
        # The link text has the form "singer - song name"; split on the first separator only
        singer, sep, songName = title.get_text().strip().partition(' - ')
        if not sep:
            continue  # skip entries that do not follow the expected pattern
        nameList.append(songName)
        singerList.append(singer)
        timeList.append(duration.get_text().strip())

def output(file):
    # Pair each (song name, singer, duration) row with the keys and
    # let json.dump produce the brackets and commas
    songs = [dict(zip(keys, row)) for row in zip(nameList, singerList, timeList)]
    json.dump({'songInfo': songs}, file, ensure_ascii=False, indent=4)

def get_website_encoding(url):  # a site normally uses one encoding throughout, so checking the home page once is enough
    res = requests.get(url, headers=headers)
    charset = re.search("charset=(.*?)>", res.text)
    if charset is not None:
        blocked = ['\'', ' ', '\"', '/']
        cleaned = [c for c in charset.group(1) if c not in blocked]
        return ''.join(cleaned)  # use the page's declared encoding to avoid garbled text
    else:
        return res.encoding  # no declaration found, fall back to the encoding requests detected

if __name__ == '__main__':
    encoding = get_website_encoding('http://www.kugou.com')
    urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'.format(str(i)) for i in range(1, 23)]
    for url in urls:
        get_info(url, encoding)
        time.sleep(1)  # pause one second between requests to avoid hitting the site too often
    # Write UTF-8 so the verification script below can read the file back with the same encoding
    with open('kugou_500.json', 'w', encoding='utf-8') as f:
        output(f)
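
The encoding lookup in the script depends on a loose pattern (`charset=(.*?)>`) followed by character filtering. As a hedged alternative, here is a minimal standard-library sketch that pulls the charset out of a `<meta>` tag directly. The helper name `find_charset`, the sample HTML strings, and the `utf-8` fallback are assumptions made for illustration, not part of the original script.

```python
import re

# Matches charset=... inside a <meta> tag, with or without quotes around the value.
META_CHARSET = re.compile(
    r'<meta[^>]*charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)

def find_charset(html, default="utf-8"):
    """Return the declared charset in lowercase, or `default` if none is declared."""
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else default

# Hypothetical page fragments covering the two common declaration styles.
samples = [
    '<meta charset="UTF-8">',
    "<meta http-equiv='Content-Type' content='text/html; charset=gb2312' />",
    "<p>no declaration</p>",
]
for html in samples:
    print(find_charset(html))
```

Run as written, the loop prints `utf-8`, `gb2312`, and `utf-8`. The returned name could then be assigned to `res.encoding` before reading `res.text`.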

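Running the script produces the JSON file kugou_500.json.
Open the file with json.load; if it loads and prints without an error, the file format is correct.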

import json

with open("kugou_500.json",'r',encoding='UTF-8') as f:
    new_dict = json.load(f)
    print(new_dict)
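
A successful `json.load` only shows that the file is syntactically valid. A slightly stricter check, sketched below, also reports where a syntax error occurred and whether every record carries the expected keys. The function name `check_song_file` and the temporary demo files are hypothetical; the root key `songInfo` and the field names come from the scraper above.

```python
import json
import os
import tempfile

def check_song_file(path, root_key="songInfo", required=("songName", "singer", "time")):
    """Return a list of problems; an empty list means the file passed every check."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(data, dict) or not isinstance(data.get(root_key), list):
        return [f"expected an object with a list under {root_key!r}"]
    problems = []
    for index, record in enumerate(data[root_key]):
        if not isinstance(record, dict):
            problems.append(f"record {index} is not an object")
            continue
        missing = [key for key in required if key not in record]
        if missing:
            problems.append(f"record {index} is missing {missing}")
    return problems

# Demo on two throwaway files: one well-formed, one with a syntax error.
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.json")
    bad_path = os.path.join(tmp, "bad.json")
    with open(good_path, "w", encoding="utf-8") as f:
        json.dump({"songInfo": [{"songName": "a", "singer": "b", "time": "3:30"}]}, f)
    with open(bad_path, "w", encoding="utf-8") as f:
        f.write('{"songInfo": [}')
    good_result = check_song_file(good_path)
    bad_result = check_song_file(bad_path)

print(good_result)
print(bad_result)
```

For the real output, `check_song_file("kugou_500.json")` would be the call to try, assuming the file was written as UTF-8.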


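Task 2: Generate a CSV file programmatically and convert it to JSON
[Requirements]

Programmatically generate a CSV file with the following content (the columns are name, gender, native place, and department):
姓名,性別,籍貫,系別
張迪,男,重慶,計算機系
蘭博,男,江蘇,通訊工程系
黃飛,男,四川,物聯網系
鄧玉春,女,陝西,計算機系
周麗,女,天津,藝術系
李雲,女,上海,外語系
Convert the CSV file above to JSON format, and query the records of all female students in the file.

[Procedure] (steps, records, data, programs, etc.)
Provide the steps and screenshots as evidence.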

import csv

rows = [
    ["姓名", "性別", "籍貫", "系別"],  # header row: name, gender, native place, department
    ["張迪", "男", "重慶", "計算機系"],
    ["蘭博", "男", "江蘇", "通訊工程系"],
    ["黃飛", "男", "四川", "物聯網系"],
    ["鄧玉春", "女", "陝西", "計算機系"],
    ["周麗", "女", "天津", "藝術系"],
    ["李雲", "女", "上海", "外語系"],
]
# newline="" lets the csv module control line endings (avoids blank lines on Windows);
# the with block closes the file so every row is flushed to disk
with open("question02.csv", "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

Convert to JSON format

import csv
import json

with open("question02.csv", "r", encoding="utf-8", newline="") as csvFile:
    # DictReader takes the field names from the header row
    people = list(csv.DictReader(csvFile))

# Serialize the whole list in one call instead of writing brackets and commas by hand,
# so the output stays valid for any number of rows
with open("question02.json", "w", encoding="utf-8") as jsonFile:
    json.dump({"personInfo": people}, jsonFile, ensure_ascii=False, indent=4)

Query the records of all female students

import json

with open("question02.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Print every record whose gender field ("性別") is female ("女")
for person in data["personInfo"]:
    if person["性別"] == "女":
        print(person)
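
The CSV-to-JSON-to-query pipeline can also be exercised entirely in memory, which makes it easy to check without creating files. The sketch below is one possible arrangement, assuming the six rows listed in the requirements. The helper names `csv_text_to_json` and `select` are hypothetical.

```python
import csv
import io
import json

# The rows from the task description, as CSV text.
CSV_TEXT = """姓名,性別,籍貫,系別
張迪,男,重慶,計算機系
蘭博,男,江蘇,通訊工程系
黃飛,男,四川,物聯網系
鄧玉春,女,陝西,計算機系
周麗,女,天津,藝術系
李雲,女,上海,外語系
"""

def csv_text_to_json(csv_text, root_key="personInfo"):
    """Parse CSV text (first line is the header) and return a JSON string."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps({root_key: records}, ensure_ascii=False, indent=4)

def select(records, field, value):
    """Return the records whose `field` equals `value`."""
    return [record for record in records if record.get(field) == value]

json_text = csv_text_to_json(CSV_TEXT)
people = json.loads(json_text)["personInfo"]
women = select(people, "性別", "女")
for person in women:
    print(person["姓名"], person["系別"])
```

Because the records are built as a list and serialized in one call, nothing in the code depends on how many rows the CSV contains.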


Task 3: Converting between XML and JSON
[Content and requirements]
(1) Read an XML file with the following content:
<?xml version="1.0" encoding="gb2312"?>
<圖書>
    <書名>紅樓夢</書名>
    <作者>曹雪芹</作者>
    <主要內容>描述賈寶玉和林黛玉的愛情故事</主要內容>
    <出版社>人民文學出版社</出版社>
</圖書>
(2) Convert the XML file above to JSON format.
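
The declaration in the task names gb2312, so a file that really is stored in that encoding would be misread if opened with a different one. The following is a minimal sketch, assuming the bytes on disk match the declared encoding. `decode_xml_bytes` is a hypothetical helper, and the sample uses simplified characters because GB2312 does not cover the traditional forms.

```python
import re

def decode_xml_bytes(raw, default="utf-8"):
    """Decode XML bytes using the encoding named in the XML declaration, if any."""
    # The declaration itself is plain ASCII, so non-ASCII bytes can be ignored here.
    head = raw[:200].decode("ascii", errors="ignore")
    match = re.match(r'\s*<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', head)
    encoding = match.group(1) if match else default
    return raw.decode(encoding), encoding

# Hypothetical sample: a GB2312-encoded document in simplified characters.
sample = '<?xml version="1.0" encoding="gb2312"?><图书><书名>红楼梦</书名></图书>'.encode("gb2312")
text, encoding = decode_xml_bytes(sample)
print(encoding)
print(text)
```

The decoded string can then be handed to whichever XML-to-dict conversion is used. If the file was saved as UTF-8 despite the declaration, opening it as UTF-8 is the right choice instead.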

[Procedure] (steps, records, data, programs, etc.)
Provide the relevant code and screenshots of the program running.

Create the XML file question_03.xml.

import json
import xmltodict

# Alternative: walk the document with the standard-library DOM parser
# import xml.dom.minidom
# dom = xml.dom.minidom.parse('question_03.xml')   # open the XML document
# root = dom.documentElement                       # get the document element object
# bb = root.getElementsByTagName('書名')
# print(bb[0].firstChild.data)

# Read the XML file into a string; the with block closes the file afterwards
with open("question_03.xml", "r", encoding="utf-8") as file:
    xmlStr = file.read()

# xmltodict.parse returns a dictionary that mirrors the XML tree
xmlDict = xmltodict.parse(xmlStr)

with open("question03JSON.json", "w", encoding="utf-8") as f:
    json.dump(xmlDict, f, ensure_ascii=False, indent=4)
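
If installing xmltodict is not an option, a similar result can be approximated with the standard library alone. The sketch below uses `xml.etree.ElementTree` and a small recursive helper, `element_to_dict`, which is hypothetical. It ignores attributes and mixed content, so it is a simplification and not a drop-in replacement for xmltodict. The XML is passed as an already decoded string without a declaration.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Convert an element to a str (leaf) or a dict of its children (attributes ignored)."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

xml_text = (
    "<圖書>"
    "<書名>紅樓夢</書名>"
    "<作者>曹雪芹</作者>"
    "<主要內容>描述賈寶玉和林黛玉的愛情故事</主要內容>"
    "<出版社>人民文學出版社</出版社>"
    "</圖書>"
)
root = ET.fromstring(xml_text)
book = {root.tag: element_to_dict(root)}
json_text = json.dumps(book, ensure_ascii=False, indent=4)
print(json_text)
```

The resulting JSON has a single top-level key, 圖書, whose value is an object with the four child elements as keys.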

