Pandas and Excel: reading data, saving data, splitting files, and merging files

Posted by 周小董 on 2019-01-30

Original URL: https://blog.csdn.net/xc_zhou/article/details/86701551

The function for writing to Excel is pd.DataFrame.to_excel(). It is a DataFrame method, so the data must be in a DataFrame before it can be written, i.e. "Write DataFrame to an excel sheet".

DataFrame.to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None,
header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None,
merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)

test.csv

index,a_name,b_name
0,1,3
1,2,3
2,3,4
3,5

Reading a CSV file

# -*- coding:utf-8 -*-
import pandas as pd

df = pd.read_csv('test.csv')
print(df)

Output

   index  a_name  b_name
0      0       1     3.0
1      1       2     3.0
2      2       3     4.0
3      3       5     NaN
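
The b_name column prints as 3.0 and 4.0 because the last row of test.csv has no value in that column: pandas fills the gap with NaN, which is a float, so the whole column becomes float64. The following is a minimal sketch of that behaviour. It reads the same data from an in-memory string instead of a file, and the io.StringIO wrapper and the nullable 'Int64' dtype are additions for illustration, not part of the original post.

```python
import io
import pandas as pd

# Same content as test.csv above, held in memory so the snippet runs anywhere.
csv_text = "index,a_name,b_name\n0,1,3\n1,2,3\n2,3,4\n3,5\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)                    # b_name is float64 because of the missing value
print(df['b_name'].isna().sum())    # number of missing cells in b_name

# Optional: keep whole numbers by requesting pandas' nullable integer dtype.
df_int = pd.read_csv(io.StringIO(csv_text), dtype={'b_name': 'Int64'})
print(df_int)
```

Whether to keep the float column or switch to the nullable integer type depends on what later processing expects.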

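Reading Excel

Reading Excel is done mainly with the read_excel function. Besides pandas, the third-party library xlrd must be installed (xlrd 2.0 and later read only .xls files; for .xlsx files pandas uses openpyxl).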

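'''
pd.read_excel(io, sheet_name=0, header=0, skiprows=None, skip_footer=0, index_col=None, 
    names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, 
    thousands=None, convert_float=True, has_index_names=None, converters=None, dtype=None, 
    true_values=None, false_values=None, engine=None, squeeze=False, **kwds)
(signature as of the pandas version used in this post; later releases renamed or removed some parameters)

The main parameters of this function are io, sheet_name, header, names and encoding.
    io: the Excel file; can be a file path, a URL, a file-like object, or an xlrd workbook;
    sheet_name: which sheet(s) to return; can be a string (sheet name), an integer (sheet index),
        a list (of strings and/or integers; returns a dict of {sheet key: DataFrame}), or None (returns a dict of all sheets);
    header: the row(s) to use as the column labels; can be an int or a list of ints (0-indexed row numbers);
    names: the column names to use; an array-like object.
    encoding: keyword argument specifying the encoding used for reading.
The function returns a pandas DataFrame or a dict of DataFrames, and the usual DataFrame operations can then be used to access the data.
'''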

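df = pd.read_excel('excel_output.xls', sheet_name=None)  # sheet_name=None returns a dict of all sheets
# print(df.head())  # preview the data (first 5 rows by default); only valid when a single sheet is read
print(df['2'])  # select the sheet named '2'

xls_file = pd.ExcelFile('excel_output.xls')

print(xls_file.sheet_names)  # list the sheet names in the Excel file
sheet1 = xls_file.parse('2')  # parse a sheet by name
sheet2 = xls_file.parse(0)    # parse a sheet by position
print('sheet1:', sheet1)
print('sheet2:', sheet2)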

Writing Excel

Writing Excel is done mainly by building a DataFrame with pandas and calling its to_excel method.

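'''
DataFrame.to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, 
    header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, 
    merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)

excel_writer: the target Excel file; can be a file path or an ExcelWriter object;
sheet_name: name of the sheet
na_rep: the value used to fill missing data
    If na_rep is a bool, it is written to Excel as 0 or 1; a string or a number can also be used
    na_rep=True --> 1
    na_rep=False --> 0
    na_rep=3 --> 3
    na_rep='a' --> 'a'
columns: the columns to write
index: default True, which writes the row index; with index=False the row index (row names) is not written
header: whether to write the column names, default True;
    if a list of strings is given, it is used as aliases for the column names
index_label: the column label for the index column
encoding: the encoding to write with, a string.
'''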
'''Write one sheet to an Excel file'''
df.to_excel('excel_output.xls',sheet_name='2',na_rep=True,columns=['index','b_name'],index=False)

'''Write several sheets to one Excel file'''
writer = pd.ExcelWriter('output.xlsx')
# df1 = pd.DataFrame(data={'col1':[1,1], 'col2':[2,2]})
df1 = pd.DataFrame(data=[{'col1':1, 'col2':2},{'col1':3, 'col2':4}])
df1.to_excel(writer,sheet_name='Sheet1')
df1.to_excel(writer,sheet_name='2')
# close() saves the workbook; the separate writer.save() call was removed in pandas 2.0
writer.close()
#-------------------------------------------------------
df1 = pd.DataFrame({'Data1': [1, 2, 3, 4, 5, 6, 7]})
df2 = pd.DataFrame({'Data2': [8, 9, 10, 11, 12, 13]})
df3 = pd.DataFrame({'Data3': [14, 15, 16, 17, 18]})
with pd.ExcelWriter('output2.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Data1', startcol=0, index=False)
    df2.to_excel(writer, sheet_name='Data1', startcol=1, index=False)
    df3.to_excel(writer, sheet_name='Data3', index=False)
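
To check that the sheets were written as intended, the workbook can be read back with sheet_name=None. The sketch below is a small round trip under a few assumptions: it writes to a temporary directory, the file name roundtrip.xlsx is made up for the example, and an .xlsx engine such as openpyxl is assumed to be installed.

```python
import os
import tempfile
import pandas as pd

df1 = pd.DataFrame({'Data1': [1, 2, 3]})
df3 = pd.DataFrame({'Data3': [14, 15]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'roundtrip.xlsx')  # hypothetical file name

    # Leaving the with-block saves and closes the workbook.
    with pd.ExcelWriter(path) as writer:
        df1.to_excel(writer, sheet_name='Data1', index=False)
        df3.to_excel(writer, sheet_name='Data3', index=False)

    # sheet_name=None returns a dict that maps sheet name -> DataFrame.
    sheets = pd.read_excel(path, sheet_name=None)

for name, frame in sheets.items():
    print(name, frame.shape)
```

Using the with-block form avoids having to remember an explicit close() call.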

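Splitting one Excel file into several files

Sometimes a single Excel file holds so much data that it needs to be split into several files for processing. Slicing the DataFrame with pandas is enough to do this.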

import pandas as pd
data = pd.read_excel('E:\\PythonTestCode\\public opinion.xlsx', sheet_name='public opinion')

row_num, column_num = data.shape    # number of rows and columns in the data
print('the sample number is %s and the column number is %s' % (row_num, column_num))
# The data here has 210000 rows; with 10000 rows per file it is split into 21 files
for i in range(0, 21):
    # iloc is 0-based and excludes the end position, so this takes rows i*10000 .. (i+1)*10000-1
    save_data = data.iloc[i*10000:(i+1)*10000, :]
    file_name= 'E:\\PythonTestCode\\public opinion\\public opinion' + str(i) + '.xlsx'
    save_data.to_excel(file_name, sheet_name = 'public opinion', index = False)

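Before splitting there is a single file, public opinion.xlsx.

After splitting there are 21 files, public opinion0.xlsx through public opinion20.xlsx.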
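
The loop above hard-codes 21 files for 210000 rows. As a hedged alternative, the number of chunks can be derived from the row count so that a final, shorter chunk is handled as well. The helper name split_frame and the 25-row stand-in DataFrame are assumptions made for this sketch, not part of the original post.

```python
import math
import pandas as pd

def split_frame(data, chunk_size):
    """Yield consecutive row slices of at most chunk_size rows each."""
    n_chunks = math.ceil(len(data) / chunk_size)
    for i in range(n_chunks):
        yield data.iloc[i * chunk_size:(i + 1) * chunk_size]

# Small stand-in for the real workbook: 25 rows split into chunks of 10.
data = pd.DataFrame({'id': range(25)})
chunks = list(split_frame(data, 10))
print([len(c) for c in chunks])

# Each chunk could then be saved in the same way as above, for example:
# for i, chunk in enumerate(chunks):
#     chunk.to_excel('public opinion' + str(i) + '.xlsx', index=False)
```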

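Merging several Excel files into one file

Once the split files have been processed, they may need to be merged back together. The pandas concat function can be used for this.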

import pandas as pd

# Read the first two files and concatenate them
data0 = pd.read_excel('E:\\PythonTestCode\\public opinion\\public opinion0.xlsx', sheet_name='public opinion')
data1 = pd.read_excel('E:\\PythonTestCode\\public opinion\\public opinion1.xlsx', sheet_name='public opinion')
data = pd.concat([data0, data1])

# Append the remaining files (2 to 20) one by one
for i in range(2, 21):
    file_name = 'E:\\PythonTestCode\\public opinion\\public opinion' + str(i) + '.xlsx'
    data2 = pd.read_excel(file_name)
    data = pd.concat([data, data2])
data.to_excel('E:\\PythonTestCode\\public opinion\\public opinion-concat.xlsx', index = False)

This merges all of the files into one.
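
Two details are worth a sketch when merging. If the file names are collected from a directory listing instead of a counter, a plain string sort puts public opinion10.xlsx before public opinion2.xlsx, and concatenating once over a list avoids rebuilding the DataFrame on every iteration. The helper numeric_suffix, the sample file list and the in-memory stand-in frames below are assumptions for illustration.

```python
import re
import pandas as pd

def numeric_suffix(file_name):
    """Return the trailing number in names such as 'public opinion12.xlsx'."""
    match = re.search(r'(\d+)\.xlsx$', file_name)
    return int(match.group(1)) if match else -1

# Hypothetical listing, in the order a plain string sort would give.
file_names = ['public opinion0.xlsx', 'public opinion1.xlsx',
              'public opinion10.xlsx', 'public opinion2.xlsx']
ordered = sorted(file_names, key=numeric_suffix)
print(ordered)

# Stand-ins for [pd.read_excel(name) for name in ordered].
frames = [pd.DataFrame({'id': [i * 2, i * 2 + 1]}) for i in range(len(ordered))]

# Concatenate once; ignore_index=True renumbers the rows from 0.
merged = pd.concat(frames, ignore_index=True)
print(merged)
```

Without ignore_index=True each piece keeps its own row labels, so the merged index would contain repeated values.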

Loading MongoDB data directly into pandas

import pymongo
import pandas as pd

client = pymongo.MongoClient('localhost',27017)
db  = client['Lottery']
collection = db['Pk10']

data = pd.DataFrame(list(collection.find()))

# Drop MongoDB's _id field
del data['_id']

# Select the fields to display
data = data[['date','num1','num10']]
print(data)
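
The same column handling can be tried without a running MongoDB server. In the sketch below, a list of dicts stands in for list(collection.find()); the documents and their values are made up, and only the field names date, num1 and num10 come from the code above.

```python
import pandas as pd

# Stand-in for list(collection.find()); the field values are made up.
docs = [
    {'_id': 'a1', 'date': '2019-01-01', 'num1': 3, 'num2': 7, 'num10': 9},
    {'_id': 'a2', 'date': '2019-01-02', 'num1': 5, 'num2': 1, 'num10': 2},
]

data = pd.DataFrame(docs)

# drop() with errors='ignore' does not fail if '_id' is already absent.
data = data.drop(columns=['_id'], errors='ignore')
data = data[['date', 'num1', 'num10']]
print(data)

# With a live collection, a projection can leave the fields out on the server side:
# collection.find({}, {'_id': 0, 'date': 1, 'num1': 1, 'num10': 1})
```

Filtering with a projection keeps unneeded fields from being transferred at all, which matters more as the collection grows.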

References: https://blog.csdn.net/brucewong0516/article/details/79097909
https://zhuanlan.zhihu.com/p/36031795
https://www.cnblogs.com/snaildev/archive/2018/04/22/8907952.html
