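Introduction to Pandas and Installation

　　Why Pandas exists

　　Pandas supports most NumPy idioms, in particular array functions and broadcasting for many kinds of data processing. However, NumPy is better suited to homogeneous data, whereas Pandas is designed for tabular or heterogeneous data, which it cleans and processes efficiently.

　　What is Pandas?

　　Pandas is a tool built on top of NumPy that provides high-performance array (matrix) operations. It was created to solve data analysis tasks, and it is a core tool used throughout Python data analysis.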

　　Installing Pandas

　　pip install pandas

　　Topics covered by Pandas

　　Pandas basics, data cleaning and preparation, data aggregation and grouping, time series

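　　Introduction to Pandas data structures

　　Introducing Series

　　A Series is a one-dimensional array object. It holds a sequence of values together with data labels, which are called the index.

　　Creating a Series

　　pd.Series(data=None, index=None, dtype=None, name=None, copy=False)

　　data : the data used to build the Series; it can be array-like, a dict, or a scalar value

　　index : the index labels to use

　　dtype : the data type of the values

　　name : the name of the Series

　　copy : whether to copy the input data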
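The parameter list above can be illustrated with a minimal sketch. The variable names (`from_list`, `from_dict`, `from_scalar`, `from_array`) and the name `"demo"` are made up for this example and do not come from the original text.

```python
import numpy as np
import pandas as pd

# From a list: a default integer index 0..n-1 is generated.
from_list = pd.Series([10, 20, 30])

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a scalar: the value is repeated for every label in `index`.
from_scalar = pd.Series(5, index=["x", "y", "z"])

# From a NumPy array, with an explicit index, dtype and name.
from_array = pd.Series(
    np.arange(3), index=["a", "b", "c"], dtype="float64", name="demo"
)

print(from_list)
print(from_dict)
print(from_scalar)
print(from_array)
```

Passing `dtype` explicitly avoids depending on the platform default integer width, which is why the transcripts in this article show `int32` on some systems.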

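　　Pandas array functions

　　Syntax　　Basic usage

　　dtype　　view the data type

　　astype　　convert the data type (returns a new Series)

　　head()　　preview the first few rows

　　tail()　　preview the last few rows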

　　In [15]: # Specify the index labels

　　In [16]: series = pd.Series(np.arange(4),index=['a','b','c','d'])

　　In [17]: series

　　Out[17]:

　　a 0

　　b 1

　　c 2

　　d 3

　　dtype: int32

　　In [18]: # Specify the name of the Series

　　In [19]: series = pd.Series(np.arange(4),index=['a','b','c','d'],name='SmallJ')

　　In [20]: series

　　Out[20]:

　　a 0

　　b 1

　　c 2

　　d 3

　　Name: SmallJ, dtype: int32

　　In [21]: # The default dtype here is int32 (platform dependent); another dtype can be specified

　　In [23]: series = pd.Series(np.arange(4),index=['a','b','c','d'],name='SmallJ',dtype='int64')

　　In [24]: series

　　Out[24]:

　　a 0

　　b 1

　　c 2

　　d 3

　　Name: SmallJ, dtype: int64

　　In [29]: import numpy as np

　　In [30]: import pandas as pd

　　In [31]: series = pd.Series(np.arange(10),name='SmallJ')

　　In [32]: series

　　Out[32]:

　　0 0

　　1 1

　　2 2

　　3 3

　　4 4

　　5 5

　　6 6

　　7 7

　　8 8

　　9 9

　　Name: SmallJ, dtype: int32

　　In [33]: # The index is on the left, the values are on the right

　　In [34]: series.dtype

　　Out[34]: dtype('int32')

　　In [35]: # View the data type

　　In [36]: series.dtype

　　Out[36]: dtype('int32')

　　In [37]: # Convert the data type

　　In [38]: series.astype('float64')

　　Out[38]:

　　0 0.0

　　1 1.0

　　2 2.0

　　3 3.0

　　4 4.0

　　5 5.0

　　6 6.0

　　7 7.0

　　8 8.0

　　9 9.0

　　Name: SmallJ, dtype: float64

　　In [39]: # Preview rows from the start (the number of rows goes in the parentheses)

　　In [40]: series.head(5)

　　Out[40]:

　　0 0

　　1 1

　　2 2

　　3 3

　　4 4

　　Name: SmallJ, dtype: int32

　　In [41]: series.head(6)

　　Out[41]:

　　0 0

　　1 1

　　2 2

　　3 3

　　4 4

　　5 5

　　Name: SmallJ, dtype: int32

　　In [42]: # Preview rows from the end (the number of rows goes in the parentheses)

　　In [43]: series.tail(5)

　　Out[43]:

　　5 5

　　6 6

　　7 7

　　8 8

　　9 9

　　Name: SmallJ, dtype: int32

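　　Series index and values

　　series.index

　　View the index

　　series.values

　　View the sequence of values

　　series.reset_index(drop=False)

　　Reset the index

　　drop : whether to discard the original index; the default is False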

　　In [89]: import pandas as pd

　　In [90]: import numpy as np

　　In [91]: series = pd.Series(data=np.arange(5),index=['a','b','c','d','e'])

　　In [92]: series

　　Out[92]:

　　a 0

　　b 1

　　c 2

　　d 3

　　e 4

　　dtype: int32

　　In [93]: # View the index

　　In [94]: series.index

　　Out[94]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

　　In [95]: series.values

　　Out[95]: array([0, 1, 2, 3, 4])

　　In [96]: series.reset_index()

　　Out[96]:

　　index 0

　　0 a 0

　　1 b 1

　　2 c 2

　　3 d 3

　　4 e 4

　　In [98]: series

　　Out[98]:

　　a 0

　　b 1

　　c 2

　　d 3

　　e 4

　　dtype: int32

　　In [99]: # View the sequence of values

　　In [100]: series.values

　　Out[100]: array([0, 1, 2, 3, 4])

　　In [101]: # With drop=True the original index is discarded; the original Series is not modified, so the result must be assigned back

　　In [102]: series = series.reset_index(drop=True)

　　In [103]: series

　　Out[103]:

　　0 0

　　1 1

　　2 2

　　3 3

　　4 4

　　dtype: int32
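As a sketch of the two `drop` settings side by side: the Series name `"value"` and the variable names `as_frame` and `as_series` are assumptions made for this example. Naming the Series only makes the resulting column label readable; the unnamed Series in the transcript produces a column called `0` instead.

```python
import numpy as np
import pandas as pd

series = pd.Series(np.arange(5), index=list("abcde"), name="value")

# drop=False (the default): the old index becomes a regular column,
# so the result is a DataFrame rather than a Series.
as_frame = series.reset_index()
print(type(as_frame).__name__)
print(list(as_frame.columns))

# drop=True: the old index is discarded and the result stays a Series
# with a fresh 0..n-1 index.
as_series = series.reset_index(drop=True)
print(list(as_series.index))

# In both cases the original Series keeps its labels.
print(list(series.index))
```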

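　　Series indexing and slicing

　　series['label']

　　Select a value by its index label

　　series[position]

　　Select a value by its positional (subscript) index

　　series.loc[label]

　　Select a value by its index label

　　series.iloc[position]

　　Select a value by its positional index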

　　In [115]: # Select a value by label

　　In [116]: series.loc['b']

　　Out[116]: 1

　　In [117]: # Select a value by position

　　In [118]: series.iloc[1]

　　Out[118]: 1
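The section title mentions slicing, so here is a minimal sketch of how slices differ between the two accessors. The variable names `by_label` and `by_position` are made up for this example.

```python
import numpy as np
import pandas as pd

series = pd.Series(np.arange(5), index=list("abcde"))

# Label-based slice: BOTH endpoints are included ('b', 'c', 'd').
by_label = series.loc["b":"d"]
print(by_label)

# Position-based slice: the stop position is excluded, as with
# ordinary Python slicing (positions 1 and 2 only).
by_position = series.iloc[1:3]
print(by_position)
```

The inclusive end of a label slice is easy to miss when switching between `loc` and `iloc`.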

　　Using fancy indexing

　　In [139]: series

　　Out[139]:

　　a 0

　　b 1

　　c 10

　　d 3

　　e 22

　　dtype: int32

　　In [141]: # Select values by a list of labels

　　In [142]: series[['a','e']]

　　Out[142]:

　　a 0

　　e 22

　　dtype: int32

　　In [143]: # Select values by a list of positions

　　In [144]: series[[0,-1]]

　　Out[144]:

　　a 0

　　e 22

　　dtype: int32
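A sketch of the same fancy-indexing selections written with the explicit accessors. Recent pandas releases warn when plain `series[[0, -1]]` treats integers as positions on a Series whose index holds labels, so `iloc` is the less ambiguous form. The variable names are made up for this example.

```python
import pandas as pd

series = pd.Series([0, 1, 10, 3, 22], index=list("abcde"))

# A list of positions goes through .iloc ...
by_position = series.iloc[[0, -1]]

# ... and a list of labels goes through .loc.
by_label = series.loc[["a", "e"]]

print(by_position)
print(by_label)
```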

　　Modifying values in a Series

　　In [122]: series

　　Out[122]:

　　a 0

　　b 1

　　c 2

　　d 3

　　e 4

　　dtype: int32

　　Modify a value by position

　　series.iloc[2] = 10

　　Modify a value by label

　　series.loc['e'] = 22

　　In [139]: series

　　Out[139]:

　　a 0

　　b 1

　　c 10

　　d 3

　　e 22

　　dtype: int32

　　Checking whether a value exists

　　The in operator does not test the values; it checks the index labels.
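A minimal sketch of the difference, with made-up variable names. Going through `.values` or using `isin` are two common ways to test the values themselves.

```python
import pandas as pd

series = pd.Series([0, 1, 10, 3, 22], index=list("abcde"))

# `in` looks at the index labels.
label_present = "a" in series    # 'a' is a label
value_as_label = 22 in series    # 22 is a value, not a label

# To test the values, go through .values ...
value_present = 22 in series.values

# ... or build a boolean mask with isin().
mask = series.isin([10, 22])

print(label_present, value_as_label, value_present)
print(mask)
```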

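　　Series arithmetic

　　Operations are aligned on the index labels the two Series share; every other label is filled with NaN.

　　Pandas converts data automatically: a None value in numeric data is replaced with NaN.

　　When the two Series share no index labels, every result is NaN.

　　Any calculation involving NaN yields NaN.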

　　In [148]: data = pd.Series(data=[1,2,3,4,None],index=['a','b','c','d','e'])

　　In [149]: data

　　Out[149]:

　　a 1.0

　　b 2.0

　　c 3.0

　　d 4.0

　　e NaN

　　dtype: float64
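A short sketch of how the NaN produced from None behaves in element-wise arithmetic. The names `shifted` and `total` are made up for this example. Note that reductions such as `sum()` skip NaN by default, which differs from element-wise arithmetic.

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4, None], index=list("abcde"))

# None became NaN, which forces the dtype to float64.
print(data.dtype)

# Element-wise arithmetic propagates NaN: label 'e' stays NaN.
shifted = data + 1
print(shifted)

# isna() marks the missing entries.
print(data.isna())

# Reductions skip NaN unless skipna=False is passed.
total = data.sum()
print(total)
```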

　　Adding two Series whose index labels all match

　　In [148]: data = pd.Series(data=[1,2,3,4,None],index=['a','b','c','d','e'])

　　In [149]: data

　　Out[149]:

　　a 1.0

　　b 2.0

　　c 3.0

　　d 4.0

　　e NaN

　　dtype: float64

　　In [150]: data1 = pd.Series(data=[1,2,3,4,None],index=['a','b','c','d','e'])

　　In [151]: data1

　　Out[151]:

　　a 1.0

　　b 2.0

　　c 3.0

　　d 4.0

　　e NaN

　　dtype: float64

　　In [152]: data + data1

　　Out[152]:

　　a 2.0

　　b 4.0

　　c 6.0

　　d 8.0

　　e NaN

　　dtype: float64

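　　Adding two Series whose index labels match only partially

　　Where a label has no value in one of the Series, the result for that label is NaN.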

　　In [148]: data = pd.Series(data=[1,2,3,4,None],index=['a','b','c','d','e'])

　　In [153]: data2 = pd.Series(data=[1,2,3],index=['a','b','c'])

　　In [156]: data

　　Out[156]:

　　a 1.0

　　b 2.0

　　c 3.0

　　d 4.0

　　e NaN

　　dtype: float64

　　In [157]: data2

　　Out[157]:

　　a 1

　　b 2

　　c 3

　　dtype: int64

　　In [158]: data + data2

　　Out[158]:

　　a 2.0

　　b 4.0

　　c 6.0

　　d NaN

　　e NaN

　　dtype: float64

　　Adding two Series whose index labels do not match at all

　　When the two Series have no index labels in common, every result is NaN.

　　In [161]: data2 = pd.Series(data=[1,2,3],index=['a','b','c'])

　　In [162]: data3 = pd.Series(data=[1,2,3,4],index=['d','e','f','g'])

　　In [163]: data2

　　Out[163]:

　　a 1

　　b 2

　　c 3

　　dtype: int64

　　In [164]: data3

　　Out[164]:

　　d 1

　　e 2

　　f 3

　　g 4

　　dtype: int64

　　In [165]: data2 + data3

　　Out[165]:

　　a NaN

　　b NaN

　　c NaN

　　d NaN

　　e NaN

　　f NaN

　　g NaN

　　dtype: float64
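If the all-NaN result is not what is wanted, one option is the `Series.add` method with `fill_value`. This sketch treats a label missing from one side as 0, which is an assumption that only fits some data sets. The names `plain` and `filled` are made up for this example.

```python
import pandas as pd

data2 = pd.Series([1, 2, 3], index=["a", "b", "c"])
data3 = pd.Series([1, 2, 3, 4], index=["d", "e", "f", "g"])

# The + operator aligns on labels; with no shared labels, all is NaN.
plain = data2 + data3
print(plain)

# add() with fill_value substitutes 0 for a label missing on one side
# before adding, so each value passes through unchanged here.
filled = data2.add(data3, fill_value=0)
print(filled)
```

The result is still float64, because the aligned intermediate values contain missing entries before the fill is applied.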
