部落格地址:https://www.cnblogs.com/zylyehuo/
開發環境
- anaconda
- 整合環境:整合好了資料分析和機器學習中所需要的全部環境
- 安裝目錄不可以有中文和特殊符號
- jupyter
- anaconda提供的一個基於瀏覽器的視覺化開發工具
import pandas as pd
import numpy as np
級聯操作 -- 對應表格
- pd.concat
- pd.append
- pandas使用pd.concat函式,與np.concatenate函式類似,只是多了一些引數:
- objs
- axis=0
- keys
- join='outer' / 'inner':表示的是級聯的方式,outer會將所有的項進行級聯(忽略匹配和不匹配),而inner只會將匹配的項級聯到一起,不匹配的不級聯
- ignore_index=False
匹配級聯
df1 = pd.DataFrame(data=np.random.randint(0,100,size=(5,3)),columns=['A','B','C'])
df2 = pd.DataFrame(data=np.random.randint(0,100,size=(5,3)),columns=['A','D','C'])
pd.concat((df1,df2),axis=1) # 行列索引都一致的級聯叫做匹配級聯
![](https://i.iter01.com/images/f8933c4137899170e71b9749cb36e615e397fc0e41381c02fe6c7e7122bda4a0.png)
不匹配級聯
- 不匹配指的是級聯的維度的索引不一致。例如縱向級聯時列索引不一致,橫向級聯時行索引不一致
- 有2種連線方式:
- 外連線:補NaN(預設模式)
- 內連線:只連線匹配的項
pd.concat((df1,df2),axis=0)
![](https://i.iter01.com/images/2e3eee60de15e578c39c0d82d53b88a1f2402b4e3c42b1ba05c6f954a9ed559c.png)
內連線
pd.concat((df1,df2),axis=0,join='inner') # inner直把可以級聯的級聯不能級聯不處理
![](https://i.iter01.com/images/cf5dd9a313f2777490576e8b2fc54d698ec94c89d7979bb2141e7dbed7c23703.png)
外連線
- 如果想要保留資料的完整性必須使用 outer(外連線)
pd.concat((df1,df2),axis=0,join='outer')
![](https://i.iter01.com/images/a077ea163d83f3824778f7086570df8eae0ee5c703a3368ba39b60f7a931d067.png)
df1.append(df2)
![](https://i.iter01.com/images/b4968f214d387101180b04ad6cb54b86d217fb3775402acfddf8ca0549580811.png)
合併操作 -- 對應資料
- merge與concat的區別在於,merge需要依據某一共同列來進行合併
- 使用pd.merge()合併時,會自動根據兩者相同column名稱的那一列,作為key來進行合併。
- 注意每一列元素的順序不要求一致
一對一合併
from pandas import DataFrame
df1 = DataFrame({'employee':['Bob','Jake','Lisa'],
'group':['Accounting','Engineering','Engineering'],
})
df1
![](https://i.iter01.com/images/be336eda3a17612f279fad2ad1cdaa6b4ec63de2109ff4259d3dfa80100ea4ca.png)
df2 = DataFrame({'employee':['Lisa','Bob','Jake'],
'hire_date':[2004,2008,2012],
})
df2
![](https://i.iter01.com/images/f5756738ea8742a52a0dbe49e84a195821084766fc6bf62b50c45f4c31914444.png)
pd.merge(df1,df2,on='employee')
![](https://i.iter01.com/images/dd6a0eb0fccc816deb6ec6c6d685a3bdfc5c0c87e2b1a5e276143f9b69cdc2a8.png)
一對多合併
df3 = DataFrame({
'employee':['Lisa','Jake'],
'group':['Accounting','Engineering'],
'hire_date':[2004,2016]})
df3
![](https://i.iter01.com/images/b790586ad2b560ce8f7ceac7071e11fe22904ec1c9248e7642ea53c8669c1b18.png)
df4 = DataFrame({'group':['Accounting','Engineering','Engineering'],
'supervisor':['Carly','Guido','Steve']
})
df4
![](https://i.iter01.com/images/80d29b851f3286171b51529a07f563eec7ca7a8ec253046714bd47ddcfcf105e.png)
pd.merge(df3,df4) # on如果不寫,預設情況下使用兩表中公有的列作為合併條件
![](https://i.iter01.com/images/e2a2034fa969afacff12bb71b60676cd1bcf453b1a359ff0d8871197d0edbde7.png)
多對多合併
df1 = DataFrame({'employee':['Bob','Jake','Lisa'],
'group':['Accounting','Engineering','Engineering']})
df1
![](https://i.iter01.com/images/1d31314aa2eb13f583cfb913bc9c47290798b9a2bfd3f04c6a06ec0cb4990d01.png)
df5 = DataFrame({'group':['Engineering','Engineering','HR'],
'supervisor':['Carly','Guido','Steve']
})
df5
![](https://i.iter01.com/images/4b1ed377f7d1b51ef00c475e2342b9a74915edade639296af97519af0e4fe42b.png)
pd.merge(df1,df5,how='right')
![](https://i.iter01.com/images/46332ce639ee3193787369ac428adcf092a364dae1f5d57e23e19b0588dcb378.png)
pd.merge(df1,df5,how='left')
![](https://i.iter01.com/images/55909780d4777f42bbcd92ff339cf82ac4af047b18da70d32594a4296867aa53.png)
key的規範化
- 當兩張表沒有可進行連線的列時,可使用left_on和right_on手動指定merge中左右兩邊的哪一列列作為連線的列
df1 = DataFrame({'employee':['Bobs','Linda','Bill'],
'group':['Accounting','Product','Marketing'],
'hire_date':[1998,2017,2018]})
df1
![](https://i.iter01.com/images/aeeadaa80018508ce1cf6595923b13e14b126d62835604e94b50018d5b8d66da.png)
df5 = DataFrame({'name':['Lisa','Bobs','Bill'],
'hire_dates':[1998,2016,2007]})
df5
![](https://i.iter01.com/images/9bec16f3d0a423117c3f3529450dbf4974853d7272df6ecbfc61f3256d748111.png)
pd.merge(df1,df5,left_on='employee',right_on='name')
![](https://i.iter01.com/images/48156cb08ec4d0fe2c3f67f3da493c8b89cbc0474f267c0b4fe2d9985383079d.png)
內合併與外合併
df6 = DataFrame({'name':['Peter','Paul','Mary'],
'food':['fish','beans','bread']}
)
df7 = DataFrame({'name':['Mary','Joseph'],
'drink':['wine','beer']})
df6
![](https://i.iter01.com/images/0666e46a0174a68fa7acf6ee89c729eff7416393f216d654d39aa3c2421fa181.png)
df7
![](https://i.iter01.com/images/a59ef9470908e087fe465910e9b355d5340f4c46747f397383ad278236e9418d.png)
pd.merge(df6,df7,how='outer')
![](https://i.iter01.com/images/a7706eb5407ea79835bd783b5378288d31a59d38dc1a9a4e7214190777c28e09.png)
df6 = DataFrame({'name':['Peter','Paul','Mary'],
'food':['fish','beans','bread']}
)
df7 = DataFrame({'name':['Mary','Joseph'],
'drink':['wine','beer']})
df6
![](https://i.iter01.com/images/f2f52a80492f46c24a9209a65f88df2777b209ad593cd526eabffc6436c64b1e.png)
df7
![](https://i.iter01.com/images/b9d28d05d8c1728c1bce52a7ab8303c33472a86a15db4dca6485ed7dc60d571b.png)
pd.merge(df6,df7,how='inner')
![](https://i.iter01.com/images/aa481237344df900914f876954cc80d1cf10b4e2d1b7e059ea3cedba30c743b0.png)