Pandas Date Handling: How to Filter, Display, and Aggregate Data by Date

https://mp.weixin.qq.com/s?__biz=MzI2NjY5NzI0NA==&mid=2247483778&idx=1&sn=078c0183e03138eac89c6c2d0252b0c5&chksm=ea8b6ef1ddfce7e7794f07ddf4371a0764d2278179f196e7923173fa78d308a693b146caa144&scene=21

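Preface

pandas has powerful tools for working with date data. This post walks through some of the basics, covering three areas:

Filtering data by date

Displaying data by date

Aggregating data by date

The examples were run on 64-bit Windows with Python 3.5.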

1 Reading and tidying the data

First, import the pandas library:

    import pandas as pd

Read the data from a CSV file:

    df = pd.read_csv('date.csv', header=None)  # the file has no header row
    print(df.head(2))

Output:

                0  1
    0  2013-10-24  3
    1  2013-10-25  4

Tidy the data:

    df.columns = ['date', 'number']
    df['date'] = pd.to_datetime(df['date'])  # convert the column to datetime
    df = df.set_index('date')  # use date as the index
    print(df.head(2))
    print(df.tail(2))
    print(df.shape)

Output:

                number
    date
    2013-10-24       3
    2013-10-25       4
                number
    date
    2017-02-14       6
    2017-02-22       6
    (425, 1)

df has 425 rows in total.
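As a compact alternative to the three tidying steps, `read_csv` can name the columns, parse the dates, and set the index in a single call. The sketch below is only an illustration: `date.csv` is not available here, so a three-row in-memory string (an assumption standing in for the real file) is read instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for date.csv: two unnamed columns, no header row.
csv_text = "2013-10-24,3\n2013-10-25,4\n2013-10-29,2\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,                 # the data has no header row
    names=["date", "number"],    # assign the column names up front
    parse_dates=["date"],        # convert the date column while reading
    index_col="date",            # and use it as the index
)
print(df.index)
print(df.shape)
```

For a real file, the `io.StringIO(...)` argument would simply be replaced with the file path.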

Check the DataFrame's data types:

    print(type(df))
    print(df.index)
    print(type(df.index))

Output:

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex(['2013-10-24', '2013-10-25', '2013-10-29', '2013-10-30',
                   '2013-11-04', '2013-11-06', '2013-11-08', '2013-11-12',
                   '2013-11-14', '2013-11-25',
                   ...
                   '2017-01-03', '2017-01-07', '2017-01-14', '2017-01-17',
                   '2017-01-23', '2017-01-25', '2017-01-26', '2017-02-07',
                   '2017-02-14', '2017-02-22'],
                  dtype='datetime64[ns]', name='date', length=425, freq=None)
    <class 'pandas.tseries.index.DatetimeIndex'>
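Date-based slicing is most predictable when the DatetimeIndex is sorted, so it can be worth checking that before filtering. The following sketch uses a small made-up frame with rows deliberately out of order; the data is an assumption, not the contents of `date.csv`.

```python
import pandas as pd

# Hypothetical sample with rows deliberately out of chronological order.
df = pd.DataFrame(
    {"number": [4, 3, 2]},
    index=pd.to_datetime(["2013-10-25", "2013-10-24", "2013-10-29"]),
)
df.index.name = "date"

print(df.index.is_monotonic_increasing)  # is the index already sorted?
df = df.sort_index()                     # sort rows chronologically
print(df.index.is_monotonic_increasing)
print(df.index.dtype)
```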

Build a Series:

    s = pd.Series(df['number'], index=df.index)
    print(type(s))
    s.head(2)

Output:

    <class 'pandas.core.series.Series'>

    date
    2013-10-24    3
    2013-10-25    4
    Name: number, dtype: int64
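Selecting a single column of a DataFrame already returns a Series that carries the same index, so the explicit constructor is optional. A minimal sketch on a two-row sample (values taken from the head of the data shown earlier) compares the two:

```python
import pandas as pd

df = pd.DataFrame(
    {"number": [3, 4]},
    index=pd.to_datetime(["2013-10-24", "2013-10-25"]),
)
df.index.name = "date"

s_explicit = pd.Series(df["number"], index=df.index)  # as in the article
s_direct = df["number"]                                # plain column selection

print(type(s_direct))
print(s_direct.equals(s_explicit))
```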

2 Filtering data by date

Selecting data by year

    print('--------- Data for 2013 -----------')
    # .loc also works on pandas 2.0+, where df['2013'] no longer selects rows
    print(df.loc['2013'].head(2))  # first rows of 2013
    print(df.loc['2013'].tail(2))  # last rows of 2013

Output:

    --------- Data for 2013 -----------
                number
    date
    2013-10-24       3
    2013-10-25       4
                number
    date
    2013-12-27       2
    2013-12-30       2

Selecting data from 2016 through 2017

    print('--------- Data for 2016 through 2017 -----------')
    print(df['2016':'2017'].head(2))  # first rows of 2016-2017
    print(df['2016':'2017'].tail(2))  # last rows of 2016-2017

Output:

    --------- Data for 2016 through 2017 -----------
                number
    date
    2016-01-04       4
    2016-01-07       6
                number
    date
    2017-02-14       6
    2017-02-22       6
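Year-level selection can also be written with `.loc`, which makes it explicit that rows are being selected by label. The sketch below uses a small made-up frame (an assumption, not the real file) to show a single year and an inclusive year range.

```python
import pandas as pd

# Hypothetical sample spanning several years.
df = pd.DataFrame(
    {"number": [3, 2, 5, 4, 6, 6]},
    index=pd.to_datetime([
        "2013-10-24", "2013-12-30", "2015-06-01",
        "2016-01-04", "2016-01-07", "2017-02-22",
    ]),
)
df.index.name = "date"

rows_2013 = df.loc["2013"]              # every row dated in 2013
rows_2016_2017 = df.loc["2016":"2017"]  # both end years are included
print(rows_2013)
print(rows_2016_2017)
```

Note that, unlike positional slicing, the end label of a date slice is inclusive: all of 2017 is returned.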

Selecting data for a given month

    print('--------- Data for one month -----------')
    print(df.loc['2013-11'])  # all rows in November 2013

Output:

    --------- Data for one month -----------
                number
    date
    2013-11-04       1
    2013-11-06       3
    2013-11-08       1
    2013-11-12       5
    2013-11-14       2
    2013-11-25       1
    2013-11-29       1
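The month string is shorthand for "every timestamp within that month". One way to see this is to compare it against an explicit boolean mask built from the index's `year` and `month` attributes. The sample data below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample around November 2013.
df = pd.DataFrame(
    {"number": [1, 1, 1, 2]},
    index=pd.to_datetime(["2013-10-30", "2013-11-04", "2013-11-29", "2013-12-02"]),
)
df.index.name = "date"

by_label = df.loc["2013-11"]                              # partial-string selection
mask = (df.index.year == 2013) & (df.index.month == 11)   # explicit condition
by_mask = df[mask]

print(by_label)
print(by_label.equals(by_mask))
```

The mask form is more verbose, but it extends naturally to conditions a date string cannot express, such as "every November regardless of year".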

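Selecting data for a specific day

Note that selecting a specific day works differently on a DataFrame than on a Series, as the code below shows:

    # Filter data by date
    print('--------- Data for a specific day -----------')
    # Select a specific day from the Series
    print(s['2013-11-06'])

    # Selecting a single day directly on the DataFrame raises an error,
    # whereas the same lookup on the Series works fine
    # print(df['2013-11-06'])

    # A slice can be used to select a single day instead
    print(df['2013-11-06':'2013-11-06'])

Output:

    --------- Data for a specific day -----------
    3
                number
    date
    2013-11-06       3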
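The difference arises because `[]` with a single string on a DataFrame is treated as a column lookup, so a full date string is looked up as a column name and fails. A hedged sketch of the options, on a three-row sample taken from the November rows shown earlier:

```python
import pandas as pd

df = pd.DataFrame(
    {"number": [1, 3, 1]},
    index=pd.to_datetime(["2013-11-04", "2013-11-06", "2013-11-08"]),
)
df.index.name = "date"
s = df["number"]

day = pd.Timestamp("2013-11-06")

value = s.loc[day]                          # Series: a single scalar
row = df.loc[day]                           # DataFrame: the matching row, as a Series
frame = df.loc["2013-11-06":"2013-11-06"]   # slice: a one-row DataFrame

try:
    df["2013-11-06"]                        # looked up as a column name
    raised = False
except KeyError:
    raised = True

print(value, raised)
print(frame)
```

Using `.loc` with an explicit `pd.Timestamp` avoids any ambiguity about whether a row or a column is meant.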

The DataFrame truncate function returns the data before or after a given point in time, or within a time interval. In general, though, plain slicing is recommended because it is more direct and convenient.

    print('--------- Data before or after a point in time -----------')
    print('--------after------------')
    print(df.truncate(after='2013-11'))   # drop everything after this point
    print('--------before------------')
    print(df.truncate(before='2017-02'))  # drop everything before this point

Output:

    --------- Data before or after a point in time -----------
    --------after------------
                number
    date
    2013-10-24       3
    2013-10-25       4
    2013-10-29       2
    2013-10-30       1
    --------before------------
                number
    date
    2017-02-07       8
    2017-02-14       6
    2017-02-22       6
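One subtlety is visible in the output above: `truncate(after='2013-11')` returns only October rows, because the string is converted to a single timestamp (2013-11-01), whereas a slice ending in `'2013-11'` covers the whole month. The sketch below illustrates the difference on made-up data.

```python
import pandas as pd

# Hypothetical sample; truncate expects a sorted index.
df = pd.DataFrame(
    {"number": [1, 1, 1, 8, 6]},
    index=pd.to_datetime([
        "2013-10-30", "2013-11-04", "2013-11-29", "2017-02-07", "2017-02-22",
    ]),
)
df.index.name = "date"

truncated = df.truncate(after="2013-11")   # keeps rows up to 2013-11-01 00:00
sliced = df.loc[:"2013-11"]                # keeps rows through the end of November

tail_truncated = df.truncate(before="2017-02")
tail_sliced = df.loc["2017-02":]           # same result for the lower bound

print(len(truncated), len(sliced))
print(tail_truncated.equals(tail_sliced))
```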

3 Displaying data by date

3.1 The to_period() method

Note that df.index is a DatetimeIndex, whereas df_period.index is a PeriodIndex.

Display by month, without aggregating

    df_period = df.to_period('M')  # display by month, without aggregating
    print(type(df_period))

    print(type(df_period.index))
    # Note: df.index is a DatetimeIndex;
    # df_period.index is a PeriodIndex

    print(df_period.head())

Output:

    <class 'pandas.core.frame.DataFrame'>
    <class 'pandas.tseries.period.PeriodIndex'>
             number
    date
    2013-10       3
    2013-10       4
    2013-10       2
    2013-10       1
    2013-11       1

Display by quarter, without aggregating

    print(df.to_period('Q').head())  # display by quarter, without aggregating

Output:

            number
    date
    2013Q4       3
    2013Q4       4
    2013Q4       2
    2013Q4       1
    2013Q4       1

Display by year, without aggregating

    print(df.to_period('A').head())  # display by year, without aggregating

Output:

          number
    date
    2013       3
    2013       4
    2013       2
    2013       1
    2013       1
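`to_period` only relabels the index; the number of rows and the values stay the same, and `to_timestamp` converts the labels back to timestamps. A minimal sketch on a two-row sample (values from the data shown earlier):

```python
import pandas as pd

df = pd.DataFrame(
    {"number": [3, 1]},
    index=pd.to_datetime(["2013-10-24", "2013-11-04"]),
)
df.index.name = "date"

monthly = df.to_period("M")         # labels become months
quarterly = df.to_period("Q")       # labels become quarters
restored = monthly.to_timestamp()   # back to timestamps (start of each period)

print(monthly)
print(quarterly)
print(restored)
```

The round trip is lossy: the original day of the month is not recovered, since each period maps back to its first day.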

3.2 The asfreq() method

Display at annual frequency

    df_period.index.asfreq('A')  # 'A' defaults to 'A-DEC'; other options include 'A-JAN'

Output:

    PeriodIndex(['2013', '2013', '2013', '2013', '2013', '2013', '2013', '2013',
                 '2013', '2013',
                 ...
                 '2017', '2017', '2017', '2017', '2017', '2017', '2017', '2017',
                 '2017', '2017'],
                dtype='period[A-DEC]', name='date', length=425, freq='A-DEC')

    df_period.index.asfreq('A-JAN')  # fiscal year ending in January

Output:

    PeriodIndex(['2014', '2014', '2014', '2014', '2014', '2014', '2014', '2014',
                 '2014', '2014',
                 ...
                 '2017', '2017', '2017', '2017', '2017', '2017', '2017', '2018',
                 '2018', '2018'],
                dtype='period[A-JAN]', name='date', length=425, freq='A-JAN')
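With 'A-JAN', each year is labeled by the calendar year in which it ends, and it ends in January. That is why October 2013 shows as 2014 and February 2017 as 2018 in the output above. The rule can be sketched with plain arithmetic on the date components; this is an illustration of the labeling rule under that reading, not how pandas computes it internally.

```python
import pandas as pd

dates = pd.to_datetime(["2013-10-24", "2017-01-25", "2017-02-07"])

fiscal_year_end_month = 1  # January, matching 'A-JAN'

# Months after the year-end month belong to the fiscal year that ends next year.
fiscal_year = dates.year + (dates.month > fiscal_year_end_month).astype(int)
labels = [int(y) for y in fiscal_year]
print(labels)
```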

For how annual frequencies display under the different settings, see the figure below:

Display at quarterly frequency

    df_period.index.asfreq('Q')  # 'Q' defaults to 'Q-DEC'; other options include 'Q-SEP' and 'Q-FEB'

Output:

    PeriodIndex(['2013Q4', '2013Q4', '2013Q4', '2013Q4', '2013Q4', '2013Q4',
                 '2013Q4', '2013Q4', '2013Q4', '2013Q4',
                 ...
                 '2017Q1', '2017Q1', '2017Q1', '2017Q1', '2017Q1', '2017Q1',
                 '2017Q1', '2017Q1', '2017Q1', '2017Q1'],
                dtype='period[Q-DEC]', name='date', length=425, freq='Q-DEC')

    df_period.index.asfreq('Q-SEP')  # quarters of other fiscal years can be shown, e.g. 'Q-SEP', 'Q-FEB'
    # df_period.index = df_period.index.asfreq('Q-DEC')
    # print(df_period.head())

Output:

    PeriodIndex(['2014Q1', '2014Q1', '2014Q1', '2014Q1', '2014Q1', '2014Q1',
                 '2014Q1', '2014Q1', '2014Q1', '2014Q1',
                 ...
                 '2017Q2', '2017Q2', '2017Q2', '2017Q2', '2017Q2', '2017Q2',
                 '2017Q2', '2017Q2', '2017Q2', '2017Q2'],
                dtype='period[Q-SEP]', name='date', length=425, freq='Q-SEP')
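To make the 'Q-SEP' labels concrete: with a fiscal year ending in September, October through December 2013 form the first quarter of fiscal 2014. A small sketch on two hand-picked months (chosen to match the first and last labels in the output above):

```python
import pandas as pd

months = pd.PeriodIndex(["2013-10", "2017-02"], freq="M")

fiscal_quarters = months.asfreq("Q-SEP")   # quarters of a fiscal year ending in September
quarter_labels = [str(p) for p in fiscal_quarters]
fiscal_years = [int(y) for y in fiscal_quarters.qyear]
quarter_numbers = [int(q) for q in fiscal_quarters.quarter]

print(quarter_labels)
print(fiscal_years, quarter_numbers)
```

The `qyear` attribute gives the fiscal year of each quarter, which can differ from the calendar year of the underlying dates.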

For how quarterly frequencies display under the different settings, see the figure below:

Display at monthly frequency

    df_period.index.asfreq('M')  # display by month

Output:

    PeriodIndex(['2013-10', '2013-10', '2013-10', '2013-10', '2013-11', '2013-11',
                 '2013-11', '2013-11', '2013-11', '2013-11',
                 ...
                 '2017-01', '2017-01', '2017-01', '2017-01', '2017-01', '2017-01',
                 '2017-01', '2017-02', '2017-02', '2017-02'],
                dtype='period[M]', name='date', length=425, freq='M')

Display by business day

Method 1

    df_period.index.asfreq('B', how='start')  # first business day of each month

Output:

    PeriodIndex(['2013-10-01', '2013-10-01', '2013-10-01', '2013-10-01',
                 '2013-11-01', '2013-11-01', '2013-11-01', '2013-11-01',
                 '2013-11-01', '2013-11-01',
                 ...
                 '2017-01-02', '2017-01-02', '2017-01-02', '2017-01-02',
                 '2017-01-02', '2017-01-02', '2017-01-02', '2017-02-01',
                 '2017-02-01', '2017-02-01'],
                dtype='period[B]', name='date', length=425, freq='B')

Method 2

    df_period.index.asfreq('B', how='end')  # last business day of each month

Output:

    PeriodIndex(['2013-10-31', '2013-10-31', '2013-10-31', '2013-10-31',
                 '2013-11-29', '2013-11-29', '2013-11-29', '2013-11-29',
                 '2013-11-29', '2013-11-29',
                 ...
                 '2017-01-31', '2017-01-31', '2017-01-31', '2017-01-31',
                 '2017-01-31', '2017-01-31', '2017-01-31', '2017-02-28',
                 '2017-02-28', '2017-02-28'],
                dtype='period[B]', name='date', length=425, freq='B')
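The `how` argument decides which end of the coarser period the finer one is taken from. The sketch below shows this on a single monthly period. As a simplifying assumption it uses the calendar-day frequency 'D' rather than 'B', so the results are the first and last calendar days; with 'B' they are business days instead, as in the 2017-01-02 entries above.

```python
import pandas as pd

month = pd.Period("2013-10", freq="M")

first_day = month.asfreq("D", how="start")  # first calendar day of the month
last_day = month.asfreq("D", how="end")     # last calendar day of the month

print(first_day, last_day)
```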

4 Aggregating data by date

4.1 Aggregating data by date

Aggregate by week

    print(df.resample('w').sum().head())
    # 'w' stands for week

Output:

                number
    date
    2013-10-27     7.0
    2013-11-03     3.0
    2013-11-10     5.0
    2013-11-17     7.0
    2013-11-24     NaN
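The NaN for the week ending 2013-11-24 marks a week that contains no rows. As far as I know, more recent pandas releases return 0 for the sum of an empty bin by default, and the `min_count` argument can be used to get NaN instead. A sketch on two rows taken from the data shown earlier:

```python
import pandas as pd

# No observations fall in the week ending 2013-11-24.
df = pd.DataFrame(
    {"number": [2, 1]},
    index=pd.to_datetime(["2013-11-14", "2013-11-25"]),
)
df.index.name = "date"

weekly_zero = df.resample("W").sum()              # empty weeks sum to 0
weekly_nan = df.resample("W").sum(min_count=1)    # empty weeks become NaN

print(weekly_zero)
print(weekly_nan)
```

Which behavior is preferable depends on whether "no data" and "a total of zero" should be distinguishable downstream.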

Aggregate by month

    print(df.resample('M').sum().head())
    # 'MS' labels each bin with the first day of the month; 'M' uses the last day

Output:

                number
    date
    2013-10-31      10
    2013-11-30      14
    2013-12-31      27
    2014-01-31      16
    2014-02-28       4
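To illustrate the month-start variant mentioned in the comment, the sketch below resamples the October and November 2013 rows shown earlier with 'MS'. The totals match the first two lines of the output above; only the labels move to the first day of each month. To my knowledge, recent pandas releases (2.2 and later) prefer 'ME' over 'M' for month-end bins, which is one reason this sketch sticks to 'MS'.

```python
import pandas as pd

# October and November 2013 rows as listed earlier in the article.
df = pd.DataFrame(
    {"number": [3, 4, 2, 1, 1, 3, 1, 5, 2, 1, 1]},
    index=pd.to_datetime([
        "2013-10-24", "2013-10-25", "2013-10-29", "2013-10-30",
        "2013-11-04", "2013-11-06", "2013-11-08", "2013-11-12",
        "2013-11-14", "2013-11-25", "2013-11-29",
    ]),
)
df.index.name = "date"

monthly = df.resample("MS").sum()   # bins labeled by the first day of each month
print(monthly)
```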

Aggregate by quarter

    print(df.resample('Q').sum().head())
    # 'QS' labels each bin with the first day of the quarter; 'Q' uses the last day

Output:

                number
    date
    2013-12-31      51
    2014-03-31      73
    2014-06-30      96
    2014-09-30     136
    2014-12-31     148

Aggregate by year

    print(df.resample('AS').sum())
    # 'AS' labels each bin with the first day of the year; 'A' uses the last day

Output:

                number
    date
    2013-01-01      51
    2014-01-01     453
    2015-01-01     743
    2016-01-01    1552
    2017-01-01      92
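When only the yearly totals matter and no date-like labels are needed, grouping on the index's `year` attribute is a possible alternative to `resample`. The sketch below uses made-up data and is only one way to get the same kind of summary.

```python
import pandas as pd

# Hypothetical sample covering two calendar years.
df = pd.DataFrame(
    {"number": [3, 4, 2, 1, 5]},
    index=pd.to_datetime([
        "2013-10-24", "2013-10-25", "2013-10-29", "2013-10-30", "2014-03-03",
    ]),
)
df.index.name = "date"

yearly = df.groupby(df.index.year)["number"].sum()  # keyed by integer year
print(yearly)
```

Unlike `resample`, `groupby` produces no rows for years that have no data at all.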

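For the available date frequency types, refer to the figure below to choose a suitable resampling frequency:

4.2 Aggregating by date, then displaying by year, quarter, or month

Aggregate and display by year

    print(df.resample('AS').sum().to_period('A'))
    # aggregate by year, then label each row with the year

Output:

          number
    date
    2013      51
    2014     453
    2015     743
    2016    1552
    2017      92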

Aggregate and display by quarter

    print(df.resample('Q').sum().to_period('Q').head())
    # aggregate by quarter, then label each row with the quarter

Output:

            number
    date
    2013Q4      51
    2014Q1      73
    2014Q2      96
    2014Q3     136
    2014Q4     148

Aggregate and display by month

    print(df.resample('M').sum().to_period('M').head())
    # aggregate by month, then label each row with the month

Output:

             number
    date
    2013-10      10
    2013-11      14
    2013-12      27
    2014-01      16
    2014-02       4
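A related approach is to group directly on the period labels. The sketch below compares the two on a small sample with a gap month (values taken from rows shown earlier): `resample` emits a row for every month in the range, while `groupby` only emits months that actually contain data. It uses 'MS' for the resample step purely to stay independent of the month-end alias.

```python
import pandas as pd

# Sample with no rows in November 2013.
df = pd.DataFrame(
    {"number": [3, 4, 2]},
    index=pd.to_datetime(["2013-10-24", "2013-10-25", "2013-12-27"]),
)
df.index.name = "date"

resampled = df.resample("MS").sum().to_period("M")     # includes the empty month
grouped = df.groupby(df.index.to_period("M")).sum()    # skips the empty month

print(resampled)
print(grouped)
```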
