Pandas中時間和日期處理

homura發表於2019-02-16

1、生成一個時間段

In [1]:import pandas as pd
In [2]:import numpy as np
1)生成一個時間區間段,間隔為小時
In [3]:rng = pd.date_range(`1/1/2011`, periods=72, freq=`H`)
2)生成一個Series,並制定索引為時間段
In [4]:ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [5]:ts
Out[5]:
2011-01-01 00:00:00 -0.204085
2011-01-01 01:00:00 1.101711
2011-01-01 02:00:00 1.840500
2011-01-01 03:00:00 0.112426
2011-01-01 04:00:00 -0.310413
2011-01-01 05:00:00 1.180762
2011-01-01 06:00:00 0.087775
2011-01-01 07:00:00 1.087877
2011-01-01 08:00:00 -0.950237
2011-01-01 09:00:00 -0.468453
Freq: H, dtype: float64

3)改變時間間隔
In [6]:converted = ts.asfreq(`45Min`, method=`pad`)
In [7]:converted
Out[7]:
2011-01-01 00:00:00 -0.204085
2011-01-01 00:45:00 -0.204085
2011-01-01 01:30:00 1.101711
2011-01-01 02:15:00 1.840500
2011-01-01 03:00:00 0.112426
2011-01-01 03:45:00 0.112426
2011-01-01 04:30:00 -0.310413
2011-01-01 05:15:00 1.180762
2011-01-01 06:00:00 0.087775
2011-01-01 06:45:00 0.087775
2011-01-01 07:30:00 1.087877
2011-01-01 08:15:00 -0.950237
2011-01-01 09:00:00 -0.468453
Freq: 45T, dtype: float64

2、轉換為日期格式

2.1 數字生成日期格式

In [8]: pd.Timestamp(datetime(2012, 5, 1))
Out[8]: Timestamp(`2012-05-01 00:00:00`)

2.2 字元生成日期格式

In [9]: pd.Timestamp(`2012-05-01`)
Out[9]: Timestamp(`2012-05-01 00:00:00`)

2.3 只有年月

In [10]: pd.Period(`2011-01`)
Out[10]: Period(`2011-01`, `M`)

In [11]: pd.Period(`2012-05`, freq=`D`)
Out[11]: Period(`2012-05-01`, `D`)

2.4 轉化為日期格式

In [22]: pd.to_datetime(pd.Series([`Jul 31, 2009`, `2010-01-10`, None]))
Out[22]:
0 2009-07-31
1 2010-01-10
2 NaT
dtype: datetime64[ns]

In [23]: pd.to_datetime([`2005/11/23`, `2010.12.31`])
Out[23]: DatetimeIndex([`2005-11-23`, `2010-12-31`], dtype=`datetime64[ns]`, freq=None)

3、生成一個時間段

3.1 生成索引的方法

In [35]: dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]

Note the frequency information
In [36]: index = pd.DatetimeIndex(dates)

In [37]: index
Out[37]: DatetimeIndex([`2012-05-01`, `2012-05-02`, `2012-05-03`], dtype=`datetime64[ns]`, freq=None)

Automatically converted to DatetimeIndex
In [38]: index = pd.Index(dates)

In [39]: index
Out[39]: DatetimeIndex([`2012-05-01`, `2012-05-02`, `2012-05-03`], dtype=`datetime64[ns]`, freq=None)

date_range日曆,bdate_range工作日
In [40]: index = pd.date_range(`2000-1-1`, periods=1000, freq=`M`)

In [41]: index
Out[41]:
DatetimeIndex([`2000-01-31`, `2000-02-29`, `2000-03-31`, `2000-04-30`,

           `2000-05-31`, `2000-06-30`, `2000-07-31`, `2000-08-31`,
           `2000-09-30`, `2000-10-31`,
           ...
           `2082-07-31`, `2082-08-31`, `2082-09-30`, `2082-10-31`,
           `2082-11-30`, `2082-12-31`, `2083-01-31`, `2083-02-28`,
           `2083-03-31`, `2083-04-30`],
          dtype=`datetime64[ns]`, length=1000, freq=`M`)

In [42]: index = pd.bdate_range(`2012-1-1`, periods=250)

In [43]: index
Out[43]:
DatetimeIndex([`2012-01-02`, `2012-01-03`, `2012-01-04`, `2012-01-05`,

           `2012-01-06`, `2012-01-09`, `2012-01-10`, `2012-01-11`,
           `2012-01-12`, `2012-01-13`,
           ...
           `2012-12-03`, `2012-12-04`, `2012-12-05`, `2012-12-06`,
           `2012-12-07`, `2012-12-10`, `2012-12-11`, `2012-12-12`,
           `2012-12-13`, `2012-12-14`],
          dtype=`datetime64[ns]`, length=250, freq=`B`)

In [44]: start = datetime(2011, 1, 1)

In [45]: end = datetime(2012, 1, 1)

In [46]: rng = pd.date_range(start, end)

In [47]: rng
Out[47]:
DatetimeIndex([`2011-01-01`, `2011-01-02`, `2011-01-03`, `2011-01-04`,

           `2011-01-05`, `2011-01-06`, `2011-01-07`, `2011-01-08`,
           `2011-01-09`, `2011-01-10`,
           ...
           `2011-12-23`, `2011-12-24`, `2011-12-25`, `2011-12-26`,
           `2011-12-27`, `2011-12-28`, `2011-12-29`, `2011-12-30`,
           `2011-12-31`, `2012-01-01`],
          dtype=`datetime64[ns]`, length=366, freq=`D`)

In [48]: rng = pd.bdate_range(start, end)

In [49]: rng
Out[49]:
DatetimeIndex([`2011-01-03`, `2011-01-04`, `2011-01-05`, `2011-01-06`,

           `2011-01-07`, `2011-01-10`, `2011-01-11`, `2011-01-12`,
           `2011-01-13`, `2011-01-14`,
           ...
           `2011-12-19`, `2011-12-20`, `2011-12-21`, `2011-12-22`,
           `2011-12-23`, `2011-12-26`, `2011-12-27`, `2011-12-28`,
           `2011-12-29`, `2011-12-30`],
          dtype=`datetime64[ns]`, length=260, freq=`B`)

3.2 每個月末,每隔一週

In [50]: pd.date_range(start, end, freq=`BM`)
Out[50]:
DatetimeIndex([`2011-01-31`, `2011-02-28`, `2011-03-31`, `2011-04-29`,

           `2011-05-31`, `2011-06-30`, `2011-07-29`, `2011-08-31`,
           `2011-09-30`, `2011-10-31`, `2011-11-30`, `2011-12-30`],
          dtype=`datetime64[ns]`, freq=`BM`)

In [51]: pd.date_range(start, end, freq=`W`)
Out[51]:
DatetimeIndex([`2011-01-02`, `2011-01-09`, `2011-01-16`, `2011-01-23`,

           `2011-01-30`, `2011-02-06`, `2011-02-13`, `2011-02-20`,
           `2011-02-27`, `2011-03-06`, `2011-03-13`, `2011-03-20`,
           `2011-03-27`, `2011-04-03`, `2011-04-10`, `2011-04-17`,
           `2011-04-24`, `2011-05-01`, `2011-05-08`, `2011-05-15`,
           `2011-05-22`, `2011-05-29`, `2011-06-05`, `2011-06-12`,
           `2011-06-19`, `2011-06-26`, `2011-07-03`, `2011-07-10`,
           `2011-07-17`, `2011-07-24`, `2011-07-31`, `2011-08-07`,
           `2011-08-14`, `2011-08-21`, `2011-08-28`, `2011-09-04`,
           `2011-09-11`, `2011-09-18`, `2011-09-25`, `2011-10-02`,
           `2011-10-09`, `2011-10-16`, `2011-10-23`, `2011-10-30`,
           `2011-11-06`, `2011-11-13`, `2011-11-20`, `2011-11-27`,
           `2011-12-04`, `2011-12-11`, `2011-12-18`, `2011-12-25`,
           `2012-01-01`],
          dtype=`datetime64[ns]`, freq=`W-SUN`)

3.3 從End往前數20個工作日,從start往後數20個工作日

In [52]: pd.bdate_range(end=end, periods=20)
Out[52]:
DatetimeIndex([`2011-12-05`, `2011-12-06`, `2011-12-07`, `2011-12-08`,

           `2011-12-09`, `2011-12-12`, `2011-12-13`, `2011-12-14`,
           `2011-12-15`, `2011-12-16`, `2011-12-19`, `2011-12-20`,
           `2011-12-21`, `2011-12-22`, `2011-12-23`, `2011-12-26`,
           `2011-12-27`, `2011-12-28`, `2011-12-29`, `2011-12-30`],
          dtype=`datetime64[ns]`, freq=`B`)

In [53]: pd.bdate_range(start=start, periods=20)
Out[53]:
DatetimeIndex([`2011-01-03`, `2011-01-04`, `2011-01-05`, `2011-01-06`,

           `2011-01-07`, `2011-01-10`, `2011-01-11`, `2011-01-12`,
           `2011-01-13`, `2011-01-14`, `2011-01-17`, `2011-01-18`,
           `2011-01-19`, `2011-01-20`, `2011-01-21`, `2011-01-24`,
           `2011-01-25`, `2011-01-26`, `2011-01-27`, `2011-01-28`],
          dtype=`datetime64[ns]`, freq=`B`)

4、根據部分索引選擇,切片

In [56]: rng = pd.date_range(start, end, freq=`BM`)

In [57]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [58]: ts.index
Out[58]:
DatetimeIndex([`2011-01-31`, `2011-02-28`, `2011-03-31`, `2011-04-29`,

           `2011-05-31`, `2011-06-30`, `2011-07-29`, `2011-08-31`,
           `2011-09-30`, `2011-10-31`, `2011-11-30`, `2011-12-30`],
          dtype=`datetime64[ns]`, freq=`BM`)

In [59]: ts[:5].index
Out[59]:
DatetimeIndex([`2011-01-31`, `2011-02-28`, `2011-03-31`, `2011-04-29`,

           `2011-05-31`],
          dtype=`datetime64[ns]`, freq=`BM`)

In [60]: ts[::2].index
Out[60]:
DatetimeIndex([`2011-01-31`, `2011-03-31`, `2011-05-31`, `2011-07-29`,

           `2011-09-30`, `2011-11-30`],
          dtype=`datetime64[ns]`, freq=`2BM`)

In [61]: ts[`1/31/2011`]
Out[61]: -1.2812473076599531

In [62]: ts[pd.datetime(2011, 12, 25):]
Out[62]:
2011-12-30 0.687738
Freq: BM, dtype: float64

In [63]: ts[`10/31/2011`:`12/31/2011`]
Out[63]:
2011-10-31 0.149748
2011-11-30 -0.732339
2011-12-30 0.687738
Freq: BM, dtype: float64

In [64]: ts[`2011`]
Out[64]:
2011-01-31 -1.281247
2011-02-28 -0.727707
2011-03-31 -0.121306
2011-04-29 -0.097883
2011-05-31 0.695775
2011-06-30 0.341734
2011-07-29 0.959726
2011-08-31 -1.110336
2011-09-30 -0.619976
2011-10-31 0.149748
2011-11-30 -0.732339
2011-12-30 0.687738
Freq: BM, dtype: float64

In [65]: ts[`2011-6`]
Out[65]:
2011-06-30 0.341734
Freq: BM, dtype: float64

DataFrame中指定了時間索引,可以根據時間索引提取子集
In [66]: dft = pd.DataFrame(np.random.randn(100000,1),columns=[`A`],index=pd.date_range(`20130101`,periods=100000,freq=`T`))

In [67]: dft
Out[67]:

                        A

2013-01-01 00:00:00 0.176444
2013-01-01 00:01:00 0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00 0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
… …
2013-03-11 10:33:00 -0.293083
2013-03-11 10:34:00 -0.059881
2013-03-11 10:35:00 1.252450
2013-03-11 10:36:00 0.046611
2013-03-11 10:37:00 0.059478
2013-03-11 10:38:00 -0.286539
2013-03-11 10:39:00 0.841669

[100000 rows x 1 columns]

In [68]: dft[`2013`]
Out[68]:

                        A

2013-01-01 00:00:00 0.176444
2013-01-01 00:01:00 0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00 0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
… …
2013-03-11 10:33:00 -0.293083
2013-03-11 10:34:00 -0.059881
2013-03-11 10:35:00 1.252450
2013-03-11 10:36:00 0.046611
2013-03-11 10:37:00 0.059478
2013-03-11 10:38:00 -0.286539
2013-03-11 10:39:00 0.841669

[100000 rows x 1 columns]

In [69]: dft[`2013-1`:`2013-2`]
Out[69]:

                        A

2013-01-01 00:00:00 0.176444
2013-01-01 00:01:00 0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00 0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
… …
2013-02-28 23:53:00 0.103114
2013-02-28 23:54:00 -1.303422
2013-02-28 23:55:00 0.451943
2013-02-28 23:56:00 0.220534
2013-02-28 23:57:00 -1.624220
2013-02-28 23:58:00 0.093915
2013-02-28 23:59:00 -1.087454

[84960 rows x 1 columns]

In [70]: dft[`2013-1`:`2013-2-28`]
Out[70]:

                        A

2013-01-01 00:00:00 0.176444
2013-01-01 00:01:00 0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00 0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
… …
2013-02-28 23:53:00 0.103114
2013-02-28 23:54:00 -1.303422
2013-02-28 23:55:00 0.451943
2013-02-28 23:56:00 0.220534
2013-02-28 23:57:00 -1.624220
2013-02-28 23:58:00 0.093915
2013-02-28 23:59:00 -1.087454

[84960 rows x 1 columns]

In [71]: dft[`2013-1`:`2013-2-28 00:00:00`]
Out[71]:

                        A

2013-01-01 00:00:00 0.176444
2013-01-01 00:01:00 0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00 0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
… …
2013-02-27 23:54:00 0.897051
2013-02-27 23:55:00 -0.309230
2013-02-27 23:56:00 1.944713
2013-02-27 23:57:00 0.369265
2013-02-27 23:58:00 0.053071
2013-02-27 23:59:00 -0.019734
2013-02-28 00:00:00 1.388189

[83521 rows x 1 columns]

In [72]: dft[`2013-1-15`:`2013-1-15 12:30:00`]
Out[72]:

                        A

2013-01-15 00:00:00 0.501288
2013-01-15 00:01:00 -0.605198
2013-01-15 00:02:00 0.215146
2013-01-15 00:03:00 0.924732
2013-01-15 00:04:00 -2.228519
2013-01-15 00:05:00 1.517331
2013-01-15 00:06:00 -1.188774
… …
2013-01-15 12:24:00 1.358314
2013-01-15 12:25:00 -0.737727
2013-01-15 12:26:00 1.838323
2013-01-15 12:27:00 -0.774090
2013-01-15 12:28:00 0.622261
2013-01-15 12:29:00 -0.631649
2013-01-15 12:30:00 0.193284

[751 rows x 1 columns]

In [73]: dft.loc[`2013-1-15 12:30:00`]
Out[73]:
A 0.193284
Name: 2013-01-15 12:30:00, dtype: float64

5、常用時間

類別 解釋
year 年
month 月
day 日
hour 時
minute 分鐘
second 秒
microsecond 微秒
nanosecond 納秒
date 返回日期
time 返回時間
dayofyear 年序日
weekofyear 年序周
week 周
dayofweek 週中的第幾天,Monday=0, Sunday=6
weekday 週中的第幾天,Monday=0, Sunday=6
weekday_name 週中的星期幾,ex: Friday
quarter 季度
days_in_month 一個月中有多少天
is_month_start 是否月初第一天
is_month_end 是否月末最後一天
is_quarter_start 是否季度的最開始
is_quarter_end 是否季度的最後一個
is_year_start 是否年初第一天
is_year_end 是否年末第一天

6、某一時間點,往前往後加一段時間

類別 解釋
BDay 工作日
CDay 自定義日期
Week 周
WeekOfMonth 月中的第幾周
LastWeekOfMonth 月中的最後一週
MonthEnd 日曆上月末
MonthBegin 日曆上月初
BMonthEnd 工作月初
BMonthBegin 月開始營業
CBMonthEnd 自定義月末
CBMonthBegin 自定義月初
QuarterEnd 日曆季末
QuarterBegin 日曆季初
BQuarterEnd 工作季末
BQuarterBegin 工作季初
FY5253Quarter retail (aka 52-53 week) quarter
YearEnd 日曆年末
YearBegin 日曆年初
BYearEnd 工作年末
BYearBegin 工作年初
FY5253 retail (aka 52-53 week) year
BusinessHour 工作小時
CustomBusinessHour 自定義小時
Hour 小時
Minute 分鐘
Second 秒
In [84]: d = pd.datetime(2008, 8, 18, 9, 0)
In [86]: from pandas.tseries.offsets import *

In [87]: d + DateOffset(months=4, days=5)
Out[87]: Timestamp(`2008-12-23 09:00:00`)

In [88]: d – 5 * BDay()
Out[88]: Timestamp(`2008-08-11 09:00:00`)

月末
In [89]: d + BMonthEnd()
Out[89]: Timestamp(`2008-08-29 09:00:00`)

In [90]: d
Out[90]: datetime.datetime(2008, 8, 18, 9, 0)

往前數月末
In [91]: offset = BMonthEnd()

In [92]: offset.rollforward(d)
Out[92]: Timestamp(`2008-08-29 09:00:00`)

往後數月末
In [93]: offset.rollback(d)
Out[93]: Timestamp(`2008-07-31 09:00:00`)

時間方面的
In [94]: day = Day()

In [95]: day.apply(pd.Timestamp(`2014-01-01 09:00`))
Out[95]: Timestamp(`2014-01-02 09:00:00`)

In [96]: day = Day(normalize=True)

In [97]: day.apply(pd.Timestamp(`2014-01-01 09:00`))
Out[97]: Timestamp(`2014-01-02 00:00:00`)

In [98]: hour = Hour()

In [99]: hour.apply(pd.Timestamp(`2014-01-01 22:00`))
Out[99]: Timestamp(`2014-01-01 23:00:00`)

In [100]: hour = Hour(normalize=True)

In [101]: hour.apply(pd.Timestamp(`2014-01-01 22:00`))
Out[101]: Timestamp(`2014-01-01 00:00:00`)

In [102]: hour.apply(pd.Timestamp(`2014-01-01 23:00`))
Out[102]: Timestamp(`2014-01-02 00:00:00`)

周相關的
In [103]: d
Out[103]: datetime.datetime(2008, 8, 18, 9, 0)

In [104]: d + Week()
Out[104]: Timestamp(`2008-08-25 09:00:00`)

In [105]: d + Week(weekday=4)
Out[105]: Timestamp(`2008-08-22 09:00:00`)

In [106]: (d + Week(weekday=4)).weekday()
Out[106]: 4

In [107]: d – Week()
Out[107]: Timestamp(`2008-08-11 09:00:00`)

7、時間序列相關的時間處理

In [213]: ts = ts[:5]

In [214]: ts.shift(1)
Out[214]:
2011-01-31 NaN
2011-02-28 -1.281247
2011-03-31 -0.727707
2011-04-29 -0.121306
2011-05-31 -0.097883
Freq: BM, dtype: float64

In [215]: ts.shift(5, freq=datetools.bday)
Out[215]:
2011-02-07 -1.281247
2011-03-07 -0.727707
2011-04-07 -0.121306
2011-05-06 -0.097883
2011-06-07 0.695775
dtype: float64

In [216]: ts.shift(5, freq=`BM`)
Out[216]:
2011-06-30 -1.281247
2011-07-29 -0.727707
2011-08-31 -0.121306
2011-09-30 -0.097883
2011-10-31 0.695775
Freq: BM, dtype: float64

In [217]: ts.tshift(5, freq=`D`)
Out[217]:
2011-02-05 -1.281247
2011-03-05 -0.727707
2011-04-05 -0.121306
2011-05-04 -0.097883
2011-06-05 0.695775
dtype: float64

相關文章