Pandas Basics (16) - Holidays

Posted by Rachel on 2019-04-28

Original URL: https://learnku.com/articles/28035?order_by=created_at&

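This section is again about working with time.
The previous section covered the function for generating date sequences, and how a few basic parameter settings let a time series skip days off.
This section digs deeper into that point and builds more customized designs.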

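From the previous section we know how to generate a sequence for a time period, so it is easy to get the sequence of all business days from February 1 to February 28, 2019: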

import pandas as pd
pd.date_range(start='2/1/2019', end='2/28/2019', freq='B')

Output:

DatetimeIndex(['2019-02-01', '2019-02-04', '2019-02-05', '2019-02-06',
               '2019-02-07', '2019-02-08', '2019-02-11', '2019-02-12',
               '2019-02-13', '2019-02-14', '2019-02-15', '2019-02-18',
               '2019-02-19', '2019-02-20', '2019-02-21', '2019-02-22',
               '2019-02-25', '2019-02-26', '2019-02-27', '2019-02-28'],
              dtype='datetime64[ns]', freq='B')

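What we have now is every business day in February 2019 under the plain Monday-to-Friday rule. In the United States, however, February 18, 2019 was Washington's Birthday (Presidents' Day), so that day should not count as a business day. None of the built-in values for the freq parameter of date_range() covers this case, but Pandas leaves room for customization: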

from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

usb = CustomBusinessDay(calendar = USFederalHolidayCalendar())
pd.date_range(start='2/1/2019', end='2/28/2019', freq=usb)

Output:

DatetimeIndex(['2019-02-01', '2019-02-04', '2019-02-05', '2019-02-06',
               '2019-02-07', '2019-02-08', '2019-02-11', '2019-02-12',
               '2019-02-13', '2019-02-14', '2019-02-15', '2019-02-19',
               '2019-02-20', '2019-02-21', '2019-02-22', '2019-02-25',
               '2019-02-26', '2019-02-27', '2019-02-28'],
              dtype='datetime64[ns]', freq='C')

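Look at the output first: '2019-02-18' has indeed been skipped. The names in the code are well chosen and mostly self-explanatory, so I won't go through them one by one (leave a comment if anything is unclear).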
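To confirm why that one date disappears, the calendar can be queried directly. The sketch below assumes the same pandas imports as above; the variable names (feb_holidays, plain, custom, skipped) are only illustrative. It lists the holidays the built-in calendar reports for February 2019 and then compares the plain 'B' range with the calendar-aware one.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Holidays that the built-in US federal calendar reports for February 2019.
feb_holidays = USFederalHolidayCalendar().holidays(start='2019-02-01', end='2019-02-28')
print(feb_holidays)

# Plain Monday-to-Friday business days versus calendar-aware business days.
plain = pd.date_range(start='2/1/2019', end='2/28/2019', freq='B')
custom = pd.date_range(start='2/1/2019', end='2/28/2019',
                       freq=CustomBusinessDay(calendar=USFederalHolidayCalendar()))

# Dates that are in the plain range but were dropped by the holiday calendar.
skipped = plain.difference(custom)
print(skipped)
```

Both results should contain only 2019-02-18, which matches the date missing from the output above.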
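One name deserves special mention, though: USFederalHolidayCalendar, the US federal holiday calendar class. Because Pandas ships with it, we got the actual US business days with very little effort. Does Pandas provide a similar calendar for other countries? It does not, but that is fine: we can follow the source code of this class and define any calendar we want. Below is the Pandas GitHub address; find the source of USFederalHolidayCalendar there and copy it: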
https://github.com/pandas-dev/pandas/blob/...

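The source code is simple: the holidays are defined by the Holiday objects in the rules list. Let's try it out. Say my birthday is April 20, and I want that day to count as a day off in April: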

from pandas.tseries.holiday import AbstractHolidayCalendar, nearest_workday, Holiday
class myBirthdayCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday("Rachel's Birthday", month=4, day=20)
    ]
myc = CustomBusinessDay(calendar = myBirthdayCalendar())
pd.date_range(start='4/1/2018', end='4/30/2018', freq=myc)

Output:

DatetimeIndex(['2018-04-02', '2018-04-03', '2018-04-04', '2018-04-05',
               '2018-04-06', '2018-04-09', '2018-04-10', '2018-04-11',
               '2018-04-12', '2018-04-13', '2018-04-16', '2018-04-17',
               '2018-04-18', '2018-04-19', '2018-04-23', '2018-04-24',
               '2018-04-25', '2018-04-26', '2018-04-27', '2018-04-30'],
              dtype='datetime64[ns]', freq='C')
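A Holiday rule defined with only month and day repeats every year, which the single-month output above cannot show. The following minimal sketch asks a calendar for its holidays over three years. BirthdayCalendar is a hypothetical stand-in for the class above, used here so the example is self-contained.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday

class BirthdayCalendar(AbstractHolidayCalendar):
    # Hypothetical calendar with one annual rule: April 20, no observance shift.
    rules = [
        Holiday("Rachel's Birthday", month=4, day=20)
    ]

# holidays() returns the rule dates that fall between start and end.
birthdays = BirthdayCalendar().holidays(start='2018-01-01', end='2020-12-31')
print(birthdays)
```

The result should hold April 20 of 2018, 2019 and 2020, so the same calendar object can be reused for date ranges in any of those years.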

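Also, while reading the source of USFederalHolidayCalendar, you have probably noticed that some Holiday rules take an observance=nearest_workday argument. It means that if the holiday falls on a Saturday, the preceding Friday becomes the day off, and if it falls on a Sunday, the following Monday becomes the day off, so a holiday is never lost to a weekend. If that is still a bit fuzzy, try it yourself and compare the result against a calendar. Here I temporarily move my birthday to April 21, which was a Saturday in 2018, and add observance=nearest_workday: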

from pandas.tseries.holiday import AbstractHolidayCalendar, nearest_workday, Holiday
class myBirthdayCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday("Rachel's Birthday", month=4, day=21, observance=nearest_workday)
    ]
myc = CustomBusinessDay(calendar = myBirthdayCalendar())
pd.date_range(start='4/1/2018', end='4/30/2018', freq=myc)

Output:

DatetimeIndex(['2018-04-02', '2018-04-03', '2018-04-04', '2018-04-05',
               '2018-04-06', '2018-04-09', '2018-04-10', '2018-04-11',
               '2018-04-12', '2018-04-13', '2018-04-16', '2018-04-17',
               '2018-04-18', '2018-04-19', '2018-04-23', '2018-04-24',
               '2018-04-25', '2018-04-26', '2018-04-27', '2018-04-30'],
              dtype='datetime64[ns]', freq='C')

The output shows that 2018-04-20, the Friday before the birthday, is now treated as the day off. OK, moving on...
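As a side note, nearest_workday is an ordinary function that takes a date and returns the observed date, so the Saturday and Sunday rules can be checked in isolation. The variable names in this small sketch are illustrative only.

```python
import pandas as pd
from pandas.tseries.holiday import nearest_workday

saturday = pd.Timestamp('2018-04-21')  # a Saturday
sunday = pd.Timestamp('2018-04-22')    # a Sunday
friday = pd.Timestamp('2018-04-20')    # an ordinary weekday

# Saturday moves back to Friday, Sunday moves forward to Monday,
# and a weekday is returned unchanged.
observed_saturday = nearest_workday(saturday)
observed_sunday = nearest_workday(sunday)
observed_friday = nearest_workday(friday)

print(observed_saturday.date(), observed_sunday.date(), observed_friday.date())
```

This is the same shift that the calendar applies when a Holiday rule is given observance=nearest_workday.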

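In most countries the work week runs from Monday to Friday, but not everywhere. In Egypt, for example, it runs from Sunday to Thursday, so once again we customize: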

b = CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu')
pd.date_range(start='4/1/2018', end='4/30/2018', freq=b)

Output:

DatetimeIndex(['2018-04-01', '2018-04-02', '2018-04-03', '2018-04-04',
               '2018-04-05', '2018-04-08', '2018-04-09', '2018-04-10',
               '2018-04-11', '2018-04-12', '2018-04-15', '2018-04-16',
               '2018-04-17', '2018-04-18', '2018-04-19', '2018-04-22',
               '2018-04-23', '2018-04-24', '2018-04-25', '2018-04-26',
               '2018-04-29', '2018-04-30'],
              dtype='datetime64[ns]', freq='C')

And what if one or more of those days is also a public holiday? Easy:

b = CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu', holidays=['2018-04-15'])
pd.date_range(start='4/1/2018', end='4/30/2018', freq=b)

Output:

DatetimeIndex(['2018-04-01', '2018-04-02', '2018-04-03', '2018-04-04',
               '2018-04-05', '2018-04-08', '2018-04-09', '2018-04-10',
               '2018-04-11', '2018-04-12', '2018-04-16', '2018-04-17',
               '2018-04-18', '2018-04-19', '2018-04-22', '2018-04-23',
               '2018-04-24', '2018-04-25', '2018-04-26', '2018-04-29',
               '2018-04-30'],
              dtype='datetime64[ns]', freq='C')
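The same offset object can also be used for date arithmetic, not only as a freq value. The sketch below reuses the weekmask and the example holiday from above; egypt_bday and the other variable names are illustrative, and 2018-04-15 is just the sample holiday from this article, not a verified public holiday.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Sunday-to-Thursday work week with one example holiday (taken from the text above).
egypt_bday = CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu', holidays=['2018-04-15'])

thursday = pd.Timestamp('2018-04-12')

# Adding the offset steps to the next valid business day: Friday and Saturday
# are outside the weekmask and Sunday 2018-04-15 is a holiday.
next_business_day = thursday + egypt_bday
two_days_later = thursday + 2 * egypt_bday

print(next_business_day.date(), two_days_later.date())
```

Starting from Thursday 2018-04-12, one step should land on Monday 2018-04-16 and two steps on Tuesday 2018-04-17.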

In short, Pandas is very powerful: it offers a wide range of parameters, and different settings and values let you approach data analysis in many different ways.

This work is released under the CC license. Reposts must credit the author and link to this article.
