Timedeltas
Source: pandas.pydata.org/docs/user_guide/timedeltas.html
Timedeltas are differences in times, expressed in different units, e.g. days, hours, minutes, seconds. They can be both positive and negative.
Timedelta is a subclass of datetime.timedelta, and behaves in a similar manner, but allows compatibility with np.timedelta64 types as well as a host of custom representation, parsing, and attributes.
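The following is a minimal illustrative sketch (not part of the original page) of that interoperability; to_pytimedelta and to_timedelta64 are standard Timedelta conversion methods:
```py
import datetime
import numpy as np
import pandas as pd

td = pd.Timedelta("1 days 2 hours")

# Timedelta is a subclass of datetime.timedelta ...
isinstance(td, datetime.timedelta)    # True

# ... and converts to/from the stdlib and NumPy types
td.to_pytimedelta()                   # datetime.timedelta(days=1, seconds=7200)
td.to_timedelta64()                   # numpy.timedelta64(93600000000000,'ns')
pd.Timedelta(np.timedelta64(1, "h"))  # Timedelta('0 days 01:00:00')
```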
Parsing
You can construct a Timedelta scalar through various arguments, including ISO 8601 Duration strings.
In [1]: import datetime
# strings
In [2]: pd.Timedelta("1 days")
Out[2]: Timedelta('1 days 00:00:00')
In [3]: pd.Timedelta("1 days 00:00:00")
Out[3]: Timedelta('1 days 00:00:00')
In [4]: pd.Timedelta("1 days 2 hours")
Out[4]: Timedelta('1 days 02:00:00')
In [5]: pd.Timedelta("-1 days 2 min 3us")
Out[5]: Timedelta('-2 days +23:57:59.999997')
# like datetime.timedelta
# note: these MUST be specified as keyword arguments
In [6]: pd.Timedelta(days=1, seconds=1)
Out[6]: Timedelta('1 days 00:00:01')
# integers with a unit
In [7]: pd.Timedelta(1, unit="d")
Out[7]: Timedelta('1 days 00:00:00')
# from a datetime.timedelta/np.timedelta64
In [8]: pd.Timedelta(datetime.timedelta(days=1, seconds=1))
Out[8]: Timedelta('1 days 00:00:01')
In [9]: pd.Timedelta(np.timedelta64(1, "ms"))
Out[9]: Timedelta('0 days 00:00:00.001000')
# negative Timedeltas have this string repr
# to be more consistent with datetime.timedelta conventions
In [10]: pd.Timedelta("-1us")
Out[10]: Timedelta('-1 days +23:59:59.999999')
# a NaT
In [11]: pd.Timedelta("nan")
Out[11]: NaT
In [12]: pd.Timedelta("nat")
Out[12]: NaT
# ISO 8601 Duration strings
In [13]: pd.Timedelta("P0DT0H1M0S")
Out[13]: Timedelta('0 days 00:01:00')
In [14]: pd.Timedelta("P0DT0H0M0.000000123S")
Out[14]: Timedelta('0 days 00:00:00.000000123')
DateOffsets (Day, Hour, Minute, Second, Milli, Micro, Nano) can also be used in construction.
In [15]: pd.Timedelta(pd.offsets.Second(2))
Out[15]: Timedelta('0 days 00:00:02')
Further, operations among the scalars yield another scalar Timedelta.
In [16]: pd.Timedelta(pd.offsets.Day(2)) + pd.Timedelta(pd.offsets.Second(2)) + pd.Timedelta(
....: "00:00:00.000123"
....: )
....:
Out[16]: Timedelta('2 days 00:00:02.000123')
to_timedelta
Using the top-level pd.to_timedelta, you can convert a scalar, array, list, or Series from a recognized timedelta format / value into a Timedelta type. It will construct a Series if the input is a Series, a scalar if the input is scalar-like, otherwise it will output a TimedeltaIndex.
You can parse a single string to a Timedelta:
In [17]: pd.to_timedelta("1 days 06:05:01.00003")
Out[17]: Timedelta('1 days 06:05:01.000030')
In [18]: pd.to_timedelta("15.5us")
Out[18]: Timedelta('0 days 00:00:00.000015500')
or a list/array of strings:
In [19]: pd.to_timedelta(["1 days 06:05:01.00003", "15.5us", "nan"])
Out[19]: TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT], dtype='timedelta64[ns]', freq=None)
The unit keyword argument specifies the unit of the Timedelta if the input is numeric:
In [20]: pd.to_timedelta(np.arange(5), unit="s")
Out[20]:
TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', '0 days 00:00:02',
'0 days 00:00:03', '0 days 00:00:04'],
dtype='timedelta64[ns]', freq=None)
In [21]: pd.to_timedelta(np.arange(5), unit="d")
Out[21]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
Warning
If a string or array of strings is passed as an input, then the unit keyword argument will be ignored. If a string without units is passed, then the default unit of nanoseconds is assumed.
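As a small illustration of this rule (a sketch, not from the original page): numeric inputs rely on the unit keyword, while strings carry their unit inside the text itself, so the two calls below are expected to give the same result.
```py
import pandas as pd

# unit applies only to the numeric input; the string encodes its own unit
pd.to_timedelta(5, unit="s")   # Timedelta('0 days 00:00:05')
pd.to_timedelta("5s")          # Timedelta('0 days 00:00:05')
```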
Timedelta limitations
pandas represents Timedeltas as 64-bit integers with nanosecond resolution. As such, the 64-bit integer limits determine the Timedelta limits.
In [22]: pd.Timedelta.min
Out[22]: Timedelta('-106752 days +00:12:43.145224193')
In [23]: pd.Timedelta.max
Out[23]: Timedelta('106751 days 23:47:16.854775807')
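As a rough consistency check (an illustrative sketch, not part of the original page), these bounds correspond to the extremes of a signed 64-bit count of nanoseconds:
```py
import numpy as np
import pandas as pd

i64_max = np.iinfo(np.int64).max  # 9223372036854775807

# Timedelta.max is i64_max nanoseconds; the most negative int64 value is
# reserved for NaT, so Timedelta.min is -i64_max nanoseconds
pd.Timedelta(i64_max, unit="ns") == pd.Timedelta.max    # True
pd.Timedelta(-i64_max, unit="ns") == pd.Timedelta.min   # True
```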
Operations
You can operate on Series/DataFrames and construct timedelta64[ns] Series through subtraction operations on datetime64[ns] Series, or Timestamps.
In [24]: s = pd.Series(pd.date_range("2012-1-1", periods=3, freq="D"))
In [25]: td = pd.Series([pd.Timedelta(days=i) for i in range(3)])
In [26]: df = pd.DataFrame({"A": s, "B": td})
In [27]: df
Out[27]:
A B
0 2012-01-01 0 days
1 2012-01-02 1 days
2 2012-01-03 2 days
In [28]: df["C"] = df["A"] + df["B"]
In [29]: df
Out[29]:
A B C
0 2012-01-01 0 days 2012-01-01
1 2012-01-02 1 days 2012-01-03
2 2012-01-03 2 days 2012-01-05
In [30]: df.dtypes
Out[30]:
A datetime64[ns]
B timedelta64[ns]
C datetime64[ns]
dtype: object
In [31]: s - s.max()
Out[31]:
0 -2 days
1 -1 days
2 0 days
dtype: timedelta64[ns]
In [32]: s - datetime.datetime(2011, 1, 1, 3, 5)
Out[32]:
0 364 days 20:55:00
1 365 days 20:55:00
2 366 days 20:55:00
dtype: timedelta64[ns]
In [33]: s + datetime.timedelta(minutes=5)
Out[33]:
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
In [34]: s + pd.offsets.Minute(5)
Out[34]:
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
In [35]: s + pd.offsets.Minute(5) + pd.offsets.Milli(5)
Out[35]:
0 2012-01-01 00:05:00.005
1 2012-01-02 00:05:00.005
2 2012-01-03 00:05:00.005
dtype: datetime64[ns]
Operations with scalars from a timedelta64[ns] Series:
In [36]: y = s - s[0]
In [37]: y
Out[37]:
0 0 days
1 1 days
2 2 days
dtype: timedelta64[ns]
Series of timedeltas with NaT values are supported:
In [38]: y = s - s.shift()
In [39]: y
Out[39]:
0 NaT
1 1 days
2 1 days
dtype: timedelta64[ns]
Elements can be set to NaT using np.nan analogously to datetimes:
In [40]: y[1] = np.nan
In [41]: y
Out[41]:
0 NaT
1 NaT
2 1 days
dtype: timedelta64[ns]
Operands can also appear in a reversed order (a singular object operated with a Series):
In [42]: s.max() - s
Out[42]:
0 2 days
1 1 days
2 0 days
dtype: timedelta64[ns]
In [43]: datetime.datetime(2011, 1, 1, 3, 5) - s
Out[43]:
0 -365 days +03:05:00
1 -366 days +03:05:00
2 -367 days +03:05:00
dtype: timedelta64[ns]
In [44]: datetime.timedelta(minutes=5) + s
Out[44]:
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
min, max and the corresponding idxmin, idxmax operations are supported on frames:
In [45]: A = s - pd.Timestamp("20120101") - pd.Timedelta("00:05:05")
In [46]: B = s - pd.Series(pd.date_range("2012-1-2", periods=3, freq="D"))
In [47]: df = pd.DataFrame({"A": A, "B": B})
In [48]: df
Out[48]:
A B
0 -1 days +23:54:55 -1 days
1 0 days 23:54:55 -1 days
2 1 days 23:54:55 -1 days
In [49]: df.min()
Out[49]:
A -1 days +23:54:55
B -1 days +00:00:00
dtype: timedelta64[ns]
In [50]: df.min(axis=1)
Out[50]:
0 -1 days
1 -1 days
2 -1 days
dtype: timedelta64[ns]
In [51]: df.idxmin()
Out[51]:
A 0
B 0
dtype: int64
In [52]: df.idxmax()
Out[52]:
A 2
B 0
dtype: int64
min, max, idxmin, idxmax operations are supported on Series as well. A scalar result will be a Timedelta.
In [53]: df.min().max()
Out[53]: Timedelta('-1 days +23:54:55')
In [54]: df.min(axis=1).min()
Out[54]: Timedelta('-1 days +00:00:00')
In [55]: df.min().idxmax()
Out[55]: 'A'
In [56]: df.min(axis=1).idxmin()
Out[56]: 0
You can fillna on timedeltas, passing a timedelta to get a particular value.
In [57]: y.fillna(pd.Timedelta(0))
Out[57]:
0 0 days
1 0 days
2 1 days
dtype: timedelta64[ns]
In [58]: y.fillna(pd.Timedelta(10, unit="s"))
Out[58]:
0 0 days 00:00:10
1 0 days 00:00:10
2 1 days 00:00:00
dtype: timedelta64[ns]
In [59]: y.fillna(pd.Timedelta("-1 days, 00:00:05"))
Out[59]:
0 -1 days +00:00:05
1 -1 days +00:00:05
2 1 days 00:00:00
dtype: timedelta64[ns]
You can also negate, multiply and use abs on Timedeltas:
In [60]: td1 = pd.Timedelta("-1 days 2 hours 3 seconds")
In [61]: td1
Out[61]: Timedelta('-2 days +21:59:57')
In [62]: -1 * td1
Out[62]: Timedelta('1 days 02:00:03')
In [63]: -td1
Out[63]: Timedelta('1 days 02:00:03')
In [64]: abs(td1)
Out[64]: Timedelta('1 days 02:00:03')
Reductions
Numeric reduction operations for timedelta64[ns] will return Timedelta objects. As usual NaT are skipped during evaluation.
In [65]: y2 = pd.Series(
....: pd.to_timedelta(["-1 days +00:00:05", "nat", "-1 days +00:00:05", "1 days"])
....: )
....:
In [66]: y2
Out[66]:
0 -1 days +00:00:05
1 NaT
2 -1 days +00:00:05
3 1 days 00:00:00
dtype: timedelta64[ns]
In [67]: y2.mean()
Out[67]: Timedelta('-1 days +16:00:03.333333334')
In [68]: y2.median()
Out[68]: Timedelta('-1 days +00:00:05')
In [69]: y2.quantile(0.1)
Out[69]: Timedelta('-1 days +00:00:05')
In [70]: y2.sum()
Out[70]: Timedelta('-1 days +00:00:10')
Frequency conversion
Timedelta Series and TimedeltaIndex, together with Timedelta scalars, can be converted to other frequencies by astyping to a specific timedelta dtype.
In [71]: december = pd.Series(pd.date_range("20121201", periods=4))
In [72]: january = pd.Series(pd.date_range("20130101", periods=4))
In [73]: td = january - december
In [74]: td[2] += datetime.timedelta(minutes=5, seconds=3)
In [75]: td[3] = np.nan
In [76]: td
Out[76]:
0 31 days 00:00:00
1 31 days 00:00:00
2 31 days 00:05:03
3 NaT
dtype: timedelta64[ns]
# to seconds
In [77]: td.astype("timedelta64[s]")
Out[77]:
0 31 days 00:00:00
1 31 days 00:00:00
2 31 days 00:05:03
3 NaT
dtype: timedelta64[s]
For timedelta64 resolutions other than the supported "s", "ms", "us", "ns", an alternative is to divide by another timedelta object. Note that division by the NumPy scalar is true division, while astyping is equivalent to floor division.
# to days
In [78]: td / np.timedelta64(1, "D")
Out[78]:
0 31.000000
1 31.000000
2 31.003507
3 NaN
dtype: float64
Dividing or multiplying a timedelta64[ns] Series by an integer or an integer Series yields another timedelta64[ns] dtype Series.
In [79]: td * -1
Out[79]:
0 -31 days +00:00:00
1 -31 days +00:00:00
2 -32 days +23:54:57
3 NaT
dtype: timedelta64[ns]
In [80]: td * pd.Series([1, 2, 3, 4])
Out[80]:
0 31 days 00:00:00
1 62 days 00:00:00
2 93 days 00:15:09
3 NaT
dtype: timedelta64[ns]
Rounded division (floor-division) of a timedelta64[ns] Series by a scalar Timedelta gives a series of integers.
In [81]: td // pd.Timedelta(days=3, hours=4)
Out[81]:
0 9.0
1 9.0
2 9.0
3 NaN
dtype: float64
In [82]: pd.Timedelta(days=3, hours=4) // td
Out[82]:
0 0.0
1 0.0
2 0.0
3 NaN
dtype: float64
The mod (%) and divmod operations are defined for Timedelta when operating with another timedelta-like or numeric argument.
In [83]: pd.Timedelta(hours=37) % datetime.timedelta(hours=2)
Out[83]: Timedelta('0 days 01:00:00')
# divmod against a timedelta-like returns a pair (int, Timedelta)
In [84]: divmod(datetime.timedelta(hours=2), pd.Timedelta(minutes=11))
Out[84]: (10, Timedelta('0 days 00:10:00'))
# divmod against a numeric returns a pair (Timedelta, Timedelta)
In [85]: divmod(pd.Timedelta(hours=25), 86400000000000)
Out[85]: (Timedelta('0 days 00:00:00.000000001'), Timedelta('0 days 01:00:00'))
Attributes
You can access various components of the Timedelta or TimedeltaIndex directly using the attributes days, seconds, microseconds, nanoseconds. These are identical to the values returned by datetime.timedelta; for example, the .seconds attribute represents the number of seconds >= 0 and less than 1 day. These are signed according to whether the Timedelta is signed.
These operations can also be accessed directly via the .dt property of a Series.
Note
Note that the attributes are NOT the displayed values of the Timedelta. Use .components to retrieve the displayed values.
For a Series:
In [86]: td.dt.days
Out[86]:
0 31.0
1 31.0
2 31.0
3 NaN
dtype: float64
In [87]: td.dt.seconds
Out[87]:
0 0.0
1 0.0
2 303.0
3 NaN
dtype: float64
You can access the value of the fields for a scalar Timedelta directly.
In [88]: tds = pd.Timedelta("31 days 5 min 3 sec")
In [89]: tds.days
Out[89]: 31
In [90]: tds.seconds
Out[90]: 303
In [91]: (-tds).seconds
Out[91]: 86097
You can use the .components property to access a reduced form of the timedelta. This returns a DataFrame indexed similarly to the Series. These are the displayed values of the Timedelta.
In [92]: td.dt.components
Out[92]:
days hours minutes seconds milliseconds microseconds nanoseconds
0 31.0 0.0 0.0 0.0 0.0 0.0 0.0
1 31.0 0.0 0.0 0.0 0.0 0.0 0.0
2 31.0 0.0 5.0 3.0 0.0 0.0 0.0
3 NaN NaN NaN NaN NaN NaN NaN
In [93]: td.dt.components.seconds
Out[93]:
0 0.0
1 0.0
2 3.0
3 NaN
Name: seconds, dtype: float64
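To make the note above concrete for a scalar as well (an illustrative sketch, reusing the tds defined earlier; exact reprs may vary by pandas version), the signed field attributes differ from the displayed breakdown that .components exposes:
```py
import pandas as pd

tds = pd.Timedelta("31 days 5 min 3 sec")

# field attribute: a signed value, not the displayed one
(-tds).seconds               # 86097

# -tds displays as '-32 days +23:54:57'; .components holds that breakdown
(-tds).components.seconds    # 57
```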
You can convert a Timedelta to an ISO 8601 Duration string with the .isoformat method.
In [94]: pd.Timedelta(
....: days=6, minutes=50, seconds=3, milliseconds=10, microseconds=10, nanoseconds=12
....: ).isoformat()
....:
Out[94]: 'P6DT0H50M3.010010012S'
TimedeltaIndex
To generate an index with time delta, you can use either the TimedeltaIndex or the timedelta_range() constructor.
Using TimedeltaIndex you can pass string-like, Timedelta, timedelta, or np.timedelta64 objects. Passing np.nan/pd.NaT/nat will represent missing values.
In [95]: pd.TimedeltaIndex(
....: [
....: "1 days",
....: "1 days, 00:00:05",
....: np.timedelta64(2, "D"),
....: datetime.timedelta(days=2, seconds=2),
....: ]
....: )
....:
Out[95]:
TimedeltaIndex(['1 days 00:00:00', '1 days 00:00:05', '2 days 00:00:00',
'2 days 00:00:02'],
dtype='timedelta64[ns]', freq=None)
The string 'infer' can be passed in order to set the frequency of the index as the inferred frequency upon creation:
In [96]: pd.TimedeltaIndex(["0 days", "10 days", "20 days"], freq="infer")
Out[96]: TimedeltaIndex(['0 days', '10 days', '20 days'], dtype='timedelta64[ns]', freq='10D')
Generating ranges of timedeltas
Similar to date_range(), you can construct regular ranges of a TimedeltaIndex using timedelta_range(). The default frequency for timedelta_range is calendar day:
In [97]: pd.timedelta_range(start="1 days", periods=5)
Out[97]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D')
Various combinations of start, end, and periods can be used with timedelta_range:
In [98]: pd.timedelta_range(start="1 days", end="5 days")
Out[98]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D')
In [99]: pd.timedelta_range(end="10 days", periods=4)
Out[99]: TimedeltaIndex(['7 days', '8 days', '9 days', '10 days'], dtype='timedelta64[ns]', freq='D')
The freq parameter can be passed a variety of frequency aliases:
In [100]: pd.timedelta_range(start="1 days", end="2 days", freq="30min")
Out[100]:
TimedeltaIndex(['1 days 00:00:00', '1 days 00:30:00', '1 days 01:00:00',
'1 days 01:30:00', '1 days 02:00:00', '1 days 02:30:00',
'1 days 03:00:00', '1 days 03:30:00', '1 days 04:00:00',
'1 days 04:30:00', '1 days 05:00:00', '1 days 05:30:00',
'1 days 06:00:00', '1 days 06:30:00', '1 days 07:00:00',
'1 days 07:30:00', '1 days 08:00:00', '1 days 08:30:00',
'1 days 09:00:00', '1 days 09:30:00', '1 days 10:00:00',
'1 days 10:30:00', '1 days 11:00:00', '1 days 11:30:00',
'1 days 12:00:00', '1 days 12:30:00', '1 days 13:00:00',
'1 days 13:30:00', '1 days 14:00:00', '1 days 14:30:00',
'1 days 15:00:00', '1 days 15:30:00', '1 days 16:00:00',
'1 days 16:30:00', '1 days 17:00:00', '1 days 17:30:00',
'1 days 18:00:00', '1 days 18:30:00', '1 days 19:00:00',
'1 days 19:30:00', '1 days 20:00:00', '1 days 20:30:00',
'1 days 21:00:00', '1 days 21:30:00', '1 days 22:00:00',
'1 days 22:30:00', '1 days 23:00:00', '1 days 23:30:00',
'2 days 00:00:00'],
dtype='timedelta64[ns]', freq='30min')
In [101]: pd.timedelta_range(start="1 days", periods=5, freq="2D5h")
Out[101]:
TimedeltaIndex(['1 days 00:00:00', '3 days 05:00:00', '5 days 10:00:00',
'7 days 15:00:00', '9 days 20:00:00'],
dtype='timedelta64[ns]', freq='53h')
Specifying start, end, and periods will generate a range of evenly spaced timedeltas from start to end inclusively, with periods number of elements in the resulting TimedeltaIndex:
In [102]: pd.timedelta_range("0 days", "4 days", periods=5)
Out[102]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
In [103]: pd.timedelta_range("0 days", "4 days", periods=10)
Out[103]:
TimedeltaIndex(['0 days 00:00:00', '0 days 10:40:00', '0 days 21:20:00',
'1 days 08:00:00', '1 days 18:40:00', '2 days 05:20:00',
'2 days 16:00:00', '3 days 02:40:00', '3 days 13:20:00',
'4 days 00:00:00'],
dtype='timedelta64[ns]', freq=None)
Using the TimedeltaIndex
Similarly to other datetime-like indices, DatetimeIndex and PeriodIndex, you can use TimedeltaIndex as the index of pandas objects.
In [104]: s = pd.Series(
.....: np.arange(100),
.....: index=pd.timedelta_range("1 days", periods=100, freq="h"),
.....: )
.....:
In [105]: s
Out[105]:
1 days 00:00:00 0
1 days 01:00:00 1
1 days 02:00:00 2
1 days 03:00:00 3
1 days 04:00:00 4
..
4 days 23:00:00 95
5 days 00:00:00 96
5 days 01:00:00 97
5 days 02:00:00 98
5 days 03:00:00 99
Freq: h, Length: 100, dtype: int64
Selections work similarly, with coercion on string-likes and slices:
In [106]: s["1 day":"2 day"]
Out[106]:
1 days 00:00:00 0
1 days 01:00:00 1
1 days 02:00:00 2
1 days 03:00:00 3
1 days 04:00:00 4
..
2 days 19:00:00 43
2 days 20:00:00 44
2 days 21:00:00 45
2 days 22:00:00 46
2 days 23:00:00 47
Freq: h, Length: 48, dtype: int64
In [107]: s["1 day 01:00:00"]
Out[107]: 1
In [108]: s[pd.Timedelta("1 day 1h")]
Out[108]: 1
Furthermore you can use partial string selection and the range will be inferred:
In [109]: s["1 day":"1 day 5 hours"]
Out[109]:
1 days 00:00:00 0
1 days 01:00:00 1
1 days 02:00:00 2
1 days 03:00:00 3
1 days 04:00:00 4
1 days 05:00:00 5
Freq: h, dtype: int64
Operations
Finally, the combination of TimedeltaIndex with DatetimeIndex allows certain combination operations that preserve NaT:
In [110]: tdi = pd.TimedeltaIndex(["1 days", pd.NaT, "2 days"])
In [111]: tdi.to_list()
Out[111]: [Timedelta('1 days 00:00:00'), NaT, Timedelta('2 days 00:00:00')]
In [112]: dti = pd.date_range("20130101", periods=3)
In [113]: dti.to_list()
Out[113]:
[Timestamp('2013-01-01 00:00:00'),
Timestamp('2013-01-02 00:00:00'),
Timestamp('2013-01-03 00:00:00')]
In [114]: (dti + tdi).to_list()
Out[114]: [Timestamp('2013-01-02 00:00:00'), NaT, Timestamp('2013-01-05 00:00:00')]
In [115]: (dti - tdi).to_list()
Out[115]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2013-01-01 00:00:00')]
Conversions
Similarly to the frequency conversion on a Series above, you can convert these indices to yield another Index.
In [116]: tdi / np.timedelta64(1, "s")
Out[116]: Index([86400.0, nan, 172800.0], dtype='float64')
In [117]: tdi.astype("timedelta64[s]")
Out[117]: TimedeltaIndex(['1 days', NaT, '2 days'], dtype='timedelta64[s]', freq=None)
Scalar type ops work as well. These can potentially return a different type of index.
# adding or timedelta and date -> datelike
In [118]: tdi + pd.Timestamp("20130101")
Out[118]: DatetimeIndex(['2013-01-02', 'NaT', '2013-01-03'], dtype='datetime64[ns]', freq=None)
# subtraction of a date and a timedelta -> datelike
# note that trying to subtract a date from a Timedelta will raise an exception
In [119]: (pd.Timestamp("20130101") - tdi).to_list()
Out[119]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2012-12-30 00:00:00')]
# timedelta + timedelta -> timedelta
In [120]: tdi + pd.Timedelta("10 days")
Out[120]: TimedeltaIndex(['11 days', NaT, '12 days'], dtype='timedelta64[ns]', freq=None)
# division can result in a Timedelta if the divisor is an integer
In [121]: tdi / 2
Out[121]: TimedeltaIndex(['0 days 12:00:00', NaT, '1 days 00:00:00'], dtype='timedelta64[ns]', freq=None)
# or a float64 Index if the divisor is a Timedelta
In [122]: tdi / tdi[0]
Out[122]: Index([1.0, nan, 2.0], dtype='float64')
Resampling
Similar to timeseries resampling, we can resample with a TimedeltaIndex.
In [123]: s.resample("D").mean()
Out[123]:
1 days 11.5
2 days 35.5
3 days 59.5
4 days 83.5
5 days 97.5
Freq: D, dtype: float64
Options and settings
Source: pandas.pydata.org/docs/user_guide/options.html
Overview
pandas has an options API to configure and customize global behavior related to DataFrame display, data behavior, and more.
Options have a full "dotted-style", case-insensitive name (e.g. display.max_rows). You can get/set options directly as attributes of the top-level options attribute:
In [1]: import pandas as pd
In [2]: pd.options.display.max_rows
Out[2]: 15
In [3]: pd.options.display.max_rows = 999
In [4]: pd.options.display.max_rows
Out[4]: 999
The API is composed of 5 relevant functions, available directly from the pandas namespace:
- get_option() / set_option() - get/set the value of a single option.
- reset_option() - reset one or more options to their default value.
- describe_option() - print the descriptions of one or more options.
- option_context() - execute a code block with a set of options that revert to prior settings after execution.
Note
Developers can check out pandas/core/config_init.py for more information.
All of the functions above accept a regexp pattern (re.search style) as an argument, so passing in a substring will work as long as it is unambiguous:
In [5]: pd.get_option("display.chop_threshold")
In [6]: pd.set_option("display.chop_threshold", 2)
In [7]: pd.get_option("display.chop_threshold")
Out[7]: 2
In [8]: pd.set_option("chop", 4)
In [9]: pd.get_option("display.chop_threshold")
Out[9]: 4
The following will not work because it matches multiple option names, e.g. display.max_colwidth, display.max_rows, display.max_columns:
In [10]: pd.get_option("max")
---------------------------------------------------------------------------
OptionError Traceback (most recent call last)
Cell In[10], line 1
----> 1 pd.get_option("max")
File ~/work/pandas/pandas/pandas/_config/config.py:274, in CallableDynamicDoc.__call__(self, *args, **kwds)
273 def __call__(self, *args, **kwds) -> T:
--> 274 return self.__func__(*args, **kwds)
File ~/work/pandas/pandas/pandas/_config/config.py:146, in _get_option(pat, silent)
145 def _get_option(pat: str, silent: bool = False) -> Any:
--> 146 key = _get_single_key(pat, silent)
148 # walk the nested dict
149 root, k = _get_root(key)
File ~/work/pandas/pandas/pandas/_config/config.py:134, in _get_single_key(pat, silent)
132 raise OptionError(f"No such keys(s): {repr(pat)}")
133 if len(keys) > 1:
--> 134 raise OptionError("Pattern matched multiple keys")
135 key = keys[0]
137 if not silent:
OptionError: Pattern matched multiple keys
Warning
Using this form of shorthand may cause your code to break if new options with similar names are added in future versions.
Available options
You can get a list of available options and their descriptions with describe_option(). When called with no argument, describe_option() will print out the descriptions for all available options.
In [11]: pd.describe_option()
compute.use_bottleneck : bool
Use the bottleneck library to accelerate if it is installed,
the default is True
Valid values: False,True
[default: True] [currently: True]
compute.use_numba : bool
Use the numba engine option for select operations if it is installed,
the default is False
Valid values: False,True
[default: False] [currently: False]
compute.use_numexpr : bool
Use the numexpr library to accelerate computation if it is installed,
the default is True
Valid values: False,True
[default: True] [currently: True]
display.chop_threshold : float or None
if set to a float value, all float values smaller than the given threshold
will be displayed as exactly 0 by repr and friends.
[default: None] [currently: None]
display.colheader_justify : 'left'/'right'
Controls the justification of column headers. used by DataFrameFormatter.
[default: right] [currently: right]
display.date_dayfirst : boolean
When True, prints and parses dates with the day first, eg 20/01/2005
[default: False] [currently: False]
display.date_yearfirst : boolean
When True, prints and parses dates with the year first, eg 2005/01/20
[default: False] [currently: False]
display.encoding : str/unicode
Defaults to the detected encoding of the console.
Specifies the encoding to be used for strings returned by to_string,
these are generally strings meant to be displayed on the console.
[default: utf-8] [currently: utf8]
display.expand_frame_repr : boolean
Whether to print out the full DataFrame repr for wide DataFrames across
multiple lines, `max_columns` is still respected, but the output will
wrap-around across multiple "pages" if its width exceeds `display.width`.
[default: True] [currently: True]
display.float_format : callable
The callable should accept a floating point number and return
a string with the desired format of the number. This is used
in some places like SeriesFormatter.
See formats.format.EngFormatter for an example.
[default: None] [currently: None]
display.html.border : int
A ``border=value`` attribute is inserted in the ``<table>`` tag
for the DataFrame HTML repr.
[default: 1] [currently: 1]
display.html.table_schema : boolean
Whether to publish a Table Schema representation for frontends
that support it.
(default: False)
[default: False] [currently: False]
display.html.use_mathjax : boolean
When True, Jupyter notebook will process table contents using MathJax,
rendering mathematical expressions enclosed by the dollar symbol.
(default: True)
[default: True] [currently: True]
display.large_repr : 'truncate'/'info'
For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can
show a truncated table, or switch to the view from
df.info() (the behaviour in earlier versions of pandas).
[default: truncate] [currently: truncate]
display.max_categories : int
This sets the maximum number of categories pandas should output when
printing out a `Categorical` or a Series of dtype "category".
[default: 8] [currently: 8]
display.max_columns : int
If max_cols is exceeded, switch to truncate view. Depending on
`large_repr`, objects are either centrally truncated or printed as
a summary view. 'None' value means unlimited.
In case python/IPython is running in a terminal and `large_repr`
equals 'truncate' this can be set to 0 or None and pandas will auto-detect
the width of the terminal and print a truncated object which fits
the screen width. The IPython notebook, IPython qtconsole, or IDLE
do not run in a terminal and hence it is not possible to do
correct auto-detection and defaults to 20.
[default: 0] [currently: 0]
display.max_colwidth : int or None
The maximum width in characters of a column in the repr of
a pandas data structure. When the column overflows, a "..."
placeholder is embedded in the output. A 'None' value means unlimited.
[default: 50] [currently: 50]
display.max_dir_items : int
The number of items that will be added to `dir(...)`. 'None' value means
unlimited. Because dir is cached, changing this option will not immediately
affect already existing dataframes until a column is deleted or added.
This is for instance used to suggest columns from a dataframe to tab
completion.
[default: 100] [currently: 100]
display.max_info_columns : int
max_info_columns is used in DataFrame.info method to decide if
per column information will be printed.
[default: 100] [currently: 100]
display.max_info_rows : int
df.info() will usually show null-counts for each column.
For large frames this can be quite slow. max_info_rows and max_info_cols
limit this null check only to frames with smaller dimensions than
specified.
[default: 1690785] [currently: 1690785]
display.max_rows : int
If max_rows is exceeded, switch to truncate view. Depending on
`large_repr`, objects are either centrally truncated or printed as
a summary view. 'None' value means unlimited.
In case python/IPython is running in a terminal and `large_repr`
equals 'truncate' this can be set to 0 and pandas will auto-detect
the height of the terminal and print a truncated object which fits
the screen height. The IPython notebook, IPython qtconsole, or
IDLE do not run in a terminal and hence it is not possible to do
correct auto-detection.
[default: 60] [currently: 60]
display.max_seq_items : int or None
When pretty-printing a long sequence, no more then `max_seq_items`
will be printed. If items are omitted, they will be denoted by the
addition of "..." to the resulting string.
If set to None, the number of items to be printed is unlimited.
[default: 100] [currently: 100]
display.memory_usage : bool, string or None
This specifies if the memory usage of a DataFrame should be displayed when
df.info() is called. Valid values True,False,'deep'
[default: True] [currently: True]
display.min_rows : int
The numbers of rows to show in a truncated view (when `max_rows` is
 exceeded). Ignored when `max_rows` is set to None or 0. When set to
None, follows the value of `max_rows`.
[default: 10] [currently: 10]
display.multi_sparse : boolean
"sparsify" MultiIndex display (don't display repeated
elements in outer levels within groups)
[default: True] [currently: True]
display.notebook_repr_html : boolean
When True, IPython notebook will use html representation for
pandas objects (if it is available).
[default: True] [currently: True]
display.pprint_nest_depth : int
Controls the number of nested levels to process when pretty-printing
[default: 3] [currently: 3]
display.precision : int
Floating point output precision in terms of number of places after the
decimal, for regular formatting as well as scientific notation. Similar
to ``precision`` in :meth:`numpy.set_printoptions`.
[default: 6] [currently: 6]
display.show_dimensions : boolean or 'truncate'
Whether to print out dimensions at the end of DataFrame repr.
If 'truncate' is specified, only print out the dimensions if the
frame is truncated (e.g. not display all rows and/or columns)
[default: truncate] [currently: truncate]
display.unicode.ambiguous_as_wide : boolean
Whether to use the Unicode East Asian Width to calculate the display text
width.
Enabling this may affect to the performance (default: False)
[default: False] [currently: False]
display.unicode.east_asian_width : boolean
Whether to use the Unicode East Asian Width to calculate the display text
width.
Enabling this may affect to the performance (default: False)
[default: False] [currently: False]
display.width : int
Width of the display in characters. In case python/IPython is running in
a terminal this can be set to None and pandas will correctly auto-detect
the width.
Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a
terminal and hence it is not possible to correctly detect the width.
[default: 80] [currently: 80]
future.infer_string Whether to infer sequence of str objects as pyarrow string dtype, which will be the default in pandas 3.0 (at which point this option will be deprecated).
[default: False] [currently: False]
future.no_silent_downcasting Whether to opt-in to the future behavior which will *not* silently downcast results from Series and DataFrame `where`, `mask`, and `clip` methods. Silent downcasting will be removed in pandas 3.0 (at which point this option will be deprecated).
[default: False] [currently: False]
io.excel.ods.reader : string
The default Excel reader engine for 'ods' files. Available options:
auto, odf, calamine.
[default: auto] [currently: auto]
io.excel.ods.writer : string
The default Excel writer engine for 'ods' files. Available options:
auto, odf.
[default: auto] [currently: auto]
io.excel.xls.reader : string
The default Excel reader engine for 'xls' files. Available options:
auto, xlrd, calamine.
[default: auto] [currently: auto]
io.excel.xlsb.reader : string
The default Excel reader engine for 'xlsb' files. Available options:
auto, pyxlsb, calamine.
[default: auto] [currently: auto]
io.excel.xlsm.reader : string
The default Excel reader engine for 'xlsm' files. Available options:
auto, xlrd, openpyxl, calamine.
[default: auto] [currently: auto]
io.excel.xlsm.writer : string
The default Excel writer engine for 'xlsm' files. Available options:
auto, openpyxl.
[default: auto] [currently: auto]
io.excel.xlsx.reader : string
The default Excel reader engine for 'xlsx' files. Available options:
auto, xlrd, openpyxl, calamine.
[default: auto] [currently: auto]
io.excel.xlsx.writer : string
The default Excel writer engine for 'xlsx' files. Available options:
auto, openpyxl, xlsxwriter.
[default: auto] [currently: auto]
io.hdf.default_format : format
default format writing format, if None, then
put will default to 'fixed' and append will default to 'table'
[default: None] [currently: None]
io.hdf.dropna_table : boolean
drop ALL nan rows when appending to a table
[default: False] [currently: False]
io.parquet.engine : string
The default parquet reader/writer engine. Available options:
'auto', 'pyarrow', 'fastparquet', the default is 'auto'
[default: auto] [currently: auto]
io.sql.engine : string
The default sql reader/writer engine. Available options:
'auto', 'sqlalchemy', the default is 'auto'
[default: auto] [currently: auto]
mode.chained_assignment : string
Raise an exception, warn, or no action if trying to use chained assignment,
The default is warn
[default: warn] [currently: warn]
mode.copy_on_write : bool
Use new copy-view behaviour using Copy-on-Write. Defaults to False,
unless overridden by the 'PANDAS_COPY_ON_WRITE' environment variable
(if set to "1" for True, needs to be set before pandas is imported).
[default: False] [currently: False]
mode.data_manager : string
Internal data manager type; can be "block" or "array". Defaults to "block",
unless overridden by the 'PANDAS_DATA_MANAGER' environment variable (needs
to be set before pandas is imported).
[default: block] [currently: block]
(Deprecated, use `` instead.)
mode.sim_interactive : boolean
Whether to simulate interactive mode for purposes of testing
[default: False] [currently: False]
mode.string_storage : string
The default storage for StringDtype. This option is ignored if
``future.infer_string`` is set to True.
[default: python] [currently: python]
mode.use_inf_as_na : boolean
True means treat None, NaN, INF, -INF as NA (old way),
False means None and NaN are null, but INF, -INF are not NA
(new way).
This option is deprecated in pandas 2.1.0 and will be removed in 3.0.
[default: False] [currently: False]
(Deprecated, use `` instead.)
plotting.backend : str
The plotting backend to use. The default value is "matplotlib", the
backend provided with pandas. Other backends can be specified by
providing the name of the module that implements the backend.
[default: matplotlib] [currently: matplotlib]
plotting.matplotlib.register_converters : bool or 'auto'.
Whether to register converters with matplotlib's units registry for
dates, times, datetimes, and Periods. Toggling to False will remove
the converters, restoring any converters that pandas overwrote.
[default: auto] [currently: auto]
styler.format.decimal : str
The character representation for the decimal separator for floats and complex.
[default: .] [currently: .]
styler.format.escape : str, optional
Whether to escape certain characters according to the given context; html or latex.
[default: None] [currently: None]
styler.format.formatter : str, callable, dict, optional
A formatter object to be used as default within ``Styler.format``.
[default: None] [currently: None]
styler.format.na_rep : str, optional
The string representation for values identified as missing.
[default: None] [currently: None]
styler.format.precision : int
The precision for floats and complex numbers.
[default: 6] [currently: 6]
styler.format.thousands : str, optional
The character representation for thousands separator for floats, int and complex.
[default: None] [currently: None]
styler.html.mathjax : bool
If False will render special CSS classes to table attributes that indicate Mathjax
will not be used in Jupyter Notebook.
[default: True] [currently: True]
styler.latex.environment : str
The environment to replace ``\begin{table}``. If "longtable" is used results
in a specific longtable environment format.
[default: None] [currently: None]
styler.latex.hrules : bool
Whether to add horizontal rules on top and bottom and below the headers.
[default: False] [currently: False]
styler.latex.multicol_align : {"r", "c", "l", "naive-l", "naive-r"}
The specifier for horizontal alignment of sparsified LaTeX multicolumns. Pipe
decorators can also be added to non-naive values to draw vertical
rules, e.g. "\|r" will draw a rule on the left side of right aligned merged cells.
[default: r] [currently: r]
styler.latex.multirow_align : {"c", "t", "b"}
The specifier for vertical alignment of sparsified LaTeX multirows.
[default: c] [currently: c]
styler.render.encoding : str
The encoding used for output HTML and LaTeX files.
[default: utf-8] [currently: utf-8]
styler.render.max_columns : int, optional
The maximum number of columns that will be rendered. May still be reduced to
satisfy ``max_elements``, which takes precedence.
[default: None] [currently: None]
styler.render.max_elements : int
The maximum number of data-cell (<td>) elements that will be rendered before
trimming will occur over columns, rows or both if needed.
[default: 262144] [currently: 262144]
styler.render.max_rows : int, optional
The maximum number of rows that will be rendered. May still be reduced to
satisfy ``max_elements``, which takes precedence.
[default: None] [currently: None]
styler.render.repr : str
Determine which output to use in Jupyter Notebook in {"html", "latex"}.
[default: html] [currently: html]
styler.sparse.columns : bool
Whether to sparsify the display of hierarchical columns. Setting to False will
display each explicit level element in a hierarchical key for each column.
[default: True] [currently: True]
styler.sparse.index : bool
Whether to sparsify the display of a hierarchical index. Setting to False will
display each explicit level element in a hierarchical key for each row.
[default: True] [currently: True]
Getting and setting options
As described above, `get_option()` and `set_option()` are available from the pandas namespace. To change an option, call `set_option('option regex', new_value)`.
In [12]: pd.get_option("mode.sim_interactive")
Out[12]: False
In [13]: pd.set_option("mode.sim_interactive", True)
In [14]: pd.get_option("mode.sim_interactive")
Out[14]: True
Note
The option `'mode.sim_interactive'` is mostly used for debugging purposes.
You can use `reset_option()` to revert a setting back to its default value.
In [15]: pd.get_option("display.max_rows")
Out[15]: 60
In [16]: pd.set_option("display.max_rows", 999)
In [17]: pd.get_option("display.max_rows")
Out[17]: 999
In [18]: pd.reset_option("display.max_rows")
In [19]: pd.get_option("display.max_rows")
Out[19]: 60
It is also possible to reset multiple options at once (using a regex):
In [20]: pd.reset_option("^display")
The `option_context()` context manager is exposed through the top-level API, allowing you to execute code with given option values. Option values are restored automatically when you exit the `with` block:
In [21]: with pd.option_context("display.max_rows", 10, "display.max_columns", 5):
....: print(pd.get_option("display.max_rows"))
....: print(pd.get_option("display.max_columns"))
....:
10
5
In [22]: print(pd.get_option("display.max_rows"))
60
In [23]: print(pd.get_option("display.max_columns"))
0
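Because the previous values are restored when the `with` block exits, they also come back if the block raises. The following is a small illustrative sketch, not part of the original example set:

```py
# Illustrative sketch: option_context() restores the old value on exit,
# even when the block is left because of an exception.
import pandas as pd

try:
    with pd.option_context("display.max_rows", 10):
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

print(pd.get_option("display.max_rows"))  # the previous value (60 by default)
```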
Setting startup options in Python/IPython environments
Using startup scripts for the Python/IPython environment to import pandas and set options makes working with pandas more efficient. To do this, create a .py or .ipy script in the startup directory of the desired profile. An example where the startup folder is in a default IPython profile can be found at:
$IPYTHONDIR/profile_default/startup
More information can be found in the IPython documentation. An example startup script for pandas is displayed below:
import pandas as pd
pd.set_option("display.max_rows", 999)
pd.set_option("display.precision", 5)
Frequently used options
The following is a demonstration of the more frequently used display options.
`display.max_rows` and `display.max_columns` set the maximum number of rows and columns displayed when a frame is pretty-printed. Truncated lines are replaced by an ellipsis.
In [24]: df = pd.DataFrame(np.random.randn(7, 2))
In [25]: pd.set_option("display.max_rows", 7)
In [26]: df
Out[26]:
0 1
0 0.469112 -0.282863
1 -1.509059 -1.135632
2 1.212112 -0.173215
3 0.119209 -1.044236
4 -0.861849 -2.104569
5 -0.494929 1.071804
6 0.721555 -0.706771
In [27]: pd.set_option("display.max_rows", 5)
In [28]: df
Out[28]:
0 1
0 0.469112 -0.282863
1 -1.509059 -1.135632
.. ... ...
5 -0.494929 1.071804
6 0.721555 -0.706771
[7 rows x 2 columns]
In [29]: pd.reset_option("display.max_rows")
Once `display.max_rows` is exceeded, the `display.min_rows` option determines how many rows are shown in the truncated repr.
In [30]: pd.set_option("display.max_rows", 8)
In [31]: pd.set_option("display.min_rows", 4)
# below max_rows -> all rows shown
In [32]: df = pd.DataFrame(np.random.randn(7, 2))
In [33]: df
Out[33]:
0 1
0 -1.039575 0.271860
1 -0.424972 0.567020
2 0.276232 -1.087401
3 -0.673690 0.113648
4 -1.478427 0.524988
5 0.404705 0.577046
6 -1.715002 -1.039268
# above max_rows -> only min_rows (4) rows shown
In [34]: df = pd.DataFrame(np.random.randn(9, 2))
In [35]: df
Out[35]:
0 1
0 -0.370647 -1.157892
1 -1.344312 0.844885
.. ... ...
7 0.276662 -0.472035
8 -0.013960 -0.362543
[9 rows x 2 columns]
In [36]: pd.reset_option("display.max_rows")
In [37]: pd.reset_option("display.min_rows")
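If the full frame is only needed occasionally, a handy pattern is to lift the limit temporarily with `option_context` rather than changing the global setting; a minimal sketch, assuming `df` is the frame from above:

```py
# Sketch: show every row of df just once; None means "unlimited" for
# display.max_rows, and the old value is restored afterwards.
with pd.option_context("display.max_rows", None):
    print(df)
```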
`display.expand_frame_repr` allows for the representation of a `DataFrame` to stretch across pages, wrapped over all the columns.
In [38]: df = pd.DataFrame(np.random.randn(5, 10))
In [39]: pd.set_option("expand_frame_repr", True)
In [40]: df
Out[40]:
0 1 2 ... 7 8 9
0 -0.006154 -0.923061 0.895717 ... 1.340309 -1.170299 -0.226169
1 0.410835 0.813850 0.132003 ... -1.436737 -1.413681 1.607920
2 1.024180 0.569605 0.875906 ... -0.078638 0.545952 -1.219217
3 -1.226825 0.769804 -1.281247 ... 0.341734 0.959726 -1.110336
4 -0.619976 0.149748 -0.732339 ... 0.301624 -2.179861 -1.369849
[5 rows x 10 columns]
In [41]: pd.set_option("expand_frame_repr", False)
In [42]: df
Out[42]:
0 1 2 3 4 5 6 7 8 9
0 -0.006154 -0.923061 0.895717 0.805244 -1.206412 2.565646 1.431256 1.340309 -1.170299 -0.226169
1 0.410835 0.813850 0.132003 -0.827317 -0.076467 -1.187678 1.130127 -1.436737 -1.413681 1.607920
2 1.024180 0.569605 0.875906 -2.211372 0.974466 -2.006747 -0.410001 -0.078638 0.545952 -1.219217
3 -1.226825 0.769804 -1.281247 -0.727707 -0.121306 -0.097883 0.695775 0.341734 0.959726 -1.110336
4 -0.619976 0.149748 -0.732339 0.687738 0.176444 0.403310 -0.154951 0.301624 -2.179861 -1.369849
In [43]: pd.reset_option("expand_frame_repr")
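The point at which the wide repr wraps is governed by `display.width`, the assumed console width in characters. A hedged sketch of adjusting it; the value 120 is purely illustrative:

```py
# Sketch: pretend the console is 120 characters wide so fewer "pages" are
# needed when expand_frame_repr wraps a wide frame.
import numpy as np
import pandas as pd

pd.set_option("display.width", 120)
print(pd.DataFrame(np.random.randn(5, 10)))
pd.reset_option("display.width")
```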
`display.large_repr` displays a `DataFrame` that exceeds `max_columns` or `max_rows` as a truncated frame or as a summary.
In [44]: df = pd.DataFrame(np.random.randn(10, 10))
In [45]: pd.set_option("display.max_rows", 5)
In [46]: pd.set_option("large_repr", "truncate")
In [47]: df
Out[47]:
0 1 2 ... 7 8 9
0 -0.954208 1.462696 -1.743161 ... 0.995761 2.396780 0.014871
1 3.357427 -0.317441 -1.236269 ... 0.380396 0.084844 0.432390
.. ... ... ... ... ... ... ...
8 -0.303421 -0.858447 0.306996 ... 0.476720 0.473424 -0.242861
9 -0.014805 -0.284319 0.650776 ... 1.613616 0.464000 0.227371
[10 rows x 10 columns]
In [48]: pd.set_option("large_repr", "info")
In [49]: df
Out[49]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 10 non-null float64
1 1 10 non-null float64
2 2 10 non-null float64
3 3 10 non-null float64
4 4 10 non-null float64
5 5 10 non-null float64
6 6 10 non-null float64
7 7 10 non-null float64
8 8 10 non-null float64
9 9 10 non-null float64
dtypes: float64(10)
memory usage: 928.0 bytes
In [50]: pd.reset_option("large_repr")
In [51]: pd.reset_option("display.max_rows")
`display.max_colwidth` sets the maximum width of columns. Cells of this length or longer will be truncated with an ellipsis.
In [52]: df = pd.DataFrame(
....: np.array(
....: [
....: ["foo", "bar", "bim", "uncomfortably long string"],
....: ["horse", "cow", "banana", "apple"],
....: ]
....: )
....: )
....:
In [53]: pd.set_option("max_colwidth", 40)
In [54]: df
Out[54]:
0 1 2 3
0 foo bar bim uncomfortably long string
1 horse cow banana apple
In [55]: pd.set_option("max_colwidth", 6)
In [56]: df
Out[56]:
0 1 2 3
0 foo bar bim un...
1 horse cow ba... apple
In [57]: pd.reset_option("max_colwidth")
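The option also accepts `None`, meaning no truncation at all; a short sketch for showing full cell contents once, reusing `df` from above:

```py
# Sketch: render the long string in full for this one display only.
with pd.option_context("display.max_colwidth", None):
    print(df)
```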
`display.max_info_columns` sets a threshold for the number of columns displayed when calling `info()`.
In [58]: df = pd.DataFrame(np.random.randn(10, 10))
In [59]: pd.set_option("max_info_columns", 11)
In [60]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 10 non-null float64
1 1 10 non-null float64
2 2 10 non-null float64
3 3 10 non-null float64
4 4 10 non-null float64
5 5 10 non-null float64
6 6 10 non-null float64
7 7 10 non-null float64
8 8 10 non-null float64
9 9 10 non-null float64
dtypes: float64(10)
memory usage: 928.0 bytes
In [61]: pd.set_option("max_info_columns", 5)
In [62]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Columns: 10 entries, 0 to 9
dtypes: float64(10)
memory usage: 928.0 bytes
In [63]: pd.reset_option("max_info_columns")
`display.max_info_rows`: `info()` will usually show null-counts for each column. For a large `DataFrame`, this can be quite slow. `max_info_rows` and `max_info_cols` limit this null check to the specified number of rows and columns, respectively. The `info()` keyword argument `show_counts=True` will override this.
In [64]: df = pd.DataFrame(np.random.choice([0, 1, np.nan], size=(10, 10)))
In [65]: df
Out[65]:
0 1 2 3 4 5 6 7 8 9
0 0.0 NaN 1.0 NaN NaN 0.0 NaN 0.0 NaN 1.0
1 1.0 NaN 1.0 1.0 1.0 1.0 NaN 0.0 0.0 NaN
2 0.0 NaN 1.0 0.0 0.0 NaN NaN NaN NaN 0.0
3 NaN NaN NaN 0.0 1.0 1.0 NaN 1.0 NaN 1.0
4 0.0 NaN NaN NaN 0.0 NaN NaN NaN 1.0 0.0
5 0.0 1.0 1.0 1.0 1.0 0.0 NaN NaN 1.0 0.0
6 1.0 1.0 1.0 NaN 1.0 NaN 1.0 0.0 NaN NaN
7 0.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 NaN
8 NaN NaN NaN 0.0 NaN NaN NaN NaN 1.0 NaN
9 0.0 NaN 0.0 NaN NaN 0.0 NaN 1.0 1.0 0.0
In [66]: pd.set_option("max_info_rows", 11)
In [67]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 8 non-null float64
1 1 3 non-null float64
2 2 7 non-null float64
3 3 6 non-null float64
4 4 7 non-null float64
5 5 6 non-null float64
6 6 2 non-null float64
7 7 6 non-null float64
8 8 6 non-null float64
9 9 6 non-null float64
dtypes: float64(10)
memory usage: 928.0 bytes
In [68]: pd.set_option("max_info_rows", 5)
In [69]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
# Column Dtype
--- ------ -----
0 0 float64
1 1 float64
2 2 float64
3 3 float64
4 4 float64
5 5 float64
6 6 float64
7 7 float64
8 8 float64
9 9 float64
dtypes: float64(10)
memory usage: 928.0 bytes
In [70]: pd.reset_option("max_info_rows")
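As noted above, the null-count check can also be forced for a single call regardless of `max_info_rows`; a sketch using the `show_counts` keyword (available in recent pandas versions), reusing `df` from above:

```py
# Sketch: even with a small max_info_rows threshold, show_counts=True makes
# this particular info() call compute and print the per-column null counts.
pd.set_option("max_info_rows", 5)
df.info(show_counts=True)
pd.reset_option("max_info_rows")
```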
`display.precision` sets the output display precision in terms of decimal places.
In [71]: df = pd.DataFrame(np.random.randn(5, 5))
In [72]: pd.set_option("display.precision", 7)
In [73]: df
Out[73]:
0 1 2 3 4
0 -1.1506406 -0.7983341 -0.5576966 0.3813531 1.3371217
1 -1.5310949 1.3314582 -0.5713290 -0.0266708 -1.0856630
2 -1.1147378 -0.0582158 -0.4867681 1.6851483 0.1125723
3 -1.4953086 0.8984347 -0.1482168 -1.5960698 0.1596530
4 0.2621358 0.0362196 0.1847350 -0.2550694 -0.2710197
In [74]: pd.set_option("display.precision", 4)
In [75]: df
Out[75]:
0 1 2 3 4
0 -1.1506 -0.7983 -0.5577 0.3814 1.3371
1 -1.5311 1.3315 -0.5713 -0.0267 -1.0857
2 -1.1147 -0.0582 -0.4868 1.6851 0.1126
3 -1.4953 0.8984 -0.1482 -1.5961 0.1597
4 0.2621 0.0362 0.1847 -0.2551 -0.2710
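For full control over how floats are rendered (rather than just the number of decimal places), the `display.float_format` option accepts a callable; a brief sketch with an illustrative format string:

```py
# Sketch: every float is rendered with a thousands separator and two
# decimals; the stored values are unchanged.
pd.set_option("display.float_format", "{:,.2f}".format)
print(pd.Series([1234.5678, 0.1]))
pd.reset_option("display.float_format")
```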
`display.chop_threshold` sets at what level pandas rounds values to zero when displaying a `Series` or `DataFrame`. This setting does not change the precision at which the number is stored.
In [76]: df = pd.DataFrame(np.random.randn(6, 6))
In [77]: pd.set_option("chop_threshold", 0)
In [78]: df
Out[78]:
0 1 2 3 4 5
0 1.2884 0.2946 -1.1658 0.8470 -0.6856 0.6091
1 -0.3040 0.6256 -0.0593 0.2497 1.1039 -1.0875
2 1.9980 -0.2445 0.1362 0.8863 -1.3507 -0.8863
3 -1.0133 1.9209 -0.3882 -2.3144 0.6655 0.4026
4 0.3996 -1.7660 0.8504 0.3881 0.9923 0.7441
5 -0.7398 -1.0549 -0.1796 0.6396 1.5850 1.9067
In [79]: pd.set_option("chop_threshold", 0.5)
In [80]: df
Out[80]:
0 1 2 3 4 5
0 1.2884 0.0000 -1.1658 0.8470 -0.6856 0.6091
1 0.0000 0.6256 0.0000 0.0000 1.1039 -1.0875
2 1.9980 0.0000 0.0000 0.8863 -1.3507 -0.8863
3 -1.0133 1.9209 0.0000 -2.3144 0.6655 0.0000
4 0.0000 -1.7660 0.8504 0.0000 0.9923 0.7441
5 -0.7398 -1.0549 0.0000 0.6396 1.5850 1.9067
In [81]: pd.reset_option("chop_threshold")
`display.colheader_justify` controls the justification of the headers. The options are `'right'` and `'left'`.
In [82]: df = pd.DataFrame(
....: np.array([np.random.randn(6), np.random.randint(1, 9, 6) * 0.1, np.zeros(6)]).T,
....: columns=["A", "B", "C"],
....: dtype="float",
....: )
....:
In [83]: pd.set_option("colheader_justify", "right")
In [84]: df
Out[84]:
A B C
0 0.1040 0.1 0.0
1 0.1741 0.5 0.0
2 -0.4395 0.4 0.0
3 -0.7413 0.8 0.0
4 -0.0797 0.4 0.0
5 -0.9229 0.3 0.0
In [85]: pd.set_option("colheader_justify", "left")
In [86]: df
Out[86]:
A B C
0 0.1040 0.1 0.0
1 0.1741 0.5 0.0
2 -0.4395 0.4 0.0
3 -0.7413 0.8 0.0
4 -0.0797 0.4 0.0
5 -0.9229 0.3 0.0
In [87]: pd.reset_option("colheader_justify")
```
## Number formatting
pandas also allows you to set how numbers are displayed in the console. This option is not set through the `set_options` API.
Use the `set_eng_float_format` function to alter the floating-point formatting of pandas objects to produce a particular format.
```py
In [88]: import numpy as np
In [89]: pd.set_eng_float_format(accuracy=3, use_eng_prefix=True)
In [90]: s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
In [91]: s / 1.0e3
Out[91]:
a 303.638u
b -721.084u
c -622.696u
d 648.250u
e -1.945m
dtype: float64
In [92]: s / 1.0e6
Out[92]:
a 303.638n
b -721.084n
c -622.696n
d 648.250n
e -1.945u
dtype: float64
```
Use `round()` to specifically control the rounding of an individual `DataFrame`.
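A minimal sketch of that per-frame alternative:

```py
# Sketch: DataFrame.round() returns a new frame rounded to the given number
# of decimals; no display option is involved.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 3))
print(df.round(2))
```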
## Unicode formatting
Warning
Enabling this option will affect the performance for printing of DataFrame and Series (about 2 times slower). Use only when it is actually required.
Some East Asian countries use Unicode characters whose width corresponds to two Latin characters. If a DataFrame or Series contains these characters, the default output mode may not align them properly.
In [93]: df = pd.DataFrame({"國籍": ["UK", "日本"], "名前": ["Alice", "しのぶ"]})
In [94]: df
Out[94]:
國籍 名前
0 UK Alice
1 日本 しのぶ
Enabling `display.unicode.east_asian_width` allows pandas to check each character's "East Asian Width" property. These characters can be aligned properly by setting this option to `True`. However, this will result in longer render times than the standard `len` function.
In [95]: pd.set_option("display.unicode.east_asian_width", True)
In [96]: df
Out[96]:
國籍 名前
0 UK Alice
1 日本 しのぶ
In addition, Unicode characters whose width is "ambiguous" can be either 1 or 2 characters wide depending on the terminal setting or encoding. The option `display.unicode.ambiguous_as_wide` can be used to handle this ambiguity.
By default, the width of an "ambiguous" character, such as "¡" (inverted exclamation mark) in the example below, is taken to be 1.
In [97]: df = pd.DataFrame({"a": ["xxx", "¡¡"], "b": ["yyy", "¡¡"]})
In [98]: df
Out[98]:
a b
0 xxx yyy
1 ¡¡ ¡¡
Enabling `display.unicode.ambiguous_as_wide` makes pandas interpret the width of these characters as 2. (Note that this option only takes effect when `display.unicode.east_asian_width` is enabled.)
However, setting this option incorrectly for your terminal will cause these characters to be misaligned:
In [99]: pd.set_option("display.unicode.ambiguous_as_wide", True)
In [100]: df
Out[100]:
a b
0 xxx yyy
1 ¡¡ ¡¡
```
## Table schema display
`DataFrame` and `Series` will publish a Table Schema representation by default. This can be enabled globally with the `display.html.table_schema` option:
```py
In [101]: pd.set_option("display.html.table_schema", True)
```
Only `'display.max_rows'` is serialized and published.
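If you prefer not to flip the option globally, the usual context-manager pattern also works here; a sketch that assumes an IPython/Jupyter front end able to consume the Table Schema payload:

```py
# Sketch: publish the Table Schema representation only for this one display.
from IPython.display import display

import pandas as pd

with pd.option_context("display.html.table_schema", True):
    display(pd.DataFrame({"a": [1, 2, 3]}))
```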