Pandas 2.2 Official Tutorials and Guides (Part 22)

Posted by 绝不原创的飞龙 on 2024-04-24

Original: pandas.pydata.org/docs/

Timedeltas

Original: pandas.pydata.org/docs/user_guide/timedeltas.html

Timedeltas are differences in times, expressed in different units, e.g. days, hours, minutes, seconds. They can be both positive and negative.

Timedelta is a subclass of datetime.timedelta and behaves in a similar manner, but also allows compatibility with np.timedelta64 types, as well as a host of custom representations, parsing, and attributes.
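
As a quick illustration of this relationship, the short sketch below checks the subclass claim and the np.timedelta64 interoperability; the values in the comments are expected results rather than output captured from the original page.

```py
import datetime

import numpy as np
import pandas as pd

td = pd.Timedelta("1 days 2 hours")

# Timedelta is a subclass of datetime.timedelta ...
print(isinstance(td, datetime.timedelta))            # True

# ... and interoperates with NumPy's timedelta64
print(td.to_timedelta64())                           # the same duration as a numpy.timedelta64
print(pd.Timedelta(np.timedelta64(26, "h")) == td)   # True
```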

Parsing

You can construct a Timedelta scalar through various arguments, including ISO 8601 Duration strings.

In [1]: import datetime

# strings
In [2]: pd.Timedelta("1 days")
Out[2]: Timedelta('1 days 00:00:00')

In [3]: pd.Timedelta("1 days 00:00:00")
Out[3]: Timedelta('1 days 00:00:00')

In [4]: pd.Timedelta("1 days 2 hours")
Out[4]: Timedelta('1 days 02:00:00')

In [5]: pd.Timedelta("-1 days 2 min 3us")
Out[5]: Timedelta('-2 days +23:57:59.999997')

# like datetime.timedelta
# note: these MUST be specified as keyword arguments
In [6]: pd.Timedelta(days=1, seconds=1)
Out[6]: Timedelta('1 days 00:00:01')

# integers with a unit
In [7]: pd.Timedelta(1, unit="d")
Out[7]: Timedelta('1 days 00:00:00')

# from a datetime.timedelta/np.timedelta64
In [8]: pd.Timedelta(datetime.timedelta(days=1, seconds=1))
Out[8]: Timedelta('1 days 00:00:01')

In [9]: pd.Timedelta(np.timedelta64(1, "ms"))
Out[9]: Timedelta('0 days 00:00:00.001000')

# negative Timedeltas have this string repr
# to be more consistent with datetime.timedelta conventions
In [10]: pd.Timedelta("-1us")
Out[10]: Timedelta('-1 days +23:59:59.999999')

# a NaT
In [11]: pd.Timedelta("nan")
Out[11]: NaT

In [12]: pd.Timedelta("nat")
Out[12]: NaT

# ISO 8601 Duration strings
In [13]: pd.Timedelta("P0DT0H1M0S")
Out[13]: Timedelta('0 days 00:01:00')

In [14]: pd.Timedelta("P0DT0H0M0.000000123S")
Out[14]: Timedelta('0 days 00:00:00.000000123') 

DateOffsets (Day, Hour, Minute, Second, Milli, Micro, Nano) can also be used in construction.

In [15]: pd.Timedelta(pd.offsets.Second(2))
Out[15]: Timedelta('0 days 00:00:02') 

Further, operations among the scalars yield another scalar Timedelta.

In [16]: pd.Timedelta(pd.offsets.Day(2)) + pd.Timedelta(pd.offsets.Second(2)) + pd.Timedelta(
 ....:    "00:00:00.000123"
 ....: )
 ....: 
Out[16]: Timedelta('2 days 00:00:02.000123') 

to_timedelta

Using the top-level pd.to_timedelta, you can convert a scalar, array, list, or Series from a recognized timedelta format/value into a Timedelta type. It will construct a Series if the input is a Series, a scalar if the input is scalar-like, otherwise it will output a TimedeltaIndex.
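
The examples below cover scalar and list input. As a small additional sketch (the commented results are expectations, not captured output), passing a Series of strings returns a Series of timedeltas:

```py
import pandas as pd

raw = pd.Series(["1 days", "2 days 06:00:00"])

converted = pd.to_timedelta(raw)
print(type(converted))   # <class 'pandas.core.series.Series'>
print(converted.dtype)   # timedelta64[ns]
```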

You can parse a single string to a Timedelta:

In [17]: pd.to_timedelta("1 days 06:05:01.00003")
Out[17]: Timedelta('1 days 06:05:01.000030')

In [18]: pd.to_timedelta("15.5us")
Out[18]: Timedelta('0 days 00:00:00.000015500') 

or a list/array of strings:

In [19]: pd.to_timedelta(["1 days 06:05:01.00003", "15.5us", "nan"])
Out[19]: TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT], dtype='timedelta64[ns]', freq=None) 

The unit keyword argument specifies the unit of the Timedelta if the input is numeric:

In [20]: pd.to_timedelta(np.arange(5), unit="s")
Out[20]: 
TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', '0 days 00:00:02',
 '0 days 00:00:03', '0 days 00:00:04'],
 dtype='timedelta64[ns]', freq=None)

In [21]: pd.to_timedelta(np.arange(5), unit="d")
Out[21]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None) 

Warning

If a string or array of strings is passed as an input, then the unit keyword argument will be ignored. If a string without units is passed, then the default unit of nanoseconds is assumed.
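
In practice this means that for numeric input you pass the unit keyword, while for string input the unit belongs in the string itself. A minimal sketch (the commented reprs are expected values, not captured output):

```py
import pandas as pd

# numeric input: the unit keyword applies
print(pd.to_timedelta(90, unit="s"))   # Timedelta('0 days 00:01:30')

# string input: spell the unit out in the string instead
print(pd.to_timedelta("90s"))          # Timedelta('0 days 00:01:30')
```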

Timedelta limitations

pandas represents Timedeltas in nanosecond resolution using 64-bit integers. As such, the 64-bit integer limits determine the Timedelta limits.

In [22]: pd.Timedelta.min
Out[22]: Timedelta('-106752 days +00:12:43.145224193')

In [23]: pd.Timedelta.max
Out[23]: Timedelta('106751 days 23:47:16.854775807') 
Operations

You can operate on Series/DataFrames and construct timedelta64[ns] Series through subtraction operations on datetime64[ns] Series, or Timestamps.

In [24]: s = pd.Series(pd.date_range("2012-1-1", periods=3, freq="D"))

In [25]: td = pd.Series([pd.Timedelta(days=i) for i in range(3)])

In [26]: df = pd.DataFrame({"A": s, "B": td})

In [27]: df
Out[27]: 
 A      B
0 2012-01-01 0 days
1 2012-01-02 1 days
2 2012-01-03 2 days

In [28]: df["C"] = df["A"] + df["B"]

In [29]: df
Out[29]: 
 A      B          C
0 2012-01-01 0 days 2012-01-01
1 2012-01-02 1 days 2012-01-03
2 2012-01-03 2 days 2012-01-05

In [30]: df.dtypes
Out[30]: 
A     datetime64[ns]
B    timedelta64[ns]
C     datetime64[ns]
dtype: object

In [31]: s - s.max()
Out[31]: 
0   -2 days
1   -1 days
2    0 days
dtype: timedelta64[ns]

In [32]: s - datetime.datetime(2011, 1, 1, 3, 5)
Out[32]: 
0   364 days 20:55:00
1   365 days 20:55:00
2   366 days 20:55:00
dtype: timedelta64[ns]

In [33]: s + datetime.timedelta(minutes=5)
Out[33]: 
0   2012-01-01 00:05:00
1   2012-01-02 00:05:00
2   2012-01-03 00:05:00
dtype: datetime64[ns]

In [34]: s + pd.offsets.Minute(5)
Out[34]: 
0   2012-01-01 00:05:00
1   2012-01-02 00:05:00
2   2012-01-03 00:05:00
dtype: datetime64[ns]

In [35]: s + pd.offsets.Minute(5) + pd.offsets.Milli(5)
Out[35]: 
0   2012-01-01 00:05:00.005
1   2012-01-02 00:05:00.005
2   2012-01-03 00:05:00.005
dtype: datetime64[ns] 

Operations with scalars from a timedelta64[ns] Series:

In [36]: y = s - s[0]

In [37]: y
Out[37]: 
0   0 days
1   1 days
2   2 days
dtype: timedelta64[ns] 

Series of timedeltas with NaT values are supported:

In [38]: y = s - s.shift()

In [39]: y
Out[39]: 
0      NaT
1   1 days
2   1 days
dtype: timedelta64[ns] 

Elements can be set to NaT using np.nan, analogously to datetimes:

In [40]: y[1] = np.nan

In [41]: y
Out[41]: 
0      NaT
1      NaT
2   1 days
dtype: timedelta64[ns] 

Operands can also appear in a reversed order (a singular object operated with a Series):

In [42]: s.max() - s
Out[42]: 
0   2 days
1   1 days
2   0 days
dtype: timedelta64[ns]

In [43]: datetime.datetime(2011, 1, 1, 3, 5) - s
Out[43]: 
0   -365 days +03:05:00
1   -366 days +03:05:00
2   -367 days +03:05:00
dtype: timedelta64[ns]

In [44]: datetime.timedelta(minutes=5) + s
Out[44]: 
0   2012-01-01 00:05:00
1   2012-01-02 00:05:00
2   2012-01-03 00:05:00
dtype: datetime64[ns] 

min, max and the corresponding idxmin, idxmax operations are supported on frames:

In [45]: A = s - pd.Timestamp("20120101") - pd.Timedelta("00:05:05")

In [46]: B = s - pd.Series(pd.date_range("2012-1-2", periods=3, freq="D"))

In [47]: df = pd.DataFrame({"A": A, "B": B})

In [48]: df
Out[48]: 
 A       B
0 -1 days +23:54:55 -1 days
1   0 days 23:54:55 -1 days
2   1 days 23:54:55 -1 days

In [49]: df.min()
Out[49]: 
A   -1 days +23:54:55
B   -1 days +00:00:00
dtype: timedelta64[ns]

In [50]: df.min(axis=1)
Out[50]: 
0   -1 days
1   -1 days
2   -1 days
dtype: timedelta64[ns]

In [51]: df.idxmin()
Out[51]: 
A    0
B    0
dtype: int64

In [52]: df.idxmax()
Out[52]: 
A    2
B    0
dtype: int64 

min, max, idxmin, idxmax operations are supported on Series as well. A scalar result will be a Timedelta.

In [53]: df.min().max()
Out[53]: Timedelta('-1 days +23:54:55')

In [54]: df.min(axis=1).min()
Out[54]: Timedelta('-1 days +00:00:00')

In [55]: df.min().idxmax()
Out[55]: 'A'

In [56]: df.min(axis=1).idxmin()
Out[56]: 0 

You can fillna on timedeltas, passing a timedelta to get a particular value.

In [57]: y.fillna(pd.Timedelta(0))
Out[57]: 
0   0 days
1   0 days
2   1 days
dtype: timedelta64[ns]

In [58]: y.fillna(pd.Timedelta(10, unit="s"))
Out[58]: 
0   0 days 00:00:10
1   0 days 00:00:10
2   1 days 00:00:00
dtype: timedelta64[ns]

In [59]: y.fillna(pd.Timedelta("-1 days, 00:00:05"))
Out[59]: 
0   -1 days +00:00:05
1   -1 days +00:00:05
2     1 days 00:00:00
dtype: timedelta64[ns] 

You can also negate, multiply and use abs on Timedeltas:

In [60]: td1 = pd.Timedelta("-1 days 2 hours 3 seconds")

In [61]: td1
Out[61]: Timedelta('-2 days +21:59:57')

In [62]: -1 * td1
Out[62]: Timedelta('1 days 02:00:03')

In [63]: -td1
Out[63]: Timedelta('1 days 02:00:03')

In [64]: abs(td1)
Out[64]: Timedelta('1 days 02:00:03') 
Reductions

Numeric reduction operations for timedelta64[ns] will return Timedelta objects. As usual, NaT values are skipped during evaluation.

In [65]: y2 = pd.Series(
 ....:    pd.to_timedelta(["-1 days +00:00:05", "nat", "-1 days +00:00:05", "1 days"])
 ....: )
 ....: 

In [66]: y2
Out[66]: 
0   -1 days +00:00:05
1                 NaT
2   -1 days +00:00:05
3     1 days 00:00:00
dtype: timedelta64[ns]

In [67]: y2.mean()
Out[67]: Timedelta('-1 days +16:00:03.333333334')

In [68]: y2.median()
Out[68]: Timedelta('-1 days +00:00:05')

In [69]: y2.quantile(0.1)
Out[69]: Timedelta('-1 days +00:00:05')

In [70]: y2.sum()
Out[70]: Timedelta('-1 days +00:00:10') 
Frequency conversion

Timedelta Series and TimedeltaIndex, as well as Timedelta scalars, can be converted to other frequencies by astyping to a specific timedelta dtype.

In [71]: december = pd.Series(pd.date_range("20121201", periods=4))

In [72]: january = pd.Series(pd.date_range("20130101", periods=4))

In [73]: td = january - december

In [74]: td[2] += datetime.timedelta(minutes=5, seconds=3)

In [75]: td[3] = np.nan

In [76]: td
Out[76]: 
0   31 days 00:00:00
1   31 days 00:00:00
2   31 days 00:05:03
3                NaT
dtype: timedelta64[ns]

# to seconds
In [77]: td.astype("timedelta64[s]")
Out[77]: 
0   31 days 00:00:00
1   31 days 00:00:00
2   31 days 00:05:03
3                NaT
dtype: timedelta64[s] 

For timedelta64 resolutions other than the supported "s", "ms", "us", "ns", an alternative is to divide by another timedelta object. Note that division by the NumPy scalar is true division, while astyping is equivalent to floor division.

# to days
In [78]: td / np.timedelta64(1, "D")
Out[78]: 
0    31.000000
1    31.000000
2    31.003507
3          NaN
dtype: float64 

Dividing or multiplying a timedelta64[ns] Series by an integer or integer Series yields another timedelta64[ns] dtype Series.

In [79]: td * -1
Out[79]: 
0   -31 days +00:00:00
1   -31 days +00:00:00
2   -32 days +23:54:57
3                  NaT
dtype: timedelta64[ns]

In [80]: td * pd.Series([1, 2, 3, 4])
Out[80]: 
0   31 days 00:00:00
1   62 days 00:00:00
2   93 days 00:15:09
3                NaT
dtype: timedelta64[ns] 

Floor division of a timedelta64[ns] Series by a scalar Timedelta gives a Series of integers.

In [81]: td // pd.Timedelta(days=3, hours=4)
Out[81]: 
0    9.0
1    9.0
2    9.0
3    NaN
dtype: float64

In [82]: pd.Timedelta(days=3, hours=4) // td
Out[82]: 
0    0.0
1    0.0
2    0.0
3    NaN
dtype: float64 

The mod (%) and divmod operations are defined for Timedelta when operating with another timedelta-like or with a numeric argument.

In [83]: pd.Timedelta(hours=37) % datetime.timedelta(hours=2)
Out[83]: Timedelta('0 days 01:00:00')

# divmod against a timedelta-like returns a pair (int, Timedelta)
In [84]: divmod(datetime.timedelta(hours=2), pd.Timedelta(minutes=11))
Out[84]: (10, Timedelta('0 days 00:10:00'))

# divmod against a numeric returns a pair (Timedelta, Timedelta)
In [85]: divmod(pd.Timedelta(hours=25), 86400000000000)
Out[85]: (Timedelta('0 days 00:00:00.000000001'), Timedelta('0 days 01:00:00')) 

Attributes

You can access various components of the Timedelta or TimedeltaIndex directly using the attributes days, seconds, microseconds, nanoseconds. These are identical to the values returned by datetime.timedelta, in that, for example, the .seconds attribute represents the number of seconds >= 0 and < 1 day. These are signed according to whether the Timedelta is signed.

These operations can also be directly accessed via the Series.dt property.

Note

Note that the attributes are NOT the displayed values of the Timedelta. Use .components to retrieve the displayed values.

For a Series:

In [86]: td.dt.days
Out[86]: 
0    31.0
1    31.0
2    31.0
3     NaN
dtype: float64

In [87]: td.dt.seconds
Out[87]: 
0      0.0
1      0.0
2    303.0
3      NaN
dtype: float64 

You can access the value of the fields for a scalar Timedelta directly.

In [88]: tds = pd.Timedelta("31 days 5 min 3 sec")

In [89]: tds.days
Out[89]: 31

In [90]: tds.seconds
Out[90]: 303

In [91]: (-tds).seconds
Out[91]: 86097 

You can use the .components property to access a reduced form of the timedelta. This returns a DataFrame indexed similarly to the Series. These are the displayed values of the Timedelta.

In [92]: td.dt.components
Out[92]: 
 days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0  31.0    0.0      0.0      0.0           0.0           0.0          0.0
1  31.0    0.0      0.0      0.0           0.0           0.0          0.0
2  31.0    0.0      5.0      3.0           0.0           0.0          0.0
3   NaN    NaN      NaN      NaN           NaN           NaN          NaN

In [93]: td.dt.components.seconds
Out[93]: 
0    0.0
1    0.0
2    3.0
3    NaN
Name: seconds, dtype: float64 

You can convert a Timedelta to an ISO 8601 Duration string with the .isoformat method.

In [94]: pd.Timedelta(
 ....:    days=6, minutes=50, seconds=3, milliseconds=10, microseconds=10, nanoseconds=12
 ....: ).isoformat()
 ....: 
Out[94]: 'P6DT0H50M3.010010012S' 

TimedeltaIndex

To generate an index with time deltas, you can use either the TimedeltaIndex or the timedelta_range() constructor.

Using TimedeltaIndex you can pass string-like, Timedelta, timedelta, or np.timedelta64 objects. Passing np.nan/pd.NaT/nat will represent missing values.

In [95]: pd.TimedeltaIndex(
 ....:    [
 ....:        "1 days",
 ....:        "1 days, 00:00:05",
 ....:        np.timedelta64(2, "D"),
 ....:        datetime.timedelta(days=2, seconds=2),
 ....:    ]
 ....: )
 ....: 
Out[95]: 
TimedeltaIndex(['1 days 00:00:00', '1 days 00:00:05', '2 days 00:00:00',
 '2 days 00:00:02'],
 dtype='timedelta64[ns]', freq=None) 

The string 'infer' can be passed in order to set the frequency of the index as the inferred frequency upon creation:

In [96]: pd.TimedeltaIndex(["0 days", "10 days", "20 days"], freq="infer")
Out[96]: TimedeltaIndex(['0 days', '10 days', '20 days'], dtype='timedelta64[ns]', freq='10D') 

Generating ranges of time deltas

Similar to date_range(), you can construct regular ranges of a TimedeltaIndex using timedelta_range(). The default frequency for timedelta_range is calendar day:

In [97]: pd.timedelta_range(start="1 days", periods=5)
Out[97]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D') 

Various combinations of start, end, and periods can be used with timedelta_range:

In [98]: pd.timedelta_range(start="1 days", end="5 days")
Out[98]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D')

In [99]: pd.timedelta_range(end="10 days", periods=4)
Out[99]: TimedeltaIndex(['7 days', '8 days', '9 days', '10 days'], dtype='timedelta64[ns]', freq='D') 

The freq parameter can be passed a variety of frequency aliases:

In [100]: pd.timedelta_range(start="1 days", end="2 days", freq="30min")
Out[100]: 
TimedeltaIndex(['1 days 00:00:00', '1 days 00:30:00', '1 days 01:00:00',
 '1 days 01:30:00', '1 days 02:00:00', '1 days 02:30:00',
 '1 days 03:00:00', '1 days 03:30:00', '1 days 04:00:00',
 '1 days 04:30:00', '1 days 05:00:00', '1 days 05:30:00',
 '1 days 06:00:00', '1 days 06:30:00', '1 days 07:00:00',
 '1 days 07:30:00', '1 days 08:00:00', '1 days 08:30:00',
 '1 days 09:00:00', '1 days 09:30:00', '1 days 10:00:00',
 '1 days 10:30:00', '1 days 11:00:00', '1 days 11:30:00',
 '1 days 12:00:00', '1 days 12:30:00', '1 days 13:00:00',
 '1 days 13:30:00', '1 days 14:00:00', '1 days 14:30:00',
 '1 days 15:00:00', '1 days 15:30:00', '1 days 16:00:00',
 '1 days 16:30:00', '1 days 17:00:00', '1 days 17:30:00',
 '1 days 18:00:00', '1 days 18:30:00', '1 days 19:00:00',
 '1 days 19:30:00', '1 days 20:00:00', '1 days 20:30:00',
 '1 days 21:00:00', '1 days 21:30:00', '1 days 22:00:00',
 '1 days 22:30:00', '1 days 23:00:00', '1 days 23:30:00',
 '2 days 00:00:00'],
 dtype='timedelta64[ns]', freq='30min')

In [101]: pd.timedelta_range(start="1 days", periods=5, freq="2D5h")
Out[101]: 
TimedeltaIndex(['1 days 00:00:00', '3 days 05:00:00', '5 days 10:00:00',
 '7 days 15:00:00', '9 days 20:00:00'],
 dtype='timedelta64[ns]', freq='53h') 

Specifying start, end, and periods will generate a range of evenly spaced timedeltas from start to end inclusively, with periods number of elements in the resulting TimedeltaIndex:

In [102]: pd.timedelta_range("0 days", "4 days", periods=5)
Out[102]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)

In [103]: pd.timedelta_range("0 days", "4 days", periods=10)
Out[103]: 
TimedeltaIndex(['0 days 00:00:00', '0 days 10:40:00', '0 days 21:20:00',
 '1 days 08:00:00', '1 days 18:40:00', '2 days 05:20:00',
 '2 days 16:00:00', '3 days 02:40:00', '3 days 13:20:00',
 '4 days 00:00:00'],
 dtype='timedelta64[ns]', freq=None) 

Using the TimedeltaIndex

Similarly to the other datetime-like indices, DatetimeIndex and PeriodIndex, you can use TimedeltaIndex as the index of pandas objects.

In [104]: s = pd.Series(
 .....:    np.arange(100),
 .....:    index=pd.timedelta_range("1 days", periods=100, freq="h"),
 .....: )
 .....: 

In [105]: s
Out[105]: 
1 days 00:00:00     0
1 days 01:00:00     1
1 days 02:00:00     2
1 days 03:00:00     3
1 days 04:00:00     4
 ..
4 days 23:00:00    95
5 days 00:00:00    96
5 days 01:00:00    97
5 days 02:00:00    98
5 days 03:00:00    99
Freq: h, Length: 100, dtype: int64 

Selections work similarly, with coercion on string-likes and slices:

In [106]: s["1 day":"2 day"]
Out[106]: 
1 days 00:00:00     0
1 days 01:00:00     1
1 days 02:00:00     2
1 days 03:00:00     3
1 days 04:00:00     4
 ..
2 days 19:00:00    43
2 days 20:00:00    44
2 days 21:00:00    45
2 days 22:00:00    46
2 days 23:00:00    47
Freq: h, Length: 48, dtype: int64

In [107]: s["1 day 01:00:00"]
Out[107]: 1

In [108]: s[pd.Timedelta("1 day 1h")]
Out[108]: 1 

Furthermore, you can use partial string selection and the range will be inferred:

In [109]: s["1 day":"1 day 5 hours"]
Out[109]: 
1 days 00:00:00    0
1 days 01:00:00    1
1 days 02:00:00    2
1 days 03:00:00    3
1 days 04:00:00    4
1 days 05:00:00    5
Freq: h, dtype: int64 

Operations

Finally, the combination of TimedeltaIndex with DatetimeIndex allows certain combination operations that are NaT preserving:

In [110]: tdi = pd.TimedeltaIndex(["1 days", pd.NaT, "2 days"])

In [111]: tdi.to_list()
Out[111]: [Timedelta('1 days 00:00:00'), NaT, Timedelta('2 days 00:00:00')]

In [112]: dti = pd.date_range("20130101", periods=3)

In [113]: dti.to_list()
Out[113]: 
[Timestamp('2013-01-01 00:00:00'),
 Timestamp('2013-01-02 00:00:00'),
 Timestamp('2013-01-03 00:00:00')]

In [114]: (dti + tdi).to_list()
Out[114]: [Timestamp('2013-01-02 00:00:00'), NaT, Timestamp('2013-01-05 00:00:00')]

In [115]: (dti - tdi).to_list()
Out[115]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2013-01-01 00:00:00')] 

Conversions

Similarly to the frequency conversion on a Series above, you can convert these indices to yield another Index.

In [116]: tdi / np.timedelta64(1, "s")
Out[116]: Index([86400.0, nan, 172800.0], dtype='float64')

In [117]: tdi.astype("timedelta64[s]")
Out[117]: TimedeltaIndex(['1 days', NaT, '2 days'], dtype='timedelta64[s]', freq=None) 

Scalar type ops work as well. These can potentially return a different type of index.

# adding or timedelta and date -> datelike
In [118]: tdi + pd.Timestamp("20130101")
Out[118]: DatetimeIndex(['2013-01-02', 'NaT', '2013-01-03'], dtype='datetime64[ns]', freq=None)

# subtraction of a date and a timedelta -> datelike
# note that trying to subtract a date from a Timedelta will raise an exception
In [119]: (pd.Timestamp("20130101") - tdi).to_list()
Out[119]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2012-12-30 00:00:00')]

# timedelta + timedelta -> timedelta
In [120]: tdi + pd.Timedelta("10 days")
Out[120]: TimedeltaIndex(['11 days', NaT, '12 days'], dtype='timedelta64[ns]', freq=None)

# division can result in a Timedelta if the divisor is an integer
In [121]: tdi / 2
Out[121]: TimedeltaIndex(['0 days 12:00:00', NaT, '1 days 00:00:00'], dtype='timedelta64[ns]', freq=None)

# or a float64 Index if the divisor is a Timedelta
In [122]: tdi / tdi[0]
Out[122]: Index([1.0, nan, 2.0], dtype='float64') 
Resampling

Similar to timeseries resampling, we can resample with a TimedeltaIndex.

In [123]: s.resample("D").mean()
Out[123]: 
1 days    11.5
2 days    35.5
3 days    59.5
4 days    83.5
5 days    97.5
Freq: D, dtype: float64 


Options and settings

Original: pandas.pydata.org/docs/user_guide/options.html

Overview

pandas has an options API to configure and customize global behavior related to DataFrame display, data behavior and more.

Options have a full "dotted-style", case-insensitive name (e.g. display.max_rows). You can get/set options directly as attributes of the top-level options attribute:

In [1]: import pandas as pd

In [2]: pd.options.display.max_rows
Out[2]: 15

In [3]: pd.options.display.max_rows = 999

In [4]: pd.options.display.max_rows
Out[4]: 999 

The API is composed of 5 relevant functions, available directly from the pandas namespace:

  • get_option() / set_option() - get/set the value of a single option.

  • reset_option() - reset one or more options to their default value.

  • describe_option() - print the descriptions of one or more options.

  • option_context() - execute a code block with a set of options that revert to prior settings after execution.

Note

Developers can check out pandas/core/config_init.py for more information.

All of the functions above accept a regexp pattern (re.search style) as an argument, to match an unambiguous substring:

In [5]: pd.get_option("display.chop_threshold")

In [6]: pd.set_option("display.chop_threshold", 2)

In [7]: pd.get_option("display.chop_threshold")
Out[7]: 2

In [8]: pd.set_option("chop", 4)

In [9]: pd.get_option("display.chop_threshold")
Out[9]: 4 

The following will not work because it matches multiple option names, e.g. display.max_colwidth, display.max_rows, display.max_columns:

In [10]: pd.get_option("max")
---------------------------------------------------------------------------
OptionError  Traceback (most recent call last)
Cell In[10], line 1
----> 1 pd.get_option("max")

File ~/work/pandas/pandas/pandas/_config/config.py:274, in CallableDynamicDoc.__call__(self, *args, **kwds)
  273 def __call__(self, *args, **kwds) -> T:
--> 274     return self.__func__(*args, **kwds)

File ~/work/pandas/pandas/pandas/_config/config.py:146, in _get_option(pat, silent)
  145 def _get_option(pat: str, silent: bool = False) -> Any:
--> 146     key = _get_single_key(pat, silent)
  148     # walk the nested dict
  149     root, k = _get_root(key)

File ~/work/pandas/pandas/pandas/_config/config.py:134, in _get_single_key(pat, silent)
  132     raise OptionError(f"No such keys(s): {repr(pat)}")
  133 if len(keys) > 1:
--> 134     raise OptionError("Pattern matched multiple keys")
  135 key = keys[0]
  137 if not silent:

OptionError: Pattern matched multiple keys 

Warning

Using this form of shorthand may cause your code to break if new options with similar names are added in future versions.

Available options

You can get a list of available options and their descriptions with describe_option(). When called with no argument, describe_option() will print out the descriptions for all available options.

In [11]: pd.describe_option()
compute.use_bottleneck : bool
 Use the bottleneck library to accelerate if it is installed,
 the default is True
 Valid values: False,True
 [default: True] [currently: True]
compute.use_numba : bool
 Use the numba engine option for select operations if it is installed,
 the default is False
 Valid values: False,True
 [default: False] [currently: False]
compute.use_numexpr : bool
 Use the numexpr library to accelerate computation if it is installed,
 the default is True
 Valid values: False,True
 [default: True] [currently: True]
display.chop_threshold : float or None
 if set to a float value, all float values smaller than the given threshold
 will be displayed as exactly 0 by repr and friends.
 [default: None] [currently: None]
display.colheader_justify : 'left'/'right'
 Controls the justification of column headers. used by DataFrameFormatter.
 [default: right] [currently: right]
display.date_dayfirst : boolean
 When True, prints and parses dates with the day first, eg 20/01/2005
 [default: False] [currently: False]
display.date_yearfirst : boolean
 When True, prints and parses dates with the year first, eg 2005/01/20
 [default: False] [currently: False]
display.encoding : str/unicode
 Defaults to the detected encoding of the console.
 Specifies the encoding to be used for strings returned by to_string,
 these are generally strings meant to be displayed on the console.
 [default: utf-8] [currently: utf8]
display.expand_frame_repr : boolean
 Whether to print out the full DataFrame repr for wide DataFrames across
 multiple lines, `max_columns` is still respected, but the output will
 wrap-around across multiple "pages" if its width exceeds `display.width`.
 [default: True] [currently: True]
display.float_format : callable
 The callable should accept a floating point number and return
 a string with the desired format of the number. This is used
 in some places like SeriesFormatter.
 See formats.format.EngFormatter for an example.
 [default: None] [currently: None]
display.html.border : int
 A ``border=value`` attribute is inserted in the ``<table>`` tag
 for the DataFrame HTML repr.
 [default: 1] [currently: 1]
display.html.table_schema : boolean
 Whether to publish a Table Schema representation for frontends
 that support it.
 (default: False)
 [default: False] [currently: False]
display.html.use_mathjax : boolean
 When True, Jupyter notebook will process table contents using MathJax,
 rendering mathematical expressions enclosed by the dollar symbol.
 (default: True)
 [default: True] [currently: True]
display.large_repr : 'truncate'/'info'
 For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can
 show a truncated table, or switch to the view from
 df.info() (the behaviour in earlier versions of pandas).
 [default: truncate] [currently: truncate]
display.max_categories : int
 This sets the maximum number of categories pandas should output when
 printing out a `Categorical` or a Series of dtype "category".
 [default: 8] [currently: 8]
display.max_columns : int
 If max_cols is exceeded, switch to truncate view. Depending on
 `large_repr`, objects are either centrally truncated or printed as
 a summary view. 'None' value means unlimited.

 In case python/IPython is running in a terminal and `large_repr`
 equals 'truncate' this can be set to 0 or None and pandas will auto-detect
 the width of the terminal and print a truncated object which fits
 the screen width. The IPython notebook, IPython qtconsole, or IDLE
 do not run in a terminal and hence it is not possible to do
 correct auto-detection and defaults to 20.
 [default: 0] [currently: 0]
display.max_colwidth : int or None
 The maximum width in characters of a column in the repr of
 a pandas data structure. When the column overflows, a "..."
 placeholder is embedded in the output. A 'None' value means unlimited.
 [default: 50] [currently: 50]
display.max_dir_items : int
 The number of items that will be added to `dir(...)`. 'None' value means
 unlimited. Because dir is cached, changing this option will not immediately
 affect already existing dataframes until a column is deleted or added.

 This is for instance used to suggest columns from a dataframe to tab
 completion.
 [default: 100] [currently: 100]
display.max_info_columns : int
 max_info_columns is used in DataFrame.info method to decide if
 per column information will be printed.
 [default: 100] [currently: 100]
display.max_info_rows : int
 df.info() will usually show null-counts for each column.
 For large frames this can be quite slow. max_info_rows and max_info_cols
 limit this null check only to frames with smaller dimensions than
 specified.
 [default: 1690785] [currently: 1690785]
display.max_rows : int
 If max_rows is exceeded, switch to truncate view. Depending on
 `large_repr`, objects are either centrally truncated or printed as
 a summary view. 'None' value means unlimited.

 In case python/IPython is running in a terminal and `large_repr`
 equals 'truncate' this can be set to 0 and pandas will auto-detect
 the height of the terminal and print a truncated object which fits
 the screen height. The IPython notebook, IPython qtconsole, or
 IDLE do not run in a terminal and hence it is not possible to do
 correct auto-detection.
 [default: 60] [currently: 60]
display.max_seq_items : int or None
 When pretty-printing a long sequence, no more then `max_seq_items`
 will be printed. If items are omitted, they will be denoted by the
 addition of "..." to the resulting string.

 If set to None, the number of items to be printed is unlimited.
 [default: 100] [currently: 100]
display.memory_usage : bool, string or None
 This specifies if the memory usage of a DataFrame should be displayed when
 df.info() is called. Valid values True,False,'deep'
 [default: True] [currently: True]
display.min_rows : int
 The numbers of rows to show in a truncated view (when `max_rows` is
 exceeded). Ignored when `max_rows` is set to None or 0\. When set to
 None, follows the value of `max_rows`.
 [default: 10] [currently: 10]
display.multi_sparse : boolean
 "sparsify" MultiIndex display (don't display repeated
 elements in outer levels within groups)
 [default: True] [currently: True]
display.notebook_repr_html : boolean
 When True, IPython notebook will use html representation for
 pandas objects (if it is available).
 [default: True] [currently: True]
display.pprint_nest_depth : int
 Controls the number of nested levels to process when pretty-printing
 [default: 3] [currently: 3]
display.precision : int
 Floating point output precision in terms of number of places after the
 decimal, for regular formatting as well as scientific notation. Similar
 to ``precision`` in :meth:`numpy.set_printoptions`.
 [default: 6] [currently: 6]
display.show_dimensions : boolean or 'truncate'
 Whether to print out dimensions at the end of DataFrame repr.
 If 'truncate' is specified, only print out the dimensions if the
 frame is truncated (e.g. not display all rows and/or columns)
 [default: truncate] [currently: truncate]
display.unicode.ambiguous_as_wide : boolean
 Whether to use the Unicode East Asian Width to calculate the display text
 width.
 Enabling this may affect to the performance (default: False)
 [default: False] [currently: False]
display.unicode.east_asian_width : boolean
 Whether to use the Unicode East Asian Width to calculate the display text
 width.
 Enabling this may affect to the performance (default: False)
 [default: False] [currently: False]
display.width : int
 Width of the display in characters. In case python/IPython is running in
 a terminal this can be set to None and pandas will correctly auto-detect
 the width.
 Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a
 terminal and hence it is not possible to correctly detect the width.
 [default: 80] [currently: 80]
future.infer_string Whether to infer sequence of str objects as pyarrow string dtype, which will be the default in pandas 3.0 (at which point this option will be deprecated).
 [default: False] [currently: False]
future.no_silent_downcasting Whether to opt-in to the future behavior which will *not* silently downcast results from Series and DataFrame `where`, `mask`, and `clip` methods. Silent downcasting will be removed in pandas 3.0 (at which point this option will be deprecated).
 [default: False] [currently: False]
io.excel.ods.reader : string
 The default Excel reader engine for 'ods' files. Available options:
 auto, odf, calamine.
 [default: auto] [currently: auto]
io.excel.ods.writer : string
 The default Excel writer engine for 'ods' files. Available options:
 auto, odf.
 [default: auto] [currently: auto]
io.excel.xls.reader : string
 The default Excel reader engine for 'xls' files. Available options:
 auto, xlrd, calamine.
 [default: auto] [currently: auto]
io.excel.xlsb.reader : string
 The default Excel reader engine for 'xlsb' files. Available options:
 auto, pyxlsb, calamine.
 [default: auto] [currently: auto]
io.excel.xlsm.reader : string
 The default Excel reader engine for 'xlsm' files. Available options:
 auto, xlrd, openpyxl, calamine.
 [default: auto] [currently: auto]
io.excel.xlsm.writer : string
 The default Excel writer engine for 'xlsm' files. Available options:
 auto, openpyxl.
 [default: auto] [currently: auto]
io.excel.xlsx.reader : string
 The default Excel reader engine for 'xlsx' files. Available options:
 auto, xlrd, openpyxl, calamine.
 [default: auto] [currently: auto]
io.excel.xlsx.writer : string
 The default Excel writer engine for 'xlsx' files. Available options:
 auto, openpyxl, xlsxwriter.
 [default: auto] [currently: auto]
io.hdf.default_format : format
 default format writing format, if None, then
 put will default to 'fixed' and append will default to 'table'
 [default: None] [currently: None]
io.hdf.dropna_table : boolean
 drop ALL nan rows when appending to a table
 [default: False] [currently: False]
io.parquet.engine : string
 The default parquet reader/writer engine. Available options:
 'auto', 'pyarrow', 'fastparquet', the default is 'auto'
 [default: auto] [currently: auto]
io.sql.engine : string
 The default sql reader/writer engine. Available options:
 'auto', 'sqlalchemy', the default is 'auto'
 [default: auto] [currently: auto]
mode.chained_assignment : string
 Raise an exception, warn, or no action if trying to use chained assignment,
 The default is warn
 [default: warn] [currently: warn]
mode.copy_on_write : bool
 Use new copy-view behaviour using Copy-on-Write. Defaults to False,
 unless overridden by the 'PANDAS_COPY_ON_WRITE' environment variable
 (if set to "1" for True, needs to be set before pandas is imported).
 [default: False] [currently: False]
mode.data_manager : string
 Internal data manager type; can be "block" or "array". Defaults to "block",
 unless overridden by the 'PANDAS_DATA_MANAGER' environment variable (needs
 to be set before pandas is imported).
 [default: block] [currently: block]
 (Deprecated, use `` instead.)
mode.sim_interactive : boolean
 Whether to simulate interactive mode for purposes of testing
 [default: False] [currently: False]
mode.string_storage : string
 The default storage for StringDtype. This option is ignored if
 ``future.infer_string`` is set to True.
 [default: python] [currently: python]
mode.use_inf_as_na : boolean
 True means treat None, NaN, INF, -INF as NA (old way),
 False means None and NaN are null, but INF, -INF are not NA
 (new way).

 This option is deprecated in pandas 2.1.0 and will be removed in 3.0.
 [default: False] [currently: False]
 (Deprecated, use `` instead.)
plotting.backend : str
 The plotting backend to use. The default value is "matplotlib", the
 backend provided with pandas. Other backends can be specified by
 providing the name of the module that implements the backend.
 [default: matplotlib] [currently: matplotlib]
plotting.matplotlib.register_converters : bool or 'auto'.
 Whether to register converters with matplotlib's units registry for
 dates, times, datetimes, and Periods. Toggling to False will remove
 the converters, restoring any converters that pandas overwrote.
 [default: auto] [currently: auto]
styler.format.decimal : str
 The character representation for the decimal separator for floats and complex.
 [default: .] [currently: .]
styler.format.escape : str, optional
 Whether to escape certain characters according to the given context; html or latex.
 [default: None] [currently: None]
styler.format.formatter : str, callable, dict, optional
 A formatter object to be used as default within ``Styler.format``.
 [default: None] [currently: None]
styler.format.na_rep : str, optional
 The string representation for values identified as missing.
 [default: None] [currently: None]
styler.format.precision : int
 The precision for floats and complex numbers.
 [default: 6] [currently: 6]
styler.format.thousands : str, optional
 The character representation for thousands separator for floats, int and complex.
 [default: None] [currently: None]
styler.html.mathjax : bool
 If False will render special CSS classes to table attributes that indicate Mathjax
 will not be used in Jupyter Notebook.
 [default: True] [currently: True]
styler.latex.environment : str
 The environment to replace ``\begin{table}``. If "longtable" is used results
 in a specific longtable environment format.
 [default: None] [currently: None]
styler.latex.hrules : bool
 Whether to add horizontal rules on top and bottom and below the headers.
 [default: False] [currently: False]
styler.latex.multicol_align : {"r", "c", "l", "naive-l", "naive-r"}
 The specifier for horizontal alignment of sparsified LaTeX multicolumns. Pipe
 decorators can also be added to non-naive values to draw vertical
 rules, e.g. "\|r" will draw a rule on the left side of right aligned merged cells.
 [default: r] [currently: r]
styler.latex.multirow_align : {"c", "t", "b"}
 The specifier for vertical alignment of sparsified LaTeX multirows.
 [default: c] [currently: c]
styler.render.encoding : str
 The encoding used for output HTML and LaTeX files.
 [default: utf-8] [currently: utf-8]
styler.render.max_columns : int, optional
 The maximum number of columns that will be rendered. May still be reduced to
 satisfy ``max_elements``, which takes precedence.
 [default: None] [currently: None]
styler.render.max_elements : int
 The maximum number of data-cell (<td>) elements that will be rendered before
 trimming will occur over columns, rows or both if needed.
 [default: 262144] [currently: 262144]
styler.render.max_rows : int, optional
 The maximum number of rows that will be rendered. May still be reduced to
 satisfy ``max_elements``, which takes precedence.
 [default: None] [currently: None]
styler.render.repr : str
 Determine which output to use in Jupyter Notebook in {"html", "latex"}.
 [default: html] [currently: html]
styler.sparse.columns : bool
 Whether to sparsify the display of hierarchical columns. Setting to False will
 display each explicit level element in a hierarchical key for each column.
 [default: True] [currently: True]
styler.sparse.index : bool
 Whether to sparsify the display of a hierarchical index. Setting to False will
 display each explicit level element in a hierarchical key for each row.
 [default: True] [currently: True] 
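
`describe_option()` also accepts a regex (`re.search` style), so the listing above can be narrowed to just the options of interest; a short sketch:

```py
# A sketch: print the description text only for the options matching a pattern.
import pandas as pd

pd.describe_option("display.max_rows")   # a single option
pd.describe_option("^styler.render")     # every option under styler.render
```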

Getting and setting options

As mentioned above, `get_option()` and `set_option()` are available from the pandas namespace. To change an option, call `set_option('option regex', new_value)`.

In [12]: pd.get_option("mode.sim_interactive")
Out[12]: False

In [13]: pd.set_option("mode.sim_interactive", True)

In [14]: pd.get_option("mode.sim_interactive")
Out[14]: True 

Note

The option `'mode.sim_interactive'` is mostly used for debugging purposes.

You can use `reset_option()` to revert a setting back to its default value:

In [15]: pd.get_option("display.max_rows")
Out[15]: 60

In [16]: pd.set_option("display.max_rows", 999)

In [17]: pd.get_option("display.max_rows")
Out[17]: 999

In [18]: pd.reset_option("display.max_rows")

In [19]: pd.get_option("display.max_rows")
Out[19]: 60 

It is also possible to reset multiple options at once (using a regex):

In [20]: pd.reset_option("^display") 
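
`reset_option()` also recognizes the special pattern `"all"`, which restores every option at once; a minimal sketch (resetting options that are deprecated may emit warnings):

```py
# A sketch: "all" is a special keyword for reset_option that restores every
# registered option to its default value in a single call.
import pandas as pd

pd.set_option("display.max_rows", 999)
pd.set_option("display.precision", 3)
pd.reset_option("all")
print(pd.get_option("display.max_rows"))   # back to the default of 60
```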

The `option_context()` context manager is exposed through the top-level API, allowing you to execute code with given option values. Option values are restored automatically when you exit the `with` block (a decorator variant is sketched after this example):

In [21]: with pd.option_context("display.max_rows", 10, "display.max_columns", 5):
 ....:    print(pd.get_option("display.max_rows"))
 ....:    print(pd.get_option("display.max_columns"))
 ....: 
10
5

In [22]: print(pd.get_option("display.max_rows"))
60

In [23]: print(pd.get_option("display.max_columns"))
0 
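
Because `option_context()` is implemented as a `contextlib.ContextDecorator`, it can also be applied as a decorator in modern pandas versions; a minimal sketch (the function name is illustrative):

```py
# A sketch: the same temporary option values, applied for the duration of each call.
import pandas as pd

@pd.option_context("display.max_rows", 10, "display.max_columns", 5)
def show(frame):
    print(pd.get_option("display.max_rows"))   # 10 while the call runs
    print(frame)

show(pd.DataFrame(range(100)))
print(pd.get_option("display.max_rows"))       # the previous value is restored
```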

Setting startup options in Python/IPython environment

Using startup scripts for the Python/IPython environment to import pandas and set options makes working with pandas more efficient. To do this, create a `.py` or `.ipy` script in the startup directory of the desired profile. An example where the startup folder is in a default IPython profile can be found at:

$IPYTHONDIR/profile_default/startup 

More information can be found in the IPython documentation. An example startup script for pandas is shown below:

import pandas as pd

pd.set_option("display.max_rows", 999)
pd.set_option("display.precision", 5) 

Frequently used options

The following is a demo of the more frequently used display options.

`display.max_rows` and `display.max_columns` set the maximum number of rows and columns displayed when a frame is pretty-printed. Truncated rows are replaced by an ellipsis.

In [24]: df = pd.DataFrame(np.random.randn(7, 2))

In [25]: pd.set_option("display.max_rows", 7)

In [26]: df
Out[26]: 
 0         1
0  0.469112 -0.282863
1 -1.509059 -1.135632
2  1.212112 -0.173215
3  0.119209 -1.044236
4 -0.861849 -2.104569
5 -0.494929  1.071804
6  0.721555 -0.706771

In [27]: pd.set_option("display.max_rows", 5)

In [28]: df
Out[28]: 
 0         1
0   0.469112 -0.282863
1  -1.509059 -1.135632
..       ...       ...
5  -0.494929  1.071804
6   0.721555 -0.706771

[7 rows x 2 columns]

In [29]: pd.reset_option("display.max_rows") 

Once `display.max_rows` is exceeded, the `display.min_rows` option determines how many rows are shown in the truncated repr.

In [30]: pd.set_option("display.max_rows", 8)

In [31]: pd.set_option("display.min_rows", 4)

# below max_rows -> all rows shown
In [32]: df = pd.DataFrame(np.random.randn(7, 2))

In [33]: df
Out[33]: 
 0         1
0 -1.039575  0.271860
1 -0.424972  0.567020
2  0.276232 -1.087401
3 -0.673690  0.113648
4 -1.478427  0.524988
5  0.404705  0.577046
6 -1.715002 -1.039268

# above max_rows -> only min_rows (4) rows shown
In [34]: df = pd.DataFrame(np.random.randn(9, 2))

In [35]: df
Out[35]: 
 0         1
0  -0.370647 -1.157892
1  -1.344312  0.844885
..       ...       ...
7   0.276662 -0.472035
8  -0.013960 -0.362543

[9 rows x 2 columns]

In [36]: pd.reset_option("display.max_rows")

In [37]: pd.reset_option("display.min_rows") 

`display.expand_frame_repr` allows the representation of a `DataFrame` to stretch across pages, wrapped over all of the columns.

In [38]: df = pd.DataFrame(np.random.randn(5, 10))

In [39]: pd.set_option("expand_frame_repr", True)

In [40]: df
Out[40]: 
 0         1         2  ...         7         8         9
0 -0.006154 -0.923061  0.895717  ...  1.340309 -1.170299 -0.226169
1  0.410835  0.813850  0.132003  ... -1.436737 -1.413681  1.607920
2  1.024180  0.569605  0.875906  ... -0.078638  0.545952 -1.219217
3 -1.226825  0.769804 -1.281247  ...  0.341734  0.959726 -1.110336
4 -0.619976  0.149748 -0.732339  ...  0.301624 -2.179861 -1.369849

[5 rows x 10 columns]

In [41]: pd.set_option("expand_frame_repr", False)

In [42]: df
Out[42]: 
 0         1         2         3         4         5         6         7         8         9
0 -0.006154 -0.923061  0.895717  0.805244 -1.206412  2.565646  1.431256  1.340309 -1.170299 -0.226169
1  0.410835  0.813850  0.132003 -0.827317 -0.076467 -1.187678  1.130127 -1.436737 -1.413681  1.607920
2  1.024180  0.569605  0.875906 -2.211372  0.974466 -2.006747 -0.410001 -0.078638  0.545952 -1.219217
3 -1.226825  0.769804 -1.281247 -0.727707 -0.121306 -0.097883  0.695775  0.341734  0.959726 -1.110336
4 -0.619976  0.149748 -0.732339  0.687738  0.176444  0.403310 -0.154951  0.301624 -2.179861 -1.369849

In [43]: pd.reset_option("expand_frame_repr") 

`display.large_repr` displays a `DataFrame` that exceeds `max_columns` or `max_rows` as a truncated frame or as a summary.

In [44]: df = pd.DataFrame(np.random.randn(10, 10))

In [45]: pd.set_option("display.max_rows", 5)

In [46]: pd.set_option("large_repr", "truncate")

In [47]: df
Out[47]: 
 0         1         2  ...         7         8         9
0  -0.954208  1.462696 -1.743161  ...  0.995761  2.396780  0.014871
1   3.357427 -0.317441 -1.236269  ...  0.380396  0.084844  0.432390
..       ...       ...       ...  ...       ...       ...       ...
8  -0.303421 -0.858447  0.306996  ...  0.476720  0.473424 -0.242861
9  -0.014805 -0.284319  0.650776  ...  1.613616  0.464000  0.227371

[10 rows x 10 columns]

In [48]: pd.set_option("large_repr", "info")

In [49]: df
Out[49]: 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       10 non-null     float64
 1   1       10 non-null     float64
 2   2       10 non-null     float64
 3   3       10 non-null     float64
 4   4       10 non-null     float64
 5   5       10 non-null     float64
 6   6       10 non-null     float64
 7   7       10 non-null     float64
 8   8       10 non-null     float64
 9   9       10 non-null     float64
dtypes: float64(10)
memory usage: 928.0 bytes

In [50]: pd.reset_option("large_repr")

In [51]: pd.reset_option("display.max_rows") 

`display.max_colwidth` sets the maximum width of columns. Cells of this length or longer will be truncated with an ellipsis.

In [52]: df = pd.DataFrame(
 ....:    np.array(
 ....:        [
 ....:            ["foo", "bar", "bim", "uncomfortably long string"],
 ....:            ["horse", "cow", "banana", "apple"],
 ....:        ]
 ....:    )
 ....: )
 ....: 

In [53]: pd.set_option("max_colwidth", 40)

In [54]: df
Out[54]: 
 0    1       2                          3
0    foo  bar     bim  uncomfortably long string
1  horse  cow  banana                      apple

In [55]: pd.set_option("max_colwidth", 6)

In [56]: df
Out[56]: 
 0    1      2      3
0    foo  bar    bim  un...
1  horse  cow  ba...  apple

In [57]: pd.reset_option("max_colwidth") 

`display.max_info_columns` sets a threshold for the number of columns displayed when calling `info()`.

In [58]: df = pd.DataFrame(np.random.randn(10, 10))

In [59]: pd.set_option("max_info_columns", 11)

In [60]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       10 non-null     float64
 1   1       10 non-null     float64
 2   2       10 non-null     float64
 3   3       10 non-null     float64
 4   4       10 non-null     float64
 5   5       10 non-null     float64
 6   6       10 non-null     float64
 7   7       10 non-null     float64
 8   8       10 non-null     float64
 9   9       10 non-null     float64
dtypes: float64(10)
memory usage: 928.0 bytes

In [61]: pd.set_option("max_info_columns", 5)

In [62]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Columns: 10 entries, 0 to 9
dtypes: float64(10)
memory usage: 928.0 bytes

In [63]: pd.reset_option("max_info_columns") 

`display.max_info_rows`: `info()` will usually show null-counts for each column. For a large `DataFrame`, this can be quite slow. `max_info_rows` and `max_info_cols` limit this null check to frames with smaller dimensions than the specified values. The `info()` keyword argument `show_counts=True` overrides this setting (see the sketch after these examples).

In [64]: df = pd.DataFrame(np.random.choice([0, 1, np.nan], size=(10, 10)))

In [65]: df
Out[65]: 
 0    1    2    3    4    5    6    7    8    9
0  0.0  NaN  1.0  NaN  NaN  0.0  NaN  0.0  NaN  1.0
1  1.0  NaN  1.0  1.0  1.0  1.0  NaN  0.0  0.0  NaN
2  0.0  NaN  1.0  0.0  0.0  NaN  NaN  NaN  NaN  0.0
3  NaN  NaN  NaN  0.0  1.0  1.0  NaN  1.0  NaN  1.0
4  0.0  NaN  NaN  NaN  0.0  NaN  NaN  NaN  1.0  0.0
5  0.0  1.0  1.0  1.0  1.0  0.0  NaN  NaN  1.0  0.0
6  1.0  1.0  1.0  NaN  1.0  NaN  1.0  0.0  NaN  NaN
7  0.0  0.0  1.0  0.0  1.0  0.0  1.0  1.0  0.0  NaN
8  NaN  NaN  NaN  0.0  NaN  NaN  NaN  NaN  1.0  NaN
9  0.0  NaN  0.0  NaN  NaN  0.0  NaN  1.0  1.0  0.0

In [66]: pd.set_option("max_info_rows", 11)

In [67]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       8 non-null      float64
 1   1       3 non-null      float64
 2   2       7 non-null      float64
 3   3       6 non-null      float64
 4   4       7 non-null      float64
 5   5       6 non-null      float64
 6   6       2 non-null      float64
 7   7       6 non-null      float64
 8   8       6 non-null      float64
 9   9       6 non-null      float64
dtypes: float64(10)
memory usage: 928.0 bytes

In [68]: pd.set_option("max_info_rows", 5)

In [69]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
 #   Column  Dtype 
---  ------  ----- 
 0   0       float64
 1   1       float64
 2   2       float64
 3   3       float64
 4   4       float64
 5   5       float64
 6   6       float64
 7   7       float64
 8   8       float64
 9   9       float64
dtypes: float64(10)
memory usage: 928.0 bytes

In [70]: pd.reset_option("max_info_rows") 
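
As noted above, the `show_counts` keyword of `info()` forces (or skips) the per-column non-null counts for a single call, regardless of these options; a minimal sketch:

```py
# A sketch: control the null-count column per call instead of globally.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.choice([0, 1, np.nan], size=(10, 10)))
df.info(show_counts=True)    # always compute and display non-null counts
df.info(show_counts=False)   # skip the (potentially slow) null check
```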

`display.precision` sets the output display precision in terms of decimal places.

In [71]: df = pd.DataFrame(np.random.randn(5, 5))

In [72]: pd.set_option("display.precision", 7)

In [73]: df
Out[73]: 
 0          1          2          3          4
0 -1.1506406 -0.7983341 -0.5576966  0.3813531  1.3371217
1 -1.5310949  1.3314582 -0.5713290 -0.0266708 -1.0856630
2 -1.1147378 -0.0582158 -0.4867681  1.6851483  0.1125723
3 -1.4953086  0.8984347 -0.1482168 -1.5960698  0.1596530
4  0.2621358  0.0362196  0.1847350 -0.2550694 -0.2710197

In [74]: pd.set_option("display.precision", 4)

In [75]: df
Out[75]: 
 0       1       2       3       4
0 -1.1506 -0.7983 -0.5577  0.3814  1.3371
1 -1.5311  1.3315 -0.5713 -0.0267 -1.0857
2 -1.1147 -0.0582 -0.4868  1.6851  0.1126
3 -1.4953  0.8984 -0.1482 -1.5961  0.1597
4  0.2621  0.0362  0.1847 -0.2551 -0.2710 

`display.chop_threshold` sets the level at which pandas rounds to zero when displaying a `Series` or `DataFrame`. This setting does not change the precision at which the number is stored.

In [76]: df = pd.DataFrame(np.random.randn(6, 6))

In [77]: pd.set_option("chop_threshold", 0)

In [78]: df
Out[78]: 
 0       1       2       3       4       5
0  1.2884  0.2946 -1.1658  0.8470 -0.6856  0.6091
1 -0.3040  0.6256 -0.0593  0.2497  1.1039 -1.0875
2  1.9980 -0.2445  0.1362  0.8863 -1.3507 -0.8863
3 -1.0133  1.9209 -0.3882 -2.3144  0.6655  0.4026
4  0.3996 -1.7660  0.8504  0.3881  0.9923  0.7441
5 -0.7398 -1.0549 -0.1796  0.6396  1.5850  1.9067

In [79]: pd.set_option("chop_threshold", 0.5)

In [80]: df
Out[80]: 
 0       1       2       3       4       5
0  1.2884  0.0000 -1.1658  0.8470 -0.6856  0.6091
1  0.0000  0.6256  0.0000  0.0000  1.1039 -1.0875
2  1.9980  0.0000  0.0000  0.8863 -1.3507 -0.8863
3 -1.0133  1.9209  0.0000 -2.3144  0.6655  0.0000
4  0.0000 -1.7660  0.8504  0.0000  0.9923  0.7441
5 -0.7398 -1.0549  0.0000  0.6396  1.5850  1.9067

In [81]: pd.reset_option("chop_threshold") 

`display.colheader_justify` controls the justification of the column headers. The options are `'right'` and `'left'`.

In [82]: df = pd.DataFrame(
 ....:    np.array([np.random.randn(6), np.random.randint(1, 9, 6) * 0.1, np.zeros(6)]).T,
 ....:    columns=["A", "B", "C"],
 ....:    dtype="float",
 ....: )
 ....: 

In [83]: pd.set_option("colheader_justify", "right")

In [84]: df
Out[84]: 
 A    B    C
0  0.1040  0.1  0.0
1  0.1741  0.5  0.0
2 -0.4395  0.4  0.0
3 -0.7413  0.8  0.0
4 -0.0797  0.4  0.0
5 -0.9229  0.3  0.0

In [85]: pd.set_option("colheader_justify", "left")

In [86]: df
Out[86]: 
 A       B    C 
0  0.1040  0.1  0.0
1  0.1741  0.5  0.0
2 -0.4395  0.4  0.0
3 -0.7413  0.8  0.0
4 -0.0797  0.4  0.0
5 -0.9229  0.3  0.0

In [87]: pd.reset_option("colheader_justify") 
```  ## Number formatting

pandas also allows you to set how numbers are displayed in the console. This option is not set through the `set_options` API.

Use the `set_eng_float_format` function to alter the floating-point formatting of pandas objects to produce a particular format.

```py
In [88]: import numpy as np

In [89]: pd.set_eng_float_format(accuracy=3, use_eng_prefix=True)

In [90]: s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])

In [91]: s / 1.0e3
Out[91]: 
a    303.638u
b   -721.084u
c   -622.696u
d    648.250u
e     -1.945m
dtype: float64

In [92]: s / 1.0e6
Out[92]: 
a    303.638n
b   -721.084n
c   -622.696n
d    648.250n
e     -1.945u
dtype: float64 
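
Rounding can also be controlled per object rather than through a global option; a minimal sketch of `DataFrame.round()` (the frame below is only for illustration):

```py
# A sketch: round() returns a new frame with values rounded to the given number
# of decimals; it does not modify df in place or touch any display option.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 3), columns=["a", "b", "c"])
print(df.round(2))                    # two decimals for every column
print(df.round({"a": 1, "c": 3}))     # per-column decimals by column label
```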

Use `round()` to specifically control the rounding of an individual `DataFrame`, as in the sketch above.

## Unicode formatting

Warning

Enabling this option will affect the performance for printing of DataFrame and Series (about 2 times slower). Use only when it is actually required.

Some East Asian countries use Unicode characters whose width corresponds to two Latin characters. If a DataFrame or Series contains these characters, the default output mode may not align them properly.

In [93]: df = pd.DataFrame({"國籍": ["UK", "日本"], "名前": ["Alice", "しのぶ"]})

In [94]: df
Out[94]: 
 國籍     名前
0  UK  Alice
1  日本    しのぶ 

Enabling `display.unicode.east_asian_width` allows pandas to check each character's "East Asian Width" property. These characters can be aligned properly by setting this option to `True`. However, this will result in longer render times than the standard `len` function.

In [95]: pd.set_option("display.unicode.east_asian_width", True)

In [96]: df
Out[96]: 
 國籍    名前
0    UK   Alice
1  日本  しのぶ 

In addition, Unicode characters whose width is "ambiguous" can be either 1 or 2 characters wide depending on the terminal setting or encoding. The option `display.unicode.ambiguous_as_wide` can be used to handle this ambiguity.

By default, the width of an "ambiguous" character, such as "¡" (inverted exclamation) in the example below, is taken to be 1.

In [97]: df = pd.DataFrame({"a": ["xxx", "¡¡"], "b": ["yyy", "¡¡"]})

In [98]: df
Out[98]: 
 a    b
0  xxx  yyy
1   ¡¡   ¡¡ 

Enabling `display.unicode.ambiguous_as_wide` makes pandas interpret the width of these characters as 2. (Note that this option only takes effect when `display.unicode.east_asian_width` is enabled.)

However, setting this option incorrectly for your terminal will cause these characters to be misaligned:

In [99]: pd.set_option("display.unicode.ambiguous_as_wide", True)

In [100]: df
Out[100]: 
 a     b
0   xxx   yyy
1  ¡¡  ¡¡ 
```  ## Table schema display

`DataFrame` and `Series` will publish a Table Schema representation by default. This can be enabled globally with the `display.html.table_schema` option:

```py
In [101]: pd.set_option("display.html.table_schema", True) 

Only `'display.max_rows'` is serialized and published.
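
The published payload follows the Table Schema specification; a closely related JSON document can also be produced directly with `to_json`, as in this minimal sketch:

```py
# A sketch: write the Table Schema flavour of JSON explicitly with orient="table".
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]}, index=pd.Index(["x", "y", "z"], name="idx"))
print(df.to_json(orient="table", indent=2))
```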
