Pandas - Commonly Used Series.str.xxx Functions

Posted by Himmelbleu on 2024-03-13

Original URL: https://www.cnblogs.com/Himmelbleu/p/18066920

Letter case

Functions for converting case, capitalizing the first letter, title-casing (English headline style) and swapping case. See pandas.Series.str.capitalize.

file:[Example 1]
import pandas as pd

ser = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])

print(ser.str.lower())
"""
0                 lower
1              capitals
2    this is a sentence
3              swapcase
dtype: object
"""

print(ser.str.upper())
"""
0                 LOWER
1              CAPITALS
2    THIS IS A SENTENCE
3              SWAPCASE
dtype: object
"""

print(ser.str.title())
"""
0                 Lower
1              Capitals
2    This Is A Sentence
3              Swapcase
dtype: object
"""

print(ser.str.capitalize())
"""
0                 Lower
1              Capitals
2    This is a sentence
3              Swapcase
dtype: object
"""

print(ser.str.swapcase())
"""
0                 LOWER
1              capitals
2    THIS IS A SENTENCE
3              sWaPcAsE
dtype: object
"""

casefold

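Removes all case distinctions from the strings. For plain ASCII text the result is the same as lower(), but casefold() is more aggressive for some other characters (for example, the German ß becomes ss). See pandas.Series.str.casefold.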

file:[Example 2]
ser = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
print(ser.str.casefold())

"""
0                 lower
1              capitals
2    this is a sentence
3              swapcase
dtype: object
"""

Searching

contains

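Tests whether a string or regular expression is contained in each string of the Series (pat is treated as a regular expression by default, regex=True). As Example 3 shows, every returned product code contains XL. See pandas.Series.str.contains.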

file:[Example 3]
# '產品編碼' is the product-code column of the sample workbook
data = pd.read_excel(rf'C:\examples.xlsx')
print(data.loc[data['產品編碼'].str.contains(pat='XL')]['產品編碼'])

"""
1       XL03ACR107B001030204021
2       XL03ACR107B014020104870
4       XL01EBJ008A001010105847
5       XL01XGP043B001020104021
10      XL01EAY059A001010105847
                 ...           
2354    XR06AXL226A001010405848
2361    XL04AAL019A001010104859
2409    XR06AXL226A001010405848
2440    XR06AXL226K001010101442
2479    XR06AXL226A001010405848
Name: 產品編碼, Length: 100, dtype: object
"""

fullmatch

Determines whether each string in the Series matches a regular expression in its entirety, and returns a boolean Series. See pandas.Series.str.fullmatch.

file:[Example 4]
print(data.loc[data['產品編碼'].str.fullmatch(pat='XL')]['產品編碼'])

"""
Series([], Name: 產品編碼, dtype: object)
"""

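As Example 4 shows, XL alone does not match any complete product code, so the result is empty. Widen the regular expression so that it covers the whole string: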

file:[Example 5]
print(data.loc[data['產品編碼'].str.fullmatch('.*XL.*')]['產品編碼'])
"""
1       XL03ACR107B001030204021
2       XL03ACR107B014020104870
4       XL01EBJ008A001010105847
5       XL01XGP043B001020104021
10      XL01EAY059A001010105847
                 ...           
2354    XR06AXL226A001010405848
2361    XL04AAL019A001010104859
2409    XR06AXL226A001010405848
2440    XR06AXL226K001010101442
2479    XR06AXL226A001010405848
Name: 產品編碼, Length: 100, dtype: object
"""

fm = data.loc[data['產品編碼'].str.fullmatch('.*XL.*')]['產品編碼']
co = data.loc[data['產品編碼'].str.contains('.*XL.*')]['產品編碼']
print(fm.equals(co))
"""
True
"""

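Examples 3 and 5 return the same rows and the same count: for these codes, fullmatch('.*XL.*') selects exactly what contains('XL') selects.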
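The three matching methods differ only in where the pattern is anchored. A minimal sketch on two codes copied from the output above (match() is not covered in the original text and is added here for comparison):

```python
import pandas as pd

# Two codes taken from the outputs above
codes = pd.Series(['XL03ACR107B001030204021', 'XR06AXL226A001010405848'])

has_xl = codes.str.contains('XL')       # 'XL' anywhere in the string
starts_xl = codes.str.match('XL')       # 'XL' anchored at the start
is_xl = codes.str.fullmatch('XL')       # the whole string must be 'XL'
wide = codes.str.fullmatch('.*XL.*')    # same rows as contains('XL')

print(has_xl.tolist())     # [True, True]
print(starts_xl.tolist())  # [True, False]
print(is_xl.tolist())      # [False, False]
print(wide.tolist())       # [True, True]
```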

findall

Finds all matches of a regular expression in each string of the Series and returns one list per row. See pandas.Series.str.findall.

file:[Example 6]
data = pd.read_excel(rf'C:\examples.xlsx')
print(data['產品編碼'].str.findall(r'[\d]+'))

"""
0       [10, 247, 002010103546]
1       [03, 107, 001030204021]
2       [03, 107, 014020104870]
3       [01, 112, 010030105198]
4       [01, 008, 001010105847]
                 ...           
2522    [10, 079, 002010207103]
2523    [10, 080, 002010107103]
2524    [10, 074, 002030100227]
2525    [10, 087, 002010107103]
2526    [10, 025, 002020181708]
Name: 產品編碼, Length: 2527, dtype: object
"""

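findall returns every substring that matches the regular expression. For example, XL03ACR107B001030204021 (row 1) yields the list [03, 107, 001030204021]. The items are strings, so leading zeros are preserved.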
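The per-row lists returned by findall can be processed further with the same .str accessor. A short sketch on two codes from the output above, picking the first match and counting matches per row:

```python
import pandas as pd

codes = pd.Series(['XL03ACR107B001030204021', 'XR06AXL226A001010405848'])

digits = codes.str.findall(r'\d+')  # one list of matched strings per row
print(digits[0])                    # ['03', '107', '001030204021']

first_group = digits.str[0]         # first match of each row
n_matches = digits.str.len()        # number of matches in each row
print(first_group.tolist())         # ['03', '06']
print(n_matches.tolist())           # [3, 3]
```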

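endswith and startswith

endswith: tests whether each string ends with the given substring. Regular expressions are not accepted.
startswith: the counterpart of endswith; tests whether each string starts with the given substring.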

file:[Example 7]
import numpy as np
import pandas as pd

ser = pd.Series(['bat', 'bear', 'caT', np.nan])
print(ser.str.endswith('t'))

"""
0     True
1    False
2    False
3      NaN
dtype: object
"""

print(ser.str.startswith('b'))

"""
0     True
1     True
2    False
3      NaN
dtype: object
"""

Both functions can be combined with loc to filter DataFrame rows, for example df.loc[df['產品編碼'].str.startswith('XL')].
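As Example 7 shows, a missing value produces NaN in the result, and loc refuses to index with a mask that contains NaN. A minimal sketch, assuming a hypothetical product-code column with one missing entry, that passes na=False to keep the mask boolean:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing code
df = pd.DataFrame({'產品編碼': ['XL03ACR107B001030204021',
                              'XR06AXL226A001010405848',
                              np.nan]})

# na=False maps the missing code to False instead of NaN
starts_xl = df['產品編碼'].str.startswith('XL', na=False)
ends_848 = df['產品編碼'].str.endswith('848', na=False)

print(df.loc[starts_xl, '產品編碼'].tolist())  # ['XL03ACR107B001030204021']
print(df.loc[ends_848, '產品編碼'].tolist())   # ['XR06AXL226A001010405848']
```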

count

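Counts the occurrences of a pattern in each string of the Series. The pattern is a regular expression. A missing value yields NaN, which is why the result in Example 8 is float64. See pandas.Series.str.count.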

file:[Example 8]
ser = pd.Series(['A', 'B', 'Aaba', 'Baca', np.nan, 'CABA', 'cat'])
print(ser.str.count('a'))

"""
0    0.0
1    0.0
2    2.0
3    2.0
4    NaN
5    0.0
6    1.0
dtype: float64
"""

Text processing

center、ljust、rjust

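For details, see the official examples in pandas.Series.str.center.

center: pads both sides with a fill character up to the given width
ljust: pads on the right up to the given width (left-justifies the text)
rjust: pads on the left up to the given width (right-justifies the text)

These three functions are closely related to pandas.Series.str.pad, which does the same padding through its side parameter, and pandas.Series.str.zfill, which pads with zeros on the left.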
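The original defers to the official docs here, so a short sketch of the padding functions side by side may help (the sample words are arbitrary). Note that center() puts the extra fill character on the right when the padding is odd:

```python
import pandas as pd

ser = pd.Series(['dog', 'bird', 'mouse'])

centered = ser.str.center(8, fillchar='.')  # pad both sides to width 8
left = ser.str.ljust(8, fillchar='.')       # pad on the right
right = ser.str.rjust(8, fillchar='.')      # pad on the left
# pad() covers all three through its side parameter
same_as_rjust = ser.str.pad(8, side='left', fillchar='.')

print(centered.tolist())  # ['..dog...', '..bird..', '.mouse..']
print(right.tolist())     # ['.....dog', '....bird', '...mouse']

# zfill pads with zeros on the left, e.g. for fixed-width codes
zero_padded = pd.Series(['7', '42', '123']).str.zfill(3)
print(zero_padded.tolist())  # ['007', '042', '123']
```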

strip、lstrip、rstrip

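For details, see the official examples in pandas.Series.str.strip.

strip: removes leading and trailing characters
lstrip: removes leading characters
rstrip: removes trailing characters

The to_strip parameter specifies the set of characters to remove: any combination of these characters is stripped, not only the exact string. If it is None or not passed, whitespace is removed.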

file:[Example 9]
ser = pd.Series(['1. Ant.  ', '2. Bee!\n', '3. Cat?\t', np.nan, 10, True])

# lstrip: remove leading characters
print(ser.str.lstrip('123.'))
"""
0    Ant.
1    Bee!\n
2    Cat?\t
3       NaN
4       NaN
5       NaN
dtype: object
"""

# rstrip: remove trailing characters
print(ser.str.rstrip('.!? \n\t'))
"""
0    1. Ant
1    2. Bee
2    3. Cat
3       NaN
4       NaN
5       NaN
dtype: object
"""

# strip: remove characters from both ends
print(ser.str.strip('123.!? \n\t'))
"""
0    Ant
1    Bee
2    Cat
3    NaN
4    NaN
5    NaN
dtype: object
"""

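Related to lstrip and rstrip are removeprefix and removesuffix, which remove one exact prefix or suffix string rather than any combination of characters.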
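The difference between a character set and an exact string is easy to miss, so here is a sketch with made-up strings. It assumes pandas 1.4 or later, where removeprefix and removesuffix are available:

```python
import pandas as pd

ser = pd.Series(['str_rest', 'stars'])

# lstrip treats 'str_' as the character set {'s', 't', 'r', '_'}
stripped = ser.str.lstrip('str_')       # ['est', 'ars']
# removeprefix removes the exact string 'str_' only when it is present
removed = ser.str.removeprefix('str_')  # ['rest', 'stars']

files = pd.Series(['report_back.bak'])
# rstrip keeps stripping '.', 'b', 'a', 'k' until it hits 'c'
print(files.str.rstrip('.bak')[0])        # report_bac
print(files.str.removesuffix('.bak')[0])  # report_back
```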

replace

Replaces each occurrence of a pattern or regular expression in the Series. See pandas.Series.str.replace.

file:[Example 10]
print(pd.Series(['foo', 'fuz', np.nan]).str.replace('f.', 'ba', regex=True))
"""
0    bao
1    baz
2    NaN
dtype: object
"""

print(pd.Series(['f.o', 'fuz', np.nan]).str.replace('f.', 'ba', regex=False))
"""
0    bao
1    fuz
2    NaN
dtype: object
"""

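To replace by regular expression, pass regex=True; otherwise pat is treated as a literal string (since pandas 2.0, regex defaults to False). In the second call of Example 10, regex=False, so pat is the literal string f.: only f.o contains it and becomes bao, while fuz is left unchanged.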
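The regex flag matters most when the pattern contains metacharacters. A sketch with made-up price strings shows how the same '.' behaves under each setting, plus a capture-group replacement, which only works with regex=True:

```python
import pandas as pd

prices = pd.Series(['1.5', '2.25'])

# regex=True: '.' matches ANY character, so every character is removed
as_regex = prices.str.replace('.', '', regex=True)     # ['', '']
# regex=False: only the literal dot is removed
as_literal = prices.str.replace('.', '', regex=False)  # ['15', '225']

# Back-references to capture groups in the replacement need regex=True
swapped = pd.Series(['2024-03-13']).str.replace(
    r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', regex=True)
print(swapped[0])  # 13/03/2024
```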
