Python's Built-in String Methods (A Reference Worth Bookmarking)

Published 2016-05-20

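String handling is a skill you use constantly, but Python has so many built-in string methods that they are easy to forget. For quick reference, I wrote examples for every built-in method, based on Python 3.5.1, and grouped them by category so they are easy to look up.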

Overview

Case conversion

str.capitalize()
str.lower()
str.casefold()
str.swapcase()
str.title()
str.upper()

Formatted output

str.center(width[, fillchar])
str.ljust(width[, fillchar]); str.rjust(width[, fillchar])
str.zfill(width)
str.expandtabs(tabsize=8)
str.format(*args, **kwargs)
str.format_map(mapping)

Searching, locating, and replacing

str.count(sub[, start[, end]])
str.find(sub[, start[, end]]); str.rfind(sub[, start[, end]])
str.index(sub[, start[, end]]); str.rindex(sub[, start[, end]])
str.replace(old, new[, count])
str.lstrip([chars]); str.rstrip([chars]); str.strip([chars])
static str.maketrans(x[, y[, z]]); str.translate(table)

Joining and splitting

str.join(iterable)
str.partition(sep); str.rpartition(sep)
str.split(sep=None, maxsplit=-1); str.rsplit(sep=None, maxsplit=-1)
str.splitlines([keepends])

String tests

str.endswith(suffix[, start[, end]]); str.startswith(prefix[, start[, end]])
str.isalnum()
str.isalpha()
str.isdecimal(); str.isdigit(); str.isnumeric()
str.isidentifier()
str.islower()
str.isprintable()
str.isspace()
str.istitle()
str.isupper()

Encoding

str.encode(encoding="utf-8", errors="strict")

Case conversion

str.capitalize()

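Returns a copy with the first character uppercased and the remaining characters lowercased. If the first character has no uppercase form, it is left unchanged. (The outputs here are from Python 3.5.1; since Python 3.8 the first character is titlecased instead, so 'ß'.capitalize() gives 'Ss'.)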

'adi dog'.capitalize()
# 'Adi dog'

'abcd 徐'.capitalize()
# 'Abcd 徐'

'徐 abcd'.capitalize()
# '徐 abcd'

'ß'.capitalize()
# 'SS'
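
One detail the examples above do not show is that capitalize() also lowercases everything after the first character. A quick check, using made-up sample strings:

```python
# capitalize() uppercases the first character and lowercases the rest.
samples = ['adi DOG', 'ADI DOG', '123 ABC']
results = [s.capitalize() for s in samples]
print(results)
# ['Adi dog', 'Adi dog', '123 abc']
```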

str.lower()

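Converts all cased characters in the string to lowercase. In Python 3 this applies to any Unicode letter that has a lowercase form, not only ASCII letters; what it does not do is fold special cases such as 'ß' into 'ss' (see casefold() for that).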

'DOBI'.lower()
# 'dobi'

'ß'.lower()   # 'ß' is a German lowercase letter that can also be written 'ss'; lower() leaves it unchanged
# 'ß'

'徐 ABCD'.lower()
# '徐 abcd'
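
To confirm that lower() is not limited to ASCII, here is a small check with accented Latin, Greek, and Cyrillic letters (the sample strings are chosen only for illustration):

```python
# Each sample contains only non-ASCII uppercase letters.
samples = ['ÄÖÜ', 'ΑΒΓ', 'ДОБИ']
lowered = [s.lower() for s in samples]
print(lowered)
# ['äöü', 'αβγ', 'доби']
```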

str.casefold()

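Returns a casefolded copy of the string. Casefolding is a more aggressive form of lowercasing intended for caseless comparison: every character that has a folded form in Unicode is converted, including cases that lower() leaves alone.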

'DOBI'.casefold()
# 'dobi'

'ß'.casefold()   # in German the lowercase letter ß is equivalent to ss; its uppercase form is SS
# 'ss'
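
A typical use of casefold() is case-insensitive comparison. The sketch below uses a hypothetical helper named caseless_equal and compares it with a lower()-based check:

```python
def caseless_equal(a, b):
    """Compare two strings while ignoring case (hypothetical helper)."""
    return a.casefold() == b.casefold()

print(caseless_equal('Straße', 'STRASSE'))      # True
print('Straße'.lower() == 'STRASSE'.lower())    # False, because lower() keeps the ß
```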

str.swapcase()

Swaps the case of every letter in the string: uppercase becomes lowercase and vice versa.

'徐Dobi a123 ß'.swapcase()
# '徐dOBI A123 SS'    here ß is converted to SS, its uppercase form

Note, however, that s.swapcase().swapcase() == s is not necessarily true:

u'\xb5'
# 'µ'

u'\xb5'.swapcase()
# 'Μ'

u'\xb5'.swapcase().swapcase()
# 'μ'

hex(ord(u'\xb5'.swapcase().swapcase()))
# '0x3bc'

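Here the lowercase form of 'Μ' (the Greek capital mu, not the Latin M) is 'μ' (U+03BC), which looks just like the original micro sign 'µ' (U+00B5) but is a different character.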

str.title()

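Capitalizes the first letter of each "word" in the string. A "word" here is simply a run of consecutive letters, so whitespace, punctuation, and digits all start a new word. As a result, the method gives wrong results for English contractions and possessives written with an apostrophe, and for abbreviations written in capitals.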

'Hello world'.title()
# 'Hello World'

'中文abc def 12gh'.title()
# '中文Abc Def 12Gh'

# But this method is not perfect:
"they're bill's friends from the UK".title()
# "They'Re Bill'S Friends From The Uk"
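
One possible workaround for the apostrophe problem is sketched below. It is adapted from the regular-expression approach given in the Python documentation for str.title(); the helper name titlecase is only illustrative. Note that it still lowercases abbreviations such as UK.

```python
import re

def titlecase(s):
    # Treat an apostrophe followed by letters as part of the same word,
    # then capitalize each matched word as a whole.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)

print(titlecase("they're bill's friends from the UK"))
# They're Bill's Friends From The Uk
```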

str.upper()

Converts all letters in the string to uppercase; characters that have no uppercase form are left unchanged.

'中文abc def 12gh'.upper()
# '中文ABC DEF 12GH'

Note that s.upper().isupper() is not necessarily True.
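
The reason is that isupper() requires at least one cased character. A short check with illustrative strings:

```python
samples = ['abc', '123', '中文', '']
results = [s.upper().isupper() for s in samples]
for s, r in zip(samples, results):
    print(repr(s), r)
# 'abc' True
# '123' False
# '中文' False
# '' False
```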

Formatted output

str.center(width[, fillchar])

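Centers the string within the given width, optionally padding with a specified fill character (a space by default). If the width is less than or equal to the string's length, the original string is returned.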

'12345'.center(10, '*')
# '**12345***'

'12345'.center(10)
# '  12345   '

str.ljust(width[, fillchar]); str.rjust(width[, fillchar])

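Returns a string of the given width with the original content left-justified (ljust) or right-justified (rjust). If the width is less than or equal to the string's length, the original string is returned. Padding defaults to the ASCII space; a different single fill character can be specified.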

'dobi'.ljust(10)
# 'dobi      '

'dobi'.ljust(10, '~')
# 'dobi~~~~~~'

'dobi'.ljust(3, '~')
# 'dobi'

'dobi'.ljust(3)
# 'dobi'
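
The examples above only show ljust(). As a minimal sketch of how the two methods combine, the following lines up a name column and a right-aligned number column; the data in rows is invented for illustration:

```python
rows = [('apple', 3), ('kiwi', 12)]

# Left-justify the name to 8 columns with dots, right-justify the number to 4 columns.
lines = [name.ljust(8, '.') + str(qty).rjust(4) for name, qty in rows]
for line in lines:
    print(line)
# apple...   3
# kiwi....  12
```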

str.zfill(width)

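Pads the string on the left with '0' until it reaches the given width. A leading sign ('+' or '-') stays at the front, with the zeros inserted after it. If the width is less than or equal to the string's length, the original string is returned.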

"42".zfill(5)
# '00042'
"-42".zfill(5)
# '-0042'

'dd'.zfill(5)
# '000dd'

'--'.zfill(5)
# '-000-'

' '.zfill(5)
# '0000 '

''.zfill(5)
# '00000'

'dddddddd'.zfill(5)
# 'dddddddd'

str.expandtabs(tabsize=8)

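Replaces each horizontal tab with one or more spaces so that the text after it starts at the next tab stop, that is, at the next column that is a multiple of tabsize.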

tab = '1\t23\t456\t7890\t1112131415\t161718192021'

tab.expandtabs()
# '1       23      456     7890    1112131415      161718192021'
# '123456781234567812345678123456781234567812345678'  note how the column count lines up with the output above

tab.expandtabs(4)
# '1   23  456 7890    1112131415  161718192021'
# '12341234123412341234123412341234'
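
To make the tab-stop rule concrete, here is a small check that each field starts at a multiple of tabsize. The sample line is made up for illustration:

```python
line = 'id\tname\tqty'
expanded = line.expandtabs(8)

print(repr(expanded))
# 'id      name    qty'

# Each field begins at a column that is a multiple of 8.
print(expanded.index('name'), expanded.index('qty'))
# 8 16
```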

str.format(*args, **kwargs)

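The format-string syntax is extensive, and the official documentation already provides detailed examples, so it is not covered in depth here; if you want to learn more, see "Format examples" in the official documentation.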
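
For quick orientation, here are a handful of common cases. This is only a small sample chosen for illustration, not a substitute for the official examples:

```python
examples = [
    '{} + {} = {}'.format(1, 2, 3),                  # positional, in order
    '{0}{1}{0}'.format('a', 'b'),                    # positional, by index
    '{name} is {age}'.format(name='john', age=56),   # keyword arguments
    '{:>8.2f}'.format(3.14159),                      # right-align, width 8, 2 decimals
    '{:,}'.format(1234567),                          # thousands separator
    '{:08b}'.format(5),                              # zero-padded binary
]
for e in examples:
    print(repr(e))
# '1 + 2 = 3'
# 'aba'
# 'john is 56'
# '    3.14'
# '1,234,567'
# '00000101'
```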

str.format_map(mapping)

Similar to str.format(*args, **kwargs), except that the values come from a single mapping object such as a dict.

People = {'name':'john', 'age':56}

'My name is {name},i am {age} old'.format_map(People)
# 'My name is john,i am 56 old'
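
Because the mapping is used directly rather than copied, format_map() also works with dict subclasses. The sketch below follows the example in the Python documentation, where a subclass supplies a fallback for missing keys:

```python
class Default(dict):
    # Called by dict lookups when a key is missing; here it echoes the key back.
    def __missing__(self, key):
        return key

result = '{name} was born in {country}'.format_map(Default(name='Guido'))
print(result)
# Guido was born in country
```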

Searching, locating, and replacing

str.count(sub[, start[, end]])

text = 'outer protective covering'

text.count('e')
# 4

text.count('e', 5, 11)
# 1

text.count('e', 5, 10)
# 0

str.find(sub[, start[, end]]); str.rfind(sub[, start[, end]])

text = 'outer protective covering'

text.find('er')
# 3

text.find('to')
# -1

text.find('er', 3)
# 3

text.find('er', 4)
# 20

text.find('er', 4, 21)
# -1

text.find('er', 4, 22)
# 20

text.rfind('er')
# 20

text.rfind('er', 20)
# 20

text.rfind('er', 20, 21)
# -1

str.index(sub[, start[, end]]); str.rindex(sub[, start[, end]])

Like find() and rfind(), except that a ValueError is raised when the substring is not found.
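
As a rough illustration of the difference, the sketch below uses find() with a moving start position to collect every match, and shows index() raising instead of returning -1. The helper name find_all is hypothetical:

```python
def find_all(text, sub):
    """Return the start index of every occurrence of sub (hypothetical helper)."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Continue searching just past the previous match.
        start = text.find(sub, start + 1)
    return positions

text = 'outer protective covering'
print(find_all(text, 'er'))   # [3, 20]
print(find_all(text, 'to'))   # []

try:
    text.index('to')
except ValueError as exc:
    print('index() raised ValueError:', exc)
```

find() suits cases where "not found" is an ordinary outcome to test for; index() suits cases where a missing substring should stop the program.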

str.replace(old, new[, count])

'dog wow wow jiao'.replace('wow', 'wang')
# 'dog wang wang jiao'

'dog wow wow jiao'.replace('wow', 'wang', 1)
# 'dog wang wow jiao'

'dog wow wow jiao'.replace('wow', 'wang', 0)
# 'dog wow wow jiao'

'dog wow wow jiao'.replace('wow', 'wang', 2)
# 'dog wang wang jiao'

'dog wow wow jiao'.replace('wow', 'wang', 3)
# 'dog wang wang jiao'

str.lstrip([chars]); str.rstrip([chars]); str.strip([chars])

'  dobi'.lstrip()
# 'dobi'
'db.kun.ac.cn'.lstrip('dbk')
# '.kun.ac.cn'

' dobi   '.rstrip()
# ' dobi'
'db.kun.ac.cn'.rstrip('acn')
# 'db.kun.ac.'

'   dobi   '.strip()
# 'dobi'
'db.kun.ac.cn'.strip('db.c')
# 'kun.ac.cn'
'db.kun.ac.cn'.strip('cbd.un')
# 'kun.a'
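
As the outputs above suggest, the chars argument is treated as a set of characters to strip, not as a prefix or suffix. The sketch below contrasts that with removing a literal prefix; strip_prefix is a hypothetical helper (Python 3.9 and later offer str.removeprefix() for the same purpose):

```python
def strip_prefix(s, prefix):
    """Remove prefix once if it is present (hypothetical helper)."""
    return s[len(prefix):] if s.startswith(prefix) else s

# lstrip() keeps removing any of 'd', 'b', '.' from the left.
print('dbdb.kun'.lstrip('db.'))         # kun

# strip_prefix() removes the literal prefix 'db' exactly once.
print(strip_prefix('dbdb.kun', 'db'))   # db.kun
```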

static str.maketrans(x[, y[, z]]); str.translate(table)

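maketrans is a static method that builds a translation table for translate to use.
If maketrans is given only one argument, it must be a dict whose keys are either Unicode code points (integers) or strings of length 1, and whose values are strings of any length, None, or Unicode code points.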

a = 'dobi'
ord('o')
# 111

ord('a')
# 97

hex(ord('狗'))
# '0x72d7'

b = {'d':'dobi', 111:' is ', 'b':97, 'i':'\u72d7\u72d7'}
table = str.maketrans(b)

a.translate(table)
# 'dobi is a狗狗'

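If maketrans is given two arguments, they must be strings of equal length, and each character in the first is mapped to the character at the same position in the second. If there is a third argument, it must also be a string, and each of its characters is mapped to None (that is, deleted):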

a = 'dobi is a dog'

table = str.maketrans('dobi', 'alph')

a.translate(table)
# 'alph hs a alg'

table = str.maketrans('dobi', 'alph', 'o')

a.translate(table)
# 'aph hs a ag'
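
A common use of the three-argument form is deleting a whole class of characters. The sketch below strips ASCII punctuation; the sample sentence is made up:

```python
import string

# Empty first and second arguments: nothing is remapped.
# Every character in the third argument is mapped to None, i.e. deleted.
table = str.maketrans('', '', string.punctuation)

cleaned = 'Hello, world! (dobi)'.translate(table)
print(cleaned)
# Hello world dobi
```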

Joining and splitting

str.join(iterable)

Joins the elements of an iterable, which must all be strings, using the given string as the separator.

'-'.join(['2012', '3', '12'])
# '2012-3-12'

'-'.join([2012, 3, 12])
# TypeError: sequence item 0: expected str instance, int found

'-'.join(['2012', '3', b'12'])  # bytes is not str
# TypeError: sequence item 2: expected str instance, bytes found

'-'.join(['2012'])
# '2012'

'-'.join([])
# ''

'-'.join([None])
# TypeError: sequence item 0: expected str instance, NoneType found

'-'.join([''])
# ''

','.join({'dobi':'dog', 'polly':'bird'})
# 'dobi,polly'

','.join({'dobi':'dog', 'polly':'bird'}.values())
# 'dog,bird'
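
Since join() rejects non-string items, the usual fix is to convert each item first. A minimal sketch, with an illustrative parts list:

```python
parts = [2012, 3, 12]

# Convert every item to str before joining.
date = '-'.join(str(p) for p in parts)
print(date)     # 2012-3-12

# The conversion step is also a natural place for per-item formatting.
padded = '-'.join(str(p).zfill(2) for p in parts)
print(padded)   # 2012-03-12
```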

str.partition(sep); str.rpartition(sep)

'dog wow wow jiao'.partition('wow')
# ('dog ', 'wow', ' wow jiao')

'dog wow wow jiao'.partition('dog')
# ('', 'dog', ' wow wow jiao')

'dog wow wow jiao'.partition('jiao')
# ('dog wow wow ', 'jiao', '')

'dog wow wow jiao'.partition('ww')
# ('dog wow wow jiao', '', '')



'dog wow wow jiao'.rpartition('wow')
# ('dog wow ', 'wow', ' jiao')

'dog wow wow jiao'.rpartition('dog')
# ('', 'dog', ' wow wow jiao')

'dog wow wow jiao'.rpartition('jiao')
# ('dog wow wow ', 'jiao', '')

'dog wow wow jiao'.rpartition('ww')
# ('', '', 'dog wow wow jiao')
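
Because partition() always returns a 3-tuple, it is convenient for splitting on the first (or, with rpartition(), the last) separator without checking lengths. A small sketch with a hypothetical parse_kv helper:

```python
def parse_kv(line):
    """Split 'key = value' on the first '='; return None if there is no '=' (hypothetical helper)."""
    key, sep, value = line.partition('=')
    if not sep:            # an empty separator means '=' was not found
        return None
    return key.strip(), value.strip()

print(parse_kv('name = dobi = dog'))   # ('name', 'dobi = dog')
print(parse_kv('no separator'))        # None

# rpartition() splits on the last occurrence instead.
stem, _, ext = 'archive.tar.gz'.rpartition('.')
print(stem, ext)                       # archive.tar gz
```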

str.split(sep=None, maxsplit=-1); str.rsplit(sep=None, maxsplit=-1)

'1,2,3'.split(','), '1, 2, 3'.rsplit()
# (['1', '2', '3'], ['1,', '2,', '3'])

'1,2,3'.split(',', maxsplit=1),  '1,2,3'.rsplit(',', maxsplit=1)
# (['1', '2,3'], ['1,2', '3'])

'1 2 3'.split(), '1 2 3'.rsplit()
# (['1', '2', '3'], ['1', '2', '3'])

'1 2 3'.split(maxsplit=1), '1 2 3'.rsplit(maxsplit=1)
# (['1', '2 3'], ['1 2', '3'])

'   1   2   3   '.split()
# ['1', '2', '3']

'1,2,,3,'.split(','), '1,2,,3,'.rsplit(',')
# (['1', '2', '', '3', ''], ['1', '2', '', '3', ''])

''.split()
# []
''.split('a')
# ['']
'bcd'.split('a')
# ['bcd']
'bcd'.split(None)
# ['bcd']
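
One point worth isolating from the examples above is that split() with no separator and split(' ') behave differently on runs of spaces. A minimal check with an illustrative string:

```python
s = 'a  b   c'   # two spaces, then three spaces

by_whitespace = s.split()      # runs of whitespace count as one separator
by_space = s.split(' ')        # every single space is a separator

print(by_whitespace)   # ['a', 'b', 'c']
print(by_space)        # ['a', '', 'b', '', '', 'c']
```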

str.splitlines([keepends])

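Splits the string at line boundaries and returns a list of lines. When keepends is True, the line terminators are kept in the result. See the official documentation for the full list of recognized line boundaries.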

'ab c\n\nde fg\rkl\r\n'.splitlines()
# ['ab c', '', 'de fg', 'kl']
'ab c\n\nde fg\rkl\r\n'.splitlines(keepends=True)
# ['ab c\n', '\n', 'de fg\r', 'kl\r\n']

"".splitlines(), ''.split('\n')      # note the difference between the two
# ([], [''])
"One line\n".splitlines(), 'Two lines\n'.split('\n')
# (['One line'], ['Two lines', ''])
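
A practical consequence is that splitlines() handles mixed line endings, while split('\n') leaves stray carriage returns behind. A short sketch with a made-up string:

```python
text = 'a\nb\r\nc\rd'   # Unix, Windows, and old Mac line endings mixed together

by_lines = text.splitlines()
by_newline = text.split('\n')

print(by_lines)     # ['a', 'b', 'c', 'd']
print(by_newline)   # ['a', 'b\r', 'c\rd']
```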

String tests

str.endswith(suffix[, start[, end]]); str.startswith(prefix[, start[, end]])

text = 'outer protective covering'

text.endswith('ing')
# True

text.endswith(('gin', 'ing'))
# True
text.endswith('ter', 2, 5)
# True

text.endswith('ter', 2, 4)
# False
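
The examples above cover endswith() only. startswith() works the same way, and the tuple form is handy for filtering. A minimal sketch; the files list is invented for illustration:

```python
text = 'outer protective covering'

print(text.startswith('outer'))     # True
print(text.startswith('pro', 6))    # True, the check begins at index 6

# A tuple of suffixes matches if any one of them matches.
files = ['a.py', 'b.txt', 'c.PY', 'd.md']
wanted = [f for f in files if f.lower().endswith(('.py', '.md'))]
print(wanted)                       # ['a.py', 'c.PY', 'd.md']
```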

str.isalnum()

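True if the string is non-empty and every character is a letter or a number. In short:

c.isalnum() is true as long as any one of c.isalpha(), c.isdecimal(), c.isdigit(), or c.isnumeric() is true for the character c.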

'dobi'.isalnum()
# True

'dobi123'.isalnum()
# True

'123'.isalnum()
# True

'徐'.isalnum()
# True

'dobi_123'.isalnum()
# False

'dobi 123'.isalnum()
# False

'%'.isalnum()
# False
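
The rule above can be checked directly. The sketch below re-derives isalnum() from the four per-character tests and compares the two on a few sample strings; isalnum_by_char is an illustrative helper, not how CPython implements it:

```python
def isalnum_by_char(s):
    """Re-derive isalnum() from the four per-character tests (illustrative only)."""
    return bool(s) and all(
        c.isalpha() or c.isdecimal() or c.isdigit() or c.isnumeric()
        for c in s
    )

samples = ['dobi123', '²', '⅕', '徐', 'dobi_123', '']
for s in samples:
    # The two columns should agree for every sample.
    print(repr(s), s.isalnum(), isalnum_by_char(s))
```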

str.isalpha()

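True for characters defined as "Letter" in the Unicode character database, that is, those whose general category is one of "Lm", "Lt", "Lu", "Ll", or "Lo". Note that this differs from the Unicode "Alphabetic" property.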

'dobi'.isalpha()
# True

'do bi'.isalpha()
# False

'dobi123'.isalpha()
# False

'徐'.isalpha()
# True

str.isdecimal(); str.isdigit(); str.isnumeric()

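The three methods differ in which Unicode general categories they treat as true (roughly):

isdecimal: Nd
isdigit: No, Nd
isnumeric: No, Nd, Nl

digit differs from decimal in that some numeric characters, such as the superscript ², are digits but not decimals. isnumeric also accepts characters outside these categories that carry a numeric value, such as 十.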

num = '\u2155'
print(num)
# ⅕
num.isdecimal(), num.isdigit(), num.isnumeric()
# (False, False, True)

num = '\u00B2'
print(num)
# ²
num.isdecimal(), num.isdigit(), num.isnumeric()
# (False, True, True)

num = "1"  #unicode
num.isdecimal(), num.isdigit(), num.isnumeric()
# (True, True, True)

num = 'Ⅶ'
num.isdecimal(), num.isdigit(), num.isnumeric()
# (False, False, True)

num = "十"
num.isdecimal(), num.isdigit(), num.isnumeric()
# (False, False, True)

num = b"1" # byte
num.isdigit()   # True
num.isdecimal() # AttributeError 'bytes' object has no attribute 'isdecimal'
num.isnumeric() # AttributeError 'bytes' object has no attribute 'isnumeric'
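
The distinction matters when validating input before calling int(): a string can pass isdigit() and still be rejected by int(). A hedged sketch using a hypothetical to_int helper that guards with isdecimal() instead:

```python
def to_int(s):
    """Return int(s) if s consists only of decimal digits, else None (hypothetical helper)."""
    return int(s) if s.isdecimal() else None

print(to_int('42'))       # 42
print(to_int('１２３'))    # 123   fullwidth digits are category Nd
print(to_int('²'))        # None  isdigit() is True, but int('²') would raise ValueError
print(to_int('-1'))       # None  the sign is not a decimal digit
```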
