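[PY3] Summary of string splitting, matching, and searching methods

Posted by Jelly_lyj on 2017-03-18

What approaches are available for splitting, matching, and searching strings?

Splitting methods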

1. str.split()

* Splits a string

* Returns a list

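s1='I  love  python'
# By default, split() splits on whitespace and treats consecutive whitespace as a single separator
print(s1.split())
['I', 'love', 'python']

# With an explicit ' ' separator, every single space is a delimiter, so the two consecutive spaces in s1 produce empty strings
print(s1.split(' '))
['I', '', 'love', '', 'python']

# Any character or string can be used as the separator
print(s1.split('o'))
['I  l', 've  pyth', 'n']

# maxsplit=n performs at most n splits
print(s1.split(maxsplit=1))
['I', 'love  python']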

2. re.split()

* Supports multiple delimiters

import re
line = 'asdf fjdk; afed, fjek,asdf, foo'

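# A character class lets several characters act as delimiters
print(re.split(r'[;,\s]\s*',line))
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']

# Wrapping the delimiter in parentheses makes it a capturing group, so the matched delimiters are returned as well
print(re.split(r'(;|,|\s)\s*',line))
['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']

# (?:...) marks the group as non-capturing, so the delimiters are not returned
print(re.split(r'(?:,|;|\s)\s*',line))
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']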

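Searching and matching methods

1. str.startswith() | str.endswith()

* Match at the start/end of a string
* Returns True/False
* Commonly used to check whether a directory contains files of a given type, or to check the prefix of a URL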

url="http://www.python.org"
# startswith('string') checks whether the string starts with 'string'
print(url.startswith('http'))
True

# endswith('string') checks whether the string ends with 'string'
print(url.endswith('com'))
False

# endswith('string',n,m) restricts the check to the slice [n:m]; startswith accepts the same arguments
print(url.endswith('n',11,17))
True

# Note: to test several alternatives, pass a tuple of strings to startswith/endswith; a list raises TypeError
choices=['http:','ftp:']
print(url.startswith(choices))
TypeError: startswith first arg must be str or a tuple of str, not list
print(url.startswith(tuple(choices)))
True

# endswith() can be used to check whether a directory contains files with certain extensions
import os
filenames=os.listdir('/test')

#Example-1
print(filenames)
['aa', 'zhuabao', '.python-version', 'test.sh', 'hh.c', '.test.py.swp', 'zhuabao2', 'abc', 'linshi.sh']
print([candsh for candsh in filenames if candsh.endswith(('.sh','.c'))])
['test.sh', 'hh.c', 'linshi.sh']

#Example-2
if any(name.endswith(('.sh','.c')) for name in os.listdir('/test')):
    print('have')
have

2. fnmatch() | fnmatchcase()

* Matches using shell-style wildcards
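
The list gives no example for this item. Here is a minimal sketch, assuming both functions are imported from the standard-library fnmatch module; the file names are hypothetical and chosen only for illustration.

```python
from fnmatch import fnmatch, fnmatchcase

# Hypothetical file names, for illustration only.
names = ['test.sh', 'hh.c', 'Notes.TXT', 'data1.csv', 'data2.csv']

# '*' matches any run of characters, '?' exactly one character,
# and '[0-9]' one character from the given set.
csv_files = [n for n in names if fnmatchcase(n, 'data[0-9].csv')]
print(csv_files)

# fnmatchcase() always compares case-sensitively.
upper_vs_lower = fnmatchcase('Notes.TXT', '*.txt')
upper_vs_upper = fnmatchcase('Notes.TXT', '*.TXT')
print(upper_vs_lower, upper_vs_upper)

# fnmatch() follows the case rules of the operating system,
# so its result for mixed-case names can differ between platforms.
same_case = fnmatch('test.sh', '*.sh')
print(same_case)
```

When the result must not depend on the platform, fnmatchcase() is probably the safer choice.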

3. str.find()

* Returns the index of the first occurrence, or -1 if the substring is not found
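
A short sketch of str.find(), reusing the URL from the startswith() example. The comparison with str.index() is an addition for contrast and is not part of the original list.

```python
url = 'http://www.python.org'

# find() returns the index of the first occurrence of the substring.
pos = url.find('python')
print(pos)        # 11

# When the substring is absent, find() returns -1 instead of raising.
missing = url.find('java')
print(missing)    # -1

# An optional start index limits where the search begins.
second_o = url.find('o', 16)
print(second_o)   # 18

# str.index() behaves the same way but raises ValueError when nothing is found.
try:
    url.index('java')
    raised = False
except ValueError:
    raised = True
print(raised)     # True
```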

4. re.match(r'')

* Matches using a regular expression

* Only checks at the start of the string
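
To illustrate "only checks at the start", the sketch below contrasts re.match() with re.search(). Note that re.search() is not in the original list; it is shown here only for comparison, and the sample text is borrowed from the examples further down.

```python
import re

text = 'Today is 11/27/2012.'
pattern = r'\d+/\d+/\d+'

# match() only tries at position 0, where the text starts with 'Today'.
at_start = re.match(pattern, text)
print(at_start)            # None

# search() scans forward and returns the first match anywhere in the text.
anywhere = re.search(pattern, text)
print(anywhere.group())    # 11/27/2012
print(anywhere.span())     # (9, 19)
```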

5. re.findall(r'')

* Finds matches anywhere in the string
* Returns the results as a list

6. re.finditer(r'')

* Returns the results as an iterator of match objects
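
A minimal sketch of what finditer() yields, assuming a date pattern with three capturing groups like the one used later in this post. Unlike findall(), each item is a match object, so the position and the groups are both available.

```python
import re

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

found = []
for m in datepat.finditer(text):
    # group() is the whole match, start() its offset, groups() the captured parts.
    found.append((m.group(), m.start(), m.groups()))
    print(m.group(), m.start(), m.groups())
```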

7. r' $' -> regular expression ending with $

* Ensures the match extends to the end of the string
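
A small sketch of the $ anchor, assuming the same date pattern as in the examples below. Pattern.fullmatch() is shown as an alternative that the original does not mention; it requires the whole string to match and, unlike $, does not accept a trailing newline.

```python
import re

datepat = re.compile(r'\d+/\d+/\d+')
anchored = re.compile(r'\d+/\d+/\d+$')

# Trailing text makes both the anchored pattern and fullmatch() fail.
r1 = anchored.match('26/07/2017abcdef')
r2 = datepat.fullmatch('26/07/2017abcdef')
print(r1, r2)      # None None

# A clean date string matches in full.
r3 = datepat.fullmatch('26/07/2017')
print(r3.group())  # 26/07/2017

# '$' also matches just before a newline at the very end of the string;
# fullmatch() does not, because '\n' is not matched by the pattern.
r4 = anchored.match('26/07/2017\n')
r5 = datepat.fullmatch('26/07/2017\n')
print(r4 is not None, r5 is not None)   # True False
```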

8. re.compile(r'') -> compile the regular expression first

* Useful when the same pattern is used for many matching or searching operations

import re
text1='2017/07/26'
text2='Nov 27,2012'
text3='Today is 11/27/2012. PyCon starts 3/13/2013.'
text5='26/07/2017 is today,PyCon starts 3/13/2013.'

# Compile a regular expression that matches dates written as digits/digits/digits
datepat=re.compile(r'\d+/\d+/\d+')

# pattern.match(string) tries to match at the beginning of string
print(datepat.match(text1))
<_sre.SRE_Match object; span=(0, 10), match='2017/07/26'>
print(datepat.match(text2))
None

# match() only tries at the start of the string, so it finds a date only when the string begins with one
print(datepat.match(text3))
None
print(datepat.match(text5))
<_sre.SRE_Match object; span=(0, 10), match='26/07/2017'>

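# match() also ignores whatever follows the match, which is sometimes not what we want; one fix is to end the pattern with $ so the match must reach the end of the string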
text6='26/07/2017abcdef'
datepat1=re.compile(r'\d+/\d+/\d+')
print(datepat1.match(text6))
<_sre.SRE_Match object; span=(0, 10), match='26/07/2017'>
datepat2=re.compile(r'\d+/\d+/\d+$')
print(datepat2.match(text6))
None

# Another option is findall(), which searches the whole string and returns every match
print(datepat.findall(text3))
['11/27/2012', '3/13/2013']

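# findall() returns a list; finditer() returns an iterator of match objects
# (datepat has no capturing groups at this point, so print the whole match with group())
for m in datepat.finditer(text5):
    print(m.group())
26/07/2017
3/13/2013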

# # Capturing groups # #
datepat=re.compile(r'(\d+)/(\d+)/(\d+)')
m=datepat.match(text1)
print(m.group(0))
2017/07/26
print(m.group(1))
2017
print(m.group(2))
07
print(m.group(3))
26
print(m.groups())
('2017', '07', '26')

for month,day,year in datepat.findall(text3):
    print('{}-{}-{}'.format(year,month,day))
2012-11-27
2013-3-13

9. The ? modifier

* Turns a greedy match into a non-greedy one

* This gives the shortest possible match

text6 = 'Computer says "no." Phone says "yes."'
pat1=re.compile(r'\"(.*)\"')  # match text enclosed in double quotes
print(pat1.findall(text6))
['no." Phone says "yes.']

pat2=re.compile(r'\"(.*?)\"') # add the ? modifier
print(pat2.findall(text6))
['no.', 'yes.']

10. (?:.|\n) | re.DOTALL

* Lets the dot (.) match any character, including newlines

* This makes matching across multiple lines possible

text7=''' /*this is a
multiline comment*/
'''

pat1=re.compile(r'/\*(.*?)\*/')
print(pat1.findall(text7))
[]                                      # No match, because by default (.) does not match a newline

pat2=re.compile(r'/\*((?:.|\n)*?)\*/')  # replace (.) with (?:.|\n)
print(pat2.findall(text7))
['this is a\nmultiline comment']

# re.DOTALL makes the dot (.) in a regular expression match any character, including newlines
pat3=re.compile(r'/\*(.*?)\*/',re.DOTALL)
print(pat3.findall(text7))
['this is a\nmultiline comment']

Search-and-replace methods

1. str.replace()

# S.replace(old, new[, count]) -> str

text5="a b c d e e e"
print(text5.replace("e","a"))
# a b c d a a a
print(text5.replace("e","a",2))
# a b c d a a e

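2. re.sub() | flags=re.IGNORECASE

* Match and replace | case-insensitive matching

# sub(pattern, repl, string, count=0, flags=0)
# 1st argument: the pattern to match
# 2nd argument: the replacement (a string or a function)
# 3rd argument: the text to process
# 4th argument: the maximum number of replacements (0 replaces all)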
text1="l o v e"
print(re.sub(r'\s','-',text1))
# l-o-v-e
print(re.sub(r'\s','-',text1,count=1))
# l-o v e

# flags=re.IGNORECASE ignores case when matching
text3 = 'UPPER PYTHON, lower python, Mixed Python'
print(re.sub('python','snake',text3,flags=re.IGNORECASE))
# UPPER snake, lower snake, Mixed snake

# To make the replacement follow the case of the matched text, we need a helper function
def matchcase(word):
    def replace(m):
        text=m.group()
        if text.isupper():
            return word.upper()
        elif text.islower():
            return word.lower()
        elif text[0].isupper():
            return word.capitalize()
        else:
            return word
    return replace
print(re.sub('python',matchcase('snake'),text3,flags=re.IGNORECASE))
# UPPER SNAKE, lower snake, Mixed Snake

3. re.compile()

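* Likewise, compile the pattern first when performing many replacements

# The pattern can be compiled first, and captured groups can be referenced in the replacement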
text2='Today is 11/27/2012. PyCon starts 3/13/2013.'
datepat=re.compile(r'(\d+)/(\d+)/(\d+)')
print(datepat.sub(r'\3-\1-\2',text2))
# Today is 2012-11-27. PyCon starts 2013-3-13.

4. re.subn()

* Gets the number of replacements

# subn() returns the new text together with the number of replacements made
newtext,n=datepat.subn(r'\3-\1-\2',text2)
print(newtext)
# Today is 2012-11-27. PyCon starts 2013-3-13.
print(n)
# 2
