Regular Expressions in Python

Posted by morra on 2016-10-31

1 Metacharacters

1.1 .

. matches any single character except a newline.
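
A small sketch of how . behaves, in the same style as the other examples in this post. The sample strings are made up for illustration; re.S is the standard flag that lets . match a newline as well.

```python
import re

# . matches any one character, so 'm.r' finds both 'mor' and 'm-r'
dot_any = re.findall(r'm.r', 'mor m-r mr')

# by default . does not match a newline
dot_newline = re.findall(r'a.b', 'a\nb')

# with re.S (DOTALL) the newline is matched as well
dot_all = re.findall(r'a.b', 'a\nb', flags=re.S)

print(dot_any)      # ['mor', 'm-r']
print(dot_newline)  # []
print(dot_all)      # ['a\nb']
```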

1.2 ^

^ matches only at the start of the string.

import re

temp1 = re.findall('^morra', 'nsudi werwuirnmorra')
temp2 = re.findall('^morra', 'morransudi werwuirn')

print(temp1)    # []
print(temp2)    # ['morra']

1.3 $

$ matches only at the end of the string.

temp2=re.findall('morra$','morransudi werwuirn')
temp1=re.findall('morra$','nsudi werwuirnmorra')

print(temp1)    #['morra']
print(temp2)    #[]

1.4 *

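* matches the preceding character zero or more times; equivalent to {0,}.

temp1 = re.findall('morra*', 'wwwmorr')
temp2 = re.findall('morra*', 'wwwmorra')
temp3 = re.findall('morra*', 'wwwmorraa')     # greedy: takes as many 'a' characters as it can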

print(temp1)    #['morr']
print(temp2)    #['morra']
print(temp3)    #['morraa']

1.5 +

+ matches the preceding character one or more times; equivalent to {1,}.

temp1=re.findall('morra+','wwwmorr')
temp2=re.findall('morra+','wwwmorra')
temp3=re.findall('morra+','wwwmorraa')

print(temp1)    #[]
print(temp2)    #['morra']
print(temp3)    #['morraa']

1.6 ?

? matches the preceding character zero or one time; equivalent to {0,1}.

temp1=re.findall('morra?','wwwmorr')
temp2=re.findall('morra?','wwwmorra')
temp3=re.findall('morra?','wwwmorraa')

print(temp1)    #['morr']
print(temp2)    #['morra']
print(temp3)    #['morra']

1.7 { }

{ } sets a custom repetition count: {1} matches exactly once, {1,2} matches one or two times.

temp1=re.findall('morra{0}','wwwmorr')
temp2=re.findall('morra{2}','wwwmorra')
temp3=re.findall('morra{2}','wwwmorraa')

print(temp1)    #['morr']
print(temp2)    #[]
print(temp3)    #['morraa']
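
The examples above only use a fixed count. As a hedged sketch of the {1,2} range form mentioned in the description, using made-up sample strings:

```python
import re

# {1,2} accepts one or two 'a' characters and, being greedy, takes two when it can
range1 = re.findall(r'morra{1,2}', 'wwwmorr')      # no 'a' at all
range2 = re.findall(r'morra{1,2}', 'wwwmorra')
range3 = re.findall(r'morra{1,2}', 'wwwmorraaa')

print(range1)   # []
print(range2)   # ['morra']
print(range3)   # ['morraa']
```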

1.8 [ ]

[ ] defines a character set; it matches exactly one character from the set.

temp2=re.findall('morr[ab]','wwwmorra')
temp3=re.findall('morr[ab]','wwwmorrab')

print(temp2)    #['morra']
print(temp3)    #['morra']

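Note: inside a character set most metacharacters lose their special meaning and become ordinary characters. The exceptions are the classes \d, \w and \s, plus ^ as the first character of the set (negation), - between two characters (a range), and \ itself.

temp4 = re.findall(r'morr[?]', 'wwwmorr?')    # ['morr?']
temp1 = re.findall(r'[ab]', 'wwwmorrab.ab')    # ['a', 'b', 'a', 'b']
temp2 = re.findall(r'[1-9]', 'wwwmorr1234')    # ['1', '2', '3', '4']

Exceptions:
temp5 = re.findall(r'[\d]', 'www.m12orr1234')      # ['1', '2', '1', '2', '3', '4']
temp6 = re.findall(r'[\w]', 'www.m12')      # ['w', 'w', 'w', 'm', '1', '2']
temp7 = re.findall(r'[\s]', 'www.m12or r1234')      # [' ']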

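1.9 ^ inside [ ]

As the first character inside a character set, ^ negates the set.
temp1 = re.findall('[^1-9]', 'www.m12orr1234')  # matches every character except the digits 1-9
print(temp1)  # ['w', 'w', 'w', '.', 'm', 'o', 'r', 'r']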

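1.10 \ is the most important metacharacter. It can do the following:

(1) Before a metacharacter, it removes that character's special meaning.

(2) Before certain ordinary characters, it forms a special sequence.

(3) Followed by a number, it refers back to the string matched by the group with that number.

1.10.1 Before a metacharacter, it removes the special meaning.

temp1 = re.findall(r'\$', 'www.m12$orr1234')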
print(temp1)    #['$']

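1.10.2 Before certain ordinary characters, it forms a special sequence.

The available sequences are:
\d matches any single decimal digit; for ASCII text this is the same as [0-9]. To match several digits use \d\d, \d+ or \d{2}.

temp1 = re.findall(r'\d', 'www.m12orr1234')
temp2 = re.findall(r'\d\d', 'www.m12orr1234')
temp3 = re.findall(r'\d\d\d', 'www.m12orr1234')
temp4 = re.findall(r'\d{2,3}', 'www.m12orr1234')  # two or three digits, preferring three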

print(temp1)  # ['1', '2', '1', '2', '3', '4']
print(temp2)  # ['12', '12', '34']
print(temp3)  # ['123']
print(temp4)  # ['12', '123']

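\D matches any non-digit character; equivalent to [^0-9]
\s matches any whitespace character; equivalent to [ \t\n\r\f\v]
\S matches any non-whitespace character; equivalent to [^ \t\n\r\f\v]
\w matches any word character (letter, digit or underscore); for ASCII text equivalent to [A-Za-z0-9_]
\W matches any non-word character; for ASCII text equivalent to [^A-Za-z0-9_]
\b matches a word boundary: the position between a \w character and a \W character, or between a \w character and the start or end of the string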

temp = re.findall(r"abc\b", "asdas abc 1231231")
print(temp)     #['abc']

temp = re.findall("abc\\b", "asdas abc 1231231")
print(temp)     #['abc']

temp = re.findall(r"abc\b", "asdas*abc*1231231")
print(temp)     # ['abc']; a boundary also exists between a word character and a symbol such as *

temp = re.findall(r"abc2\b", "asdas*abc2*31231")
print(temp)     # ['abc2']; the same holds when the word ends in a digit

temp = re.findall(r"\bI", "I MISS IOU")
print(temp)     #['I', 'I']
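
The other special sequences have no example above, so here is a minimal sketch of \D, \s, \S, \w and \W on one made-up sample string. Note that the underscore counts as a word character.

```python
import re

sample = 'a_1 b-2'

non_digits = re.findall(r'\D', sample)
spaces = re.findall(r'\s', sample)
non_spaces = re.findall(r'\S', sample)
word_chars = re.findall(r'\w', sample)
non_word_chars = re.findall(r'\W', sample)

print(non_digits)       # ['a', '_', ' ', 'b', '-']
print(spaces)           # [' ']
print(non_spaces)       # ['a', '_', '1', 'b', '-', '2']
print(word_chars)       # ['a', '_', '1', 'b', '2']
print(non_word_chars)   # [' ', '-']
```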

1.10.3 Followed by a number, it refers back to the string matched by that group.

temp = re.search(r'(alex)(eric)com\1', 'alexericcomalex').group()   # alexericcomalex; \1 is the first group, i.e. alex
temp2 = re.search(r'(alex)(eric)com\2', 'alexericcomeric').group()  # alexericcomeric; \2 is the second group, i.e. eric
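
A backreference is most useful when the captured text is not known in advance. The sketch below, with a made-up sample sentence, finds a word that is accidentally written twice in a row.

```python
import re

text = 'this is is a test'      # made-up sample sentence

# (\w+) captures a whole word, \1 requires the same word again after one space
m = re.search(r'\b(\w+) \1\b', text)

print(m.group())    # is is
print(m.group(1))   # is
```

The leading \b matters here: without it the pattern could start in the middle of "this" and pair its trailing "is" with the next word.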

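2 Regex functions

Return a match object (finditer returns an iterator of them): match, search, finditer
Return a list: findall, split

2.1 match

match only tries to match at the beginning of the string; it is used a lot.

# definition as found in the standard library's re module, with an added annotation
def match(pattern, string, flags=0):
    """
    match(pattern, string, flags)
    flags change how the regular expression matches. Common flags:
    re.I    case-insensitive matching
    re.L    locale-aware matching
    re.M    multi-line mode: ^ and $ also match at the start and end of each line
    re.S    makes . match every character, including a newline
    """
    
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)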
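
To make the flags concrete, here is a hedged sketch using findall on a made-up two-line string. The flags are passed the same way to match and search, and several flags can be combined with |.

```python
import re

text = 'Morra\nmorra'    # made-up sample: two lines differing only in case

no_flag = re.findall(r'^morra', text)              # case-sensitive, ^ only at the string start
ignore_case = re.findall(r'^morra', text, re.I)    # 'Morra' at the string start now matches
multi_line = re.findall(r'^morra', text, re.M)     # ^ also matches right after the newline
both = re.findall(r'^morra', text, re.I | re.M)    # both behaviours at once

print(no_flag)       # []
print(ignore_case)   # ['Morra']
print(multi_line)    # ['morra']
print(both)          # ['Morra', 'morra']
```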

2.2 search

search scans through the string and stops at the first match; it does not look any further.
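
The difference between match and search can be seen in a short sketch on a made-up sample string: match only looks at the start, search looks anywhere.

```python
import re

text = 'www.morra.com'    # made-up sample

m1 = re.match(r'morra', text)     # None: the string does not start with 'morra'
m2 = re.search(r'morra', text)    # found further along the string

print(m1)            # None
print(m2.group())    # morra
print(m2.span())     # (4, 9)
```

Because both functions return None when nothing matches, it is safer to check the result before calling .group() on it.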

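- Non-greedy matching
As shown earlier, '*', '+' and '?' are all greedy. Adding a ? after a quantifier, as in \d+?, makes it non-greedy: it matches as few characters as possible.

temp = re.search(r'a(\d+?)','a23b').group()   # a2
temp = re.search(r'a(\d+?)b','a23b').group()    # a23b; with the group between a and b, \d+? still has to extend until b can match, so the result is the same as in greedy mode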
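
A case where greedy and non-greedy matching give visibly different results, as a minimal sketch on a made-up HTML-like string:

```python
import re

html = '<b>bold</b>'     # made-up sample

greedy = re.search(r'<.+>', html).group()    # .+ runs on to the last '>'
lazy = re.search(r'<.+?>', html).group()     # .+? stops at the first '>'

print(greedy)   # <b>bold</b>
print(lazy)     # <b>
```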

2.3 findall

findall collects every match into a list.

2.3.1 Group priority in findall

Unlike search and match, findall returns the contents of the group when the pattern contains one; use ?: to turn this off.

temp = re.findall(r"www\.(bing|google)\.com", "123 www.bing.com123www.google.com123")
print(temp)        # ['bing', 'google']; only the group contents are returned

temp = re.findall(r"www\.(?:bing|google)\.com", "123 www.bing.com123www.google.com123")
print(temp)        # ['www.bing.com', 'www.google.com']; (?: makes the group non-capturing, so findall returns the whole match

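2.3.2 Match order

Once characters have been matched they are consumed, and matching continues on the remaining characters, so matches never overlap:

n = re.findall(r"\d+\w\d+", "a2b3c4d5")
print(n)    # ['2b3', '4d5']

2.3.3 Empty matches

When the pattern can match the empty string, those empty matches are put into the result as well:

n = re.findall("", "a2b3c4d5")
print(n)    # ['', '', '', '', '', '', '', '', '']

So when writing a regex in Python, try to make sure the pattern cannot match the empty string.

n = re.findall(r"(\w)+", "alex")    # recommended
print(n)  # ['x']

n = re.findall(r"(\w)*", "alex")    # not recommended
print(n)  # ['x', '']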

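2.3.4 Groups in findall

temp = re.findall(r'a(\d+)', 'a23b')    # ['23']
temp = re.findall(r'a(\d+?)', 'a23b')    # ['2'], non-greedy
temp = re.findall(r'a(\d+?)b', 'a23b')  # ['23']; with the group between a and b, \d+? still has to extend until b can match

If the pattern has one group, findall adds what the group matched to the result list.
If the pattern has several groups, the captures of each match are combined into a tuple, which becomes one element of the list.

n = re.findall(r"(\w)", "alex")
print(n)  # ['a', 'l', 'e', 'x']

n = re.findall(r"(\w)(\w)(\w)(\w)", "alex")
print(n)  # [('a', 'l', 'e', 'x')]; one tuple entry per group

n = re.findall(r"(\w){4}", "alex")
print(n)  # ['x']; a repeated group keeps only its last capture

n = re.findall(r"(\w)*", "alex")
print(n)  # ['x', '']; the trailing '' comes from the empty match at the end of the string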
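
A practical use of the tuple behaviour, sketched on a made-up key=value string: with two groups, each match becomes a (key, value) tuple, which converts directly into a dict.

```python
import re

config = 'width=80 height=24'    # made-up sample

pairs = re.findall(r'(\w+)=(\d+)', config)
print(pairs)        # [('width', '80'), ('height', '24')]

settings = dict(pairs)
print(settings)     # {'width': '80', 'height': '24'}
```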

2.4 finditer

finditer returns an iterator that yields one match object per match.

p = re.compile(r'\d+')
source = '12  dsnjfkbsfj1334jnkb3kj242'
w1 = p.finditer(source)
w2 = p.findall(source)
print(w1)       #<callable_iterator object at 0x102079e10>
print(w2)       #['12', '1334', '3', '242']

for match in w1:
    print(match.group(), match.span())
# 12 (0, 2)
# 1334 (14, 18)
# 3 (22, 23)
# 242 (25, 28)

p = re.compile(r'\d+')

iterate = p.finditer('')    # an empty string contains no digits, so the loop body never runs
for match in iterate:
    print(match.group(), match.span())
2.5 Group matching

a = '123abc456'
temp1 = re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0)   #123abc456
temp2 = re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1)   #123
temp3 = re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2)   #abc
temp4 = re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(3)   #456
temp5 = re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1,3)   #('123', '456')
temp6 = re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1,2,3)  #('123', 'abc', '456')

print(temp1)
print(temp2)
print(temp3)
print(temp4)
print(temp5)
print(temp6)

match, search and findall each support two styles of matching: plain matching and group matching.

2.5.1 Without groups:

import re

origin = "hello morra bcd morra lge morra acd 19"
r = re.match(r"h\w+", origin)

print(r.group())    # hello; the whole match
print(r.groups())   # (); the text captured by each group in the pattern
print(r.groupdict())    # {}; the named groups in the pattern and what they captured

2.5.2 With a group:

import re

origin = "hello morra bcd morra lge morra acd 19"
r = re.match(r"h(\w+)", origin)

print(r.group())    # hello; the whole match
print(r.groups())   # ('ello',); the text captured by each group in the pattern
print(r.groupdict())    # {}; the named groups in the pattern and what they captured

2.5.3 Several named groups

import re

origin = "hello morra bcd morra lge morra acd 19"
r = re.match(r"(?P<n1>h)(?P<n2>\w+)", origin)

print(r.group())    # hello; the whole match
print(r.groups())   # ('h', 'ello'); the text captured by each group in the pattern
print(r.groupdict())    # {'n1': 'h', 'n2': 'ello'}; the named groups in the pattern and what they captured
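
Named groups can also be read one at a time. A short sketch reusing the same pattern and string; the subscript form r['n2'] needs Python 3.6 or later.

```python
import re

origin = "hello morra bcd morra lge morra acd 19"
r = re.match(r"(?P<n1>h)(?P<n2>\w+)", origin)

first = r.group('n1')         # look a group up by name
rest = r['n2']                # subscript form of the same lookup
mixed = r.group(1, 'n2')      # numbers and names can be mixed

print(first)    # h
print(rest)     # ello
print(mixed)    # ('h', 'ello')
```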

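2.6 Splitting

def split(pattern, string, maxsplit=0, flags=0):

    return _compile(pattern, flags).split(string, maxsplit)
# maxsplit is the maximum number of splits; the default 0 means no limit

2.6.1 Group priority in split

split, like findall, returns a list of all the pieces, so groups get the same special treatment: the text captured by a group is included in the result.

2.6.2 With and without groups

origin = "hello alex bcd abcd lge acd 19"
n = re.split(r"a\w+", origin, maxsplit=1)
print(n)    # ['hello ', ' bcd abcd lge acd 19']; no group, so the separator is not in the result

origin = "hello alex bcd abcd lge acd 19"
n = re.split(r"(a\w+)", origin, maxsplit=1)
print(n)    # ['hello ', 'alex', ' bcd abcd lge acd 19']; the group covers the whole separator, so the result includes it

origin = "hello alex bcd abcd lge acd 19"
n = re.split(r"a(\w+)", origin, maxsplit=1)
print(n)    # ['hello ', 'lex', ' bcd abcd lge acd 19']; only the part of the separator captured by the group is included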

2.6.3 Special cases

temp = re.split(r'\d+', 'one1two2three3four4')
print(temp)     # ['one', 'two', 'three', 'four', '']

temp = re.split(r'\d+', '1one1two2three3four4')
print(temp)     # ['', 'one', 'two', 'three', 'four', '']

temp = re.split('[bc]', 'abcd')
print(temp)  # ['a', '', 'd']   # exercise: the '' is the empty text between the adjacent separators b and c
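
Whenever a separator sits at the very start or end of the string, or two separators are adjacent, split produces an empty string. If those are not wanted, one common approach (a sketch, not the only option) is to filter them out afterwards:

```python
import re

parts = re.split(r'\d+', '1one1two2three3four4')
non_empty = [p for p in parts if p]    # drop the empty strings

print(parts)        # ['', 'one', 'two', 'three', 'four', '']
print(non_empty)    # ['one', 'two', 'three', 'four']
```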

2.7 Substitution

2.7.1 sub

temp = re.sub('g.t', 'have', 'I get A,I got B,I gut C')  # I have A,I have B,I have C

temp = re.sub('g.t', 'have', 'I get A,I got B,I gut C', count=0)  # I have A,I have B,I have C; count defaults to 0, which replaces every match

temp = re.sub('g.t', 'have', 'I get A,I got B,I gut C', count=1)  # I have A,I got B,I gut C

temp = re.sub('g.t', 'have', 'I get A,I got B,I gut C', count=2)  # I have A,I have B,I gut C
temp = re.sub('g.t', 'have', 'I get A,I got B,I gut C', 2)  # I have A,I have B,I gut C; positional count is deprecated since Python 3.13, prefer count=2

temp = re.sub('g.t', 'have', 'I get A,I got B,I gut C', count=3)  # I have A,I have B,I have C
temp = re.sub('g.t', 'have', 'I get A,I got B,I gut C', 3)  # I have A,I have B,I have C; same as count=3

2.7.2 subn

subn also reports how many replacements were made:

temp = re.subn('g.t', 'have', 'I get A,I got B,I gut C')  # ('I have A,I have B,I have C', 3); three replacements in total

2.8 Compiling

compile turns a pattern into a pattern object, which is convenient when the same pattern is used many times.

text = "g123dwsdsam cool coolksf"
regex = re.compile(r'\w*oo\w*')
temp = regex.findall(text)  # find every word that contains 'oo'
print(temp)                #['cool', 'coolksf']
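
The benefit shows up when one pattern is applied to many inputs. A minimal sketch, with made-up sample lines, that compiles once and reuses the object in a loop:

```python
import re

regex = re.compile(r'\w*oo\w*')     # compiled once, reused below

lines = ['a cool book', 'nothing here', 'too soon']    # made-up sample lines

results = [regex.findall(line) for line in lines]

for line, found in zip(lines, results):
    print(line, '->', found)
# a cool book -> ['cool', 'book']
# nothing here -> []
# too soon -> ['too', 'soon']
```

The compiled object offers the same operations as the module-level functions (match, search, findall, finditer, split, sub, subn), just without the pattern argument.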

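3 Applications

3.1 Calculator

import re

source = "1 - 2 * (( 60-30 + (-9-2-5-2*3-5/3-40*4/2-3/5+6*3)*(-9-2-5-2*5/3 +7/3*99/4*2998 + 10 * 568 /14)) -(-4*3)/(16-3*2))"
temp = re.search(r"\([^()]+\)", source).group()       # match the first pair of parentheses that has no parentheses inside it
print(temp)     # (-9-2-5-2*3-5/3-40*4/2-3/5+6*3)


source2 = '2**3'        # exponentiation
temp2 = re.search(r'\d+\.?\d*([*/]|\*\*)\d+\.?\d*', source2)      # \d+\.?\d* matches a number (integer or float); [*/]|\*\* matches multiplication, division or exponentiation
print(temp2.group())    # 2**3

print(2**3)     # 8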
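
The innermost-parentheses pattern is the core of a regex-based calculator: evaluate the innermost group, substitute the result, and repeat until no parentheses are left. The sketch below shows only the peeling loop. The helper name peel and the placeholder 'x' (standing in for an evaluated result) are hypothetical and used for illustration only; a real calculator would compute the value instead.

```python
import re

INNER = re.compile(r"\(([^()]+)\)")    # a pair of parentheses with no parentheses inside

def peel(expr):
    """Return the innermost sub-expressions in the order they would be evaluated."""
    steps = []
    while True:
        m = INNER.search(expr)
        if m is None:                  # no parentheses left
            break
        steps.append(m.group(1))
        # hypothetical placeholder: a real calculator would insert the computed value here
        expr = expr[:m.start()] + "x" + expr[m.end():]
    return steps

order = peel("1-(2*(3+4))")
print(order)    # ['3+4', '2*x']
```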

3.2 Matching an IP address

source = r"13*192.168.1.112\12321334"    # raw string, so \123 is not read as an octal escape
temp = re.search(r"(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])", source).group()

print(temp)     #192.168.1.112
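
To pull every address out of a text with findall, the groups have to be non-capturing, otherwise findall returns the group contents instead of the whole match. A hedged sketch on a made-up sample string; the octet alternatives are the same as above, only reordered and wrapped in (?:...). Like the original pattern, it does not guard against an address embedded in a longer run of digits.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"          # one number from 0 to 255
IP = re.compile(r"(?:" + OCTET + r"\.){3}" + OCTET)

text = "hosts: 10.0.0.1 and 192.168.1.112"          # made-up sample
addresses = IP.findall(text)

print(addresses)    # ['10.0.0.1', '192.168.1.112']
```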

4 Supplement

To keep Python string escapes from clashing with regex syntax, it is recommended to prefix the pattern with r (a raw string).

temp = re.findall(r"abc\b", "asdas abc 1231231")
print(temp)     #['abc']

temp = re.findall("abc\\b", "asdas abc 1231231")
print(temp)     #['abc']
