Python: Regular Expressions

Posted by thoustree on 2021-04-22

Original URL: https://www.cnblogs.com/ydkh/p/14688029.html


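1. Overview of regular expressions

A regular expression (regex for short) is a string made up of ordinary characters and special symbols that describes a repeated pattern or stands for a set of characters.

A regular expression matches a whole family of strings that share the features the pattern describes.

In other words, a single pattern can match many different strings.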
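As a small illustration of one pattern matching several strings, the sketch below checks a few made-up candidate strings against a three-digit pattern. The candidate list and the variable names are assumptions chosen for this example and do not come from the original article.

```python
import re

# One pattern that describes "exactly three digits"
three_digits = re.compile(r'\d\d\d')

# Hypothetical sample strings, chosen only for illustration
candidates = ['123', '987', 'abc', '12']

# fullmatch() succeeds only when the whole string fits the pattern
matched = [s for s in candidates if three_digits.fullmatch(s)]
print(matched)  # ['123', '987']
```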

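Regular expression syntax differs somewhat between languages; this article covers Python's regular expressions.

Most of the example code is taken from the book 《Python程式設計快速上手讓繁瑣工作自動化》 (Automate the Boring Stuff with Python).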

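2. Writing regular expressions

A regular expression is just a string. Unlike an ordinary string, it may contain zero or more metacharacters and special character classes; see Section 1.2 of 《Python核心程式設計》 (Core Python Programming) for the full list.

# Writing regular expressions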

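'hing'                 # a literal string
r'\wing'               # any word character followed by "ing"
'123456'               # a literal string
r'\d\d\d\d\d\d'        # any six digits
'regex.py'             # a literal string
r'.*\.py'              # anything that ends in ".py"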
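To make the relationship between each literal string and the pattern beside it concrete, here is a minimal sketch that tests each pair with re.fullmatch(). The pairs list and the two extra test strings ('ring' and 'notes.txt') are assumptions added for illustration.

```python
import re

# Each pair: a literal string and a pattern from the listing that describes it
pairs = [
    ('hing', r'\wing'),
    ('123456', r'\d\d\d\d\d\d'),
    ('regex.py', r'.*\.py'),
]
for text, pattern in pairs:
    # fullmatch() succeeds only if the whole string fits the pattern
    print(text, pattern, bool(re.fullmatch(pattern, text)))

# The same patterns also accept other strings of the same shape,
# and reject strings of a different shape
ring_ok = bool(re.fullmatch(r'\wing', 'ring'))
txt_ok = bool(re.fullmatch(r'.*\.py', 'notes.txt'))
print(ring_ok, txt_ok)  # True False
```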

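3. Creating a regular expression object

A regular expression on its own cannot match anything. To match it against target text, first create a regex object, usually by passing the pattern to re.compile() as a raw string, that is, r'.....'

>>> # the re module's compile() function returns (creates) a Regex pattern object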
>>> import re
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
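The reason for the raw-string form can be shown with a short sketch: in an ordinary string literal each backslash meant for the regex engine has to be doubled, while a raw string passes it through unchanged. The variable names raw and escaped are illustrative assumptions.

```python
import re

# In a normal string literal, a backslash meant for the regex engine must be
# doubled; a raw string (r'...') passes it through unchanged
raw = r'\d\d\d-\d\d\d-\d\d\d\d'
escaped = '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'
print(raw == escaped)  # True: both spell the same pattern

phoneNumRegex = re.compile(raw)
mo = phoneNumRegex.search('My number is 415-555-4242.')
print(mo.group())  # 415-555-4242
```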

4. Common regular expression patterns

4.1 Grouping with parentheses

>>> Regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)') # create the Regex object
>>> mo = Regex.search('My number is 415-555-4242.')   # returns a Match object
>>> mo.group()         # the Match object's group() method returns the entire matched text
'415-555-4242'
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-4242'
>>> mo.group(0)
'415-555-4242'
>>> mo.groups()
('415', '555-4242')
>>> a, b = mo.groups()   # groups() returns a tuple with one value per group
>>> a
'415'
>>> b
'555-4242'
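One detail worth guarding against: search() returns None when nothing matches, so calling group() directly on the result would raise an AttributeError. The sketch below shows one way to handle that; the helper name describe and its return messages are hypothetical.

```python
import re

Regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

def describe(text):
    """Hypothetical helper: report the two groups, or say nothing matched."""
    mo = Regex.search(text)
    if mo is None:  # search() returns None when there is no match
        return 'no phone number found'
    area_code, main_number = mo.groups()
    return 'area code {}, number {}'.format(area_code, main_number)

print(describe('My number is 415-555-4242.'))  # area code 415, number 555-4242
print(describe('No number here.'))             # no phone number found
```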

4.2 Matching multiple alternatives with the pipe

>>> heroRegex = re.compile(r'Batman|Tina Fey')
>>> mo1 = heroRegex.search('Batman and Tina Fey.')
>>> mo1.group()
'Batman'
>>> mo2 = heroRegex.search('Tina Fey and Batman.')
>>> mo2.group()
'Tina Fey'
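The pipe can also be placed inside parentheses so that the alternatives share a common prefix. A minimal sketch, using a sample sentence chosen for illustration:

```python
import re

# The prefix "Bat" is written once; the group lists the alternative endings
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')

print(mo.group())   # Batmobile (the whole match)
print(mo.group(1))  # mobile    (only the part matched inside the parentheses)
```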

4.3 Optional matching with the question mark

>>> batRegex = re.compile(r'Bat(wo)?man')   # without the parentheses, ? would make only the 'o' optional
>>> mo1 = batRegex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'
>>> mo2 = batRegex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'
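The same idea applies to an optional area code in a phone number, and a second check shows what happens when the parentheses are left out. This is a sketch with sample strings chosen for illustration.

```python
import re

# The whole group "(\d\d\d-)" is optional
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo1 = phoneRegex.search('My number is 415-555-4242')
mo2 = phoneRegex.search('My number is 555-4242')
print(mo1.group())  # 415-555-4242
print(mo2.group())  # 555-4242

# Without parentheses, ? applies only to the single preceding character 'o'
no_parens = re.compile(r'Batwo?man')
print(bool(no_parens.fullmatch('Batwman')))  # True
print(bool(no_parens.fullmatch('Batman')))   # False
```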

4.4 Matching zero or more times with the star

>>> batRegex = re.compile(r'Bat(wo)*man') # to match a literal '*', write \*
>>> mo1 = batRegex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'
>>> mo2 = batRegex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'
>>> mo3 = batRegex.search('The Adventures of Batwowowowoman')
>>> mo3.group()
'Batwowowowoman'

4.5 Matching one or more times with the plus

>>> batRegex = re.compile(r'Bat(wo)+man')  # to match a literal '+', write \+
>>> mo1 = batRegex.search('The Adventures of Batwoman')
>>> mo1.group()
'Batwoman'
>>> mo2 = batRegex.search('The Adventures of Batwowowowoman')
>>> mo2.group()
'Batwowowowoman'
>>> mo3 = batRegex.search('The Adventures of Batman')
>>> mo3 == None
True
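To compare the three quantifiers from Sections 4.3 to 4.5 side by side, the sketch below runs each pattern against the same three strings. The dictionary layout and the results variable are assumptions made for this example.

```python
import re

texts = ['Batman', 'Batwoman', 'Batwowoman']
patterns = {
    '?': r'Bat(wo)?man',  # zero or one "wo"
    '*': r'Bat(wo)*man',  # zero or more "wo"
    '+': r'Bat(wo)+man',  # one or more "wo"
}

results = {}
for symbol, pattern in patterns.items():
    # Record, for each text, whether the whole string fits the pattern
    results[symbol] = [bool(re.fullmatch(pattern, text)) for text in texts]
    print(symbol, results[symbol])
# ? [True, True, False]
# * [True, True, True]
# + [False, True, True]
```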

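4.6 Matching a specific number of times with braces

In the code below, the "?" after the braces requests non-greedy matching. A question mark can mean two things in a regular expression: it either marks a non-greedy match or marks an optional group. The two meanings are entirely unrelated.

>>> greedyHaRegex = re.compile(r'(Ha){3,5}') # to match a literal '{', write \{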
>>> mo1 = greedyHaRegex.search('HaHaHaHaHa')
>>> mo1.group()
'HaHaHaHaHa'
>>> nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
>>> mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
>>> mo2.group()
'HaHaHa'
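Besides the {3,5} range form, braces also accept an exact count and open-ended ranges. A brief sketch of those forms, using a four-repetition sample string chosen for illustration:

```python
import re

text = 'HaHaHaHa'  # four repetitions of "Ha"

exactly_three = re.search(r'(Ha){3}', text).group()    # exactly 3
three_or_more = re.search(r'(Ha){3,}', text).group()   # 3 or more
at_most_two = re.search(r'(Ha){,2}', text).group()     # 0 to 2
five = re.search(r'(Ha){5}', text)                      # needs 5, only 4 available

print(exactly_three)  # HaHaHa
print(three_or_more)  # HaHaHaHa
print(at_most_two)    # HaHa
print(five)           # None
```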

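5. Greedy and non-greedy matching

Non-greedy matching is typically used to keep the wildcard pattern (.*) from running past the intended delimiter and swallowing text beyond it, as the code below shows. The {3,5}? example in Section 4.6 is another case of non-greedy matching.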

>>> nongreedyRegex = re.compile(r'<.*?>')
>>> mo = nongreedyRegex.search('<To serve man> for dinner.>')
>>> mo.group()
'<To serve man>'
>>> greedyRegex = re.compile(r'<.*>')
>>> mo = greedyRegex.search('<To serve man> for dinner.>')
>>> mo.group()
'<To serve man> for dinner.>'
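The difference shows up clearly with findall() on a string that contains more than one bracketed span. The sketch below also includes a negated character class, [^>]*, as one common alternative to the non-greedy form; that alternative is an addition for illustration and is not discussed in the original article.

```python
import re

text = '<b>bold</b>'

greedy = re.findall(r'<.*>', text)      # .* runs to the last '>'
lazy = re.findall(r'<.*?>', text)       # .*? stops at the first '>'
negated = re.findall(r'<[^>]*>', text)  # "anything except '>'" stops there too

print(greedy)   # ['<b>bold</b>']
print(lazy)     # ['<b>', '</b>']
print(negated)  # ['<b>', '</b>']
```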

6. Common methods of Regex objects

As described above, re.compile() creates a Regex object. Its most commonly used methods are shown below.

6.1 search(), group(), groups()

>>> Regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)') # create the Regex object
>>> mo = Regex.search('My number is 415-555-4242.')   # returns a Match object
>>> mo.group()         # the Match object's group() method returns the entire matched text
'415-555-4242'
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-4242'
>>> mo.group(0)
'415-555-4242'
>>> mo.groups()
('415', '555-4242')
>>> a, b = mo.groups()   # groups() returns a tuple with one value per group
>>> a
'415'
>>> b
'555-4242'
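A Match object also records where the match was found. The sketch below uses its start(), end(), and span() methods on the same example; these methods are not covered in the original article and are shown here only as a supplement.

```python
import re

Regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
text = 'My number is 415-555-4242.'
mo = Regex.search(text)

print(mo.span())    # (13, 25): start and end index of the whole match
print(mo.span(1))   # (13, 16): position of group 1
print(mo.start(2))  # 17: where group 2 begins

# Slicing the original text with these indexes gives back the match
print(text[mo.start():mo.end()])  # 415-555-4242
```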

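6.2 findall()

When called on a regex with no groups, findall() returns a list of the matched strings.

When called on a regex with groups, findall() returns a list of tuples of strings, with one string per group.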

>>> Regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') # has no groups
>>> Regex.findall('Cell: 415-555-9999 Work: 212-555-0000')
['415-555-9999', '212-555-0000']
>>> Regex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)') # has groups
>>> Regex.findall('Cell: 415-555-9999 Work: 212-555-0000')
[('415', '555', '9999'), ('212', '555', '0000')]
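When a pattern has groups but the full matched text is also needed, one option is finditer(), which yields Match objects instead of plain strings or tuples. This is a sketch of that approach; finditer() is not mentioned in the original article.

```python
import re

Regex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
text = 'Cell: 415-555-9999 Work: 212-555-0000'

# Each item from finditer() is a Match object, so both views are available
full = [mo.group() for mo in Regex.finditer(text)]
parts = [mo.groups() for mo in Regex.finditer(text)]

print(full)   # ['415-555-9999', '212-555-0000']
print(parts)  # [('415', '555', '9999'), ('212', '555', '0000')]
```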

6.3 sub()

>>> namesRegex = re.compile(r'Agent \w+')
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
'CENSORED gave the secret documents to CENSORED.'
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.', 1)  # count=1: replace only the first match
'CENSORED gave the secret documents to Agent Bob.'
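The replacement string passed to sub() can also reuse text captured by a group, written as \1, \2, and so on. A minimal sketch that keeps only the first letter of each agent's name:

```python
import re

# Group 1 captures the first letter of the name; \w* consumes the rest
agentNamesRegex = re.compile(r'Agent (\w)\w*')
text = 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.'

# \1 in the replacement is replaced by whatever group 1 matched
censored = agentNamesRegex.sub(r'\1****', text)
print(censored)  # A**** told C**** that E**** knew B**** was a double agent.
```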

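7. re.IGNORECASE, re.DOTALL, and re.VERBOSE

To make a regular expression case-insensitive, pass re.IGNORECASE (or re.I) as the second argument to re.compile().

Passing re.DOTALL as the second argument to re.compile() makes the dot match every character, including newlines.

To spread a regular expression over several lines and add comments to it, pass re.VERBOSE as the second argument to re.compile().

To use several of these options at once, combine them with the | operator: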

>>> someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)
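Each flag can be seen in isolation with a short sketch. The sample strings and variable names below are assumptions chosen for illustration.

```python
import re

# re.IGNORECASE: letter case is ignored
robocop = re.compile(r'robocop', re.IGNORECASE)
found = robocop.search('RoboCop is part man').group()
print(found)  # RoboCop

# re.DOTALL: without it the dot stops at a newline, with it the dot crosses it
text = 'first line\nsecond line'
no_dotall = re.compile(r'.*').search(text).group()
with_dotall = re.compile(r'.*', re.DOTALL).search(text).group()
print(repr(no_dotall))    # 'first line'
print(repr(with_dotall))  # 'first line\nsecond line'

# re.VERBOSE: whitespace and # comments inside the pattern are ignored
phoneRegex = re.compile(r'''
    (\d{3})   # area code
    -         # separator
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
    ''', re.VERBOSE)
groups = phoneRegex.search('Call 415-555-4242').groups()
print(groups)  # ('415', '555', '4242')
```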

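8. (?:…)

(?:…) groups part of a pattern without capturing it, so the findall() call below returns only the text of the single capturing group, (\w+\.com).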

>>> re.findall(r'http://(?:\w+\.)*(\w+\.com)', 'http://google.com http://www.google.com http://code.google.com')
['google.com', 'google.com', 'google.com']
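The effect on findall() is easiest to see by writing the same pattern twice, once with an ordinary group and once with (?:…). The sample sentence and names are made up for this sketch.

```python
import re

text = 'Mr. Smith met Ms. Jones'

# Ordinary group: the title is captured too, so findall() returns tuples
capturing = re.findall(r'(Mr|Ms)\. (\w+)', text)

# Non-capturing group: the title is still matched but not returned
non_capturing = re.findall(r'(?:Mr|Ms)\. (\w+)', text)

print(capturing)      # [('Mr', 'Smith'), ('Ms', 'Jones')]
print(non_capturing)  # ['Smith', 'Jones']
```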

9. Code practice

# (File reading and writing) 瘋狂填詞2.py

'''
Create a Mad Libs program that reads in a text file and lets the user add their own
text anywhere the word ADJECTIVE, NOUN, ADVERB, or VERB appears in the file. For example, a text file may look like this:
The ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was
unaffected by these events.
The program finds these occurrences and prompts the user to replace them.
Enter an adjective:
silly
Enter a noun:
chandelier
Enter a verb:
screamed
Enter a noun:
pickup truck
The following text file would then be created:
The silly panda walked to the chandelier and then screamed. A nearby pickup truck was unaffected by these events.
The results should be printed to the screen and saved to a new text file.
'''


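import re

def mad_libs(filename_path, save_path):
    with open(filename_path, 'r') as strings:  # the path may be relative to the working directory
        words = strings.read()
    # Match only the placeholder words named in the task description
    Regex = re.compile(r'\b(?:ADJECTIVE|NOUN|ADVERB|VERB)\b')
    finds = Regex.findall(words)
    for i in finds:
        replace = input('Enter a word to replace {}:\n'.format(i))
        # Assign the result back to words so that every replacement accumulates;
        # otherwise only the last replacement would be printed.
        # count=1 replaces only the first placeholder still in the text, and the
        # lambda keeps backslashes in the input from being read as escapes.
        words = Regex.sub(lambda mo: replace, words, count=1)
    print(words)

    # strings.close() is not needed: the with statement closes the file automatically

    with open(save_path, 'a') as txt:
        txt.write(words + '\n')  # end each entry with a newline

    # Equivalent without a context manager:
    # save_txt = open('儲存瘋狂填詞文件.txt','a')
    # save_txt.write(words)
    # save_txt.close()

if __name__ == '__main__':
    filename_path = input('Enter the path of the txt file to fill in: ')    # e.g. '瘋狂填詞原始文件.txt'
    save_path = input('Enter the path to save to (including the file name): ')  # e.g. '儲存瘋狂填詞文件.txt'
    mad_libs(filename_path, save_path)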
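For testing the substitution logic without typing answers or touching files, a non-interactive variant can take the replacement words as a list. This is only a sketch under that assumption: the names PLACEHOLDER and fill_mad_libs are hypothetical, and the pattern lists the four placeholder words from the task description explicitly.

```python
import re

# Assumed placeholder set, taken from the task description
PLACEHOLDER = re.compile(r'\b(?:ADJECTIVE|NOUN|ADVERB|VERB)\b')

def fill_mad_libs(text, replacements):
    """Replace placeholders from left to right with the supplied words."""
    words = iter(replacements)
    # When sub() is given a function, it calls it once per match, in order.
    # If the list runs out early, next() raises StopIteration.
    return PLACEHOLDER.sub(lambda mo: next(words), text)

template = ('The ADJECTIVE panda walked to the NOUN and then VERB. '
            'A nearby NOUN was unaffected by these events.')
result = fill_mad_libs(template, ['silly', 'chandelier', 'screamed', 'pickup truck'])
print(result)
```

With the sample answers from the task description, this reproduces the sentence shown there.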
