A Detailed Guide to the re Module's Functions and Flags

Posted by ihav2carryon on 2024-11-26

Original URL: https://www.cnblogs.com/ihave2carryon/p/18570981


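The re module

When writing a Python web scraper, there are many ways to parse page elements. Regular-expression parsing is only one of them; BeautifulSoup and lxml are also common, and both support parsing HTML elements. The re module, part of the standard library, provides Python's regular-expression support.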
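
As a small illustration of regex-based parsing, the sketch below pulls href values out of an HTML fragment. The fragment and the example.com URLs are made up for this example; for real pages a dedicated parser such as BeautifulSoup or lxml is usually more robust.

```python
import re

# Hypothetical HTML fragment, not taken from any real page.
html = '<a href="https://example.com/a">A</a> <a href="https://example.com/b">B</a>'

# Capture everything between the quotes that follow href=.
links = re.findall(r'href="([^"]+)"', html)
print(links)
# ['https://example.com/a', 'https://example.com/b']
```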

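Common re module functions

compile(pattern, flags=0): compiles a regular-expression string into an re.Pattern object
- Parameters: pattern: the regular-expression string; flags: optional flags that modify matching behavior (covered in detail below)
```
import re
pattern = re.compile(r'a.*e')
```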
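
A rough sketch of why compiling can be useful: the compiled object exposes the same operations as methods, so one pattern can be reused across many strings. The sample words below are made up for illustration.

```python
import re

# Compile once, then reuse the Pattern object's own methods.
pattern = re.compile(r'a.*e')

samples = ['apple', 'banana', 'grape']  # illustrative data, not from the article
matched = [s for s in samples if pattern.search(s)]
print(matched)
# ['apple', 'grape']
```
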
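search(pattern, string, flags=0): scans the string for the first match and returns a match object, or None if nothing matches
- Parameters: pattern: a regular-expression string or a compiled re.Pattern object; string: the string to search; flags: flags that modify matching behavior (such as re.IGNORECASE or re.MULTILINE)
- Returns: an re.Match object or None
```
import re
result = re.search(r'\bcat\b', 'The cat sat on the mat')
print(result)
# Output: <re.Match object; span=(4, 7), match='cat'> (explained below)
```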
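match(pattern, string, flags=0): matches only at the beginning of the string; returns a match object on success, otherwise None
- Parameters: pattern: a regular-expression string or a compiled re.Pattern object; string: the string to match; flags: optional flags that modify matching behavior
- Returns: an re.Match object or None
```
import re
result = re.match(r'\bcat\b', 'The cat sat on the mat')  # None: the string starts with 'The', not 'cat'
```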
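
To make the difference between match and search concrete, here is a minimal sketch on the same sentence. re.fullmatch is not covered in the article; it is included only for contrast.

```python
import re

text = 'The cat sat on the mat'

at_start = re.match(r'The', text)      # matches: 'The' is at position 0
not_at_start = re.match(r'cat', text)  # None: 'cat' is not at position 0
anywhere = re.search(r'cat', text)     # matches: search scans the whole string
whole = re.fullmatch(r'The', text)     # None: fullmatch must consume the entire string

print(at_start, not_at_start, anywhere, whole)
```
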
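findall(pattern, string, flags=0): returns a list of all non-overlapping matches
- Returns: a list containing every matched substring (if the pattern contains capture groups, the list holds the groups instead)
```
import re
result = re.findall(r'\bcat\b', 'The cat sat on the cat mat')
print(result)
# Output: ['cat', 'cat']
```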
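
The shape of findall's result depends on the capture groups in the pattern. The sketch below shows the three cases on a made-up sample string.

```python
import re

text = 'cat=3, dog=5'  # made-up sample

no_group = re.findall(r'\w+=\d+', text)         # whole matches
one_group = re.findall(r'\w+=(\d+)', text)      # only the captured digits
two_groups = re.findall(r'(\w+)=(\d+)', text)   # one tuple per match

print(no_group)    # ['cat=3', 'dog=5']
print(one_group)   # ['3', '5']
print(two_groups)  # [('cat', '3'), ('dog', '5')]
```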

finditer(pattern, string, flags=0): returns an iterator that yields a Match object for every match
- Returns: an iterator of re.Match objects

import re
result = re.finditer(r'\bcat\b', 'The cat sat on the cat mat')
for match in result:
    print(match)
#<re.Match object; span=(4, 7), match='cat'>
#<re.Match object; span=(19, 22), match='cat'>
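
One reason to prefer finditer over findall is that each Match object carries position information. A minimal sketch on the same sentence:

```python
import re

text = 'The cat sat on the cat mat'

# Collect the (start, end) indexes of every match; findall() would only return the text.
positions = [(m.start(), m.end()) for m in re.finditer(r'\bcat\b', text)]
print(positions)
# [(4, 7), (19, 22)]
```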

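sub(pattern, repl, string, count=0, flags=0): replaces the matches of the pattern in the string
- Parameters: repl: the replacement string or a replacement function; count: optional maximum number of replacements, 0 by default (replace all matches)
- Returns: the string after replacement
```
import re
result = re.sub(r'\bcat\b', 'dog', 'The cat sat on the cat mat')
print(result)
# Output: The dog sat on the dog mat
```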
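
The count and repl parameters described above can be sketched as follows. The helper name shout and the 'hello world' sample are made up for this example.

```python
import re

text = 'The cat sat on the cat mat'

# count=1 replaces only the first match.
first_only = re.sub(r'\bcat\b', 'dog', text, count=1)

def shout(match):
    # Called once per match; the return value becomes the replacement text.
    return match.group().upper()

upper = re.sub(r'\bcat\b', shout, text)

# In a replacement string, \1 and \2 refer to the captured groups.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'hello world')

print(first_only)  # The dog sat on the cat mat
print(upper)       # The CAT sat on the CAT mat
print(swapped)     # world hello
```
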
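split(pattern, string, maxsplit=0, flags=0): splits the string at each match of the pattern
- Parameters: maxsplit: optional maximum number of splits, 0 by default (no limit)
- Returns: a list of the resulting pieces
```
import re
text = "The cat sat on the cat mat"
# \b matches the empty string at word boundaries; splitting on empty matches needs Python 3.7+
result = re.split(r'\b', text)
print(result)
# Output: ['', 'The', ' ', 'cat', ' ', 'sat', ' ', 'on', ' ', 'the', ' ', 'cat', ' ', 'mat', '']
```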
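
A more typical use of split is breaking a line on delimiters. The sketch below, using made-up sample strings, also shows maxsplit and what happens when the delimiter pattern is wrapped in a capture group.

```python
import re

line = 'a, b;c ,d'  # made-up sample

# Split on a comma or semicolon, swallowing surrounding whitespace.
parts = re.split(r'\s*[,;]\s*', line)

# maxsplit=1 stops after the first split; the rest is returned unsplit.
limited = re.split(r'\s*[,;]\s*', line, maxsplit=1)

# With a capture group, the delimiters are kept in the result.
kept = re.split(r'([,;])', 'a,b;c')

print(parts)    # ['a', 'b', 'c', 'd']
print(limited)  # ['a', 'b;c ,d']
print(kept)     # ['a', ',', 'b', ';', 'c']
```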

Flags (the flags argument)

Flags modify how a pattern is matched. Several are supported:

`re.IGNORECASE` or `re.I`

Case-insensitive matching

Example:

import re

text = "The cat sat on the cat mat"
regex = re.compile("CAT", re.IGNORECASE)
result = regex.search(text)
if result:
    print(result.group())
else:
    print("not found")
#output-> cat

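`re.MULTILINE` or `re.M`

Multiline mode: ^ and $ match at the start and end of every line, not only at the start and end of the whole string

Example:

import re

text = """first Line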
second Line2
third Line3"""

regex = re.compile(r'^\w+', re.MULTILINE)
result = regex.findall(text)
print(result)
#output->['first', 'second', 'third']

`re.DOTALL` or `re.S`

Makes . match every character, including the newline

Example:

import re

text = "abc\ndef\nghi"

regex = re.compile(r'.*', re.DOTALL)
result = regex.findall(text)
print(result)
#output->['abc\ndef\nghi', '']
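
To see what the flag changes, the sketch below runs the same search with and without re.DOTALL. The pattern a.*i is chosen only for this illustration.

```python
import re

text = "abc\ndef\nghi"

# Without the flag, . stops at the first newline, so 'a' can never reach 'i'.
without_flag = re.search(r'a.*i', text)

# With the flag, . also matches '\n' and the match spans all three lines.
with_flag = re.search(r'a.*i', text, re.DOTALL)

print(without_flag)              # None
print(with_flag.group() == text) # True
```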

`re.VERBOSE` or `re.X`

Allows comments and whitespace inside the regular expression

Example:

import re

text = "The cat sat on the mat"

regex = re.compile(r"""
    \b   # word boundary
    cat  # match "cat"
    \b   # word boundary
""", re.VERBOSE)
result = regex.findall(text)
print(result)
#output->['cat']
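
Flags can also be combined with the bitwise OR operator. A minimal sketch, using a made-up three-line string, that needs both re.IGNORECASE and re.MULTILINE to match every line:

```python
import re

text = """First line
second LINE
Third line"""

# Combine flags with | so both behaviors apply to the same pattern.
regex = re.compile(r'^\w+ line$', re.IGNORECASE | re.MULTILINE)
lines = regex.findall(text)

# Without any flags, ^ and $ only anchor to the whole string, so nothing matches.
plain = re.findall(r'^\w+ line$', text)

print(lines)  # ['First line', 'second LINE', 'Third line']
print(plain)  # []
```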

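Inline flags

Inline flags do the same job as the flags argument. The difference is that they are written inside the pattern itself, which, together with inline comments, can make a regular expression more readable and flexible. A global inline flag such as (?i) belongs at the start of the pattern; since Python 3.11, placing it elsewhere raises an error.

Common inline flags

(?i): ignore case

Example:

import re

text = "hello world"

regex = re.compile(r"(?i)(?#ignore case)HELLO WORLD")
# (?#...) is the comment syntax
result = regex.match(text).group()
print(result)
#output->hello world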

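(?m): multiline mode; ^ and $ match at the start and end of every line, not only at the start and end of the whole string

Example:

import re

text = """first Line
second Line
third Line"""

regex = re.compile(r"(?m)(?#multiline mode)^\w+")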
result = regex.findall(text)
print(result)
#output->['first', 'second', 'third']

(?s): dot-matches-all mode

Example:

import re

text = "abc\ndef\nghi"

regex = re.compile(r"(?s)(?#. matches every character, including newlines).*")
result = regex.findall(text)
print(result)
#output->['abc\ndef\nghi', '']

(?x): allow comments and whitespace

Example:

import re

text = "The cat sat on the mat"

regex = re.compile(r"(?ix)(?#allow comments)\bCAT\b")
# ix -> case-insensitive plus verbose mode
result = regex.findall(text)
print(result)
#output->['cat']
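
As a quick check that the two styles are interchangeable, the sketch below runs the same search with a flags argument and with an inline flag. The scoped form (?i:...), which is not covered in the article and is available since Python 3.6, applies the flag to only part of the pattern.

```python
import re

text = 'The cat sat on the mat'

with_argument = re.findall(r'\bCAT\b', text, re.IGNORECASE)
with_inline = re.findall(r'(?i)\bCAT\b', text)

# Only THE is case-insensitive here; ' cat' must match exactly.
scoped = re.findall(r'(?i:THE) cat', text)

print(with_argument)  # ['cat']
print(with_inline)    # ['cat']
print(scoped)         # ['The cat']
```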

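re.Match objects

re.Match is a class in the re module that represents the result of a regular-expression match. When re.search, re.match, or re.finditer finds a match, it returns (or, for re.finditer, yields) an re.Match object.

Reading the printed form of re.Match

Printing an re.Match object usually produces something like <re.Match object; span=(4, 7), match='cat'>

<re.Match object;: indicates that this is an re.Match object
span=(4, 7): a tuple giving the start index and the end index of the match in the original string (the end index is exclusive)
match='cat': the substring that was actually matched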

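Accessing re.Match methods

Calling the methods of an re.Match object gives us more information

group(): returns the matched substring; for example, match.group() returns 'cat'
start(): returns the start index of the match in the original string
end(): returns the end index of the match in the original string (one past the last matched character)
span(): returns a tuple (start, end) with the start and end indexes of the match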
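
The methods above can be tried on a single match, as in this sketch. The pattern and the group name animal are made up for this example; group(1), group('animal'), and groups() go slightly beyond the list above and are shown only to illustrate capture groups.

```python
import re

m = re.search(r'(?P<animal>cat) (\w+)', 'The cat sat on the mat')

if m:  # search() returns None when nothing matches
    print(m.group())          # cat sat   (the whole match)
    print(m.group(1))         # cat       (first capture group)
    print(m.group('animal'))  # cat       (same group, by name)
    print(m.groups())         # ('cat', 'sat')
    print(m.start(), m.end(), m.span())  # 4 11 (4, 11)
```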
