How to Use the BeautifulSoup Module

Posted by 阿麗米熱 on 2023-03-17

Original URL: https://www.cnblogs.com/almira998/p/17226553.html

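This article covers the bs4 module (BeautifulSoup). What can it do? In one sentence: beautifulsoup4 is a Python library for extracting data from HTML or XML files. You use it to parse the HTML/XML you have scraped, so you can pull exactly the content you want out of a website.
It is a third-party Python module, so it has to be installed first: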

pip install beautifulsoup4

It is usually used together with another module, lxml (a parsing library):

pip install lxml
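Because lxml is a separate install, code that hard-codes 'lxml' fails with bs4's FeatureNotFound error on a machine where lxml is missing. The sketch below shows one way to fall back to Python's built-in 'html.parser' in that case; the helper name make_soup is hypothetical and not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Hypothetical helper: parse with lxml when it is installed,
    otherwise fall back to the standard-library html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised when no installed tree builder supports the requested parser
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p>Hello</p>")
print(soup.p.text)
```

Keep in mind that different parsers can build slightly different trees from badly broken HTML, so results may differ a little depending on which branch runs.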

Most basic usage

from bs4 import BeautifulSoup
html_doc = 'HTML content to parse'  # replace with the real HTML string
soup = BeautifulSoup(html_doc, 'lxml')
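A minimal, self-contained version of the basic usage above. The sample HTML is made up for illustration, and the sketch passes Python's built-in 'html.parser' instead of 'lxml' so it runs even where lxml is not installed; for a complete document like this one, 'lxml' gives the same result.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for illustration
html_doc = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <p class="intro">Hello, <b>BeautifulSoup</b>!</p>
  </body>
</html>
"""

# Build the parse tree; the second argument selects the parser
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title.string)  # text inside <title>
print(soup.p["class"])    # class is multi-valued, so this is a list
```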

I. Traversing the document tree with BeautifulSoup

1. Prettify: re-indent the parsed document so that even messy, non-standard markup prints as neatly nested tags
print(soup.prettify())

2. Traverse the document tree with dot notation
print(soup.html.body.p)  # descend one level at a time; returns the first <p> found

3. Get a tag's name
print(soup.a.name)

4. Get a tag's attributes
print(soup.a.attrs.get('class'))  # class is multi-valued, so this returns a list

5. Get a tag's text content
print(soup.p.text)
print(list(soup.p.strings))  # .strings is a generator; list() collects every text fragment
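The five traversal operations above can be tried together on one small page. This is a sketch on hypothetical sample HTML, again using 'html.parser' so it runs without lxml; it also shows two details the one-liners hide: dot access only returns the first match (or None), and a missing attribute read with .get() gives None rather than an error.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for illustration
html_doc = """<html><body>
<p class="title"><b>The story</b></p>
<p>Hello <b>world</b>!</p>
<a id="link1" class="sister external" href="http://example.com/elsie">Elsie</a>
</body></html>"""
soup = BeautifulSoup(html_doc, "html.parser")

# 1. prettify() returns a str with one tag or string per line, indented by nesting
pretty = soup.prettify()

# 2. Dot access returns the FIRST matching descendant only, or None if absent
first_p = soup.html.body.p          # same Tag object as soup.find("p")
no_table = soup.table               # None, so soup.table.tr would raise AttributeError

# 3. and 4. Name and attributes of a tag
a = soup.a
tag_name = a.name                   # 'a'
classes = a.attrs.get("class")      # multi-valued attribute -> list of classes
href = a.get("href")                # shorthand for a.attrs.get("href")
title = a.get("title")              # missing attribute -> None, not KeyError

# 5. Text content of the second paragraph
second_p = first_p.find_next_sibling("p")
text = second_p.text                # all nested text joined into one string
pieces = list(second_p.strings)     # each text node separately
joined = second_p.get_text("|", strip=True)  # custom separator, whitespace trimmed

print(pretty)
print(first_p, no_table, tag_name, classes, href, title)
print(repr(text), pieces, joined)
```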

II. Searching the document tree with BeautifulSoup

2. Searching with find or find_all

# 1 String: the search condition is a plain string (here, the tag name)
res=soup.find_all(name='p')
res=soup.find_all('p')  # same thing; name= can be omitted
print(res)

# 2 Regular expression
import re
res=soup.find_all(class_=re.compile('^s'))  # tags whose class starts with "s"
print(res)

# 3 List: a tag matches if it matches any item in the list
res=soup.find_all(id=['link1','link2'])
print(res)
print(soup.find_all(name=['a','b']))
print(soup.find_all(['a','b']))


# 4 True: matches any tag that has the attribute, whatever its value
res=soup.find_all(id=True)  # all tags that have an id
res=soup.find_all(href=True)
res=soup.find_all(class_=True)
print(res)
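Here is a runnable sketch of the four filter types above, plus the difference between find and find_all, on a hypothetical sample page parsed with 'html.parser'. Per the bs4 documentation, a compiled regex is applied with its search() method, which is why '^b' matches both the body and b tag names.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for illustration
html_doc = """<html><body>
<p class="title"><b>Three sisters</b></p>
<p class="story">Once upon a time there were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>"""
soup = BeautifulSoup(html_doc, "html.parser")

# find() returns the first match or None; find_all() returns a list (possibly empty)
first_p = soup.find("p")
no_table = soup.find("table")
all_p = soup.find_all("p")
first_two_links = soup.find_all("a", limit=2)   # stop after two matches

# 1 String: exact tag name
p_count = len(soup.find_all("p"))

# 2 Regex: "^b" matches the tag names body and b
b_names = [t.name for t in soup.find_all(re.compile("^b"))]
s_classes = [t["class"] for t in soup.find_all(class_=re.compile("^s"))]

# 3 List: a tag matches if it matches ANY item
two_ids = [t.text for t in soup.find_all(id=["link1", "link2"])]

# 4 True: the attribute just has to be present, whatever its value
with_id = [t["id"] for t in soup.find_all(id=True)]

print(first_p.text, no_table, len(all_p), len(first_two_links))
print(p_count, b_names, s_classes, two_ids, with_id)
```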

3. Searching with CSS selectors

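CSS selectors are really a core front-end topic, but back-end programmers only need to be able to use them, so here is my big shortcut, haha.
In the browser, right-click and choose Inspect, use the element-picker arrow to locate the target, right-click the corresponding HTML node, choose Copy, then Copy selector. Following these steps gives you a CSS selector in seconds. With that big pain point solved, let's look at how to search the document tree with a CSS selector, as shown in the code below: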

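from bs4 import BeautifulSoup
import requests
# Fetch the page, then hand its HTML text to BeautifulSoup
res=requests.get('https://www.w3school.com.cn/css/css_selector_attribute.asp')
soup=BeautifulSoup(res.text,'lxml')
# select() returns a list of matches; [0] takes the first one (IndexError if none)
print(soup.select('#intro > p:nth-child(1) > strong')[0].text)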
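The request above depends on the live page, so here is an offline sketch of the same idea on hypothetical markup shaped to fit the article's selector. It contrasts select() (always a list) with select_one() (first match or None) and wraps the lookup in a hypothetical helper, first_text, that returns a default instead of raising IndexError when the copied selector stops matching. CSS selector support in bs4 comes from the soupsieve package, which pip installs alongside beautifulsoup4 since version 4.7.0.

```python
from bs4 import BeautifulSoup

# Offline stand-in for res.text (hypothetical markup), so no network is needed
html_doc = """<div id="intro">
  <p><strong> First strong </strong> in the first paragraph</p>
  <p><strong>Second strong</strong></p>
</div>
<p class="note">outside</p>"""
soup = BeautifulSoup(html_doc, "html.parser")

# select() always returns a list; select_one() returns the first match or None
all_strong = [s.text for s in soup.select("#intro strong")]
note = soup.select_one("p.note")


def first_text(tree, selector, default=None):
    """Hypothetical helper: stripped text of the first match for a CSS
    selector, or `default` when nothing matches (instead of an IndexError
    from select(...)[0])."""
    tag = tree.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else default


found = first_text(soup, "#intro > p:nth-child(1) > strong")
missing = first_text(soup, "#missing strong", default="(not found)")
print(all_strong, note.text, found, missing)
```

When fetching live pages with requests, passing a timeout (for example requests.get(url, timeout=10)) and calling res.raise_for_status() before parsing are common precautions; a selector copied from the browser also stops matching whenever the site changes its layout.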
