Using BeautifulSoup in Python

Posted by 一清 on 2019-01-20

Original URL: https://flycode.co/archives/232107

Python

Basic usage of the bs4 library:

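1. It works best together with the lxml parser. Install: pip install lxml

2. It works best together with the Requests library. Install: pip install requests

3. Install bs4: pip install beautifulsoup4 (pip install bs4 also works; that package simply pulls in beautifulsoup4)

4. Typing pip does nothing? Fix: Environment Variables -> System variables -> Path -> New: C:\Python27\Scripts (the Scripts folder of your Python installation)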
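If lxml is not installed, passing 'lxml' to BeautifulSoup raises bs4.FeatureNotFound. The following is a minimal sketch, assuming you want a script to keep working on machines without lxml, that falls back to Python's built-in html.parser. The helper name make_soup is hypothetical, not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse markup with lxml if available, otherwise with the stdlib html.parser.

    make_soup is a hypothetical helper used only for illustration.
    """
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is missing; html.parser ships with Python itself
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<html><head><title>Hello</title></head></html>")
print(soup.title.string)
```

The two parsers can build slightly different trees from malformed HTML, so results may differ on messy pages.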

Example: getting a website's title

```python
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup
import requests

url = "https://www.baidu.com"
response = requests.get(url)

# Pass the raw bytes; BeautifulSoup detects the page encoding itself
soup = BeautifulSoup(response.content, 'lxml')

print(soup.title.text)
```
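soup.title is None when a page has no <title> tag, so soup.title.text would raise AttributeError. A minimal sketch, assuming you keep parsing separate from downloading so the parsing step can be tried offline; get_title is a hypothetical name, and html.parser is used only so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup


def get_title(markup):
    """Return the stripped <title> text, or None if the document has no title.

    get_title is a hypothetical helper. markup may be str or bytes
    (for example response.content from requests).
    """
    soup = BeautifulSoup(markup, "html.parser")
    title_tag = soup.title
    if title_tag is None:
        return None
    # get_text(strip=True) also works when <title> has nested children
    return title_tag.get_text(strip=True)


print(get_title(b"<html><head><title> Example </title></head></html>"))
print(get_title("<html><body>no title here</body></html>"))
```

With requests, this would be called as get_title(requests.get(url, timeout=10).content).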

Locating tags

Example 1:

```python
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<html>
<head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body>
</html>
"""

soup = BeautifulSoup(html, 'lxml')

# BeautifulSoup has a built-in method for pretty-printing the document
print(soup.prettify())

# Text of the <title> tag
print(soup.title.string)

# Name of the <title> tag's parent ("head")
print(soup.title.parent.name)

# The first <p> tag
print(soup.p)

# The class attribute of the first <p> tag (a list, since class is multi-valued)
print(soup.p["class"])

# The first <a> tag
print(soup.a)

# All <a> tags
print(soup.find_all('a'))

# The element whose id is "link3"
print(soup.find(id='link3'))
```
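Building on the example above, a common next step is collecting an attribute from every match. tag.get("href") returns None for a missing attribute, whereas tag["href"] raises KeyError. A minimal sketch using a shortened copy of the sample markup, with the third link's href removed on purpose to show the missing-attribute case:

```python
from bs4 import BeautifulSoup

html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.find_all("a"):
    # .get() returns None instead of raising KeyError when href is absent
    links.append((a.get_text(), a.get("href")))
print(links)

# find() returns None when nothing matches; find_all() returns an empty list
print(soup.find(id="link9"))
print(soup.find_all(id="link9"))
```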

Example 2:

```python
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<html>
<head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body>
</html>
"""

soup = BeautifulSoup(html, 'lxml')

# .contents returns a list of the <p> tag's direct children:
# the text nodes between the links as well as the <a> tags themselves
print(soup.p.contents)
```
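Because .contents also includes the text between tags, you often want only the child elements. A minimal sketch, assuming you filter by type (each child is either a bs4.element.Tag or a NavigableString), alongside the equivalent find_all(True, recursive=False) call:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html, "html.parser")
p = soup.p

# .children iterates the same nodes as .contents; keep only element nodes
child_tags = [child for child in p.children if isinstance(child, Tag)]
print([tag["id"] for tag in child_tags])

# find_all(True, recursive=False) matches any tag, but only direct children
direct = p.find_all(True, recursive=False)
print([tag["id"] for tag in direct])
```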

find_all example:

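```python
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
"""

soup = BeautifulSoup(html, 'lxml')

# Find all <ul> tags
print(soup.find_all('ul'))

# Call find_all again on each result to get the <li> tags inside every <ul>
for ul in soup.find_all('ul'):
    print(ul.find_all('li'))

# Elements whose id is "list-1"
print(soup.find_all(attrs={'id': 'list-1'}))

# Elements whose class includes "element"
print(soup.find_all(attrs={'class': 'element'}))

# All text nodes equal to "Foo" (string= is the current name; older versions use text=)
print(soup.find_all(string='Foo'))
```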
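find_all accepts more filters than a tag name and attrs. A minimal sketch, assuming the list markup below (a trimmed copy of the example's lists), showing the class_ keyword, limit, a regular expression on text, and a filter function; is_small_list is a hypothetical helper.

```python
import re

from bs4 import BeautifulSoup

html = """
<ul class="list" id="list-1">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
    <li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ (trailing underscore because class is a Python keyword)
elements = soup.find_all("li", class_="element")
print(len(elements))

# limit stops searching after the first N matches
first_two = [li.get_text() for li in soup.find_all("li", limit=2)]
print(first_two)

# A compiled regex is matched against text nodes
b_words = soup.find_all(string=re.compile(r"^B"))
print(b_words)


def is_small_list(tag):
    """Hypothetical filter: keep <ul> tags whose class list contains list-small."""
    return tag.name == "ul" and "list-small" in tag.get("class", [])


small_ids = [ul["id"] for ul in soup.find_all(is_small_list)]
print(small_ids)
```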

CSS selector example:

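```python
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
"""

soup = BeautifulSoup(html, 'lxml')

# Elements with class panel-heading nested inside an element with class panel
print(soup.select('.panel .panel-heading'))

# <li> tags nested inside <ul> tags (tag-name selectors, not class names)
print(soup.select('ul li'))

# Elements with class element inside the element whose id is list-2
print(soup.select('#list-2 .element'))

# Use get_text() to get the text content
for li in soup.select('li'):
    print(li.get_text())

# Attributes can be read with [attribute_name] or attrs[attribute_name]
for ul in soup.select('ul'):
    print(ul['id'])
    # print(ul.attrs['id'])
```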
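select() always returns a list; select_one() returns the first match or None. A minimal sketch, assuming the trimmed panel markup below, that also tries an attribute selector, a child combinator and the :first-child pseudo-class (bs4 hands selectors to the soupsieve package):

```python
from bs4 import BeautifulSoup

html = """
<div class="panel">
    <div class="panel-heading"><h4>Hello</h4></div>
    <ul class="list" id="list-1">
        <li class="element">Foo</li>
        <li class="element">Bar</li>
    </ul>
    <ul class="list list-small" id="list-2">
        <li class="element">Foo</li>
    </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first matching element
heading = soup.select_one(".panel-heading h4")
print(heading.get_text())

# Attribute selector: <ul> elements whose id starts with "list-"
list_ids = [ul["id"] for ul in soup.select('ul[id^="list-"]')]
print(list_ids)

# Child combinator plus pseudo-class: the first <li> directly under #list-1
first_li = soup.select_one("#list-1 > li:first-child")
print(first_li.get_text())

# No match: select_one gives None, select gives an empty list
print(soup.select_one("#missing"), soup.select("#missing"))
```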
