Python Crawler Tutorial 24: Data Extraction with BeautifulSoup4 (Part 2)

Posted by 肖朋偉 on 2018-09-06

Original URL: https://www.cnblogs.com/xpwi/p/9600957.html

This post shows how to traverse a document object with bs (BeautifulSoup).

Traversing the document object

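contents: the child nodes of a tag, returned as a list
children: the same child nodes, returned as an iterator
descendants: all descendant nodes (children, grandchildren, and so on)
string: the text inside a tag, without the tag markup; it is None when the tag has more than one child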
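The four attributes above can be tried without a network request. The following is a minimal sketch, not the author's code: the HTML snippet is made up for illustration, and it assumes the built-in 'html.parser' instead of the 'lxml' parser used in this post, so that no extra package is needed.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample document (not from the original post).
html = (
    '<html><head><meta charset="utf-8"><title>Demo</title></head>'
    "<body><p>Hello <b>world</b></p></body></html>"
)

# Assumption: 'html.parser' ships with Python; the post itself uses 'lxml'.
soup = BeautifulSoup(html, "html.parser")
head = soup.head

# contents: direct children as a list.
child_names = [node.name for node in head.contents]

# children: the same nodes, but as an iterator rather than a list.
children_is_list = isinstance(head.children, list)

# descendants: every nested node, text nodes included; keep only the tags here.
descendant_tags = [n.name for n in soup.body.descendants if isinstance(n, Tag)]

# string: the text of a tag that has exactly one child.
title_text = head.title.string

# <p> has two children (a text node and <b>), so .string is None.
p_string = soup.body.p.string

print(child_names, descendant_tags, title_text, p_string)
```

Because `children` is an iterator, it can only be walked once; use `contents` when the nodes need to be indexed or counted.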
Example code, file 27bs3.py: https://xpwi.github.io/py/py%E7%88%AC%E8%99%AB/py27bs3.py

# BeautifulSoup usage example
# Traversing the document object

from urllib import request
from bs4 import BeautifulSoup

url = 'http://www.baidu.com/'

rsp = request.urlopen(url)
content = rsp.read()

soup = BeautifulSoup(content, 'lxml')

# bs decodes the response bytes automatically; prettify() returns the formatted document as a string
content = soup.prettify()

print("=="*12)
# Using contents: print every meta tag and the text of the title tag
for node in soup.head.contents:
    if node.name == "meta":
        print(node)
    if node.name == "title":
        print(node.string)
print("=="*12)

Output

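string is the usual way to print a tag's content: text only, without the tag markup.
Of course, if traversal feels too expensive and is not actually needed, you can search the document instead.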
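To make the trade-off concrete, here is a small sketch that fetches the same title text twice, once by looping over contents and once by searching with find_all. The HTML snippet is hypothetical, and 'html.parser' is assumed in place of 'lxml' to keep the example dependency-free. It does not measure cost; it only shows that the search version is shorter to write.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (not from the original post).
html = (
    '<html><head><meta charset="utf-8"><title>Demo</title></head>'
    "<body></body></html>"
)
soup = BeautifulSoup(html, "html.parser")  # assumption: built-in parser

# Traversal: walk the children of <head> and pick out <title> by hand.
title_by_loop = None
for node in soup.head.contents:
    if node.name == "title":
        title_by_loop = node.string

# Search: ask for the <title> tags directly and take the first match.
title_by_search = soup.find_all("title")[0].string

print(title_by_loop, title_by_search)
```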

Searching the document object

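find_all(name, attrs, recursive, text, **kwargs)
- find_all() returns a list: for example, find_all(name='meta') returns every matching meta tag in one list, even when there are several.
- name argument: the tag name to search for. It accepts:
  - 1. A string
  - 2. A regular expression, which must be compiled first.
    For example, to print every tag whose name starts with me:
    tags = soup.find_all(re.compile('^me'))
  - 3. A list
- keyword arguments: match on tag attributes
- text: matches the text value of a tag (newer bs4 releases name this argument string)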
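The argument forms listed above can be compared side by side on a local snippet. This is a sketch under stated assumptions: the HTML is invented for illustration, 'html.parser' stands in for 'lxml', and the text filter is passed as string=, which is the name used by bs4 4.4.0 and later.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample document (not from the original post).
html = (
    "<html><head><meta charset='utf-8'>"
    "<meta name='referrer' content='always'>"
    "<link rel='icon' href='/favicon.ico'><title>Demo</title></head>"
    "<body><p>Hello</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")  # assumption: built-in parser

# 1. name as a string: exact tag name.
by_string = [t.name for t in soup.find_all("link")]

# 2. name as a compiled regular expression: tag names starting with "me".
by_regex = [t.name for t in soup.find_all(re.compile("^me"))]

# 3. name as a list: any of the listed tag names, in document order.
by_list = [t.name for t in soup.find_all(["title", "p"])]

# Keyword argument: filter on an attribute value.
by_keyword = soup.find_all(content="always")

# Text filter: returns the matching text nodes, not the enclosing tags.
by_text = soup.find_all(string="Hello")

print(by_string, by_regex, by_list, len(by_keyword), by_text)
```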
Example code, file 27bs4.py: https://xpwi.github.io/py/py%E7%88%AC%E8%99%AB/py27bs4.py

# BeautifulSoup usage example
# Searching the document object

from urllib import request
from bs4 import BeautifulSoup
import re

url = 'http://www.baidu.com/'

rsp = request.urlopen(url)
content = rsp.read()

soup = BeautifulSoup(content, 'lxml')

# bs decodes the response bytes automatically; prettify() returns the formatted document as a string
content = soup.prettify()

# Using find_all
# Using the name argument
print("=="*12)
tags = soup.find_all(name='link')
for i in tags:
    print(i)

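# Using a regular expression
print("=="*12)
# Two conditions at once: the tag name starts with "me" and content == 'always'
tags = soup.find_all(re.compile('^me'), content='always')
# Printing tags directly would print a list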
for i in tags:
    print(i)

Output:

Because both conditions must hold, only one meta tag matches.
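The narrowing effect of a second condition can be reproduced offline. The sketch below uses a made-up snippet with two meta tags (the live Baidu page may differ) and assumes 'html.parser' rather than 'lxml'. It also shows the attrs dictionary, which is another way to pass the same attribute filter.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample document with two <meta> tags (not from the original post).
html = (
    "<html><head><meta charset='utf-8'>"
    "<meta name='referrer' content='always'>"
    "<title>Demo</title></head><body></body></html>"
)
soup = BeautifulSoup(html, "html.parser")  # assumption: built-in parser

# One condition: every tag whose name starts with "me".
name_only = soup.find_all(re.compile("^me"))

# Two conditions: name starts with "me" AND the content attribute is "always".
both = soup.find_all(re.compile("^me"), content="always")

# The same attribute filter written with the attrs dictionary.
both_attrs = soup.find_all(re.compile("^me"), attrs={"content": "always"})

print(len(name_only), len(both), both[0]["content"])
```

The attrs form is useful when the attribute name is not a valid Python keyword argument, for example an attribute containing a hyphen.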
The next post covers BeautifulSoup's CSS selectors.

More articles: Python Crawler Notes (Python 爬蟲隨筆)

These notes may not be reproduced by any individual or organization.
