Implementing a Web Crawler with Python Beautiful Soup + requests

Posted by HuangZhang_123 on 2017-02-27

You are welcome to join the study and discussion QQ group: 657341423

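Python offers several libraries for crawling: urllib in the standard library, and third-party packages such as requests, Scrapy, and BeautifulSoup, which are widely used for pulling data from websites. Scrapy is really a framework and suits large-scale crawling; requests fetches pages through HTTP methods such as GET and POST; Beautiful Soup is a Python library for extracting data from HTML or XML files.
This article builds a crawler with Beautiful Soup + requests, a combination that is among the simplest to pick up. requests is mainly used to fetch the HTML with GET, and Beautiful Soup filters that HTML to pull out the content you want.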
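Installing Beautiful Soup:
pip install beautifulsoup4
After that, you can also install the lxml and html5lib parsers (the examples in this article use Python's built-in html.parser, so these two are optional):
pip install lxml
pip install html5lib

Installing requests:
pip install requests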
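
The lxml and html5lib packages are alternative parsers that Beautiful Soup can use in place of the built-in html.parser. As a rough sketch (the helper name make_soup and the preference order are assumptions for illustration, not something the original article uses), a script can try the installed parsers in order and fall back to the built-in one:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, preferred=("lxml", "html5lib", "html.parser")):
    """Parse `html` with the first parser in `preferred` that is installed.

    Hypothetical helper; returns the soup and the name of the parser used.
    """
    for parser in preferred:
        try:
            return BeautifulSoup(html, parser), parser
        except FeatureNotFound:
            continue  # this parser is not installed; try the next one
    raise RuntimeError("no usable parser found")


# Deliberately unclosed tags: each parser repairs them in its own way,
# but all of them produce a <b> element containing "world".
soup, used = make_soup("<p>Hello<b>world")
print(used)
print(soup.find("b").get_text())
```

Because html.parser ships with Python, the fallback always succeeds; the other two only change how quickly and how leniently broken markup is handled.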

Fetching a site's HTML with requests

import requests
from bs4 import BeautifulSoup

r = requests.get(url='https://www.baidu.com/')    # the most basic GET request
print(r.status_code)    # HTTP status code of the response
r.encoding = 'utf-8'    # without this, Chinese text comes out garbled
print(r.text)
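
The request above has no timeout, does not check for HTTP errors, and hard-codes the encoding. A slightly more defensive version might look like the sketch below. The function name fetch_html and the 10-second default are assumptions for illustration; the requests calls themselves (timeout, raise_for_status, apparent_encoding) are part of the library.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the decoded body of `url` (hypothetical helper).

    Raises a requests.RequestException subclass on network or HTTP errors.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    # When the server declares no charset, requests falls back to
    # ISO-8859-1 for text responses; use the detected encoding instead.
    # Detection is a guess, so set response.encoding by hand if it is wrong.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text
```

Calling fetch_html('https://www.baidu.com/') would then stand in for the requests.get, r.encoding, and r.text lines above.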

Parsing the returned HTML with BeautifulSoup

soup = BeautifulSoup(r.text,"html.parser")
print(soup.prettify())

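Output:

[Screenshot of the output; image not preserved]

This comes down to an encoding problem. I searched through a lot of material online without finding a fix, and eventually found that the problem lies with print: it has to encode the text for the console, and the console's default encoding most likely cannot represent every character in the page.
Adding the following to the code solves it: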

import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gb18030')
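
To see why re-wrapping stdout helps, the sketch below checks a sample string against a few encodings. The string and the helper can_encode are made up for illustration, and it assumes the original console was using GBK, which the article does not state. GBK has no mapping for the copyright sign, whereas GB18030 and UTF-8 cover all of Unicode.

```python
import io

# Hypothetical sample: a copyright sign followed by Chinese text.
text = "\u00a9 2017 百度"


def can_encode(s, encoding):
    """Return True if every character of `s` exists in `encoding`."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(text, "gbk"))      # False: GBK has no copyright sign
print(can_encode(text, "gb18030"))  # True
print(can_encode(text, "utf-8"))    # True

# The same wrapping technique applied to an in-memory byte stream instead
# of sys.stdout.buffer; errors="replace" substitutes "?" rather than raising.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="gbk", errors="replace")
writer.write(text)
writer.flush()
print(buffer.getvalue().decode("gbk"))  # ? 2017 百度
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding=..., errors=...) is another way to get a similar effect when sys.stdout is a regular text stream.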

To write soup.prettify() to a text file:

# the with block closes the file even if the write fails
with open("ttt.txt", "w", encoding='utf-8') as f:
    f.write(soup.prettify())

Complete code

from bs4 import BeautifulSoup
import requests
import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gb18030')


page = requests.get('https://www.baidu.com/')
page.encoding = "utf-8"

soup = BeautifulSoup(page.text,"html.parser")
print(soup.prettify())

# the with block closes the file even if the write fails
with open("ttt.txt", "w", encoding='utf-8') as f:
    f.write(soup.prettify())
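
The complete code prints the whole document. To pull out specific pieces, as the introduction describes, Beautiful Soup offers find_all and CSS selectors through select. The sketch below runs on a small made-up HTML string (SAMPLE_HTML and its URLs are placeholders, not real Baidu markup) so that it works without network access; in practice page.text would take its place.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for page.text.
SAMPLE_HTML = """
<html><head><title>Sample page</title></head>
<body>
  <div id="nav">
    <a href="https://example.com/news" class="item">News</a>
    <a href="https://example.com/maps" class="item">Maps</a>
  </div>
  <a href="/about">About</a>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Text of the <title> tag.
title = soup.title.string

# href of every <a> tag that has one.
all_links = [a["href"] for a in soup.find_all("a", href=True)]

# CSS selector: only links with class "item" inside the element with id "nav".
nav_texts = [a.get_text(strip=True) for a in soup.select("#nav a.item")]

print(title)      # Sample page
print(all_links)  # ['https://example.com/news', 'https://example.com/maps', '/about']
print(nav_texts)  # ['News', 'Maps']
```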

Official BeautifulSoup documentation
https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html

