Python Crawler Tutorial 03: Detecting Encoding with chardet

Posted by 肖朋偉 on 2018-09-06

Original article: https://www.cnblogs.com/xpwi/p/9600629.html

Python Crawler

Spider-03: Using chardet

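As we continue learning Python crawling, decoding problems come up often, because pages do not share a single encoding. We use chardet to detect each page's encoding and reduce encoding problems as much as possible.

Solving Web Page Encoding Problems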

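chardet can automatically detect the encoding of a page or file, but its guess can be wrong.
chardet must be installed first:
- If you use an Anaconda environment, run the following command:
conda install chardet
- If not, install it manually in PyCharm: [File] > [Settings] > [Project Interpreter] > [+] > [chardet] > [Install]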
[Screenshot: installing chardet from the PyCharm Project Interpreter settings]
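
Because the detection can be wrong, it may help to check how confident chardet is before trusting its answer. The sketch below assumes a dictionary shaped like the one chardet.detect() returns, with "encoding" and "confidence" keys. The helper name pick_encoding, the 0.5 threshold, and the utf-8 fallback are illustrative assumptions, not part of chardet or of the original article.

```python
def pick_encoding(detection, min_confidence=0.5, fallback="utf-8"):
    """Return the detected encoding if it looks trustworthy, else the fallback.

    `detection` is a dict shaped like the result of chardet.detect(),
    e.g. {"encoding": "GB2312", "confidence": 0.99}.
    The threshold and fallback are hypothetical defaults for illustration.
    """
    encoding = detection.get("encoding")
    confidence = detection.get("confidence") or 0.0
    # chardet reports encoding=None when it cannot make a guess
    if encoding is None or confidence < min_confidence:
        return fallback
    return encoding


if __name__ == "__main__":
    print(pick_encoding({"encoding": "GB2312", "confidence": 0.99}))  # GB2312
    print(pick_encoding({"encoding": "ascii", "confidence": 0.2}))    # utf-8
    print(pick_encoding({"encoding": None, "confidence": 0.0}))       # utf-8
```

With chardet installed, the helper would be called as pick_encoding(chardet.detect(html)), where html is the raw bytes of the page.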

Example v2

py03chardet.py file: https://xpwi.github.io/py/py%E7%88%AC%E8%99%AB/py03chardet.py

# py03chardet.py
# Download a page with urllib.request and detect its encoding automatically

from urllib import request
import chardet

if __name__ == '__main__':

    url = 'https://jobs.zhaopin.com/CC375882789J00033399409.htm'

    rsp = request.urlopen(url)
    # In PyCharm, hold Ctrl and click urlopen to view its documentation,
    # including the function's parameters and usage

    html = rsp.read()
    cs = chardet.detect(html)

    print("Type of cs: {0}".format(type(cs)))
    print("Detected cs data: {0}".format(cs))

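    html = html.decode(cs.get("encoding") or "utf-8")
    # Use the detected encoding if there is one; otherwise use utf-8.
    # chardet sets "encoding" to None on failure, so a get() default would never apply.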

    print("HTML page:\n%s" % html)
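
Even with a detected encoding, the decode step can still fail if the guess is wrong or if Python does not recognize the reported encoding name. A minimal sketch of a more forgiving decode step follows. The helper name decode_page and the choice to replace undecodable bytes are assumptions made for illustration, not something the original script does.

```python
def decode_page(raw, encoding, fallback="utf-8"):
    """Decode raw bytes, falling back if the guessed encoding does not work.

    `encoding` may be None (no guess available); `fallback` is a
    hypothetical default chosen for this sketch.
    """
    try:
        return raw.decode(encoding or fallback)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are not valid in the guessed encoding
        # LookupError: Python does not know the encoding name
        # Undecodable bytes become U+FFFD instead of raising an error
        return raw.decode(fallback, errors="replace")


if __name__ == "__main__":
    raw = "編碼".encode("utf-8")
    print(decode_page(raw, None))             # no guess: decoded as utf-8
    print(decode_page(raw, "ascii"))          # wrong guess: retried as utf-8
    print(decode_page(raw, "no-such-codec"))  # unknown name: retried as utf-8
```

In the script above, this would stand in for the html.decode(...) line, for example decode_page(html, cs.get("encoding")).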

Right-click the file and run it. The output looks like this:

[Screenshot: program output]
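That covers encoding detection. Its most important job is to detect a page's encoding, which reduces encoding problems as much as possible.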

More articles: Python Crawler Notes

This note may not be reposted by any individual or organization.
