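My First Python Crawler: Scraping Novels from 147

Posted by xingkong12138 on 2020-05-08

Original URL: https://learnku.com/articles/44233

Python crawler

I recently started learning Python, so I wrote a crawler in it that scrapes novels from the 147 novel site (www.147xs.org).

Feel free to visit my blog.

I have only just started with Python, so if you spot any shortcomings, please point them out.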

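Analyzing the 147 page structure

Open the page in Google Chrome and press F12 to open the developer tools.

Each entry in the chapter list is wrapped in <dd></dd>.

The chapter title is wrapped in an <h1> tag inside <div class="bookname"></div>.

The chapter text is wrapped in <p> tags inside <div id="content"></div>.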
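
To make these three observations concrete, here is a minimal sketch that runs the same kind of regular expressions against small hand-written HTML fragments. The fragments, chapter names, and paths are made up for illustration and are not copied from the site, and the helper names `parse_index` and `parse_chapter` are hypothetical.

```python
import re

# Hypothetical HTML that imitates the structure described above.
SAMPLE_INDEX = (
    '<dl>'
    '<dd><a href="/book/1/1.html">Chapter 1</a></dd>'
    '<dd><a href="/book/1/2.html">Chapter 2</a></dd>'
    '</dl>'
)
SAMPLE_CHAPTER = (
    '<div class="bookname"><h1>Chapter 1</h1></div>'
    '<div id="content"><p>First paragraph.</p><p>Second paragraph.</p></div>'
)

def parse_index(html):
    """Return a list of (href, title) pairs, one per <dd> entry."""
    pattern = r'<dd>\s*<a[^>]*href="(.*?)"[^>]*>(.*?)</a>\s*</dd>'
    return re.findall(pattern, html, re.S)

def parse_chapter(html):
    """Return (title, paragraphs) for one chapter page."""
    title = re.search(r'<div class="bookname">.*?<h1>(.*?)</h1>', html, re.S).group(1)
    body = re.search(r'<div id="content">(.*?)</div>', html, re.S).group(1)
    paragraphs = re.findall(r'<p>(.*?)</p>', body, re.S)
    return title, paragraphs

links = parse_index(SAMPLE_INDEX)
title, paragraphs = parse_chapter(SAMPLE_CHAPTER)
print(links)
print(title, paragraphs)
```

Testing the patterns on a saved fragment like this avoids hitting the site repeatedly while the regular expressions are still being adjusted.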

Enough talk, here is the code.

# Scrape a novel from the 147 novel site
# -*- coding: utf-8 -*-
import requests
import re
import random
import time

# Fetch one chapter page and return its title and text
def GetChapterContent(url):
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
               'Cookie':'Hm_lvt_f9e74ced1e1a12f9e31d3af8376b6d63=1588752082; Hm_lpvt_f9e74ced1e1a12f9e31d3af8376b6d63=1588756920',
               'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3'}
    cookies = dict(Hm_lpvt_f9e74ced1e1a12f9e31d3af8376b6d63="1588758919",Hm_lvt_f9e74ced1e1a12f9e31d3af8376b6d63="1588752082")
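    # A timeout keeps the script from hanging forever on a stalled connection
    res = requests.get(url,headers=headers,cookies=cookies,timeout=10)
    res.encoding = 'utf-8'
    content_html = res.text
    # Extract the chapter title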
    title_div=re.findall(r'<div class="bookname">([\s\S]*?)</div>',content_html)[0]
    title = re.findall(r'<h1>(.*?)</h1>', title_div, re.S)[0]
    # Extract the chapter text, one <p> element per paragraph
    content_div = re.findall(r'<div id="content">([\s\S]*?)</div>', content_html)[0]
    contents = re.findall(r'<p>(.*?)</p>', content_div, re.S)
    # Combine the title and the paragraphs, one per line
    content = title + "\n"
    for i in contents:
        content += i + "\n"

    # Return the assembled chapter text
    return content





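# Fetch the chapter list (URL and title of each chapter)
def GetChapterList(url):
    # A timeout keeps the script from hanging forever on a stalled connection
    res = requests.get(url, timeout=10)
    res.encoding = 'utf-8'
    chapter_html = res.text

    # Extract the <dl> block that holds the chapter list
    chapter_list_div = re.findall(r'<dl>([\s\S]*?)</dl>',chapter_html)[0]

    # Extract each <dd> entry, then its link and title
    chapter_list_dd = re.findall(r'<dd>(.*?)</dd>',chapter_list_div)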
    chapter_url_info = []
    for info in chapter_list_dd:
        chapter_list_info = re.findall(r'href="(.*?)">(.*?)<',info)[0]
        chapter_url = "http://www.147xs.org" + chapter_list_info[0]
        chapter_url_info.append([chapter_url,chapter_list_info[1]])
    return chapter_url_info

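url="http://www.147xs.org/book/13794/"

chapter_urls = GetChapterList(url)

for chapter_url, chapter_title in chapter_urls:
    try:
        # Fetch inside the try block so a failed request or parse is reported too
        content = GetChapterContent(chapter_url)
        # Append the chapter to the output file
        with open("./xiaoshuo.txt","a+",encoding="UTF-8") as f:
            f.write(content)
        print("Chapter {}: scraped successfully".format(chapter_title))
    except Exception:
        print("Chapter {}: scrape failed".format(chapter_title))
    time.sleep(random.random())  # Pause for a random time in the range [0, 1) seconds

print("Scraping finished")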
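
The script above builds each chapter URL by prepending a fixed host string to the href found in the page. As a possible alternative, the sketch below uses the standard library's `urllib.parse.urljoin`, which also copes with hrefs that are relative to the book directory or already absolute. The helper name `build_chapter_url` and the example hrefs are hypothetical; whether the site ever emits such hrefs is an assumption, not something checked here.

```python
from urllib.parse import urljoin

# Book index URL taken from the script above.
BOOK_URL = "http://www.147xs.org/book/13794/"

def build_chapter_url(book_url, href):
    """Resolve a chapter href against the book index URL."""
    return urljoin(book_url, href)

# Hypothetical hrefs covering the three common shapes.
root_relative = build_chapter_url(BOOK_URL, "/book/13794/1.html")
dir_relative = build_chapter_url(BOOK_URL, "2.html")
absolute = build_chapter_url(BOOK_URL, "http://www.147xs.org/book/13794/3.html")

print(root_relative)
print(dir_relative)
print(absolute)
```

All three calls resolve to a full URL under http://www.147xs.org/book/13794/, so the same helper could be dropped into GetChapterList without caring which href style the page uses.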

This work is licensed under the "CC License"; reprints must credit the author and link to this article.
