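Python Web Scraping in Practice: Scraping Highlighted News Flashes from Bishijie (幣世界)

Posted by JoySang on 2019-02-16

Original URL: https://flycode.co/archives/79732

Scraping the red-highlighted news flashes from Bishijie (mobile site)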

# Import dependencies
from lxml import etree
import requests
import pymongo
import time

# Replace the host with your own database address; you need to install MongoDB yourself
client = pymongo.MongoClient('your-database-address', 27017)
mydb = client['mydb']
information = mydb['information']  # collection (table) name
currentTime = time.strftime("%m%d%H", time.localtime())  # month, day, hour, e.g. "021610"
saveTime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())

# Pretend to be a mobile browser
header = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1'
}

def get_url(url):
    html = requests.get(url, headers=header)
    selector = etree.HTML(html.text)
    # Highlighted items are the <section class="focus"> nodes
    infos = selector.xpath('//div[@id="kuaixun_list"]/div/article/section[@class="focus"]')
    if not infos:
        # Nothing matched, so there is no newest ID to record
        print('No highlighted items found!')
        return
    # The first match is treated as the newest item; its parent element carries the ID
    saveId = infos[0].xpath('../@id')[0]
    # Record the current hour stamp and the newest ID (replace with your own file path)
    with open(r'C:/Users/SCZ/PycharmProjects/CommunityCrawl/newest', 'w') as file:
        file.write(currentTime + ' ' + saveId)
    for info in infos:
        try:
            title = info.xpath('h3[@class="text_title"]/text()')[0].strip()
            content = info.xpath('p[@class="text_show"]/text()')[0].strip()
            # Parsed but not stored; the record below uses saveTime instead
            date = info.xpath('../h3[@class="timenode"]/text()')[0]
            infoId = info.xpath('../@id')[0]

            data = {
                'title': title,
                'id': infoId,
                'date': saveTime,
                'content': content,
                'source': 'bishijie'
            }

            print(data)

            # Only items whose ID is within 20 of the newest ID are stored
            if int(infoId) > int(saveId) - 20:
                print('Inserted a new record!')
                information.insert_one(data)
            else:
                print('No new data!')

        except IndexError:
            # Skip items that are missing any of the expected nodes
            pass

if __name__ == '__main__':
    # Replace with your own file path; the file must already exist
    with open('C:/Users/SCZ/PycharmProjects/CommunityCrawl/newest', 'r') as fs:
        line = fs.read()
    fileDate = line[0:6]  # the MMDDHH stamp saved by the previous run

    if fileDate != currentTime:
        print('Time mismatch; falling back to the current system time for crawling!')
    else:
        print('Time matches; running normally!')

    # Both cases request the same URL, built from the current hour stamp
    urls = ['http://m.bishijie.com/kuaixun?fm=' + currentTime]
    for url in urls:
        get_url(url)
        time.sleep(2)
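
A note on the `newest` file: the script saves the newest ID there, but the insert check compares each item against the newest ID on the current page, so the saved ID is never actually used and repeated runs can store the same item again. Below is a rough sketch of one way the saved ID could drive the filter instead. It uses only the standard library, the helper names `read_state`, `write_state` and `select_new` are made up for illustration and are not part of the original script, and it assumes IDs are integers, as the script's `int()` calls suggest.

```python
import os
import tempfile


def read_state(path):
    """Return (hour_stamp, last_id) from the state file, or (None, None) if it is missing or malformed."""
    if not os.path.exists(path):
        return None, None
    with open(path, "r", encoding="utf-8") as f:
        parts = f.read().split()
    if len(parts) != 2:
        return None, None
    return parts[0], int(parts[1])


def write_state(path, hour_stamp, newest_id):
    """Save the hour stamp and newest ID in the same 'MMDDHH id' layout the script writes."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{hour_stamp} {newest_id}")


def select_new(item_ids, last_id):
    """Keep only IDs strictly greater than last_id; keep everything on the first run."""
    if last_id is None:
        return list(item_ids)
    return [i for i in item_ids if i > last_id]


# Demo in a throwaway directory so no real file is touched
state_path = os.path.join(tempfile.mkdtemp(), "newest")
write_state(state_path, "021610", 105)
stamp, last_id = read_state(state_path)
fresh = select_new([107, 106, 105, 104], last_id)
print(fresh)
```

With this approach the state file should be rewritten only after the new items have been stored, so a failed run does not skip them next time.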

Key things to master: XPath syntax, file handling in Python, and basic Python syntax
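
To practise the XPath part without hitting the live site, here is a small offline sketch. Two caveats: the markup is a made-up, simplified stand-in modelled on the XPath expressions in the script (the real page may differ), and to stay dependency-free it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, rather than `lxml` as in the script. The function name `extract_focus` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample markup; only the first article contains a "focus" section
SAMPLE = """
<html><body>
<div id="kuaixun_list">
  <div>
    <article id="102">
      <h3 class="timenode">10:30</h3>
      <section class="focus">
        <h3 class="text_title"> Highlighted item </h3>
        <p class="text_show"> Body text </p>
      </section>
    </article>
    <article id="101">
      <h3 class="timenode">10:20</h3>
      <section class="normal">
        <h3 class="text_title">Ordinary item</h3>
        <p class="text_show">Not highlighted</p>
      </section>
    </article>
  </div>
</div>
</body></html>
"""


def extract_focus(markup):
    """Return id, title and content for every article that has a section with class 'focus'."""
    root = ET.fromstring(markup.strip())
    results = []
    # Walk the articles, then look for the highlighted section inside each one
    for article in root.findall(".//div[@id='kuaixun_list']/div/article"):
        section = article.find("section[@class='focus']")
        if section is None:
            continue  # not a highlighted item
        title = section.findtext("h3[@class='text_title']", default="").strip()
        content = section.findtext("p[@class='text_show']", default="").strip()
        results.append({"id": article.get("id"), "title": title, "content": content})
    return results


records = extract_focus(SAMPLE)
print(records)
```

The script goes the other way round: it selects the `focus` sections first and then steps up to the parent with `../@id`, which `lxml` supports directly.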

This post covers fairly basic material and is far from perfect, so feedback is welcome. Let's improve together!

My other articles on Python

Getting Started with Python Web Scraping

Python Web Scraping: Storing Data in MongoDB

