This is a targeted stock crawler: it fetches all stock codes from the Eastmoney (東方財富網) stock-list page, then opens each stock's detail page on Baidu Gupiao (百度股票) and extracts the fields to be saved. The technical route is re + bs4 + requests.
import requests
from bs4 import BeautifulSoup
import traceback  # for printing exception details while debugging
import re

# Fetch a page; the encoding was checked against the page beforehand
def getHTMLText(url, code='utf-8'):
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except:
        return ""

# Collect all stock codes into a list
def getStockList(lst, stockURL):
    html = getHTMLText(stockURL, 'GB2312')
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except:
            continue

# Fetch each stock's detail page, save its fields to a file,
# and print the crawling progress
def getStockInfo(lst, stockURL, fpath):
    count = 0
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})

            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            infoDict.update({'股票名稱': name.text.split()[0]})

            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val

            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
                count = count + 1
                print("\r當前進度:{:.2f}%".format(count * 100 / len(lst)), end="")
        except:
            count = count + 1
            print("\r當前進度:{:.2f}%".format(count * 100 / len(lst)), end="")
            continue

# Main routine
def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'https://gupiao.baidu.com/stock/'
    output_file = 'D:/BaiduStockInfo.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)

main()
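Each record is written as the str() of a Python dict, one per line, so the output file can be read back later without re-crawling. Below is a minimal sketch of such a loader; the helper name loadStockInfo is illustrative (not part of the original script), and it assumes the file produced by main() above.

import ast

# Hypothetical helper: load the records written by getStockInfo back into dicts.
# Each line of the file is the str() of a dict, which ast.literal_eval parses safely.
def loadStockInfo(fpath):
    records = []
    with open(fpath, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(ast.literal_eval(line))
    return records

# Example usage (path matches the output_file used in main()):
# records = loadStockInfo('D:/BaiduStockInfo.txt')
# print(len(records), records[0].get('股票名稱'))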