Scraping Tonghuashun (同花順, 10jqka) Data

Posted by JJJhr on 2024-06-27

Requesting the data

import requests

# URL of page 2 of the stock list
url = 'https://q.10jqka.com.cn/index/index/board/all/field/zdf/order/desc/page/2/ajax/1/'

# Identify as a desktop Chrome browser
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36'
}

# Send the request and print the raw HTML
response = requests.get(url=url, headers=headers)
html = response.text
print(html)

Output:

<html><body>
    <script type="text/javascript" src="//s.thsi.cn/js/chameleon/chameleon.min.1719332.js"></script> <script src="//s.thsi.cn/js/chameleon/chameleon.min.1719332.js" type="text/javascript"></script>
    <script language="javascript" type="text/javascript">
    window.location.href="//q.10jqka.com.cn/index/index/board/all/field/zdf/order/desc/page/2/ajax/1/";
    </script>
    </body></html>

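The result does not contain the data we want to scrape. It is only a stub page that loads a script and redirects via JavaScript.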
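Rather than eyeballing the printed HTML, you can test the response for the markers visible in the stub above (the chameleon script and the window.location.href redirect) and for the .m-table class that the parsing step relies on. This is a heuristic sketch: looks_like_redirect_stub is a hypothetical helper name, and the markers are assumptions drawn from this single response, not a documented check.

```python
import re

# Markers seen in the stub page returned to the bare request:
# the chameleon anti-bot script and a JavaScript redirect.
_STUB_PATTERNS = (
    re.compile(r"chameleon", re.IGNORECASE),
    re.compile(r"window\.location\.href\s*="),
)


def looks_like_redirect_stub(html):
    """Return True if the response looks like the JS redirect stub
    rather than the stock table (heuristic only)."""
    has_table = "m-table" in html
    has_stub_marker = any(p.search(html) for p in _STUB_PATTERNS)
    return has_stub_marker and not has_table


if __name__ == "__main__":
    stub = (
        '<html><body><script src="//s.thsi.cn/js/chameleon/chameleon.min.1719332.js">'
        '</script><script>window.location.href="//q.10jqka.com.cn/";</script>'
        "</body></html>"
    )
    print(looks_like_redirect_stub(stub))
    print(looks_like_redirect_stub('<table class="m-table"><tr><td>1</td></tr></table>'))
```

In a real run you would call it on response.text and retry or adjust the headers when it returns True.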

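Thought: the site probably has anti-scraping measures.

1. Check the request headers.

2. The data may be encrypted.

Solution: addressing point 1, add Cookie, Referer, and similar fields to the request headers and run the request again. This solves the problem.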

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
    # Copied from the browser; this value changes over time
    'Cookie': 'v=Awcc3GskLeIF9alu7ByNVWQylrDUDNolNeJfetn1IU-9lCmu4dxrPkWw77Xq',
    'Referer': 'https://q.10jqka.com.cn/'
}
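If you copy the browser's whole Cookie header (several name=value pairs joined by ";"), another option is to pass the cookies through the cookies= argument of requests.get(), which accepts a dict. A minimal sketch of the conversion follows; cookie_header_to_dict and headers_without_cookie are hypothetical names, and the v value is simply the one shown above, which is likely no longer valid.

```python
def cookie_header_to_dict(cookie_header):
    """Split a raw Cookie header value ("a=1; b=2") into {"a": "1", "b": "2"}.

    The result can be passed as requests.get(url, cookies=...).
    """
    cookies = {}
    for part in cookie_header.split(";"):
        # partition() splits on the first '=' only, so '=' inside a value is kept
        name, sep, value = part.partition("=")
        name, value = name.strip(), value.strip()
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    raw = "v=Awcc3GskLeIF9alu7ByNVWQylrDUDNolNeJfetn1IU-9lCmu4dxrPkWw77Xq"
    print(cookie_header_to_dict(raw))
    # Hypothetical usage with the url defined above and the headers minus 'Cookie':
    # response = requests.get(url, headers=headers_without_cookie,
    #                         cookies=cookie_header_to_dict(raw))
```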

Parsing the data:

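import requests
import parsel

# url and headers are the ones defined above
response = requests.get(url=url, headers=headers)
selector = parsel.Selector(text=response.text)

# Select every <tr> in the .m-table table, skipping the header row
data = selector.css('.m-table tr')[1:]
for i in data:
    # Text nodes directly inside each <td>
    info = i.css('td::text').getall()
    # Text of the <a> links in the row: stock code and stock name
    numberAndName = i.css('td a::text').getall()
    # Store the row in a dict
    dit = {
        '序號': info[0],              # row number
        '程式碼': numberAndName[0],   # stock code
        '名稱': numberAndName[1],     # name
        '現價': info[1],              # current price
        '漲跌幅(%)': info[2],         # change (%)
        '漲跌': info[3],              # change
        '漲速(%)': info[4],           # speed of change (%)
        '換手(%)': info[5],           # turnover rate (%)
        '量比': info[6],              # volume ratio
        '振幅(%)': info[7],           # amplitude (%)
        '成交額': info[8],            # trading value
        '流通股': info[9],            # float shares
        '流通市值': info[10],         # float market cap
        '市盈率': info[11]            # P/E ratio
    }
    print(dit)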
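The loop above indexes info and numberAndName directly, so a row with fewer cells than expected raises IndexError. A defensive variant, sketched under the assumption (implied by the indices above) that a valid row has at least 12 direct-text cells and 2 link texts, zips the column names with the values and returns None for rows that don't fit. build_row and FIELDS are hypothetical names; the keys are the same ones used above, so the result still matches the CSV fieldnames.

```python
# Column names in the same order as the dict built above
FIELDS = ['序號', '程式碼', '名稱', '現價', '漲跌幅(%)', '漲跌', '漲速(%)',
          '換手(%)', '量比', '振幅(%)', '成交額', '流通股', '流通市值', '市盈率']


def build_row(info, code_and_name):
    """Return one row as a dict keyed by FIELDS, or None if the row is too short.

    Assumes, like the indices above, that info holds at least 12 direct <td>
    texts and code_and_name holds [stock code, name].
    """
    if len(info) < 12 or len(code_and_name) < 2:
        return None
    values = [info[0], code_and_name[0], code_and_name[1], *info[1:12]]
    # Strip surrounding whitespace that text nodes often carry
    return dict(zip(FIELDS, (v.strip() for v in values)))


if __name__ == "__main__":
    print(build_row(["1"] * 5, ["CODE", "NAME"]))  # None: too few cells
```

Inside the loop, dit = build_row(info, numberAndName) followed by a check for None (and continue) would skip malformed rows instead of crashing.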

Saving the data

Save the stock data as CSV

import csv

# Create the file object
f = open('stockInformation.csv', mode='w', encoding='utf-8', newline='')
# DictWriter writes each dict as one row, in fieldnames order
csv_write = csv.DictWriter(f, fieldnames=[
    '序號',
    '程式碼',
    '名稱',
    '現價',
    '漲跌幅(%)',
    '漲跌',
    '漲速(%)',
    '換手(%)',
    '量比',
    '振幅(%)',
    '成交額',
    '流通股',
    '流通市值',
    '市盈率',
])
# Write the header row
csv_write.writeheader()

......

for i in data:
    ......
    # Write the row
    csv_write.writerow(dit)
......
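The snippet above does not show the file being closed. Below is a sketch of the same DictWriter approach inside a with block, which closes the file even if an exception interrupts the loop. save_rows is a hypothetical helper; utf-8-sig (UTF-8 with a byte-order mark) is swapped in for utf-8 because it helps Excel display the Chinese column names, while plain utf-8 is fine if you only read the file from Python.

```python
import csv
import os
import tempfile


def save_rows(path, rows, fieldnames):
    """Write a list of dicts to a CSV file.

    The with-block closes the file even if an exception is raised mid-write;
    'utf-8-sig' writes a BOM so Excel recognises the file as UTF-8.
    """
    with open(path, mode="w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "stockInformation.csv")
    save_rows(path, [{"序號": "1", "名稱": "demo"}], ["序號", "名稱"])
    with open(path, encoding="utf-8-sig", newline="") as f:
        print(list(csv.DictReader(f)))
```

In the scraper, you would collect the dit dicts into a list and pass it as rows, with the 14 column names above as fieldnames.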


Paginated scraping

Analyze how the request URL changes from page to page:

https://q.10jqka.com.cn/index/index/board/all/field/zdf/order/desc/page/1/ajax/1/

https://q.10jqka.com.cn/index/index/board/all/field/zdf/order/desc/page/2/ajax/1/

https://q.10jqka.com.cn/index/index/board/all/field/zdf/order/desc/page/3/ajax/1/
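Only the number after /page/ changes, so the URL can be generated from a template. A minimal sketch; page_url is a hypothetical name, and the rest of the path is copied unchanged from the URLs above.

```python
# Path copied from the URLs above; only the page number is filled in
URL_TEMPLATE = ("https://q.10jqka.com.cn/index/index/board/all/field/zdf/"
                "order/desc/page/{page}/ajax/1/")


def page_url(page):
    """Build the list URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return URL_TEMPLATE.format(page=page)


if __name__ == "__main__":
    for page in range(1, 4):
        print(page_url(page))
```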

Add pagination:

# range(1, 3) yields pages 1 and 2
for page in range(1, 3):
    print(f'Collecting page {page}')
    url = f'https://q.10jqka.com.cn/index/index/board/all/field/zdf/order/desc/page/{page}/ajax/1/'

    ......

        print(dit)
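Because range(1, 3) stops before 3, the loop above fetches pages 1 and 2 only. The sketch below makes the page range inclusive, waits between requests, and stops early when a page comes back empty (for instance past the last page, or when the anti-scraping stub is returned instead of the table). fetch_rows is a hypothetical callback that would wrap the request and parsing code above; the 1-second default delay is an arbitrary choice, not a limit published by the site.

```python
import time
from typing import Callable, Iterator, List


def crawl_pages(fetch_rows: Callable[[int], List], first: int = 1,
                last: int = 2, delay: float = 1.0) -> Iterator:
    """Yield rows from pages first..last (both inclusive).

    fetch_rows(page) is a hypothetical callback that requests one page and
    returns its parsed rows (for example, a list of the dicts built above).
    """
    for page in range(first, last + 1):
        print(f"Collecting page {page}")
        rows = fetch_rows(page)
        if not rows:
            # Empty page: past the end, or the stub page came back; stop here
            break
        yield from rows
        if page < last:
            # Pause between requests so the crawl doesn't hammer the server
            time.sleep(delay)


if __name__ == "__main__":
    # Offline demo with a fake fetcher: page 3 is empty, so the crawl stops there
    fake_pages = {1: [{"序號": "1"}], 2: [{"序號": "2"}]}
    for row in crawl_pages(lambda p: fake_pages.get(p, []), first=1, last=5, delay=0):
        print(row)
```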


The target site's cookies change over time, so you can use requests.Session() to manage cookies automatically:

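import requests

with requests.Session() as session:
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
        # No 'Cookie' header: the Session stores cookies returned by the server and resends them
        'Referer': 'https://q.10jqka.com.cn/'
    }
    # Send these headers with every request made through the session
    session.headers.update(headers)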
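Below, the session setup is packaged as a reusable function, with an offline check that a cookie stored in session.cookies and the shared headers are merged into each request. make_session is a hypothetical helper, and the "v" cookie value is a placeholder. Note that a Session only keeps cookies the server sends in Set-Cookie headers; a cookie computed in the browser by JavaScript (which the chameleon script in the stub page may do) is not generated by requests and would still have to be supplied separately.

```python
import requests

# Same headers as above; no 'Cookie' entry, the session's cookie jar handles cookies
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"),
    "Referer": "https://q.10jqka.com.cn/",
}


def make_session():
    """Return a Session that sends HEADERS on every request and keeps
    cookies set by the server (Set-Cookie) for subsequent requests."""
    session = requests.Session()
    session.headers.update(HEADERS)
    return session


if __name__ == "__main__":
    url = "https://q.10jqka.com.cn/index/index/board/all/field/zdf/order/desc/page/1/ajax/1/"
    with make_session() as session:
        # Offline check (no network): a cookie in the jar is merged into the prepared request
        session.cookies.set("v", "placeholder")
        prepared = session.prepare_request(requests.Request("GET", url))
        print(prepared.headers["Cookie"])
        print(prepared.headers["Referer"])
        # A real crawl would then call session.get(url) for each page
```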
