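Stock prediction with PyTorch
1. Overview of stock prediction
I split stock prediction into two main parts: model training and model prediction. Model training is further divided into three steps: data loading, feature selection, and model training.
- Model training
  - Data loading
  - Feature selection
  - Model training
- Model prediction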
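2. Model training
2.1 Data loading
Model training: the idea is to use a stock's data from the previous n days to predict whether its price will be higher or lower five days later. I chose a five-day horizon because I don't think a buying decision should rest on the next day's price change alone.
I use the following conventions:
Up: the low on the target day (five days later) is higher than the high five days earlier
Down: the high on the target day (five days later) is lower than the low five days earlier
Flat: every other case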
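The three-way rule above can be sketched as a small function. This is only an illustration, not the author's code: the name `label_move` and its arguments are hypothetical. It takes the low and high of the target day and the low and high of the day five trading days earlier, and returns the integer codes used later in the article (2 = up, 1 = flat, 0 = down).

```python
def label_move(pred_low, pred_high, before_low, before_high):
    """Return 2 (up), 0 (down) or 1 (flat) for one pair of trading days."""
    if pred_low > before_high:
        # Even the lowest price on the target day beats the earlier high
        return 2
    if pred_high < before_low:
        # Even the highest price on the target day is under the earlier low
        return 0
    # The two price ranges overlap
    return 1


print(label_move(10.6, 11.0, 10.0, 10.5))
```

Because both conditions compare a whole day's range against another day's range, overlapping ranges fall into the "flat" class.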
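Stock data collection: the data comes from baostock, http://baostock.com/baostock/index.php/
Data processing: I set days_to_train=20 and days_to_pred=5, meaning that 20 days of stock data are used to predict the price movement on the 25th day. Columns that are not needed are dropped during processing.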
import baostock as bs
import pandas as pd


def get_data(stock_id='sh.600000', days_to_train=20, days_to_pred=5,
             start_data='2019-12-15', end_date='2020-12-15'):
    # Use days_to_train days of data to predict the move days_to_pred days later
    # (defaults: 20 and 5)
    # Log in to baostock
    lg = bs.login()
    # Show the login response
    print('login respond error_code:' + lg.error_code)
    print('login respond error_msg:' + lg.error_msg)
    # Fetch historical daily K-line data for Shanghai/Shenzhen A-shares
    # Parameter reference: http://baostock.com/baostock/index.php/Python_API%E6%96%87%E6%A1%A3#.E8.8E.B7.E5.8F.96.E5.8E.86.E5.8F.B2A.E8.82.A1K.E7.BA.BF.E6.95.B0.E6.8D.AE.EF.BC.9Aquery_history_k_data_plus.28.29
    rs = bs.query_history_k_data_plus(stock_id,
                                      "date,code,open,high,low,close,preclose,volume,amount,turn,tradestatus,pctChg,peTTM,pbMRQ,psTTM,pcfNcfTTM,isST",
                                      start_date=start_data, end_date=end_date,
                                      frequency="d", adjustflag="1")
    print('query_history_k_data_plus respond error_code:' + rs.error_code)
    print('query_history_k_data_plus respond error_msg:' + rs.error_msg)
    # Collect the result set row by row
    data_list = []
    while rs.error_code == '0' and rs.next():
        data_list.append(rs.get_row_data())
    result = pd.DataFrame(data_list, columns=rs.fields)
    # Log out
    bs.logout()
    # Drop date, code and isST, which leaves the 14 feature columns.
    # The rows come back as strings, so convert them to numbers;
    # otherwise the comparisons below would compare text, not prices.
    columns_need = result.columns[2:-1]
    data_need = result[columns_need].apply(pd.to_numeric, errors='coerce')
    column_low = 'low'
    column_high = 'high'
    # labels records whether the stock went up or down after five days
    # up: 2
    # flat: 1
    # down: 0
    labels = []
    # train_data holds the input window that belongs to each label
    train_data = []
    for day in data_need.sort_index(ascending=False).index:
        day_pred_low = data_need.loc[day, column_low]
        day_pred_high = data_need.loc[day, column_high]
        # Skip days that do not have a full window in front of them
        if day - days_to_train - days_to_pred + 1 >= 0:
            day_before_low = data_need.loc[day - days_to_pred, column_low]
            day_before_high = data_need.loc[day - days_to_pred, column_high]
            if day_pred_low > day_before_high:
                labels.append(2)
            elif day_pred_high < day_before_low:
                labels.append(0)
            else:
                labels.append(1)
            # .loc slicing is inclusive, so this window has days_to_train rows
            train_data.append(data_need.loc[day - days_to_pred - days_to_train + 1:day - days_to_pred])
    return train_data, labels
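The windowing that `get_data` performs can be illustrated on synthetic data. The following is a minimal sketch rather than the author's code: `make_windows` is a hypothetical helper, it works on a plain NumPy array instead of baostock data, and it walks forward in time whereas `get_data` walks backward.

```python
import numpy as np


def make_windows(values, days_to_train=20, days_to_pred=5):
    """Pair each window of `days_to_train` rows with the index of its target day."""
    windows, target_idx = [], []
    # The first usable target is the (days_to_train + days_to_pred)-th row
    for day in range(days_to_train + days_to_pred - 1, len(values)):
        end = day - days_to_pred          # last row of the input window
        start = end - days_to_train + 1   # first row of the input window
        windows.append(values[start:end + 1])
        target_idx.append(day)
    return np.array(windows), target_idx


prices = np.arange(30, dtype=float).reshape(30, 1)  # 30 fake days, 1 feature
X, targets = make_windows(prices)
print(X.shape, targets[0], targets[-1])
```

With the default settings the first window covers rows 0 to 19 and its target is row 24, i.e. the 25th day, which matches the description of days_to_train=20 and days_to_pred=5.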
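2.2 Feature extraction
There is no real feature extraction here. I think that even hand-crafted features of the price series, such as the maximum, the minimum, or the slope, cannot accurately capture how the price moves, so I leave feature extraction to the network and use a recurrent layer designed for time series, such as LSTM or GRU. The only preprocessing I do is normalization: every feature in each time series is standardized, that is, each column of each window is scaled separately.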
from sklearn import preprocessing
import numpy as np
from model_train_network.get_stock_data_1 import get_data


def norm_data(data):
    # Standardize every window on its own, column by column
    data_norm = []
    scaler = preprocessing.StandardScaler()
    # Stack the windows into one (samples, days, features) float array
    data = np.array(data, dtype=float)
    for i in range(data.shape[0]):
        data_norm.append(scaler.fit_transform(data[i]))
    return data_norm
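To see what this per-window scaling does to one sample, here is a NumPy-only sketch. It is an illustration under stated assumptions, not the author's code: `standardize_window` is a hypothetical name, and it mirrors how `StandardScaler` behaves by default (population standard deviation, constant columns mapped to zeros).

```python
import numpy as np


def standardize_window(window):
    """Scale each column of one (days, features) window to mean 0 and std 1."""
    window = np.asarray(window, dtype=float)
    mean = window.mean(axis=0)
    std = window.std(axis=0)   # population std (ddof=0)
    std[std == 0] = 1.0        # constant columns become zeros instead of NaN
    return (window - mean) / std


window = np.array([[1.0, 10.0, 5.0],
                   [2.0, 20.0, 5.0],
                   [3.0, 30.0, 5.0]])
scaled = standardize_window(window)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Since every window is scaled with its own statistics, absolute price levels are discarded and only the shape of each series inside the window remains. A column that never changes within a window, such as the third one here, carries no information after scaling.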
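2.3 Model training
The network used here is a GRU followed by an activation and two linear layers.
GRUNetV2: the neural network (GRU + activation + two linear layers)
split_data: splits the data into a training set and a test set
train: the training function
test: the test function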
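To make the tensor shapes in this architecture concrete, here is a minimal standalone sketch. It is not the author's `GRUNetV2`: the layer sizes follow the article (14 features, hidden size 64, 3 classes, batch size 8, 20-day windows), but it uses a single GRU layer, random inputs and random targets purely for illustration.

```python
import torch

batch, seq_len, features, hidden, classes = 8, 20, 14, 64, 3

gru = torch.nn.GRU(features, hidden, num_layers=1, batch_first=True)
head = torch.nn.Sequential(
    torch.nn.ReLU(),
    torch.nn.Linear(hidden, 32),
    torch.nn.Linear(32, classes),
)

x = torch.randn(batch, seq_len, features)
# output: (batch, seq_len, hidden), h_n: (num_layers, batch, hidden)
output, h_n = gru(x)
# Classify from the last time step only -> (batch, classes)
logits = head(output[:, -1, :])

targets = torch.randint(0, classes, (batch,))
# CrossEntropyLoss expects raw scores (logits), not softmax probabilities
loss = torch.nn.CrossEntropyLoss()(logits, targets)
print(output.shape, h_n.shape, logits.shape, loss.item())
```

`torch.nn.CrossEntropyLoss` applies log-softmax internally, so the head ends with a plain linear layer. The predicted class is still the index of the largest logit, for example `logits.argmax(dim=1)`.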
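A note on net.train() and net.eval()
The code also runs without these two calls. They exist for layers that behave differently during training and testing, such as Batch Normalization and Dropout.
Batch Normalization: during training, each mini-batch is normalized with its own mean and variance, and running estimates of both are accumulated. At test time the input is often a single sample (a single image, for example), so there is no mini-batch to compute statistics from. Because the parameters are fixed once training has finished, the accumulated mean and variance are used instead. This is why Batch Normalization behaves differently in training and testing.
Dropout: during training, the output of each hidden unit is randomly set to zero with probability p; at test time all units are kept. In the classic formulation the test-time outputs are multiplied by the keep probability to compensate. PyTorch instead scales the surviving outputs by 1/(1-p) during training and leaves them unchanged in eval mode. The dropout argument of torch.nn.GRU follows the same switch.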
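The effect of the two modes can be checked directly on a `torch.nn.Dropout` layer. This is a small illustrative sketch with made-up input (a vector of ones and p = 0.5), not part of the author's training script.

```python
import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()        # training mode: elements are randomly zeroed
y_train = drop(x)   # survivors are scaled by 1 / (1 - p) = 2.0

drop.eval()         # evaluation mode: dropout does nothing
y_eval = drop(x)

print(sorted(set(y_train.tolist())))
print(torch.equal(y_eval, x))
```

Calling `net.train()` or `net.eval()` on a model sets this mode on every submodule at once, which is why `train` and `test` each start with one of those calls.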
# -*- coding:utf-8 -*-
import os
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch.utils import data
from sklearn.model_selection import train_test_split
from model_train_network.get_stock_data_1 import get_data
from model_train_network.select_feature_stock_data_2 import norm_data


# Background on the GRU layout: https://blog.csdn.net/qq_27825451/article/details/99691258
class GRUNetV2(torch.nn.Module):
    def __init__(self, input_dim, hidden_size, out_size, drop_out, n_layers=1):
        super(GRUNetV2, self).__init__()
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.out_size = out_size
        self.drop_out = drop_out
        # batch_first=True, so the batch must be the first dimension of the input
        self.gru = torch.nn.Sequential(
            torch.nn.GRU(input_dim, hidden_size, n_layers, dropout=drop_out, batch_first=True),
        )
        self.relu = torch.nn.ReLU()
        # First fully connected layer
        self.fc1 = torch.nn.Linear(hidden_size, 32)
        # Second fully connected layer
        self.fc2 = torch.nn.Linear(32, out_size)

    def forward(self, x):
        # x has shape (batch, seq, feature)
        # output holds the top layer's hidden state at every time step;
        # hidden holds the final hidden state of every layer
        output, hidden = self.gru(x)
        # Keep only the last vector along the time dimension
        output = output[:, -1, :]
        output = self.relu(output)
        output = self.fc1(output)
        output = self.fc2(output)
        # Return raw logits. CrossEntropyLoss applies log-softmax itself, and the
        # original Softmax(dim=1) on a 3-D tensor normalized over time steps,
        # not over classes.
        return output


feature_dim = 14
hidden_size = 64
output_dim = 3
num_layers = 3
drop_out_gru = 0.3
# hyper parameters
BATCH_SIZE = 8  # batch_size
LEARNING_RATE = 0.001  # learning_rate
EPOCH = 600  # epochs
net = GRUNetV2(feature_dim, hidden_size, output_dim, drop_out_gru, num_layers)
net = net.to('cpu')
net = net.double()
print(net)
optimizer = torch.optim.Adam(net.parameters(), lr=LEARNING_RATE, betas=(0.8, 0.8))
loss_func = torch.nn.CrossEntropyLoss()


def split_data(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)
    # Stack the per-window arrays first; this is much faster than converting a list
    X_train = torch.tensor(np.array(X_train))
    X_test = torch.tensor(np.array(X_test))
    y_train = torch.tensor(y_train)
    y_test = torch.tensor(y_test)
    torch_train_dataset = data.TensorDataset(X_train, y_train)
    torch_test_dataset = data.TensorDataset(X_test, y_test)
    trainloader = data.DataLoader(
        dataset=torch_train_dataset,
        batch_size=BATCH_SIZE,
        shuffle=True,
        # num_workers=2
    )
    testloader = data.DataLoader(
        dataset=torch_test_dataset,
        batch_size=BATCH_SIZE,
        shuffle=False,  # order does not matter for evaluation
        # num_workers=2
    )
    return trainloader, testloader


train_loss = []
train_acc = []


# Training
def train(epoch, trainloader):
    global train_acc, train_loss
    print('\n train Epoch: %d' % epoch)
    net.train()
    train_loss_tmp = 0
    train_loss_avg = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        inputs, targets = inputs.to('cpu'), targets.to('cpu')
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = loss_func(outputs, targets)
        loss.backward()
        optimizer.step()
        train_loss_tmp += loss.item()
        _, predicted = torch.max(outputs, 1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
        train_loss_avg = train_loss_tmp / (batch_idx + 1)
        print(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
              % (train_loss_avg, 100. * correct / total, correct, total))
    train_loss.append(train_loss_avg)
    train_acc.append(100. * correct / total)
    print('\n -----train Epoch Over: %d------\n' % epoch)
    print(len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
          % (train_loss_avg, 100. * correct / total, correct, total))


test_acc = []
test_loss = []
best_acc = 0
best_acc_tmp = 0


def test(epoch, testloader):
    print('\n test Epoch: %d' % epoch)
    global test_acc, test_loss, best_acc_tmp
    net.eval()
    test_loss_tmp = 0
    test_loss_avg = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            inputs, targets = inputs.to('cpu'), targets.to('cpu')
            outputs = net(inputs)
            loss = loss_func(outputs, targets)
            test_loss_tmp += loss.item()
            _, predicted = torch.max(outputs, 1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
            test_loss_avg = test_loss_tmp / (batch_idx + 1)
            print(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                  % (test_loss_avg, 100. * correct / total, correct, total))
    test_loss.append(test_loss_avg)
    test_acc.append(100. * correct / total)
    best_acc_tmp = max(test_acc)
    print('\n -----test Epoch Over: %d------\n' % epoch)
    print(len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
          % (test_loss_avg, 100. * correct / total, correct, total))


if __name__ == '__main__':
    # Make sure the output folders exist
    for folder in ('../result_acc', '../result_model'):
        os.makedirs(folder, exist_ok=True)
    # Try several window lengths. net and optimizer are created once above,
    # so each window length continues from the previous one's weights.
    for j in [12, 13, 14, 15]:
        train_loss = []
        train_acc = []
        test_acc = []
        test_loss = []
        data_colums = ['train_acc', 'train_loss', 'test_acc', 'test_loss']
        data_train, labels = get_data('sz.000651', j, 5)
        data_train = norm_data(data_train)
        train_loader, test_loader = split_data(data_train, labels)
        for i in range(EPOCH):
            train(i, train_loader)
            test(i, test_loader)
            data_result = np.stack((train_acc, train_loss, test_acc, test_loss), axis=1)
            print(data_result)
            if i == 0:
                data_result = pd.Series(data_result.squeeze(), index=data_colums)
            else:
                data_result = pd.DataFrame(data_result, columns=data_colums)
            data_result.to_csv('../result_acc/result.csv')
            # Save the weights whenever the test accuracy reaches a new best
            if best_acc_tmp > best_acc:
                best_acc = best_acc_tmp
                data_best = pd.Series((best_acc, j))
                data_best.to_csv('../result_acc/best.csv')
                torch.save(net.state_dict(), '../result_model/params_000651.pkl')
        # Data for plotting
        t = np.arange(EPOCH)
        fig, ax = plt.subplots()
        ax.plot(t, train_acc, t, test_acc)
        ax.set(xlabel='Epoch', ylabel='Accuracy (%)',
               title='Training accuracy')
        fig.savefig("../result_acc/acc_test" + str(j) + ".png")
        # plt.show()
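3. Model prediction
For prediction I take the stock data from n days ago up to today (40 rows of stock data), which also makes it possible to check how accurate the predictions were over the recent period.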
import re
import datetime
import pandas as pd
import numpy as np
import torch
from model_train_network.train_3 import GRUNetV2
from model_train_network.get_stock_data_1 import get_data_pred
from model_train_network.select_feature_stock_data_2 import norm_data

feature_dim = 14
hidden_size = 64
output_dim = 3
num_layers = 3
drop_out_gru = 0.3
net = GRUNetV2(feature_dim, hidden_size, output_dim, drop_out_gru, num_layers)
# This path must point to a state dict written by the training script
net.load_state_dict(torch.load('../result_model/params_000001.pkl'))
net = net.to('cpu')
net = net.double()
net.eval()
print(net)


def model_pred(stock_id, days_to_train):
    # e.g. stock_id = 'sh.600519', days_to_train = 17
    # Collect the 40 most recent weekdays, newest first.
    # weekday(): 0 is Monday, 1 is Tuesday, ..., 6 is Sunday
    num = 40
    today = datetime.date.today()
    date_list = []
    day = today
    while len(date_list) < num:
        if day.weekday() < 5:
            date_list.append(day.strftime('%Y-%m-%d'))
        # timedelta handles month and year boundaries correctly
        day -= datetime.timedelta(days=1)
    end_data = today.strftime('%Y-%m-%d')
    start_data = date_list[-1]
    # get_data_pred is the author's helper; its code is not shown in this article
    data_train, labels = get_data_pred(stock_id, days_to_train, 5, start_data=start_data, end_date=end_data)
    data_train = norm_data(data_train)
    inputs = torch.tensor(np.array(data_train))
    with torch.no_grad():
        outputs = net(inputs)
    _, predicted = torch.max(outputs, 1)
    labels = torch.Tensor(labels)
    print('------- Predictions -------')
    print(predicted)
    total = labels.shape[0]
    correct = predicted.eq(labels).sum().item()
    print(100. * correct / total, '(%d/%d)' % (correct, total))
    # Append the accuracy as the last entry of the saved series
    predicted = np.append(predicted.numpy(), 100. * correct / total)
    # Keep as many dates as there are predictions; this assumes get_data_pred
    # returns num - days_to_train + 2 samples
    date_list = date_list[0:num - days_to_train + 2]
    date_list.append('correct')
    pd.Series(predicted, index=date_list).to_csv(
        '../result_up_down/result_' + str(re.findall(r'\d+', stock_id)[0]) + '.csv')