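Applying RNNs to Stock Price Prediction
Introduction
RNNs and LSTMs perform particularly well on temporal data, which is why they are effective in speech recognition. We use the open, high, low, and close prices of the previous 25 days to predict the close price of the next period. Each time series is normalized to a Gaussian distribution, and a 2-layer LSTM model is trained on the data. The article has two parts: forex price prediction and intraday price prediction (30-minute, 15-minute, 5-minute bars, and so on).
Forex prediction
a. Daily data is pulled from Yahoo's Data Reader.
b. Only the training set is preprocessed, because a separate test set is created later on.
c. "model_forex" is the model to build and train.
d. Create a separate daily test set by specifying dates that start after the training set ends.
e. "model_forex" is plugged in here to run the prediction: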
predicted_st = predict_standard(X_test_stock,y_test_stock, model_forex)
Intraday prediction
a. Intraday data is pulled from Google's API. The second argument is the bar period in seconds (900 secs = 15 mins) and the third argument is the number of days. The maximum lookback for Google's API is, I believe, 15 days.
df = get_google_data(INTRA_DAY_TICKER, 900, 150)
b. Preprocess the full set of data and split it into training and test sets with "train_test_split_intra".
c. "model_intra" is the model to build and train.
d. "model_intra" is plugged in here to run the prediction:
predicted_intra = predict_intra(X_test_intra,y_test_intra, model_intra)
Code walkthrough
import urllib2
from bs4 import BeautifulSoup

SITE = "http://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

def scrape_list(site):
    # Build a dict mapping each sector name to the list of its tickers
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)
    table = soup.find('table', {'class': 'wikitable sortable'})
    sector_tickers = dict()
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if len(col) > 0:
            sector = str(col[3].string.strip()).lower().replace(' ', '_')
            ticker = str(col[0].string.strip())
            if sector not in sector_tickers:
                sector_tickers[sector] = list()
            sector_tickers[sector].append(ticker)
    return sector_tickers

sector_tickers = scrape_list(SITE)
## Helper functions to normalize and denormalize values
(omitted)
# Sequence length, i.e. number of trading days per input window
SEQ_LENGTH = 25
# Number of units in the two hidden (LSTM) layers
N_HIDDEN = 256
# Number of attributes used for each trading day
num_attr = 4
# Of those attributes, how many are indicators
num_indicators = 0
# Defines how far ahead the target y reaches
REWARD_LAG = 1
# How many days ahead to predict
LOOK_AHEAD = 5
# Window stride
STRIDE = 1
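Since the normalization helpers are omitted from the listing, here is a minimal sketch of what Gaussian (z-score) normalization of a single window could look like. The names `norm_dist` and `denorm_dist` are hypothetical stand-ins, not the author's functions, and the choice of one mean and one standard deviation per window is an assumption.

```python
import math

def norm_dist(window):
    """Standardize a window (list of rows) to zero mean and unit variance.

    Hypothetical stand-in for the omitted helper: a single mean and standard
    deviation are computed over every value in the window (an assumption).
    """
    values = [v for row in window for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for a constant window
    normed = [[(v - mean) / std for v in row] for row in window]
    return normed, mean, std

def denorm_dist(value, mean, std):
    """Map a normalized value back to the original price scale."""
    return value * std + mean

# Two days of made-up open/high/low/close prices
window = [[1.10, 1.12, 1.09, 1.11],
          [1.11, 1.13, 1.10, 1.12]]
normed, mean, std = norm_dist(window)
restored = denorm_dist(normed[0][0], mean, std)
```

Returning the mean and standard deviation alongside the normalized window is what lets a prediction made in normalized space be mapped back to a price afterwards.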
def _load_data(data, n_prev = SEQ_LENGTH):
    docX, docY = [], []
    for i in range(len(data)-n_prev):
        x,y = norm(data.iloc[i:i+n_prev,:num_attr].as_matrix(),data.iloc[i+n_prev-1,num_attr:].as_matrix())
        docX.append(x)
        docY.append(y)
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY

def _load_data_test(data, n_prev = SEQ_LENGTH):
    docX, docY = [], []
    num_sequences = (len(data)-n_prev+1)/STRIDE
    for i in range(num_sequences):
        i = i*STRIDE
        x = (data.iloc[i:i+n_prev,:num_attr].as_matrix())
        y = (data.iloc[i+n_prev-1,num_attr:].as_matrix())
        #x,y = norm(data.iloc[i:i+n_prev,:num_attr].as_matrix(),data.iloc[i+n_prev-1,num_attr:].as_matrix())
        docX.append(x)
        docY.append(y)
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY

def _load_data_norm(data, n_prev = SEQ_LENGTH):
    docX, docY = [], []
    for i in range(len(data)-n_prev):
        x = np.array((data.iloc[i:i+n_prev,:num_attr].as_matrix()))
        y = np.array((data.iloc[i+n_prev-1,num_attr:].as_matrix()))
(omitted)
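The windowing performed by the loaders can be illustrated without pandas. The sketch below uses plain lists and a hypothetical `make_windows` function; like `_load_data_test`, it takes the target from the last row of each window and advances by a stride, but it skips normalization.

```python
def make_windows(rows, targets, seq_length=25, stride=1):
    """Slice rows into overlapping windows of seq_length rows each.

    The target of a window is the target aligned with its last row,
    mirroring the i + n_prev - 1 index used in the loaders.
    """
    xs, ys = [], []
    for start in range(0, len(rows) - seq_length + 1, stride):
        end = start + seq_length
        xs.append(rows[start:end])
        ys.append(targets[end - 1])
    return xs, ys

# 30 made-up trading days with 4 attributes each; the target is the next day's value
rows = [[float(i)] * 4 for i in range(30)]
targets = [float(i + 1) for i in range(30)]
xs, ys = make_windows(rows, targets)
```

With 30 rows and a window of 25, this yields 6 windows, and each window has the (SEQ_LENGTH, num_attr) shape that the LSTM input layer expects.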
Forex data
## Dataset on just a single ticker to test performance
df = data.DataReader('EUR=X', 'yahoo', datetime(2010,8,1), datetime(2014,8,1))
# df['RSI'] = ta.RSI(df.Close.values,timeperiod=14)
# _,_, macdhist = ta.MACD(df.Close.values, fastperiod=12, slowperiod=26, signalperiod=9)
# df['MACDHist'] = macdhist
## Add the predicted column Y as the last column; it can be defined however you think best represents a good decision
## Clean the rest of the DataFrame
y = []
for i in range(0,len(df)):
    if i >= (len(df)- STRIDE):
        y.append(None)
    else:
        if (REWARD_LAG > 1):
            val = 0
            for n in range(REWARD_LAG):
                val = val + df['Close'][i+n+1]
            val = val / float(REWARD_LAG)
            y.append(val)
        else:
            y.append(df['Close'][i+REWARD_LAG])
df['Y_Values'] = np.asarray(y)
df = df.dropna()
#print (df)
sliced_df = df.drop(['Adj Close','Volume'], axis=1)
#print (sliced_df)
#(X_train, y_train), (X_test, y_test) = train_test_split(sliced_df)
(X_train, y_train) = train_test_split(sliced_df)
print(X_train[0],y_train[0])
print (X_train.shape,y_train.shape)
(array([[-0.76244909, -0.75153814, -1.36800657, -1.28695383], [-1.28305706, -1.17005084, -1.66649887, -1.50673145],
(omitted)
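The target column built in the loop above is the next close when REWARD_LAG is 1, and the mean of the next REWARD_LAG closes otherwise. A minimal sketch of the same rule on a plain list follows; `build_targets` is a hypothetical name, and the trailing rows that have no future data are marked None, which plays the role of the rows later removed by `dropna()`.

```python
def build_targets(closes, reward_lag=1):
    """Return, for each index, the mean of the next reward_lag closes.

    Rows too close to the end of the series get None because there is
    not enough future data to average.
    """
    targets = []
    for i in range(len(closes)):
        if i >= len(closes) - reward_lag:
            targets.append(None)
        else:
            future = closes[i + 1:i + 1 + reward_lag]
            targets.append(sum(future) / float(reward_lag))
    return targets

closes = [1.0, 2.0, 3.0, 4.0]
next_close = build_targets(closes, reward_lag=1)
two_step_mean = build_targets(closes, reward_lag=2)
```

Here `next_close` is `[2.0, 3.0, 4.0, None]` and `two_step_mean` is `[2.5, 3.5, None, None]`.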
Intraday data
def get_google_data(symbol, period, window):
    url_root = 'http://www.google.com/finance/getprices?i='
    url_root += str(period) + '&p=' + str(window)
    url_root += 'd&f=d,o,h,l,c,v&df=cpct&q=' + symbol
    print(url_root)
    response = urllib2.urlopen(url_root)
    data = response.read().split('\n')
    #actual data starts at index = 7
    #first line contains full timestamp,
    #every other line is offset of period from timestamp
    parsed_data = []
    anchor_stamp = ''
    end = len(data)
    for i in range(7, end):
        cdata = data[i].split(',')
        if 'a' in cdata[0]:
            #first one record anchor timestamp
            anchor_stamp = cdata[0].replace('a', '')
            cts = int(anchor_stamp)
        else:
            try:
                coffset = int(cdata[0])
                cts = int(anchor_stamp) + (coffset * period)
                parsed_data.append((dt.datetime.fromtimestamp(float(cts)), float(cdata[1]), float(cdata[2]), float(cdata[3]), float(cdata[4]), float(cdata[5])))
            except:
                pass # for time zone offsets thrown into data
    df = pd.DataFrame(parsed_data)
    df.columns = ['ts', 'Open', 'High', 'Low', 'Close', 'Volume']
    df.index = df.ts
    del df['ts']
    return df
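The timestamp scheme the parser relies on (an anchor row prefixed with `a`, followed by rows whose first field is an offset in units of the bar period) can be exercised offline. The sketch below is an illustration only: `parse_intraday` is a hypothetical function, the sample lines are made up, and unlike the listing it also keeps the anchor row itself.

```python
from datetime import datetime, timezone

def parse_intraday(lines, period):
    """Turn anchor/offset rows into (timestamp, v1..v5) tuples.

    A row starting with 'a' carries a full Unix timestamp; later rows
    carry an integer offset that is multiplied by the bar period.
    """
    rows = []
    anchor = None
    for line in lines:
        fields = line.split(',')
        stamp = fields[0]
        if stamp.startswith('a'):
            anchor = int(stamp[1:])
            ts = anchor
        elif stamp.isdigit() and anchor is not None:
            ts = anchor + int(stamp) * period
        else:
            continue  # header or time zone offset line
        try:
            values = [float(f) for f in fields[1:6]]
        except ValueError:
            continue  # skip rows with non-numeric fields
        if len(values) != 5:
            continue
        rows.append((datetime.fromtimestamp(ts, tz=timezone.utc),) + tuple(values))
    return rows

# Made-up sample in the anchor/offset layout, 15-minute bars
sample = [
    "TIMEZONE_OFFSET=-240",
    "a1438781400,115.0,115.5,114.8,115.2,1000",
    "1,115.2,115.6,115.0,115.4,1200",
    "2,115.4,115.9,115.3,115.8,900",
]
rows = parse_intraday(sample, 900)
```

Checking the row shape explicitly, rather than wrapping everything in a bare `except`, makes it easier to see which lines are being skipped and why.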
Creating a separate intraday dataset
df = get_google_data('AAPL', 900, 150)
#print(df)
plt.plot(df['Close'].values[:])
y = []
for i in range(0,len(df)):
    if i >= (len(df)- REWARD_LAG):
        y.append(None)
    else:
        if (REWARD_LAG > 1):
            val = 0
            for n in range(REWARD_LAG):
                val = val + df['Close'][i+n+1]
            val = val / float(REWARD_LAG)
            y.append(val)
            print('here')
        else:
            y.append(df['Close'][i+REWARD_LAG])
df['Y_Values'] = np.asarray(y)
df = df.dropna()
sliced_df = df.drop(['Volume'], axis=1)
#print(sliced_df)
(X_train, y_train), (X_test, y_test) = train_test_split_intra(sliced_df)
#print(X_train[0],y_train[0])
print(len(X_train),len(X_test))
#print(X_test[0],y_test[0])
(1168, 108)
Building the network architecture
model_intra = Sequential()
model_intra.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh', input_shape=(SEQ_LENGTH, num_attr)))
#model_intra.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh'))
model_intra.add(LSTM(N_HIDDEN, return_sequences=False, activation='tanh'))
model_intra.add(Dense(1,activation='linear'))
model_intra.compile(loss="mean_squared_error", optimizer='adam')
model_intra_full = Sequential()
model_intra_full.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh', input_shape=(SEQ_LENGTH, num_attr)))
#model_intra_full.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh'))
model_intra_full.add(LSTM(N_HIDDEN, return_sequences=False, activation='tanh'))
model_intra_full.add(Dense(1,activation='linear'))
model_intra_full.compile(loss="mean_squared_error", optimizer='adam')
model_forex = Sequential()
model_forex.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh', input_shape=(SEQ_LENGTH, num_attr)))
#model_forex.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh'))
model_forex.add(LSTM(N_HIDDEN, return_sequences=False, activation='tanh'))
model_forex.add(Dense(1,activation='linear'))
model_forex.compile(loss="mean_squared_error", optimizer
Fitting the model
print(X_train.shape)
print(y_train.shape)
(1018, 25, 4) (1018, 1)
model_intra.fit(X_train, y_train, batch_size=50, nb_epoch=
Train on 1156 samples, validate on 12 samples
Epoch 1/150
1156/1156 [==============================] - 1s - loss: 1.9575 - val_loss: 0.5494
Epoch 2/150
1156/1156 [==============================] - 1s - loss: 1.4731 - val_loss: 0.4006
(omitted)
Helper functions for performance evaluation
# Normalize the test input, then denormalize the result. Calculate the RMSE of the predicted values on the test set.
def predict(X_test,y_test, myModel):
    predicted = []
    for example in X_test:
        x = copy.copy(example)
        #print (x)
        x_norm, mn, mx = normalize(x)
        toPred = []
        toPred.append(x_norm)
        add = np.array(toPred)
        #Predict for the standard model
        predict_standard = myModel.predict(add)
        pred_st = copy.copy(predict_standard)
        y_real_st = deNormalizeY(pred_st,mn,mx)
        predicted.append(y_real_st[0])
        #Predict for the bidirectional model
        # predict_bidirectional = myModel.predict([add,add])
        # pred_bi = copy.copy(predict_bidirectional)
        # y_real_bi = deNormalizeY(pred_bi,mn,mx)
        # predicted.append(y_real_bi[0])
(omitted)
df_test = data.DataReader('EUR=X', 'yahoo', datetime(2014,8,1), datetime(2015,8,1))
# df_test['RSI'] = ta.RSI(df_test.Close.values,timeperiod=14)
# _,_, macdhist = ta.MACD(df_test.Close.values, fastperiod=12, slowperiod=26, signalperiod=9)
# df_test['MACDHist'] = macdhist
y = []
for i in range(0,len(df_test)):
    if i >= (len(df_test)- STRIDE):
        y.append(None)
    else:
        if (REWARD_LAG > 1):
            val = 0
            for n in range(REWARD_LAG):
                val = val + df_test['Close'][i+n+1]
            val = val / float(REWARD_LAG)
            y.append(val)
        else:
            y.append(df_test['Close'][i+REWARD_LAG])
(omitted)
MAE for LSTM is: [0.0035823152701196983]
MAE for doing nothing is: [0.0045693478326778786]
RMSE for LSTM is: [0.0050684837061917686]
RMSE for doing nothing is: [0.0061416562709802761]
Net profit for 0.0 threshhold is 245.261025777 making 234 trades
Net profit for 0.001 threshhold is 242.673572498 making 201 trades
(omitted)
Intraday trading evaluation and results
def predict_intra(X_test, y_test, myModel):
    print(len(X_test))
    predicted = []
    for example in X_test:
        #Transform the test example into a Gaussian distribution
        x_norm, mean, std = normDist(np.array(example))
        #Add examples to array to predict
        toPred = []
        toPred.append(x_norm)
        add = np.array(toPred)
        #Predict these examples
        predict_standard = myModel.predict(add)
        pred = copy.copy(predict_standard)
        y_real = deNormDist(pred,mean,std)
        predicted.append(y_real[0])
    return predicted
predicted_intra = predict_intra(X_test,y_test, model_intra)
plt.figure(figsize=(20,20))
plt.plot(y_test)
plt.plot(predicted_intra)
plt.show()
MAE and RMSE evaluation
sum_error = 0
sum_error_donothing = 0
for i in range(len(predicted_intra)):
    if i>0:
        sum_error = sum_error + abs(predicted_intra[i] - y_test[i])
        sum_error_donothing = sum_error_donothing + abs(predicted_intra[i] - y_test[i-1])
MAE_lstm = sum_error/len(predicted_intra)
MAE_donothing = sum_error_donothing/len(predicted_intra)
print("MAE for LSTM is: " + str(MAE_lstm))
print("MAE for doing nothing is: " + str(MAE_donothing))
MAE for LSTM is: [0.091961468484759237]
MAE for doing nothing is: [0.16699238882416201]
sum_error = 0
sum_error_donothing = 0
for i in range(len(predicted_intra)):
    if i>0:
        sum_error = sum_error + (predicted_intra[i] - y_test[i])**2
        sum_error_donothing = sum_error_donothing + (predicted_intra[i] - y_test[i-1])**2
RMSE_lstm = (sum_error/len(predicted_intra))**(1.0/2.0)
RMSE_donothing = (sum_error_donothing/len(predicted_intra))**(1.0/2.0)
print("RMSE for LSTM is: " + str(RMSE_lstm))
print("RMSE for doing nothing is: " + str(RMSE_dono
RMSE for LSTM is: [0.15719269057322682]
RMSE for doing nothing is: [0.23207816758496383]
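For comparison, here is a compact sketch of MAE and RMSE against a persistence ("do nothing") baseline. It is an assumption-laden illustration, not the listing's computation: the baseline forecast for step i is taken to be the previous actual value, so its error is measured between consecutive actual values, whereas the listing measures the distance between the prediction and the previous actual value. The averages here also divide by the number of compared steps. All names and numbers are made up.

```python
import math

def mae(pred, actual):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def rmse(pred, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def persistence_forecast(actual):
    """Forecast each step with the previous actual value (the first step is dropped)."""
    return actual[:-1]

actual = [10.0, 10.5, 10.2, 10.8]
predicted = [10.1, 10.4, 10.3, 10.6]

# Compare both forecasts on the same steps (1..n-1)
model_mae = mae(predicted[1:], actual[1:])
baseline_mae = mae(persistence_forecast(actual), actual[1:])
model_rmse = rmse(predicted[1:], actual[1:])
baseline_rmse = rmse(persistence_forecast(actual), actual[1:])
```

A model is only worth trading on if its error is clearly below that of the persistence baseline on the same steps.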
Evaluating the policy function
net_profits = []
protits_per_trade = []
for i in range(50):
    THRESH = i/10000.0
    LOT_SIZE = 100
    net_profit = 0
    num_trades = 0
    for i in range(len(predicted_intra)):
        if i>1:
            predicted_change = ((predicted_intra[i] / y_test[i-1]) - 1)
            #print(predicted_change)
            actual_change = (predicted_intra[i] - y_test[i])*LOT_SIZE
            if predicted_change >= THRESH:
                #print("Buy")
                net_profit = net_profit + actual_change
                num_trades = num_trades + 1
(omitted)
(array([327.67074597699519], dtype=object), 106)
(array([322.81673063817777], dtype=object), 103)
plt.plot(net_profits)
plt.show()
plt.plot(protits_per_trade)
plt.show()
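The threshold sweep above can be sketched as a small function. This is a simplified illustration under stated assumptions rather than the author's exact policy: a long position is opened whenever the predicted change relative to the previous actual price is at least the threshold, and the realized profit is taken to be the actual price move times the lot size. `evaluate_policy` and the sample prices are hypothetical.

```python
def evaluate_policy(predicted, actual, thresh, lot_size=100):
    """Return (net_profit, num_trades) for a buy-only threshold policy.

    Assumption: profit per trade is the actual move from step i-1 to
    step i, multiplied by the lot size.
    """
    net_profit = 0.0
    num_trades = 0
    for i in range(1, len(predicted)):
        predicted_change = predicted[i] / actual[i - 1] - 1
        if predicted_change >= thresh:
            net_profit += (actual[i] - actual[i - 1]) * lot_size
            num_trades += 1
    return net_profit, num_trades

actual = [100.0, 101.0, 100.5, 102.0]
predicted = [100.0, 100.8, 100.9, 101.5]

loose = evaluate_policy(predicted, actual, thresh=0.005)
strict = evaluate_policy(predicted, actual, thresh=0.009)
```

Raising the threshold trades less often; plotting net profit and profit per trade against the threshold shows where that trade-off stops paying.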
Other
buyTotal = 0
sellTotal = 0
correct = 0
sellCorrect = 0
buyCorrect = 0
for i in range(len(predicted_st)):
    realAnswer = y_test_stock[i][0][0]
    if predicted_st[i][1] > predicted_st[i][0]:
        predicted = 0 #Buy
    else:
        predicted = 1 #Sell
    if realAnswer == 0: ##This is where the actual answer is Buy:Up:[0,1]:0
        buyTotal = buyTotal + 1
        if predicted == realAnswer:
            buyCorrect = buyCorrect + 1
            correct = correct + 1
(omitted)
(349, 730, 0.4780821917808219)
(210, 382, 0.5497382198952879)
(139, 348, 0.3994252873563218)
0.523287671233
0.476712328767
MMM AYI ALK ALLE (omitted)
Creating a baseline RMSE
totalCorrect = 0
total = 0
for stock in testing_dataframes[:50]:
    X_test_stock, y_test_stock = _load_data_test(stock[1])
    predicted_st = predict_standard(X_test_stock,y_test_stock, model)
    buyTotal = 0
    sellTotal = 0
    correct = 0
    sellCorrect = 0
    buyCorrect = 0
(omitted)
#Count the number of positive and the number of negative calls you got right
totalCorrect = 0
total = 0
buyTotal = 0
sellTotal = 0
correct = 0
sellCorrect = 0
buyCorrect = 0
for i in range(len(predicted_st)):
    realAnswer = y_test_stock[i][0][0]
    if predicted_st[i][1] > predicted_st[i][0]:
        predicted = 0 #Buy
(omitted)
(104, 235, 0.4425531914893617)
(104, 104, 1.0)
(0, 131, 0.0)
0.442553191489
0.557446808511
from sklearn.metrics import f1_score
##Calculate F1 score
actual = []
result = []
for y in y_test_merged:
    if y[0] == 0:
        actual.append(0)
    else:
        actual.append(1)
for y in predicted_st:
    if y[1] > y[0]:
        result.append(0)
    else:
        result.append(1)
score = f1_score(actual,result,average='weighted',pos_label=1)
print(score)
0.498192044998
#Same percentage calculations but with a threshold
THRESH = 0.1
totalCorrect = 0
total = 0
noDecision = 0
buyTotal = 0
sellTotal = 0
correct = 0
sellCorrect = 0
buyCorrect = 0
for i in range(len(predicted_st)):
    realAnswer = y_test_merged[i][0]
    if predicted_st[i][1] - THRESH > .5:
        predicted = 0 #Buy
    elif predicted_st[i][0] - THRESH > .5:
        predicted = 1 #Sell
    else:
        predicted = 2 #Pass, do not count towards percentages because you make no decision if .6>x>.4
(omitted)
(347, 750, 0.46266666666666667)
(190, 351, 0.5413105413105413)
(157, 399, 0.39348370927318294)
If you just predicted all Up 0.468
If you just predicted all Down 0.532
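The thresholded decision rule can be isolated in a few lines. In this sketch each prediction is assumed to be a pair of scores ordered as (sell, buy), matching the index convention in the listing where index 1 winning means Buy (label 0). `thresholded_accuracy` and the sample scores are hypothetical.

```python
def thresholded_accuracy(probs, labels, thresh):
    """Count correct calls among the predictions confident enough to act on.

    A call is made only when one score exceeds 0.5 by more than thresh;
    everything else is a pass and is excluded from the totals.
    """
    correct = 0
    decided = 0
    for (p_sell, p_buy), label in zip(probs, labels):
        if p_buy - thresh > 0.5:
            call = 0  # Buy
        elif p_sell - thresh > 0.5:
            call = 1  # Sell
        else:
            continue  # Pass: no decision made
        decided += 1
        if call == label:
            correct += 1
    return correct, decided

probs = [(0.3, 0.7), (0.45, 0.55), (0.8, 0.2), (0.65, 0.35)]
labels = [0, 1, 1, 0]

with_margin = thresholded_accuracy(probs, labels, 0.1)
no_margin = thresholded_accuracy(probs, labels, 0.0)
```

With a margin of 0.1 the low-confidence second example is skipped, so 3 calls are made instead of 4.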
thresholds = []
totalAcc = []
positiveAcc = []
negativeAcc = []
## Plot accuracy against the threshold
for i in range(10):
    thresh = i/100.0
    totalCorrect = 0
    total = 0
    noDecision = 0
    buyTotal = 0
    sellTotal = 0
    correct = 0
    sellCorrect = 0
    buyCorrect = 0
    for i in range(len(predicted_st)):
        realAnswer = y_test_merged[i][0]
        if predicted_st[i][1] - thresh > .5:
            predicted = 0 #Buy
        elif predicted_st[i][0] - thresh > .5:
            predicted = 1 #Sell
(omitted)
plt.plot(totalAcc)
plt.show()
plt.plot(positiveAcc)
plt.show()
plt.plot(negativeAcc)
plt.show()
The tests show that, for daily price prediction, forex performs better than conventional stocks, because forex data is less noisy.