TensorFlow Study Notes (13): A Super-Long yet Concise Tutorial on Building LSTMs
Copyright notice: this is an original article by the author; do not reproduce without permission. https://blog.csdn.net/xierhacker/article/details/78772560
References:
Module: tf.contrib.cudnn_rnn
Module: tf.contrib.rnn
Updates:
2017.12.25
Added tf.nn.embedding_lookup for embeddings.
2018.1.14
Added tf.sequence_mask and tf.boolean_mask for stripping padding from sequences.
2018.3.13
Added an LSTM network implemented by calling the cell's call function by hand.
LSTM theory is not covered here. If you are rusty on the basics, head over to:
Deep Learning Notes 7: Recurrent Neural Networks (RNN, basic theory)
Deep Learning Notes 8: Long Short-Term Memory Networks (LSTM, basic theory)
to review. One thing to know: in deep learning practice, you really do need the theory sorted out before the code makes sense, and that goes double for LSTMs, so get the foundations right first; otherwise you will not understand why the code is written the way it is.
Once the theory is in place, the important question is how to actually use LSTMs. Many people take a long detour when writing LSTMs in TensorFlow and spend a lot of time getting anywhere, not because LSTMs are especially hard (although reasoning across time steps and layers is admittedly a bit abstract), but because it is not obvious how the common structures are defined and written. That is very unfriendly to beginners.
So this article first presents the classes and functions from the API documentation that you will use most, and then walks through some toy examples. They are toys, but digest them all and you will be solidly past the entry barrier, if not yet an expert.
Part 1: Important Functions and Classes
This part goes over the TensorFlow APIs most commonly used with LSTMs. These are the bricks everything else is built from, so understanding them pays off.
Here is the list:
tensorflow.contrib.rnn.BasicLSTMCell
tensorflow.contrib.rnn.MultiRNNCell
tf.nn.dynamic_rnn()
tf.nn.bidirectional_dynamic_rnn()
tf.sequence_mask()
tf.boolean_mask()
While we are at it, one more common function related to word vectors is listed here and explained along with the rest:
tf.nn.embedding_lookup()
Only the basics needed to get by are covered here. A few classes not listed can be found in the documentation referenced at the top; you may need them once you move on to more advanced usage.
Ⅰ. tensorflow.contrib.rnn.BasicLSTMCell
Documentation: tensorflow.contrib.rnn.BasicLSTMCell
BasicLSTMCell is the basic class for creating an LSTM cell. First, how do you construct one?
The constructor is:
__init__(num_units,forget_bias=1.0,state_is_tuple=True,activation=None,reuse=None)
Parameters:
num_units: int, the number of units in the LSTM cell; you can think of it as the number of nodes.
forget_bias: float, a bias added to the forget gate, which reduces the scale of forgetting early in training.
state_is_tuple: if True, accepted and returned states are 2-tuples whose members are c_state and m_state. Only the True case is discussed here, because it is the only form that will be supported going forward.
activation: the internal activation function; defaults to tanh.
For example, to define a cell with 128 internal nodes, you can write:
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
cell=rnn.BasicLSTMCell(num_units=128, forget_bias=1.0, state_is_tuple=True)
You may notice that this constructor contains no information about the input at all! Don't worry: the low-level details of the input are already taken care of; you simply provide the input in a later step, as we will see.
One more point: where others call these 128 the nodes of a hidden layer, I prefer to think of them as 128 nodes inside the cell. Each node receives the same input vector and produces one value, and the 128 values together form a 128-dimensional output vector.
Now that you know how to create a cell, two of its more important properties deserve a mention (the class has more than these two): output_size and state_size.
As the names suggest, output_size and state_size describe the LSTM's output and its state. A few examples show what these two properties look like in different situations.
Example 1:
import tensorflow as tf
import numpy as np
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=128)
print("output_size:",lstm_cell.output_size)
print("state_size:",lstm_cell.state_size)
print(lstm_cell.state_size.h)
print(lstm_cell.state_size.c)
Result (expected output):
output_size: 128
state_size: LSTMStateTuple(c=128, h=128)
128
128
Here num_units alone already determines the output scale: 128 units make output_size 128-dimensional, which is straightforward. The key point is the format of the state: it turns out to be an LSTMStateTuple, which you can simply treat as a tuple. It is a tuple because the state carries both h and c (this is where a little LSTM theory helps); more details follow later.
This class also has a very important method:
zero_state(batch_size,dtype)
Purpose:
Fills an initial state with zeros. Note that this initializes a state, not the whole LSTM.
Parameters:
batch_size: the batch size.
dtype: the dtype to use for the state.
Returns a zero-filled state:
If state_size is an int or TensorShape, the return value is an N-D tensor of shape [batch_size, state_size] filled with zeros.
If state_size is a tuple, the return value is a tuple of the same structure, where each element is a 2-D tensor of shape [batch_size, s], with s taken from the corresponding entry of state_size.
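For instance, a minimal sketch (the batch size here is made up) of creating a zero state:
import tensorflow as tf
cell = tf.contrib.rnn.BasicLSTMCell(num_units=128)
init_state = cell.zero_state(batch_size=32, dtype=tf.float32)
#init_state is an LSTMStateTuple; both c and h are (32, 128) tensors of zeros
print(init_state.c.shape)   # (32, 128)
print(init_state.h.shape)   # (32, 128)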
__call__(inputs,state,scope=None)
Purpose: runs the RNN cell once on the given state and input, i.e. executes the RNN for a single "time step". It is a fairly low-level method, and very helpful for understanding how an RNN executes; tf.nn.dynamic_rnn() and friends, covered later, are higher-level interfaces that run all the steps for you.
Parameters:
inputs: a 2-D tensor of shape [batch_size, input_size]. In practice you will have arranged your data as [batch_size, time_steps_size, input_size], so if the current time step is i, you simply pass [:, i, :].
state: if self.state_size is an integer, this should be a tensor of shape [batch_size, self.state_size]; otherwise, if self.state_size is a tuple of integers, this should be a tuple of tensors of shape [batch_size, s] for each s in self.state_size.
scope: the VariableScope of the created subgraph; defaults to the class name.
Returns:
A pair containing:
Output: a 2-D tensor of shape [batch_size, self.output_size].
New state: the new state, with the same structure as the state passed in.
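To make the single-step behaviour concrete, here is a minimal sketch (all shapes are made up) that runs a cell by hand on the first time step:
import tensorflow as tf
cell = tf.contrib.rnn.BasicLSTMCell(num_units=128)
inputs = tf.placeholder(tf.float32, shape=(32, 40, 5))   #(batch_size, time_steps, input_size)
state = cell.zero_state(batch_size=32, dtype=tf.float32)
#calling the cell invokes __call__ and advances the RNN by one time step
output, state = cell(inputs[:, 0, :], state)
print(output.shape)   # (32, 128)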
There are further methods and properties not covered here; consult the official documentation when you actually need them.
Ⅱ. tensorflow.contrib.rnn.MultiRNNCell
The previous class defines a single-layer LSTM; how do we define a multi-layer one? The main job of this class is to stack the single-layer LSTMs from before into a multi-layer LSTM.
First, its constructor:
__init__( cells,state_is_tuple=True)
Parameters:
cells: a list of the RNNCells you want to stack, in order from bottom to top.
state_is_tuple: if True, accepted and returned states are n-tuples, where n = len(cells).
Its other methods and properties are much like BasicLSTMCell's, but it is worth seeing what form its output_size and state_size properties take. An example:
import tensorflow as tf
import numpy as np
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#multi-layer lstm_cell
lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])
print("output_size:",lstm_cell.output_size)
print("state_size:",lstm_cell.state_size)
Result (expected output):
output_size: 512
state_size: (LSTMStateTuple(c=128, h=128), LSTMStateTuple(c=256, h=256), LSTMStateTuple(c=512, h=512))
Three LSTM layers are created here and stacked with the MultiRNNCell constructor, so the output_size property is 512, the unit count of the topmost layer. That part is simple. The important part is the shape of state_size: it is a tuple containing three LSTMStateTuple objects, i.e. each layer's LSTMStateTuple placed inside one big tuple. This matters a lot: anywhere a state is needed later, you may have to convert between these forms, and that is hard if this structure is not clear to you.
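Since the multi-layer state is just a tuple of per-layer LSTMStateTuples, it can be indexed directly. A small sketch (continuing with the three-layer lstm_cell built above; the batch size is made up):
init_state = lstm_cell.zero_state(batch_size=32, dtype=tf.float32)
print(init_state[0].c.shape)   # (32, 128), cell state of the bottom layer
print(init_state[2].h.shape)   # (32, 512), hidden state of the top layer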
Ⅲ. tf.nn.dynamic_rnn()
This function unrolls the computation of the network through time using the specified RNN cell.
Its signature is:
dynamic_rnn(cell,inputs,sequence_length=None,initial_state=None,dtype=None,parallel_iterations=None,swap_memory=False,time_major=False,scope=None)
With dynamic_rnn, every sequence inside a batch must first be padded to the same length, but given sequence_length the function stops the computation of each sequence at its true length. dynamic_rnn also builds the recurrence dynamically at run time (with a while loop) rather than statically unrolling the graph.
Parameters:
cell: an RNNCell object.
inputs: the RNN input. When time_major == False (the default), it must be a tensor of shape [batch_size, max_time, ...]; when time_major == True, it must be a tensor of shape [max_time, batch_size, ...]. The first two dimensions must match across all inputs.
sequence_length: optional, an int32/int64 vector of size [batch_size]. It matters a lot for the correctness of the final result: given the true length of every sequence, the function produces exact results, avoiding the inaccuracy introduced by padding all sequences to the same length (a short usage sketch follows after the return values below).
initial_state: optional, the initial state of the RNN. If cell.state_size is an integer, this must be a tensor of shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this must be a tuple of tensors of shape [batch_size, s] for the corresponding sizes s in cell.state_size.
dtype: optional, the dtype of the input and the expected output. Required when no initial state is provided or the RNN state has heterogeneous dtypes.
parallel_iterations: default 32; the number of iterations to run in parallel. Operations with no temporal dependency can be computed in parallel, so this is a memory-for-time tradeoff: values much larger than 1 use more memory but less time, while small values use less memory but take more time.
swap_memory: transparently swap the tensors produced in forward inference but needed for backprop from GPU to CPU. This allows training RNNs that would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: the data layout of the input and output tensors. If True, tensors must have shape [max_time, batch_size, depth]; if False, shape [batch_size, max_time, depth]. Using time_major = True is slightly more efficient because it avoids transposing matrices at the start and end of the RNN computation; however, most TensorFlow data is batch-major, so batch-major is the default here as well.
scope: the scope name of the subgraph; defaults to "rnn".
Returns:
A pair (outputs, state), where:
outputs: the RNN output tensor. If time_major == False (default), its shape is [batch_size, max_time, cell.output_size]; if time_major == True, its shape is [max_time, batch_size, cell.output_size]. Note that if cell.output_size is a tuple, outputs will be a tuple with the same structure, containing tensors of the corresponding shapes.
state: the final state. If cell.state_size is an int, state is a tensor of shape [batch_size, cell.state_size]; if it is a tuple, state is a tuple with the corresponding shapes. If the cells are LSTMCells, state will be a tuple containing an LSTMStateTuple for each cell.
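Because sequence_length matters so much for padded batches, here is a minimal sketch of passing it in (all shapes and lengths are made up for illustration):
import tensorflow as tf
inputs = tf.placeholder(tf.float32, shape=(4, 10, 8))   #batch of 4, padded to 10 steps
seq_len = tf.placeholder(tf.int32, shape=(4,))          #true length of each sequence
cell = tf.contrib.rnn.BasicLSTMCell(num_units=16)
outputs, state = tf.nn.dynamic_rnn(
    cell=cell,
    inputs=inputs,
    sequence_length=seq_len,
    dtype=tf.float32
)
#outputs beyond each sequence's true length are zero vectors, and state is
#taken at the last valid step of each sequence rather than at max_time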
Example 1: single-layer LSTM
import tensorflow as tf
import numpy as np
inputs = tf.placeholder(np.float32, shape=(32,40,5)) # 32 is the batch_size
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
#lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
#lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#multi-layer lstm_cell
#lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])
print("output_size:",lstm_cell_1.output_size)
print("state_size:",lstm_cell_1.state_size)
#print(lstm_cell.state_size.h)
#print(lstm_cell.state_size.c)
output,state=tf.nn.dynamic_rnn(
    cell=lstm_cell_1,
    inputs=inputs,
    dtype=tf.float32
)
print("output.shape:",output.shape)
print("len of state tuple",len(state))
print("state.h.shape:",state.h.shape)
print("state.c.shape:",state.c.shape)
Result (expected output):
output_size: 128
state_size: LSTMStateTuple(c=128, h=128)
output.shape: (32, 40, 128)
len of state tuple 2
state.h.shape: (32, 128)
state.c.shape: (32, 128)
Example 2: multi-layer LSTM
import tensorflow as tf
import numpy as np
inputs = tf.placeholder(np.float32, shape=(32,40,5)) # 32 is the batch_size
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#multi-layer lstm_cell
lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])
print("output_size:",lstm_cell.output_size)
print("state_size:",lstm_cell.state_size)
#print(lstm_cell.state_size.h)
#print(lstm_cell.state_size.c)
output,state=tf.nn.dynamic_rnn(
    cell=lstm_cell,
    inputs=inputs,
    dtype=tf.float32
)
print("output.shape:",output.shape)
print("len of state tuple",len(state))
Result (expected output):
output_size: 512
state_size: (LSTMStateTuple(c=128, h=128), LSTMStateTuple(c=256, h=256), LSTMStateTuple(c=512, h=512))
output.shape: (32, 40, 512)
len of state tuple 3
The state of a multi-layer cell is a tuple, and each element of the tuple is the state of one layer.
Ⅳ. tf.nn.bidirectional_dynamic_rnn()
bidirectional_dynamic_rnn(cell_fw,cell_bw,inputs,sequence_length=None,initial_state_fw=None,initial_state_bw=None,dtype=None,parallel_iterations=None,swap_memory=False,time_major=False,scope=None)
Parameters:
cell_fw: an instance of RNNCell, used for the forward direction.
cell_bw: an instance of RNNCell, used for the backward direction.
inputs: the RNN input. If time_major == False (default), it must be a tensor of shape [batch_size, max_time, ...], or a nested tuple of such elements. If time_major == True, it must be a tensor of shape [max_time, batch_size, ...], or a nested tuple of such elements.
sequence_length: (optional) an int32/int64 vector of size [batch_size] containing the actual length of each sequence in the batch. If not provided, all batch entries are assumed to be full sequences, and time reversal is applied from time 0 to max_time for every sequence.
initial_state_fw: (optional) the initial state of the forward RNN. This must be a tensor of appropriate type and shape [batch_size, cell_fw.state_size]. If cell_fw.state_size is a tuple, this should be a tuple of tensors of shape [batch_size, s] for s in cell_fw.state_size.
initial_state_bw: (optional) same as initial_state_fw, but using the corresponding properties of cell_bw.
dtype: (optional) the dtype of the initial states and the expected output. Required if the initial states are not provided or the RNN state has heterogeneous dtypes.
parallel_iterations: (default: 32) the number of iterations to run in parallel. Operations with no temporal dependency can run in parallel; this trades memory for time. Values >> 1 use more memory but less time, while smaller values use less memory but take longer to compute.
swap_memory: transparently swap the tensors produced in forward inference but needed for backprop from GPU to CPU. This allows training RNNs that would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: the layout of the inputs and outputs tensors. If True, these Tensors have shape [max_time, batch_size, depth]; if False, shape [batch_size, max_time, depth].
scope: the VariableScope for the created subgraph; defaults to "bidirectional_rnn".
Returns:
A tuple (outputs, output_states), where:
outputs: a tuple (output_fw, output_bw) containing the forward and backward RNN outputs.
If time_major == False (default), output_fw is a tensor of shape [batch_size, max_time, cell_fw.output_size] and output_bw is a tensor of shape [batch_size, max_time, cell_bw.output_size].
If time_major == True, output_fw is a tensor of shape [max_time, batch_size, cell_fw.output_size] and output_bw is a tensor of shape [max_time, batch_size, cell_bw.output_size].
output_states: also a tuple, (output_state_fw, output_state_bw); that is, the forward state and the backward state packed into one tuple.
An example:
import tensorflow as tf
import numpy as np
inputs = tf.placeholder(np.float32, shape=(32,40,5)) # 32 is the batch_size
lstm_cell_fw = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_bw = tf.contrib.rnn.BasicLSTMCell(num_units=128)
#multi-layer lstm_cell
#lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])
print("output_fw_size:",lstm_cell_fw.output_size)
print("state_fw_size:",lstm_cell_fw.state_size)
print("output_bw_size:",lstm_cell_bw.output_size)
print("state_bw_size:",lstm_cell_bw.state_size)
#print(lstm_cell.state_size.h)
#print(lstm_cell.state_size.c)
output,state=tf.nn.bidirectional_dynamic_rnn(
    cell_fw=lstm_cell_fw,
    cell_bw=lstm_cell_bw,
    inputs=inputs,
    dtype=tf.float32
)
output_fw=output[0]
output_bw=output[1]
state_fw=state[0]
state_bw=state[1]
print("output_fw.shape:",output_fw.shape)
print("output_bw.shape:",output_bw.shape)
print("len of state tuple",len(state_fw))
print("state_fw:",state_fw)
print("state_bw:",state_bw)
#print("state.h.shape:",state.h.shape)
#print("state.c.shape:",state.c.shape)
#state_concat=tf.concat(values=[state_fw,state_fw],axis=1)
#print(state_concat)
state_h_concat=tf.concat(values=[state_fw.h,state_bw.h],axis=1)
print("state_h_concat.shape",state_h_concat.shape)
state_c_concat=tf.concat(values=[state_fw.c,state_bw.c],axis=1)
print("state_c_concat.shape",state_c_concat.shape)
state_concat=tf.contrib.rnn.LSTMStateTuple(c=state_c_concat,h=state_h_concat)
print(state_concat)
Result:
output_fw_size: 128
state_fw_size: LSTMStateTuple(c=128, h=128)
output_bw_size: 128
state_bw_size: LSTMStateTuple(c=128, h=128)
output_fw.shape: (32, 40, 128)
output_bw.shape: (32, 40, 128)
len of state tuple 2
state_fw: LSTMStateTuple(c=<tf.Tensor 'bidirectional_rnn/fw/fw/while/Exit_2:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'bidirectional_rnn/fw/fw/while/Exit_3:0' shape=(32, 128) dtype=float32>)
state_bw: LSTMStateTuple(c=<tf.Tensor 'bidirectional_rnn/bw/bw/while/Exit_2:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'bidirectional_rnn/bw/bw/while/Exit_3:0' shape=(32, 128) dtype=float32>)
state_h_concat.shape (32, 256)
state_c_concat.shape (32, 256)
LSTMStateTuple(c=<tf.Tensor 'concat_1:0' shape=(32, 256) dtype=float32>, h=<tf.Tensor 'concat:0' shape=(32, 256) dtype=float32>)
This example also demonstrates splicing states together; you can use it as a template whenever you need to initialize or concatenate states yourself.
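In the same spirit, a hand-built LSTMStateTuple can be fed to dynamic_rnn as a custom initial state. A minimal sketch (the sizes are illustrative, and inputs is assumed to be a float32 tensor of shape (32, time_steps, input_size) like the one above):
c0 = tf.zeros(shape=(32, 128))
h0 = tf.zeros(shape=(32, 128))
my_state = tf.contrib.rnn.LSTMStateTuple(c=c0, h=h0)
cell = tf.contrib.rnn.BasicLSTMCell(num_units=128)
#no dtype needed here, since the initial state fixes it
outputs, state = tf.nn.dynamic_rnn(cell=cell, inputs=inputs, initial_state=my_state)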
Ⅴ. tf.nn.embedding_lookup()
embedding_lookup(params,ids,partition_strategy='mod',name=None,max_norm=None)
This function performs the embedding step inside a task: it maps each character or word id to the word vector of the corresponding dimension. If the embedding matrix is defined as a trainable Variable, the word vectors are trained jointly with the task.
params: the complete embedding tensor, or a list of P tensors all of the same shape except possibly the first dimension, representing a sharded embedding tensor.
ids: a Tensor of type int32 or int64 containing the ids to be looked up in params.
partition_strategy: a string specifying the partitioning strategy; relevant if len(params) > 1. Currently "div" and "mod" are supported; the default is "mod".
name: a name for the operation (optional).
max_norm: if not None, each embedding vector is clipped so that its l2 norm does not exceed max_norm.
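The original lists no example here, so here is a minimal sketch (the vocabulary size, embedding dimension, and sequence length are made up):
import tensorflow as tf
VOCAB_SIZE=10000
EMBED_DIM=300
#a trainable embedding matrix: one EMBED_DIM-dimensional vector per word id
embeddings=tf.Variable(tf.random_uniform(shape=(VOCAB_SIZE,EMBED_DIM),minval=-1.0,maxval=1.0))
word_ids=tf.placeholder(tf.int32,shape=(None,40))   #(batch_size, time_steps)
word_vectors=tf.nn.embedding_lookup(params=embeddings,ids=word_ids)
print(word_vectors.shape)   # (?, 40, 300) -- ready to feed into dynamic_rnn
Because the matrix is a Variable, the word vectors are updated jointly with the rest of the model during training.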
Ⅵ. tf.sequence_mask()
sequence_mask(lengths,maxlen=None,dtype=tf.bool,name=None)
Purpose: returns a mask tensor marking the first N positions of each sequence.
If lengths has shape [d_1, d_2, ..., d_n], the resulting tensor mask has dtype dtype and shape [d_1, d_2, ..., d_n, maxlen], with
mask[i_1, i_2, ..., i_n, j] = (j < lengths[i_1, i_2, ..., i_n])
Parameters:
lengths: an integer tensor; all of its values must be less than or equal to maxlen.
maxlen: a scalar integer tensor, the size of the last dimension of the returned tensor. Defaults to the maximum value in lengths.
dtype: the output type of the resulting tensor.
name: a name for the op.
Returns:
A mask tensor of shape lengths.shape + (maxlen,), cast to the specified dtype.
Examples:
tf.sequence_mask([1, 3, 2], 5) # [[True, False, False, False, False],
# [True, True, True, False, False],
# [True, True, False, False, False]]
tf.sequence_mask([[1, 3],[2,0]]) # [[[True, False, False],
# [True, True, True]],
# [[True, True, False],
# [False, False, False]]]
Ⅶ. tf.boolean_mask()
boolean_mask(tensor,mask,name='boolean_mask')
Applies a boolean mask to a tensor; comparable to tensor[mask] in numpy.
Parameters:
tensor: an N-D tensor.
mask: a K-D boolean tensor, with K <= N; K must be statically known.
name: a name for the operation (optional).
Returns:
An (N-K+1)-dimensional tensor populated with the entries of tensor corresponding to True values in mask.
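Combining tf.sequence_mask() and tf.boolean_mask() gives the padding-removal trick mentioned in the update notes. A minimal sketch (all shapes are made up):
import tensorflow as tf
outputs = tf.placeholder(tf.float32, shape=(4, 10, 16))   #(batch, max_time, units), padded
lengths = tf.placeholder(tf.int32, shape=(4,))            #true length of each sequence
mask = tf.sequence_mask(lengths, maxlen=10)               #(4, 10) boolean mask
valid_outputs = tf.boolean_mask(tensor=outputs, mask=mask)
#valid_outputs has shape (sum(lengths), 16): only the non-padded steps survive,
#so padded positions no longer pollute a loss computed over valid_outputs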
With the APIs above understood at a working level, we can move on to the examples.
Part 2: Examples
As promised at the start, the examples that follow are toys, but they are absolutely friendly to newcomers: they cover the basic ideas and techniques needed to program LSTMs, and digesting them will let you quickly move on to building more complex network structures that fit your own problems.
We start from the most basic example and work upward; every example can be run directly as a script.
Ⅰ. Predicting the sin function
Code:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=10
BATCH_SIZE=128
HIDDEN_UNITS=1
LEARNING_RATE=0.001
EPOCH=150
TRAIN_EXAMPLES=11000
TEST_EXAMPLES=1100
#------------------------------------Generate Data-----------------------------------------------#
#generate data
def generate(seq):
    X=[]
    y=[]
    for i in range(len(seq)-TIME_STEPS):
        X.append([seq[i:i+TIME_STEPS]])
        y.append([seq[i+TIME_STEPS]])
    return np.array(X,dtype=np.float32),np.array(y,dtype=np.float32)
#s=[i for i in range(30)]
#X,y=generate(s)
#print(X)
#print(y)
seq_train=np.sin(np.linspace(start=0,stop=100,num=TRAIN_EXAMPLES,dtype=np.float32))
seq_test=np.sin(np.linspace(start=100,stop=110,num=TEST_EXAMPLES,dtype=np.float32))
#plt.plot(np.linspace(start=0,stop=100,num=10000,dtype=np.float32),seq_train)
#plt.plot(np.linspace(start=100,stop=110,num=1000,dtype=np.float32),seq_test)
#plt.show()
X_train,y_train=generate(seq_train)
#print(X_train.shape,y_train.shape)
X_test,y_test=generate(seq_test)
#reshape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,TIME_STEPS,1))
X_test=np.reshape(X_test,newshape=(-1,TIME_STEPS,1))
#draw y_test
plt.plot(range(1000),y_test[:1000,0],"r*")
#print(X_train.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,1),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,1),name="pred_placeholder")
    #lstm instance
    lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    #initialize to zero
    init_state=lstm_cell.zero_state(batch_size=BATCH_SIZE,dtype=tf.float32)
    #dynamic rnn
    outputs,states=tf.nn.dynamic_rnn(cell=lstm_cell,inputs=X_p,initial_state=init_state,dtype=tf.float32)
    #print(outputs.shape)
    h=outputs[:,-1,:]
    #print(h.shape)
    #--------------------------------------------------------------------------------------------#
    #---------------------------------define loss and optimizer----------------------------------#
    mse=tf.losses.mean_squared_error(labels=y_p,predictions=h)
    #print(loss.shape)
    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)
    init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        results = np.zeros(shape=(TEST_EXAMPLES, 1))
        train_losses=[]
        test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss=sess.run(
                fetches=(optimizer,mse),
                feed_dict={
                    X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                    y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
        print("average training loss:", sum(train_losses) / len(train_losses))
        for j in range(TEST_EXAMPLES//BATCH_SIZE):
            result,test_loss=sess.run(
                fetches=(h,mse),
                feed_dict={
                    X_p:X_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                    y_p:y_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                }
            )
            results[j*BATCH_SIZE:(j+1)*BATCH_SIZE]=result
            test_losses.append(test_loss)
        print("average test loss:", sum(test_losses) / len(test_losses))
    plt.plot(range(1000),results[:1000,0])
    plt.show()
Result:
(The prediction plot is omitted here.) The thick red line in the figure is the ground truth; after 150 epochs of training, our predictions get closer and closer to the true values.
Ⅱ. Predicting the sin function, multi-layer version
Code:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=10
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=1
LEARNING_RATE=0.001
EPOCH=50
TRAIN_EXAMPLES=11000
TEST_EXAMPLES=1100
#------------------------------------Generate Data-----------------------------------------------#
#generate data
def generate(seq):
    X=[]
    y=[]
    for i in range(len(seq)-TIME_STEPS):
        X.append([seq[i:i+TIME_STEPS]])
        y.append([seq[i+TIME_STEPS]])
    return np.array(X,dtype=np.float32),np.array(y,dtype=np.float32)
#s=[i for i in range(30)]
#X,y=generate(s)
#print(X)
#print(y)
seq_train=np.sin(np.linspace(start=0,stop=100,num=TRAIN_EXAMPLES,dtype=np.float32))
seq_test=np.sin(np.linspace(start=100,stop=110,num=TEST_EXAMPLES,dtype=np.float32))
#plt.plot(np.linspace(start=0,stop=100,num=10000,dtype=np.float32),seq_train)
#plt.plot(np.linspace(start=100,stop=110,num=1000,dtype=np.float32),seq_test)
#plt.show()
X_train,y_train=generate(seq_train)
#print(X_train.shape,y_train.shape)
X_test,y_test=generate(seq_test)
#reshape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,TIME_STEPS,1))
X_test=np.reshape(X_test,newshape=(-1,TIME_STEPS,1))
#draw y_test
plt.plot(range(1000),y_test[:1000,0],"r*")
#print(X_train.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,1),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,1),name="pred_placeholder")
    #lstm instances
    lstm_cell1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    multi_lstm=rnn.MultiRNNCell(cells=[lstm_cell1,lstm_cell])
    #initialize to zero
    init_state=multi_lstm.zero_state(batch_size=BATCH_SIZE,dtype=tf.float32)
    #dynamic rnn
    outputs,states=tf.nn.dynamic_rnn(cell=multi_lstm,inputs=X_p,initial_state=init_state,dtype=tf.float32)
    #print(outputs.shape)
    h=outputs[:,-1,:]
    #print(h.shape)
    #--------------------------------------------------------------------------------------------#
    #---------------------------------define loss and optimizer----------------------------------#
    mse=tf.losses.mean_squared_error(labels=y_p,predictions=h)
    #print(loss.shape)
    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)
    init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        results = np.zeros(shape=(TEST_EXAMPLES, 1))
        train_losses=[]
        test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss=sess.run(
                fetches=(optimizer,mse),
                feed_dict={
                    X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                    y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
        print("average training loss:", sum(train_losses) / len(train_losses))
        for j in range(TEST_EXAMPLES//BATCH_SIZE):
            result,test_loss=sess.run(
                fetches=(h,mse),
                feed_dict={
                    X_p:X_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                    y_p:y_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                }
            )
            results[j*BATCH_SIZE:(j+1)*BATCH_SIZE]=result
            test_losses.append(test_loss)
        print("average test loss:", sum(test_losses) / len(test_losses))
    plt.plot(range(1000),results[:1000,0])
    plt.show()
Result:
(The prediction plot is omitted here.) After only 50 epochs, the result is already noticeably better than that of the first, single-layer version.
Predicting the sin function, hand-unrolled version
import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=10
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=1
LEARNING_RATE=0.001
EPOCH=50
TRAIN_EXAMPLES=11000
TEST_EXAMPLES=1100
#------------------------------------Generate Data-----------------------------------------------#
#generate data
def generate(seq):
    X=[]
    y=[]
    for i in range(len(seq)-TIME_STEPS):
        X.append([seq[i:i+TIME_STEPS]])
        y.append([seq[i+TIME_STEPS]])
    return np.array(X,dtype=np.float32),np.array(y,dtype=np.float32)
#s=[i for i in range(30)]
#X,y=generate(s)
#print(X)
#print(y)
seq_train=np.sin(np.linspace(start=0,stop=100,num=TRAIN_EXAMPLES,dtype=np.float32))
seq_test=np.sin(np.linspace(start=100,stop=110,num=TEST_EXAMPLES,dtype=np.float32))
#plt.plot(np.linspace(start=0,stop=100,num=10000,dtype=np.float32),seq_train)
#plt.plot(np.linspace(start=100,stop=110,num=1000,dtype=np.float32),seq_test)
#plt.show()
X_train,y_train=generate(seq_train)
#print(X_train.shape,y_train.shape)
X_test,y_test=generate(seq_test)
#reshape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,TIME_STEPS,1))
X_test=np.reshape(X_test,newshape=(-1,TIME_STEPS,1))
#draw y_test
plt.plot(range(1000),y_test[:1000,0],"r*")
#print(X_train.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,1),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,1),name="pred_placeholder")
    #lstm instances
    lstm_cell1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    multi_lstm=rnn.MultiRNNCell(cells=[lstm_cell1,lstm_cell])
    #initialize the state by hand
    #state of the first layer
    lstm_layer1_c=tf.zeros(shape=(BATCH_SIZE,HIDDEN_UNITS1))
    lstm_layer1_h=tf.zeros(shape=(BATCH_SIZE,HIDDEN_UNITS1))
    layer1_state=rnn.LSTMStateTuple(c=lstm_layer1_c,h=lstm_layer1_h)
    #state of the second layer
    lstm_layer2_c = tf.zeros(shape=(BATCH_SIZE, HIDDEN_UNITS))
    lstm_layer2_h = tf.zeros(shape=(BATCH_SIZE, HIDDEN_UNITS))
    layer2_state = rnn.LSTMStateTuple(c=lstm_layer2_c, h=lstm_layer2_h)
    init_state=(layer1_state,layer2_state)
    print(init_state)
    #unroll the RNN computation by hand
    outputs = list()    #collects and stores the result of every step
    state = init_state
    with tf.variable_scope('RNN'):
        for timestep in range(TIME_STEPS):
            if timestep > 0:
                tf.get_variable_scope().reuse_variables()
            # state holds the states of all the LSTM layers
            (cell_output, state) = multi_lstm(X_p[:, timestep, :], state)
            outputs.append(cell_output)
    h = outputs[-1]
    #---------------------------------define loss and optimizer----------------------------------#
    mse=tf.losses.mean_squared_error(labels=y_p,predictions=h)
    #print(loss.shape)
    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)
    init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        results = np.zeros(shape=(TEST_EXAMPLES, 1))
        train_losses=[]
        test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss=sess.run(
                fetches=(optimizer,mse),
                feed_dict={
                    X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                    y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
        print("average training loss:", sum(train_losses) / len(train_losses))
        for j in range(TEST_EXAMPLES//BATCH_SIZE):
            result,test_loss=sess.run(
                fetches=(h,mse),
                feed_dict={
                    X_p:X_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                    y_p:y_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                }
            )
            results[j*BATCH_SIZE:(j+1)*BATCH_SIZE]=result
            test_losses.append(test_loss)
        print("average test loss:", sum(test_losses) / len(test_losses))
    plt.plot(range(1000),results[:1000,0])
    plt.show()
This example is exactly the same as Ⅱ above; the only differences are that the initial state is defined by hand, which shows how to build a custom state, and that what dynamic_rnn unrolled automatically before is now unrolled manually. The unrolled computation:
#unroll the RNN computation by hand
outputs = list()    #collects and stores the result of every step
state = init_state
with tf.variable_scope('RNN'):
    for timestep in range(TIME_STEPS):
        if timestep > 0:
            tf.get_variable_scope().reuse_variables()
        # state holds the states of all the LSTM layers
        (cell_output, state) = multi_lstm(X_p[:, timestep, :], state)
        outputs.append(cell_output)
h = outputs[-1]
This pattern is quite instructive: whenever you need control over the result of every single step, you can unroll the computation yourself like this.
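If you need the whole output sequence rather than just the last step, the per-step results collected in the outputs list can be stacked back into one tensor. A small sketch (assuming the outputs list from the loop above):
#outputs is a Python list of TIME_STEPS tensors, each of shape (BATCH_SIZE, HIDDEN_UNITS)
all_outputs = tf.stack(outputs, axis=1)   #(BATCH_SIZE, TIME_STEPS, HIDDEN_UNITS)
#this matches what dynamic_rnn would have returned with time_major=False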
Ⅲ. MNIST image classification
LSTMs can also classify images, and the idea is very simple: an MNIST image is a 28x28 array, so each image can be treated as a sequence of 28 time steps, where each step is one 28-dimensional row of pixels.
Code:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=28
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=10
LEARNING_RATE=0.001
EPOCH=50
TRAIN_EXAMPLES=42000
TEST_EXAMPLES=28000
#------------------------------------Generate Data-----------------------------------------------#
#generate data
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")
# pop the labels and one-hot coding
train_labels_frame = train_frame.pop("label")
# get values
# one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train=pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values
#trans the shape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,28,28))
X_test=np.reshape(X_test,newshape=(-1,28,28))
#print(X_train.shape)
#print(y_dummy.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,28),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,10),name="pred_placeholder")
    #lstm instances
    lstm_cell1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    multi_lstm=rnn.MultiRNNCell(cells=[lstm_cell1,lstm_cell])
    #initialize to zero
    init_state=multi_lstm.zero_state(batch_size=BATCH_SIZE,dtype=tf.float32)
    #dynamic rnn
    outputs,states=tf.nn.dynamic_rnn(cell=multi_lstm,inputs=X_p,initial_state=init_state,dtype=tf.float32)
    #print(outputs.shape)
    h=outputs[:,-1,:]
    #print(h.shape)
    #--------------------------------------------------------------------------------------------#
    #---------------------------------define loss and optimizer----------------------------------#
    cross_loss=tf.losses.softmax_cross_entropy(onehot_labels=y_p,logits=h)
    #print(loss.shape)
    correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)
    init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        #results = np.zeros(shape=(TEST_EXAMPLES, 10))
        train_losses=[]
        accus=[]
        #test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss,accu=sess.run(
                fetches=(optimizer,cross_loss,accuracy),
                feed_dict={
                    X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                    y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
            accus.append(accu)
        print("average training loss:", sum(train_losses) / len(train_losses))
        print("accuracy:",sum(accus)/len(accus))
Result: (the per-epoch loss and accuracy printout is omitted here.)
Ⅳ. Bidirectional LSTM for image classification
Code:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=28
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=10
LEARNING_RATE=0.001
EPOCH=50
TRAIN_EXAMPLES=42000
TEST_EXAMPLES=28000
#------------------------------------Generate Data-----------------------------------------------#
#generate data
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")
# pop the labels and one-hot coding
train_labels_frame = train_frame.pop("label")
# get values
# one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train=pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values
#trans the shape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,28,28))
X_test=np.reshape(X_test,newshape=(-1,28,28))
#print(X_train.shape)
#print(y_dummy.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,28),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,10),name="pred_placeholder")
    #lstm instances
    lstm_forward=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    lstm_backward=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    outputs,states=tf.nn.bidirectional_dynamic_rnn(
        cell_fw=lstm_forward,
        cell_bw=lstm_backward,
        inputs=X_p,
        dtype=tf.float32
    )
    outputs_fw=outputs[0]
    outputs_bw=outputs[1]
    h=outputs_fw[:,-1,:]+outputs_bw[:,-1,:]
    #print(h.shape)
    #--------------------------------------------------------------------------------------------#
    #---------------------------------define loss and optimizer----------------------------------#
    cross_loss=tf.losses.softmax_cross_entropy(onehot_labels=y_p,logits=h)
    #print(loss.shape)
    correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)
    init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        #results = np.zeros(shape=(TEST_EXAMPLES, 10))
        train_losses=[]
        accus=[]
        #test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss,accu=sess.run(
                fetches=(optimizer,cross_loss,accuracy),
                feed_dict={
                    X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                    y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
            accus.append(accu)
        print("average training loss:", sum(train_losses) / len(train_losses))
        print("accuracy:",sum(accus)/len(accus))
The result of this example: past a certain point, the network stops learning anything new no matter how long it trains. That is because we used only a single bidirectional layer. A tiny modification turns the network above into a deep bidirectional LSTM.
Ⅴ. Deep bidirectional LSTM for image classification
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=28
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=10
LEARNING_RATE=0.001
EPOCH=50
TRAIN_EXAMPLES=42000
TEST_EXAMPLES=28000
#------------------------------------Generate Data-----------------------------------------------#
#generate data
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")
# pop the labels and one-hot coding
train_labels_frame = train_frame.pop("label")
# get values
# one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train=pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values
#trans the shape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,28,28))
X_test=np.reshape(X_test,newshape=(-1,28,28))
#print(X_train.shape)
#print(y_dummy.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,28),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,10),name="pred_placeholder")
    #lstm instances
    lstm_forward_1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_forward_2=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    lstm_forward=rnn.MultiRNNCell(cells=[lstm_forward_1,lstm_forward_2])
    lstm_backward_1 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_backward_2 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    lstm_backward=rnn.MultiRNNCell(cells=[lstm_backward_1,lstm_backward_2])
    outputs,states=tf.nn.bidirectional_dynamic_rnn(
        cell_fw=lstm_forward,
        cell_bw=lstm_backward,
        inputs=X_p,
        dtype=tf.float32
    )
    outputs_fw=outputs[0]
    outputs_bw=outputs[1]
    h=outputs_fw[:,-1,:]+outputs_bw[:,-1,:]
    #print(h.shape)
    #--------------------------------------------------------------------------------------------#
    #---------------------------------define loss and optimizer----------------------------------#
    cross_loss=tf.losses.softmax_cross_entropy(onehot_labels=y_p,logits=h)
    #print(loss.shape)
    correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)
    init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        #results = np.zeros(shape=(TEST_EXAMPLES, 10))
        train_losses=[]
        accus=[]
        #test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss,accu=sess.run(
                fetches=(optimizer,cross_loss,accuracy),
                feed_dict={
                    X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                    y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
            accus.append(accu)
        print("average training loss:", sum(train_losses) / len(train_losses))
        print("accuracy:",sum(accus)/len(accus))
Compared with the single-layer BiLSTM above, this version already reaches 95% accuracy by around epoch 35, which shows that in its ability to abstract information, a multi-layer architecture beats a single-layer one.
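Note that this stacks a complete forward MultiRNNCell against a complete backward MultiRNNCell. Another common arrangement interleaves the two directions layer by layer; tf.contrib.rnn provides stack_bidirectional_dynamic_rnn for that. A hedged sketch (verify the signature against your TensorFlow version before relying on it):
cells_fw=[rnn.BasicLSTMCell(num_units=n) for n in (HIDDEN_UNITS1,HIDDEN_UNITS)]
cells_bw=[rnn.BasicLSTMCell(num_units=n) for n in (HIDDEN_UNITS1,HIDDEN_UNITS)]
outputs,state_fw,state_bw=rnn.stack_bidirectional_dynamic_rnn(
    cells_fw=cells_fw,
    cells_bw=cells_bw,
    inputs=X_p,
    dtype=tf.float32
)
#outputs: (batch, time, 2*HIDDEN_UNITS) -- the top layer's forward and backward
#outputs concatenated along the last axis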
---------------------
Author: 謝小小XH
Source: CSDN, https://blog.csdn.net/xierhacker/article/details/78772560