TensorFlow Learning (13): A Long but Concise Tutorial on Building LSTMs

Posted by Candy_GL on 2018-10-18

Copyright notice: this is an original article by the blogger and may not be reposted without permission.    https://blog.csdn.net/xierhacker/article/details/78772560
References:

Module: tf.contrib.cudnn_rnn 
Module: tf.contrib.rnn
Updates:

2017.12.25
Added content on using tf.nn.embedding_lookup for embeddings.
2018.1.14
Added content on using tf.sequence_mask and tf.boolean_mask to strip padding from sequences.
2018.3.13
Added an LSTM network built by manually calling the cell's call function.
LSTM theory is not covered here. If you are not yet comfortable with it, go to
Deep Learning Notes 7: Recurrent Neural Networks (RNN, basic theory)
Deep Learning Notes 8: Long Short-Term Memory Networks (LSTM, basic theory)
and review the basics first. In deep learning practice you really do need to understand the theory before the code makes sense, and that is especially true for LSTMs, so make sure the foundations are solid; otherwise many of the choices in the code will look arbitrary.
Once the theory is in place, the important question is how to use LSTMs in practice. Many people take detours when writing LSTMs in TensorFlow and spend a lot of time getting even the basics to work. That is not because LSTMs are especially hard (although reasoning about them across time steps and layers is somewhat abstract), but because it is not obvious how the common structures are defined and written, which is unfriendly to beginners.
So this article first presents the classes and functions from the API documentation that are used most often, and then walks through a few toy examples. They are toys, but if you digest all of them you will not be an expert, yet you will certainly have enough to get started.

1. Important functions and classes

This section covers the TensorFlow APIs most commonly used with LSTMs. They are the building blocks, so understanding them is well worth the effort.
Here is the list:

tensorflow.contrib.rnn.BasicLSTMCell
tensorflow.contrib.rnn.MultiRNNCell
tf.nn.dynamic_rnn()
tf.nn.bidirectional_dynamic_rnn()

tf.sequence_mask()

tf.boolean_mask()
While we are at it, one commonly used function related to word embeddings is also listed here and explained later:

tf.nn.embedding_lookup()
Only the essentials are covered here. A few classes and functions are not listed; you can find them in the documentation linked at the top, and you may need them later for more advanced work.

Ⅰ.tensorflow.contrib.rnn.BasicLSTMCell

Documentation: tensorflow.contrib.rnn.BasicLSTMCell

BasicLSTMCell is the basic class for creating an LSTM cell. First, how do you construct one?
The constructor is:
__init__(num_units,forget_bias=1.0,state_is_tuple=True,activation=None,reuse=None)

Parameters:

num_units: int, the number of units in the LSTM cell; you can simply think of it as the number of nodes.
forget_bias: float, a bias added to the forget gate, used to reduce the scale of forgetting early in training.
state_is_tuple: if True, the accepted and returned states are 2-tuples whose members are the cell state c and the hidden state h (c_state and m_state in the docs). Only the True case is considered here, since this will be the only supported way of passing state in the future.
activation: the internal activation function; tanh by default.
For example, to define a cell with 128 internal units you can write:

import tensorflow as tf
import tensorflow.contrib.rnn as rnn
cell=rnn.BasicLSTMCell(num_units=128, forget_bias=1.0, state_is_tuple=True)


You will notice that the constructor contains no information at all about the input! Don't worry: the input details are handled by the lower layers, and you simply provide the input in a later step, as described below.
One more point: rather than calling these 128 units "hidden-layer nodes", I prefer to think of them as 128 nodes inside the cell. Each node receives the same input vector and produces one value; together the 128 nodes therefore produce a 128-dimensional output vector.

Now that you know how to create a cell, two important properties of this class are worth mentioning (it has more than these two): output_size and state_size.
As the names suggest, output_size and state_size describe the LSTM's output and its state. The examples below show what these two properties look like in various situations.

Example 1:

import tensorflow as tf
import numpy as np
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=128)
print("output_size:",lstm_cell.output_size)
print("state_size:",lstm_cell.state_size)
print(lstm_cell.state_size.h)
print(lstm_cell.state_size.c)
Result:


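Given the prints above, the output should look roughly like the following (the exact formatting of the repr can vary slightly across TensorFlow versions):

output_size: 128
state_size: LSTMStateTuple(c=128, h=128)
128
128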
num_units alone determines the output dimensionality: 128 units means output_size is 128, which is straightforward. The interesting part is the format of the state. It is an LSTMStateTuple; don't overthink it, just treat it as a tuple. It is a tuple because the state contains both h and c (this is where a little LSTM theory is needed); more details follow later.

This class also has one very important method:
zero_state(batch_size,dtype)

Purpose:
Builds a zero-filled initial state. Note that this initializes a state, not the whole LSTM.
Parameters:
batch_size: the batch size
dtype: the dtype of the state.
Returns a zero-filled state.

If state_size is an int or TensorShape, the return value is an N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a tuple, the return value is a tuple with the same structure, where each element is a 2-D tensor of shape [batch_size x s], with s being the corresponding size in state_size.
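As a minimal sketch of zero_state in use (the batch size of 32 here is an arbitrary, illustrative choice):

import tensorflow as tf
import tensorflow.contrib.rnn as rnn

cell = rnn.BasicLSTMCell(num_units=128)
# zero_state returns an LSTMStateTuple of two zero tensors, each of shape (batch_size, num_units)
init_state = cell.zero_state(batch_size=32, dtype=tf.float32)
print(init_state.c.shape)   # (32, 128)
print(init_state.h.shape)   # (32, 128)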
__call__(inputs,state,scope=None)
Purpose: run the RNN cell on the given state and input. This method runs the RNN for a single "time step"; it is fairly low-level, but very helpful for understanding how an RNN executes. Higher-level interfaces such as tf.nn.dynamic_rnn(), covered later, perform the whole unrolled computation for you. (A one-step sketch is given after the return values below.)
Parameters:

inputs: a 2-D tensor of shape [batch_size x input_size]. In practice you normally arrange your data as [batch_size, time_steps_size, input_size], so if the current time step is i you simply pass [:, i, :].
state: if self.state_size is an integer, this should be a tensor of shape [batch_size x self.state_size]; if self.state_size is a tuple of integers, this should be a tuple of tensors of shape [batch_size x s] for each s in self.state_size.
scope: the VariableScope for the created subgraph; defaults to the class name.
Returns:
A pair containing:
Output: a 2-D tensor of shape [batch_size x self.output_size]
New state: the new state, with the same structure as the previous state.
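As a minimal sketch of one manual step with __call__ (the input size of 5 and the batch size of 32 are illustrative assumptions):

import tensorflow as tf
import tensorflow.contrib.rnn as rnn

cell = rnn.BasicLSTMCell(num_units=128)
x_t = tf.placeholder(tf.float32, shape=(32, 5))          # input at a single time step
state = cell.zero_state(batch_size=32, dtype=tf.float32)
# run the cell for exactly one time step
output, new_state = cell(x_t, state)
print(output.shape)        # (32, 128)
print(new_state.h.shape)   # (32, 128)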

There are other methods and properties as well, which are not covered here; consult the official documentation when you actually need them.

Ⅱ.tensorflow.contrib.rnn.MultiRNNCell

The previous class defines a single LSTM layer, so how do you define a multi-layer LSTM? The main purpose of this class is to stack the single-layer LSTMs above into a multi-layer LSTM.
First, its constructor:
__init__( cells,state_is_tuple=True)

Parameters:
cells: a list of the RNNCells you want to stack.
state_is_tuple: if True, the accepted and returned states are n-tuples, where n = len(cells).

The other methods and properties are much like those of BasicLSTMCell, but it is worth looking at what the two properties output_size and state_size become here. An example:

import tensorflow as tf
import numpy as np
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#multi-layer lstm_cell
lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])

print("output_size:",lstm_cell.output_size)
print("state_size:",lstm_cell.state_size)
Result:


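The printed output should look roughly like this (formatting may differ slightly between versions):

output_size: 512
state_size: (LSTMStateTuple(c=128, h=128), LSTMStateTuple(c=256, h=256), LSTMStateTuple(c=512, h=512))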
Here we first create three LSTM layers and then stack them with the MultiRNNCell constructor. In the result, output_size is 512, the number of units of the last layer, which is simple enough. The important part is the form of state_size: it is a tuple containing three LSTMStateTuple objects, i.e. each layer's LSTMStateTuple is placed into one enclosing tuple. This matters a lot: later on, wherever a state is needed you may have to convert between state formats, and if this structure is unclear those conversions become difficult.

Ⅲ.tf.nn.dynamic_rnn()

This function unrolls the network computation for the given RNN cell.
Its signature is:
dynamic_rnn(cell,inputs,sequence_length=None,initial_state=None,dtype=None,parallel_iterations=None,swap_memory=False,time_major=False,scope=None)

For dynamic_rnn, all sequences within a batch share the same (padded) length; you do the padding yourself, and the function stops the computation for each sequence according to sequence_length. dynamic_rnn also builds the graph dynamically.

Parameters:

cell: an RNNCell object.
inputs: the RNN input. When time_major == False (default), it must be a tensor of shape [batch_size, max_time, ...]; when time_major == True, it must be a tensor of shape [max_time, batch_size, ...]. The first two dimensions must match across all inputs.
sequence_length: optional, an int32/int64 vector of size [batch_size]. It matters for the correctness of the final result: by giving the true length of each sequence, the result is computed exactly, removing the inaccuracy introduced by padding every sequence to the same length. (A short usage sketch is given after the return values below.)
initial_state: optional, the initial state of the RNN. If cell.state_size is an integer, this must be a tensor of shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this must be a tuple of tensors of shape [batch_size, s] for the corresponding sizes s in cell.state_size.
dtype: optional, the data type of the input and the expected output. Required when no initial state is provided or when the RNN state has a mixed dtype.
parallel_iterations: default 32, the number of iterations to run in parallel. Operations without any temporal dependency can be computed in parallel; this is a space/time trade-off. Values much larger than 1 use more memory but take less time, while small values use less memory but take more time.
swap_memory: transparently swap the tensors produced in forward inference but needed for backprop from GPU to CPU. This allows training RNNs that would normally not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: the layout of the input and output tensors. If True, the tensors must have shape [max_time, batch_size, depth]; if False, they must have shape [batch_size, max_time, depth]. Using time_major = True is slightly more efficient because it avoids transposes at the beginning and end of the RNN computation, but most TensorFlow data is batch-major, so the default here is batch-major as well.
scope: the scope name for the subgraph; defaults to "rnn".
Returns:

A pair (outputs, state), where:
outputs: the RNN output tensor. If time_major == False (default), its shape is [batch_size, max_time, cell.output_size]; if time_major == True, its shape is [max_time, batch_size, cell.output_size]. Note that if cell.output_size is a tuple, then outputs is also a tuple with the same structure as cell.output_size, containing tensors of the corresponding shapes.
state: the final state. If cell.state_size is an int, the state is a tensor of shape [batch_size, cell.state_size]; if it is a tuple, the state is a tuple with the corresponding shapes. If the cells are LSTMCells, the state is a tuple containing an LSTMStateTuple for each cell.
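Before the full examples, here is the promised minimal sketch of passing sequence_length (the lengths given here are purely illustrative):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=(4, 10, 5))    # a batch of 4 sequences, padded to 10 steps
lengths = tf.constant([10, 7, 3, 9], dtype=tf.int32)     # the true length of each sequence
cell = tf.contrib.rnn.BasicLSTMCell(num_units=64)
outputs, state = tf.nn.dynamic_rnn(cell=cell,
                                   inputs=inputs,
                                   sequence_length=lengths,
                                   dtype=tf.float32)
# outputs beyond each sequence's true length are zero,
# and state is taken at each sequence's last valid step.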
Example 1: single-layer LSTM

import tensorflow as tf
import numpy as np

inputs = tf.placeholder(np.float32, shape=(32,40,5)) # 32 is the batch_size
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
#lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
#lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#multi-layer lstm_cell
#lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])

print("output_size:",lstm_cell_1.output_size)
print("state_size:",lstm_cell_1.state_size)
#print(lstm_cell.state_size.h)
#print(lstm_cell.state_size.c)
output,state=tf.nn.dynamic_rnn(
    cell=lstm_cell_1,
    inputs=inputs,
    dtype=tf.float32
)

print("output.shape:",output.shape)
print("len of state tuple",len(state))
print("state.h.shape:",state.h.shape)
print("state.c.shape:",state.c.shape)
Result:


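With the prints above, the output should look roughly like this:

output_size: 128
state_size: LSTMStateTuple(c=128, h=128)
output.shape: (32, 40, 128)
len of state tuple 2
state.h.shape: (32, 128)
state.c.shape: (32, 128)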
Example 2: multi-layer LSTM

import tensorflow as tf
import numpy as np

inputs = tf.placeholder(np.float32, shape=(32,40,5)) # 32 is the batch_size
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#multi-layer lstm_cell
lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])

print("output_size:",lstm_cell.output_size)
print("state_size:",lstm_cell.state_size)
#print(lstm_cell.state_size.h)
#print(lstm_cell.state_size.c)
output,state=tf.nn.dynamic_rnn(
    cell=lstm_cell,
    inputs=inputs,
    dtype=tf.float32
)

print("output.shape:",output.shape)
print("len of state tuple",len(state))

Result:

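The output should look roughly like this:

output_size: 512
state_size: (LSTMStateTuple(c=128, h=128), LSTMStateTuple(c=256, h=256), LSTMStateTuple(c=512, h=512))
output.shape: (32, 40, 512)
len of state tuple 3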
The multi-layer state is a tuple, and each element of the tuple is the state of one layer.

Ⅳ.tf.nn.bidirectional_dynamic_rnn

bidirectional_dynamic_rnn(cell_fw,cell_bw,inputs,sequence_length=None,initial_state_fw=None,initial_state_bw=None,dtype=None,parallel_iterations=None,swap_memory=False,time_major=False,scope=None)

Parameters:

cell_fw: an instance of RNNCell, used for the forward direction.
cell_bw: an instance of RNNCell, used for the backward direction.
inputs: the RNN input. If time_major == False (default), it must be a tensor of shape [batch_size, max_time, ...], or a nested tuple of such elements. If time_major == True, it must be a tensor of shape [max_time, batch_size, ...], or a nested tuple of such elements.
sequence_length: (optional) an int32/int64 vector of size [batch_size] containing the actual length of each sequence in the batch. If not provided, all batch entries are assumed to be full sequences, and time reversal is applied from time 0 to max_time for every sequence.
initial_state_fw: (optional) the initial state of the forward RNN. This must be a tensor of appropriate type and shape [batch_size, cell_fw.state_size]. If cell_fw.state_size is a tuple, it should be a tuple of tensors of shape [batch_size, s] for s in cell_fw.state_size.
initial_state_bw: (optional) the same as initial_state_fw, but using the corresponding properties of cell_bw.
dtype: (optional) the data type of the initial states and the expected output. Required if initial_states are not provided or the RNN states have a heterogeneous dtype.
parallel_iterations: (default: 32) the number of iterations to run in parallel. Operations that have no temporal dependency and can run in parallel will do so. This parameter trades space for time: values >> 1 use more memory but take less time, while smaller values use less memory but take longer to compute.
swap_memory: transparently swap the tensors produced in forward inference but needed for backprop from GPU to CPU. This allows training RNNs that would normally not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: the shape layout of the inputs and outputs tensors. If True, these tensors have shape [max_time, batch_size, depth]; if False, they have shape [batch_size, max_time, depth].
scope: the VariableScope for the created subgraph; defaults to "bidirectional_rnn".
Returns:

A tuple (outputs, output_states), where:
outputs: a tuple (output_fw, output_bw) containing the forward and backward RNN outputs.
If time_major == False (default), output_fw is a tensor of shape [batch_size, max_time, cell_fw.output_size] and output_bw is a tensor of shape [batch_size, max_time, cell_bw.output_size].
If time_major == True, output_fw is a tensor of shape [max_time, batch_size, cell_fw.output_size] and output_bw is a tensor of shape [max_time, batch_size, cell_bw.output_size].
output_states: also a tuple, (output_state_fw, output_state_bw); that is, the forward and backward final states are placed in one tuple.
Here is an example:

import tensorflow as tf
import numpy as np

inputs = tf.placeholder(np.float32, shape=(32,40,5)) # 32 is the batch_size
lstm_cell_fw = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_bw = tf.contrib.rnn.BasicLSTMCell(num_units=128)

#multi-layer lstm_cell
#lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])

print("output_fw_size:",lstm_cell_fw.output_size)
print("state_fw_size:",lstm_cell_fw.state_size)
print("output_bw_size:",lstm_cell_bw.output_size)
print("state_bw_size:",lstm_cell_bw.state_size)

#print(lstm_cell.state_size.h)
#print(lstm_cell.state_size.c)
output,state=tf.nn.bidirectional_dynamic_rnn(
    cell_fw=lstm_cell_fw,
    cell_bw=lstm_cell_bw,
    inputs=inputs,
    dtype=tf.float32
)
output_fw=output[0]
output_bw=output[1]
state_fw=state[0]
state_bw=state[1]

print("output_fw.shape:",output_fw.shape)
print("output_bw.shape:",output_bw.shape)
print("len of state tuple",len(state_fw))
print("state_fw:",state_fw)
print("state_bw:",state_bw)
#print("state.h.shape:",state.h.shape)
#print("state.c.shape:",state.c.shape)

#state_concat=tf.concat(values=[state_fw,state_fw],axis=1)
#print(state_concat)
state_h_concat=tf.concat(values=[state_fw.h,state_bw.h],axis=1)
print("state_fw_h_concat.shape",state_h_concat.shape)

state_c_concat=tf.concat(values=[state_fw.c,state_bw.c],axis=1)
print("state_fw_h_concat.shape",state_c_concat.shape)

state_concat=tf.contrib.rnn.LSTMStateTuple(c=state_c_concat,h=state_h_concat)
print(state_concat)
Result:

output_fw_size: 128
state_fw_size: LSTMStateTuple(c=128, h=128)
output_bw_size: 128
state_bw_size: LSTMStateTuple(c=128, h=128)
output_fw.shape: (32, 40, 128)
output_bw.shape: (32, 40, 128)
len of state tuple 2
state_fw: LSTMStateTuple(c=<tf.Tensor 'bidirectional_rnn/fw/fw/while/Exit_2:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'bidirectional_rnn/fw/fw/while/Exit_3:0' shape=(32, 128) dtype=float32>)
state_bw: LSTMStateTuple(c=<tf.Tensor 'bidirectional_rnn/bw/bw/while/Exit_2:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'bidirectional_rnn/bw/bw/while/Exit_3:0' shape=(32, 128) dtype=float32>)
state_fw_h_concat.shape (32, 256)
state_fw_h_concat.shape (32, 256)
LSTMStateTuple(c=<tf.Tensor 'concat_1:0' shape=(32, 256) dtype=float32>, h=<tf.Tensor 'concat:0' shape=(32, 256) dtype=float32>)
This example also shows how to concatenate states; you can use it as a template for initializing or concatenating states yourself.

Ⅴ.tf.nn.embedding_lookup()

embedding_lookup(params,ids,partition_strategy='mod',name=None,max_norm=None)
This function is mainly used when doing embeddings inside a task: it maps a character or word id to the word vector of the corresponding dimension. If the embedding matrix is a trainable Variable, the word vectors can be trained jointly with the task itself. (A minimal usage sketch follows the parameter list below.)

params: the complete embedding tensor, or a list of P tensors that all have the same shape except for the first dimension, representing a sharded embedding tensor.
ids: a Tensor of type int32 or int64 containing the ids to look up in params.
partition_strategy: a string specifying the partitioning strategy; relevant if len(params) > 1. Currently "div" and "mod" are supported; the default is "mod".
name: a name for the operation (optional).
max_norm: if not None, each looked-up embedding is clipped so that its l2 norm is at most max_norm.
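As mentioned above, here is a minimal sketch of a trainable embedding matrix looked up with tf.nn.embedding_lookup (the vocabulary size, embedding dimension, and sequence length are illustrative assumptions):

import tensorflow as tf

VOCAB_SIZE = 10000
EMBEDDING_DIM = 128

# a trainable embedding matrix: one EMBEDDING_DIM-dimensional vector per word id
embeddings = tf.get_variable(name="embeddings",
                             shape=(VOCAB_SIZE, EMBEDDING_DIM),
                             dtype=tf.float32)
word_ids = tf.placeholder(tf.int32, shape=(None, 40))              # (batch, time_steps)
word_vectors = tf.nn.embedding_lookup(params=embeddings, ids=word_ids)
print(word_vectors.shape)                                          # (?, 40, 128)

The resulting (batch, time_steps, EMBEDDING_DIM) tensor can then be fed directly as the inputs of tf.nn.dynamic_rnn.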
Ⅵ. tf.sequence_mask()

sequence_mask(lengths,maxlen=None,dtype=tf.bool,name=None)
Purpose: returns a mask tensor representing the first N positions of each sequence.
If lengths has shape [d_1, d_2, ..., d_n], the resulting tensor mask has dtype dtype and shape [d_1, d_2, ..., d_n, maxlen], with
mask[i_1, i_2, ..., i_n, j] = (j < lengths[i_1, i_2, ..., i_n])

Parameters:
lengths: an integer tensor, all of whose values are less than or equal to maxlen.
maxlen: scalar integer tensor, the size of the last dimension of the returned tensor. Defaults to the maximum value in lengths.
dtype: output type of the resulting tensor.
name: the op name.
Returns:
A mask tensor of shape lengths.shape + (maxlen,), cast to the specified dtype.
Examples:

tf.sequence_mask([1, 3, 2], 5)  # [[True, False, False, False, False],
                                #  [True, True, True, False, False],
                                #  [True, True, False, False, False]]

tf.sequence_mask([[1, 3],[2,0]])  # [[[True, False, False],
                                  #   [True, True, True]],
                                  #  [[True, True, False],
                                  #   [False, False, False]]]
Ⅶ.tf.boolean_mask()

boolean_mask(tensor,mask,name='boolean_mask')
Applies a boolean mask to a tensor; it can be compared to numpy's tensor[mask]. (A combined sequence_mask + boolean_mask sketch follows the parameter list below.)

Parameters:
tensor: an N-D tensor.
mask: a K-D boolean tensor, with K <= N, and K must be statically known.
name: optional, the op name.
Returns:
An (N-K+1)-dimensional tensor populated by the entries of tensor corresponding to True values in mask.
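As mentioned in the update log and in the description above, the two functions can be combined to drop padded positions, for example before computing a loss over variable-length sequences. A minimal sketch (shapes and lengths are illustrative):

import tensorflow as tf

outputs = tf.placeholder(tf.float32, shape=(4, 10, 64))    # (batch, max_time, units), padded
lengths = tf.constant([10, 7, 3, 9], dtype=tf.int32)       # the true length of each sequence

mask = tf.sequence_mask(lengths, maxlen=10)                 # (4, 10) boolean mask
valid_outputs = tf.boolean_mask(tensor=outputs, mask=mask)  # padded time steps removed
print(valid_outputs.shape)                                  # (?, 64)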
That is a basic working knowledge of the APIs we need. Now for the examples.

2. Examples

As said at the beginning, the examples that follow are toy examples, but they are absolutely beginner-friendly. These simple examples cover the basic ideas and techniques needed to program LSTMs; digest them and you can quickly get up to speed and build the more complex network structures that suit your own problems.
We start from the most basic example and work forward; each example can be run directly as a script.

Ⅰ. Predicting the sin function

Code:

import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt


TIME_STEPS=10
BATCH_SIZE=128
HIDDEN_UNITS=1
LEARNING_RATE=0.001
EPOCH=150

TRAIN_EXAMPLES=11000
TEST_EXAMPLES=1100

#------------------------------------Generate Data-----------------------------------------------#
#generate data
def generate(seq):
    X=[]
    y=[]
    for i in range(len(seq)-TIME_STEPS):
        X.append([seq[i:i+TIME_STEPS]])
        y.append([seq[i+TIME_STEPS]])
    return np.array(X,dtype=np.float32),np.array(y,dtype=np.float32)

#s=[i for i in range(30)]
#X,y=generate(s)
#print(X)
#print(y)

seq_train=np.sin(np.linspace(start=0,stop=100,num=TRAIN_EXAMPLES,dtype=np.float32))
seq_test=np.sin(np.linspace(start=100,stop=110,num=TEST_EXAMPLES,dtype=np.float32))

#plt.plot(np.linspace(start=0,stop=100,num=10000,dtype=np.float32),seq_train)

#plt.plot(np.linspace(start=100,stop=110,num=1000,dtype=np.float32),seq_test)
#plt.show()

X_train,y_train=generate(seq_train)
#print(X_train.shape,y_train.shape)
X_test,y_test=generate(seq_test)

#reshape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,TIME_STEPS,1))
X_test=np.reshape(X_test,newshape=(-1,TIME_STEPS,1))

#draw y_test
plt.plot(range(1000),y_test[:1000,0],"r*")
#print(X_train.shape)
#print(X_test.shape)


#-----------------------------------------------------------------------------------------------------#


#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():

    #------------------------------------construct LSTM------------------------------------------#
    #placeholder
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,1),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,1),name="pred_placeholder")

    #lstm instance
    lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)

    #initialize to zero
    init_state=lstm_cell.zero_state(batch_size=BATCH_SIZE,dtype=tf.float32)

    #dynamic rnn
    outputs,states=tf.nn.dynamic_rnn(cell=lstm_cell,inputs=X_p,initial_state=init_state,dtype=tf.float32)
    #print(outputs.shape)
    h=outputs[:,-1,:]
    #print(h.shape)
    #--------------------------------------------------------------------------------------------#

    #---------------------------------define loss and optimizer----------------------------------#
    mse=tf.losses.mean_squared_error(labels=y_p,predictions=h)
    #print(loss.shape)
    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)


    init=tf.global_variables_initializer()


#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        results = np.zeros(shape=(TEST_EXAMPLES, 1))
        train_losses=[]
        test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss=sess.run(
                    fetches=(optimizer,mse),
                    feed_dict={
                            X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                            y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                        }
            )
            train_losses.append(train_loss)
        print("average training loss:", sum(train_losses) / len(train_losses))


        for j in range(TEST_EXAMPLES//BATCH_SIZE):
            result,test_loss=sess.run(
                    fetches=(h,mse),
                    feed_dict={
                            X_p:X_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                            y_p:y_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                        }
            )
            results[j*BATCH_SIZE:(j+1)*BATCH_SIZE]=result
            test_losses.append(test_loss)
        print("average test loss:", sum(test_losses) / len(test_losses))
        plt.plot(range(1000),results[:1000,0])
    plt.show()
Result:


In the figure, the thick red line is the ground truth. You can see that after 150 epochs our predictions get closer and closer to the true values.

Ⅱ. Predicting the sin function, multi-layer version

Code:

import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt


TIME_STEPS=10
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=1
LEARNING_RATE=0.001
EPOCH=50

TRAIN_EXAMPLES=11000
TEST_EXAMPLES=1100

#------------------------------------Generate Data-----------------------------------------------#
#generate data
def generate(seq):
    X=[]
    y=[]
    for i in range(len(seq)-TIME_STEPS):
        X.append([seq[i:i+TIME_STEPS]])
        y.append([seq[i+TIME_STEPS]])
    return np.array(X,dtype=np.float32),np.array(y,dtype=np.float32)

#s=[i for i in range(30)]
#X,y=generate(s)
#print(X)
#print(y)

seq_train=np.sin(np.linspace(start=0,stop=100,num=TRAIN_EXAMPLES,dtype=np.float32))
seq_test=np.sin(np.linspace(start=100,stop=110,num=TEST_EXAMPLES,dtype=np.float32))

#plt.plot(np.linspace(start=0,stop=100,num=10000,dtype=np.float32),seq_train)

#plt.plot(np.linspace(start=100,stop=110,num=1000,dtype=np.float32),seq_test)
#plt.show()

X_train,y_train=generate(seq_train)
#print(X_train.shape,y_train.shape)
X_test,y_test=generate(seq_test)

#reshape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,TIME_STEPS,1))
X_test=np.reshape(X_test,newshape=(-1,TIME_STEPS,1))

#draw y_test
plt.plot(range(1000),y_test[:1000,0],"r*")
#print(X_train.shape)
#print(X_test.shape)


#-----------------------------------------------------------------------------------------------------#


#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():

    #------------------------------------construct LSTM------------------------------------------#
    #placeholder
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,1),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,1),name="pred_placeholder")

    #lstm instance
    lstm_cell1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)

    multi_lstm=rnn.MultiRNNCell(cells=[lstm_cell1,lstm_cell])

    #initialize to zero
    init_state=multi_lstm.zero_state(batch_size=BATCH_SIZE,dtype=tf.float32)

    #dynamic rnn
    outputs,states=tf.nn.dynamic_rnn(cell=multi_lstm,inputs=X_p,initial_state=init_state,dtype=tf.float32)
    #print(outputs.shape)
    h=outputs[:,-1,:]
    #print(h.shape)
    #--------------------------------------------------------------------------------------------#

    #---------------------------------define loss and optimizer----------------------------------#
    mse=tf.losses.mean_squared_error(labels=y_p,predictions=h)
    #print(loss.shape)
    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)


    init=tf.global_variables_initializer()


#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        results = np.zeros(shape=(TEST_EXAMPLES, 1))
        train_losses=[]
        test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss=sess.run(
                    fetches=(optimizer,mse),
                    feed_dict={
                            X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                            y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                        }
            )
            train_losses.append(train_loss)
        print("average training loss:", sum(train_losses) / len(train_losses))


        for j in range(TEST_EXAMPLES//BATCH_SIZE):
            result,test_loss=sess.run(
                    fetches=(h,mse),
                    feed_dict={
                            X_p:X_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                            y_p:y_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                        }
            )
            results[j*BATCH_SIZE:(j+1)*BATCH_SIZE]=result
            test_losses.append(test_loss)
        print("average test loss:", sum(test_losses) / len(test_losses))
        plt.plot(range(1000),results[:1000,0])
    plt.show()
Result:


Here we find that after only 50 epochs the result is already clearly better than that of the first example.

Predicting the sin function, hand-unrolled version

import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt

TIME_STEPS=10
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=1
LEARNING_RATE=0.001
EPOCH=50

TRAIN_EXAMPLES=11000
TEST_EXAMPLES=1100

#------------------------------------Generate Data-----------------------------------------------#
#generate data
def generate(seq):
    X=[]
    y=[]
    for i in range(len(seq)-TIME_STEPS):
        X.append([seq[i:i+TIME_STEPS]])
        y.append([seq[i+TIME_STEPS]])
    return np.array(X,dtype=np.float32),np.array(y,dtype=np.float32)

#s=[i for i in range(30)]
#X,y=generate(s)
#print(X)
#print(y)

seq_train=np.sin(np.linspace(start=0,stop=100,num=TRAIN_EXAMPLES,dtype=np.float32))
seq_test=np.sin(np.linspace(start=100,stop=110,num=TEST_EXAMPLES,dtype=np.float32))

#plt.plot(np.linspace(start=0,stop=100,num=10000,dtype=np.float32),seq_train)

#plt.plot(np.linspace(start=100,stop=110,num=1000,dtype=np.float32),seq_test)
#plt.show()

X_train,y_train=generate(seq_train)
#print(X_train.shape,y_train.shape)
X_test,y_test=generate(seq_test)

#reshape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,TIME_STEPS,1))
X_test=np.reshape(X_test,newshape=(-1,TIME_STEPS,1))

#draw y_test
plt.plot(range(1000),y_test[:1000,0],"r*")
#print(X_train.shape)
#print(X_test.shape)


#-----------------------------------------------------------------------------------------------------#


#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():

    #------------------------------------construct LSTM------------------------------------------#
    #placeholder
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,1),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,1),name="pred_placeholder")

    #lstm instance
    lstm_cell1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)

    multi_lstm=rnn.MultiRNNCell(cells=[lstm_cell1,lstm_cell])


    #initialize the state by hand
    #state of layer 1
    lstm_layer1_c=tf.zeros(shape=(BATCH_SIZE,HIDDEN_UNITS1))
    lstm_layer1_h=tf.zeros(shape=(BATCH_SIZE,HIDDEN_UNITS1))
    layer1_state=rnn.LSTMStateTuple(c=lstm_layer1_c,h=lstm_layer1_h)

    #state of layer 2
    lstm_layer2_c = tf.zeros(shape=(BATCH_SIZE, HIDDEN_UNITS))
    lstm_layer2_h = tf.zeros(shape=(BATCH_SIZE, HIDDEN_UNITS))
    layer2_state = rnn.LSTMStateTuple(c=lstm_layer2_c, h=lstm_layer2_h)

    init_state=(layer1_state,layer2_state)
    print(init_state)


    #unroll the RNN computation by hand
    outputs = list()                                    #collects the output of every time step
    state = init_state
    with tf.variable_scope('RNN'):
        for timestep in range(TIME_STEPS):
            if timestep > 0:
                tf.get_variable_scope().reuse_variables()
            # state here holds the state of every LSTM layer
            (cell_output, state) = multi_lstm(X_p[:, timestep, :], state)
            outputs.append(cell_output)
    h = outputs[-1]


    #---------------------------------define loss and optimizer----------------------------------#
    mse=tf.losses.mean_squared_error(labels=y_p,predictions=h)
    #print(loss.shape)
    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)

    init=tf.global_variables_initializer()


#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        results = np.zeros(shape=(TEST_EXAMPLES, 1))
        train_losses=[]
        test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss=sess.run(
                    fetches=(optimizer,mse),
                    feed_dict={
                            X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                            y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                        }
            )
            train_losses.append(train_loss)
        print("average training loss:", sum(train_losses) / len(train_losses))


        for j in range(TEST_EXAMPLES//BATCH_SIZE):
            result,test_loss=sess.run(
                    fetches=(h,mse),
                    feed_dict={
                            X_p:X_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                            y_p:y_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                        }
            )
            results[j*BATCH_SIZE:(j+1)*BATCH_SIZE]=result
            test_losses.append(test_loss)
        print("average test loss:", sum(test_losses) / len(test_losses))
        plt.plot(range(1000),results[:1000,0])
    plt.show()
This example is exactly the same as example Ⅱ above; the only difference is that it uses a self-defined initial state, so it shows how to define a state yourself. Also, what was previously unrolled automatically
is here unrolled and computed by hand. The unrolled computation,

#unroll the RNN computation by hand
    outputs = list()                                    #collects the output of every time step
    state = init_state
    with tf.variable_scope('RNN'):
        for timestep in range(TIME_STEPS):
            if timestep > 0:
                tf.get_variable_scope().reuse_variables()
            # state here holds the state of every LSTM layer
            (cell_output, state) = multi_lstm(X_p[:, timestep, :], state)
            outputs.append(cell_output)
    h = outputs[-1]

is quite instructive: when you need control over the result of every single step, you can unroll the computation yourself like this.

Ⅲ. MNIST image classification

LSTMs can also be used for image classification. The idea here is very simple: an MNIST image is a 28x28 array, so each of its 28 rows can be treated as one time step with 28 input features.

Code:

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt


TIME_STEPS=28
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=10
LEARNING_RATE=0.001
EPOCH=50

TRAIN_EXAMPLES=42000
TEST_EXAMPLES=28000

#------------------------------------Generate Data-----------------------------------------------#
#generate data
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")

# pop the labels and one-hot coding
train_labels_frame = train_frame.pop("label")

# get values
# one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train=pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values

#trans the shape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,28,28))
X_test=np.reshape(X_test,newshape=(-1,28,28))
#print(X_train.shape)
#print(y_dummy.shape)
#print(X_test.shape)

#-----------------------------------------------------------------------------------------------------#


#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():

    #------------------------------------construct LSTM------------------------------------------#
    #placeholder
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,28),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,10),name="pred_placeholder")

    #lstm instance
    lstm_cell1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)

    multi_lstm=rnn.MultiRNNCell(cells=[lstm_cell1,lstm_cell])

    #initialize to zero
    init_state=multi_lstm.zero_state(batch_size=BATCH_SIZE,dtype=tf.float32)

    #dynamic rnn
    outputs,states=tf.nn.dynamic_rnn(cell=multi_lstm,inputs=X_p,initial_state=init_state,dtype=tf.float32)
    #print(outputs.shape)
    h=outputs[:,-1,:]
    #print(h.shape)
    #--------------------------------------------------------------------------------------------#

    #---------------------------------define loss and optimizer----------------------------------#
    cross_loss=tf.losses.softmax_cross_entropy(onehot_labels=y_p,logits=h)
    #print(loss.shape)

    correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)

    init=tf.global_variables_initializer()


#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        #results = np.zeros(shape=(TEST_EXAMPLES, 10))
        train_losses=[]
        accus=[]
        #test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss,accu=sess.run(
                    fetches=(optimizer,cross_loss,accuracy),
                    feed_dict={
                            X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                            y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                        }
            )
            train_losses.append(train_loss)
            accus.append(accu)
        print("average training loss:", sum(train_losses) / len(train_losses))
        print("accuracy:",sum(accus)/len(accus))
Result:


Ⅳ. Image classification with a bidirectional LSTM

Code:

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt


TIME_STEPS=28
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=10
LEARNING_RATE=0.001
EPOCH=50

TRAIN_EXAMPLES=42000
TEST_EXAMPLES=28000

#------------------------------------Generate Data-----------------------------------------------#
#generate data
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")

# pop the labels and one-hot coding
train_labels_frame = train_frame.pop("label")

# get values
# one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train=pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values

#trans the shape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,28,28))
X_test=np.reshape(X_test,newshape=(-1,28,28))
#print(X_train.shape)
#print(y_dummy.shape)
#print(X_test.shape)

#-----------------------------------------------------------------------------------------------------#


#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():

    #------------------------------------construct LSTM------------------------------------------#
    #placeholder
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,28),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,10),name="pred_placeholder")

    #lstm instance
    lstm_forward=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    lstm_backward=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)

    outputs,states=tf.nn.bidirectional_dynamic_rnn(
        cell_fw=lstm_forward,
        cell_bw=lstm_backward,
        inputs=X_p,
        dtype=tf.float32
    )

    outputs_fw=outputs[0]
    outputs_bw = outputs[1]
    h=outputs_fw[:,-1,:]+outputs_bw[:,-1,:]
   # print(h.shape)
    #---------------------------------------;-----------------------------------------------------#

    #---------------------------------define loss and optimizer----------------------------------#
    cross_loss=tf.losses.softmax_cross_entropy(onehot_labels=y_p,logits=h)
    #print(loss.shape)

    correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)

    init=tf.global_variables_initializer()


#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        #results = np.zeros(shape=(TEST_EXAMPLES, 10))
        train_losses=[]
        accus=[]
        #test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss,accu=sess.run(
                    fetches=(optimizer,cross_loss,accuracy),
                    feed_dict={
                            X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                            y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                        }
            )
            train_losses.append(train_loss)
            accus.append(accu)
        print("average training loss:", sum(train_losses) / len(train_losses))
        print("accuracy:",sum(accus)/len(accus))
The result of this example:


You will find that after a certain point the network stops learning anything more. That is because we only used a single bidirectional layer above. With a small change, the network above can be turned into a deep bidirectional LSTM.

Ⅴ. Image classification with a deep bidirectional LSTM

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt


TIME_STEPS=28
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=10
LEARNING_RATE=0.001
EPOCH=50

TRAIN_EXAMPLES=42000
TEST_EXAMPLES=28000

#------------------------------------Generate Data-----------------------------------------------#
#generate data
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")

# pop the labels and one-hot coding
train_labels_frame = train_frame.pop("label")

# get values
# one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train=pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values

#trans the shape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,28,28))
X_test=np.reshape(X_test,newshape=(-1,28,28))
#print(X_train.shape)
#print(y_dummy.shape)
#print(X_test.shape)

#-----------------------------------------------------------------------------------------------------#


#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():

    #------------------------------------construct LSTM------------------------------------------#
    #placeholder
    X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,28),name="input_placeholder")
    y_p=tf.placeholder(dtype=tf.float32,shape=(None,10),name="pred_placeholder")

    #lstm instance
    lstm_forward_1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_forward_2=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    lstm_forward=rnn.MultiRNNCell(cells=[lstm_forward_1,lstm_forward_2])

    lstm_backward_1 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_backward_2 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    lstm_backward=rnn.MultiRNNCell(cells=[lstm_backward_1,lstm_backward_2])

    outputs,states=tf.nn.bidirectional_dynamic_rnn(
        cell_fw=lstm_forward,
        cell_bw=lstm_backward,
        inputs=X_p,
        dtype=tf.float32
    )

    outputs_fw=outputs[0]
    outputs_bw = outputs[1]
    h=outputs_fw[:,-1,:]+outputs_bw[:,-1,:]
   # print(h.shape)
    #---------------------------------------;-----------------------------------------------------#

    #---------------------------------define loss and optimizer----------------------------------#
    cross_loss=tf.losses.softmax_cross_entropy(onehot_labels=y_p,logits=h)
    #print(loss.shape)

    correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

    optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)

    init=tf.global_variables_initializer()


#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1,EPOCH+1):
        #results = np.zeros(shape=(TEST_EXAMPLES, 10))
        train_losses=[]
        accus=[]
        #test_losses=[]
        print("epoch:",epoch)
        for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
            _,train_loss,accu=sess.run(
                    fetches=(optimizer,cross_loss,accuracy),
                    feed_dict={
                            X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
                            y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
                        }
            )
            train_losses.append(train_loss)
            accus.append(accu)
        print("average training loss:", sum(train_losses) / len(train_losses))
        print("accuracy:",sum(accus)/len(accus))


Compared with the single-layer biLSTM above, this one already reaches 95% by around epoch 35, which suggests that, in terms of its ability to abstract information, the multi-layer architecture is better than the single-layer one.
Author: 謝小小XH
Source: CSDN
Original: https://blog.csdn.net/xierhacker/article/details/78772560
