

LSTM (Long Short-Term Memory networks)


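A plain RNN takes in most of its information from the immediately preceding step and makes little use of information from steps further back, which leads to inaccurate predictions. Take text prediction: in "I live in China, I like to travel, and I like to speak ...", the word "China" matters a great deal for predicting the word after "speak", but it sits too far from the prediction point, so the information carried across that distance is badly degraded. This is called the long-term dependency problem. The key feature of the LSTM is that it can carry information from earlier in the network forward to the current step, which addresses this problem well.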



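Step 1 is the forget gate. The previous hidden state $h_{t-1}$ and the current input $x_t$ are concatenated, multiplied by a weight matrix, shifted by a bias, and passed through the sigmoid function to produce $f_t$:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

The parameters of this layer, $W_f$ and $b_f$, all carry the subscript f to keep them distinct from the other layers.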
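A minimal NumPy sketch of the forget-gate formula, assuming a hidden size of 3, an input size of 2 and random weights (all sizes, the seed and the weights are illustrative, not from the original). It shows that $W_f$ has shape (hidden, hidden + input) and that every entry of $f_t$ lies strictly between 0 and 1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
hidden_size, input_size = 3, 2                  # illustrative sizes (assumption)
h_prev = rng.standard_normal(hidden_size)       # h_{t-1}
x_t = rng.standard_normal(input_size)           # x_t
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)

concat = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t], shape (5,)
f_t = sigmoid(W_f @ concat + b_f)               # forget gate, shape (3,)
print(f_t)
```

Entries near 0 mean "forget this component of $C_{t-1}$"; entries near 1 mean "keep it".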


Step 2 is similar to step 1: it produces the input gate $i_t$ in the same way, together with a candidate cell state $\hat{C}_t$. The two formulas are

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

Here the parameters $W_i, b_i$ carry the subscript i.

$$\hat{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

Here the parameters $W_C, b_C$ carry the subscript C.
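Both step-2 quantities read the same concatenated vector but use different weights and activations. A NumPy sketch under the same illustrative assumptions as before (sizes 3 and 2, random weights, arbitrary seed): $i_t$ goes through sigmoid, so its entries lie in (0, 1), while $\hat{C}_t$ goes through tanh, so its entries lie in [-1, 1] and can be negative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
hidden_size, input_size = 3, 2                           # illustrative sizes (assumption)
concat = rng.standard_normal(hidden_size + input_size)   # stands in for [h_{t-1}, x_t]

W_i = rng.standard_normal((hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
W_C = rng.standard_normal((hidden_size, hidden_size + input_size))
b_C = np.zeros(hidden_size)

i_t = sigmoid(W_i @ concat + b_i)        # input gate: how much of each candidate to write
C_hat_t = np.tanh(W_C @ concat + b_C)    # candidate values for the cell state
print(i_t, C_hat_t)
```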


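Step 3 combines the results of steps 1 and 2 with the previous cell state through element-wise multiplication (written $*$) and addition:

$$C_t = f_t * C_{t-1} + i_t * \hat{C}_t$$

This updates $C_{t-1}$ to $C_t$, which is used in the computation at the next time step.
Step 4 then updates $h_{t-1}$ to $h_t$, which is likewise passed on to the next time step.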
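To see what the element-wise update does, here is a sketch with hand-picked, idealized gate values (real sigmoid outputs never reach exactly 0 or 1; these numbers are purely illustrative). Component 0 keeps its old value, component 1 is overwritten by the candidate, and component 2 is a half-and-half mix. The last line shows why mixing up the two kinds of multiplication goes wrong: a dot product collapses the vectors to a single number.

```python
import numpy as np

# Idealized gate values for three cell-state components (illustrative only)
f_t = np.array([1.0, 0.0, 0.5])        # keep, forget, half-keep
C_prev = np.array([2.0, 2.0, 2.0])     # C_{t-1}
i_t = np.array([0.0, 1.0, 0.5])        # ignore, fully write, half-write
C_hat_t = np.array([0.8, -0.6, 0.4])   # candidate values

# '*' on NumPy arrays is element-wise, matching '*' in the formula
C_t = f_t * C_prev + i_t * C_hat_t     # keeps 2.0, replaces with -0.6, mixes to 1.2
print(C_t)

# Using a dot product instead of '*' collapses the vector to one number
dot = f_t @ C_prev
print(dot)
```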


Step 4 computes the output gate:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

The parameters $W_o, b_o$ use the subscript o.
It is then combined with $C_t$ from step 3:

$$h_t = o_t * \tanh(C_t)$$

This final step produces the updated $h_t$.
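$h_t$ is a gated, squashed view of the cell state: $o_t$ lies in (0, 1) and $\tanh(C_t)$ lies in (-1, 1), so every entry of $h_t$ has the same sign as $C_t$ and a magnitude below 1. A NumPy sketch, assuming illustrative sizes, random output-gate weights and an example cell state:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
hidden_size, input_size = 3, 2                           # illustrative sizes (assumption)
concat = rng.standard_normal(hidden_size + input_size)   # stands in for [h_{t-1}, x_t]
W_o = rng.standard_normal((hidden_size, hidden_size + input_size))
b_o = np.zeros(hidden_size)
C_t = np.array([2.0, -0.6, 1.2])                         # example cell state from step 3

o_t = sigmoid(W_o @ concat + b_o)    # output gate, entries in (0, 1)
h_t = o_t * np.tanh(C_t)             # element-wise product
print(h_t)
```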

Hands-on Python code

Keep in mind that the end goal is to update $C_t$ and $h_t$; every other quantity is computed only to serve that goal. The full set of equations is:

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\hat{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t * C_{t-1} + i_t * \hat{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t * \tanh(C_t)
\end{aligned}
$$
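All four gate equations apply a different weight matrix to the same vector $[h_{t-1}, x_t]$, so implementations commonly stack the four matrices into one and split that product into a part for $x_t$ and a part for $h_{t-1}$. The NumPy sketch below checks numerically that the per-gate and fused forms agree. It assumes the row-vector convention (so each matrix is the transpose of the $W$ in the formulas), leaves out the biases (they are added identically in both forms), and uses the gate order f, i, C, o; the names `kernel` and `recurrent_kernel` mirror the parameters of `LSTM_step` in this post, and all sizes and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
H, I = 4, 3                                    # hidden and input sizes (illustrative)
h_prev = rng.standard_normal((1, H))           # h_{t-1} as a row vector
x_t = rng.standard_normal((1, I))              # x_t as a row vector
gates = ("f", "i", "C", "o")                   # assumed gate order

# One matrix per gate acting on the row vector [h_{t-1}, x_t]
W = {g: rng.standard_normal((H + I, H)) for g in gates}

concat = np.concatenate([h_prev, x_t], axis=1)          # (1, H + I)
per_gate = [concat @ W[g] for g in gates]               # four (1, H) pre-activations

# Stack the four matrices side by side, then split the rows by what they multiply
W_all = np.concatenate([W[g] for g in gates], axis=1)   # (H + I, 4H)
recurrent_kernel = W_all[:H]                   # rows that multiply h_{t-1}: (H, 4H)
kernel = W_all[H:]                             # rows that multiply x_t: (I, 4H)

z = x_t @ kernel + h_prev @ recurrent_kernel   # (1, 4H)
fused = np.split(z, 4, axis=1)                 # four (1, H) blocks in gate order

print(all(np.allclose(a, b) for a, b in zip(per_gate, fused)))
```

One matrix product per input is cheaper than four separate ones, which is why the code below computes a single `z` and then splits it.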


First, define the two activation functions used in the LSTM: sigmoid and tanh.

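```python
import tensorflow as tf

# Hand-written versions of the two activations;
# tf.math.sigmoid and tf.math.tanh are the built-in equivalents.
def sigmoid(x):
    out = 1 / (1 + tf.exp(-x))
    return out

def tanh(x):
    out = (tf.exp(x) - tf.exp(-x)) / (tf.exp(x) + tf.exp(-x))
    return out
```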
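The two hand-written formulas can be sanity-checked without TensorFlow. The sketch below transcribes them into NumPy (an assumption: `np.exp` stands in for `tf.exp`) and compares them with `np.tanh`. It also exposes a weakness of the textbook tanh formula: for large |x|, `exp` overflows to inf and inf/inf gives nan. The helper `tanh_via_sigmoid` is a hypothetical alternative based on the identity $\tanh(x) = 2\sigma(2x) - 1$, which never forms inf/inf.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_textbook(x):
    # Same formula as the tanh above, with np.exp in place of tf.exp
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def tanh_via_sigmoid(x):
    # Identity tanh(x) = 2 * sigmoid(2x) - 1 (hypothetical helper)
    return 2.0 * sigmoid(2.0 * x) - 1.0

xs = np.linspace(-5, 5, 11)
print(np.allclose(tanh_textbook(xs), np.tanh(xs)))     # agree on moderate inputs
print(np.allclose(tanh_via_sigmoid(xs), np.tanh(xs)))

big = np.array([1000.0, -1000.0])
with np.errstate(over="ignore", invalid="ignore"):
    print(tanh_textbook(big))      # exp overflows, result is nan
    print(tanh_via_sigmoid(big))   # saturates at +1 and -1
```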


```python
def LSTM_step(cell_inputs, cell_states, kernel, recurrent_kernel, bias):
    """
    Run one time step of the cell. That is, given the current inputs (x_t) and the cell
    states (h_{t-1}, C_{t-1}) from the last time step, calculate the current hidden state
    (h_t) and cell state (C_t).
    Hint: In LSTM there exist both matrix multiplication and element-wise multiplication.
    Try not to mix them. (At first I mixed them up and used matrix multiplication
    everywhere, so C_t came out as a scalar, whereas it should have shape (1, 64)
    for the test data below.)
    :param cell_inputs: The input at the current time step. The last dimension of it should be 1.
    :param cell_states: The state value of the cell from the last time step, containing
        the previous hidden state h_{t-1} and cell state C_{t-1}.
    :param kernel: The kernel matrix for the multiplication with cell_inputs
    :param recurrent_kernel: The kernel matrix for the multiplication with hidden state h_tml
    :param bias: Common bias value
    :return: current hidden state, and a list of hidden state and cell state
    """
    h_tml = cell_states[0]  # previous hidden state h_{t-1}
    c_tml = cell_states[1]  # previous cell state C_{t-1}

    # Pre-activations of the four formulas:
    #   f_t       = sigmoid(W_f . [h_{t-1}, x_t] + b_f)
    #   i_t       = sigmoid(W_i . [h_{t-1}, x_t] + b_i)
    #   hat{C}_t  = tanh(W_C . [h_{t-1}, x_t] + b_C)
    #   o_t       = sigmoid(W_o . [h_{t-1}, x_t] + b_o)
    # are computed together as one matrix z, then split into four equal parts.
    z = tf.matmul(cell_inputs, kernel)
    z += tf.matmul(h_tml, recurrent_kernel)
    z += bias
    z0, z1, z2, z3 = tf.split(z, 4, axis=1)
    # Gate order here is f, i, C, o. tf.keras's LSTMCell stores its gates as
    # i, f, c, o, so swap z0 and z1 if you load weights trained by Keras.
    ft = sigmoid(z0)     # with our data, ft has shape (1, 64)
    it = sigmoid(z1)     # shape (1, 64)
    hat_ct = tanh(z2)    # same shape
    ot = sigmoid(z3)     # same shape

    # Update the cell state C_t
    ct = ft * c_tml + it * hat_ct   # element-wise products; shape (1, 64)
    # Update the hidden state h_t
    ht = tanh(ct) * ot              # element-wise as well; shape (1, 64)
    return ht, [ht, ct]
```
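`LSTM_step` handles a single time step; a sequence is processed by calling it repeatedly and feeding each returned `[h_t, C_t]` back in as `cell_states`. Below is a NumPy-only sketch of that loop. The names `lstm_step_np` and `run_sequence` are hypothetical helpers, and the gate order f, i, C, o follows `LSTM_step` above. With the same constant weights as the check at the end of this post, its first step reproduces h ≈ 0.48484358.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_np(x_t, states, kernel, recurrent_kernel, bias):
    """One LSTM time step; x_t is (batch, input), states = [h_{t-1}, C_{t-1}]."""
    h_prev, c_prev = states
    z = x_t @ kernel + h_prev @ recurrent_kernel + bias    # (batch, 4 * units)
    z0, z1, z2, z3 = np.split(z, 4, axis=1)
    f_t, i_t, c_hat, o_t = sigmoid(z0), sigmoid(z1), np.tanh(z2), sigmoid(z3)
    c_t = f_t * c_prev + i_t * c_hat                       # element-wise
    h_t = o_t * np.tanh(c_t)                               # element-wise
    return h_t, [h_t, c_t]

def run_sequence(xs, states, kernel, recurrent_kernel, bias):
    """Apply lstm_step_np over xs of shape (time, batch, input); return every h_t."""
    outputs = []
    for x_t in xs:
        h_t, states = lstm_step_np(x_t, states, kernel, recurrent_kernel, bias)
        outputs.append(h_t)
    return np.stack(outputs), states

units = 64
kernel = 0.1 * np.ones((1, 4 * units))
recurrent_kernel = 0.1 * np.ones((units, 4 * units))
bias = np.zeros(4 * units)
init_states = [0.2 * np.ones((1, units)), np.zeros((1, units))]

xs = np.ones((5, 1, 1))          # 5 time steps, batch 1, input size 1
hs, (h_last, c_last) = run_sequence(xs, init_states, kernel, recurrent_kernel, bias)
print(hs.shape)                  # (5, 1, 64)
print(hs[0, 0, 0])               # first step, about 0.48484358
```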

Finally, build some simple constant test data to check LSTM_step:

```python
import numpy as np

cell_inputs = np.ones((1, 1))                               # x_t, shape (1, 1)
cell_states = [0.2 * np.ones((1, 64)), np.zeros((1, 64))]   # [h_{t-1}, C_{t-1}]
kernel = 0.1 * np.ones((1, 256))                            # 4 gates x 64 units = 256 columns
recurrent_kernel = 0.1 * np.ones((64, 256))
bias = np.zeros(256)

h, [h, c] = LSTM_step(cell_inputs, cell_states, kernel, recurrent_kernel, bias)
print('Simple verification:')
print('Is h correct?', np.isclose(h.numpy()[0][0], 0.48484358))
print('Is c correct?', np.isclose(c.numpy()[0][0], 0.70387213))
```

```
Simple verification:
Is h correct? True
Is c correct? True
```
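Because every weight is constant, the expected values can be worked out by hand, which shows where 0.48484358 and 0.70387213 come from. Each entry of the pre-activation is $z = 1 \times 0.1 + 64 \times 0.2 \times 0.1 = 1.38$, so $f_t = i_t = o_t = \sigma(1.38) \approx 0.799$, and since $C_{t-1} = 0$, $C_t = i_t \tanh(1.38)$. The sketch below redoes that arithmetic for a single entry with Python's `math` module (no TensorFlow needed).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

units = 64
# All weights are constant and the bias is zero, so every entry of z is the same:
# x_t @ kernel contributes 1 * 0.1; h_{t-1} @ recurrent_kernel sums 64 terms of 0.2 * 0.1
z = 1 * 0.1 + units * (0.2 * 0.1) + 0.0   # about 1.38
f = i = o = sigmoid(z)                     # the three sigmoid gates share one value
c_hat = math.tanh(z)
c = f * 0.0 + i * c_hat                    # C_{t-1} is all zeros, so only i * c_hat remains
h = o * math.tanh(c)
print(c, h)                                # close to 0.70387213 and 0.48484358
```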
