Deep Learning Course, Assignment 3: Understanding the LSTM Structure

LSTM (Long Short-Term Memory networks)

A special kind of RNN

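A plain RNN absorbs most of the information from the immediately preceding time step but makes little use of information from steps further back. This leads to inaccurate predictions. Take text prediction as an example: 'I live in China, I like to travel, and I like to speak ...'. To predict the word that follows 'like to speak', the word 'China' is crucial, but it sits too far from the point of prediction, so the information it carries is badly degraded by the time it arrives. This is called the long-term dependency problem. The key feature of the LSTM is that it can carry information from earlier steps of the network through to the current step, which handles this problem well.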

This is the structure diagram of the LSTM:
[Figure: LSTM structure diagram]

Step 1:

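Here the previous hidden state $h_{t-1}$ and the input $x_t$ are multiplied by a weight matrix, a bias is added, and the result is passed through the sigmoid function to give $f_t$ (the forget gate). The formula is
$f_t = \sigma(W_f ·[h_{t-1}, x_t] + b_f )$
The parameters of this layer, $W_f, b_f$, all carry the subscript f so that they are not confused with those of the other layers.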
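
As a minimal NumPy sketch of this step: the sizes (4 hidden units, 3 input features) and the random values are assumptions chosen only for illustration, and `[h_{t-1}, x_t]` is taken to mean plain concatenation of the two vectors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3   # illustrative sizes, not from the assignment

h_prev = rng.standard_normal(hidden_size)   # h_{t-1}
x_t = rng.standard_normal(input_size)       # x_t
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)

concat = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t], length 7
f_t = sigmoid(W_f @ concat + b_f)           # one value per hidden unit

print(f_t.shape)   # (4,)
```

Every entry of `f_t` lies strictly between 0 and 1, which is what lets it act as a per-unit "how much to keep" factor in step 3.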

Step 2:

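This layer is similar to the first one and produces $i_t$ in the same way. The two formulas are
$i_t = \sigma(W_i ·[h_{t-1}, x_t] + b_i)$
The parameters here, $W_i, b_i$, carry the subscript i.
$\hat{C_t} = tanh(W_C·[h_{t-1}, x_t] + b_C)$
The parameters here, $W_C, b_C$, carry the subscript C.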
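
The two formulas differ only in their parameters and activation function. The sketch below, with assumed sizes and random values, shows the resulting ranges: $i_t$ lies in (0, 1), while the candidate $\hat{C_t}$ lies in [-1, 1].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
hidden_size, input_size = 4, 3   # illustrative sizes

# Stands in for the concatenation [h_{t-1}, x_t]
concat = rng.standard_normal(hidden_size + input_size)

W_i = rng.standard_normal((hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
W_C = rng.standard_normal((hidden_size, hidden_size + input_size))
b_C = np.zeros(hidden_size)

i_t = sigmoid(W_i @ concat + b_i)     # input gate: how much of the candidate to write
C_hat = np.tanh(W_C @ concat + b_C)   # candidate values: what to write

print(i_t.min() > 0, i_t.max() < 1)
print(np.abs(C_hat).max() <= 1)
```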

Step 3:

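This step combines the results of steps 1 and 2 with the cell state from the previous time step, using element-wise multiplication and addition.
$C_t = f_t * C_{t-1} + i_t * \hat{C_t}$
This updates $C_{t-1}$ to $C_t$, which is used in the computation of the next time step.
Step 4 then updates $h_{t-1}$ to $h_t$, which is likewise used in the next time step.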
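
To see what the element-wise update does, here is a small worked example with hand-picked gate values (assumed for illustration, not produced by a trained network). The first unit keeps its old memory, the second overwrites it with the candidate, and the third blends the two.

```python
import numpy as np

C_prev = np.array([1.0, -2.0, 0.5])   # C_{t-1}
C_hat = np.array([0.9, 0.9, -0.9])    # candidate values from step 2
f_t = np.array([1.0, 0.0, 0.5])       # forget gate from step 1
i_t = np.array([0.0, 1.0, 0.5])       # input gate from step 2

# '*' is element-wise here, not a matrix product
C_t = f_t * C_prev + i_t * C_hat

print(C_t)   # approximately [1.0, 0.9, -0.2]
```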

Step 4:

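This step computes
$o_t = \sigma(W_o·[h_{t-1}, x_t] + b_o)$
The parameters here, $W_o, b_o$, carry the subscript o.
Then, combining this with $C_t$ from step 3, the hidden state is given by
$h_t = o_t * tanh(C_t)$
This is the last step, and it yields the updated $h_t$.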
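
A short sketch of this last step, again with hand-picked values that are assumptions for illustration. Because tanh squashes $C_t$ into (-1, 1) and $o_t$ lies in (0, 1), every entry of $h_t$ stays below 1 in magnitude, however large the cell state grows.

```python
import numpy as np

C_t = np.array([3.0, -0.2, 0.0])   # cell state from step 3
o_t = np.array([1.0, 0.5, 0.9])    # output gate values

h_t = o_t * np.tanh(C_t)           # element-wise product

print(h_t)
```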

Hands-on Python code

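We will implement the LSTM structure by hand, using NumPy for the test data and TensorFlow for the tensor operations, including the feedforward pass and the backward pass that updates the weights.
Let us first gather all the formulas above in one place, which makes it easier to set up the corresponding parameters.
Keep in mind that the final goal is to update $C_t$ and $h_t$; every other quantity is computed for that purpose.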

$f_t = \sigma(W_f ·[h_{t-1}, x_t] + b_f )$
$i_t = \sigma(W_i ·[h_{t-1}, x_t] + b_i)$
$\hat{C_t} = tanh(W_C·[h_{t-1}, x_t] + b_C)$
$C_t = f_t * C_{t-1} + i_t * \hat{C_t}$
$o_t = \sigma(W_o·[h_{t-1}, x_t] + b_o)$
$h_t = o_t * tanh(C_t)$
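
The six equations above can be gathered into a single function. The following is a rough NumPy sketch, not the assignment's implementation: the helper name `lstm_step`, the `params` dictionary, the sizes, and the random weights are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step following the six equations (hypothetical helper)."""
    concat = np.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])
    C_hat = np.tanh(params["W_C"] @ concat + params["b_C"])
    C_t = f_t * C_prev + i_t * C_hat                          # element-wise
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])
    h_t = o_t * np.tanh(C_t)                                  # element-wise
    return h_t, C_t

rng = np.random.default_rng(0)
hidden_size, input_size, steps = 4, 3, 5   # illustrative sizes

params = {}
for g in "fiCo":
    params[f"W_{g}"] = 0.1 * rng.standard_normal((hidden_size, hidden_size + input_size))
    params[f"b_{g}"] = np.zeros(hidden_size)

# Run the cell over a short random sequence, carrying h and C forward
h = np.zeros(hidden_size)
C = np.zeros(hidden_size)
for x_t in rng.standard_normal((steps, input_size)):
    h, C = lstm_step(x_t, h, C, params)

print(h.shape, C.shape)   # (4,) (4,)
```

Only `h` and `C` are carried from one time step to the next; the gates are recomputed at every step.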

Course practice in Python: a simple hand-written LSTM structure

First, define the two activation functions that appear in the LSTM structure: sigmoid and tanh

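import tensorflow as tf

def sigmoid(x):
  # Logistic function: maps any real input into (0, 1)
  out = 1/(1+tf.exp(-x))
  return out
def tanh(x):
  # Hyperbolic tangent: maps any real input into (-1, 1)
  out = (tf.exp(x)-tf.exp(-x))/(tf.exp(x)+tf.exp(-x))
  return out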
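
One caveat about the hand-written tanh, shown here with a NumPy version of the same formula (the name `naive_tanh` and the test inputs are assumptions for illustration). For moderate inputs it agrees with the built-in, but for very large inputs `exp(x)` overflows to infinity and the ratio becomes `nan`. The `tf.exp` version should behave similarly, so a built-in tanh is probably the safer choice outside an exercise like this one.

```python
import numpy as np

def naive_tanh(x):
    # Same formula as the hand-written tanh, written with NumPy
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x_small = np.array([-2.0, 0.0, 0.5])
x_large = np.array([1000.0])

# Silence the overflow / invalid-value warnings that the large input triggers
with np.errstate(over="ignore", invalid="ignore"):
    small_ok = np.allclose(naive_tanh(x_small), np.tanh(x_small))
    large_naive = naive_tanh(x_large)

print(small_ok)                        # True
print(large_naive, np.tanh(x_large))   # [nan] [1.]
```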

Then define a single LSTM step according to the LSTM structure

def LSTM_step(cell_inputs, cell_states, kernel, recurrent_kernel, bias):
    """
    Run one time step of the cell. That is, given the current inputs (x_t) and the cell states (h_{t-1}, C_{t-1}) from the last time step,
    calculate the current hidden state (h_t) and cell state (C_t).
    
    Hint: In LSTM there exist both matrix multiplication and element-wise multiplication. Try not to mix them.
    - At first I confused matrix multiplication with element-wise multiplication and used matrix multiplication throughout, so the output C_t came out as a scalar when it should have had shape (1,16).
    
        
        
    :param cell_inputs: The input at the current time step. The last dimension of it should be 1.
    :param cell_states:  The state value of the cell from the last time step, containing previous hidden state h_{t-1} and cell state C_{t-1}.
    :param kernel: The kernel matrix for the multiplication with cell_inputs
    :param recurrent_kernel: The kernel matrix for the multiplication with the previous hidden state h_{t-1} (h_tml in the code)
    :param bias: Common bias value
    
    
    :return: current hidden state, and a list of hidden state and cell state
    """
    h_tml = cell_states[0]  # previous hidden state h_{t-1}
    c_tml = cell_states[1]  # previous cell state C_{t-1}

    # The pre-activations of the four formulas
    # f_t:       W_f ·[h_{t-1}, x_t] + b_f
    # i_t:       W_i ·[h_{t-1}, x_t] + b_i
    # \hat{C_t}: W_C ·[h_{t-1}, x_t] + b_C
    # o_t:       W_o ·[h_{t-1}, x_t] + b_o
    # are computed together in a single tensor, called z
    z = tf.matmul(cell_inputs, kernel)
    z += tf.matmul(h_tml,recurrent_kernel)
    z += bias
    # Split z into four parts; passing each through its activation function gives ft, it, hat_ct, ot
    z0, z1, z2, z3 = tf.split(z,4,axis=1)
    
    ft = sigmoid(z0)   # with our data, ft has shape (1,64)
    it = sigmoid(z1)   # shape (1,64)
    hat_ct = tanh(z2)   # same shape
    ot = sigmoid(z3)    # same shape

    # update the cell state ct
    ct = ft * c_tml + it * hat_ct   # element-wise products; ct has shape (1,64)
    # update the hidden state ht
    ht =tanh(ct) * ot               # element-wise product; ht also has shape (1,64)
    
    return ht, [ht,ct]
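
The function computes z from two separate matrices, `kernel` and `recurrent_kernel`, whereas the formulas use a single $W$ applied to the concatenation $[h_{t-1}, x_t]$. The NumPy sketch below, with assumed sizes and random values, checks that the two forms agree. It uses the row-vector convention of the code, so the concatenated input multiplies the stacked weights from the left.

```python
import numpy as np

rng = np.random.default_rng(0)
units, input_dim = 4, 1   # illustrative sizes

x = rng.standard_normal((1, input_dim))
h_prev = rng.standard_normal((1, units))
kernel = rng.standard_normal((input_dim, 4 * units))
recurrent_kernel = rng.standard_normal((units, 4 * units))
bias = rng.standard_normal(4 * units)

# Split form, as in LSTM_step
z_split = x @ kernel + h_prev @ recurrent_kernel + bias

# Concatenated form: [h_{t-1}, x_t] times the two kernels stacked row-wise
concat = np.concatenate([h_prev, x], axis=1)             # (1, units + input_dim)
W = np.concatenate([recurrent_kernel, kernel], axis=0)   # (units + input_dim, 4 * units)
z_concat = concat @ W + bias

print(np.allclose(z_split, z_concat))   # True

# Each quarter of z holds the pre-activation of one of the four formulas
z0, z1, z2, z3 = np.split(z_split, 4, axis=1)
print(z0.shape)   # (1, 4)
```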

Finally, define some simple (constant-valued) test data to check the LSTM step

import numpy as np
cell_inputs = np.ones((1,1))
cell_states = [0.2*np.ones((1,64)), np.zeros((1,64))]
kernel = 0.1*np.ones((1,256))
recurrent_kernel = 0.1*np.ones((64,256))
bias = np.zeros(256)

h , [h,c] = LSTM_step(cell_inputs, cell_states, kernel, recurrent_kernel, bias)
print('Simple verification:')
print('Is h correct?', np.isclose(h.numpy()[0][0],0.48484358))
print('Is c correct?', np.isclose(c.numpy()[0][0],0.70387213))

Simple verification:
Is h correct? True
Is c correct? True
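
The two reference values can also be reproduced by hand. Every weight and input in the test is constant, so every entry of z is the same number and all four quarters of z are equal. The sketch below redoes the arithmetic for one entry in plain NumPy; it is a cross-check of the numbers above, not part of the assignment code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One entry of z: input term + 64 recurrent terms + bias
z = 1.0 * 0.1 + 64 * (0.2 * 0.1) + 0.0   # = 1.38

gate = sigmoid(z)                    # ft, it and ot all take this value
c = gate * 0.0 + gate * np.tanh(z)   # previous cell state is 0
h = np.tanh(c) * gate

print(np.isclose(h, 0.48484358), np.isclose(c, 0.70387213))   # True True
```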