Reading the PyTorch LSTM Source Code

By 唐僧愛吃唐僧肉, posted 2021-01-02

I recently read through the LSTM source code in PyTorch and found quite a few things worth learning from it.
First, look at the corresponding definition in PyTorch:

        \begin{array}{ll} \\
            i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
            f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
            g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
            o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
            c_t = f_t \odot c_{t-1} + i_t \odot g_t \\
            h_t = o_t \odot \tanh(c_t) \\
        \end{array}
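Before diving into shapes, here is a minimal sketch of one LSTM cell step implementing the six equations above. The helper lstm_cell_step and its argument layout are my own illustration (PyTorch stacks the four gates as rows of weight_ih / weight_hh in i, f, g, o order); it is not the kernel PyTorch actually runs:

import torch

def lstm_cell_step(x_t, h_prev, c_prev, W_ih, W_hh, b_ih, b_hh):
    # W_ih: [4*hidden_size, input_size], W_hh: [4*hidden_size, hidden_size];
    # the rows are the four gates stacked in i, f, g, o order.
    gates = x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh  # [batch, 4*hidden_size]
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c_t = f * c_prev + i * g       # c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
    h_t = o * torch.tanh(c_t)      # h_t = o_t ⊙ tanh(c_t)
    return h_t, c_t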

[Figure: LSTM schematic diagram]
The circled nodes in the diagram correspond to the formulas:
Circle 1: $f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})$
Circle 2: $i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})$
Circle 3: $g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})$
Circle 4: $o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})$
Circle 5: $c_t = f_t \odot c_{t-1} + i_t \odot g_t$
Circle 6: $h_t = o_t \odot \tanh(c_t)$
The code for calling this LSTM is as follows:

import torch
import torch.nn as nn

# 2-layer bidirectional LSTM with input_size=10 and hidden_size=20
bilstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)
input = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
h0 = torch.randn(4, 3, 20)     # (num_layers * num_directions, batch, hidden_size)
c0 = torch.randn(4, 3, 20)     # (num_layers * num_directions, batch, hidden_size)
# Optionally dump the inputs to text files for inspection:
# with open('D://test//input1.txt', 'w') as f:
#     f.write(str(input))
# with open('D://test//h0.txt', 'w') as f:
#     f.write(str(h0))
# with open('D://test//c0.txt', 'w') as f:
#     f.write(str(c0))
output, (hn, cn) = bilstm(input, (h0, c0))
print('output shape: ', output.shape)
print('hn shape: ', hn.shape)
print('cn shape: ', cn.shape)

Here input has shape (seq_len, batch, input_size), while h_0 and c_0 each have shape (num_layers * num_directions, batch, hidden_size).
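Since the network is bidirectional with hidden_size = 20, the last dimension of output is num_directions * hidden_size = 40, while hn and cn keep the shape of h0 and c0. A quick sanity check on the snippet above:

# Expected shapes for the example above
assert output.shape == (5, 3, 40)  # (seq_len, batch, num_directions * hidden_size)
assert hn.shape == (4, 3, 20)      # (num_layers * num_directions, batch, hidden_size)
assert cn.shape == (4, 3, 20)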
Now look at the initialization part of the source code.
From the initialization code you can see that when the mode is LSTM, gate_size = 4 * hidden_size, because the four gates (i, f, g, o) share a single stacked weight matrix.
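You can confirm gate_size = 4 * hidden_size by listing the registered parameters of the bilstm module created above:

for name, p in bilstm.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (80, 10)   -> 80 = 4 * hidden_size = 4 * 20
# weight_hh_l0 (80, 20)
# bias_ih_l0 (80,)
# bias_hh_l0 (80,)
# ... and likewise for the _reverse and l1 parameters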

Here num_directions = 2 when bidirectional = True, and num_directions = 1 when bidirectional = False.
Next come the values in self._flat_weights_names set up during initialization. Because two layers are defined here, each with a forward and a reverse direction, the parameters are:
'weight_ih_l0' = [80, 10], 'weight_hh_l0' = [80, 20], 'bias_ih_l0' = [80], 'bias_hh_l0' = [80]
'weight_ih_l0_reverse' = [80, 10], 'weight_hh_l0_reverse' = [80, 20], 'bias_ih_l0_reverse' = [80], 'bias_hh_l0_reverse' = [80]
'weight_ih_l1' = [80, 40], 'weight_hh_l1' = [80, 20], 'bias_ih_l1' = [80], 'bias_hh_l1' = [80]
'weight_ih_l1_reverse' = [80, 40], 'weight_hh_l1_reverse' = [80, 20], 'bias_ih_l1_reverse' = [80], 'bias_hh_l1_reverse' = [80]
For the meaning of these arrays, refer back to the docstring comments quoted earlier. There, weight_ih_l[k] has shape [80, 10] for the first layer: the 80 comes from 4 * hidden_size = 4 * 20, and the four stacked blocks are W_ii, W_if, W_ig, W_io, so weight_ih_l[k] is the concatenation of these four matrices into [80, 10]. The meanings of weight_hh_l[k], bias_ih_l[k], and bias_hh_l[k] follow in the same way.
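That concatenation can be undone directly: chunking weight_ih_l0 along dim 0 recovers the four per-gate matrices (PyTorch stacks them in i, f, g, o order):

# Split the stacked input-to-hidden weights back into the four gate matrices
W_ii, W_if, W_ig, W_io = bilstm.weight_ih_l0.chunk(4, dim=0)
print(W_ii.shape)  # torch.Size([20, 10]): one [hidden_size, input_size] block per gate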
In this example, input = [5, 3, 10], h0 = [4, 3, 20], c0 = [4, 3, 20].
The corresponding LSTM structure diagram is shown below.
[Figure: LSTM structure diagram]
In h0 = [4, 3, 20], the slices h0[0], h0[1], h0[2], h0[3] correspond to h[0], h[1], h[2], h[3] in the diagram, each with shape [3, 20].
The same holds for c0.
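As a sketch of that layout (following the documented ordering of the first dimension: layer-major, direction-minor), viewing h0 as (num_layers, num_directions, batch, hidden_size) makes the slices explicit:

h0_view = h0.view(2, 2, 3, 20)  # (num_layers, num_directions, batch, hidden_size)
print(torch.equal(h0_view[0, 0], h0[0]))  # True: layer 0, forward direction
print(torch.equal(h0_view[1, 1], h0[3]))  # True: layer 1, reverse direction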
Now analyze the formulas with these shapes in place.
For the first layer:
Formula 1: $f_t = \sigma(W_{if}[20,10]\,x_t + b_{if}[20] + W_{hf}[20,20]\,h_{t-1} + b_{hf}[20])$
Formula 2: $i_t = \sigma(W_{ii}[20,10]\,x_t + b_{ii}[20] + W_{hi}[20,20]\,h_{t-1} + b_{hi}[20])$
Formula 3: $g_t = \tanh(W_{ig}[20,10]\,x_t + b_{ig}[20] + W_{hg}[20,20]\,h_{t-1} + b_{hg}[20])$
Formula 4: $o_t = \sigma(W_{io}[20,10]\,x_t + b_{io}[20] + W_{ho}[20,20]\,h_{t-1} + b_{ho}[20])$
Formula 5: $c_t = f_t[3,20] \odot c_{t-1}[3,20] + i_t[3,20] \odot g_t[3,20]$
Formula 6: $h_t = o_t[3,20] \odot \tanh(c_t)[3,20]$
(With batch size 3, each gate activation and each state tensor has shape [3, 20].)
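To check these shapes concretely, here is a minimal sketch that runs one step of layer 0 (forward direction) by hand using the lstm_cell_step helper sketched earlier; it is an illustration, not the code path PyTorch actually executes:

h_prev, c_prev = h0[0], c0[0]  # layer 0, forward direction: each [3, 20]
with torch.no_grad():
    h1, c1 = lstm_cell_step(input[0], h_prev, c_prev,
                            bilstm.weight_ih_l0, bilstm.weight_hh_l0,
                            bilstm.bias_ih_l0, bilstm.bias_hh_l0)
print(h1.shape, c1.shape)  # torch.Size([3, 20]) torch.Size([3, 20])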
For the second layer:
Formula 1: $f_t = \sigma(W_{if}[20,40]\,x_t + b_{if}[20] + W_{hf}[20,20]\,h_{t-1} + b_{hf}[20])$
Formula 2: $i_t = \sigma(W_{ii}[20,40]\,x_t + b_{ii}[20] + W_{hi}[20,20]\,h_{t-1} + b_{hi}[20])$
Formula 3: $g_t = \tanh(W_{ig}[20,40]\,x_t + b_{ig}[20] + W_{hg}[20,20]\,h_{t-1} + b_{hg}[20])$
Formula 4: $o_t = \sigma(W_{io}[20,40]\,x_t + b_{io}[20] + W_{ho}[20,20]\,h_{t-1} + b_{ho}[20])$
Formula 5: $c_t = f_t[3,20] \odot c_{t-1}[3,20] + i_t[3,20] \odot g_t[3,20]$
Formula 6: $h_t = o_t[3,20] \odot \tanh(c_t)[3,20]$
(The input weights are [20, 40] here because x_t for the second layer is the concatenated forward and reverse output of the first layer, with 2 * 20 = 40 features.)
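That 2 * hidden_size input width is visible directly in the parameter shapes:

print(tuple(bilstm.weight_ih_l1.shape))  # (80, 40): layer 1 consumes 2 * hidden_size = 40 features
print(tuple(bilstm.weight_ih_l0.shape))  # (80, 10): layer 0 consumes input_size = 10 features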
