(Part 2) Nonlinear Recurrent Neural Network (RNN)

Posted by chen_h on 2019-02-16

Original URL: https://flycode.co/archives/80583


Author: chen_h
WeChat & QQ: 862251340
WeChat Official Account: coderpai

This tutorial is a translation, with the author's permission, of Peter Roelants' tutorial on recurrent neural networks.

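The tutorial shows how to implement a recurrent neural network (RNN) and consists of two parts. You can find the full content at the following link: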

Nonlinear recurrent neural network applied to binary addition

This part of the tutorial covers two main topics:

Storing data in tensors
Optimizing with Rmsprop and Nesterov momentum

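In Part 1 we built a simple linear recurrent neural network. In this part we use nonlinear functions to design a nonlinear recurrent neural network and train it to add binary numbers.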

First, import the packages this tutorial needs:

import itertools
import numpy as np 
import matplotlib
import matplotlib.pyplot as plt

Defining the dataset

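In this tutorial the training set consists of 2000 samples, generated by the create_dataset function. Each training sample consists of two parts (Xi1, Xi2), each a 7-bit binary representation made of a 6-bit number plus one extra 0 bit on the far right (this extra 0 prevents overflow when the two numbers are added). The target ti is also a 7-bit binary number, ti = Xi1 + Xi2. The bits are written in reverse order, least significant bit first, so that reading left to right matches both the order in which the RNN processes the sequence and the order in which carries propagate during addition.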
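As a small illustration of this bit ordering, the sketch below mirrors what create_dataset does with format_str and reversed, using 37 and 43 as example operands. The helper names to_lsb_first_bits and from_lsb_first_bits are hypothetical and not part of the tutorial.

```python
def to_lsb_first_bits(n, sequence_len=7):
    """Encode a non-negative integer as sequence_len bits, least significant bit first."""
    msb_first = format(n, '0{}b'.format(sequence_len))  # e.g. 37 -> '0100101'
    return [int(b) for b in reversed(msb_first)]         # index 0 = least significant bit


def from_lsb_first_bits(bits):
    """Decode a least-significant-bit-first list of bits back into an integer."""
    return sum(int(b) << k for k, b in enumerate(bits))


# Two 6-bit operands (each < 2**6 = 64); their sum always fits in 7 bits (< 128).
a, b = 37, 43
x1 = to_lsb_first_bits(a)
x2 = to_lsb_first_bits(b)
t = to_lsb_first_bits(a + b)
print(x1)                      # [1, 0, 1, 0, 0, 1, 0]
print(t)                       # [0, 0, 0, 0, 1, 0, 1]
print(from_lsb_first_bits(t))  # 80
```

Because every operand is below 64, its last (most significant) bit in this ordering is always 0, which is the padding bit described above.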

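Both the inputs and the targets are stored in 3-dimensional tensors. For the training data (X_train, T_train), the first dimension is the number of samples (2000 here), the second is the time step (7 time steps, one per bit), and the third is the number of units: 2 RNN input units for X_train and 1 output unit for T_train. The figure below visualizes the input tensor X_train:
[Figure: visualization of the input tensor X_train]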

The following code defines the dataset:

# Create dataset
nb_train = 2000  # Number of training samples
# Addition of 2 n-bit numbers can result in an (n+1)-bit number
sequence_len = 7  # Length of the binary sequence

def create_dataset(nb_samples, sequence_len):
    """Create a dataset for binary addition and return as input, targets."""
    max_int = 2**(sequence_len-1) # Maximum integer that can be added
    format_str = '{:0' + str(sequence_len) + 'b}' # Transform integer in binary format
    nb_inputs = 2  # Add 2 binary numbers
    nb_outputs = 1  # Result is 1 binary number
    X = np.zeros((nb_samples, sequence_len, nb_inputs))  # Input samples
    T = np.zeros((nb_samples, sequence_len, nb_outputs))  # Target samples
    # Fill up the input and target matrix
    for i in range(nb_samples):
        # Generate random numbers to add
        nb1 = np.random.randint(0, max_int)
        nb2 = np.random.randint(0, max_int)
        # Fill current input and target row.
        # Note that binary numbers are added from right to left, but our RNN reads 
        #  from left to right, so reverse the sequence.
        X[i,:,0] = list(reversed([int(b) for b in format_str.format(nb1)]))
        X[i,:,1] = list(reversed([int(b) for b in format_str.format(nb2)]))
        T[i,:,0] = list(reversed([int(b) for b in format_str.format(nb1+nb2)]))
    return X, T

# Create training samples
X_train, T_train = create_dataset(nb_train, sequence_len)
print('X_train shape: {0}'.format(X_train.shape))
print('T_train shape: {0}'.format(T_train.shape))

X_train shape: (2000, 7, 2)
T_train shape: (2000, 7, 1)

Binary addition

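Binary addition is a good example for understanding how a recurrent neural network turns an input sequence into an output sequence. The RNN has to learn two things: first, how to pass the state of the computation (the carry) from one step to the next; second, when to output a 0 and when to output a 1, based on the current input and the previous state (its memory).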
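For intuition, the sketch below is a hand-written ripple-carry adder, not the RNN. It walks over the least-significant-bit-first sequences from left to right and keeps a single carry bit as its state, which is the kind of state passing and output rule the network has to discover from data. The function name add_lsb_first is hypothetical.

```python
def add_lsb_first(x1, x2):
    """Add two least-significant-bit-first bit lists like a ripple-carry adder.

    The carry is the only memory passed from one time step to the next.
    """
    carry = 0  # plays the role of the initial state S0
    out = []
    for b1, b2 in zip(x1, x2):   # left to right = least significant bit first
        s = b1 + b2 + carry      # 0, 1, 2 or 3
        out.append(s % 2)        # output bit at this time step
        carry = s // 2           # state handed to the next time step
    return out


x1 = [1, 0, 1, 0, 0, 1, 0]  # 37, least significant bit first
x2 = [1, 1, 0, 1, 0, 1, 0]  # 43
print(add_lsb_first(x1, x2))  # [0, 0, 0, 0, 1, 0, 1], i.e. 80
```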

The following code prints a sample in a more visual way:

# Show an example input and target
def printSample(x1, x2, t, y=None):
    """Print a sample in a more visual way."""
    x1 = ''.join([str(int(d)) for d in x1])
    x2 = ''.join([str(int(d)) for d in x2])
    t = ''.join([str(int(d[0])) for d in t])
    if y is not None:
        y = ''.join([str(int(d[0])) for d in y])
    print('x1:   {:s}   {:2d}'.format(x1, int(''.join(reversed(x1)), 2)))
    print('x2: + {:s}   {:2d} '.format(x2, int(''.join(reversed(x2)), 2)))
    print('      -------   --')
    print('t:  = {:s}   {:2d}'.format(t, int(''.join(reversed(t)), 2)))
    if y is not None:
        print('y:  = {:s}'.format(y))  # Print the prediction, not the target

# Print the first sample
printSample(X_train[0,:,0], X_train[0,:,1], T_train[0,:,:])

x1:   1010010   37
x2: + 1101010   43
      -------   --
t:  = 0000101   80

Recurrent neural network architecture

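At each time step, the RNN in this tutorial has 2 input units, which are transformed into a state of 3 units, which in turn is mapped to a single output probability. Time steps are counted from 1 rather than 0; S0 denotes the initial state. The state computed from the input serves to remember part of the information seen so far, so that the network can predict what to output at the next step.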
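Putting together the classes defined below, one time step of this network can be summarized by the following equations. The symbol names are ours, chosen to match the code: W_in and b_in belong to tensorInput, W and b to the shared recurrent update, W_out and b_out to tensorOutput, and X_k and S_k are treated as row vectors.

$$S_k = \tanh\left(X_k W_{in} + b_{in} + S_{k-1} W + b\right), \qquad k = 1, \dots, 7$$

$$Y_k = \sigma\left(S_k W_{out} + b_{out}\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

With 2 inputs, 3 states and 1 output, W_in is 2×3, W is 3×3 and W_out is 3×1.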

There are many ways to visualize an RNN. We use the same style as in Part 1 to show the architecture, as in the figure below:
[Figure: RNN architecture]

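Alternatively, we can visualize the complete input, the complete states and the complete predictions as tensors: the input tensor is mapped in parallel to the state tensor, and the state tensor is mapped in parallel to the prediction tensor at every time step, as in the figure below:
[Figure: tensor view of inputs, states and predictions]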

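In the code, each of these mappings is implemented as a class. Each class has a forward method that computes the forward step of backpropagation and a backward method that computes the backward step (the gradients).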

The RNN computation

Linear transformations

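In a neural network, the usual way to map one layer to the next is a matrix multiplication plus a bias term, followed by a nonlinear activation function. In this tutorial, the 2-dimensional input (Xik1, Xik2) at each time step is mapped to the state layer by a 2×3 weight matrix and a bias vector of length 3. The 3-dimensional state at each time step is in turn mapped to the output layer by a 3×1 weight matrix and a bias vector of length 1, which yields the output probability.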

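Because we want to map every sample in the training set at every time step in a single computation, we use NumPy's tensordot function for the multiplication. It takes two tensors and the axes to sum over. For example, with input X of shape (2000, 7, 2), states S of shape (2000, 7, 3) and weight matrix W of shape (2, 3), the linear mapping is S = tensordot(X, W, axes=((-1), (0))). This multiplies the last axis (-1) of X with the first axis (0) of W and sums over it, giving a tensor of shape (2000, 7, 3).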
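A quick shape check with random arrays (only the shapes come from the text) shows that this tensordot call computes the same thing as np.einsum, the @ operator and an explicit loop over time steps:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 7, 2))  # (samples, time steps, input units)
W = rng.standard_normal((2, 3))        # (input units, state units)

# Sum over the last axis of X and the first axis of W
S = np.tensordot(X, W, axes=((-1), (0)))
print(S.shape)  # (2000, 7, 3)

# The same contraction written three other ways
S_einsum = np.einsum('ijk,kl->ijl', X, W)
S_matmul = X @ W                   # matmul broadcasts over the leading axes
S_loop = np.empty((2000, 7, 3))
for k in range(X.shape[1]):        # one (2000, 2) x (2, 3) product per time step
    S_loop[:, k, :] = X[:, k, :] @ W

print(np.allclose(S, S_einsum), np.allclose(S, S_matmul), np.allclose(S, S_loop))  # True True True
```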

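This linear transformation is used both for the mapping from the input X to the state layer S and for the mapping from S to the output layer Y. The TensorLinear class implements it together with its gradient. Following Xavier Glorot's recommendation, the weights are initialized from a uniform distribution over the range:

$$W \sim U\left(-\sqrt{\frac{6}{n_{in} + n_{out}}},\ \sqrt{\frac{6}{n_{in} + n_{out}}}\right)$$

where n_in and n_out are the numbers of input and output units of the layer.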
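Plugging in the layer sizes of this tutorial: the input layer (2 → 3) gets a bound of √(6/5) ≈ 1.0954 and the output layer (3 → 1) gets √(6/4) ≈ 1.2247. The sketch below draws such matrices and checks the bound; the helper name glorot_uniform is hypothetical.

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng):
    """Sample an (n_in, n_out) matrix from U(-a, a) with a = sqrt(6 / (n_in + n_out))."""
    a = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-a, a, size=(n_in, n_out)), a

rng = np.random.default_rng(42)
W_in, a_in = glorot_uniform(2, 3, rng)    # input -> state, like TensorLinear(2, 3, 3)
W_out, a_out = glorot_uniform(3, 1, rng)  # state -> output, like TensorLinear(3, 1, 3)
print(round(a_in, 4), round(a_out, 4))    # 1.0954 1.2247
print(np.all(np.abs(W_in) <= a_in))       # True
```

RecurrentStateUnfold uses the same formula for the 3×3 recurrent matrix with n_in = n_out = nbStates, which is why it computes np.sqrt(6.0 / (nbStates * 2)).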

Logistic classification

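The logistic function is used at the output layer to obtain the output probabilities. The LogisticClassifier class implements it together with its cost function (the cross-entropy) and the gradient.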
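The cost is the cross-entropy averaged over the N samples and L time steps, ξ = −(1/(N·L)) Σ [t·log(y) + (1−t)·log(1−y)]. LogisticClassifier.backward returns the gradient with respect to the logistic layer's input Z rather than its output Y, and for the logistic function paired with cross-entropy this gradient simplifies to (Y − T)/(N·L). A finite-difference sketch on random logits and targets (not data from the tutorial) confirms the shortcut:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(Z, T):
    """Cross-entropy averaged over samples and time steps, as in LogisticClassifier.cost."""
    Y = logistic(Z)
    return -np.sum(T * np.log(Y) + (1 - T) * np.log(1 - Y)) / (Z.shape[0] * Z.shape[1])

rng = np.random.default_rng(1)
Z = rng.standard_normal((4, 7, 1))                     # logits: (samples, time steps, outputs)
T = rng.integers(0, 2, size=(4, 7, 1)).astype(float)   # random 0/1 targets

# Analytic gradient w.r.t. the logits, as returned by LogisticClassifier.backward
g_analytic = (logistic(Z) - T) / (Z.shape[0] * Z.shape[1])

# Central-difference estimate for every logit
eps = 1e-6
g_numeric = np.zeros_like(Z)
for idx in np.ndindex(Z.shape):
    Zp, Zm = Z.copy(), Z.copy()
    Zp[idx] += eps
    Zm[idx] -= eps
    g_numeric[idx] = (cost(Zp, T) - cost(Zm, T)) / (2 * eps)

print(np.allclose(g_analytic, g_numeric, atol=1e-8))  # True
```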

# Define the linear tensor transformation layer
class TensorLinear(object):
    """The linear tensor layer applies a linear tensor dot product and a bias to its input."""
    def __init__(self, n_in, n_out, tensor_order, W=None, b=None):
        """Initialise the weight W and bias b parameters."""
        a = np.sqrt(6.0 / (n_in + n_out))
        self.W = (np.random.uniform(-a, a, (n_in, n_out)) if W is None else W)
        self.b = (np.zeros((n_out)) if b is None else b)  # Bias parameters
        self.bpAxes = tuple(range(tensor_order-1))  # Axes summed over in backprop

    def forward(self, X):
        """Perform forward step transformation with the help of a tensor product."""
        # Same as: Y[i,j,:] = np.dot(X[i,j,:], self.W) + self.b (for all i, j)
        # Same as: Y = np.einsum('ijk,kl->ijl', X, self.W) + self.b
        return np.tensordot(X, self.W, axes=((-1),(0))) + self.b

    def backward(self, X, gY):
        """Return the gradient of the parameters and the inputs of this layer."""
        # Same as: gW = np.einsum('ijk,ijl->kl', X, gY)
        # Same as: gW = sum over j of np.dot(X[:,j,:].T, gY[:,j,:])
        gW = np.tensordot(X, gY, axes=(self.bpAxes, self.bpAxes))
        gB = np.sum(gY, axis=self.bpAxes)
        # Same as: gX = np.einsum('ijk,kl->ijl', gY, self.W.T)
        # Same as: gX[i,j,:] = np.dot(gY[i,j,:], self.W.T) (for all i, j)
        gX = np.tensordot(gY, self.W.T, axes=((-1),(0)))
        return gX, gW, gB

# Define the logistic classifier layer
class LogisticClassifier(object):
    """The logistic layer applies the logistic function to its inputs."""
   
    def forward(self, X):
        """Perform the forward step transformation."""
        return 1 / (1 + np.exp(-X))
    
    def backward(self, Y, T):
        """Return the gradient with respect to the cost function at the inputs of this layer."""
        # Normalise by the number of samples and the sequence length.
        return (Y - T) / (Y.shape[0] * Y.shape[1])
    
    def cost(self, Y, T):
        """Compute the cost at the output."""
        # Normalise by the number of samples and the sequence length.
        # Add a small number (1e-99) because Y (or 1-Y) can become 0 if the network
        #  learns to perfectly predict the output. log(0) is undefined.
        return - np.sum(np.multiply(T, np.log(Y+1e-99)) + np.multiply((1-T), np.log(1-Y+1e-99))) / (Y.shape[0] * Y.shape[1])

Unfolding the recurrent states

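As described in Part 1, the recurrent states have to be unfolded over the time steps. In the code, the RecurrentStateUnfold class implements this unfolding and the corresponding backpropagation through time (BPTT). The class holds the weights and the bias that map the previous state to the current state, initializes them together with the initial state S0, and computes their gradients.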

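In RecurrentStateUnfold, the forward method iteratively updates the state over the time steps, and the backward method computes the gradient at each state output. At each time step k, the gradient at the state output is the gradient coming from the output layer Y at that step plus the gradient propagated back from the next state. The gradients of the weights and the bias are summed over all time steps, because these parameters are shared by every time step. What remains after stepping back through k = 0 is the gradient of the cost with respect to the initial state, ∂ξ/∂S0, which is used to optimize the initial state S0.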
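To see why the shared weight and bias gradients are summed over time, here is a one-unit toy version of the same backward loop. It is a sketch, not the tutorial's code: the scalar inputs, the targets and the squared-error loss on every state are hypothetical stand-ins for the input layer and the logistic output. The BPTT result is compared with central differences for w, b and the initial state s0:

```python
import numpy as np

def forward(x, w, b, s0):
    """Unfold a 1-unit recurrent state: s_k = tanh(x_k + w * s_{k-1} + b)."""
    s = [s0]
    for xk in x:
        s.append(np.tanh(xk + w * s[-1] + b))
    return np.array(s)  # s[0] is the initial state, s[1:] are the unfolded states

def loss(x, t, w, b, s0):
    s = forward(x, w, b, s0)
    return 0.5 * np.sum((s[1:] - t) ** 2)  # toy squared-error loss on every state

def bptt(x, t, w, b, s0):
    """Backpropagation through time, mirroring RecurrentStateUnfold.backward."""
    s = forward(x, w, b, s0)
    g_s = 0.0        # gradient flowing into the state from later time steps
    g_w = g_b = 0.0  # shared parameters: gradients are summed over all time steps
    for k in range(len(x), 0, -1):
        g_s += s[k] - t[k - 1]         # add the gradient from this step's output
        g_z = (1.0 - s[k] ** 2) * g_s  # back through tanh
        g_w += g_z * s[k - 1]
        g_b += g_z
        g_s = g_z * w                  # pass the gradient on to state k-1
    return g_w, g_b, g_s               # the final g_s is d(loss)/d(s0)

x = np.array([0.5, -1.0, 0.3, 0.8])
t = np.array([0.2, -0.4, 0.1, 0.6])
w, b, s0 = 0.7, 0.1, 0.0

g_w, g_b, g_s0 = bptt(x, t, w, b, s0)
eps = 1e-6
num_w = (loss(x, t, w + eps, b, s0) - loss(x, t, w - eps, b, s0)) / (2 * eps)
num_b = (loss(x, t, w, b + eps, s0) - loss(x, t, w, b - eps, s0)) / (2 * eps)
num_s0 = (loss(x, t, w, b, s0 + eps) - loss(x, t, w, b, s0 - eps)) / (2 * eps)
print(np.allclose([g_w, g_b, g_s0], [num_w, num_b, num_s0]))  # True
```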

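RecurrentStateUnfold uses the RecurrentStateUpdate class. Its forward method combines the state at time k-1 with the input at time k to compute the state at time k, and its backward method propagates the gradient back through this single update step, which RecurrentStateUnfold.backward repeats over all time steps to perform BPTT. The nonlinear activation function used in RecurrentStateUpdate is the hyperbolic tangent (tanh), whose output lies between -1 and +1. It is implemented in the TanH class.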
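TanH.backward receives the layer output Y instead of its input, which works because the derivative of tanh can be written in terms of its output: d tanh(z)/dz = 1 − tanh²(z). A quick numerical check of this identity (the grid of z values is arbitrary):

```python
import numpy as np

z = np.linspace(-3.0, 3.0, 13)
y = np.tanh(z)

# TanH.backward uses the derivative expressed through the output: 1 - tanh(z)**2
analytic = 1.0 - y ** 2

# Central-difference estimate of d tanh(z) / dz
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)

print(np.allclose(analytic, numeric))  # True
print(y.min() > -1 and y.max() < 1)    # True: outputs stay inside (-1, 1)
```

Expressing the derivative through the stored output means the backward pass does not have to recompute tanh.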

# Define tanh layer
class TanH(object):
    """TanH applies the tanh function to its inputs."""
    
    def forward(self, X):
        """Perform the forward step transformation."""
        return np.tanh(X) 
    
    def backward(self, Y, output_grad):
        """Return the gradient at the inputs of this layer."""
        gTanh = 1.0 - np.power(Y,2)
        return np.multiply(gTanh, output_grad)

# Define internal state update layer
class RecurrentStateUpdate(object):
    """Update a given state."""
    def __init__(self, nbStates, W, b):
        """Initialise the linear transformation and tanh transfer function."""
        self.linear = TensorLinear(nbStates, nbStates, 2, W, b)
        self.tanh = TanH()

    def forward(self, Xk, Sk):
        """Return state k+1 from input and state k."""
        return self.tanh.forward(Xk + self.linear.forward(Sk))
    
    def backward(self, Sk0, Sk1, output_grad):
        """Return the gradient of the parameters and the inputs of this layer."""
        gZ = self.tanh.backward(Sk1, output_grad)
        gSk0, gW, gB = self.linear.backward(Sk0, gZ)
        return gZ, gSk0, gW, gB

# Define layer that unfolds the states over time
class RecurrentStateUnfold(object):
    """Unfold the recurrent states."""
    def __init__(self, nbStates, nbTimesteps):
        """Initialise the shared parameters, the initial state and the state update function."""
        a = np.sqrt(6.0 / (nbStates * 2))
        self.W = np.random.uniform(-a, a, (nbStates, nbStates))
        self.b = np.zeros((self.W.shape[0]))  # Shared bias
        self.S0 = np.zeros(nbStates)  # Initial state
        self.nbTimesteps = nbTimesteps  # Timesteps to unfold
        self.stateUpdate = RecurrentStateUpdate(nbStates, self.W, self.b)  # State update function
        
    def forward(self, X):
        """Iteratively apply forward step to all states."""
        S = np.zeros((X.shape[0], X.shape[1]+1, self.W.shape[0]))  # State tensor
        S[:,0,:] = self.S0  # Set initial state
        for k in range(self.nbTimesteps):
            # Update the states iteratively
            S[:,k+1,:] = self.stateUpdate.forward(X[:,k,:], S[:,k,:])
        return S
    
    def backward(self, X, S, gY):
        """Return the gradient of the parmeters and the inputs of this layer."""
        gSk = np.zeros_like(gY[:,self.nbTimesteps-1,:])  # Initialise gradient of state outputs
        gZ = np.zeros_like(X)  # Initialise gradient tensor for state inputs
        gWSum = np.zeros_like(self.W)  # Initialise weight gradients
        gBSum = np.zeros_like(self.b)  # Initialise bias gradients
        # Propagate the gradients iteratively
        for k in range(self.nbTimesteps-1, -1, -1):
            # Gradient at state output = gradient from the next state + gradient from the output at step k
            gSk += gY[:,k,:]
            # Propagate the gradient back through one state
            gZ[:,k,:], gSk, gW, gB = self.stateUpdate.backward(S[:,k,:], S[:,k+1,:], gSk)
            gWSum += gW  # Update total weight gradient
            gBSum += gB  # Update total bias gradient
        gS0 = np.sum(gSk, axis=0)  # Get gradient of initial state over all samples
        return gZ, gWSum, gBSum, gS0

The full network

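The RnnBinaryAdder class implements the complete binary-addition network and initializes all network parameters when it is created. The forward method performs forward propagation through the whole network, and the backward method performs backpropagation through it and returns the parameter gradients. getParamGrads computes the gradient of every parameter and returns them as a flat list. get_params_iter returns an iterator over the parameters in the same order, so that each gradient can be matched with its parameter and the parameters can be updated in place.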
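Updating parameters through get_params_iter relies on np.nditer with op_flags=['readwrite']: every element it yields is a 0-d view into the original array, so an in-place operation such as `param += eps` changes the underlying weights directly. A minimal sketch with two stand-in arrays W and b (not the network's own parameters; params_iter is a hypothetical name):

```python
import itertools
import numpy as np

W = np.zeros((2, 3))
b = np.zeros(3)

def params_iter():
    """Chain read-write iterators over several arrays, like get_params_iter."""
    return itertools.chain(
        np.nditer(W, op_flags=['readwrite']),
        np.nditer(b, op_flags=['readwrite']))

# Each yielded element is a 0-d view into W or b, so `p += ...` writes through.
for i, p in enumerate(params_iter()):
    p += i

print(W.tolist())                     # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
print(b.tolist())                     # [6.0, 7.0, 8.0]
print(sum(1 for _ in params_iter()))  # 9 parameters in total
```

A plain assignment such as `p = p + i` would only rebind the loop variable and leave W and b unchanged; the in-place operators are what make the gradient check and the training loop work.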

# Define the full network
class RnnBinaryAdder(object):
    """RNN to perform binary addition of 2 numbers."""
    def __init__(self, nb_of_inputs, nb_of_outputs, nb_of_states, sequence_len):
        """Initialise the network layers."""
        self.tensorInput = TensorLinear(nb_of_inputs, nb_of_states, 3)  # Input layer
        self.rnnUnfold = RecurrentStateUnfold(nb_of_states, sequence_len)  # Recurrent layer
        self.tensorOutput = TensorLinear(nb_of_states, nb_of_outputs, 3)  # Linear output transform
        self.classifier = LogisticClassifier()  # Classification output
        
    def forward(self, X):
        """Perform the forward propagation of input X through all layers."""
        recIn = self.tensorInput.forward(X)  # Linear input transformation
        # Forward propagate through time and return states
        S = self.rnnUnfold.forward(recIn)
        Z = self.tensorOutput.forward(S[:,1:,:])  # Linear output transformation (skip the initial state S0)
        Y = self.classifier.forward(Z)  # Get classification probabilities
        # Return: input to recurrent layer, states, input to classifier, output
        return recIn, S, Z, Y
    
    def backward(self, X, Y, recIn, S, T):
        """Perform the backward propagation through all layers.
        Input: input samples, network output, input to recurrent layer, states, targets."""
        gZ = self.classifier.backward(Y, T)  # Get output gradient
        gRecOut, gWout, gBout = self.tensorOutput.backward(S[:,1:,:], gZ)  # Skip the initial state S0
        # Propagate gradient backwards through time
        gRnnIn, gWrec, gBrec, gS0 = self.rnnUnfold.backward(recIn, S, gRecOut)
        gX, gWin, gBin = self.tensorInput.backward(X, gRnnIn)
        # Return the parameter gradients of: linear output weights, linear output bias,
        #  recursive weights, recursive bias, linear input weights, linear input bias, initial state.
        return gWout, gBout, gWrec, gBrec, gWin, gBin, gS0
    
    def getOutput(self, X):
        """Get the output probabilities of input X."""
        recIn, S, Z, Y = self.forward(X)
        return Y  # Only return the output.
    
    def getBinaryOutput(self, X):
        """Get the binary output of input X."""
        return np.around(self.getOutput(X))
    
    def getParamGrads(self, X, T):
        """Return the gradients with respect to input X and target T as a list.
        The list has the same order as the get_params_iter iterator."""
        recIn, S, Z, Y = self.forward(X)
        gWout, gBout, gWrec, gBrec, gWin, gBin, gS0 = self.backward(X, Y, recIn, S, T)
        return [g for g in itertools.chain(
                np.nditer(gS0),
                np.nditer(gWin),
                np.nditer(gBin),
                np.nditer(gWrec),
                np.nditer(gBrec),
                np.nditer(gWout),
                np.nditer(gBout))]
    
    def cost(self, Y, T):
        """Return the cost of input X w.r.t. targets T."""
        return self.classifier.cost(Y, T)
    
    def get_params_iter(self):
        """Return an iterator over the parameters.
        The iterator has the same order as getParamGrads.
        The elements returned by the iterator are editable in-place."""
        return itertools.chain(
            np.nditer(self.rnnUnfold.S0, op_flags=['readwrite']),
            np.nditer(self.tensorInput.W, op_flags=['readwrite']),
            np.nditer(self.tensorInput.b, op_flags=['readwrite']),
            np.nditer(self.rnnUnfold.W, op_flags=['readwrite']),
            np.nditer(self.rnnUnfold.b, op_flags=['readwrite']),
            np.nditer(self.tensorOutput.W, op_flags=['readwrite']), 
            np.nditer(self.tensorOutput.b, op_flags=['readwrite']))

Gradient checking

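To verify that the gradients computed by backpropagation are correct, we compare them with numerically computed gradients. Gradient checking is explained in detail in an earlier post of this series.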
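The numerical gradient used below is the central difference (ξ(θ+ε) − ξ(θ−ε)) / 2ε, compared with the backpropagated value through np.isclose. On a toy scalar function whose derivative is known in closed form (f is an arbitrary choice for illustration), the same check looks like this:

```python
import numpy as np

def f(theta):
    return np.sin(theta) * theta ** 2

def df(theta):
    """Analytic derivative of f, playing the role of the backprop gradient."""
    return np.cos(theta) * theta ** 2 + 2 * theta * np.sin(theta)

theta, eps = 1.3, 1e-7
# Perturb the parameter by +eps and -eps, as the gradient-checking loop does
grad_num = (f(theta + eps) - f(theta - eps)) / (2 * eps)
print(np.isclose(grad_num, df(theta)))  # True
```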

# Do gradient checking
# Define an RNN to test
RNN = RnnBinaryAdder(2, 1, 3, sequence_len)
# Get the gradients of the parameters from a subset of the data
backprop_grads = RNN.getParamGrads(X_train[0:100,:,:], T_train[0:100,:,:])

eps = 1e-7  # Set the small change to compute the numerical gradient
# Compute the numerical gradients of the parameters in all layers.
for p_idx, param in enumerate(RNN.get_params_iter()):
    grad_backprop = backprop_grads[p_idx]
    # + eps
    param += eps
    plus_cost = RNN.cost(RNN.getOutput(X_train[0:100,:,:]), T_train[0:100,:,:])
    # - eps
    param -= 2 * eps
    min_cost = RNN.cost(RNN.getOutput(X_train[0:100,:,:]), T_train[0:100,:,:])
    # reset param value
    param += eps
    # calculate numerical gradient
    grad_num = (plus_cost - min_cost)/(2*eps)
    # Raise error if the numerical gradient is not close to the backprop gradient
    if not np.isclose(grad_num, grad_backprop):
        raise ValueError('Numerical gradient of {:.6f} is not close to the backpropagation gradient of {:.6f}!'.format(float(grad_num), float(grad_backprop)))
print('No gradient errors found')

No gradient errors found

Rmsprop with Nesterov momentum

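In Part 1 we optimized the network with resilient backpropagation (Rprop). In this part we replace Rprop with Rmsprop combined with Nesterov momentum, because Rprop does not work well with minibatches: the sign of the gradient can flip from one minibatch to the next.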

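Rmsprop is inspired by Rprop. For every parameter θ it keeps a moving average of the squared gradient:

$$MS(\theta) = \lambda \cdot MS(\theta) + (1 - \lambda) \cdot \left(\frac{\partial \xi}{\partial \theta}\right)^2$$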

where λ is the moving-average parameter (lmbd in the code).

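The gradient is then normalized by the square root of this moving average MS(θ):

$$\frac{\partial \xi}{\partial \theta} \Big/ \sqrt{MS(\theta) + \epsilon}$$

where ε is a small constant that prevents division by zero (eps in the code).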

This normalized gradient is then used to update the parameters.
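A consequence of this normalization is that the step size depends mostly on the learning rate rather than on the scale of the gradient. The sketch below feeds constant gradients of very different magnitudes through the update; the helper rmsprop_step is hypothetical, but it uses the same lmbd, learning_rate and eps values as the training code.

```python
import numpy as np

def rmsprop_step(grad, ms, lmbd=0.5, learning_rate=0.05, eps=1e-6):
    """One Rmsprop step for a single parameter, mirroring the update in the training loop."""
    ms = lmbd * ms + (1 - lmbd) * grad ** 2          # moving average of the squared gradient
    step = learning_rate * grad / np.sqrt(ms + eps)  # normalised gradient times learning rate
    return step, ms

final_steps = []
for g in [1e-2, 1.0, 1e2]:  # constant gradients of very different magnitudes
    ms = 0.0
    for _ in range(20):
        step, ms = rmsprop_step(g, ms)
    final_steps.append(float(step))

print([round(s, 3) for s in final_steps])  # [0.05, 0.05, 0.05]
```

For gradients far below sqrt(eps) = 1e-3, eps dominates the denominator and the step shrinks accordingly.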

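Note that the normalized gradient is not applied to the parameters directly. Instead, it is used to update a velocity (Vs in the code) kept for each parameter. This velocity resembles the one in the momentum method described in an earlier post, but it is used slightly differently: we use Nesterov's accelerated gradient, which differs from regular momentum in the order of the update steps. Regular momentum computes the gradient at the start of each iteration and then updates the velocity. Nesterov's accelerated gradient first moves the parameters along the current velocity, computes the gradient at that look-ahead point, updates the velocity, and then makes a correction step with this local gradient. The advantage is that the gradient is computed with more information about where the parameters are heading: if the velocity has pushed the parameters in a bad direction, the gradient at the look-ahead point can correct it. The Nesterov update can be written as:

$$V_{i+1} = m \cdot V_i - \mu \cdot \nabla(\theta_i + m \cdot V_i)$$

$$\theta_{i+1} = \theta_i + V_{i+1}$$

where m is the momentum term (momentum_term in the code) and μ is the learning rate (learning_rate).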

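Here ∇(θ) is the local gradient at position θ in parameter space (in this tutorial, the Rmsprop-normalized gradient) and i is the iteration number. The update at iteration i is illustrated in the figure below:
[Figure: one Nesterov momentum update at iteration i]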
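The sketch below runs the Nesterov update with an Rmsprop-normalized gradient on a toy one-parameter cost 2θ² (our choice, for illustration only), using the same hyper-parameter values as the training code. It writes the update the way the training loop does, as a look-ahead move along the old velocity followed by a correction step, and checks that the two in-place moves add up to θ_{i+1} = θ_i + V_{i+1}:

```python
import numpy as np

def grad(theta):
    """Gradient of the toy cost 2 * theta**2 (minimum at theta = 0)."""
    return 4.0 * theta

lmbd, learning_rate, momentum_term, eps = 0.5, 0.05, 0.80, 1e-6
theta, ms, v = 3.0, 0.0, 0.0
max_identity_error = 0.0

for i in range(100):
    theta_i = theta
    v_tmp = momentum_term * v                       # m * V_i
    theta += v_tmp                                  # look-ahead: theta_i + m * V_i
    g = grad(theta)                                 # local gradient at the look-ahead point
    ms = lmbd * ms + (1 - lmbd) * g ** 2            # Rmsprop moving average
    g_norm = learning_rate * g / np.sqrt(ms + eps)  # mu times the normalised gradient
    v = v_tmp - g_norm                              # V_{i+1} = m * V_i - mu * gradient
    theta -= g_norm                                 # correction step
    # The two in-place moves must equal theta_{i+1} = theta_i + V_{i+1}
    max_identity_error = max(max_identity_error, abs(theta - (theta_i + v)))

print(max_identity_error < 1e-12)  # True
print(abs(theta))                  # distance from the minimum after 100 iterations
```

With a constant learning rate the parameter ends up oscillating around the minimum rather than settling exactly on it, which matches the sensitivity to lmbd, learning_rate, momentum_term and eps noted in the text.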

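Note that convergence to the global minimum (cost = 0) is not guaranteed: with a poor starting point for the parameters, optimization may end up in a local minimum. Training is also quite sensitive to the parameters lmbd, learning_rate, momentum_term and eps. Try running the code below and see how long it takes to converge.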

# Set hyper-parameters
lmbd = 0.5  # Rmsprop lambda
learning_rate = 0.05  # Learning rate
momentum_term = 0.80  # Momentum term
eps = 1e-6  # Numerical stability term to prevent division by zero
mb_size = 100  # Size of the minibatches (number of samples)

# Create the network
nb_of_states = 3  # Number of states in the recurrent layer
RNN = RnnBinaryAdder(2, 1, nb_of_states, sequence_len)
# Set the initial parameters
nbParameters =  sum(1 for _ in RNN.get_params_iter())  # Number of parameters in the network
maSquare = [0.0 for _ in range(nbParameters)]  # Rmsprop moving average
Vs = [0.0 for _ in range(nbParameters)]  # Velocity

# Create a list of minibatch costs to be plotted
ls_of_costs = [RNN.cost(RNN.getOutput(X_train[0:100,:,:]), T_train[0:100,:,:])]
# Iterate over some iterations
for i in range(5):
    # Iterate over all the minibatches
    for mb in range(nb_train // mb_size):
        start = mb * mb_size  # Index of the first sample in this minibatch
        X_mb = X_train[start:start+mb_size,:,:]  # Input minibatch
        T_mb = T_train[start:start+mb_size,:,:]  # Target minibatch
        V_tmp = [v * momentum_term for v in Vs]
        # Move each parameter along the previous velocity (Nesterov look-ahead)
        for pIdx, P in enumerate(RNN.get_params_iter()):
            P += V_tmp[pIdx]
        # Get gradients after following old velocity
        backprop_grads = RNN.getParamGrads(X_mb, T_mb)  # Get the parameter gradients
        # Update each parameter separately
        for pIdx, P in enumerate(RNN.get_params_iter()):
            # Update the Rmsprop moving averages
            maSquare[pIdx] = lmbd * maSquare[pIdx] + (1-lmbd) * backprop_grads[pIdx]**2
            # Calculate the Rmsprop normalised gradient
            pGradNorm = learning_rate * backprop_grads[pIdx] / np.sqrt(maSquare[pIdx] + eps)
            # Update the momentum velocity
            Vs[pIdx] = V_tmp[pIdx] - pGradNorm     
            P -= pGradNorm   # Update the parameter
        ls_of_costs.append(RNN.cost(RNN.getOutput(X_mb), T_mb))  # Add cost to list to plot

# Plot the cost over the iterations
plt.plot(ls_of_costs, 'b-')
plt.xlabel('minibatch iteration')
plt.ylabel('$\\xi$', fontsize=15)
plt.title('Decrease of cost over backprop iteration')
plt.grid()
plt.show()

Testing the network

The following code tests the trained recurrent neural network on a few new binary additions. The results are:

# Create test samples
nb_test = 5
Xtest, Ttest = create_dataset(nb_test, sequence_len)
# Push test data through network
Y = RNN.getBinaryOutput(Xtest)
Yf = RNN.getOutput(Xtest)

# Print out all test examples
for i in range(Xtest.shape[0]):
    printSample(Xtest[i,:,0], Xtest[i,:,1], Ttest[i,:,:], Y[i,:,:])
    print('')

x1:   0100010   34
x2: + 1100100   19
      -------   --
t:  = 1010110   53
y:  = 1010110

x1:   1010100   21
x2: + 1110100   23
      -------   --
t:  = 0011010   44
y:  = 0011010

x1:   1111010   47
x2: + 0000000    0
      -------   --
t:  = 1111010   47
y:  = 1111010

x1:   1000000    1
x2: + 1111110   63
      -------   --
t:  = 0000001   64
y:  = 0000001

x1:   1010100   21
x2: + 1010100   21
      -------   --
t:  = 0101010   42
y:  = 0101010


