【Reading Notes 1】【2017】MATLAB and Deep Learning: The Vanishing Gradient (1)

Posted by 梅花香——苦寒來 on 2018-11-10

Its implementation is extremely easy as well.

The sigmoid function limits the node's outputs to unity regardless of the input's magnitude.

In contrast, the ReLU function does not impose such a limit.
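To make the contrast concrete, here is a minimal MATLAB sketch (my own illustration, not code from the book) that evaluates both activations on the same inputs:

```matlab
% Compare how sigmoid and ReLU treat inputs of different magnitudes.
x = [-5 -1 0 1 5 50];

sig  = 1 ./ (1 + exp(-x));   % sigmoid: every output is squashed into (0, 1)
relu = max(0, x);            % ReLU: positive inputs pass through unchanged

disp([x; sig; relu])         % each column: input, sigmoid output, ReLU output
```

Note how the sigmoid outputs for x = 5 and x = 50 are both essentially 1, while ReLU preserves the difference.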

Isn't it interesting that such a simple change resulted in a drastic improvement of the learning performance of the deep neural network?

Another element that we need for the back-propagation algorithm is the derivative of the ReLU function.

By the definition of the ReLU function, its derivative is given as:

ReLU′(x) = 1 if x > 0, and 0 if x ≤ 0.
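In MATLAB, the function and its derivative can each be written in one line; this is just a sketch following the definition above, not necessarily the book's implementation:

```matlab
% ReLU and its derivative as anonymous functions.
ReLU      = @(x) max(0, x);       % ReLU(x) = x for x > 0, 0 otherwise
ReLUPrime = @(x) double(x > 0);   % derivative: 1 for x > 0, 0 otherwise

x = [-2 -0.5 0 0.5 2];
disp(ReLU(x))        % 0  0  0  0.5  2
disp(ReLUPrime(x))   % 0  0  0  1    1
```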

In addition, the cross-entropy-driven learning rules may improve the performance, as addressed in Chapter 3.

Furthermore, advanced gradient descent, a numerical method that better reaches the optimum value, is also beneficial for training the deep neural network.
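One common example of such an advanced method is gradient descent with momentum. The following toy MATLAB sketch (my own illustration, with a simple quadratic cost and illustrative names) shows the idea of the update rule:

```matlab
% Minimize the toy cost J(w) = w^2 with momentum-accelerated gradient descent.
alpha = 0.1;   % learning rate
beta  = 0.9;   % momentum coefficient

w = 5;         % initial weight
m = 0;         % accumulated momentum
for epoch = 1:50
    dJdw = 2*w;                  % gradient of J(w) = w^2
    m    = beta*m - alpha*dJdw;  % momentum accumulates past gradients
    w    = w + m;                % update uses the momentum, not the raw gradient
end
fprintf('w after 50 epochs: %.4f\n', w)
```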

Overfitting

The reason that the deep neural network is especially vulnerable to overfitting is that the model becomes more complicated as it includes more hidden layers, and hence more weights.

As addressed in Chapter 1, a complicated model is more vulnerable to overfitting.

Here is the dilemma: deepening the layers for higher performance drives the neural network to face the challenge of Machine Learning.

The most representative solution is dropout, which trains only some of the randomly selected nodes rather than the entire network.

It is very effective, while its implementation is not very complex.

Figure 5-4 explains the concept of dropout.

Figure 5-4. Dropout: some nodes are randomly selected and their outputs are set to zero to deactivate them (i.e., the selected nodes do not take part in the network's training computation).

Some nodes are randomly selected at a certain percentage, and their outputs are set to zero to deactivate the nodes.

Dropout effectively prevents overfitting because it continuously alters the nodes and weights in the training process.

The adequate percentages of dropout are approximately 50% and 25% for the hidden and input layers, respectively.
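A minimal MATLAB sketch of how such a dropout mask could be generated is shown below; the function name DropoutMask and its details are my own assumptions, not necessarily the book's implementation. The mask is multiplied element-wise with a layer's output, and the surviving nodes are scaled up so that the expected output stays the same.

```matlab
function ym = DropoutMask(y, ratio)
% DropoutMask  Return a mask that zeroes a fraction 'ratio' of the elements
% of y and scales the surviving elements by 1/(1-ratio).
  ym = zeros(size(y));
  numKeep = round(numel(y) * (1 - ratio));   % number of nodes kept active
  idx = randperm(numel(y), numKeep);         % randomly pick the surviving nodes
  ym(idx) = 1 / (1 - ratio);                 % scale survivors; dropped nodes stay 0
end
```

For example, a hidden layer's output y1 would be masked as y1 = y1 .* DropoutMask(y1, 0.5), using the roughly 50% rate mentioned above.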

Another prevailing method used to prevent overfitting is adding a regularization term, which penalizes the magnitude of the weights, to the cost function.
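For instance, with L2 regularization (weight decay), the cost becomes the original error term plus lambda/2 times the sum of the squared weights. The following toy MATLAB lines (illustrative values only, not from the book) show the idea:

```matlab
% Toy example of adding an L2 regularization term to a cross-entropy cost.
lambda = 0.01;                               % regularization strength (illustrative)

d = 1;  y = 0.8;                             % target and network output (toy values)
crossEntropy = -d*log(y) - (1 - d)*log(1 - y);

W = [0.2 -0.4; 0.7 0.1];                     % toy weight matrix
J = crossEntropy + lambda/2 * sum(W(:).^2);  % regularized cost penalizes large weights
fprintf('cost with regularization: %.4f\n', J)
```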

This article is translated from "Matlab Deep Learning" by Phil Kim.

