02 Improving Deep Neural Networks - Gradient Checking - Week 1 Programming Assignment 3
Gradient Checking
Gradient checking tests whether the gradients computed by backpropagation are correct: we compare them against a numerical approximation of the gradient and check that the difference, computed with the formula in Figure 2, is small enough.
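For reference, the two formulas that the docstrings below call formula (1) and formula (2) are the centered-difference approximation of the gradient and the relative difference used for the comparison; written in LaTeX, they match exactly what the code computes:

\text{gradapprox} = \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2\varepsilon} \quad (1)

\text{difference} = \frac{\lVert \text{grad} - \text{gradapprox} \rVert_2}{\lVert \text{grad} \rVert_2 + \lVert \text{gradapprox} \rVert_2} \quad (2)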
What you should remember from this notebook:
- Gradient checking verifies closeness between the gradients from backpropagation and the numerical approximation of the gradient (computed using forward propagation).
- Gradient checking is slow, so we don't run it in every iteration of training. You would usually run it only to make sure your code is correct, then turn it off and use backprop for the actual learning process; a minimal sketch of this usage pattern follows this list.
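The sketch below illustrates that pattern. It assumes the forward_propagation_n, backward_propagation_n and gradient_check_n functions defined in the full listing further down; num_iterations and update_parameters are hypothetical stand-ins for whatever training loop and update rule you actually use:

# Validate the implementation once, on the data you are about to train on
cost, cache = forward_propagation_n(X, Y, parameters)
gradients = backward_propagation_n(X, Y, cache)
difference = gradient_check_n(parameters, gradients, X, Y)
assert difference < 1e-7, "backward propagation looks wrong -- fix it before training"
# Then train with plain backprop, gradient checking switched off
for i in range(num_iterations):
    cost, cache = forward_propagation_n(X, Y, parameters)
    gradients = backward_propagation_n(X, Y, cache)
    parameters = update_parameters(parameters, gradients)  # hypothetical update step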
#coding=utf-8
# Packages
import numpy as np
from testCases import *
from gc_utils import sigmoid, relu, dictionary_to_vector, vector_to_dictionary, gradients_to_vector
# GRADED FUNCTION: forward_propagation
def forward_propagation(x, theta):
    """
    Implement the linear forward propagation (compute J) presented in Figure 1 (J(theta) = theta * x)
    Arguments:
    x -- a real-valued input
    theta -- our parameter, a real number as well
    Returns:
    J -- the value of function J, computed using the formula J(theta) = theta * x
    """
    ### START CODE HERE ### (approx. 1 line)
    J = theta * x
    ### END CODE HERE ###
    return J
# GRADED FUNCTION: backward_propagation
def backward_propagation(x, theta):
    """
    Computes the derivative of J with respect to theta (see Figure 1).
    Arguments:
    x -- a real-valued input
    theta -- our parameter, a real number as well
    Returns:
    dtheta -- the gradient of the cost with respect to theta
    """
    ### START CODE HERE ### (approx. 1 line)
    dtheta = x
    ### END CODE HERE ###
    return dtheta
# GRADED FUNCTION: gradient_check
def gradient_check(x, theta, epsilon = 1e-7):
    """
    Implement gradient checking as presented in Figure 1.
    Arguments:
    x -- a real-valued input
    theta -- our parameter, a real number as well
    epsilon -- tiny shift to the input to compute approximated gradient with formula (1)
    Returns:
    difference -- difference (2) between the approximated gradient and the backward propagation gradient
    """
    # Compute gradapprox using formula (1). epsilon is small enough, you don't need to worry about the limit.
    ### START CODE HERE ### (approx. 5 lines)
    thetaplus = theta + epsilon                               # Step 1
    thetaminus = theta - epsilon                              # Step 2
    J_plus = forward_propagation(x, thetaplus)                # Step 3
    J_minus = forward_propagation(x, thetaminus)              # Step 4
    gradapprox = (J_plus - J_minus) / (2 * epsilon)           # Step 5
    ### END CODE HERE ###
    # Check if gradapprox is close enough to the output of backward_propagation()
    ### START CODE HERE ### (approx. 1 line)
    grad = backward_propagation(x, theta)
    ### END CODE HERE ###
    ### START CODE HERE ### (approx. 1 line)
    numerator = np.linalg.norm(grad - gradapprox)                     # Step 1'
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)   # Step 2'
    difference = numerator / denominator                              # Step 3'
    ### END CODE HERE ###
    if difference < 1e-7:
        print("The gradient is correct!")
    else:
        print("The gradient is wrong!")
    return difference
def forward_propagation_n(X, Y, parameters):
    """
    Implements the forward propagation (and computes the cost) presented in Figure 3.
    Arguments:
    X -- training set for m examples
    Y -- labels for m examples
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3":
        W1 -- weight matrix of shape (5, 4)
        b1 -- bias vector of shape (5, 1)
        W2 -- weight matrix of shape (3, 5)
        b2 -- bias vector of shape (3, 1)
        W3 -- weight matrix of shape (1, 3)
        b3 -- bias vector of shape (1, 1)
    Returns:
    cost -- the cost function (logistic cost averaged over the m examples)
    """
    # retrieve parameters
    m = X.shape[1]
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]
    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)
    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)
    # Cost
    logprobs = np.multiply(-np.log(A3), Y) + np.multiply(-np.log(1 - A3), 1 - Y)
    cost = 1. / m * np.sum(logprobs)
    cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)
    return cost, cache
def backward_propagation_n(X, Y, cache):
    """
    Implement the backward propagation presented in Figure 2.
    Arguments:
    X -- input datapoint, of shape (input size, 1)
    Y -- true "label"
    cache -- cache output from forward_propagation_n()
    Returns:
    gradients -- A dictionary with the gradients of the cost with respect to each parameter, activation and pre-activation variables.
    """
    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache
    dZ3 = A3 - Y
    dW3 = 1. / m * np.dot(dZ3, A2.T)
    db3 = 1. / m * np.sum(dZ3, axis=1, keepdims=True)
    dA2 = np.dot(W3.T, dZ3)
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    dW2 = 1. / m * np.dot(dZ2, A1.T) * 2                  # the extra "* 2" looks like a planted error for gradient checking to catch
    db2 = 1. / m * np.sum(dZ2, axis=1, keepdims=True)
    dA1 = np.dot(W2.T, dZ2)
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    dW1 = 1. / m * np.dot(dZ1, X.T)
    db1 = 4. / m * np.sum(dZ1, axis=1, keepdims=True)     # "4./m" instead of "1./m" also looks like a planted error
    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,
                 "dA2": dA2, "dZ2": dZ2, "dW2": dW2, "db2": db2,
                 "dA1": dA1, "dZ1": dZ1, "dW1": dW1, "db1": db1}
    return gradients
# GRADED FUNCTION: gradient_check_n
def gradient_check_n(parameters, gradients, X, Y, epsilon = 1e-7):
    """
    Checks if backward_propagation_n computes correctly the gradient of the cost output by forward_propagation_n
    Arguments:
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
    gradients -- output of backward_propagation_n, contains gradients of the cost with respect to the parameters
    X -- input datapoint, of shape (input size, 1)
    Y -- true "label"
    epsilon -- tiny shift to the input to compute approximated gradient with formula (1)
    Returns:
    difference -- difference (2) between the approximated gradient and the backward propagation gradient
    """
    # Set-up variables
    parameters_values, _ = dictionary_to_vector(parameters)
    grad = gradients_to_vector(gradients)
    num_parameters = parameters_values.shape[0]
    J_plus = np.zeros((num_parameters, 1))
    J_minus = np.zeros((num_parameters, 1))
    gradapprox = np.zeros((num_parameters, 1))
    # Compute gradapprox
    for i in range(num_parameters):
        # Compute J_plus[i]. Inputs: "parameters_values, epsilon". Output = "J_plus[i]".
        # "_" is used because the function outputs two values but we only care about the first one
        ### START CODE HERE ### (approx. 3 lines)
        thetaplus = np.copy(parameters_values)                                       # Step 1
        thetaplus[i][0] = thetaplus[i][0] + epsilon                                  # Step 2
        J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaplus))  # Step 3
        ### END CODE HERE ###
        # Compute J_minus[i]. Inputs: "parameters_values, epsilon". Output = "J_minus[i]".
        ### START CODE HERE ### (approx. 3 lines)
        thetaminus = np.copy(parameters_values)                                        # Step 1
        thetaminus[i][0] = thetaminus[i][0] - epsilon                                  # Step 2
        J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaminus))  # Step 3
        ### END CODE HERE ###
        # Compute gradapprox[i]
        ### START CODE HERE ### (approx. 1 line)
        gradapprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)
        ### END CODE HERE ###
    # Compare gradapprox to backward propagation gradients by computing difference.
    ### START CODE HERE ### (approx. 1 line)
    numerator = np.linalg.norm(grad - gradapprox)                     # Step 1'
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)   # Step 2'
    difference = numerator / denominator                              # Step 3'
    ### END CODE HERE ###
    if difference > 1e-7:
        print("\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m")
    else:
        print("\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m")
    return difference
if __name__ == '__main__':
    x, theta = 2, 4
    J = forward_propagation(x, theta)
    print("J = " + str(J))
    x, theta = 2, 4
    dtheta = backward_propagation(x, theta)
    print("dtheta = " + str(dtheta))
    x, theta = 2, 4
    difference = gradient_check(x, theta)
    print("difference = " + str(difference))
    X, Y, parameters = gradient_check_n_test_case()
    cost, cache = forward_propagation_n(X, Y, parameters)
    gradients = backward_propagation_n(X, Y, cache)
    difference = gradient_check_n(parameters, gradients, X, Y)
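Run as listed above, gradient_check_n should report a mistake: the extra "* 2" in dW2 and the "4./m" in db1 break the gradients, and they look like the errors this assignment plants for gradient checking to catch. A minimal sketch of the corrected lines, assuming that reading is right (they simply mirror the dW3/db3 pattern):

dW2 = 1. / m * np.dot(dZ2, A1.T)                   # drop the extra "* 2"
db1 = 1. / m * np.sum(dZ1, axis=1, keepdims=True)  # use 1./m, matching db2 and db3

After editing backward_propagation_n this way and re-running the script, gradient_check_n should report that backward propagation works fine.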