Code source: https://github.com/eriklindernoren/ML-From-Scratch
Concrete implementation of the Conv2D convolutional layer (with stride and padding): https://www.cnblogs.com/xiximayou/p/12706576.html
Implementation of activation functions (sigmoid, softmax, tanh, relu, leakyrelu, elu, selu, softplus): https://www.cnblogs.com/xiximayou/p/12713081.html
Definition of loss functions (mean squared error, cross-entropy loss): https://www.cnblogs.com/xiximayou/p/12713198.html
Implementation of optimizers (SGD, Nesterov, Adagrad, Adadelta, RMSprop, Adam): https://www.cnblogs.com/xiximayou/p/12713594.html
In this section we continue working through the backward pass of the convolutional layer, following the code.
Only the forward-pass and backward-pass code of Conv2D is shown here:
def forward_pass(self, X, training=True):
    batch_size, channels, height, width = X.shape
    self.layer_input = X
    # Turn image shape into column shape
    # (enables dot product between input and weights)
    self.X_col = image_to_column(X, self.filter_shape, stride=self.stride, output_shape=self.padding)
    # Turn weights into column shape
    self.W_col = self.W.reshape((self.n_filters, -1))
    # Calculate output
    output = self.W_col.dot(self.X_col) + self.w0
    # Reshape into (n_filters, out_height, out_width, batch_size)
    output = output.reshape(self.output_shape() + (batch_size, ))
    # Redistribute axes so that batch size comes first
    return output.transpose(3, 0, 1, 2)

def backward_pass(self, accum_grad):
    # Reshape accumulated gradient into column shape
    accum_grad = accum_grad.transpose(1, 2, 3, 0).reshape(self.n_filters, -1)

    if self.trainable:
        # Take the dot product between the column-shaped accumulated gradient and the
        # column-shaped layer input to get the gradient with respect to the layer weights
        grad_w = accum_grad.dot(self.X_col.T).reshape(self.W.shape)
        # The gradient with respect to the bias terms is a sum, as in the Dense layer
        grad_w0 = np.sum(accum_grad, axis=1, keepdims=True)

        # Update the layer's weights
        self.W = self.W_opt.update(self.W, grad_w)
        self.w0 = self.w0_opt.update(self.w0, grad_w0)

    # Recalculate the gradient which will be propagated back to the previous layer
    accum_grad = self.W_col.T.dot(accum_grad)
    # Reshape from column shape to image shape
    accum_grad = column_to_image(accum_grad,
                                 self.layer_input.shape,
                                 self.filter_shape,
                                 stride=self.stride,
                                 output_shape=self.padding)

    return accum_grad
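To make the column trick concrete, here is a minimal self-contained sketch (a toy im2col of my own for the stride-1, no-padding case, not the repo's image_to_column) showing that flattening each patch into a column and taking one matrix product reproduces a naive loop-based convolution:

import numpy as np

def naive_conv2d(X, W):
    # X: (channels, height, width); W: (n_filters, channels, fh, fw)
    n_filters, channels, fh, fw = W.shape
    out_h, out_w = X.shape[1] - fh + 1, X.shape[2] - fw + 1
    out = np.zeros((n_filters, out_h, out_w))
    for f in range(n_filters):
        for i in range(out_h):
            for j in range(out_w):
                out[f, i, j] = np.sum(X[:, i:i+fh, j:j+fw] * W[f])
    return out

def toy_im2col(X, fh, fw):
    # Stack every (channels, fh, fw) patch of X as one column
    channels, height, width = X.shape
    out_h, out_w = height - fh + 1, width - fw + 1
    cols = np.zeros((channels * fh * fw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = X[:, i:i+fh, j:j+fw].ravel()
    return cols

X = np.random.randn(3, 5, 5)      # one 3-channel 5x5 image
W = np.random.randn(4, 3, 3, 3)   # four 3x3 filters

X_col = toy_im2col(X, 3, 3)       # shape (27, 9)
W_col = W.reshape(4, -1)          # shape (4, 27)
out = W_col.dot(X_col).reshape(4, 3, 3)

assert np.allclose(out, naive_conv2d(X, W))   # identical to the naive loops

The same reshaping is what makes the backward pass cheap: since output = W_col.dot(X_col), the weight gradient is just accum_grad.dot(X_col.T) and the input gradient is W_col.T.dot(accum_grad).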
The network itself is assembled in neural_network.py, where training over one batch is implemented as follows:
def train_on_batch(self, X, y):
    """ Single gradient update over one batch of samples """
    y_pred = self._forward_pass(X)
    loss = np.mean(self.loss_function.loss(y, y_pred))
    acc = self.loss_function.acc(y, y_pred)
    # Calculate the gradient of the loss function wrt y_pred
    loss_grad = self.loss_function.gradient(y, y_pred)
    # Backpropagate. Update weights
    self._backward_pass(loss_grad=loss_grad)
    return loss, acc
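A hypothetical usage sketch of train_on_batch (model, X_train, and y_train are placeholders, assuming model is a NeuralNetwork instance built as in the repo; the slicing-based batching here is my own, not the repo's batch iterator):

import numpy as np

batch_size = 64
for epoch in range(10):
    idx = np.random.permutation(len(X_train))         # reshuffle every epoch
    for start in range(0, len(X_train), batch_size):
        batch = idx[start:start + batch_size]
        loss, acc = model.train_on_batch(X_train[batch], y_train[batch])
    print('epoch %d: loss=%.4f, acc=%.4f' % (epoch, loss, acc))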
We also need to look at self._forward_pass and self._backward_pass:
def _forward_pass(self, X, training=True):
    """ Calculate the output of the NN """
    layer_output = X
    for layer in self.layers:
        layer_output = layer.forward_pass(layer_output, training)
    return layer_output

def _backward_pass(self, loss_grad):
    """ Propagate the gradient 'backwards' and update the weights in each layer """
    for layer in reversed(self.layers):
        loss_grad = layer.backward_pass(loss_grad)
As we can see, the forward pass computes the output of every layer in self.layers, which may include convolution, pooling, activation, and normalization layers; the backward pass then propagates the gradient through each layer from back to front. Take a network consisting of a convolutional layer, a fully connected layer, and a loss function as an example. After the forward pass completes, the first gradient obtained is that of the loss function. It is passed into the fully connected layer, which computes its own gradient and hands it on to the convolutional layer, at which point the convolutional layer's backward_pass() method is called. Inside backward_pass(), if self.trainable is set, the gradients with respect to the weights W and the bias w0 are computed, the optimizers W_opt and w0_opt update the parameters, and the gradient with respect to the previous layer is then computed. Finally, there is the column_to_image() method.
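As a toy illustration of this back-to-front chaining (my own minimal sketch, not the repo's classes), consider two "layers" that each just scale their input; the backward pass multiplies the incoming gradient by each layer's local derivative in reverse order, exactly as _backward_pass does:

import numpy as np

class Scale:
    """ Toy layer: y = a * x, so the local derivative dy/dx is a """
    def __init__(self, a):
        self.a = a
    def forward_pass(self, x):
        return self.a * x
    def backward_pass(self, accum_grad):
        # Chain rule: gradient w.r.t. this layer's input
        return accum_grad * self.a

layers = [Scale(2.0), Scale(3.0)]

x = np.array([1.0, -1.0])
out = x
for layer in layers:                # forward, front to back
    out = layer.forward_pass(out)

loss_grad = 2 * out                 # gradient of loss = sum(out**2)
for layer in reversed(layers):      # backward, back to front
    loss_grad = layer.backward_pass(loss_grad)

# out = 6x and loss = sum((6x)**2), so dloss/dx = 72x
assert np.allclose(loss_grad, 72 * x)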
def column_to_image(cols, images_shape, filter_shape, stride, output_shape='same'):
    batch_size, channels, height, width = images_shape
    pad_h, pad_w = determine_padding(filter_shape, output_shape)

    height_padded = height + np.sum(pad_h)
    width_padded = width + np.sum(pad_w)
    # Must start from zeros: np.add.at below accumulates into this array
    images_padded = np.zeros((batch_size, channels, height_padded, width_padded))

    # Calculate the indices where the dot products were applied between weights
    # and the image
    k, i, j = get_im2col_indices(images_shape, filter_shape, (pad_h, pad_w), stride)

    cols = cols.reshape(channels * np.prod(filter_shape), -1, batch_size)
    cols = cols.transpose(2, 0, 1)
    # Add column content to the images at the indices
    np.add.at(images_padded, (slice(None), k, i, j), cols)

    # Return image without padding
    return images_padded[:, :, pad_h[0]:height+pad_h[0], pad_w[0]:width+pad_w[0]]
This method undoes the reshaping that image_to_column() performed earlier to make the convolution easy to compute, scattering the columns back into the padded image layout before the padding is stripped off.
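The essential step is np.add.at. When the stride is smaller than the filter, neighbouring windows overlap, so a single pixel of the padded image receives gradient contributions from every column that contained it, and these must be accumulated rather than overwritten. A tiny illustration of the difference (my own, independent of the repo):

import numpy as np

idx = np.array([0, 1, 1, 2])    # index 1 appears in two overlapping "windows"

img = np.zeros(4)
img[idx] += 1.0                 # fancy-index assignment: duplicate writes collapse
print(img)                      # [1. 1. 1. 0.] -- one contribution is lost

img = np.zeros(4)
np.add.at(img, idx, 1.0)        # unbuffered add: duplicates accumulate
print(img)                      # [1. 2. 1. 0.]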
The various shape transformations involved in these computations can be quite a headache, and you will keep running into unfamiliar NumPy functions that require looking up the documentation. Grasping the overall process is enough; the details mainly serve to deepen your understanding of the underlying ideas.