Deep Learning Notes: Reading Classic CNN Papers, AlexNet and Its TensorFlow Implementation

Posted by dicksonjyl560101 on 2018-07-28

Original URL: http://blog.itpub.net/29829936/viewspace-2168560/

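For more than a decade after Yann LeCun proposed LeNet-5, neural networks developed slowly and spent a long period in a trough, held back by their poor interpretability and by limited computing power. In 2012, Alex Krizhevsky, a student of Geoffrey Hinton (one of the "big three" of deep learning, often called the father of neural networks), proposed AlexNet. It won that year's ILSVRC (ImageNet Large Scale Visual Recognition Challenge) by a wide margin, bringing the top-5 error rate down to 16.4% against 26.2% for the runner-up (the paper's best ensemble entry reports 15.3%). The result drew intense attention from both academia and industry, and computer vision began moving into an era dominated by deep learning.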

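AlexNet carries the ideas of LeCun's LeNet-5 into a much wider and deeper network. Where LeNet-5 has roughly 60,000 parameters, AlexNet has about 630 million connections, 60 million parameters and 650,000 neurons. The network consists of 5 convolutional layers, with max pooling after the first, second and fifth, followed by 3 fully connected layers. AlexNet's main innovations are: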

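It was the first to use ReLU successfully as the activation function, outperforming the traditional sigmoid on deeper networks and greatly alleviating the vanishing-gradient problem.
It was the first to put dropout to practical use, applying it to the fully connected layers to prevent overfitting.
Where LeNet-5 used average pooling, AlexNet was the first to adopt overlapping max pooling, which avoids the blurring effect of average pooling.
It introduced the LRN (local response normalization) layer, which sets up competition among the activities of neighbouring neurons.
It used multiple GPUs for parallel computation.
It applied data augmentation, which also helped reduce overfitting.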
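
To make the first point concrete, here is a minimal NumPy sketch (not from the original post; the helper names are illustrative) comparing the derivatives of sigmoid and ReLU. The sigmoid derivative never exceeds 0.25, so a chain of such factors across many layers shrinks quickly, while the ReLU derivative is exactly 1 for any positive input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    # d/dz sigmoid(z) = s * (1 - s), which peaks at 0.25 when z = 0
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    # d/dz max(0, z): 1 for positive inputs, 0 otherwise
    return (z > 0).astype(np.float64)


z = np.array([-6.0, -2.0, 0.5, 2.0, 6.0])
sig_g = sigmoid_grad(z)
relu_g = relu_grad(z)

# Upper bound on a product of 8 sigmoid derivatives (one per layer),
# versus the same product for ReLU units that are all active.
depth = 8
max_sigmoid_chain = 0.25 ** depth
relu_chain = 1.0 ** depth

print("sigmoid gradients:", sig_g)
print("relu gradients:   ", relu_g)
print("best-case sigmoid chain over", depth, "layers:", max_sigmoid_chain)
```

This only bounds the activation-derivative factor; the weights also enter the real gradient, so it is an illustration of the tendency rather than a full analysis.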

AlexNet network architecture

That covers the background and main innovations of AlexNet; next we look at its network architecture.

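Not counting pooling layers, AlexNet has 8 layers. The first 5 are convolutional, with max pooling following the first, second and fifth; the last 3 are fully connected. In outline:
input > conv > pool > conv > pool > conv > conv > conv > pool > FC > FC > FC > output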

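The structure and parameters of each layer are as follows:
C1 is a convolutional layer:
Input: 227 x 227 x 3  Filter size: 11 x 11 x 3  Number of filters: 96  Stride: 4
Output: 55 x 55 x 96

P1 is the pooling layer after C1:
Input: 55 x 55 x 96  Window size: 3 x 3  Stride: 2  Channels: 96
Output: 27 x 27 x 96

C2 is a convolutional layer:
Input: 27 x 27 x 96  Filter size: 5 x 5 x 96  Number of filters: 256  Stride: 1
Output: 27 x 27 x 256

P2 is the pooling layer after C2:
Input: 27 x 27 x 256  Window size: 3 x 3  Stride: 2  Channels: 256
Output: 13 x 13 x 256

C3 is a convolutional layer:
Input: 13 x 13 x 256  Filter size: 3 x 3 x 256  Number of filters: 384  Stride: 1
Output: 13 x 13 x 384

C4 is a convolutional layer:
Input: 13 x 13 x 384  Filter size: 3 x 3 x 384  Number of filters: 384  Stride: 1
Output: 13 x 13 x 384

C5 is a convolutional layer:
Input: 13 x 13 x 384  Filter size: 3 x 3 x 384  Number of filters: 256  Stride: 1
Output: 13 x 13 x 256

P5 is the pooling layer after C5:
Input: 13 x 13 x 256  Window size: 3 x 3  Stride: 2  Channels: 256
Output: 6 x 6 x 256

F6 is a fully connected layer:
Input: 6 x 6 x 256
Output: 4096

F7 is a fully connected layer:
Input: 4096
Output: 4096

F8 is also a fully connected layer and serves as the output layer:
Input: 4096
Output: 1000

The filter depths listed for C2, C4 and C5 describe a single undivided network. In the two-group variant used by the code in this post, each filter in those layers sees only half of the input channels (48, 192 and 192 respectively).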
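
The spatial sizes in the listing above follow from the usual output-size formula, floor((size - kernel + 2 * pad) / stride) + 1. The sketch below traces them; the function names are illustrative, and the padding values are an assumption derived from the 'VALID' and 'SAME' settings used in the code later in this post (no padding for C1 and the pooling layers, enough padding to preserve size for C2 to C5).

```python
def out_size(size, kernel, stride, pad=0):
    """Spatial output size of a conv or pooling layer."""
    return (size - kernel + 2 * pad) // stride + 1


# (name, kernel, stride, pad, output channels)
LAYERS = [
    ("C1", 11, 4, 0, 96),
    ("P1", 3, 2, 0, 96),
    ("C2", 5, 1, 2, 256),
    ("P2", 3, 2, 0, 256),
    ("C3", 3, 1, 1, 384),
    ("C4", 3, 1, 1, 384),
    ("C5", 3, 1, 1, 256),
    ("P5", 3, 2, 0, 256),
]


def trace_shapes(input_size=227):
    """Return (name, height, width, channels) for every conv/pool layer."""
    shapes = []
    size = input_size
    for name, kernel, stride, pad, channels in LAYERS:
        size = out_size(size, kernel, stride, pad)
        shapes.append((name, size, size, channels))
    return shapes


shapes = trace_shapes()
for name, h, w, c in shapes:
    print(f"{name}: {h} x {w} x {c}")

# Size of the flattened vector fed into F6
flattened = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
print("flattened:", flattened)
```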

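The paper states an input size of 224 x 224 x 3, but the size that actually fits the layer arithmetic is 227 x 227 x 3. Layer outputs are activated with ReLU, apart from the final output layer. The five convolutional layers account for most of the computation yet hold far fewer parameters than the three fully connected layers; even so, the convolutional layers matter much more to the network than the fully connected ones.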
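
Both claims in this paragraph can be checked with a little arithmetic. The sketch below is not from the original post and its helper names are illustrative. It counts weights and biases per layer, assuming the two-group layout (groups=2 for the second, fourth and fifth convolutions) used by the AlexNet class in this post, and it shows why 224 does not fit the first layer while 227 does.

```python
def conv_params(kh, kw, in_ch, out_ch, groups=1):
    # Each filter sees in_ch // groups input channels; one bias per filter
    return kh * kw * (in_ch // groups) * out_ch + out_ch


def fc_params(n_in, n_out):
    # Weight matrix plus one bias per output unit
    return n_in * n_out + n_out


conv_total = sum([
    conv_params(11, 11, 3, 96),
    conv_params(5, 5, 96, 256, groups=2),
    conv_params(3, 3, 256, 384),
    conv_params(3, 3, 384, 384, groups=2),
    conv_params(3, 3, 384, 256, groups=2),
])
fc_total = sum([
    fc_params(6 * 6 * 256, 4096),
    fc_params(4096, 4096),
    fc_params(4096, 1000),
])

# First-layer output size before flooring, for an 11 x 11 filter with stride 4
frac_224 = (224 - 11) / 4 + 1
frac_227 = (227 - 11) / 4 + 1

print("conv parameters:", conv_total)
print("fc parameters:  ", fc_total)
print("total:          ", conv_total + fc_total)
print("224 input ->", frac_224, "| 227 input ->", frac_227)
```

Under these assumptions the fully connected layers hold roughly 96% of the parameters, which matches the "60 million parameters" figure quoted earlier.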

[Figure: AlexNet's classification error rates on the validation and test sets.]

TensorFlow implementation of AlexNet

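We follow the same basic steps used earlier for building convolutional networks in TensorFlow: define placeholder variables for the inputs and outputs, initialise the parameters of each layer, build the forward pass, define the optimisation loop, and finally feed in the input data.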

Defining the convolution

import tensorflow as tf  # written against the TensorFlow 1.x graph API


def conv(x, filter_height, filter_width, num_filters, stride_y, stride_x, name,
         padding='SAME', groups=1):
    """Convolution + bias + ReLU, optionally split into channel groups."""
    # Get number of input channels
    input_channels = int(x.get_shape()[-1])

    # Create lambda function for the convolution
    convolve = lambda i, k: tf.nn.conv2d(i, k,
                                         strides=[1, stride_y, stride_x, 1],
                                         padding=padding)

    with tf.variable_scope(name) as scope:
        # Create tf variables for the weights and biases of the conv layer.
        # Integer division keeps the shape entries ints under Python 3.
        weights = tf.get_variable('weights', shape=[filter_height,
                                                    filter_width,
                                                    input_channels // groups,
                                                    num_filters])
        biases = tf.get_variable('biases', shape=[num_filters])

    if groups == 1:
        conv = convolve(x, weights)
    else:
        # With multiple groups, split the input channels and the filters
        # and convolve each pair separately
        input_groups = tf.split(axis=3, num_or_size_splits=groups, value=x)
        weight_groups = tf.split(axis=3, num_or_size_splits=groups,
                                 value=weights)
        output_groups = [convolve(i, k) for i, k in zip(input_groups, weight_groups)]
        # Concat the convolved output together again
        conv = tf.concat(axis=3, values=output_groups)

    # Add biases
    bias = tf.reshape(tf.nn.bias_add(conv, biases), tf.shape(conv))
    # Apply relu function
    relu_result = tf.nn.relu(bias, name=scope.name)

    return relu_result
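
The groups argument above is the least obvious part of this function. As a rough illustration only (a naive NumPy re-implementation for a single image, not the TensorFlow code path; all names here are hypothetical), the sketch below splits the input channels and the output filters into groups in the same way and checks that each group of outputs depends only on its own slice of input channels.

```python
import numpy as np


def conv2d_valid(x, w, stride=1):
    """Naive 'VALID' cross-correlation. x: (H, W, C_in), w: (kh, kw, C_in, C_out)."""
    kh, kw, c_in, c_out = w.shape
    assert x.shape[2] == c_in
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            # Contract over kernel height, kernel width and input channels
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out


def grouped_conv2d(x, w, groups, stride=1):
    """w has shape (kh, kw, C_in // groups, C_out), as in the conv() helper."""
    x_groups = np.split(x, groups, axis=2)   # split input channels
    w_groups = np.split(w, groups, axis=3)   # split output filters
    outs = [conv2d_valid(xg, wg, stride) for xg, wg in zip(x_groups, w_groups)]
    return np.concatenate(outs, axis=2)


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7, 4))
w = rng.standard_normal((3, 3, 2, 6))  # 4 input channels / 2 groups = 2 per filter
y = grouped_conv2d(x, w, groups=2)

# The first three output channels only ever see input channels 0 and 1
first_half = conv2d_valid(x[:, :, :2], w[:, :, :, :3])
print(y.shape, np.allclose(y[:, :, :3], first_half))
```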

Defining the fully connected layer

def fc(x, num_in, num_out, name, relu=True):
    """Fully connected layer, with an optional ReLU on the output."""
    with tf.variable_scope(name) as scope:
        # Create tf variables for the weights and biases
        weights = tf.get_variable('weights', shape=[num_in, num_out],
                                  trainable=True)
        biases = tf.get_variable('biases', [num_out], trainable=True)
        # Matrix multiply weights and inputs and add bias
        act = tf.nn.xw_plus_b(x, weights, biases, name=scope.name)
    # Return the activated output, or the raw pre-activation when relu=False
    if relu:
        return tf.nn.relu(act)
    return act

Defining max pooling

def max_pool(x, filter_height, filter_width, stride_y, stride_x, name,
             padding='SAME'):
    """Max pooling over a filter_height x filter_width window (NHWC input)."""
    return tf.nn.max_pool(x, ksize=[1, filter_height, filter_width, 1],
                          strides=[1, stride_y, stride_x, 1],
                          padding=padding, name=name)
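
"Overlapping" pooling simply means the stride is smaller than the window, as with the 3 x 3 window and stride 2 used in this network. The small NumPy sketch below (illustrative names, a single channel, 'VALID' padding) contrasts that with a non-overlapping 2 x 2 window at stride 2.

```python
import numpy as np


def max_pool2d(x, size, stride):
    """'VALID' max pooling over a single-channel (H, W) array."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out


x = np.arange(25).reshape(5, 5)

# 3 x 3 window, stride 2: neighbouring windows share one row or column
overlapping = max_pool2d(x, size=3, stride=2)
# 2 x 2 window, stride 2: windows do not share any input cell
non_overlapping = max_pool2d(x, size=2, stride=2)

print(overlapping)
print(non_overlapping)
```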

Defining LRN

def lrn(x, radius, alpha, beta, name, bias=1.0):
    """Local response normalization across neighbouring channels."""
    return tf.nn.local_response_normalization(x, depth_radius=radius,
                                              alpha=alpha, beta=beta,
                                              bias=bias, name=name)
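
For intuition about what this call computes, here is a minimal NumPy sketch (the function name is hypothetical) of the formula given in TensorFlow's documentation for local_response_normalization: each channel is divided by (bias + alpha * sum of squares over the channels within depth_radius) raised to the power beta. The toy parameter values are chosen only to keep the arithmetic easy to follow and are not the ones used in the network.

```python
import numpy as np


def lrn_numpy(x, depth_radius, alpha, beta, bias=1.0):
    """Normalise each channel of x (last axis) by its neighbouring channels."""
    channels = x.shape[-1]
    out = np.empty_like(x, dtype=np.float64)
    for d in range(channels):
        # Window of channels [d - depth_radius, d + depth_radius], clipped at the edges
        lo = max(0, d - depth_radius)
        hi = min(channels, d + depth_radius + 1)
        sqr_sum = np.sum(x[..., lo:hi] ** 2, axis=-1)
        out[..., d] = x[..., d] / (bias + alpha * sqr_sum) ** beta
    return out


x = np.ones((1, 1, 1, 5))
y = lrn_numpy(x, depth_radius=2, alpha=1.0, beta=1.0, bias=1.0)
print(y.ravel())
```

Channels in the middle have more neighbours inside the window, so they are damped more strongly than channels at the edges; that is the "competition" between neighbouring activations.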

Defining dropout

def dropout(x, keep_prob):
    """Randomly zero activations, keeping each with probability keep_prob."""
    return tf.nn.dropout(x, keep_prob)
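
As a rough picture of what dropout with a keep probability does, the NumPy sketch below (hypothetical helper, not the TensorFlow implementation) zeroes each element with probability 1 - keep_prob and scales the survivors by 1 / keep_prob, so the expected value of the output matches the input.

```python
import numpy as np


def dropout_numpy(x, keep_prob, rng):
    """Inverted dropout: drop with probability 1 - keep_prob, rescale the rest."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)
x = np.ones(100_000)
y = dropout_numpy(x, keep_prob=0.5, rng=rng)

print("distinct values:", np.unique(y))
print("mean after dropout:", y.mean())
```

Because of the rescaling, no extra correction is needed at inference time; the layer is simply skipped (keep_prob set to 1).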

With all the components of AlexNet in place, we now use them to build an AlexNet class.

import numpy as np  # needed by load_initial_weights


class AlexNet(object):
    def __init__(self, x, keep_prob, num_classes, skip_layer,
                 weights_path='DEFAULT'):
        # Parse input arguments into class variables
        self.X = x
        self.NUM_CLASSES = num_classes
        self.KEEP_PROB = keep_prob
        self.SKIP_LAYER = skip_layer
        if weights_path == 'DEFAULT':
            self.WEIGHTS_PATH = 'bvlc_alexnet.npy'
        else:
            self.WEIGHTS_PATH = weights_path
        # Call the create function to build the computational graph of AlexNet
        self.create()

    def create(self):
        # 1st Layer: Conv (w ReLu) -> Lrn -> Pool
        conv1 = conv(self.X, 11, 11, 96, 4, 4, padding='VALID', name='conv1')
        norm1 = lrn(conv1, 2, 1e-04, 0.75, name='norm1')
        pool1 = max_pool(norm1, 3, 3, 2, 2, padding='VALID', name='pool1')
        # 2nd Layer: Conv (w ReLu)  -> Lrn -> Pool with 2 groups
        conv2 = conv(pool1, 5, 5, 256, 1, 1, groups=2, name='conv2')
        norm2 = lrn(conv2, 2, 1e-04, 0.75, name='norm2')
        pool2 = max_pool(norm2, 3, 3, 2, 2, padding='VALID', name='pool2')
        # 3rd Layer: Conv (w ReLu)
        conv3 = conv(pool2, 3, 3, 384, 1, 1, name='conv3')
        # 4th Layer: Conv (w ReLu) split into two groups
        conv4 = conv(conv3, 3, 3, 384, 1, 1, groups=2, name='conv4')
        # 5th Layer: Conv (w ReLu) split into two groups -> Pool
        conv5 = conv(conv4, 3, 3, 256, 1, 1, groups=2, name='conv5')
        pool5 = max_pool(conv5, 3, 3, 2, 2, padding='VALID', name='pool5')
        # 6th Layer: Flatten -> FC (w ReLu) -> Dropout
        flattened = tf.reshape(pool5, [-1, 6*6*256])
        fc6 = fc(flattened, 6*6*256, 4096, name='fc6')
        dropout6 = dropout(fc6, self.KEEP_PROB)
        # 7th Layer: FC (w ReLu) -> Dropout
        fc7 = fc(dropout6, 4096, 4096, name='fc7')
        dropout7 = dropout(fc7, self.KEEP_PROB)
        # 8th Layer: FC and return unscaled activations
        self.fc8 = fc(dropout7, 4096, self.NUM_CLASSES, relu=False, name='fc8')
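
    def load_initial_weights(self, session):
        # Load the weights into memory; allow_pickle=True is required by
        # NumPy >= 1.16.3 because the file stores a pickled dict
        weights_dict = np.load(self.WEIGHTS_PATH, encoding='bytes',
                               allow_pickle=True).item()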
        # Loop over all layer names stored in the weights dict
        for op_name in weights_dict:
            # Check if layer should be trained from scratch
            if op_name not in self.SKIP_LAYER:
                with tf.variable_scope(op_name, reuse=True):
                    # Assign weights/biases to their corresponding tf variable
                    for data in weights_dict[op_name]:
                        # Biases
                        if len(data.shape) == 1:
                            var = tf.get_variable('biases', trainable=False)
                            session.run(var.assign(data))
                        # Weights
                        else:
                            var = tf.get_variable('weights', trainable=False)
                            session.run(var.assign(data))

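The code above wraps the forward computation using the components defined earlier, and loads pretrained model weights obtained from http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/. With that, the basic AlexNet is in place.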
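
The selection logic inside load_initial_weights can be tried without TensorFlow or the real weights file. The sketch below is only an illustration: it assumes, as the loop in the class implies, that the file holds a dict mapping each layer name to a list of arrays, and it uses a toy dict with zero-filled arrays in place of bvlc_alexnet.npy. The function name select_pretrained is hypothetical.

```python
import numpy as np


def select_pretrained(weights_dict, skip_layers):
    """Pick the arrays that would be assigned, keyed by variable name."""
    selected = {}
    for op_name, params in weights_dict.items():
        # Layers listed in skip_layers are left to be trained from scratch
        if op_name in skip_layers:
            continue
        entry = {}
        for data in params:
            # 1-D arrays are biases; anything else is treated as weights
            key = 'biases' if data.ndim == 1 else 'weights'
            entry[key] = data
        selected[op_name] = entry
    return selected


# Toy stand-in for the loaded file; shapes are illustrative only
toy_weights = {
    'conv1': [np.zeros((11, 11, 3, 96)), np.zeros(96)],
    'fc8': [np.zeros((4096, 1000)), np.zeros(1000)],
}
picked = select_pretrained(toy_weights, skip_layers=['fc8'])

for layer, entry in picked.items():
    print(layer, {k: v.shape for k, v in entry.items()})
```

Skipping 'fc8' in this way is the typical setup when the final layer is replaced to fine-tune on a dataset with a different number of classes.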

[This article is reposted from the WeChat public account 資料科學家養成記 (ID: louwill12), author: louwill. Earlier articles can be found on that account.]

From the ITPUB blog, link: http://blog.itpub.net/31077337/viewspace-2158712/. If you repost this article, please credit the source; otherwise legal liability may be pursued.
