Caffe: Notes on the Layer Class

Posted by 查志強 on 2015-08-07

[Original: https://yufeigan.github.io/2014/12/09/Caffe%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B03-Layer%E7%9A%84%E7%9B%B8%E5%85%B3%E5%AD%A6%E4%B9%A0/]

Layer

Layer is the base class of all layers. Five kinds of Layers are derived from it:

  • data_layer
  • neuron_layer
  • loss_layer
  • common_layer
  • vision_layer

Each has a corresponding pair of [.hpp .cpp] files that declare and implement the class interfaces. The five Layer types are covered one by one below.

data_layer

First, look at the headers included by data_layer.hpp:

#include "boost/scoped_ptr.hpp"
#include "hdf5.h"
#include "leveldb/db.h"
#include "lmdb.h"
// the first four headers all relate to data storage formats
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/data_transformer.hpp"
#include "caffe/filler.hpp"
#include "caffe/internal_thread.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

It is easy to see that data_layer mainly pulls in data-related headers. The official documentation states that data layers are the entry point for data into Caffe, sit at the lowest level of the network, and support multiple formats. Among them there are five LayerTypes:

  • DATA
  • MEMORY_DATA
  • HDF5_DATA
  • HDF5_OUTPUT
  • IMAGE_DATA

There are actually two more, WINDOW_DATA and DUMMY_DATA, used for testing and as a reserved interface; they are set aside here for now.

DATA

template <typename Dtype>
class BaseDataLayer : public Layer<Dtype>
template <typename Dtype>
class BasePrefetchingDataLayer : public BaseDataLayer<Dtype>, public InternalThread
template <typename Dtype>
class DataLayer : public BasePrefetchingDataLayer<Dtype>

The type used for input in the LevelDB or LMDB data formats. Its parameters are source, batch_size, (rand_skip), (backend); the last two are optional.
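
As a hedged sketch of the configuration in the same prototxt style used below (the source path and batch size are placeholder values, not from the original post):

layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "examples/mnist/mnist_train_lmdb" # placeholder path to a LevelDB/LMDB database
    backend: LMDB                             # optional; defaults to LEVELDB
    batch_size: 64
  }
}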

MEMORY_DATA

template <typename Dtype>
class MemoryDataLayer : public BaseDataLayer<Dtype>

This type reads data directly from memory. To use it, you must call MemoryDataLayer::Reset. Its parameters are batch_size, channels, height, and width.
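
A minimal sketch in the same prototxt style (all dimensions are placeholder values); the data itself is then pushed in from C++ via MemoryDataLayer::Reset:

layers {
  name: "data"
  type: MEMORY_DATA
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 32 # placeholder values; all four fields are required
    channels: 3
    height: 227
    width: 227
  }
}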

HDF5_DATA

template <typename Dtype>
class HDF5DataLayer : public Layer<Dtype>

The type for input in the HDF5 data format. Its parameters are source and batch_size.
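
A minimal sketch (the source path is a placeholder); note that for HDF5 the source is a text file listing the .h5 files to read:

layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "examples/hdf5/train_file_list.txt" # placeholder: one .h5 path per line
    batch_size: 64
  }
}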

HDF5_OUTPUT

template <typename Dtype>
class HDF5OutputLayer : public Layer<Dtype>

The type for output in the HDF5 data format. Its parameter is file_name.
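
A minimal sketch (the file name is a placeholder); this layer consumes two bottoms and writes them to disk rather than producing a top:

layers {
  name: "output"
  type: HDF5_OUTPUT
  bottom: "data"
  bottom: "label"
  hdf5_output_param {
    file_name: "output.h5" # placeholder output path
  }
}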

IMAGE_DATA

template <typename Dtype>
class ImageDataLayer : public BasePrefetchingDataLayer<Dtype>

The type for input of image-format data. Its parameters are source, batch_size, (rand_skip), (shuffle), (new_height), (new_width); the ones in parentheses are optional.
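
A minimal sketch (paths and sizes are placeholders); the source here is a text file with one "image_path label" pair per line:

layers {
  name: "data"
  type: IMAGE_DATA
  top: "data"
  top: "label"
  image_data_param {
    source: "data/train_list.txt" # placeholder list file
    batch_size: 32
    shuffle: true    # optional: shuffle the image list each epoch
    new_height: 256  # optional: resize every image to 256x256 on load
    new_width: 256
  }
}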

neuron_layer

First, look at the headers included by neuron_layer.hpp:

#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

neuron_layer is likewise a layer that operates on data. It implements a large number of activation functions, mostly element-wise operations in which bottom and top have the same size.
Caffe implements many activation functions, with both CPU and GPU versions. Their common parent class is NeuronLayer:

template <typename Dtype>
class NeuronLayer : public Layer<Dtype>

There is nothing here that needs deep study for now, but the typical parameter format is worth noting (using ReLU as an example):

layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}

loss_layer

Loss layers compute the network error. The headers included by loss_layer.hpp:

#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/layer.hpp"
#include "caffe/neuron_layers.hpp"
#include "caffe/proto/caffe.pb.h"

Note that it includes neuron_layers.hpp, presumably because its functions are needed to compute the loss. As a rule, the loss sits in the last layer of the network. Caffe implements a large number of loss functions, and their common parent class is LossLayer:

template <typename Dtype>
class LossLayer : public Layer<Dtype>
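
As a hedged sketch (not from the original post): a typical classification network ends with a softmax loss layer, which in the same prototxt style takes the predictions and the ground-truth labels as its two bottoms:

layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "fc8"    # predictions (the blob names here are assumptions)
  bottom: "label"  # ground-truth labels from the data layer
  top: "loss"
}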

common_layer

First, the headers included by common_layer.hpp:

#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/data_layers.hpp"
#include "caffe/layer.hpp"
#include "caffe/loss_layers.hpp"
#include "caffe/neuron_layers.hpp"
#include "caffe/proto/caffe.pb.h"

It uses the data_layers.hpp, loss_layers.hpp, and neuron_layers.hpp mentioned above, which indicates that more complex operations begin at this level.
These layers mainly act as the glue connecting vision_layers.
Nine types of common_layer are declared, some with GPU implementations:

  • InnerProductLayer
  • SplitLayer
  • FlattenLayer
  • ConcatLayer
  • SilenceLayer
  • (Elementwise Operations) These operate element by element across blobs (the per-element activation functions themselves live in neuron_layer):
    • EltwiseLayer
    • SoftmaxLayer
    • ArgMaxLayer
    • MVNLayer

InnerProductLayer

Often used as the fully connected layer. The configuration format is:

layers {
  name: "fc8"
  type: INNER_PRODUCT
  blobs_lr: 1     # learning rate multiplier for the filters
  blobs_lr: 2     # learning rate multiplier for the biases
  weight_decay: 1 # weight decay multiplier for the filters
  weight_decay: 0 # weight decay multiplier for the biases
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "fc7"
  top: "fc8"
}

SplitLayer

Used when one input must feed multiple outputs (at the blob level).
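
A minimal sketch (blob names are assumptions): the single bottom is duplicated into two tops that downstream layers can consume independently:

layers {
  name: "split"
  type: SPLIT
  bottom: "data"
  top: "data_1"
  top: "data_2"
}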

FlattenLayer

Reshapes an n * c * h * w blob into vector form, n * (c * h * w) * 1 * 1.
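
A minimal sketch (blob names are assumptions); for example, a 64 * 256 * 6 * 6 bottom comes out as 64 * 9216 * 1 * 1:

layers {
  name: "flatten"
  type: FLATTEN
  bottom: "pool5"
  top: "flat"
}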

ConcatLayer

Used when multiple inputs feed into one output.

layers {
  name: "concat"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  type: CONCAT
  concat_param {
    concat_dim: 1
  }
}

SilenceLayer

Takes one or more input blobs and produces no output; it is typically used to silence blobs that would otherwise be treated as unused outputs of the network.

(Elementwise Operations)

EltwiseLayer, SoftmaxLayer, ArgMaxLayer, MVNLayer.
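
As a hedged sketch (not from the original post), an EltwiseLayer that adds two blobs element-wise looks like this; operation can also be PROD or MAX:

layers {
  name: "eltwise_sum"
  type: ELTWISE
  bottom: "branch1" # the two bottoms must have identical shapes
  bottom: "branch2"
  top: "sum"
  eltwise_param {
    operation: SUM
  }
}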

vision_layer

Its header includes all of the files above, which means it contains the most complex operations.

#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/common_layers.hpp"
#include "caffe/data_layers.hpp"
#include "caffe/layer.hpp"
#include "caffe/loss_layers.hpp"
#include "caffe/neuron_layers.hpp"
#include "caffe/proto/caffe.pb.h"

It mainly implements the Convolution and Pooling operations, through the following classes:

template <typename Dtype>
class ConvolutionLayer : public Layer<Dtype>
template <typename Dtype>
class Im2colLayer : public Layer<Dtype>
template <typename Dtype>
class LRNLayer : public Layer<Dtype>
template <typename Dtype>
class PoolingLayer : public Layer<Dtype>

ConvolutionLayer

The most commonly used convolution operation. The configuration format is as follows:

layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1     # learning rate multiplier for the filters
  blobs_lr: 2     # learning rate multiplier for the biases
  weight_decay: 1 # weight decay multiplier for the filters
  weight_decay: 0 # weight decay multiplier for the biases
  convolution_param {
    num_output: 96  # learn 96 filters
    kernel_size: 11 # each filter is 11x11
    stride: 4       # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}

Im2colLayer

Similar to im2col in MATLAB, i.e. the image-to-column transformation: image patches are unrolled into columns so that convolution can then be computed conveniently as a matrix multiplication.

LRNLayer

Short for local response normalization layer; it is described in detail in the AlexNet paper, ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky, Sutskever, and Hinton).
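
A minimal sketch in the same prototxt style; the hyper-parameter values below are the commonly used AlexNet-style defaults, not taken from the original post:

layers {
  name: "norm1"
  type: LRN
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5 # number of adjacent channels to normalize over
    alpha: 0.0001 # scaling parameter
    beta: 0.75    # exponent
  }
}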

PoolingLayer

The Pooling operation. Format:

layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3 # pool over a 3x3 region
    stride: 2      # step two pixels (in the bottom blob) between pooling regions
  }
}
