[Code-Oriented] Learning Deep Learning (3): Convolutional Neural Networks (CNN)

Posted by Dark_Scope on 2013-07-26

==========================================================================================

Lately I've been reading a lot about Deep Learning: all kinds of blog posts and papers.

But to be honest, I've been neglecting the implementation side. For one thing my computer isn't very good, and for another I'm not yet able to write a toolbox of my own.

So far I've only followed Andrew Ng's UFLDL tutorial and filled in code within the provided skeletons (that code is on github).

Later I found a MATLAB Deep Learning toolbox whose code is quite simple, which makes it well suited for learning the algorithms.

Another advantage is that a MATLAB implementation omits a lot of data-structure code, so the flow of the algorithm stays very clear.

So I want to walk through this toolbox's code to consolidate what I've learned, and to lay a foundation for hands-on work later.

(This post only explains the algorithm from the code's perspective; for the theoretical steps you still need to read the papers.

I'll mention some relevant paper titles along the way. The goal is to lay out the algorithm's flow, not to dig into its theory and formulas.)

==========================================================================================

Code used: DeepLearnToolbox. Thanks to the author of the toolbox.

==========================================================================================

Today's topic is the CNN. CNNs are a bit fiddly to explain, so you may want to read up on convolution and pooling (subsampling) first, as well as tornadomeet's blog post.

Below is that classic architecture figure (not reproduced here):

======================================================================================================

Open \tests\test_example_CNN.m and take a look:


cnn.layers = {
    struct('type', 'i') %input layer
    struct('type', 'c', 'outputmaps', 6, 'kernelsize', 5) %convolution layer
    struct('type', 's', 'scale', 2) %sub sampling layer
    struct('type', 'c', 'outputmaps', 12, 'kernelsize', 5) %convolution layer
    struct('type', 's', 'scale', 2) %subsampling layer
};
cnn = cnnsetup(cnn, train_x, train_y);        % <-- here!!!
opts.alpha = 1;
opts.batchsize = 50;
opts.numepochs = 1;
cnn = cnntrain(cnn, train_x, train_y, opts);  % <-- here!!!
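
For reference, before this snippet the test script loads MNIST; roughly (from memory, so treat the exact lines as an assumption) train_x ends up as a 28x28xN array of images scaled to [0,1] and train_y as a 10xN one-hot label matrix:

% Rough sketch of the data preparation in test_example_CNN.m (assumption, not verbatim)
load mnist_uint8;
train_x = double(reshape(train_x', 28, 28, 60000)) / 255;  % 28x28xN images in [0,1]
train_y = double(train_y');                                % 10xN one-hot labels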

This time things look a bit more complicated. First come the layers, of which there are three types: 'i' is input, 'c' is convolution, 's' is subsampling.

'c''s outputmaps is how many feature maps come out of the convolution; for example, in the classic figure above, the first convolution layer produces six feature maps.

'c''s kernelsize is simply the size of the patch used for the convolution.

's''s scale means the pooling operates over scale*scale regions.
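
With this configuration the feature-map sizes work out as follows (a small sketch of my own, assuming the 28x28 MNIST images used by the test script):

mapsize = [28 28];            % 'i'  input image
mapsize = mapsize - 5 + 1;    % 'c'  kernelsize 5  -> 24x24, 6 maps
mapsize = mapsize / 2;        % 's'  scale 2       -> 12x12
mapsize = mapsize - 5 + 1;    % 'c'  kernelsize 5  -> 8x8, 12 maps
mapsize = mapsize / 2;        % 's'  scale 2       -> 4x4
fvnum   = prod(mapsize) * 12; % 4*4*12 = 192 features feed the output layer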

After that it's the usual routine: cnnsetup() and cnntrain(). Let's look at the code.

\CNN\cnnsetup.m

This function is mostly about what the parameters do; see the comments in the code for details.
function net = cnnsetup(net, x, y)
    inputmaps = 1;
    mapsize = size(squeeze(x(:, :, 1)));
    % pay particular attention to how the loop variables below are set up
    % numel(net.layers) is the number of layers
    for l = 1 : numel(net.layers)   %  layer
        if strcmp(net.layers{l}.type, 's')
            mapsize = mapsize / net.layers{l}.scale;
            % mapsize of a subsampling layer: dividing by scale gives the post-pooling map size
            % (in this example the 24*24 maps from the first convolution become 12*12;
            %  in the classic figure the corresponding numbers are 32*32 -> 28*28 -> 14*14)
            assert(all(floor(mapsize)==mapsize), ['Layer ' num2str(l) ' size must be integer. Actual: ' num2str(mapsize)]);
            for j = 1 : inputmaps   % inputmaps is how many feature maps the previous layer has; it starts at 1 and is updated layer by layer
                net.layers{l}.b{j} = 0;
            end
        end
        if strcmp(net.layers{l}.type, 'c')
            mapsize = mapsize - net.layers{l}.kernelsize + 1;
            % for this mapsize, see the explanation under the corresponding figure in UFLDL
            fan_out = net.layers{l}.outputmaps * net.layers{l}.kernelsize ^ 2;
            % fan-out: (number of output feature maps) * (size of the convolution patch)
            for j = 1 : net.layers{l}.outputmaps  %  output map
                fan_in = inputmaps * net.layers{l}.kernelsize ^ 2;
                % fan-in: for each output feature map, how many parameters link it to the previous layer
                for i = 1 : inputmaps  %  input map
                    net.layers{l}.k{i}{j} = (rand(net.layers{l}.kernelsize) - 0.5) * 2 * sqrt(6 / (fan_in + fan_out));
                end
                net.layers{l}.b{j} = 0;
            end
            inputmaps = net.layers{l}.outputmaps;
        end
    end
    % 'onum' is the number of labels, that's why it is calculated using size(y, 1). If you have 20 labels so the output of the network will be 20 neurons.
    % 'fvnum' is the number of output neurons at the last layer, the layer just before the output layer.
    % 'ffb' is the biases of the output neurons.
    % 'ffW' is the weights between the last layer and the output neurons. Note that the last layer is fully connected to the output layer, that's why the size of the weights is (onum * fvnum)
    fvnum = prod(mapsize) * inputmaps;
    onum = size(y, 1);
    % setup of the final (fully connected output) layer
    net.ffb = zeros(onum, 1);
    net.ffW = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum + fvnum));
end
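
The kernel weights are drawn uniformly from [-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))], a Glorot-style normalized initialization. As a quick sanity check (my own arithmetic, not toolbox code), for the first convolution layer of the example:

% First 'c' layer: 1 input map, 6 output maps, 5x5 kernels
fan_in  = 1 * 5 ^ 2;                      % 25
fan_out = 6 * 5 ^ 2;                      % 150
bound   = sqrt(6 / (fan_in + fan_out));   % about 0.185
k       = (rand(5) - 0.5) * 2 * bound;    % one kernel, uniform in [-bound, bound]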

\CNN\cnntrain.m

cnntrain follows the same rhythm as nntrain:

            net = cnnff(net, batch_x);
            net = cnnbp(net, batch_y);
            net = cnnapplygrads(net, opts);
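
Around those three calls, cnntrain.m itself is just a mini-batch loop; roughly (a paraphrase of mine, the real toolbox code may differ in details), it does something like:

% Rough paraphrase of the training loop in cnntrain.m (not verbatim)
m = size(train_x, 3);                      % number of training images
numbatches = m / opts.batchsize;
for i = 1 : opts.numepochs
    kk = randperm(m);                      % shuffle once per epoch
    for l = 1 : numbatches
        idx = kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize);
        batch_x = train_x(:, :, idx);
        batch_y = train_y(:, idx);
        net = cnnff(net, batch_x);         % forward pass
        net = cnnbp(net, batch_y);         % backprop: compute gradients
        net = cnnapplygrads(net, opts);    % gradient descent step
    end
end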

cnntrain uses back propagation to compute the gradients. Let's look at these three functions one by one:

cnnff.m

This part of the computation is fairly simple and easy to follow; it helps to read the step-by-step description in tornadomeet's blog post, which explains it clearly.

function net = cnnff(net, x)
    n = numel(net.layers);
    net.layers{1}.a{1} = x;
    inputmaps = 1;

    for l = 2 : n   %  for each layer
        if strcmp(net.layers{l}.type, 'c')
            %  !!below can probably be handled by insane matrix operations
            for j = 1 : net.layers{l}.outputmaps   %  for each output map
                %  create temp output map
                z = zeros(size(net.layers{l - 1}.a{1}) - [net.layers{l}.kernelsize - 1 net.layers{l}.kernelsize - 1 0]);
                for i = 1 : inputmaps   %  for each input map
                    %  convolve with corresponding kernel and add to temp output map
                    %  (see UFLDL: convolve each input feature map and sum the results)
                    z = z + convn(net.layers{l - 1}.a{i}, net.layers{l}.k{i}{j}, 'valid');
                end
                %  add bias, pass through nonlinearity
                net.layers{l}.a{j} = sigm(z + net.layers{l}.b{j});
            end
            %  set number of input maps to this layers number of outputmaps
            inputmaps = net.layers{l}.outputmaps;
        elseif strcmp(net.layers{l}.type, 's')
            %  downsample
            for j = 1 : inputmaps
                %  slightly roundabout trick: convolve with an averaging patch, then read the
                %  result out with a stride of scale; this is mean-pooling
                z = convn(net.layers{l - 1}.a{j}, ones(net.layers{l}.scale) / (net.layers{l}.scale ^ 2), 'valid');   %  !! replace with variable
                net.layers{l}.a{j} = z(1 : net.layers{l}.scale : end, 1 : net.layers{l}.scale : end, :);
            end
        end
    end
    %  concatenate all end layer feature maps into vector (convenient for the fully connected part)
    net.fv = [];
    for j = 1 : numel(net.layers{n}.a)
        sa = size(net.layers{n}.a{j});
        net.fv = [net.fv; reshape(net.layers{n}.a{j}, sa(1) * sa(2), sa(3))];
    end
    %  final layer of perceptrons: the network's output (the recognition result)
    net.o = sigm(net.ffW * net.fv + repmat(net.ffb, 1, size(net.fv, 2)));

end
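
To convince yourself that the convn-then-stride trick in the 's' branch really is mean-pooling, here is a tiny standalone check (a sketch of mine, not toolbox code):

a     = magic(4);                            % toy 4x4 feature map
scale = 2;
% toolbox-style: convolve with an averaging patch, then read out with stride = scale
z = convn(a, ones(scale) / scale ^ 2, 'valid');
pooled_conv = z(1 : scale : end, 1 : scale : end);
% direct 2x2 block means, for comparison
pooled_mean = [mean(mean(a(1:2, 1:2))) mean(mean(a(1:2, 3:4)));
               mean(mean(a(3:4, 1:2))) mean(mean(a(3:4, 3:4)))];
max(abs(pooled_conv(:) - pooled_mean(:)))    % prints ~0: the two agree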


cnnbp.m

This one made me want to cry: the code is a bit tangled, and I had to go back to the literature. "Notes on Convolutional Neural Networks" is the better reference here.

One difference between this toolbox and "Notes on Convolutional Neural Networks" is that the toolbox applies no sigmoid activation in the subsampling (i.e. pooling) layers; it simply pools. So read this part carefully: in this toolbox the subsampling layers carry no trainable parameters, so no gradients are computed for them, whereas in the Notes they are.

Also, this toolbox does not implement Combinations of Feature Maps, i.e. the connection table shown in tornadomeet's blog post (table not reproduced here).

For the details, see the paper mentioned above.


Now let's look at the code:

function net = cnnbp(net, y)
    n = numel(net.layers);
    %  error
    net.e = net.o - y;
    %  loss function
    net.L = 1/2* sum(net.e(:) .^ 2) / size(net.e, 2);
    %  backprop deltas: work back from the error of the last layer,
    %  much like backprop in an ordinary neural network
    net.od = net.e .* (net.o .* (1 - net.o));   %  output delta
    net.fvd = (net.ffW' * net.od);              %  feature vector delta
    if strcmp(net.layers{n}.type, 'c')          %  only conv layers have a sigm function
        net.fvd = net.fvd .* (net.fv .* (1 - net.fv));
    end

    %  reshape feature vector deltas into output map style
    sa = size(net.layers{n}.a{1});
    fvnum = sa(1) * sa(2);
    for j = 1 : numel(net.layers{n}.a)
        net.layers{n}.d{j} = reshape(net.fvd(((j - 1) * fvnum + 1) : j * fvnum, :), sa(1), sa(2), sa(3));
    end
    %  compute the deltas layer by layer; see Notes on Convolutional Neural Networks for the derivation
    %  (the transformations are somewhat involved). Note the difference from that paper: this toolbox
    %  applies no sigmoid in the subsampling (pooling) layers, so they carry no trainable parameters
    %  and no gradients are computed for them, whereas in the Notes they are.
    for l = (n - 1) : -1 : 1
        if strcmp(net.layers{l}.type, 'c')
            for j = 1 : numel(net.layers{l}.a)
                net.layers{l}.d{j} = net.layers{l}.a{j} .* (1 - net.layers{l}.a{j}) .* (expand(net.layers{l + 1}.d{j}, [net.layers{l + 1}.scale net.layers{l + 1}.scale 1]) / net.layers{l + 1}.scale ^ 2);
            end
        elseif strcmp(net.layers{l}.type, 's')
            for i = 1 : numel(net.layers{l}.a)
                z = zeros(size(net.layers{l}.a{1}));
                for j = 1 : numel(net.layers{l + 1}.a)
                     z = z + convn(net.layers{l + 1}.d{j}, rot180(net.layers{l + 1}.k{i}{j}), 'full');
                end
                net.layers{l}.d{i} = z;
            end
        end
    end
    %  calc gradients (see the paper); only the 'c' layers get gradients, since only they have parameters
    for l = 2 : n
        if strcmp(net.layers{l}.type, 'c')
            for j = 1 : numel(net.layers{l}.a)
                for i = 1 : numel(net.layers{l - 1}.a)
                    net.layers{l}.dk{i}{j} = convn(flipall(net.layers{l - 1}.a{i}), net.layers{l}.d{j}, 'valid') / size(net.layers{l}.d{j}, 3);
                end
                net.layers{l}.db{j} = sum(net.layers{l}.d{j}(:)) / size(net.layers{l}.d{j}, 3);
            end
        end
    end
    %  gradients of the final perceptron (fully connected) layer
    net.dffW = net.od * (net.fv)' / size(net.od, 2);
    net.dffb = mean(net.od, 2);

    function X = rot180(X)
        X = flipdim(flipdim(X, 1), 2);
    end
end
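
The trickiest line above is the delta for a 'c' layer followed by an 's' layer: the pooled delta must be upsampled back to the convolution layer's resolution, with each value spread evenly over its pooling window and divided by scale^2 (the derivative of mean-pooling). The toolbox uses its util function expand for this; for a single 2-D map the same operation can be written with kron, as in this small sketch of mine (not toolbox code):

d_pool = [1 2; 3 4];                           % delta of a 2x2 pooled map
scale  = 2;
d_up   = kron(d_pool, ones(scale)) / scale^2;  % 4x4: each delta copied over its window, divided by 4
disp(d_up)
% In cnnbp this upsampled delta is then multiplied elementwise by
% a .* (1 - a), the sigmoid derivative of the convolution layer's activations.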

cnnapplygrads.m

This part is easy: the gradients are already computed, so we just apply the updates one by one. It is plain gradient descent with learning rate opts.alpha, with no momentum or weight decay.

function net = cnnapplygrads(net, opts)
    for l = 2 : numel(net.layers)
        if strcmp(net.layers{l}.type, 'c')
            for j = 1 : numel(net.layers{l}.a)
                for ii = 1 : numel(net.layers{l - 1}.a)
                    net.layers{l}.k{ii}{j} = net.layers{l}.k{ii}{j} - opts.alpha * net.layers{l}.dk{ii}{j};
                end
                net.layers{l}.b{j} = net.layers{l}.b{j} - opts.alpha * net.layers{l}.db{j};
            end
        end
    end

    net.ffW = net.ffW - opts.alpha * net.dffW;
    net.ffb = net.ffb - opts.alpha * net.dffb;
end

cnntest.m

Finally, we need to see how the results are obtained:
          
function [er, bad] = cnntest(net, x, y)
    %  feedforward
    net = cnnff(net, x);
    [~, h] = max(net.o);   %  predicted label = index of the largest output neuron
    [~, a] = max(y);       %  true label from the one-hot targets
    bad = find(h ~= a);    %  indices of misclassified samples

    er = numel(bad) / size(y, 2);
end
And that's it: after one pass of cnnff, net.o holds the outputs.
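
In the test script the trained network is then evaluated on the held-out set, along these lines:

[er, bad] = cnntest(cnn, test_x, test_y);         % er = error rate, bad = misclassified indices
fprintf('test error rate: %.2f%%\n', er * 100);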

Summary

Just code!

This is a model from 1989! Recently it has also been combined with RBMs and produced the best result on ImageNet (I think this is the one?):

Alex Krizhevsky. ImageNet Classification with Deep Convolutional Neural Networks. Video and Slides, 2012.
http://www.cs.utoronto.ca/~rsalakhu/papers/dbm.pdf

References:

    Deep learning:三十八(Stacked CNN簡單介紹) (tornadomeet's blog post)
    UFLDL
    Notes on Convolutional Neural Networks
    Convolutional Neural Networks (LeNet) (the Theano deep learning tutorial)
