Implementing Batch Normalization in TensorFlow

Posted by marsjhao on 2020-04-06

Original URL: https://blog.csdn.net/marsjhao/article/details/72876460

I. The BN (Batch Normalization) Algorithm

1. Why normalizing the data matters

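At its core, training a neural network means learning the distribution of the data. When the training data and the test data follow different distributions, the model's ability to generalize drops sharply. Likewise, if each training batch follows a different distribution, the network's updates fluctuate considerably from one iteration to the next, which makes convergence harder and slower. In a deep network, small changes in the first few layers are accumulated and amplified by the later ones, so shifts in the data distribution are amplified as well and slow training down even further.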

2. What makes BN powerful

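1) To speed up gradient descent, we can use techniques such as an exponentially decaying learning rate: learn quickly at first, then move slowly into the region of the global optimum. With BN, we can simply pick a fairly large initial learning rate together with a fast decay rate, which greatly speeds up training. Even with a small learning rate, convergence is faster than without BN. In short, BN makes training converge quickly.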

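2) BN improves the network's ability to generalize. With BN, the dropout and L2 regularization that were added to counter overfitting can often be removed, or a smaller L2 regularization coefficient can be used.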

3) BN is itself a normalization layer, so the Local Response Normalization (LRN) layer, as used in AlexNet, is no longer needed.
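As a rough illustration of the exponentially decaying learning rate mentioned in point 1), the sketch below uses the common schedule lr = initial_lr * decay_rate ^ (step / decay_steps). The helper name and all the numbers are assumptions chosen for illustration; they do not come from the original post.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate):
    """Hypothetical helper: lr = initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)

# Illustrative values only: start at 0.1 and halve the rate every 100 steps.
schedule = [exponential_decay(0.1, s, decay_steps=100, decay_rate=0.5)
            for s in (0, 100, 200)]
print(schedule)
```

A larger initial_lr combined with a smaller decay_rate corresponds to the "large learning rate, fast decay" setting that the text says BN makes practical.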

3. Overview of the BN algorithm

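BN adds a transform-and-reconstruct step with two learnable parameters, γ and β, and this is the key to the algorithm:

$$y^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)}$$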

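With these two parameters, the network can learn to recover the feature distribution that the original network would have learned. The forward pass of a BN layer over a mini-batch $B = \{x_1, \dots, x_m\}$ is:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$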

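Here m is the batch size. Every operation in Batch Normalization is smooth and differentiable, so backpropagation works and can learn the parameters γ and β. Note that Batch Normalization behaves differently during training and testing. During training, μ_B and σ_B are computed from the current batch. During testing, μ_B and σ_B should be the averages saved during training (or similarly processed values) rather than statistics computed from the current batch.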
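The difference between training and testing can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the TensorFlow implementation: the helper name, the `running` dictionary, the `momentum` value, and the sample data are all hypothetical.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, running, training, eps=1e-3, momentum=0.9):
    """Hypothetical helper. x has shape (m, features); `running` is a dict
    holding the stored 'mean' and 'var' that are used at test time."""
    if training:
        mu = x.mean(axis=0)    # mu_B from the current batch
        var = x.var(axis=0)    # sigma_B^2 from the current batch
        # Keep a moving average of the batch statistics for use at test time.
        running["mean"] = momentum * running["mean"] + (1 - momentum) * mu
        running["var"] = momentum * running["var"] + (1 - momentum) * var
    else:
        # Stored statistics, not those of the batch being evaluated.
        mu, var = running["mean"], running["var"]
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize
    return gamma * x_hat + beta             # scale and shift

rng = np.random.RandomState(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
running = {"mean": np.zeros(4), "var": np.ones(4)}
gamma, beta = np.ones(4), np.zeros(4)

train_out = batch_norm_forward(batch, gamma, beta, running, training=True)
test_out = batch_norm_forward(batch[:1], gamma, beta, running, training=False)
print(train_out.mean(axis=0))   # close to 0 for every feature
print(test_out.shape)
```

In test mode the function works even for a single sample, because it never needs batch statistics.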

II. Relevant TensorFlow Functions

1.tf.nn.moments(x, axes, shift=None, name=None, keep_dims=False)

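x is the input tensor, and axes lists the dimensions to reduce over, that is, the dimensions you want to normalize across. [0] is the batch dimension. For image data you can pass [0, 1, 2], which computes the mean and variance over [batch, height, width]; do not include the channel dimension. The function returns two tensors: the mean and the variance.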
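To make the choice of axes concrete, here is a NumPy sketch that mirrors the reductions described above with np.mean and np.var (population variance). It is a stand-in for illustration rather than a call to tf.nn.moments, and the tensor shapes are assumed.

```python
import numpy as np

rng = np.random.RandomState(0)

# Fully connected activations: [batch, features] -> reduce over axis 0.
dense = rng.normal(size=(32, 30))
dense_mean = dense.mean(axis=0)
dense_var = dense.var(axis=0)
print(dense_mean.shape, dense_var.shape)   # (30,) (30,)

# Image activations: [batch, height, width, channels] -> reduce over (0, 1, 2),
# which leaves one mean and one variance per channel.
images = rng.normal(size=(8, 28, 28, 3))
img_mean = images.mean(axis=(0, 1, 2))
img_var = images.var(axis=(0, 1, 2))
print(img_mean.shape, img_var.shape)       # (3,) (3,)
```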

2.tf.identity(input, name=None)

Returns a tensor with the same shape and contents as the input tensor input.

3.tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon,name=None)

The formula is scale * (x - mean) / sqrt(variance + variance_epsilon) + offset.

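Among these arguments, mean and variance can be obtained from tf.nn.moments, while offset and scale are trainable. offset is usually initialized to 0 and scale to 1, and both have the same shape as mean. variance_epsilon is set to a small value such as 0.001.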
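The computation that tf.nn.batch_normalization performs, scale * (x - mean) / sqrt(variance + variance_epsilon) + offset, can be checked by hand with a small NumPy stand-in. The function below is not the TensorFlow function itself, and the input values are made up for illustration.

```python
import numpy as np

def batch_normalization(x, mean, variance, offset, scale, variance_epsilon):
    """NumPy stand-in (not the TensorFlow function) for the same formula."""
    return scale * (x - mean) / np.sqrt(variance + variance_epsilon) + offset

x = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [5.0, 50.0]])
mean, variance = x.mean(axis=0), x.var(axis=0)
offset = np.zeros(2)   # usual initial value of offset
scale = np.ones(2)     # usual initial value of scale

y = batch_normalization(x, mean, variance, offset, scale, variance_epsilon=0.001)
print(y)
```

Both columns end up on the same scale even though the second is ten times larger than the first, and the middle row, which equals the column means, maps to 0.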

III. TensorFlow Implementation

1. Full code

import tensorflow as tf  # written against the TensorFlow 1.x API (tf.placeholder, tf.Session)
import numpy as np
import matplotlib.pyplot as plt

ACTIVITION = tf.nn.relu
N_LAYERS = 7 # 7 hidden layers in total
N_HIDDEN_UNITS = 30 # 30 neurons per layer

def fix_seed(seed=1): # set the random seeds
    np.random.seed(seed)
    tf.set_random_seed(seed)

def plot_his(inputs, inputs_norm): # plot histograms of each layer's inputs
    for j, all_inputs in enumerate([inputs, inputs_norm]):
        for i, input in enumerate(all_inputs):
            plt.subplot(2, len(all_inputs), j*len(all_inputs)+(i+1))
            plt.cla()
            if i == 0:
                the_range = (-7, 10)
            else:
                the_range = (-1, 1)
            plt.hist(input.ravel(), bins=15, range=the_range, color='#FF5733')
            plt.yticks(())
            if j == 1:
                plt.xticks(the_range)
            else:
                plt.xticks(())
            ax = plt.gca()
            ax.spines['right'].set_color('none')
            ax.spines['top'].set_color('none')
        plt.title("%s normalizing" % ("Without" if j == 0 else "With"))
    plt.draw()
    plt.pause(0.01)

def built_net(xs, ys, norm): # build the network
    # add one layer
    def add_layer(inputs, in_size, out_size, activation_function=None, norm=False):
        Weights = tf.Variable(tf.random_normal([in_size, out_size],
                                               mean=0.0, stddev=1.0))
        biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
        Wx_plus_b = tf.matmul(inputs, Weights) + biases

        if norm: # apply Batch Normalization in this layer
            # compute mean and variance; axes=[0] is the batch dimension
            fc_mean, fc_var = tf.nn.moments(Wx_plus_b, axes=[0])
            scale = tf.Variable(tf.ones([out_size]))
            shift = tf.Variable(tf.zeros([out_size]))
            epsilon = 0.001

            # moving-average object for the batch statistics
            ema = tf.train.ExponentialMovingAverage(decay=0.5)

            def mean_var_with_update():
                ema_apply_op = ema.apply([fc_mean, fc_var])
                with tf.control_dependencies([ema_apply_op]):
                    return tf.identity(fc_mean), tf.identity(fc_var)

            mean, var = mean_var_with_update()

            Wx_plus_b = tf.nn.batch_normalization(Wx_plus_b, mean, var,
                                                  shift, scale, epsilon)

        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)
        return outputs

    fix_seed(1)

    if norm: # apply BN to the input of the first layer
        fc_mean, fc_var = tf.nn.moments(xs, axes=[0])
        scale = tf.Variable(tf.ones([1]))
        shift = tf.Variable(tf.zeros([1]))
        epsilon = 0.001

        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            ema_apply_op = ema.apply([fc_mean, fc_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(fc_mean), tf.identity(fc_var)

        mean, var = mean_var_with_update()
        xs = tf.nn.batch_normalization(xs, mean, var, shift, scale, epsilon)

    layers_inputs = [xs] # record the input of every layer

    for l_n in range(N_LAYERS): # add the 7 hidden layers one by one
        layer_input = layers_inputs[l_n]
        in_size = layers_inputs[l_n].get_shape()[1].value

        output = add_layer(layer_input, in_size, N_HIDDEN_UNITS, ACTIVITION, norm)
        layers_inputs.append(output)

    prediction = add_layer(layers_inputs[-1], 30, 1, activation_function=None)
    cost = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction),
                                        reduction_indices=[1]))

    train_op = tf.train.GradientDescentOptimizer(0.001).minimize(cost)
    return [train_op, cost, layers_inputs]

fix_seed(1)
x_data = np.linspace(-7, 10, 2500)[:, np.newaxis]
np.random.shuffle(x_data)
noise = np.random.normal(0, 8, x_data.shape)
y_data = np.square(x_data) - 5 + noise

plt.scatter(x_data, y_data)
plt.show()

xs = tf.placeholder(tf.float32, [None, 1])
ys = tf.placeholder(tf.float32, [None, 1])

train_op, cost, layers_inputs = built_net(xs, ys, norm=False)
train_op_norm, cost_norm, layers_inputs_norm = built_net(xs, ys, norm=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    cost_his = []
    cost_his_norm = []
    record_step = 5

    plt.ion()
    plt.figure(figsize=(7, 3))
    for i in range(250):
        if i % 50 == 0:
            all_inputs, all_inputs_norm = sess.run([layers_inputs, layers_inputs_norm],
                                                   feed_dict={xs: x_data, ys: y_data})
            plot_his(all_inputs, all_inputs_norm)

        sess.run([train_op, train_op_norm],
                 feed_dict={xs: x_data[i*10:i*10+10], ys: y_data[i*10:i*10+10]})

        if i % record_step == 0:
            cost_his.append(sess.run(cost, feed_dict={xs: x_data, ys: y_data}))
            cost_his_norm.append(sess.run(cost_norm,
                                          feed_dict={xs: x_data, ys: y_data}))

    plt.ioff()
    plt.figure()
    plt.plot(np.arange(len(cost_his))*record_step,
             np.array(cost_his), label='Without BN')     # no norm
    plt.plot(np.arange(len(cost_his_norm))*record_step,
             np.array(cost_his_norm), label='With BN')   # norm
    plt.legend()
    plt.show()
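The listing above keeps moving averages of each layer's mean and variance with tf.train.ExponentialMovingAverage(decay=0.5), but it always normalizes with the statistics of whatever batch it is fed. The sketch below shows, in plain NumPy, the update rule such an average follows: shadow = decay * shadow + (1 - decay) * value. The class name, the zero initialization, and the sample numbers are assumptions for illustration only.

```python
import numpy as np

class MovingAverage:
    """Hypothetical stand-in for an exponential moving average of one tensor."""
    def __init__(self, shape, decay=0.5):
        self.decay = decay
        self.shadow = np.zeros(shape)   # assumption: start from zero

    def update(self, value):
        # shadow = decay * shadow + (1 - decay) * value
        self.shadow = self.decay * self.shadow + (1 - self.decay) * value
        return self.shadow

ema_mean = MovingAverage(shape=(1,), decay=0.5)
for batch_mean in ([4.0], [4.0], [4.0]):   # three training batches with mean 4
    ema_mean.update(np.array(batch_mean))
print(ema_mean.shadow)   # [3.5]
```

At test time, a stored value such as ema_mean.shadow (together with the matching variance average) would take the place of the batch statistics passed as mean and var.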

2. Experimental results

[Figure: distribution of the input data]

[Figure: comparison of training with and without BN]
