[PyTorch Study Notes] 6.2 Normalization

Posted by 張賢同學 on 2020-09-10

Original URL: https://www.cnblogs.com/zhangxiann/p/13648991.html

Code for this chapter:

https://github.com/zhangxiann/PyTorch_Practice/blob/master/lesson6/bn_and_initialize.py

https://github.com/zhangxiann/PyTorch_Practice/blob/master/lesson6/bn_in_123_dim.py

https://github.com/zhangxiann/PyTorch_Practice/blob/master/lesson6/normallization_layers.py

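This article introduces the concept of Batch Normalization and its 1d/2d/3d implementations in PyTorch, followed by Layer Normalization, Instance Normalization, and Group Normalization.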

Batch Normalization

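"Batch" refers to a batch of data, usually a mini-batch. "Normalization" means that the processed data has mean 0 and variance 1, matching the first two moments of $N(0,1)$.

Batch Normalization has the following advantages:

It allows a larger learning rate, which speeds up convergence
It removes the need for carefully designed weight initialization
It allows dropping dropout or using a smaller dropout rate
It allows dropping L2 regularization or using a smaller weight decay
It removes the need for LRN (local response normalization)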

Suppose the input mini-batch is $\mathcal{B}=\left\{x_{1 \dots m}\right\}$ and the learnable parameters of Batch Normalization are $\gamma, \beta$. The steps are:

Compute the mini-batch mean: $\mu_{\mathcal{B}} \leftarrow \frac{1}{m} \sum_{i=1}^{m} x_{i}$
Compute the mini-batch variance: $\sigma_{\mathcal{B}}^{2} \leftarrow \frac{1}{m} \sum_{i=1}^{m}\left(x_{i}-\mu_{\mathcal{B}}\right)^{2}$
Normalize: $\widehat{x}_{i} \leftarrow \frac{x_{i}-\mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2}+\epsilon}}$, where $\epsilon$ is a small constant that prevents the denominator from being 0
Affine transform (scale and shift): $y_{i} \leftarrow \gamma \widehat{x}_{i}+\beta \equiv \mathrm{BN}_{\gamma, \beta}\left(x_{i}\right)$. This step increases the capacity of the model: it lets the model decide whether to normalize the data and to what degree. If $\gamma=\sqrt{\sigma_{\mathcal{B}}^{2}+\epsilon}$ and $\beta=\mu_{\mathcal{B}}$, the layer becomes an identity mapping.
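
The four steps can be sketched directly on tensors and compared with nn.BatchNorm1d. This is a minimal sketch, not PyTorch's internal implementation: the function name batch_norm_manual, the input sizes, and the seed are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def batch_norm_manual(x, gamma, beta, eps=1e-5):
    """Apply the four Batch Normalization steps to a (batch, features) tensor."""
    mu = x.mean(dim=0)                        # step 1: mini-batch mean
    var = x.var(dim=0, unbiased=False)        # step 2: mini-batch variance (divide by m)
    x_hat = (x - mu) / torch.sqrt(var + eps)  # step 3: normalize
    return gamma * x_hat + beta               # step 4: scale and shift


x = torch.randn(16, 4) * 3.0 + 5.0            # data with mean 5 and std 3
y_manual = batch_norm_manual(x, torch.ones(4), torch.zeros(4))

bn = nn.BatchNorm1d(4)                        # training mode, gamma=1 and beta=0 at init
y_torch = bn(x)
print(torch.allclose(y_manual, y_torch, atol=1e-5))

# Choosing gamma = sqrt(var + eps) and beta = mu undoes the normalization.
mu, var = x.mean(dim=0), x.var(dim=0, unbiased=False)
y_identity = batch_norm_manual(x, torch.sqrt(var + 1e-5), mu)
print(torch.allclose(y_identity, x, atol=1e-5))
```

The last two lines check the identity-mapping case: with that choice of $\gamma$ and $\beta$ the layer returns its input unchanged.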

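Batch Normalization was proposed mainly to address Internal Covariate Shift (ICS). During training, data passes through many layers. If its scale changes during forward propagation, gradients may explode or vanish, making the model hard to converge.

The Batch Normalization layer is usually placed right before the activation function.

The code below prints the standard deviation of each layer's output. When the weights are not explicitly initialized, the scale of the data shrinks layer by layer.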

import torch
import numpy as np
import torch.nn as nn
from common_tools import set_seed

set_seed(1)  # set the random seed


class MLP(nn.Module):
    def __init__(self, neural_num, layers=100):
        super(MLP, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(neural_num, neural_num, bias=False) for i in range(layers)])
        self.bns = nn.ModuleList([nn.BatchNorm1d(neural_num) for i in range(layers)])
        self.neural_num = neural_num

    def forward(self, x):

        for (i, linear), bn in zip(enumerate(self.linears), self.bns):
            x = linear(x)
            # x = bn(x)
            x = torch.relu(x)

            if torch.isnan(x.std()):
                print("output is nan in {} layers".format(i))
                break

            print("layers:{}, std:{}".format(i, x.std().item()))

        return x

    def initialize(self):
        for m in self.modules():
            if isinstance(m, nn.Linear):

                # method 1
                # nn.init.normal_(m.weight.data, std=1)    # normal: mean=0, std=1

                # method 2 kaiming
                nn.init.kaiming_normal_(m.weight.data)


neural_nums = 256
layer_nums = 100
batch_size = 16

net = MLP(neural_nums, layer_nums)
# net.initialize()

inputs = torch.randn((batch_size, neural_nums))  # normal: mean=0, std=1

output = net(inputs)
print(output)

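After initializing with nn.init.kaiming_normal_(), the standard deviation of the data stays within [0.6, 0.9].

If we skip weight initialization and instead place a bn layer before every activation function, the standard deviation stays within [0.58, 0.59]. This is why Batch Normalization removes the need for carefully designed weight initialization.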
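
The comparison can be reproduced in a compact form without the helper module. The sketch below is an assumption-laden variant of the experiment: the function name final_std and the depth of 50 layers are choices made here, and the layers are created on the fly instead of being stored in a module.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)


def final_std(use_bn, width=256, depth=50, batch_size=16):
    """Return the std of the last layer's output of a deep ReLU MLP."""
    x = torch.randn(batch_size, width)
    with torch.no_grad():
        for _ in range(depth):
            x = nn.Linear(width, width, bias=False)(x)  # default PyTorch initialization
            if use_bn:
                x = nn.BatchNorm1d(width)(x)            # fresh layer in training mode
            x = torch.relu(x)
    return x.std().item()


std_plain = final_std(use_bn=False)
std_bn = final_std(use_bn=True)
print("without bn: {:.3e}, with bn: {:.3f}".format(std_plain, std_bn))
```

Without bn the standard deviation collapses toward 0, while with bn it stays at a stable scale regardless of depth.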

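Next, taking the LeNet from the RMB (Chinese banknote) binary classification experiment as an example, we add bn layers and compare the training process of the network with and without them.

For the network without bn layers, with Kaiming weight initialization, the training process is as follows:

The training loss spikes to 1.4 partway through training, which is not stable enough.

The LeNet with bn layers is defined as follows: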

import torch.nn as nn
import torch.nn.functional as F


class LeNet_bn(nn.Module):
    def __init__(self, classes):
        super(LeNet_bn, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.bn1 = nn.BatchNorm2d(num_features=6)

        self.conv2 = nn.Conv2d(6, 16, 5)
        self.bn2 = nn.BatchNorm2d(num_features=16)

        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.bn3 = nn.BatchNorm1d(num_features=120)

        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, classes)

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)

        out = F.max_pool2d(out, 2)

        out = self.conv2(out)
        out = self.bn2(out)
        out = F.relu(out)

        out = F.max_pool2d(out, 2)

        out = out.view(out.size(0), -1)

        out = self.fc1(out)
        out = self.bn3(out)
        out = F.relu(out)

        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out

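For the network with bn layers, without Kaiming weight initialization, the training process is as follows:

The training loss also spikes during training, but it only rises to 0.4, so training is much more stable.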

Batch Normalization in PyTorch

PyTorch provides 3 Batch Normalization classes:

nn.BatchNorm1d(): the input shape is $B \times C \times 1D\_feature$
nn.BatchNorm2d(): the input shape is $B \times C \times 2D\_feature$
nn.BatchNorm3d(): the input shape is $B \times C \times 3D\_feature$
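
As a quick check of these shapes, the sketch below passes one random tensor through each class. The batch size, channel count, and spatial sizes are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

B, C = 4, 3
cases = {
    "BatchNorm1d": (nn.BatchNorm1d(C), torch.randn(B, C, 10)),       # B x C x L
    "BatchNorm2d": (nn.BatchNorm2d(C), torch.randn(B, C, 8, 8)),     # B x C x H x W
    "BatchNorm3d": (nn.BatchNorm3d(C), torch.randn(B, C, 4, 8, 8)),  # B x C x D x H x W
}

shapes = {}
for name, (layer, x) in cases.items():
    y = layer(x)
    # The output keeps the input shape; the statistics always have C entries.
    shapes[name] = (tuple(y.shape), tuple(layer.running_mean.shape))
    print(name, shapes[name])
```

In all three cases the layer keeps one mean and one variance per channel, whatever the size of the remaining dimensions.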

Taking nn.BatchNorm1d() as an example, the signature is:

torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

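Parameters:

num_features: the number of features (channels) per sample; this is the most important parameter
eps: a correction term added to the denominator during normalization
momentum: the momentum used for the exponentially weighted average that estimates the running mean and variance
affine: whether to apply the affine transform; defaults to True
track_running_stats: whether to track the running mean and variance. When True (the default), the running statistics are updated from each mini-batch in training mode and are held fixed and used for normalization in evaluation mode. When False, no running statistics are kept and the statistics of the current batch are always used. Training versus evaluation mode is switched with train() and eval(), not with this argument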
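
The effect of the affine and track_running_stats flags can be inspected on the layer's attributes. The following is a small sketch with arbitrary sizes; the variable names bn_default and bn_plain are chosen here for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn_default = nn.BatchNorm1d(5)
bn_plain = nn.BatchNorm1d(5, affine=False, track_running_stats=False)

# With the defaults, gamma/beta and the running statistics all have 5 entries.
print(bn_default.weight.shape, bn_default.running_mean.shape)

# With both flags off, the layer has no parameters and no buffers.
print(bn_plain.weight, bn_plain.running_mean)  # None None

# Without running statistics, even eval mode normalizes with the current batch.
x = torch.randn(8, 5) * 2.0 + 3.0
bn_plain.eval()
y = bn_plain(x)
print(y.mean(dim=0))
```

The per-feature means of y are close to 0 even in evaluation mode, because there are no stored statistics to fall back on.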

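Main attributes:

running_mean: the mean
running_var: the variance
weight: $\gamma$ in the affine transform
bias: $\beta$ in the affine transform

During training, running_mean and running_var are updated with an exponentially weighted average, so they reflect not only the mean and variance of the current mini-batch but also those of earlier mini-batches. The normalization itself uses the statistics of the current mini-batch.

During testing, the mean and variance are fixed at the accumulated running statistics.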
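
The difference between training and evaluation mode can be checked directly. This is a minimal sketch with arbitrary data; it relies on weight being 1 and bias being 0 at initialization.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3, momentum=0.3)
x = torch.randn(32, 3) * 2.0 + 4.0

bn.train()
y_train = bn(x)   # normalized with batch statistics; running stats updated once

bn.eval()
saved_mean = bn.running_mean.clone()
y_eval = bn(x)    # normalized with the stored running stats, which stay fixed

# In eval mode the output follows the stored statistics (gamma=1, beta=0 here).
expected = (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps)
print(torch.allclose(y_eval, expected, atol=1e-5))
print(torch.equal(saved_mean, bn.running_mean))
```

Because running_mean started at 0 and has seen only one batch, y_eval is not zero-mean, whereas y_train is.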

All bn layers compute the 4 attributes above per feature dimension (channel); see the examples below for details.

nn.BatchNorm1d()

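The input shape is $B \times C \times 1D\_feature$. In the example below, the data has shape (3, 5, 1): a mini-batch of 3 samples, each sample with 5 features, each feature of dimension 1. So 5 means and variances are computed, one per feature. momentum is set to 0.3, and the running mean and variance are initialized to 0 and 1 by default. The same mini-batch is fed in twice.

The data is shown in the figure below:

The code is as follows: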

    batch_size = 3
    num_features = 5
    momentum = 0.3

    features_shape = (1,)  # one-element tuple; (1) would be the plain int 1

    feature_map = torch.ones(features_shape)                                                    # 1D
    feature_maps = torch.stack([feature_map*(i+1) for i in range(num_features)], dim=0)         # 2D
    feature_maps_bs = torch.stack([feature_maps for i in range(batch_size)], dim=0)             # 3D
    print("input data:\n{} shape is {}".format(feature_maps_bs, feature_maps_bs.shape))

    bn = nn.BatchNorm1d(num_features=num_features, momentum=momentum)

    running_mean, running_var = 0, 1
    mean_t, var_t = 2, 0
    for i in range(2):
        outputs = bn(feature_maps_bs)

        print("\niteration:{}, running mean: {} ".format(i, bn.running_mean))
        print("iteration:{}, running var:{} ".format(i, bn.running_var))



        running_mean = (1 - momentum) * running_mean + momentum * mean_t
        running_var = (1 - momentum) * running_var + momentum * var_t

        print("iteration:{}, running mean of the second feature: {} ".format(i, running_mean))
        print("iteration:{}, running var of the second feature:{}".format(i, running_var))

The output is:

input data:
tensor([[[1.],
         [2.],
         [3.],
         [4.],
         [5.]],
        [[1.],
         [2.],
         [3.],
         [4.],
         [5.]],
        [[1.],
         [2.],
         [3.],
         [4.],
         [5.]]]) shape is torch.Size([3, 5, 1])
iteration:0, running mean: tensor([0.3000, 0.6000, 0.9000, 1.2000, 1.5000])
iteration:0, running var:tensor([0.7000, 0.7000, 0.7000, 0.7000, 0.7000])
iteration:0, running mean of the second feature: 0.6
iteration:0, running var of the second feature:0.7
iteration:1, running mean: tensor([0.5100, 1.0200, 1.5300, 2.0400, 2.5500])
iteration:1, running var:tensor([0.4900, 0.4900, 0.4900, 0.4900, 0.4900])
iteration:1, running mean of the second feature: 1.02
iteration:1, running var of the second feature:0.48999999999999994

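Although the two mini-batches contain the same data, the running mean and variance of the bn layer differ between the two iterations. Take the mean of the second feature as an example, whose values are all 2.

First update of the bn layer's mean: $\text{running\_mean}=(1-\text{momentum}) \times \text{pre\_running\_mean} + \text{momentum} \times \text{mean}_t =(1-0.3) \times 0 + 0.3 \times 2 =0.6$
Second update of the bn layer's mean: $\text{running\_mean}=(1-\text{momentum}) \times \text{pre\_running\_mean} + \text{momentum} \times \text{mean}_t =(1-0.3) \times 0.6 + 0.3 \times 2 =1.02$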
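
In the example above every feature is constant, so the batch variance is 0 and the variance update cannot be told apart from other estimators. The sketch below repeats the update rule on random data (an assumption made here) and uses the unbiased batch variance for running_var, which is what the PyTorch documentation describes for the stored moving average.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
momentum = 0.3
bn = nn.BatchNorm1d(2, momentum=momentum)
x = torch.randn(6, 2)

# Hand-maintained copies of the running statistics, starting from 0 and 1.
running_mean = torch.zeros(2)
running_var = torch.ones(2)

for i in range(2):
    bn(x)  # training mode: updates bn.running_mean and bn.running_var
    running_mean = (1 - momentum) * running_mean + momentum * x.mean(dim=0)
    running_var = (1 - momentum) * running_var + momentum * x.var(dim=0, unbiased=True)
    print("iteration:{}, match: {}".format(
        i, torch.allclose(bn.running_mean, running_mean, atol=1e-6)))
```

The normalization of the batch itself still divides by the biased variance; only the stored running_var uses the unbiased estimate.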

Before the network runs its first forward pass, inspecting the bn layer's attributes at a breakpoint shows the following:

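nn.BatchNorm2d()

The input shape is $B \times C \times 2D\_feature$. In the example below, the data has shape (3, 3, 2, 2): a mini-batch of 3 samples, each sample with 3 features, each feature of dimension $2 \times 2$. So 3 means and variances are computed, one per feature. momentum is set to 0.3, and the running mean and variance are initialized to 0 and 1 by default. The same mini-batch is fed in twice.

The data is shown in the figure below:

The code is as follows: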

    batch_size = 3
    num_features = 3
    momentum = 0.3

    features_shape = (2, 2)

    feature_map = torch.ones(features_shape)                                                    # 2D
    feature_maps = torch.stack([feature_map*(i+1) for i in range(num_features)], dim=0)         # 3D
    feature_maps_bs = torch.stack([feature_maps for i in range(batch_size)], dim=0)             # 4D

    # print("input data:\n{} shape is {}".format(feature_maps_bs, feature_maps_bs.shape))

    bn = nn.BatchNorm2d(num_features=num_features, momentum=momentum)

    running_mean, running_var = 0, 1

    for i in range(2):
        outputs = bn(feature_maps_bs)

        print("\niter:{}, running_mean: {}".format(i, bn.running_mean))
        print("iter:{}, running_var: {}".format(i, bn.running_var))

        print("iter:{}, weight: {}".format(i, bn.weight.data.numpy()))
        print("iter:{}, bias: {}".format(i, bn.bias.data.numpy()))

The output is:

iter:0, running_mean: tensor([0.3000, 0.6000, 0.9000])
iter:0, running_var: tensor([0.7000, 0.7000, 0.7000])
iter:0, weight: [1. 1. 1.]
iter:0, bias: [0. 0. 0.]
iter:1, running_mean: tensor([0.5100, 1.0200, 1.5300])
iter:1, running_var: tensor([0.4900, 0.4900, 0.4900])
iter:1, weight: [1. 1. 1.]
iter:1, bias: [0. 0. 0.]
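
To make explicit which axes the per-channel statistics are taken over, the following sketch recomputes the BatchNorm2d output by hand on random data. The random input is an assumption made here so that the variances are non-zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(3, 3, 2, 2)            # B x C x H x W
bn = nn.BatchNorm2d(3, momentum=0.3)
y = bn(x)

# One mean and variance per channel, reduced over batch, height and width.
mean_c = x.mean(dim=(0, 2, 3))
var_c = x.var(dim=(0, 2, 3), unbiased=False)

# Reshape the statistics to 1 x C x 1 x 1 so they broadcast over the input.
y_manual = (x - mean_c[None, :, None, None]) / torch.sqrt(
    var_c[None, :, None, None] + bn.eps)

print(torch.allclose(y, y_manual, atol=1e-5))
print(bn.running_mean, 0.3 * mean_c)
```

After one forward pass running_mean equals 0.3 times the per-channel batch mean, which matches the update rule with an initial value of 0.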

nn.BatchNorm3d()

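The input shape is $B \times C \times 3D\_feature$. In the example below, the data has shape (3, 3, 2, 2, 3): a mini-batch of 3 samples, each sample with 3 features, each feature of dimension $2 \times 2 \times 3$. So 3 means and variances are computed, one per feature. momentum is set to 0.3, and the running mean and variance are initialized to 0 and 1 by default. The same mini-batch is fed in twice.

The data is shown in the figure below:

The code is as follows: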

    batch_size = 3
    num_features = 3
    momentum = 0.3

    features_shape = (2, 2, 3)

    feature = torch.ones(features_shape)                                                # 3D
    feature_map = torch.stack([feature * (i + 1) for i in range(num_features)], dim=0)  # 4D
    feature_maps = torch.stack([feature_map for i in range(batch_size)], dim=0)         # 5D

    # print("input data:\n{} shape is {}".format(feature_maps, feature_maps.shape))

    bn = nn.BatchNorm3d(num_features=num_features, momentum=momentum)

    running_mean, running_var = 0, 1

    for i in range(2):
        outputs = bn(feature_maps)

        print("\niter:{}, running_mean.shape: {}".format(i, bn.running_mean.shape))
        print("iter:{}, running_var.shape: {}".format(i, bn.running_var.shape))

        print("iter:{}, weight.shape: {}".format(i, bn.weight.shape))
        print("iter:{}, bias.shape: {}".format(i, bn.bias.shape))

The output is:

iter:0, running_mean.shape: torch.Size([3])
iter:0, running_var.shape: torch.Size([3])
iter:0, weight.shape: torch.Size([3])
iter:0, bias.shape: torch.Size([3])
iter:1, running_mean.shape: torch.Size([3])
iter:1, running_var.shape: torch.Size([3])
iter:1, weight.shape: torch.Size([3])
iter:1, bias.shape: torch.Size([3])
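
The shapes alone do not show the values. As a small sketch built on the same constant data as above, the statistics after one forward pass can be checked against the per-channel means taken over the batch and all three feature axes.

```python
import torch
import torch.nn as nn

feature = torch.ones(2, 2, 3)
feature_map = torch.stack([feature * (i + 1) for i in range(3)], dim=0)  # 3 x 2 x 2 x 3
feature_maps = torch.stack([feature_map for _ in range(3)], dim=0)       # 3 x 3 x 2 x 2 x 3

bn = nn.BatchNorm3d(num_features=3, momentum=0.3)
bn(feature_maps)

# Channel i holds the constant i + 1, so the per-channel means are 1, 2, 3.
mean_c = feature_maps.mean(dim=(0, 2, 3, 4))
print(mean_c)
print(bn.running_mean)  # 0.3 * mean_c after one update starting from 0
print(bn.running_var)   # 0.7 * 1 + 0.3 * 0, since each channel is constant
```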

Layer Normalization

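Motivation: Batch Normalization is not suitable for networks with variable-length inputs, such as RNNs

Idea: compute the mean and variance per layer, over the features of each individual sample

Notes:

There are no running_mean and running_var anymore
$\gamma$ and $\beta$ are per-element (one value for each element of normalized_shape)

torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)

Parameters:

normalized_shape: the shape of the features of this layer, for example $C \times H \times W$, $H \times W$, or $W$
eps: a correction term added to the denominator during normalization
elementwise_affine: whether to apply a learnable per-element affine transform

In the code below, the input shape is $B \times C \times feature$, here (8, 2, 3, 4): a mini-batch of 8 samples, each sample with 2 features, each feature of dimension $3 \times 4$. So 8 means and variances are computed, one per sample.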

    batch_size = 8
    num_features = 2

    features_shape = (3, 4)

    feature_map = torch.ones(features_shape)  # 2D
    feature_maps = torch.stack([feature_map * (i + 1) for i in range(num_features)], dim=0)  # 3D
    feature_maps_bs = torch.stack([feature_maps for i in range(batch_size)], dim=0)  # 4D

    # feature_maps_bs shape is [8, 2, 3, 4],  B * C * H * W
    # ln = nn.LayerNorm(feature_maps_bs.size()[1:], elementwise_affine=True)
    # ln = nn.LayerNorm(feature_maps_bs.size()[1:], elementwise_affine=False)
    ln = nn.LayerNorm([2, 3, 4])

    output = ln(feature_maps_bs)

    print("Layer Normalization")
    print(ln.weight.shape)
    print(feature_maps_bs[0, ...])
    print(output[0, ...])

Layer Normalization
torch.Size([2, 3, 4])
tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],
        [[2., 2., 2., 2.],
         [2., 2., 2., 2.],
         [2., 2., 2., 2.]]])
tensor([[[-1.0000, -1.0000, -1.0000, -1.0000],
         [-1.0000, -1.0000, -1.0000, -1.0000],
         [-1.0000, -1.0000, -1.0000, -1.0000]],
        [[ 1.0000,  1.0000,  1.0000,  1.0000],
         [ 1.0000,  1.0000,  1.0000,  1.0000],
         [ 1.0000,  1.0000,  1.0000,  1.0000]]], grad_fn=<SelectBackward>)

normalized_shape can also be set to (3, 4) or (4,); it must match the trailing dimensions of the input.
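
The sketch below recomputes Layer Normalization by hand and shows how normalized_shape changes which axes are reduced. Random input and elementwise_affine=False are assumptions made here to keep the comparison simple.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 2, 3, 4)                       # B x C x H x W

ln = nn.LayerNorm([2, 3, 4], elementwise_affine=False)
y = ln(x)

# One mean and variance per sample, reduced over C, H and W.
mean = x.mean(dim=(1, 2, 3), keepdim=True)
var = x.var(dim=(1, 2, 3), unbiased=False, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + ln.eps)
print(torch.allclose(y, y_manual, atol=1e-5))

# Shorter shapes normalize over fewer trailing dimensions.
ln_hw = nn.LayerNorm([3, 4])   # statistics per sample and channel
ln_w = nn.LayerNorm(4)         # statistics per row of each feature map
y_hw = ln_hw(x)
print(ln_hw.weight.shape, ln_w.weight.shape)
```

With normalized_shape set to [3, 4], every H x W feature map of every sample is normalized on its own, and weight has shape (3, 4).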

Instance Normalization

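Motivation: Batch Normalization is not suitable for image generation. The images in a mini-batch have different styles, so the data in the batch cannot all be treated as one class and normalized together.

Idea: compute the mean and variance for each channel of each instance, that is, one mean and variance per feature map.

The available classes are InstanceNorm1d, InstanceNorm2d, and InstanceNorm3d.

Taking InstanceNorm1d as an example, the signature is: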

torch.nn.InstanceNorm1d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)

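Parameters:

num_features: the number of features (channels) per sample; this is the most important parameter
eps: a correction term added to the denominator
momentum: the momentum used for the exponentially weighted average that estimates the running mean and variance
affine: whether to apply the affine transform
track_running_stats: whether to track the running mean and variance. When False (the default), the statistics of each instance are computed from the input in both training and evaluation mode. When True, running statistics are accumulated during training and used in evaluation mode

In the code below, the input shape is $B \times C \times 2D\_feature$, here (3, 3, 2, 2): a mini-batch of 3 samples, each sample with 3 features, each feature of dimension $2 \times 2$. So $3 \times 3$ means and variances are computed, one per feature of each sample, as shown in the figure below:

The code is as follows: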

    batch_size = 3
    num_features = 3
    momentum = 0.3

    features_shape = (2, 2)

    feature_map = torch.ones(features_shape)    # 2D
    feature_maps = torch.stack([feature_map * (i + 1) for i in range(num_features)], dim=0)  # 3D
    feature_maps_bs = torch.stack([feature_maps for i in range(batch_size)], dim=0)  # 4D

    print("Instance Normalization")
    print("input data:\n{} shape is {}".format(feature_maps_bs, feature_maps_bs.shape))

    instance_n = nn.InstanceNorm2d(num_features=num_features, momentum=momentum)

    for i in range(1):
        outputs = instance_n(feature_maps_bs)

        print(outputs)

The output is:

Instance Normalization
input data:
tensor([[[[1., 1.],
          [1., 1.]],
         [[2., 2.],
          [2., 2.]],
         [[3., 3.],
          [3., 3.]]],
        [[[1., 1.],
          [1., 1.]],
         [[2., 2.],
          [2., 2.]],
         [[3., 3.],
          [3., 3.]]],
        [[[1., 1.],
          [1., 1.]],
         [[2., 2.],
          [2., 2.]],
         [[3., 3.],
          [3., 3.]]]]) shape is torch.Size([3, 3, 2, 2])
tensor([[[[0., 0.],
          [0., 0.]],
         [[0., 0.],
          [0., 0.]],
         [[0., 0.],
          [0., 0.]]],
        [[[0., 0.],
          [0., 0.]],
         [[0., 0.],
          [0., 0.]],
         [[0., 0.],
          [0., 0.]]],
        [[[0., 0.],
          [0., 0.]],
         [[0., 0.],
          [0., 0.]],
         [[0., 0.],
          [0., 0.]]]])
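
The output above is all zeros because every feature map in the example is constant, so each value equals its own mean. The sketch below uses random data instead (an assumption made here) and recomputes the result by hand, reducing over the spatial axes only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(3, 3, 2, 2)            # B x C x H x W
instance_n = nn.InstanceNorm2d(num_features=3)
y = instance_n(x)

# One mean and variance per (sample, channel) pair: 3 x 3 of each.
mean = x.mean(dim=(2, 3), keepdim=True)
var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + instance_n.eps)

print(mean.shape)
print(torch.allclose(y, y_manual, atol=1e-5))
```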

Group Normalization

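Motivation: with a small batch, the statistics estimated by Batch Normalization are inaccurate. Group Normalization is typically used with very large models, where the batch size has to be small.

Idea: when there is not enough data, make up for it with channels. The features of each sample are divided into several groups, and the mean and variance are computed per group. It can be seen as Layer Normalization with feature grouping added.

Notes:

There are no running_mean and running_var anymore
$\gamma$ and $\beta$ are per-channel

The signature is: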

torch.nn.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True)

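Parameters:

num_groups: the number of groups the features are split into
num_channels: the number of features (channels). Note that num_channels must be divisible by num_groups
eps: a correction term added to the denominator
affine: whether to apply the affine transform

In the code below, the input shape is $B \times C \times 2D\_feature$, here (2, 4, 2, 2): a mini-batch of 2 samples, each sample with 4 features, each feature of dimension $2 \times 2$. With num_groups set to 2, each group holds $4 \div 2 = 2$ channels, and $2 \times 2 = 4$ means and variances are computed, one per group of each sample.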

    batch_size = 2
    num_features = 4
    num_groups = 2
    features_shape = (2, 2)

    feature_map = torch.ones(features_shape)    # 2D
    feature_maps = torch.stack([feature_map * (i + 1) for i in range(num_features)], dim=0)  # 3D
    feature_maps_bs = torch.stack([feature_maps * (i + 1) for i in range(batch_size)], dim=0)  # 4D

    gn = nn.GroupNorm(num_groups, num_features)
    outputs = gn(feature_maps_bs)

    print("Group Normalization")
    print(gn.weight.shape)
    print(outputs[0])

The output is:

Group Normalization
torch.Size([4])
tensor([[[-1.0000, -1.0000],
         [-1.0000, -1.0000]],
        [[ 1.0000,  1.0000],
         [ 1.0000,  1.0000]],
        [[-1.0000, -1.0000],
         [-1.0000, -1.0000]],
        [[ 1.0000,  1.0000],
         [ 1.0000,  1.0000]]], grad_fn=<SelectBackward>)
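
Group Normalization can be reproduced by reshaping each sample into its groups. The sketch below does this on random data (an assumption made here) and also checks the two limiting cases: a single group behaves like Layer Normalization over all channels, and one group per channel behaves like Instance Normalization.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, C, G = 2, 4, 2
x = torch.randn(B, C, 2, 2)

gn = nn.GroupNorm(G, C, affine=False)
y = gn(x)

# Flatten each group (C // G channels and all spatial positions) into one axis.
grouped = x.view(B, G, -1)
mean = grouped.mean(dim=2, keepdim=True)             # B x G x 1 statistics
var = grouped.var(dim=2, unbiased=False, keepdim=True)
y_manual = ((grouped - mean) / torch.sqrt(var + gn.eps)).view_as(x)
print(torch.allclose(y, y_manual, atol=1e-5))

# Limiting cases, compared without affine parameters.
same_as_ln = torch.allclose(
    nn.GroupNorm(1, C, affine=False)(x),
    nn.LayerNorm([C, 2, 2], elementwise_affine=False)(x), atol=1e-5)
same_as_in = torch.allclose(
    nn.GroupNorm(C, C, affine=False)(x),
    nn.InstanceNorm2d(C)(x), atol=1e-5)
print(same_as_ln, same_as_in)
```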

References

深度之眼 PyTorch framework course
