[PyTorch] A Summary of Commonly Used Neural Network Layers (Continuously Updated)

Posted by 最菜程式設計師Sxx on 2022-04-30

Original URL: https://www.cnblogs.com/shaoxx333/p/16199309.html

1. Convolution Layers

1.1 nn.Conv2d

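(1) Prototype

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True,
                padding_mode='zeros', device=None, dtype=None)

Applies a 2D convolution over an input signal composed of several input planes. In short, it convolves a multi-channel input image.

(2) Parameters

in_channels (int): number of channels in the input image

out_channels (int): number of channels in the output (a tensor) produced by the convolution

kernel_size (int or tuple): size of the convolution kernel. For an n*n kernel a single int is enough (for example, kernel_size = 5); an n*m kernel must be written as kernel_size = (n, m)

stride (int or tuple, optional): stride of the convolution, i.e. how far the kernel moves across the image at each step. Default: 1

padding (int, tuple or str, optional): padding added to all four sides of the input, given as the number of rows and columns to add. Default: 0

padding_mode (string, optional): how the padded positions are filled: 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'

dilation (int or tuple, optional): spacing between kernel elements; a value greater than 1 gives a dilated (atrous) convolution. Default: 1 (no dilation)

groups (int, optional): number of blocked connections from input channels to output channels. Default: 1

bias (bool, optional): whether to add a learnable bias to the output. Default: True

(3) Attributes

~Conv2d.weight (torch.Tensor): the learnable weights of the module, of shape (out_channels, in_channels / groups, kernel_size[0], kernel_size[1])

Initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k = \frac{\text{groups}}{C_\text{in} \cdot \prod_{i=0}^{1} \text{kernel\_size}[i]}$.

~Conv2d.bias (torch.Tensor): the learnable bias of the module, of shape (out_channels)

If bias is True, it is initialized from the same distribution $\mathcal{U}(-\sqrt{k}, \sqrt{k})$.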
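As a quick check of the attribute shapes, the sketch below builds a grouped convolution and inspects its weight and bias. The layer sizes and groups=4 are arbitrary values chosen for illustration, not taken from the original post, and the bound check assumes the default initialization U(-sqrt(k), sqrt(k)) with k = groups / (C_in * kH * kW).

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

# groups=4 splits the 16 input channels into 4 blocks of 4 channels each
m = nn.Conv2d(16, 32, kernel_size=(3, 5), groups=4)
print(m.weight.shape)  # (out_channels, in_channels / groups, kH, kW)
print(m.bias.shape)    # (out_channels,)

# k = groups / (C_in * kH * kW); freshly initialized values should stay within sqrt(k)
k = m.groups / (m.in_channels * 3 * 5)
bound = math.sqrt(k)
print(m.weight.abs().max().item() <= bound + 1e-6)
print(m.bias.abs().max().item() <= bound + 1e-6)
```

Each output channel only sees in_channels / groups input channels, which is why the second dimension of weight shrinks as groups grows.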

(4) Usage example

import torch
import torch.nn as nn

m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
print(m)
# (N, C, H, W)
inputImage = torch.randn(20, 16, 50, 100)
output = m(inputImage)
print(output.shape)

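Result:

Conv2d(16, 33, kernel_size=(3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
torch.Size([20, 33, 26, 100])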
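The output height and width of the usage example can be worked out by hand with the size formula from the PyTorch documentation, out = floor((in + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1). The helper below is a minimal sketch in plain Python; the name conv2d_output_size is made up for this illustration.

```python
import math


def conv2d_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis of nn.Conv2d."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)


# Same settings as nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
h_out = conv2d_output_size(50, kernel=3, stride=2, padding=4, dilation=3)
w_out = conv2d_output_size(100, kernel=5, stride=1, padding=2, dilation=1)
print(h_out, w_out)  # 26 100
```

The batch size (20) is unchanged and the channel dimension becomes out_channels (33), so the full output shape is (20, 33, 26, 100).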

2. Pooling Layers

2.1 nn.MaxPool2d

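(1) Prototype

torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Applies a 2D max pooling over an input signal composed of several input planes.

Note: when ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.

(2) Parameters

kernel_size: size of the window to take the max over; either a single value or a tuple
stride: stride of the pooling window, i.e. how far the window moves across the image at each step. Default: kernel_size
padding: padding added to all four sides of the input, as a number of rows and columns. For max pooling the padded positions are treated as negative infinity, so they never win the max. Default: 0
dilation: spacing between the elements inside the window. Default: 1 (no dilation)
return_indices (bool): whether to return the indices of the max values along with the outputs. Default: False
ceil_mode (bool): whether to use ceil instead of floor when computing the output shape. Default: False (floor)

(3) Usage example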

# pool of square window of size=3, stride=2
m = nn.MaxPool2d(3, stride=2)
# pool of non-square window
m = nn.MaxPool2d((3, 2), stride=(2, 1))
input = torch.randn(20, 16, 50, 32)
output = m(input)
print(f'input shape: {input.shape}', f'output shape: {output.shape}', sep='\n')

Result:

input shape: torch.Size([20, 16, 50, 32])
output shape: torch.Size([20, 16, 24, 31])
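The effect of ceil_mode and return_indices is easier to see on a tiny input. The following is a minimal sketch; the 6x6 input and the 3x3 window with stride 2 are values picked for illustration.

```python
import torch
import torch.nn as nn

# 6x6 plane holding 0..35, so every value equals its own flat index
x = torch.arange(36, dtype=torch.float32).reshape(1, 1, 6, 6)

floor_pool = nn.MaxPool2d(3, stride=2)                 # floor((6 - 3) / 2 + 1) = 2
ceil_pool = nn.MaxPool2d(3, stride=2, ceil_mode=True)  # ceil((6 - 3) / 2 + 1) = 3
print(floor_pool(x).shape)  # torch.Size([1, 1, 2, 2])
print(ceil_pool(x).shape)   # torch.Size([1, 1, 3, 3])

# return_indices=True returns (values, flat indices into each H*W plane)
values, indices = nn.MaxPool2d(3, stride=2, return_indices=True)(x)
print(values.flatten().tolist())
print(indices.flatten().tolist())
```

With ceil_mode=True the last window starts at row/column 4 and runs past the edge of the input, which is allowed because it starts inside the input.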

2.2 nn.AvgPool2d

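(1) Prototype

torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)

Applies a 2D average pooling over an input signal composed of several input planes.

Note: when ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.

(2) Parameters

kernel_size: size of the averaging window; either a single value or a tuple
stride: stride of the pooling window, i.e. how far the window moves across the image at each step. Default: kernel_size
padding: zero-padding added to all four sides of the input, as a number of rows and columns. Default: 0
ceil_mode: whether to use ceil instead of floor when computing the output shape. Default: False (floor)
count_include_pad: whether the zero-padding is included in the averaging calculation. Default: True
divisor_override: if specified, it is used as the divisor; otherwise the size of the pooling region is used

(3) Usage example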

# pool of square window of size=3, stride=2
m = nn.AvgPool2d(3, stride=2)
# pool of non-square window
m = nn.AvgPool2d((3, 2), stride=(2, 1))
input = torch.randn(20, 16, 50, 32)
output = m(input)
print(f'\ninput shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
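To make count_include_pad and divisor_override concrete, the sketch below pools a 2x2 tensor of ones. The input and the pooling settings are illustrative values, not from the original post.

```python
import torch
import torch.nn as nn

x = torch.ones(1, 1, 2, 2)

# With padding=1, every 2x2 window covers one real element and three padded zeros
with_pad = nn.AvgPool2d(2, stride=2, padding=1)(x)
without_pad = nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=False)(x)
print(with_pad)     # each entry is 1 / 4: the zeros count towards the divisor
print(without_pad)  # each entry is 1 / 1: only the real element is counted

# divisor_override=1 turns the average into a plain sum over the window
summed = nn.AvgPool2d(2, divisor_override=1)(x)
print(summed)       # the single window sums four ones
```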

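2.3 nn.AdaptiveMaxPool2d

(1) Prototype

torch.nn.AdaptiveMaxPool2d(output_size, return_indices=False)

Applies a 2D adaptive max pooling over an input signal composed of several input planes.

For any input size, the output has size H_out * W_out. The number of output features equals the number of input planes.

(2) Parameters

output_size: the target output size, of the form H_out * W_out. Can be a tuple (H_out, W_out), or a single H_out for a square output H_out * H_out. Either entry may also be None, meaning that dimension keeps the size of the input
return_indices: whether to return the indices of the max values along with the outputs. Default: False

(3) Usage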

# target output size of 5x7
m = nn.AdaptiveMaxPool2d((5, 7))
input = torch.randn(1, 64, 8, 9)
output = m(input)
print(f'input shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
# target output size of 7x7 (square)
m = nn.AdaptiveMaxPool2d(7)
input = torch.randn(1, 64, 10, 9)
output = m(input)
print(f'\ninput shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
# target output size of 10x7
m = nn.AdaptiveMaxPool2d((None, 7))
input = torch.randn(1, 64, 10, 9)
output = m(input)
print(f'\ninput shape: {input.shape}', f'output shape: {output.shape}', sep='\n')

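Result:

input shape: torch.Size([1, 64, 8, 9])
output shape: torch.Size([1, 64, 5, 7])

input shape: torch.Size([1, 64, 10, 9])
output shape: torch.Size([1, 64, 7, 7])

input shape: torch.Size([1, 64, 10, 9])
output shape: torch.Size([1, 64, 10, 7])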
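A common special case is output_size=1, which reduces every channel to its single largest value. The sketch below checks this and shows how return_indices can be used to look the maxima up again; the tensor sizes are arbitrary illustration values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 9)

# output_size=1 keeps one value per channel: the maximum over H and W
pooled, indices = nn.AdaptiveMaxPool2d(1, return_indices=True)(x)
print(pooled.shape)  # torch.Size([2, 3, 1, 1])

global_max = x.flatten(2).amax(dim=2, keepdim=True)
print(torch.equal(pooled.flatten(2), global_max))

# indices are flat positions inside each H*W plane
recovered = x.flatten(2).gather(2, indices.flatten(2))
print(torch.equal(recovered, pooled.flatten(2)))
```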

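2.4 nn.AdaptiveAvgPool2d

(1) Prototype

torch.nn.AdaptiveAvgPool2d(output_size)

Applies a 2D adaptive average pooling over an input signal composed of several input planes.

For any input size, the output has size H x W. The number of output features equals the number of input planes.

(2) Parameters

output_size: the target output size, of the form H * W. Can be a tuple (H, W), or a single H for a square output H * H. Either entry may also be None, meaning that dimension keeps the size of the input

(3) Usage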

# target output size of 5x7
m = nn.AdaptiveAvgPool2d((5, 7))
input = torch.randn(1, 64, 8, 9)
output = m(input)
print(f'input shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
# target output size of 7x7 (square)
m = nn.AdaptiveAvgPool2d(7)
input = torch.randn(1, 64, 10, 9)
output = m(input)
print(f'\ninput shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
# target output size of 10x7
m = nn.AdaptiveAvgPool2d((None, 7))
input = torch.randn(1, 64, 10, 9)
output = m(input)
print(f'\ninput shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
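Two properties of adaptive average pooling are worth checking directly: the output size does not depend on the input size, and output_size=1 is the same as global average pooling. This is a minimal sketch with illustrative tensor sizes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pool = nn.AdaptiveAvgPool2d((5, 7))

# Different spatial sizes in, identical spatial size out
shapes = [tuple(pool(torch.randn(2, 3, h, w)).shape) for h, w in [(8, 9), (32, 17)]]
print(shapes)  # [(2, 3, 5, 7), (2, 3, 5, 7)]

# output_size=1 is global average pooling: the mean over H and W per channel
x = torch.randn(2, 3, 8, 9)
gap = nn.AdaptiveAvgPool2d(1)(x)
print(torch.allclose(gap.flatten(1), x.mean(dim=(2, 3)), atol=1e-6))
```

This is why an adaptive pooling layer is often placed before a Linear layer: the classifier then accepts images of different sizes.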

3. Non-linear Activations (weighted sum, nonlinearity)

3.1 nn.Sigmoid

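(1) Prototype

torch.nn.Sigmoid()

(2) Sigmoid function expression

Applied element-wise:

$\text{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + \exp(-x)}$

(3) Usage example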

m = nn.Sigmoid()
input = torch.randn(2)
print(input)
output = m(input)
print(output)

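Result: the printed values vary from run to run because the input is random; every element of output lies in (0, 1).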

3.2 nn.Tanh

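(1) Prototype

torch.nn.Tanh()

(2) Tanh function expression

Applied element-wise:

$\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$

(3) Usage example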

m = nn.Tanh()
input = torch.randn(2)
print(input)
output = m(input)
print(output)

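Result: the printed values vary from run to run because the input is random; every element of output lies in (-1, 1).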
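Both activations can be checked against their closed forms, Sigmoid(x) = 1 / (1 + exp(-x)) and Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). The sketch below does that on a fixed grid of inputs (an illustrative choice) and also checks the identity tanh(x) = 2 * sigmoid(2x) - 1.

```python
import torch
import torch.nn as nn

x = torch.linspace(-3.0, 3.0, steps=7)

sigmoid_manual = 1.0 / (1.0 + torch.exp(-x))
tanh_manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))

print(torch.allclose(nn.Sigmoid()(x), sigmoid_manual, atol=1e-6))
print(torch.allclose(nn.Tanh()(x), tanh_manual, atol=1e-6))

# tanh is a rescaled, recentred sigmoid
tanh_from_sigmoid = 2 * nn.Sigmoid()(2 * x) - 1
print(torch.allclose(nn.Tanh()(x), tanh_from_sigmoid, atol=1e-6))
```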

3.3 nn.ReLU

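(1) Prototype

torch.nn.ReLU(inplace=False)

(2) ReLU function expression

Applied element-wise:

$\text{ReLU}(x) = (x)^+ = \max(0, x)$

(3) Parameters

inplace: whether to perform the operation in place, i.e. directly on the input tensor. If True, applying ReLU also overwrites the input, so that input and output hold the same values; if False, the input is left unchanged. Default: False

(4) Usage example

m = nn.ReLU()
input = torch.randn(2)
output = m(input)
print(input, output, sep='\n')

# An implementation of CReLU - https://arxiv.org/abs/1603.05201
m = nn.ReLU()
input = torch.randn(2).unsqueeze(0)  # shape (1, 2)
output = torch.cat((m(input), m(-input)))  # concatenated along dim 0
print(output.shape)

Result: the printed values vary from run to run because the input is random; negative elements of input become 0 in output, and the CReLU output has shape torch.Size([2, 2]).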
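The difference between inplace=False and inplace=True can be seen by checking whether the input tensor is overwritten. A minimal sketch with a hand-picked input:

```python
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.0, 2.0])

y = nn.ReLU()(x)              # out of place: x keeps its negative entry
x_after_out_of_place = x.clone()

z = nn.ReLU(inplace=True)(x)  # in place: x itself is overwritten
print(x_after_out_of_place.tolist())  # [-1.0, 0.0, 2.0]
print(x.tolist())                     # [0.0, 0.0, 2.0]

# z is not a copy; it shares memory with x
print(z.data_ptr() == x.data_ptr())
```

In-place mode saves memory, but it should be avoided when the original input is still needed elsewhere in the network.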

3.4 nn.LeakyReLU

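(1) Definition

torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)

(2) LeakyReLU function expression

Applied element-wise:

$\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} \cdot \min(0, x)$

(3) Parameters

negative_slope: slope applied to the negative part of the input. Default: 1e-2
inplace: whether to perform the operation in place, i.e. directly on the input tensor. If True, applying LeakyReLU also overwrites the input, so that input and output hold the same values; if False, the input is left unchanged. Default: False

(4) Usage example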

m = nn.LeakyReLU(0.1)
input = torch.randn(2)
print(input)
output = m(input)
print(output)

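Result: the printed values vary from run to run because the input is random; non-negative elements of input are unchanged in output, and negative elements are multiplied by 0.1.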

3.5 nn.ReLU6

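(1) Prototype

torch.nn.ReLU6(inplace=False)

(2) ReLU6 function expression

Applied element-wise:

$\text{ReLU6}(x) = \min(\max(0, x), 6)$

(3) Parameters

inplace: whether to perform the operation in place, i.e. directly on the input tensor. If True, applying ReLU6 also overwrites the input, so that input and output hold the same values; if False, the input is left unchanged. Default: False

(4) Usage example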

m = nn.ReLU6()
input = torch.randn(2)
output = m(input)
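print(f'input: {input}', f'output: {output}', sep='\n')

Result: the printed values vary from run to run because the input is random; output equals input clamped to the range [0, 6].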
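Because the examples in this chapter use random inputs, the differences between ReLU, LeakyReLU and ReLU6 are easier to compare on one fixed input. The three values below are chosen for illustration: one negative, one small positive, and one above 6.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.5, 8.0])

relu_out = nn.ReLU()(x)           # negatives become 0
leaky_out = nn.LeakyReLU(0.1)(x)  # negatives are scaled by 0.1
relu6_out = nn.ReLU6()(x)         # like ReLU, but capped at 6
print(relu_out.tolist(), leaky_out.tolist(), relu6_out.tolist(), sep='\n')

# LeakyReLU written out as max(0, x) + negative_slope * min(0, x)
leaky_manual = torch.clamp(x, min=0) + 0.1 * torch.clamp(x, max=0)
print(torch.allclose(leaky_out, leaky_manual))
```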

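3.6 nn.GELU

(1) Prototype

torch.nn.GELU

(2) GELU function expression

Applied element-wise:

$\text{GELU}(x) = x \cdot \Phi(x)$

where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution (the integral of its probability density function):

$\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$

(3) Usage example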

# GeLU
m = nn.GELU()
input = torch.randn(2)
output = m(input)
print('input: ', input, 'output: ', output, sep='\n')

Result: two tensors of shape (2,); the values vary from run to run because the input is random.
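The definition GELU(x) = x * Phi(x), with Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), can be checked against nn.GELU on a fixed grid. A minimal sketch, assuming the default (exact, erf-based) form of the module:

```python
import math

import torch
import torch.nn as nn

x = torch.linspace(-3.0, 3.0, steps=13)

# Standard normal CDF written with the error function
phi = 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
gelu_manual = x * phi

gelu_out = nn.GELU()(x)
print(torch.allclose(gelu_out, gelu_manual, atol=1e-6))
```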

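3.7 nn.SELU

(1) Prototype

torch.nn.SELU(inplace=False)

Note: when initializing with kaiming_normal or kaiming_normal_, nonlinearity='linear' should be used instead of nonlinearity='selu' in order to obtain a self-normalizing neural network.

For more details, see the paper Self-Normalizing Neural Networks.

(2) SELU function expression

Applied element-wise:

$\text{SELU}(x) = \text{scale} \cdot \left(\max(0, x) + \min\left(0, \alpha \cdot (\exp(x) - 1)\right)\right)$

where α=1.6732632423543772848170429916717 and scale=1.0507009873554804934193349852946

(3) Parameters

inplace (bool, optional): whether to perform the operation in place, i.e. directly on the input tensor. If True, applying SELU also overwrites the input, so that input and output hold the same values; if False, the input is left unchanged. Default: False

(4) Usage example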

# SeLU
m = nn.SELU()
input = torch.randn(2)
output = m(input)
print('input: ', input, 'output: ', output, sep='\n')

Result: two tensors of shape (2,); the values vary from run to run because the input is random.
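The expression SELU(x) = scale * (max(0, x) + min(0, alpha * (exp(x) - 1))) can be written out with torch.clamp and compared with nn.SELU. This is a minimal sketch on a fixed grid; the constants are the alpha and scale values quoted in this section, truncated to double precision.

```python
import torch
import torch.nn as nn

alpha = 1.6732632423543772
scale = 1.0507009873554805

x = torch.linspace(-3.0, 3.0, steps=13)

positive_part = torch.clamp(x, min=0)                           # max(0, x)
negative_part = torch.clamp(alpha * (torch.exp(x) - 1), max=0)  # min(0, alpha * (exp(x) - 1))
selu_manual = scale * (positive_part + negative_part)

selu_out = nn.SELU()(x)
print(torch.allclose(selu_out, selu_manual, atol=1e-6))
```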

4. Non-linear Activations (other)

4.1 nn.Softmax

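(1) Prototype

torch.nn.Softmax(dim=None)

Applies the Softmax function to an n-dimensional input tensor, rescaling it so that the elements of the n-dimensional output tensor lie in the range [0, 1] and sum to 1 along dim.

When the input tensor is a sparse tensor, the unspecified values are treated as -inf.

Note that this module does not work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use LogSoftmax instead (it is faster and has better numerical properties).

(2) Softmax function expression

$\text{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$

(3) Parameters

dim (int): the dimension along which Softmax is computed (so every slice along dim sums to 1).

(4) Usage example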

# Softmax
m = nn.Softmax(dim=1)
input = torch.randn(2, 3)
output = m(input)
print('input: ', input, 'output: ', output, sep='\n')

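Result: the printed values vary from run to run because the input is random; each of the two rows of output sums to 1.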
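The sketch below checks the two claims of this section: slices along dim sum to 1, and for a loss the right pairing is LogSoftmax with NLLLoss (which together give the same value as CrossEntropyLoss on raw scores). The targets and the seed are arbitrary illustration values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(2, 3)

probs = nn.Softmax(dim=1)(logits)
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
print(torch.allclose(probs, manual, atol=1e-6))
print(probs.sum(dim=1))  # every row sums to 1

# NLLLoss expects log-probabilities, so pair it with LogSoftmax, not Softmax
target = torch.tensor([2, 0])
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)
ce = nn.CrossEntropyLoss()(logits, target)
print(torch.allclose(nll, ce, atol=1e-6))
```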

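5. Normalization Layers

5.1 nn.BatchNorm2d

(1) Prototype

torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True,
                     track_running_stats=True, device=None, dtype=None)

Applies batch normalization over a 4D input (a mini-batch of 2D inputs with an additional channel dimension). For details, see the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

(2) Parameters

num_features: the number of features. The input is normally laid out as (batch_size, num_features, height, width), i.e. (N, C, H, W), so num_features is C, the number of channels
eps: a value added to the denominator for numerical stability. Default: 1e-5
momentum: the value used when updating the running estimates of the mean and variance. Can be set to None to use a cumulative moving average (i.e. a simple average). Default: 0.1
affine: whether this module has learnable affine parameters. Default: True
track_running_stats: a boolean. When set to True, this module tracks the running mean and variance; when set to False, it does not track these statistics and initializes the buffers running_mean and running_var as None. When these buffers are None, the module always uses batch statistics, in both training and evaluation modes. Default: True

(3) Usage example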

# With Learnable Parameters
m = nn.BatchNorm2d(100)
# Without Learnable Parameters
m = nn.BatchNorm2d(100, affine=False)
input = torch.randn(20, 100, 35, 45)
output = m(input)
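In training mode, batch normalization computes y = (x - E[x]) / sqrt(Var[x] + eps) per channel, with the mean and the (biased) variance taken over the N, H and W dimensions. The sketch below reproduces this by hand with affine=False, and also checks the first update of running_mean; the tensor sizes are small illustrative values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4, 5, 5)  # (N, C, H, W)

bn = nn.BatchNorm2d(4, affine=False)
bn.train()
y = bn(x)

# Per-channel statistics over the batch and spatial dimensions
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + bn.eps)
print(torch.allclose(y, y_manual, atol=1e-5))

# running_mean starts at 0 and moves towards the batch mean by `momentum`
expected_running_mean = bn.momentum * mean.flatten()
print(torch.allclose(bn.running_mean, expected_running_mean, atol=1e-6))
```

In evaluation mode (bn.eval()) the running statistics are used instead of the batch statistics.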

6. Linear Layers

6.1 nn.Linear

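(1) Prototype

torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)

Applies a linear transformation to the incoming data:

$y = xA^T + b$

(2) Parameters

in_features: size of each input sample
out_features: size of each output sample
bias: whether the layer learns an additive bias. Default: True
device: the device on which the parameters are allocated ('cpu' or 'cuda'; a CUDA device can carry an index, such as 'cuda:0' or 'cuda:1')
dtype: the data type of the parameters

(3) Attributes

~Linear.weight (torch.Tensor): the learnable weights of the module, of shape (out_features, in_features)

Initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k = \frac{1}{\text{in\_features}}$.

~Linear.bias: the learnable bias of the module, of shape (out_features)

If bias is True, it is initialized from the same distribution $\mathcal{U}(-\sqrt{k}, \sqrt{k})$.

(4) Usage example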

# Linear
m = nn.Linear(20, 30)
input = torch.randn(128, 20)
output = m(input)
print(f'output size: {output.size()}', f'weight size: {m.weight.size()}', f'bias size: {m.bias.size()}', sep='\n')

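Result:

output size: torch.Size([128, 30])
weight size: torch.Size([30, 20])
bias size: torch.Size([30])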
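The transformation y = x A^T + b can be reproduced directly from the layer's weight and bias. The sketch below does so and also shows that only the last dimension of the input has to match in_features; the extra (4, 7, 20) input is an illustrative value.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.Linear(20, 30)
x = torch.randn(128, 20)

# weight has shape (out_features, in_features), hence the transpose
manual = x @ m.weight.T + m.bias
print(torch.allclose(m(x), manual, atol=1e-5))

# Any number of leading dimensions is allowed; only the last one is transformed
batched = m(torch.randn(4, 7, 20))
print(batched.shape)  # torch.Size([4, 7, 30])
```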

7. Dropout Layers

7.1 nn.Dropout

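(1) Prototype

torch.nn.Dropout(p=0.5, inplace=False)

During training, randomly zeroes some of the elements of the input tensor with probability p, using samples from a Bernoulli distribution. The elements to zero are drawn independently on every forward call.

Dropout is an effective technique for regularization and for preventing the co-adaptation of neurons; see the paper Improving neural networks by preventing co-adaptation of feature detectors for details. In addition, the outputs are scaled by a factor of 1/(1-p) during training. This means that during evaluation the module simply computes an identity function.

The input can be a tensor of any shape; the output has the same shape as the input.

(2) Parameters

p: probability of an element being zeroed. Default: 0.5
inplace: whether to perform the operation in place, i.e. directly on the input tensor. If True, applying Dropout also overwrites the input, so that input and output hold the same values; if False, the input is left unchanged. Default: False

(3) Usage example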

# Dropout
m = nn.Dropout(p=0.2)
input = torch.randn(4, 6)
output = m(input)
print('input:', input, 'output:', output, sep='\n')

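Result: the printed values vary from run to run; on average about 20% of the elements of output are zero, and the remaining elements differ from the corresponding elements of input.

Q: Apart from the elements that were randomly zeroed with probability p=0.2, why did the values of the other elements change as well?

A: During training, every element that is not zeroed is scaled by 1/(1-p), here 1/(1-0.2) = 1.25, so that the expected value of each output element equals the input element.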
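The scaling is easiest to see on an input of all ones: after Dropout(p=0.2) in training mode every surviving element equals 1.25, and in evaluation mode the input passes through unchanged. A minimal sketch; the tensor length and the seed are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.ones(1000)
drop = nn.Dropout(p=0.2)

drop.train()
y = drop(x)
kept = y[y != 0]
print(kept.unique())           # survivors are scaled by 1 / (1 - 0.2) = 1.25
print(1 - kept.numel() / 1000)  # fraction zeroed, close to p

drop.eval()
y_eval = drop(x)
print(torch.equal(y_eval, x))  # identity in evaluation mode
```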

7.2 nn.Dropout2d

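(1) Prototype

torch.nn.Dropout2d(p=0.5, inplace=False)

Unlike the Dropout of the previous section, Dropout2d randomly zeroes out entire channels (a channel is usually a 2D feature map; for example, the j-th channel of the i-th sample in a batched input is the 2D tensor input[i, j]). Each channel is zeroed out independently on every forward call with probability p, using samples from a Bernoulli distribution.

As described in the paper Efficient Object Localization Using Convolutional Networks, if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers), then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, nn.Dropout2d() helps promote independence between feature maps and should be used instead.

The input can be (N, C, H, W) or (C, H, W); the output has the same shape as the input.

(2) Parameters

p: probability of a channel being zeroed. Default: 0.5
inplace: whether to perform the operation in place, i.e. directly on the input tensor. If True, applying Dropout2d also overwrites the input, so that input and output hold the same values; if False, the input is left unchanged. Default: False

(3) Usage example

# Dropout2d
m = nn.Dropout2d(p=0.2)
input = torch.randn(2, 3, 2, 2)  # (N, C, H, W)
output = m(input)
print('input:', input, 'output:', output, sep='\n')

Result: the printed values vary from run to run; each channel output[i, j] is either all zeros (with probability 0.2) or input[i, j] scaled by 1/(1-p) = 1.25.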
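That Dropout2d acts on whole channels can be verified on an input of all ones: every (H, W) feature map in the output is constant, either all 0 or all 1/(1-p). A minimal sketch with illustrative sizes and p=0.5, so the surviving value is exactly 2.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.ones(4, 16, 8, 8)  # (N, C, H, W)
drop2d = nn.Dropout2d(p=0.5)
drop2d.train()
y = drop2d(x)

# Flatten each feature map; a constant map has min == max
per_channel = y.flatten(2)  # (N, C, H*W)
channels_are_constant = torch.equal(per_channel.amin(dim=2), per_channel.amax(dim=2))
print(channels_are_constant)

# Only two values can occur: 0 (dropped) and 1 / (1 - 0.5) = 2 (kept)
values = set(y.unique().tolist())
print(values)
```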

8. Loss Functions

8.1 nn.L1Loss

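(1) Prototype

torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')

Computes the mean absolute error (MAE) between each element of the input x and the target y. x can be a tensor of any shape; y has the same shape as x.

(2) Formula

$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = |x_n - y_n|$

With reduction='mean' the result is $\operatorname{mean}(L)$, and with reduction='sum' it is $\operatorname{sum}(L)$.

(3) Parameters

size_average (bool, optional): deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional): deprecated (see reduction). By default, the losses are averaged or summed over the observations of each minibatch, depending on size_average. When reduce is False, a loss per batch element is returned instead and size_average is ignored. Default: True
reduction (string, optional): the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction is applied; 'mean': the sum of the output is divided by the number of elements in the output; 'sum': the output is summed. Note: size_average and reduce are being deprecated, and in the meantime specifying either of them overrides reduction. Default: 'mean'

(4) Usage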

# L1Loss
loss = nn.L1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
print(input, target, output, sep='\n')

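Result: the 3x5 input, the 3x5 target and a scalar loss; the values vary from run to run because both tensors are random.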
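The three reduction modes are easiest to compare on numbers small enough to check by hand. The input and target below are illustrative values whose absolute differences are 0, 2 and 3.

```python
import torch
import torch.nn as nn

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.0, 4.0, 0.0])

per_element = nn.L1Loss(reduction='none')(x, y)  # |x - y| for every element
total = nn.L1Loss(reduction='sum')(x, y)         # 0 + 2 + 3
mean = nn.L1Loss()(x, y)                         # default 'mean': (0 + 2 + 3) / 3

print(per_element.tolist())  # [0.0, 2.0, 3.0]
print(total.item())          # 5.0
print(mean.item())
```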

8.2 nn.MSELoss（L2Loss）

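(1) Prototype

torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

Computes the mean squared error (squared L2 norm) between each element of the input x and the target y.

(2) Formula

$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = (x_n - y_n)^2$

With reduction='mean' the result is $\operatorname{mean}(L)$, and with reduction='sum' it is $\operatorname{sum}(L)$.

(3) Parameters

size_average (bool, optional): deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional): deprecated (see reduction). By default, the losses are averaged or summed over the observations of each minibatch, depending on size_average. When reduce is False, a loss per batch element is returned instead and size_average is ignored. Default: True
reduction (string, optional): the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction is applied; 'mean': the sum of the output is divided by the number of elements in the output; 'sum': the output is summed. Note: size_average and reduce are being deprecated, and in the meantime specifying either of them overrides reduction. Default: 'mean'

(4) Usage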

# MSELoss(L2Loss)
loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
print(input, target, output, sep='\n')

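Result: the 3x5 input, the 3x5 target and a scalar loss; the values vary from run to run because both tensors are random.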
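With reduction='mean', MSELoss is the average of (x_n - y_n)^2 over all elements, so its gradient with respect to x is 2 * (x - y) / N. The sketch below checks both on a hand-picked 2x2 example (the numbers are illustrative).

```python
import torch
import torch.nn as nn

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = torch.tensor([[1.0, 0.0], [0.0, 4.0]])

loss = nn.MSELoss()(x, y)  # squared errors are 0, 4, 9, 0
loss.backward()

print(loss.item())         # (0 + 4 + 9 + 0) / 4 = 3.25
print(x.grad)              # 2 * (x - y) / 4

expected_grad = 2 * (x.detach() - y) / x.numel()
print(torch.allclose(x.grad, expected_grad))
```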

8.3 nn.CrossEntropyLoss

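(1) Prototype

torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)

Computes the cross-entropy loss between the input x and the target y.

(2) Formula

When the target contains class indices, with x of shape (N, C) holding unnormalized scores:

$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = -w_{y_n} \log \frac{\exp(x_{n, y_n})}{\sum_{c=1}^{C} \exp(x_{n, c})} \cdot \mathbb{1}\{y_n \neq \text{ignore\_index}\}$

(3) Parameters

weight (Tensor, optional): a manual rescaling weight given to each class. If given, it must be a tensor of size C
size_average (bool, optional): deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
ignore_index (int, optional): a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over the non-ignored targets. Note that ignore_index only applies when the target contains class indices
reduce (bool, optional): deprecated (see reduction). By default, the losses are averaged or summed over the observations of each minibatch, depending on size_average. When reduce is False, a loss per batch element is returned instead and size_average is ignored. Default: True
reduction (string, optional): the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction is applied; 'mean': the weighted mean of the output is taken; 'sum': the output is summed. Note: size_average and reduce are being deprecated, and in the meantime specifying either of them overrides reduction. Default: 'mean'
label_smoothing (float, optional): a float in [0.0, 1.0] that specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution, as described in the paper Rethinking the Inception Architecture for Computer Vision. Default: 0.0

(4) Usage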

# Example of target with class indices
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
print(input, target, output, sep='\n')

print('')

# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()
print(input, target, output, sep='\n')

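Result: for each of the two examples, the 3x5 input, the target and a scalar loss; the values vary from run to run because the tensors are random.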
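For class-index targets, the loss of one sample is the negative log-softmax score of its true class, and samples whose target equals ignore_index are left out of the average. The sketch below checks both by hand; the logits, targets and seed are illustrative values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 5)  # (N, C) unnormalized scores
target = torch.tensor([1, 0, 4, 2])

loss = nn.CrossEntropyLoss()(logits, target)

# Manual version: pick the log-probability of the true class for each row
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), target].mean()
print(torch.allclose(loss, manual, atol=1e-6))

# A target of -100 (the default ignore_index) removes that sample from the mean
masked_target = torch.tensor([1, -100, 4, 2])
masked_loss = nn.CrossEntropyLoss()(logits, masked_target)
manual_masked = -log_probs[[0, 2, 3], [1, 4, 2]].mean()
print(torch.allclose(masked_loss, manual_masked, atol=1e-6))
```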

References

1. PyTorch official documentation

2. [Pytorch series] nn.BatchNorm2d usage explained in detail
