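Residual Neural Networks
Some Background Knowledge
Number of layers in a neural network
The number of layers (the depth) of a neural network is the number of hidden layers plus 1 (the output layer); the input layer is not counted. The network in the figure below, for example, has three layers. [Pooling layers are usually not counted when computing the number of layers.]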
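low/mid/high-level features
In a neural network, low/mid/high-level features are the features extracted at different depths of the network; they become progressively more abstract as depth increases.
- Low-level features. Features extracted by the first few convolutional layers applied to the input image. They usually capture basic information such as edges, textures, and colors, and they matter for describing the local structure of the image.
- Mid-level features. Features extracted in the middle layers of the network. They capture more complex patterns and local structures, such as corners, curves, repeated structures, and object parts like eyes, mouths, and wheels.
- High-level features. Features extracted in the last few layers of the network. They are the most abstract: they capture the global information and semantic content of the image and can describe whole objects and scenes, such as the overall shape of an object or its specific category (cat, dog, car, person, etc.).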
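End-to-end
End-to-end training means the whole model, from input to output, is trained as a single unit, with no hand-designed feature extraction or intermediate steps.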
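10-crop testing
At test time, 10 crops are taken from each image. In the standard scheme these are the four corners and the center, plus the horizontal flip of each, so the crops are fixed rather than randomly sampled. A prediction is made for each crop, and the 10 predictions are averaged.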
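The crop scheme described above can be sketched in plain Python. This is only an illustration: the helper names `ten_crop_boxes` and `average_predictions` are hypothetical, and the 256x256 image with 224x224 crops is an assumed example size, not something taken from these notes.

```python
def ten_crop_boxes(height, width, crop):
    """Return 10 (top, left, flip) tuples: 4 corners + center, each unflipped and flipped."""
    if crop > height or crop > width:
        raise ValueError("crop size must not exceed the image size")
    bottom = height - crop   # top coordinate of the bottom crops
    right = width - crop     # left coordinate of the right-hand crops
    origins = [
        (0, 0),                      # top-left corner
        (0, right),                  # top-right corner
        (bottom, 0),                 # bottom-left corner
        (bottom, right),             # bottom-right corner
        (bottom // 2, right // 2),   # center
    ]
    # flip=True means the crop is mirrored horizontally before prediction
    return [(top, left, flip) for flip in (False, True) for top, left in origins]


def average_predictions(per_crop_scores):
    """Average the class scores element-wise over all crops."""
    n = len(per_crop_scores)
    return [sum(column) / n for column in zip(*per_crop_scores)]


boxes = ten_crop_boxes(height=256, width=256, crop=224)
print(len(boxes))   # 10
print(boxes[4])     # (16, 16, False)
```

In a real pipeline each box would be cut out of the image tensor, passed through the model, and the ten score vectors handed to something like `average_predictions`.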
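fully convolutional form (FCN)
A fully convolutional network drops the fully connected layers and the average pooling layer that end an ordinary convolutional network, and adds 1x1 convolutions and transposed convolutions instead.
- Purpose of the 1x1 convolution: it leaves the spatial structure of the feature map unchanged; its main use is dimensionality reduction, i.e. reducing the number of channels.
- Purpose of the transposed convolution: also called deconvolution or upsampling convolution, it is used to increase the spatial resolution of the feature map (its height and width) rather than to change the number of channels. Where a CNN shrinks the image, a transposed convolution enlarges it again (restoring the image size).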
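The effect of the two layer types on tensor shape can be checked with the usual output-size formulas (dilation 1). The sketch below is plain arithmetic, not a framework call; the function names and the 56x56, 256-channel feature map are assumptions chosen for illustration.

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution (dilation = 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def transposed_conv_out_size(n, kernel, stride=1, padding=0, output_padding=0):
    """Spatial size after a transposed convolution (dilation = 1)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


def pointwise_weight_count(in_channels, out_channels):
    """Number of weights in a 1x1 convolution without bias."""
    return in_channels * out_channels


# A 1x1 convolution keeps the 56x56 spatial size and only changes the channels.
print(conv_out_size(56, kernel=1))        # 56
print(pointwise_weight_count(256, 64))    # 16384

# A stride-2 convolution halves the size; a stride-2 transposed convolution restores it.
down = conv_out_size(56, kernel=3, stride=2, padding=1)
up = transposed_conv_out_size(down, kernel=4, stride=2, padding=1)
print(down, up)                           # 28 56
```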
error plateaus
Error plateau: the stretch of training where the error curve is flat, in other words where the error rate has stopped decreasing.
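One simple way to make "the error has stopped decreasing" concrete is to compare the best recent error with the best earlier one. The check below is a hypothetical sketch: the function name, `patience`, and `min_delta` are illustrative choices, not part of any method described here.

```python
def has_plateaued(errors, patience=3, min_delta=1e-3):
    """True if the last `patience` epochs failed to beat the earlier best error by min_delta."""
    if len(errors) <= patience:
        return False  # not enough history to decide
    best_before = min(errors[:-patience])
    best_recent = min(errors[-patience:])
    return best_recent > best_before - min_delta


falling = [0.50, 0.40, 0.32, 0.27, 0.23]
flat = [0.50, 0.40, 0.30, 0.30, 0.30, 0.30]
print(has_plateaued(falling))  # False
print(has_plateaued(flat))     # True
```

A training loop could use such a check as the trigger for lowering the learning rate.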
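Gradient descent with momentum
momentum
Momentum is a technique for accelerating gradient descent. It speeds up training and can help the optimizer avoid getting stuck in local minima.
The parameter update of plain gradient descent is:
\(\theta_{t + 1} = \theta_t - \eta \cdot \nabla J(\theta_t)\)
With momentum, the update becomes:
\(v_{t + 1} = \gamma v_t + \eta \nabla J(\theta_t)\)
\(\theta_{t + 1} = \theta_t - v_{t + 1}\)
Here \(\eta\) is the learning rate, \(\gamma\) is the momentum coefficient, and \(v_t\) is the accumulated velocity, with \(v_0 = 0\).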
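The two update rules can be compared on a toy problem. The sketch below assumes the objective \(J(\theta) = \theta^2\) (so the gradient is \(2\theta\)), a learning rate of 0.01, and \(\gamma = 0.9\); all of these are illustrative choices.

```python
def grad(theta):
    """Gradient of the assumed toy objective J(theta) = theta ** 2."""
    return 2.0 * theta


def gradient_descent(theta, lr, steps):
    """Plain update: theta <- theta - lr * grad(theta)."""
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta


def momentum_descent(theta, lr, gamma, steps):
    """Momentum update: v <- gamma * v + lr * grad(theta); theta <- theta - v."""
    v = 0.0  # initial velocity v_0
    for _ in range(steps):
        v = gamma * v + lr * grad(theta)
        theta = theta - v
    return theta


plain = gradient_descent(5.0, lr=0.01, steps=50)
with_momentum = momentum_descent(5.0, lr=0.01, gamma=0.9, steps=50)
# After the same number of steps, the momentum run is closer to the minimum at 0.
print(abs(with_momentum) < abs(plain))  # True
```

With such a small learning rate the plain update creeps toward the minimum, while the accumulated velocity lets the momentum run cover the same distance in far fewer steps.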
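unreferenced functions
Functions without a reference: a traditional neural network tries to learn the mapping from input to output, \(H(\mathrm{x})\), directly, without relying on any extra reference information or structure. A residual network instead learns the difference between the output and the input, the residual \(F(\mathrm{x}) = H(\mathrm{x}) - \mathrm{x}\), and uses the input \(\mathrm{x}\) as the reference.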
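A small plain-Python sketch of the difference: with all weights at zero, a plain layer outputs zeros, while a residual layer passes its input through unchanged because \(\mathrm{x}\) is added back to \(F(\mathrm{x})\). The helper names are hypothetical, and the input is assumed non-negative (as it would be after a ReLU).

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(weights, bias, x):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def plain_layer(weights, bias, x):
    """Plain layer: the weights must represent H(x) directly."""
    return relu(dense(weights, bias, x))


def residual_layer(weights, bias, x):
    """Residual layer: the weights represent F(x) = H(x) - x, and x is added back."""
    fx = dense(weights, bias, x)
    return relu([f + v for f, v in zip(fx, x)])


x = [1.0, 2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]
zero_bias = [0.0] * 3
print(plain_layer(zero_weights, zero_bias, x))     # [0.0, 0.0, 0.0]
print(residual_layer(zero_weights, zero_bias, x))  # [1.0, 2.0, 3.0]
```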
"plain" networks
Refers to the networks without skip connections that serve as the baseline in the experimental comparison.
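degradation problem
Refers to the fact that, before any overfitting has set in, the error rate rises as more layers are added to the network, as shown in the figure below:
In principle this problem should not occur, because in theory there is a solution by construction that lets the deep network match the shallow one.
- Solution by construction: suppose there are two networks, a shallow one and a deep one built by adding more layers on top of the shallow one. For the deep network there exists, in theory, a constructed solution: the added layers output exactly their input (identity mappings), and the parameters of the remaining layers are copied from the shallow network.
The constructed solution shows that a deep network can always reach at least the same training error as the shallow one, because it can fall back to behaving like the shallow network.
Experiments, however, show that current optimization algorithms have difficulty finding solutions as good as or better than the constructed one. So although in theory a deep network should never have a higher training error than a shallow one, in practice the optimizer fails to find these good solutions.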
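The constructed solution can be verified on a tiny fully connected network. In this sketch the weights are arbitrary made-up numbers and the helper names are hypothetical; the inserted identity layers sit after a ReLU, so their inputs are non-negative and the following ReLU leaves them unchanged.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(weights, bias, x):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def forward(layers, x):
    """Apply (weights, bias) layers in order, with ReLU after every layer except the last."""
    for i, (weights, bias) in enumerate(layers):
        x = dense(weights, bias, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x


def identity_layer(n):
    """Layer whose weight matrix is the n x n identity and whose bias is zero."""
    weights = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return (weights, [0.0] * n)


# Shallow network: 2 inputs -> 3 hidden units -> 1 output (arbitrary example weights).
shallow = [
    ([[0.5, -1.0], [1.5, 2.0], [-0.5, 0.25]], [0.1, -0.2, 0.3]),
    ([[1.0, -2.0, 0.5]], [0.05]),
]
# Deep network: copy the shallow layers and insert two identity layers between them.
deep = [shallow[0], identity_layer(3), identity_layer(3), shallow[1]]

x = [0.7, -1.2]
print(forward(shallow, x) == forward(deep, x))  # True
```

The deep network here reproduces the shallow one exactly, but only because the identity weights were written in by hand; the degradation problem is that training does not reliably find such weights on its own.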
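ResNet architecture
The figure is rotated here to save space. It shows a 34-layer residual network; the shortcuts drawn as dashed lines mark the places where the dimensions do not match (i.e. F(X) cannot be added to X directly).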
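The "34" can be recovered by counting weighted layers: the first convolution, every convolution inside the residual blocks, and the final fully connected layer. The sketch below assumes the per-stage block counts [3, 4, 6, 3] and [3, 4, 23, 3] used by the factory functions in the implementation, with 2 convolutions per basic block and 3 per bottleneck block; pooling layers and the 1x1 convolutions on the dashed shortcuts are left out of the count. The function name is hypothetical.

```python
def resnet_depth(blocks_per_stage, convs_per_block):
    """Count weighted layers: stem convolution + block convolutions + final fully connected layer."""
    stem = 1
    fully_connected = 1
    return stem + convs_per_block * sum(blocks_per_stage) + fully_connected


print(resnet_depth([3, 4, 6, 3], convs_per_block=2))    # 34
print(resnet_depth([3, 4, 6, 3], convs_per_block=3))    # 50
print(resnet_depth([3, 4, 23, 3], convs_per_block=3))   # 101
```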
Implementation code
import torch.nn as nn
import torch


class BasicBlock(nn.Module):
    # Residual block with two 3x3 convolutions (used by ResNet-34).
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        # Dashed shortcut: project x so that its shape matches F(x).
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        # F(x) + x, followed by the final activation.
        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    """
    Note: in the original paper, on the main branch of the dashed-shortcut residual block,
    the first layer (1x1 convolution) has stride 2 and the second layer (3x3 convolution) has stride 1.
    In the official PyTorch implementation, the 1x1 convolution has stride 1 and the 3x3 convolution has stride 2;
    this improves top-1 accuracy by roughly 0.5%.
    See ResNet v1.5: https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch
    """
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()

        width = int(out_channel * (width_per_group / 64.)) * groups

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel*self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True,
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x


def resnet34(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet50(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, include_top=include_top)


def resnext50_32x4d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth
    groups = 32
    width_per_group = 4
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth
    groups = 32
    width_per_group = 8
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)