Reading the AlexNet Paper

Posted by 南风丶丶 on 2024-06-12

Preface

  As a landmark, arguably founding, work of modern deep learning, AlexNet gave later researchers enormous inspiration in applying neural networks to concrete tasks such as classification and regression (prediction). Although many networks have since surpassed it, AlexNet remains important. Its author, Alex Krizhevsky, pioneered training the network across two GTX 580 GPUs and won the 2012 ImageNet competition with it, a hugely significant result that laid the foundation for the subsequent rise of deep learning; even NVIDIA's market value overtaking Apple's today owes a share of the credit to deep learning.

  Below we walk through AlexNet's architecture and reproduce the paper. The experiment uses AlexNet for a cat-vs-dog classification task and covers model construction, training, testing, and result analysis.

1. Network Architecture

  AlexNet has eight layers in total: the first five are convolutional layers and the remaining three are fully connected layers, as shown below:

(Figure: the AlexNet architecture, split across two GPUs)

  Layer 1: convolution layer 1. The input is a 224 × 224 × 3 image. There are 96 kernels (the paper computes 48 on each of the two GPUs) of size 11 × 11 × 3, with stride = 4 and pad = 0 (no border padding). The output size follows width = (input + 2 × padding − kernel_size) / stride + 1; note that (224 + 2 × 0 − 11) / 4 + 1 is not an integer, a well-known inconsistency in the paper: the effective input is 227 × 227, giving width = height = (227 − 11) / 4 + 1 = 55 and depth = 96. The convolution is followed by Local Response Normalization (LRN) and then max pooling with pool_size = (3, 3), stride = 2, pad = 0, which produces the first layer's feature map;
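  The output-size arithmetic is easy to get wrong, so here is a small helper for checking it (added for this walkthrough, not from the paper):

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

print(conv_out(227, kernel=11, stride=4))   # conv1 -> 55
print(conv_out(55, kernel=3, stride=2))     # pool1 -> 27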

  Layer 2: convolution layer 2. The input is the feature map from layer 1. There are 256 kernels in total (128 on each GPU in the paper) of size 5 × 5 × 48, with pad = 2 and stride = 1. This is followed by LRN and then max pooling with pool_size = (3, 3), stride = 2;

  Layer 3: convolution layer 3. The input is the output of layer 2. There are 384 kernels with kernel_size = (3 × 3 × 256) and padding = 1; this is the one convolution in the paper that connects across both GPUs, so each kernel sees all 256 input channels. Layer 3 uses no LRN and no pooling;

  Layer 4: convolution layer 4. The input is the output of layer 3. There are 384 kernels with kernel_size = (3 × 3 × 192) and padding = 1; like layer 3, it uses no LRN and no pooling;

  Layer 5: convolution layer 5. The input is the output of layer 4. There are 256 kernels with kernel_size = (3 × 3 × 192) and padding = 1, followed directly by max pooling with pool_size = (3, 3), stride = 2;

  Layers 6, 7, and 8 are fully connected. Layers 6 and 7 have 4096 neurons each, and the final layer outputs a 1000-way softmax, since, as mentioned above, the ImageNet competition has 1000 classes. The fully connected layers use ReLU and Dropout. The sketch below traces the feature-map sizes through this stack.
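  To make the sizes above concrete, here is a minimal single-GPU sketch of the paper's convolutional stack (an addition for this walkthrough: the channel counts merge the two GPU halves, the 227 × 227 input reflects the erratum noted in layer 1, and the LRN constants k = 2, n = 5, alpha = 1e-4, beta = 0.75 come from the paper):

import torch
from torch import nn

features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),                  # -> 96 x 55 x 55
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),                       # -> 96 x 27 x 27
    nn.Conv2d(96, 256, kernel_size=5, padding=2),                # -> 256 x 27 x 27
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),                       # -> 256 x 13 x 13
    nn.Conv2d(256, 384, kernel_size=3, padding=1),               # -> 384 x 13 x 13
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),               # -> 384 x 13 x 13
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),               # -> 256 x 13 x 13
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                       # -> 256 x 6 x 6
)

x = torch.randn(1, 3, 227, 227)
print(features(x).shape)   # torch.Size([1, 256, 6, 6])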

2. Dataset

  The dataset consists of cat and dog images: 12,500 cats and 12,500 dogs in total. The training set has 12,300 cats and 12,300 dogs, the validation set 100 cats and 100 dogs, and the test set 100 cats and 100 dogs. Dataset link: https://pan.baidu.com/s/11UHodPIHRDwHiRoae_fqtQ (extraction code: d0fa). The figure below shows samples from the training set:

(Figure: sample images from the training set)

3. Sorting the Dataset

  Sort the cats and dogs into the class subfolders train_0/0 (cats) and train_0/1 (dogs), the layout ImageFolder expects:

import os
import re
import shutil

origin_path = '/workspace/src/how-to-read-paper/dataset/train'
target_path_0 = '/workspace/src/how-to-read-paper/dataset/train_0/0'  # cats
target_path_1 = '/workspace/src/how-to-read-paper/dataset/train_0/1'  # dogs

os.makedirs(target_path_0, exist_ok=True)
os.makedirs(target_path_1, exist_ok=True)

# Filenames look like 'cat.0.jpg' / 'dog.0.jpg'; the first word is the class.
for filename in os.listdir(origin_path):
    old_path = os.path.join(origin_path, filename)
    label = re.findall(r'\w+', filename)[0]
    if label == 'cat':
        shutil.move(old_path, target_path_0)
    else:
        shutil.move(old_path, target_path_1)
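  A quick per-class file count (using the same paths as above) verifies the move:

for path in (target_path_0, target_path_1):
    print(path, len(os.listdir(path)))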

4. Building the Model

  Build the model and import the data:

import torch
from torch import nn
from torch.nn import functional as F
import matplotlib.pyplot as plt
from torchvision.datasets import ImageFolder
import torch.optim as optim
import torch.utils.data
import torchvision.transforms as transforms

# Hyperparameters
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
EPOCH = 100
BATCH_SIZE = 256

# Convolution layers, fully connected layers, and the forward pass
class AlexNet(nn.Module):
    def __init__(self, num_classes=2):
        super(AlexNet, self).__init__()
        # Convolutional feature extractor (a half-width AlexNet: channel
        # counts follow one GPU half of the paper, sized for 65x65 inputs)
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),                         # -> 48 x 55 x 55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                    # -> 48 x 27 x 27
            nn.Conv2d(48, 128, kernel_size=5, padding=2),             # -> 128 x 27 x 27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                    # -> 128 x 13 x 13
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                    # -> 128 x 6 x 6
        )
        # Fully connected classifier
        self.classifier = nn.Sequential(
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, num_classes),
        )

    # Forward pass
    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)

        return x
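# Sanity check: with the 65x65 inputs used below, the feature extractor
# outputs 128 x 6 x 6, matching the classifier's 6*6*128 input size:
# AlexNet()(torch.randn(1, 3, 65, 65)).shape -> torch.Size([1, 2])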
    
# Import the training, test, and validation sets
# Normalization (the standard ImageNet channel means and standard deviations)
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

# Training set
path_1 = '/workspace/src/how-to-read-paper/dataset/train_0'
trans_1 = transforms.Compose([
    transforms.Resize((65, 65)),
    transforms.ToTensor(),
    normalize,
])

# Dataset
train_set = ImageFolder(root=path_1, transform=trans_1)
# Data loader
train_loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)

# Test set (shuffling is unnecessary for evaluation)
path_2 = '/workspace/src/how-to-read-paper/dataset/test'
trans_2 = transforms.Compose([
    transforms.Resize((65, 65)),
    transforms.ToTensor(),
    normalize,
])
test_data = ImageFolder(root=path_2, transform=trans_2)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)

# Validation set
path_3 = '/workspace/src/how-to-read-paper/dataset/valid'
trans_3 = transforms.Compose([
    transforms.Resize((65, 65)),
    transforms.ToTensor(),
    normalize,
])
valid_data = ImageFolder(root=path_3, transform=trans_3)
valid_loader = torch.utils.data.DataLoader(valid_data, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)
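  As a quick sanity check of the data pipeline, one batch can be visualized (a sketch added here, not in the original post; it undoes the normalization with the same mean/std defined above):

import numpy as np

images, labels = next(iter(train_loader))
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
img = images[0].permute(1, 2, 0).numpy() * std + mean   # de-normalize for display
plt.imshow(np.clip(img, 0, 1))
plt.title('label: %d' % labels[0].item())
plt.show()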

5. Training

  Train the model:

# Define the model
model = AlexNet().to(DEVICE)
# Optimizer (the paper's settings: momentum 0.9, weight decay 0.0005)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005)
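# Optional addition (not in the original post): the paper divides the
# learning rate by 10 when the validation error stops improving;
# ReduceLROnPlateau approximates that policy. If used, call
# scheduler.step(valid_loss) once per epoch in the loop below.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)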

def train_model(model, device, train_loader, optimizer, epoch):
    model.train()
    train_loss = 0.0
    for batch_index, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        # With ~24,600 images and batch size 256 there are fewer than 300
        # batches per epoch, so log every 30 batches instead of every 300
        if batch_index % 30 == 0:
            print('Train Epoch:{}\ttrain loss:{:.6f}'.format(epoch, loss.item()))

    # Return the mean batch loss over the whole epoch (the original code
    # returned only the loss of the last logged batch)
    return train_loss / len(train_loader)

def test_model(model, device, test_loader):
    model.eval()
    correct = 0.0
    test_loss = 0.0

    # No gradients needed for evaluation
    with torch.no_grad():
        for data, label in test_loader:
            data, label = data.to(device), label.to(device)
            output = model(data)
            # Sum per-sample losses so that dividing by the dataset size
            # below yields a true per-sample average
            test_loss += F.cross_entropy(output, label, reduction='sum').item()
            pred = output.argmax(dim=1)
            correct += pred.eq(label.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        acc = 100 * correct / len(test_loader.dataset)
        print('Test_average_loss:{:.4f}, Accuracy:{:.3f}%\n'.format(test_loss, acc))

    return test_loss, acc

# Start training
Train_Loss_list = []
Valid_Loss_list = []
Valid_Accuracy_list = []

for epoch in range(1, EPOCH + 1):
    # Train on the training set
    train_loss = train_model(model, DEVICE, train_loader, optimizer, epoch)
    Train_Loss_list.append(train_loss)
    torch.save(model, r'/workspace/src/how-to-read-paper/model/model%s.pth' % epoch)

    # Evaluate on the validation set
    valid_loss, acc = test_model(model, DEVICE, valid_loader)
    Valid_Loss_list.append(valid_loss)
    Valid_Accuracy_list.append(acc)
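  Section 7 analyzes loss and accuracy curves; the original post does not show the plotting code, so the following is one possible sketch using the lists recorded above:

epochs = range(1, EPOCH + 1)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(epochs, Train_Loss_list, label='train loss')
ax1.plot(epochs, Valid_Loss_list, label='valid loss')
ax1.set_xlabel('epoch')
ax1.set_ylabel('loss')
ax1.legend()
ax2.plot(epochs, Valid_Accuracy_list)
ax2.set_xlabel('epoch')
ax2.set_ylabel('validation accuracy (%)')
plt.tight_layout()
plt.show()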

6. Testing

  Test the model:

# Pick the epoch with the lowest validation loss

min_num = min(Valid_Loss_list)
min_index = Valid_Loss_list.index(min_num)

print('model%s' % (min_index + 1))
print('Validation accuracy at the best epoch:')
print('{}'.format(Valid_Accuracy_list[min_index]))

# Load the best checkpoint and evaluate it on the test set
model = torch.load('/workspace/src/how-to-read-paper/model/model%s.pth' % (min_index + 1))
model.eval()

test_loss, accuracy = test_model(model, DEVICE, test_loader)
print('Test accuracy:')
print('{}%'.format(accuracy))

7. Analysis of Results

  The figures below show the loss and accuracy curves for runs of 50 and 100 epochs. Using the best model from the 50-epoch run, the test loss is 0.00132 and the accuracy 89.0%; using the best model from the 100-epoch run, the test loss is 0.00203 and the accuracy 91.5%. The curves show that training has largely converged by around epoch 20. Is there a way to train a better model? Certainly: data augmentation, regularization, noise injection, and similar techniques are all worth trying; a sketch of the augmentation idea follows below.
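  As one concrete example of the data-augmentation suggestion (an addition to the original post, reusing the normalize transform defined in section 4), the training transform could be replaced with:

train_aug = transforms.Compose([
    transforms.Resize((70, 70)),
    transforms.RandomCrop((65, 65)),      # random translations
    transforms.RandomHorizontalFlip(),    # horizontal reflections, as in the paper
    transforms.ToTensor(),
    normalize,
])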

  Note: code address for this experiment

(Figure: loss and accuracy curves for the 50-epoch run)

(Figure: loss and accuracy curves for the 100-epoch run)
