Preface
As the seminal work of deep learning, AlexNet gave later researchers a great deal of inspiration for applying neural networks to concrete tasks such as classification and regression (prediction). Although many networks have since surpassed it, AlexNet remains important. Its author, Alex Krizhevsky, was among the first to train a neural network across two GTX 580 GPUs, and the model won the 2012 ImageNet competition; this laid an important foundation for the subsequent rise of deep learning, and even NVIDIA's market capitalization surpassing Apple's owes a share to deep learning.
Below we walk through AlexNet's network structure and a reproduction of the paper. The experiment uses AlexNet for a cat-vs-dog classification task and covers model construction, training, testing, and analysis of the results.
1. Network Structure
AlexNet has 8 layers in total: the first 5 are convolutional layers and the remaining 3 are fully connected layers, as described below (a PyTorch sketch of the full architecture follows this list):
Layer 1: convolutional layer 1. The input is a 224 × 224 × 3 image, and there are 96 kernels (48 per GPU in the paper) of size 11 × 11 × 3, with stride = 4 and pad = 0 (no border padding). Note that width = (224 + 2 * padding - kernel_size) / stride + 1 = (224 - 11) / 4 + 1 = 54.25 is not an integer; this is a well-known inconsistency in the paper, and the arithmetic only works out with a 227 × 227 input, giving width = height = (227 - 11) / 4 + 1 = 55, so the output is 55 × 55 × 96. This is followed by Local Response Normalization (LRN) and then max pooling with pool_size = (3, 3), stride = 2, pad = 0, producing the first layer's feature map;
Layer 2: convolutional layer 2. The input is the feature map from layer 1; there are 256 kernels (128 per GPU in the paper) of size 5 × 5 × 48, with pad = 2 and stride = 1, followed by LRN and then max pooling with pool_size = (3, 3), stride = 2;
Layer 3: convolutional layer 3. The input is the output of layer 2; there are 384 kernels with padding = 1. Since this layer connects to the outputs of both GPUs in the paper, its kernel_size is (3 × 3 × 256); there is no LRN or pooling in this layer;
Layer 4: convolutional layer 4. The input is the output of layer 3; there are 384 kernels with kernel_size = (3 × 3 × 192) and padding = 1; as in layer 3, there is no LRN or pooling;
Layer 5: convolutional layer 5. The input is the output of layer 4; there are 256 kernels with kernel_size = (3 × 3 × 192) and padding = 1, followed directly by max pooling with pool_size = (3, 3), stride = 2;
Layers 6, 7, and 8 are fully connected layers with 4096 neurons each, and the final softmax output has 1000 classes because, as mentioned above, the ImageNet competition has 1000 categories. The fully connected layers use ReLU and Dropout.
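For reference, the following is a minimal single-GPU PyTorch sketch of the architecture above. It is a reconstruction under stated assumptions: the two-GPU channel split is merged into full channel counts, a 227 × 227 input is assumed (see the note on layer 1), and the LRN hyperparameters follow the paper.
import torch
from torch import nn

class PaperAlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),             # 227 -> 55
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # 55 -> 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),           # 27 -> 27
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # 27 -> 13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)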
2. Dataset
The dataset consists of cat and dog images: 12,500 cats and 12,500 dogs. The training set contains 12,300 images of each class, while the validation and test sets each contain 100 images per class. Dataset link: https://pan.baidu.com/s/11UHodPIHRDwHiRoae_fqtQ, extraction code: d0fa. The figure below shows sample images from the training set:
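After downloading and organizing the data, a quick count can confirm the split sizes above. A minimal sketch, assuming the directory layout used by the scripts in the following sections:
import os

dataset_root = '/workspace/src/how-to-read-paper/dataset'  # assumed layout
for split in ('train_0', 'valid', 'test'):
    # Count every file under each split directory, including class subfolders
    n = sum(len(files) for _, _, files in os.walk(os.path.join(dataset_root, split)))
    print('{}: {} images'.format(split, n))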
3. Sorting the Dataset
Move the cat and dog images into the class subfolders train_0/0 (cats) and train_0/1 (dogs):
import os
import re
import shutil
origin_path = '/workspace/src/how-to-read-paper/dataset/train'
target_path_0 = '/workspace/src/how-to-read-paper/dataset/train_0/0'
target_path_1 = '/workspace/src/how-to-read-paper/dataset/train_0/1'
os.makedirs(target_path_0, exist_ok=True)
os.makedirs(target_path_1, exist_ok=True)
file_list = os.listdir(origin_path)
for fname in file_list:
    old_path = os.path.join(origin_path, fname)
    # Filenames look like 'cat.0.jpg' or 'dog.0.jpg'; the first word is the label
    label = re.findall(r'\w+', fname)[0]
    if label == 'cat':
        shutil.move(old_path, target_path_0)
    else:
        shutil.move(old_path, target_path_1)
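The 100-per-class validation and test sets described in Section 2 can be carved out of train_0 with a similar script. A sketch under the assumption that valid/ and test/ follow the same 0/1 class layout expected by the loaders in the next section:
import os
import shutil

# Move the last 200 images of each class out of train_0: 100 into valid/,
# 100 into test/. Directory names here are assumptions matching the loaders.
root = '/workspace/src/how-to-read-paper/dataset'
for cls in ('0', '1'):
    files = sorted(os.listdir(os.path.join(root, 'train_0', cls)))
    for split, chunk in (('valid', files[-200:-100]), ('test', files[-100:])):
        target = os.path.join(root, split, cls)
        os.makedirs(target, exist_ok=True)
        for fname in chunk:
            shutil.move(os.path.join(root, 'train_0', cls, fname), target)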
4. Model Construction
Build the model and load the data:
import torch
import os
from torch import nn
from torch.nn import functional as F
import matplotlib.pyplot as plt
from torchvision.datasets import ImageFolder
import torch.optim as optim
import torch.utils.data
from PIL import Image
import torchvision.transforms as transforms
# Hyperparameters
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
EPOCH = 100
BATCH_SIZE = 256
# Convolutional layers, fully connected layers, and the forward pass
class AlexNet(nn.Module):
    def __init__(self, num_classes=2):
        super(AlexNet, self).__init__()
        # Convolutional layers (channel counts follow one GPU's half of the paper)
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),              # 65 -> 55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),         # 55 -> 27
            nn.Conv2d(48, 128, kernel_size=5, padding=2),  # 27 -> 27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),         # 27 -> 13
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),         # 13 -> 6
        )
        # Fully connected layers
        self.classifier = nn.Sequential(
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, num_classes),
        )

    # Forward pass
    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x
# Load the training, test, and validation sets
# Normalization
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
# Training set
path_1 = '/workspace/src/how-to-read-paper/dataset/train_0'
trans_1 = transforms.Compose([
    transforms.Resize((65, 65)),
    transforms.ToTensor(),
    normalize,
])
# Dataset
train_set = ImageFolder(root=path_1, transform=trans_1)
# Data loader
train_loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
# Test set (no shuffling needed for evaluation)
path_2 = '/workspace/src/how-to-read-paper/dataset/test'
trans_2 = transforms.Compose([
    transforms.Resize((65, 65)),
    transforms.ToTensor(),
    normalize,
])
test_data = ImageFolder(root=path_2, transform=trans_2)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)
# Validation set
path_3 = '/workspace/src/how-to-read-paper/dataset/valid'
trans_3 = transforms.Compose([
    transforms.Resize((65, 65)),
    transforms.ToTensor(),
    normalize,
])
valid_data = ImageFolder(root=path_3, transform=trans_3)
valid_loader = torch.utils.data.DataLoader(valid_data, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)
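Since the images are resized to 65 × 65 rather than the paper's input size, it is worth confirming that the flattened feature size really is 6 × 6 × 128 = 4608. A quick dummy-input check (illustrative, not part of the original pipeline):
# Sanity check: trace a dummy batch through the network to confirm that the
# feature extractor yields 128 x 6 x 6 for 65 x 65 inputs, matching the
# Linear(6*6*128, 2048) input size.
_model = AlexNet()
_dummy = torch.randn(1, 3, 65, 65)
print(_model.features(_dummy).shape)  # expected: torch.Size([1, 128, 6, 6])
print(_model(_dummy).shape)           # expected: torch.Size([1, 2])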
5. Training
Train the model:
# Define the model
model = AlexNet().to(DEVICE)
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005)
def train_model(model, device, train_loader, optimizer, epoch):
    train_loss = 0
    model.train()
    for batch_index, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, label)
        loss.backward()
        optimizer.step()
        # With ~96 batches per epoch at BATCH_SIZE = 256, only batch 0 triggers,
        # so train_loss records the first batch's loss of each epoch
        if batch_index % 300 == 0:
            train_loss = loss.item()
            print('Train Epoch:{}\ttrain loss:{:.6f}'.format(epoch, loss.item()))
    return train_loss
def test_model(model, device, test_loader):
    model.eval()
    correct = 0.0
    test_loss = 0.0
    # No gradient tracking is needed during evaluation
    with torch.no_grad():
        for data, label in test_loader:
            data, label = data.to(device), label.to(device)
            output = model(data)
            test_loss += F.cross_entropy(output, label).item()
            pred = output.argmax(dim=1)
            correct += pred.eq(label.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('Test_average_loss:{:.4f}, Accuracy:{:.3f}\n'.format(test_loss, 100 * correct / len(test_loader.dataset)))
    acc = 100 * correct / len(test_loader.dataset)
    return test_loss, acc
# Start training
Train_Loss_list = []
Valid_Loss_list = []
Valid_Accuracy_list = []
for epoch in range(1, EPOCH + 1):
    # Train on the training set
    train_loss = train_model(model, DEVICE, train_loader, optimizer, epoch)
    Train_Loss_list.append(train_loss)
    torch.save(model, r'/workspace/src/how-to-read-paper/model/model%s.pth' % epoch)
    # Evaluate on the validation set
    test_loss, acc = test_model(model, DEVICE, valid_loader)
    Valid_Loss_list.append(test_loss)
    Valid_Accuracy_list.append(acc)
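Saving the entire model object each epoch works, but it pickles the class definition along with the weights. Saving only the state_dict is the more common pattern; a sketch of the alternative (the '_weights.pth' filename is illustrative, not from the original code):
# Inside the loop above, the torch.save line could instead store only the weights:
torch.save(model.state_dict(), '/workspace/src/how-to-read-paper/model/model%s_weights.pth' % epoch)
# Restoring then requires rebuilding the module first:
# best = AlexNet().to(DEVICE)
# best.load_state_dict(torch.load('/workspace/src/how-to-read-paper/model/model%s_weights.pth' % epoch))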
6. Testing
Test the model:
# Pick the epoch with the lowest validation loss
min_num = min(Valid_Loss_list)
min_index = Valid_Loss_list.index(min_num)
print('model%s' % (min_index + 1))
print('Validation accuracy at the best epoch:')
print('{}'.format(Valid_Accuracy_list[min_index]))
# Evaluate the best model on the test set
model = torch.load('/workspace/src/how-to-read-paper/model/model%s.pth' % (min_index + 1))
model.eval()
test_loss, accuracy = test_model(model, DEVICE, test_loader)
print('Test accuracy:')
print('{}%'.format(accuracy))
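To classify a single image with the selected model, a sketch like the following works (the image path is a placeholder assumption; class 0 is cat and class 1 is dog per the folder layout in Section 3):
# Single-image inference; the path below is a placeholder.
img = Image.open('/workspace/src/how-to-read-paper/dataset/test/0/sample.jpg').convert('RGB')
x = trans_2(img).unsqueeze(0).to(DEVICE)  # add a batch dimension
with torch.no_grad():
    pred = model(x).argmax(dim=1).item()
print('cat' if pred == 0 else 'dog')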
7. Analysis of Experimental Results
The figures below show the loss and accuracy curves for 50 and 100 epochs. With the best model at epoch = 50, the test set reaches loss = 0.00132 and acc = 89.0%; with the best model at epoch = 100, it reaches loss = 0.00203 and acc = 91.5%. The curves show the model already trains well by around epoch = 20. Is there a way to train an even better model? Certainly: data augmentation, regularization, noise injection, and similar techniques are all worth trying (a data-augmentation sketch follows below).
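As one concrete example, data augmentation only requires extending the training transform. A minimal sketch (the flip and crop settings are illustrative choices, not from the original experiment):
# Augmented training transform: resize slightly larger, then take a random
# 65 x 65 crop and flip horizontally half the time. Apply to the training
# set only; keep the plain transform for validation and test.
trans_train_aug = transforms.Compose([
    transforms.Resize((70, 70)),
    transforms.RandomCrop((65, 65)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    normalize,
])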
Note: the address of the code for this experiment