Deep Learning (2): Cat vs. Dog Classification

Posted by 段小輝 on 2022-03-20

Original URL: https://www.cnblogs.com/xiaohuiduan/p/16032352.html

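Task Objective

Build a deep learning model to classify the cats-and-dogs dataset (from Kaggle); the test-set accuracy must be at least \(75\%\). This article uses three different models, whose test-set results are:

Custom convolutional neural network: \(87.26\%\).

ResNet-34 as a feature extractor: \(93.6\%\).

ResNet-34 and VGG16 as feature extractors: \(94.88\%\).

Python: 3.9.7

torch: 1.11.0 (the model that uses ResNet-34 and VGG16 for feature extraction was run with PyTorch 1.9.1)

Code on GitHub: https://github.com/xiaohuiduan/deeplearning-study/tree/main/貓狗識別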

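Dataset

The data comes from Kaggle's cats-and-dogs dataset: Dogs vs. Cats | Kaggle. The dataset contains two archives, one for training and one for testing. Kaggle does not provide labels for the test set, however, so in this experiment the Kaggle training set is split \(8:2\) into a training set and a validation set, and the validation set is used to evaluate model performance.

In the dataset, the file name encodes the image class: taking the first 3 characters of the file name and checking whether they are "dog" or "cat" is enough to label each image.

Reference code: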

import os

from PIL import Image

root_dir = "./train"
imgs_name = os.listdir(root_dir)

imgs_path = []
labels_data = []

for name in imgs_name:
    # The first 3 characters of the file name give the class.
    if name[:3] == "dog":
        label = 0
    elif name[:3] == "cat":
        label = 1
    else:
        continue  # skip files that are neither dog nor cat images
    img_path = os.path.join(root_dir, name)
    imgs_path.append(img_path)
    labels_data.append(label)
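The \(8:2\) split described above can be sketched as follows. This is a minimal illustration and not necessarily how the original code does it: the helper name `split_train_valid`, the fixed seed, and the toy file names are assumptions introduced here.

```python
import random


def split_train_valid(paths, labels, train_ratio=0.8, seed=0):
    """Shuffle (path, label) pairs and split them into training and validation parts."""
    pairs = list(zip(paths, labels))
    random.Random(seed).shuffle(pairs)  # seeded shuffle so the split is reproducible
    n_train = int(len(pairs) * train_ratio)
    return pairs[:n_train], pairs[n_train:]


# Toy stand-ins for imgs_path / labels_data (hypothetical file names).
toy_paths = [f"./train/dog.{i}.jpg" for i in range(5)] + [f"./train/cat.{i}.jpg" for i in range(5)]
toy_labels = [0] * 5 + [1] * 5

train_pairs, valid_pairs = split_train_valid(toy_paths, toy_labels)
print(len(train_pairs), len(valid_pairs))
```

Shuffling before the cut matters because `os.listdir` gives no guarantee about ordering; without it, one class could end up over-represented in the validation part.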

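Some sample images from the dataset are shown below:

Data Augmentation

To help the model generalize, the images can be processed with the Transforms that ship with PyTorch (torchvision). During training, the images can be randomly transformed, for example by cropping and flipping; none of these operations are needed during validation.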

from torchvision import transforms

# Transforms applied to the training images
my_transforms = transforms.Compose([
    transforms.Resize(75),
    transforms.RandomResizedCrop(64),   # crop a random area, then resize it to 64x64
    transforms.RandomHorizontalFlip(),  # random horizontal flip
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Transforms applied to the validation images (no random augmentation)
valid_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

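In all three models below, the model takes images of shape (3,64,64) as input. The images are also normalized as part of the preprocessing pipeline, using the ImageNet mean and std.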
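`transforms.Normalize` applies `(x - mean) / std` to each channel, after `ToTensor` has scaled the pixel values to [0, 1]. The plain-Python sketch below reproduces that arithmetic for a single pixel; the helper name `normalize_pixel` and the sample pixels are illustrative assumptions, not part of the original code.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (x - mean) / std per channel to one RGB pixel already scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]


# A pixel equal to the channel means maps to 0 in every channel.
mean_pixel = normalize_pixel(IMAGENET_MEAN)
# A pure white pixel ends up a little over 2 standard deviations above the mean.
white_pixel = normalize_pixel([1.0, 1.0, 1.0])
print(mean_pixel, white_pixel)
```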

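Model 1: Custom Network

Model 1 is an ad hoc convolutional neural network. The model diagram generated by Netron is shown below; the network consists of 3 convolutional layers and 2 fully connected layers.

A simplified diagram of the model is shown below:

Reference code: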

import torch.nn as nn
import torch.nn.functional as F


class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        # Each block: 3x3 conv (no padding) -> ReLU -> BatchNorm -> 2x2 max-pool -> Dropout
        self.conv1 = nn.Sequential(  # (3,64,64) -> (32,31,31)
            nn.Conv2d(3, 32, kernel_size=3),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.MaxPool2d(2, 2),
            nn.Dropout(0.25)
        )
        self.conv2 = nn.Sequential(  # (32,31,31) -> (64,14,14)
            nn.Conv2d(32, 64, kernel_size=3),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(2, 2),
            nn.Dropout(0.25)
        )
        self.conv3 = nn.Sequential(  # (64,14,14) -> (128,6,6)
            nn.Conv2d(64, 128, kernel_size=3),
            nn.ReLU(),
            nn.BatchNorm2d(128),
            nn.MaxPool2d(2, 2),
            nn.Dropout(0.25)
        )
        # Two fully connected layers; the output has 2 classes (dog / cat)
        self.fc = nn.Sequential(
            nn.Linear(128 * 6 * 6, 256),
            nn.Dropout(0.2),
            nn.Linear(256, 2),
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 128*6*6)
        x = self.fc(x)
        return F.log_softmax(x, dim=1)  # log-probabilities over the 2 classes
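The `128*6*6` input size of the first fully connected layer follows from the (3,64,64) input: each unpadded 3×3 convolution shrinks the side length by 2, and each 2×2 max-pool halves it, rounding down. A small sketch of that arithmetic, with helper names chosen here purely for illustration:

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output side length of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output side length of a max-pool layer."""
    return (size - kernel) // stride + 1


size = 64
trace = [size]
for _ in range(3):  # three conv + pool blocks
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

flat_features = 128 * size * size  # 128 channels after the third block
print(trace, flat_features)
```

If the input resolution is changed, the same calculation gives the new value to use in the first `nn.Linear`.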

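Model 2: ResNet-34 as a Feature Extractor

A simplified diagram of Model 2's structure is shown below. The ResNet-34 is the model that ships with torchvision; its final fully connected layer is removed (the code below, with `[:-2]`, also drops the average-pooling layer before it) and the remaining convolutional layers are used for feature extraction. The extracted features are then flattened and fed into fully connected layers, which output the prediction.

Reference code: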


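import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

# Use ResNet features
resnet = models.resnet34(pretrained=True)
modules = list(resnet.children())[:-2]        # drop the last two children: avgpool and fc
res_feature = nn.Sequential(*modules).eval()  # inference mode for BatchNorm layers
# eval() alone does not freeze the weights; to keep the ResNet parameters
# unchanged during training, gradients are switched off explicitly.
for param in res_feature.parameters():
    param.requires_grad = False

# Define the network
class MyNet(nn.Module):
    def __init__(self, resnet_feature):
        super(MyNet, self).__init__()
        self.resnet_feature = resnet_feature
        self.fc = nn.Sequential(
            nn.Linear(512 * 2 * 2, 256),  # ResNet-34 maps (3,64,64) to (512,2,2)
            nn.Dropout(0.25),
            nn.Linear(256, 2)
        )

    def forward(self, x):
        x = self.resnet_feature(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 512*2*2)
        x = self.fc(x)
        return F.log_softmax(x, dim=1)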
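The stated intent is that the ResNet parameters stay unchanged during training. The sketch below illustrates one common way to get that behavior, using a small stand-in module rather than ResNet-34 so that it runs without downloading weights. The stand-in backbone, the head, and the SGD settings are all assumptions made for illustration, not the author's training setup.

```python
import torch
import torch.nn as nn

# Small stand-in for a pretrained backbone (not ResNet-34, just for illustration).
backbone = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8)).eval()

# eval() only changes BatchNorm/Dropout behaviour; the weights are still trainable.
trainable_after_eval = all(p.requires_grad for p in backbone.parameters())

# Freezing: switch off gradients for every backbone parameter.
for p in backbone.parameters():
    p.requires_grad_(False)

head = nn.Linear(8, 2)
# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)

x = torch.randn(4, 3, 16, 16)
features = backbone(x).mean(dim=(2, 3))  # average over H and W -> shape (4, 8)
loss = head(features).sum()
loss.backward()

# After backward, only the head has gradients; the backbone has none.
backbone_has_grad = any(p.grad is not None for p in backbone.parameters())
print(trainable_after_eval, backbone_has_grad)
```

Note that calling `.train()` on a parent module puts its submodules back into training mode, so a frozen backbone may need `.eval()` again after that call if its BatchNorm statistics should stay fixed.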

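Model 3: ResNet-34 & VGG16 as Feature Extractors

Compared with Model 2, Model 3 uses two networks for feature extraction, concatenates their output features along the channel dimension, and feeds the concatenated result into fully connected layers to obtain the final prediction.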
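The channel-wise concatenation can be sketched with random tensors standing in for the two feature maps. The shapes are assumptions: (512,2,2) for ResNet-34 matches the `512*2*2` used in Model 2, and the same shape is assumed for VGG16's convolutional part, whose five 2×2 max-pools reduce a 64×64 input to 2×2. The variable names are hypothetical.

```python
import torch

batch = 4
# Random stand-ins for the two backbones' outputs on (3,64,64) inputs.
resnet_features = torch.randn(batch, 512, 2, 2)
vgg_features = torch.randn(batch, 512, 2, 2)

# Concatenate along the channel dimension (dim=1): 512 + 512 = 1024 channels.
fused = torch.cat([resnet_features, vgg_features], dim=1)

# Flatten before the fully connected layers: 1024 * 2 * 2 = 4096 features per image.
flat = fused.view(fused.size(0), -1)
print(fused.shape, flat.shape)
```

Under these assumptions, the first fully connected layer would take `1024*2*2` input features; concatenating along `dim=1` requires both feature maps to share the same batch size and spatial size.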