Reposted from: https://www.cnblogs.com/miraclepbc/p/14385623.html
An intuitive view of image localization
Image localization requires not only recognizing what object an image contains, but also drawing a bounding box around that object to pin down where it is.
The final output is therefore a four-tuple (xmin, ymin, xmax, ymax) describing the position of the box.
Image localization network architecture
The image localization task can be treated as a regression problem: the network regresses the four box coordinates directly.
The dataset
We use the Oxford-IIIT Pet Dataset.
The Oxford-IIIT Pet Dataset is a pet image dataset covering 37 breeds, with roughly 200 images per breed; alongside the class labels it ships with head-region annotations and semantic segmentation masks.
Imports
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import datasets, transforms, models
from torch.utils import data
import os
import shutil
from lxml import etree
from matplotlib.patches import Rectangle
import glob
from PIL import Image
%matplotlib inline
Two packages we haven't used before (a short demo follows the list):
- lxml's etree is a parser for HTML text (used here, in its lenient mode, on the XML annotations)
- matplotlib's Rectangle draws a rectangle on a plot
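A minimal sketch of both, using a made-up XML fragment (the <size> values here are invented for illustration):
from lxml import etree
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt
snippet = '<annotation><size><width>400</width><height>300</height></size></annotation>'
sel = etree.HTML(snippet)                   # lenient parser; tag names come back lowercased
print(sel.xpath('//size/width/text()')[0])  # -> '400'
fig, ax = plt.subplots()
ax.add_patch(Rectangle((0.2, 0.2), 0.5, 0.3, fill = False, color = 'red'))  # (x, y), width, height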
Data preprocessing
Collect the image and annotation paths
images = glob.glob(r'E:\Oxford-IIIT Pets Dataset\dataset\images\*.jpg')
anno = glob.glob(r'E:\Oxford-IIIT Pets Dataset\dataset\annotations\xmls\*.xml')
It turns out that len(images) is larger than len(anno), so we keep only the image paths that have a matching xml file.
Filter the image paths
The idea: first build xml_name, the list of file-name stems that have an xml file, then walk images and keep every path whose stem appears in xml_name (a platform-independent variant is sketched after the code).
xml_name = [x.split('\\')[-1].split('.')[0] for x in anno]
imgs = [x for x in images if x.split('\\')[-1].split('.')[0] in xml_name]
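The split('\\') calls assume Windows path separators. A platform-independent sketch of the same filtering with pathlib (plus a set for O(1) lookups):
from pathlib import Path
xml_stems = {Path(p).stem for p in anno}                 # file names without extension
imgs = [p for p in images if Path(p).stem in xml_stems]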
Read each image's bounding box
The bounding-box values are stored in the xml annotation files.
By parsing an xml file we can therefore extract the four box coordinates along with the image's width and height.
def to_labels(path):
    xml = open(r'{}'.format(path)).read()  # open the xml file; note the raw-string wrapper around the path
    selection = etree.HTML(xml)            # parse it with etree (HTML mode is lenient about tags)
    width = int(selection.xpath('//size/width/text()')[0])  # this xpath/text() extraction pattern is worth remembering
    height = int(selection.xpath('//size/height/text()')[0])
    xmin = int(selection.xpath('//bndbox/xmin/text()')[0])
    xmax = int(selection.xpath('//bndbox/xmax/text()')[0])
    ymin = int(selection.xpath('//bndbox/ymin/text()')[0])
    ymax = int(selection.xpath('//bndbox/ymax/text()')[0])
    return [xmin / width, ymin / height, xmax / width, ymax / height]  # return ratios rather than pixels, since the images will be resized shortly
labels = [to_labels(path) for path in anno]
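A quick sanity check on the first annotation; every returned ratio should fall in [0, 1]:
sample = to_labels(anno[0])
print(sample)
assert all(0 <= v <= 1 for v in sample)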
Train/test split
Defining the dataset
class OxfordDataset(data.Dataset):
    def __init__(self, img_paths, labels, transform):
        self.imgs = img_paths
        self.labels = labels
        self.transforms = transform
    def __getitem__(self, index):
        img = self.imgs[index]
        l1, l2, l3, l4 = self.labels[index]
        pil_img = Image.open(img)
        pil_img = pil_img.convert('RGB')  # some images are grayscale or carry an alpha channel
        data = self.transforms(pil_img)
        return data, l1, l2, l3, l4       # image tensor plus the four box ratios
    def __len__(self):
        return len(self.imgs)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])
Splitting the data
index = np.random.permutation(len(imgs))
all_imgs_path = np.array(imgs)[index]
all_labels = np.array(labels)[index].astype(np.float32)
s = int(len(all_imgs_path) * 0.8)
train_ds = OxfordDataset(all_imgs_path[:s], all_labels[:s], transform)
test_ds = OxfordDataset(all_imgs_path[s:], all_labels[s:], transform)
train_dl = data.DataLoader(train_ds, batch_size = 8, shuffle = True)
test_dl = data.DataLoader(test_ds, batch_size = 8)
Plot one batch of data
img_batch, out1_b, out2_b, out3_b, out4_b = next(iter(train_dl))
plt.figure(figsize = (12, 8))
for i, (img, l1, l2, l3, l4) in enumerate(zip(img_batch[:3], out1_b[:3], out2_b[:3], out3_b[:3], out4_b[:3])):
    img = img.permute(1, 2, 0).numpy()  # move the channel dimension to the last axis for imshow
    plt.subplot(1, 3, i + 1)
    plt.imshow(img)
    xmin, ymin, xmax, ymax = l1 * 224, l2 * 224, l3 * 224, l4 * 224  # positions after resizing: the ratios multiplied by the resized width/height
    rect = Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, fill = False, color = 'red')  # fill controls whether the rectangle's interior is filled
    ax = plt.gca()
    ax.axes.add_patch(rect)  # add the patch to the plot
Defining the model
As the architecture shown at the start of the article suggests, the model consists of a convolutional base plus fully connected layers.
Get the convolutional base
resnet = models.resnet101(pretrained = True)
conv_base = nn.Sequential(*list(resnet.children())[: -1])  # list(resnet.children()) lists the network's layers; * unpacks them as separate arguments
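The fully connected layers below need in_size, the flattened size of the conv base's output. A minimal way to obtain it, assuming the 224×224 inputs used above, is a dummy forward pass (it comes out to 2048 for resnet101):
conv_base.eval()  # eval mode so the dummy pass doesn't touch BatchNorm running stats
with torch.no_grad():
    in_size = conv_base(torch.zeros(1, 3, 224, 224)).view(1, -1).size(1)
print(in_size)  # 2048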
Model definition
The model is made up of one convolutional base followed by four fully connected heads, each of which outputs one coordinate.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_base = nn.Sequential(*list(resnet.children())[: -1])
        self.fc1 = nn.Linear(in_size, 1)
        self.fc2 = nn.Linear(in_size, 1)
        self.fc3 = nn.Linear(in_size, 1)
        self.fc4 = nn.Linear(in_size, 1)
    def forward(self, x):
        x = self.conv_base(x)
        x = x.view(x.size(0), -1)  # flatten before entering the fully connected layers
        x1 = self.fc1(x)
        x2 = self.fc2(x)
        x3 = self.fc3(x)
        x4 = self.fc4(x)
        return x1, x2, x3, x4
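A quick shape check with a dummy batch; each head returns a (batch, 1) tensor:
net = Net()
net.eval()  # eval mode so the dummy pass doesn't touch BatchNorm stats
with torch.no_grad():
    x1, x2, x3, x4 = net(torch.zeros(2, 3, 224, 224))
print(x1.shape)  # torch.Size([2, 1])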
Training the model
model = Net()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
loss_func = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 0.0001)
epochs = 10
exp_lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size = 7, gamma = 0.1)  # decay the learning rate by 10x every 7 epochs
def fit(epoch, model, trainloader, testloader):
    running_loss = 0
    model.train()
    for x, y1, y2, y3, y4 in trainloader:
        x, y1, y2, y3, y4 = x.to(device), y1.to(device), y2.to(device), y3.to(device), y4.to(device)
        y_pred1, y_pred2, y_pred3, y_pred4 = model(x)
        # squeeze the (batch, 1) predictions to match the (batch,) targets;
        # otherwise MSELoss broadcasts the pair to (batch, batch) and the loss is wrong
        loss1 = loss_func(y_pred1.squeeze(1), y1)
        loss2 = loss_func(y_pred2.squeeze(1), y2)
        loss3 = loss_func(y_pred3.squeeze(1), y3)
        loss4 = loss_func(y_pred4.squeeze(1), y4)
        loss = loss1 + loss2 + loss3 + loss4
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            running_loss += loss.item()
    exp_lr_scheduler.step()  # step the scheduler once per epoch
    epoch_loss = running_loss / len(trainloader.dataset)
    test_running_loss = 0
    model.eval()
    with torch.no_grad():
        for x, y1, y2, y3, y4 in testloader:
            x, y1, y2, y3, y4 = x.to(device), y1.to(device), y2.to(device), y3.to(device), y4.to(device)
            y_pred1, y_pred2, y_pred3, y_pred4 = model(x)
            loss1 = loss_func(y_pred1.squeeze(1), y1)
            loss2 = loss_func(y_pred2.squeeze(1), y2)
            loss3 = loss_func(y_pred3.squeeze(1), y3)
            loss4 = loss_func(y_pred4.squeeze(1), y4)
            loss = loss1 + loss2 + loss3 + loss4
            test_running_loss += loss.item()
    epoch_test_loss = test_running_loss / len(testloader.dataset)
    print('epoch: ', epoch,
          'loss: ', round(epoch_loss, 3),
          'test_loss: ', round(epoch_test_loss, 3))
    return epoch_loss, epoch_test_loss
train_loss = []
test_loss = []
for epoch in range(epochs):
    epoch_loss, epoch_test_loss = fit(epoch, model, train_dl, test_dl)
    train_loss.append(epoch_loss)
    test_loss.append(epoch_test_loss)
Note that a regression problem has no accuracy to compute; if you want a quantitative measure of box quality, intersection-over-union (IoU) is the usual choice (a sketch follows).
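Not part of the original post, but a minimal IoU sketch for two boxes in (xmin, ymin, xmax, ymax) form (it works on the normalized ratios too):
def iou(box_a, box_b):
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])  # top-left of the intersection
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])  # bottom-right of the intersection
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
print(iou((0, 0, 1, 1), (0.5, 0, 1.5, 1)))  # 0.333...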
Results
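A minimal way to visualize the results, mirroring the plotting code from earlier: run the trained model on a test batch and draw the predicted boxes.
img_batch, _, _, _, _ = next(iter(test_dl))
model.eval()
with torch.no_grad():
    p1, p2, p3, p4 = model(img_batch.to(device))
plt.figure(figsize = (12, 8))
for i in range(3):
    img = img_batch[i].permute(1, 2, 0).numpy()
    plt.subplot(1, 3, i + 1)
    plt.imshow(img)
    xmin, ymin, xmax, ymax = [t[i].item() * 224 for t in (p1, p2, p3, p4)]  # ratios back to pixel coordinates
    plt.gca().add_patch(Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, fill = False, color = 'red'))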