CutMix & Mixup Explained, with Hands-On Code

Published by Huawei Cloud Developer Alliance on 2023-04-27
Abstract: This article uses practical examples to help you master CutMix and Mixup.

This article is shared from the Huawei Cloud community post "CutMix & Mixup Explained, with Hands-On Code", by 李長安 (Li Chang'an).

Introduction

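I have recently been reviewing things I learned earlier, and when I reached data augmentation I found that my understanding of CutMix and Mixup was not very solid, so I wrote this project to take a closer look at both. When I first used them in object detection, I did not think about what happens to the labels. The image side is easy to understand because it is so intuitive, but after reading the papers and some related material I found a few new and interesting details. Below I walk through these two augmentation methods. The figure below shows, from left to right, the original image, Mixup, Cutout, and CutMix.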

[Figure: original image, Mixup, Cutout, and CutMix, from left to right]

Offline Mixup Implementation

I expect most readers already know a fair amount about Mixup, and there are plenty of excellent explanations online, so I will not go into detail here.

  • Core idea of Mixup: blend two images by a ratio, and blend their labels by the same ratio.
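  • Key points from the paper
  1. The authors considered mixing three or more samples, but the results were almost the same as with two, and the Mixup step took longer.
  2. The current Mixup uses a single loader to fetch a minibatch, shuffles it randomly, and mixes samples within that same minibatch. This works as well as shuffling across the whole dataset and also reduces I/O overhead.
  3. Applying Mixup only between samples that share the same label does not bring a significant improvement.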
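The points above can be sketched as a batch-level function. This is a minimal NumPy sketch, not the author's implementation: the function name `mixup_batch`, the `alpha` default, and the random toy data are assumptions made for illustration. It draws one ratio from a Beta distribution, picks each sample's partner by permuting the same minibatch (key point 2), and applies the same ratio to images and one-hot labels.

```python
import numpy as np

def mixup_batch(images, labels, alpha=1.0, rng=None):
    """Mix each sample with a partner drawn from the same minibatch.

    images: float array of shape (N, H, W, C)
    labels: one-hot (or already soft) labels of shape (N, num_classes)
    alpha:  Beta(alpha, alpha) parameter (hypothetical default)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)         # one mixing ratio for the whole batch
    perm = rng.permutation(len(images))  # partner indices within the batch
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels, lam, perm

# Toy data: 4 random "images" with 4 one-hot classes
rng = np.random.default_rng(0)
images = rng.random((4, 8, 8, 3))
labels = np.eye(4)
mixed_images, mixed_labels, lam, perm = mixup_batch(images, labels, rng=rng)
print(lam, perm)
print(mixed_labels)
```

Each row of `mixed_labels` still sums to 1, so it can be used directly as a soft label.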

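The cell below shows what Mixup looks like on images. For the actual implementation, see the online implementation later in this article.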

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as Image
import numpy as np
im1 = Image.imread("work/data/10img11.jpg")
im1 = im1/255.
im2 = Image.imread("work/data/14img01.jpg")
im2 = im2/255.
for i in range(1,10):
    lam = i*0.1
    im_mixup = (im1*lam+im2*(1-lam))
    plt.subplot(3,3,i)
    plt.imshow(im_mixup)
plt.show()

Offline CutMix Implementation

Put simply, CutMix is a combination of Cutout and Mixup, and it can be applied to a wide range of tasks.

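Mixup blends the whole image. Cutout only augments the image and leaves the label unchanged. CutMix takes the local-patch idea from Cutout and the label-mixing strategy from Mixup, which seems to make sense.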
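For reference, the three operations can be written side by side. The notation is an assumption that follows the usual presentation in the Mixup and CutMix papers rather than this article's code: $x_A, x_B$ are the two images, $y_A, y_B$ their one-hot labels, $M$ a binary mask, $\odot$ element-wise multiplication, and $\lambda \in [0, 1]$ the mixing ratio.

```latex
\begin{align*}
\text{Mixup:}\quad  \tilde{x} &= \lambda x_A + (1-\lambda)\,x_B, & \tilde{y} &= \lambda y_A + (1-\lambda)\,y_B \\
\text{Cutout:}\quad \tilde{x} &= M \odot x_A, & \tilde{y} &= y_A \\
\text{CutMix:}\quad \tilde{x} &= M \odot x_A + (1-M) \odot x_B, & \tilde{y} &= \lambda y_A + (1-\lambda)\,y_B
\end{align*}
```

Here $M$ is 0 inside the cut box and 1 elsewhere. With a box of width $W\sqrt{1-\lambda}$ and height $H\sqrt{1-\lambda}$, the box covers a fraction $1-\lambda$ of the image before clipping at the borders.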

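  • The difference between CutMix and Mixup: CutMix mixes by position with a hard 0-1 mask rather than a soft blend, so the synthesized image is a hard combination of the two source images instead of Mixup's linear combination. Its label, however, is still a linear combination, just as in Mixup.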
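The hard-mask idea can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the code used in this article: the name `cutmix_batch`, the `(N, H, W, C)` float layout, and the toy data are hypothetical. The mask is 1 where the original pixel is kept and 0 inside the box, and the label ratio is recomputed from the box area after clipping.

```python
import numpy as np

def cutmix_batch(images, labels, lam, rng=None):
    """Paste one random box from a partner image into every image in the batch."""
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    perm = rng.permutation(n)                        # partner for each image
    cut_ratio = np.sqrt(1.0 - lam)                   # box side as a fraction of the image side
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = rng.integers(h), rng.integers(w)        # box centre, sampled uniformly
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    mask = np.ones((h, w, 1), dtype=images.dtype)    # 1 keeps the original pixel
    mask[y1:y2, x1:x2] = 0                           # 0 takes the partner's pixel
    mixed = mask * images + (1 - mask) * images[perm]
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)  # exact kept-pixel ratio after clipping
    mixed_labels = lam_adj * labels + (1.0 - lam_adj) * labels[perm]
    return mixed, mixed_labels, lam_adj, mask, perm

# Toy data: 4 random "images" with 4 one-hot classes
rng = np.random.default_rng(0)
images = rng.random((4, 32, 32, 3))
labels = np.eye(4)
mixed, mixed_labels, lam_adj, mask, perm = cutmix_batch(images, labels, lam=0.7, rng=rng)
print(lam_adj)
print(mixed_labels)
```

Every output pixel comes unchanged from exactly one of the two source images, while the labels are still blended linearly by `lam_adj`.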

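To remove randomness, the code below fixes the cut position, mainly to show the effect. The change is shown here; the commented-out lines are the commonly used implementation.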

    # bbx1 = np.clip(cx - cut_w // 2, 0, W)
    # bby1 = np.clip(cy - cut_h // 2, 0, H)
    # bbx2 = np.clip(cx + cut_w // 2, 0, W)
    # bby2 = np.clip(cy + cut_h // 2, 0, H)
    bbx1 = 10
    bby1 = 600
    bbx2 = 10
    bby2 = 600
%matplotlib inline
import glob
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [10,10]
import cv2
# Path to data
data_folder = "/home/aistudio/work/data/"
# Read filenames in the data folder
filenames = glob.glob(f"{data_folder}*.jpg")
# Use the first 4 filenames
n_images = 4
image_paths = filenames[:n_images]
image_batch = []
print(image_paths)
for i in range(n_images):
    # OpenCV loads images as BGR; convert to RGB for matplotlib
    image = cv2.cvtColor(cv2.imread(image_paths[i]), cv2.COLOR_BGR2RGB)
    image_batch.append(image)
# One-hot labels: each of the 4 images is treated as its own class
image_batch_labels = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]])
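def rand_bbox(size, lamb):
    # size is image.shape, i.e. (height, width, channels), so "W" here is
    # the first axis and "H" the second
    W = size[0]
    H = size[1]
    cut_rat = np.sqrt(1. - lamb)
    cut_w = int(W * cut_rat)  # np.int was removed in NumPy 1.24; use int
    cut_h = int(H * cut_rat)
    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)
    # bbx1 = np.clip(cx - cut_w // 2, 0, W)
    # bby1 = np.clip(cy - cut_h // 2, 0, H)
    # bbx2 = np.clip(cx + cut_w // 2, 0, W)
    # bby2 = np.clip(cy + cut_h // 2, 0, H)
    # Fixed box for the demo. The caller slices bbx1:bby1 and bbx2:bby2,
    # so the pasted region is rows 10:600, columns 10:600. With these
    # values (bbx2 - bbx1) and (bby2 - bby1) are both 0, so the lam that
    # the caller recomputes is 1.0 and the labels stay unchanged.
    bbx1 = 10
    bby1 = 600
    bbx2 = 10
    bby2 = 600
    return bbx1, bby1, bbx2, bby2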
image = cv2.cvtColor(cv2.imread(image_paths[0]), cv2.COLOR_BGR2RGB)
# Crop a random bounding box
lamb = 0.3
size = image.shape
print('size',size)
def generate_cutmix_image(image_batch, image_batch_labels, beta):
    # Fixed partner for each image (0<->1, 2<->3) so the demo is repeatable
    c = [1,0,3,2]
    # generate mixed sample
    lam = np.random.beta(beta, beta)
    # A random permutation would normally pick the partners; here it is only printed
    rand_index = np.random.permutation(len(image_batch))
    print(f'iamhere{rand_index}')
    target_a = image_batch_labels
    target_b = np.array(image_batch_labels)[c]
    print('img.shape',image_batch[0].shape)
    bbx1, bby1, bbx2, bby2 = rand_bbox(image_batch[0].shape, lam)
    print('bbx1',bbx1)
    print('bby1',bby1)
    print('bbx2',bbx2)
    print('bby2',bby2)
    # Stack the list into one (N, H, W, C) array; all images must share a shape
    image_batch = np.array(image_batch)
    image_batch_updated = image_batch.copy()
    # Paste the box from each partner image (rows bbx1:bby1, columns bbx2:bby2)
    image_batch_updated[:, bbx1:bby1, bbx2:bby2, :] = image_batch[c, bbx1:bby1, bbx2:bby2, :]
    # adjust lambda to exactly match pixel ratio
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (image_batch.shape[1] * image_batch.shape[2]))
    print(f'lam is {lam}')
    label = target_a * lam + target_b * (1. - lam)
    return image_batch_updated, label
# Generate CutMix image
input_image = image_batch[0]
image_batch_updated, image_batch_labels_updated = generate_cutmix_image(image_batch, image_batch_labels, 1.0)
# Show original images
print("Original Images")
for i in range(2):
    for j in range(2):
        plt.subplot(2,2,2*i+j+1)
        plt.imshow(image_batch[2*i+j])
plt.show()
# Show CutMix images
print("CutMix Images")
for i in range(2):
    for j in range(2):
        plt.subplot(2,2,2*i+j+1)
        plt.imshow(image_batch_updated[2*i+j])
plt.show()
# Print labels
print('Original labels:')
print(image_batch_labels)
print('Updated labels')
print(image_batch_labels_updated)
['/home/aistudio/work/data/11img01.jpg', '/home/aistudio/work/data/10img11.jpg', '/home/aistudio/work/data/14img01.jpg', '/home/aistudio/work/data/12img11.jpg']
size (2016, 1512, 3)
iamhere[2 1 0 3]
img.shape (2016, 1512, 3)
bbx1 10
bby1 600
bbx2 10
bby2 600
lam is 1.0
Original Images
CutMix Images
Original labels:
[[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]
Updated labels
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
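The printed `lam is 1.0` and the unchanged labels follow from how the fixed box is written, which a short calculation makes visible. The helper `box_lambda` below is a hypothetical name; it only repeats the area formula from `generate_cutmix_image`, applied to the 2016 x 1512 image size shown in the output. The second call is an assumption about what the box would give if the same 590 x 590 region were written as (start, start, end, end).

```python
def box_lambda(x1, y1, x2, y2, height, width):
    """Fraction of the image that is kept after pasting the box (x1, y1, x2, y2)."""
    return 1 - ((x2 - x1) * (y2 - y1)) / (height * width)

height, width = 2016, 1512  # image size from the printed output

# Box exactly as assigned in the demo: bbx1=10, bby1=600, bbx2=10, bby2=600
lam_demo = box_lambda(10, 600, 10, 600, height, width)

# The same 590 x 590 region written with consistent start/end corners
lam_consistent = box_lambda(10, 10, 600, 600, height, width)

print(lam_demo)        # 1.0, because both differences are zero
print(lam_consistent)  # roughly 0.886
```

With the consistent box, about 11% of each label would come from the partner image instead of none.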

Online Mixup & CutMix Implementation

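Note that in practice data augmentation is usually applied online, which is the approach covered in this section, so the code below is what you would use in a real project. Mixup works in much the same way as CutMix, so you can adapt the code below with a few changes.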

!cd 'data/data97595' && unzip -q nongzuowu.zip
# Import the required libraries
import os
import random
import warnings
import cv2
import numpy as np
import pandas as pd
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import paddle.vision.transforms as T
from paddle.io import Dataset
from paddle.metric import Accuracy
from PIL import Image
from sklearn.utils import shuffle
warnings.filterwarnings("ignore")
# Read the data
train_images = pd.read_csv('data/data97595/nongzuowu/train.csv')
# Split into training and validation sets
all_size = len(train_images)
# print(all_size)
train_size = int(all_size * 0.8)
train_df = train_images[:train_size]
val_df = train_images[train_size:]
# Box sampling for CutMix
def rand_bbox(size, lam):
    if len(size) == 4:
        W = size[2]
        H = size[3]
    elif len(size) == 3:
        W = size[0]
        H = size[1]
    else:
        raise ValueError(f"expected a 3D or 4D shape, got {size}")
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)  # np.int was removed in NumPy 1.24; use int
    cut_h = int(H * cut_rat)
    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2
# Define the data preprocessing
data_transforms = T.Compose([
    T.Resize(size=(256, 256)),
    T.Transpose(), # HWC -> CHW
    T.Normalize(
        mean=[0, 0, 0], # scale pixel values to [0, 1]
        std=[255, 255, 255],
        to_rgb=True)
])
class JSHDataset(Dataset):
    def __init__(self, df, transforms, train=False):
        self.df = df
        self.transforms = transforms
        self.train = train
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        fn = row.image
        # Read the image
        image = cv2.imread(os.path.join('data/data97595/nongzuowu/train', fn))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (256, 256), interpolation=cv2.INTER_LINEAR)
        # Read the mask
        # masks = cv2.imread(os.path.join(row['mask_path'], fn), cv2.IMREAD_GRAYSCALE)/255
        # masks = cv2.resize(masks, (1024, 1024), interpolation=cv2.INTER_LINEAR)
        # Build the one-hot label
        label = paddle.zeros([4])
        label[row.label] = 1
        # ------------------------------  CutMix  ------------------------------------------
        prob = 20 # chance in percent of applying CutMix; set prob to 0 to turn it off
        if random.randint(0, 99) < prob and self.train:
            # Pick a random partner image from the same dataframe
            rand_index = random.randint(0, len(self.df) - 1)
            rand_row = self.df.iloc[rand_index]
            rand_fn = rand_row.image
            rand_image = cv2.imread(os.path.join('data/data97595/nongzuowu/train', rand_fn))
            rand_image = cv2.cvtColor(rand_image, cv2.COLOR_BGR2RGB)
            rand_image = cv2.resize(rand_image, (256, 256), interpolation=cv2.INTER_LINEAR)
            # rand_masks = cv2.imread(os.path.join(rand_row['mask_path'], rand_fn), cv2.IMREAD_GRAYSCALE)/255
            # rand_masks = cv2.resize(rand_masks, (1024, 1024), interpolation=cv2.INTER_LINEAR)
            lam = np.random.beta(1,1)
            bbx1, bby1, bbx2, bby2 = rand_bbox(image.shape, lam)
            # Paste the partner's box into the current image
            image[bbx1:bbx2, bby1:bby2, :] = rand_image[bbx1:bbx2, bby1:bby2, :]
            # masks[bbx1:bbx2, bby1:bby2] = rand_masks[bbx1:bbx2, bby1:bby2]
            # Recompute lam from the actual box area after clipping
            lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (image.shape[1] * image.shape[0]))
            rand_label = paddle.zeros([4])
            rand_label[rand_row.label] = 1
            label = label * lam + rand_label * (1. - lam)
        # ---------------------------------  CutMix  ---------------------------------------
        # Apply the data augmentations defined earlier
        # augmented = self.transforms(image=image, mask=masks)
        # img, mask = augmented['image'], augmented['mask']
        img = image
        return self.transforms(img), label
    def __len__(self):
        return len(self.df)
train_dataset = JSHDataset(train_df, data_transforms, train=True)
val_dataset = JSHDataset(val_df, data_transforms)
#train_loader
train_loader = paddle.io.DataLoader(train_dataset, places=paddle.CPUPlace(), batch_size=8, shuffle=True, num_workers=0)
#val_loader
val_loader = paddle.io.DataLoader(val_dataset, places=paddle.CPUPlace(), batch_size=8, shuffle=True, num_workers=0)
for batch_id, data in enumerate(train_loader()):
    x_data = data[0]
    y_data = data[1]
    print(x_data.dtype)
    print(y_data)
    break
paddle.float32
Tensor(shape=[8, 4], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
 [[0. , 0. , 1. , 0. ],
 [0.54284668, 0.45715332, 0. , 0. ],
 [0. , 1. , 0. , 0. ],
 [0. , 0. , 1. , 0. ],
 [0.32958984, 0. , 0.67041016, 0. ],
 [0. , 0. , 0. , 1. ],
 [0. , 0. , 0. , 1. ],
 [0. , 0. , 0. , 1. ]])
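Since the article notes that Mixup can be obtained by adapting the CutMix code, here is one way the per-sample step might look. This is a sketch under assumptions, not the author's code: the helper name `mixup_pair`, the `alpha` default, and the use of NumPy arrays for the labels (the dataset above uses Paddle tensors) are choices made to keep the example self-contained. It would stand in for the box-pasting lines in the CutMix branch of `__getitem__`.

```python
import numpy as np

def mixup_pair(image, label, rand_image, rand_label, alpha=1.0, rng=None):
    """Blend two same-sized uint8 HWC images and their one-hot labels."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    mixed = lam * image.astype(np.float32) + (1.0 - lam) * rand_image.astype(np.float32)
    # Round back to uint8 so the later transforms see the same dtype as before
    mixed = np.clip(np.rint(mixed), 0, 255).astype(np.uint8)
    mixed_label = lam * label + (1.0 - lam) * rand_label
    return mixed, mixed_label, lam

# Toy inputs: two flat 256 x 256 RGB images from class 0 and class 2
image = np.full((256, 256, 3), 200, dtype=np.uint8)
rand_image = np.full((256, 256, 3), 50, dtype=np.uint8)
label = np.eye(4)[0]
rand_label = np.eye(4)[2]
mixed, mixed_label, lam = mixup_pair(image, label, rand_image, rand_label,
                                     rng=np.random.default_rng(0))
print(lam, mixed_label)
```

Unlike CutMix, no box is sampled and `lam` needs no area correction, because the ratio applies to every pixel.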
from paddle.vision.models import resnet18
model = resnet18(num_classes=4)
# Wrap the model with the high-level API
model = paddle.Model(model)
# Define the optimizer
optim = paddle.optimizer.Adam(learning_rate=3e-4, parameters=model.parameters())
# Configure the model; soft_label=True because CutMix produces mixed labels
model.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(soft_label=True),
    Accuracy()
    )
# Train and evaluate the model
model.fit(train_loader,
        val_loader,
        log_freq=1,
        epochs=2,
        verbose=1,
        )
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/2
step 56/56 [==============================] - loss: 1.2033 - acc: 0.5843 - 96ms/step        
Eval begin...
step 14/14 [==============================] - loss: 1.6905 - acc: 0.5625 - 73ms/step         
Eval samples: 112
Epoch 2/2
step 56/56 [==============================] - loss: 0.5297 - acc: 0.7708 - 82ms/step        
Eval begin...
step 14/14 [==============================] - loss: 0.5764 - acc: 0.7857 - 67ms/step        
Eval samples: 112
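The training above passes `soft_label=True` so that the loss accepts the mixed labels. A small NumPy check shows why this is equivalent to weighting two ordinary cross-entropy terms by the mixing ratio. The functions below are a hand-written sketch of soft-label cross-entropy, not Paddle's implementation, and the logits are made-up values; the ratio is taken from the second row of the label tensor printed earlier.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row maximum first for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def soft_cross_entropy(logits, soft_labels):
    # Cross-entropy against a probability vector instead of a class index
    return -(soft_labels * log_softmax(logits)).sum(axis=-1)

logits = np.array([[2.0, 0.5, -1.0, 0.0]])  # made-up network output
y_a = np.array([[1.0, 0.0, 0.0, 0.0]])      # label of the original image
y_b = np.array([[0.0, 1.0, 0.0, 0.0]])      # label of the pasted image
lam = 0.54284668                            # ratio from the printed label tensor

mixed_label = lam * y_a + (1.0 - lam) * y_b
loss_soft = soft_cross_entropy(logits, mixed_label)
loss_split = (lam * soft_cross_entropy(logits, y_a)
              + (1.0 - lam) * soft_cross_entropy(logits, y_b))
print(loss_soft, loss_split)
```

Both values agree because cross-entropy is linear in the label, which is why mixing the labels once in the dataset is enough.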

Summary

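In CutMix, a patch of one image is replaced with the corresponding patch from a second image, and the ground-truth label is mixed with the second image's label accordingly. The share of each image (for example 0.4/0.6) is set while the image is generated. In the figure below, you can see how the CutMix authors demonstrate that this technique works better than plain Mixup and Cutout.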

P.S. For generating neural network heatmaps, see another project of mine.

[Figure: comparison of Mixup, Cutout, and CutMix from the CutMix authors]

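These two methods are good representatives of current data augmentation techniques such as Cutout and Mosaic. Once you have mastered them, Cutout and Mosaic will be easy to understand as well.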

 
