摘要:本文將透過實踐案例帶大家掌握CutMix&Mixup。
本文分享自華為雲社群《CutMix&Mixup詳解與程式碼實戰》,作者:李長安。
引言
最近在回顧之前學到的知識,看到了資料增強部分,對於CutMix以及Mixup這兩種資料增強方式發現理解不是很到位,所以這裡寫了一個專案再去好好看這兩種資料增強方式。最開始在目標檢測中,未對資料的標籤部分進行思考,對於影像的處理,大家是可以很好理解的,因為非常直觀,但是透過閱讀相關論文,檢視一些相關的資料發現一些新的有趣的東西。接下來為大家講解一下這兩種資料增強方式。下圖從左至右分別為原圖、mixup、cutout、cutmix。
Mixup離線實現
Mixup相信大家有了很多瞭解,並且大家也能發現網路上有很多大神的解答,所以我這裡就不在進行詳細講解了。
- Mixup核心思想:兩張圖片採用比例混合,label也需要按照比例混合
- 論文關鍵點
- 考慮過三個或者三個以上的標籤做混合,但是效果幾乎和兩個一樣,而且增加了mixup過程的時間。
- 當前的mixup使用了一個單一的loader獲取minibatch,對其隨機打亂後,mixup對同一個minibatch內的資料做混合。這樣的策略和在整個資料集隨機打亂效果是一樣的,而且還減少了IO的開銷。
- 在同種標籤的資料中使用mixup不會造成結果的顯著增強
下面的Cell為Mixup的影像效果展示,具體實現請參考下面的線上實現。
%matplotlib inline import matplotlib.pyplot as plt import matplotlib.image as Image import numpy as np im1 = Image.imread("work/data/10img11.jpg") im1 = im1/255. im2 = Image.imread("work/data/14img01.jpg") im2 = im2/255. for i in range(1,10): lam= i*0.1 im_mixup = (im1*lam+im2*(1-lam)) plt.subplot(3,3,i) plt.imshow(im_mixup) plt.show()
CutMix離線實現
簡單來說cutmix相當於cutout+mixup的結合,可以應用於各種任務中。
mixup相當於是全圖融合,cutout僅僅對圖片進行增強,不改變label,而cutmix則是採用了cutout的區域性融合思想,並且採用了mixup的混合label策略,看起來比較make sense。
- cutmix和mixup的區別是: 其混合位置是採用hard 0-1掩碼,而不是soft操作,相當於新合成的兩張圖是來自兩張圖片的hard結合,而不是Mixup的線性組合。但是其label還是和mixup一樣是線性組合。
下面的程式碼為了消除隨機性,對cut的位置進行了固定,主要是為了展示效果。程式碼更改位置如下所示,註釋的部分為大家通用的實現。
# bbx1 = np.clip(cx - cut_w // 2, 0, W) # bby1 = np.clip(cy - cut_h // 2, 0, H) # bbx2 = np.clip(cx + cut_w // 2, 0, W) # bby2 = np.clip(cy + cut_h // 2, 0, H) bbx1 = 10 bby1 = 600 bbx2 = 10 bby2 = 600 %matplotlib inline import glob import numpy as np import matplotlib.pyplot as plt plt.rcParams['figure.figsize'] = [10,10] import cv2 # Path to data data_folder = f"/home/aistudio/work/data/" # Read filenames in the data folder filenames = glob.glob(f"{data_folder}*.jpg") # Read first 10 filenames image_paths = filenames[:4] image_batch = [] image_batch_labels = [] n_images = 4 print(image_paths) for i in range(4): image = cv2.cvtColor(cv2.imread(image_paths[i]), cv2.COLOR_BGR2RGB) image_batch.append(image) image_batch_labels=np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]) def rand_bbox(size, lamb): W = size[0] H = size[1] cut_rat = np.sqrt(1. - lamb) cut_w = np.int(W * cut_rat) cut_h = np.int(H * cut_rat) # uniform cx = np.random.randint(W) cy = np.random.randint(H) # bbx1 = np.clip(cx - cut_w // 2, 0, W) # bby1 = np.clip(cy - cut_h // 2, 0, H) # bbx2 = np.clip(cx + cut_w // 2, 0, W) # bby2 = np.clip(cy + cut_h // 2, 0, H) bbx1 = 10 bby1 = 600 bbx2 = 10 bby2 = 600 return bbx1, bby1, bbx2, bby2 image = cv2.cvtColor(cv2.imread(image_paths[0]), cv2.COLOR_BGR2RGB) # Crop a random bounding box lamb = 0.3 size = image.shape print('size',size) def generate_cutmix_image(image_batch, image_batch_labels, beta): c=[1,0,3,2] # generate mixed sample lam = np.random.beta(beta, beta) rand_index = np.random.permutation(len(image_batch)) print(f'iamhere{rand_index}') target_a = image_batch_labels target_b = np.array(image_batch_labels)[c] print('img.shape',image_batch[0].shape) bbx1, bby1, bbx2, bby2 = rand_bbox(image_batch[0].shape, lam) print('bbx1',bbx1) print('bby1',bby1) print('bbx2',bbx2) print('bby2',bby2) image_batch_updated = image_batch.copy() image_batch_updated=np.array(image_batch_updated) image_batch=np.array(image_batch) image_batch_updated[:, bbx1:bby1, bbx2:bby2, :] = image_batch[[c], bbx1:bby1, bbx2:bby2, :] # adjust lambda to exactly match pixel ratio lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (image_batch.shape[1] * image_batch.shape[2])) print(f'lam is {lam}') label = target_a * lam + target_b * (1. - lam) return image_batch_updated, label # Generate CutMix image input_image = image_batch[0] image_batch_updated, image_batch_labels_updated = generate_cutmix_image(image_batch, image_batch_labels, 1.0) # Show original images print("Original Images") for i in range(2): for j in range(2): plt.subplot(2,2,2*i+j+1) plt.imshow(image_batch[2*i+j]) plt.show() # Show CutMix images print("CutMix Images") for i in range(2): for j in range(2): plt.subplot(2,2,2*i+j+1) plt.imshow(image_batch_updated[2*i+j]) plt.show() # Print labels print('Original labels:') print(image_batch_labels) print('Updated labels') print(image_batch_labels_updated) ['/home/aistudio/work/data/11img01.jpg', '/home/aistudio/work/data/10img11.jpg', '/home/aistudio/work/data/14img01.jpg', '/home/aistudio/work/data/12img11.jpg'] size (2016, 1512, 3) iamhere[2 1 0 3] img.shape (2016, 1512, 3) bbx1 10 bby1 600 bbx2 10 bby2 600 lam is 1.0 Original Images
CutMix Images
Original labels: [[1 0 0 0] [0 1 0 0] [0 0 1 0] [0 0 0 1]] Updated labels [[1. 0. 0. 0.] [0. 1. 0. 0.] [0. 0. 1. 0.] [0. 0. 0. 1.]]
Mixup&CutMix線上實現
大家需要注意的是,通常我們在實際的使用中都是使用線上的方式進行資料增強,也就是本小節所講的方法,所以大家在實際的使用中可以使用下面的程式碼。mixup實現原理同cutmix相差不多,大家可以根據我下面的的程式碼更改一下即可。
!cd 'data/data97595' && unzip -q nongzuowu.zip from paddle.io import Dataset import cv2 import paddle import random # 匯入所需要的庫 from sklearn.utils import shuffle import os import pandas as pd import numpy as np from PIL import Image import paddle import paddle.nn as nn from paddle.io import Dataset import paddle.vision.transforms as T import paddle.nn.functional as F from paddle.metric import Accuracy import warnings warnings.filterwarnings("ignore") # 讀取資料 train_images = pd.read_csv('data/data97595/nongzuowu/train.csv') # 劃分訓練集和校驗集 all_size = len(train_images) # print(all_size) train_size = int(all_size * 0.8) train_df = train_images[:train_size] val_df = train_images[train_size:] # CutMix 的切塊功能 def rand_bbox(size, lam): if len(size) == 4: W = size[2] H = size[3] elif len(size) == 3: W = size[0] H = size[1] else: raise Exception cut_rat = np.sqrt(1. - lam) cut_w = np.int(W * cut_rat) cut_h = np.int(H * cut_rat) # uniform cx = np.random.randint(W) cy = np.random.randint(H) bbx1 = np.clip(cx - cut_w // 2, 0, W) bby1 = np.clip(cy - cut_h // 2, 0, H) bbx2 = np.clip(cx + cut_w // 2, 0, W) bby2 = np.clip(cy + cut_h // 2, 0, H) return bbx1, bby1, bbx2, bby2 # 定義資料預處理 data_transforms = T.Compose([ T.Resize(size=(256, 256)), T.Transpose(), # HWC -> CHW T.Normalize( mean=[0, 0, 0], # 歸一化 std=[255, 255, 255], to_rgb=True) ]) class JSHDataset(Dataset): def __init__(self, df, transforms, train=False): self.df = df self.transfoms = transforms self.train = train def __getitem__(self, idx): row = self.df.iloc[idx] fn = row.image # 讀取圖片資料 image = cv2.imread(os.path.join('data/data97595/nongzuowu/train', fn)) image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) image = cv2.resize(image, (256, 256), interpolation=cv2.INTER_LINEAR) # 讀取 mask 資料 # masks = cv2.imread(os.path.join(row['mask_path'], fn), cv2.IMREAD_GRAYSCALE)/255 # masks = cv2.resize(masks, (1024, 1024), interpolation=cv2.INTER_LINEAR) # 讀取 label label = paddle.zeros([4]) label[row.label] = 1 # ------------------------------ CutMix ------------------------------------------ prob = 20 # 將 prob 設定為 0 即可關閉 CutMix if random.randint(0, 99) < prob and self.train: rand_index = random.randint(0, len(self.df) - 1) rand_row = self.df.iloc[rand_index] rand_fn = rand_row.image rand_image = cv2.imread(os.path.join('data/data97595/nongzuowu/train', rand_fn)) rand_image = cv2.cvtColor(rand_image, cv2.COLOR_BGR2RGB) rand_image = cv2.resize(rand_image, (256, 256), interpolation=cv2.INTER_LINEAR) # rand_masks = cv2.imread(os.path.join(rand_row['mask_path'], rand_fn), cv2.IMREAD_GRAYSCALE)/255 # rand_masks = cv2.resize(rand_masks, (1024, 1024), interpolation=cv2.INTER_LINEAR) lam = np.random.beta(1,1) bbx1, bby1, bbx2, bby2 = rand_bbox(image.shape, lam) image[bbx1:bbx2, bby1:bby2, :] = rand_image[bbx1:bbx2, bby1:bby2, :] # masks[bbx1:bbx2, bby1:bby2] = rand_masks[bbx1:bbx2, bby1:bby2] lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (image.shape[1] * image.shape[0])) rand_label = paddle.zeros([4]) rand_label[rand_row.label] = 1 label = label * lam + rand_label * (1. - lam) # --------------------------------- CutMix --------------------------------------- # 應用之前我們定義的各種資料增廣 # augmented = self.transforms(image=image, mask=masks) # img, mask = augmented['image'], augmented['mask'] img = image return self.transfoms(img), label def __len__(self): return len(self.df) train_dataset = JSHDataset(train_df, data_transforms, train=True) val_dataset = JSHDataset(val_df, data_transforms) #train_loader train_loader = paddle.io.DataLoader(train_dataset, places=paddle.CPUPlace(), batch_size=8, shuffle=True, num_workers=0) #val_loader val_loader = paddle.io.DataLoader(val_dataset, places=paddle.CPUPlace(), batch_size=8, shuffle=True, num_workers=0) for batch_id, data in enumerate(train_loader()): x_data = data[0] y_data = data[1] print(x_data.dtype) print(y_data) break paddle.float32 Tensor(shape=[8, 4], dtype=float32, place=CUDAPlace(0), stop_gradient=True, [[0. , 0. , 1. , 0. ], [0.54284668, 0.45715332, 0. , 0. ], [0. , 1. , 0. , 0. ], [0. , 0. , 1. , 0. ], [0.32958984, 0. , 0.67041016, 0. ], [0. , 0. , 0. , 1. ], [0. , 0. , 0. , 1. ], [0. , 0. , 0. , 1. ]]) from paddle.vision.models import resnet18 model = resnet18(num_classes=4) # 模型封裝 model = paddle.Model(model) # 定義最佳化器 optim = paddle.optimizer.Adam(learning_rate=3e-4, parameters=model.parameters()) # 配置模型 model.prepare( optim, paddle.nn.CrossEntropyLoss(soft_label=True), Accuracy() ) # 模型訓練與評估 model.fit(train_loader, val_loader, log_freq=1, epochs=2, verbose=1, ) The loss value printed in the log is the current step, and the metric is the average value of previous steps. Epoch 1/2 step 56/56 [==============================] - loss: 1.2033 - acc: 0.5843 - 96ms/step Eval begin... step 14/14 [==============================] - loss: 1.6905 - acc: 0.5625 - 73ms/step Eval samples: 112 Epoch 2/2 step 56/56 [==============================] - loss: 0.5297 - acc: 0.7708 - 82ms/step Eval begin... step 14/14 [==============================] - loss: 0.5764 - acc: 0.7857 - 67ms/step Eval samples: 112
總結
在CutMix中,用另一幅影像的一部分以及第二幅影像的ground truth標記替換該切塊。在影像生成過程中設定每個影像的比例(例如0.4/0.6)。在下面的圖片中,你可以看到CutMix的作者是如何演示這種技術比簡單的MixUp和Cutout效果更好。
ps:神經網路熱力圖生成可以參考我另一個專案。
這兩種資料增強方式能夠很好地代表了目前資料增強的一些方法,比如cutout、mosaic等方法,掌握了這兩種方法,大家也就理解了另外的cutout以及mosaic增強方法。