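Exploring YOLO v3 Implementation Details – Part 4: Data and y_true

Posted by SpikeKing on 2019-03-04

Original URL: https://flycode.co/archives/276983

YOLO, short for You Only Look Once, is an object detection algorithm based on convolutional neural networks (CNNs). YOLO v3 is the third version in the series (YOLO, YOLO 9000, YOLO v3), and it detects objects more accurately and more robustly than its predecessors.

For more details on YOLO v3, see the official YOLO website.

YOLO is also an American slang expression, You Only Live Once: life is short, so enjoy it while you can.

This series shows how to implement the details of the YOLO v3 algorithm with the Keras framework. This is Part 4, on data and y_true. It covers generating randomized image data and building the ground truth y_true, both of which rely on some neat tricks. Part 5 through Part n will follow; after all, this is meant to be a complete walkthrough :).

GitHub source code for this article: github.com/SpikeKing/k…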

Follow the WeChat official account 深度演算法 (ID: DeepAlgorithm) to learn more about deep learning techniques!

1. fit_generator

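During training, the model calls the fit_generator method, which builds the data batch by batch and feeds it to the model. The generator wrapper is data_generator_wrapper; it validates the input data and then calls data_generator. The input parameters are:

annotation_lines: the annotation lines; each line contains an image path and the position information of its boxes;
batch_size: the batch size, i.e. the number of samples generated per batch;
input_shape: the image input size, e.g. (416, 416);
anchors: the list of anchor boxes, 9 width-height pairs;
num_classes: the number of classes;

data_generator_wrapper checks that the input arguments are valid and only then calls data_generator, which is a common use of a wrapper function.

Implementation: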

# Call site in the training code:
data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes)

def data_generator_wrapper(annotation_lines, batch_size, input_shape, anchors, num_classes):
    n = len(annotation_lines)  # number of annotation lines
    if n == 0 or batch_size <= 0:
        return None
    return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes)

def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
    ...  # body shown in section 2
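
The wrapper pattern can be tried in isolation. This is a minimal sketch, not the repository's code: fake_data_generator is a hypothetical stand-in that yields slices of the annotation list instead of image tensors. It also hints at why the check lives in a separate function: the body of a generator function does not run until the first next() call, so a plain wrapper is the simplest place to reject bad arguments up front.

```python
def fake_data_generator(annotation_lines, batch_size):
    """Hypothetical stand-in for data_generator: yields lists of lines."""
    while True:
        yield annotation_lines[:batch_size]


def fake_data_generator_wrapper(annotation_lines, batch_size):
    """Validate the arguments eagerly, then hand off to the generator."""
    if len(annotation_lines) == 0 or batch_size <= 0:
        return None
    return fake_data_generator(annotation_lines, batch_size)


# Made-up annotation lines, for illustration only.
lines = ["a.jpg 1,2,3,4,0", "b.jpg 5,6,7,8,1", "c.jpg 9,10,11,12,0"]
gen = fake_data_generator_wrapper(lines, batch_size=2)
first_batch = next(gen)
empty = fake_data_generator_wrapper([], batch_size=2)
print(len(first_batch), empty)  # 2 None
```

A caller that receives None would need to treat it as "no data" rather than pass it on to training.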

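2. The data generator

In the generator data_generator, the total number of annotation lines is n. The generator loops forever and emits batches of a fixed size batch_size, each made of image data image_data and box data box_data.

Whenever the index i is 0, the data is shuffled. get_random_data then parses annotation_lines[i] into an image image and its annotation boxes box, which are appended to the lists image_data and box_data respectively.

The index i is then incremented by 1. After a full pass over all n samples, i wraps back to 0 and the data is shuffled again.

Both image_data and box_data are converted to NumPy arrays, with shapes:

image_data: (16, 416, 416, 3)
box_data: (16, 20, 5)  # at most 20 boxes per image

Next, the box data box_data, the input image size input_shape, the anchor box list anchors and the number of classes num_classes are converted into the ground truth y_true, a list with one array per prediction feature map (3 in total):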

[(16, 13, 13, 3, 6), (16, 26, 26, 3, 6), (16, 52, 52, 3, 6)]
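
These shapes follow from the input size, the per-layer strides and the class count. A minimal sketch, assuming the strides 32, 16 and 8 that appear later in this article, 3 anchors per layer, and a single class (which is what the last dimension 6 = 5 + 1 implies); y_true_shapes is a hypothetical helper name:

```python
def y_true_shapes(batch_size, input_shape, num_classes,
                  strides=(32, 16, 8), anchors_per_layer=3):
    """Return the expected shape of each y_true array (hypothetical helper)."""
    h, w = input_shape
    return [
        (batch_size, h // s, w // s, anchors_per_layer, 5 + num_classes)
        for s in strides
    ]


shapes = y_true_shapes(batch_size=16, input_shape=(416, 416), num_classes=1)
print(shapes)
```

The last dimension packs 4 box values, 1 confidence value and num_classes one-hot class values.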

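The final output is the image data image_data, the ground truth y_true, and a per-image loss placeholder np.zeros(batch_size). The while True loop keeps producing batches indefinitely; how many of them are consumed per epoch is set by steps_per_epoch.

Implementation: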

def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
    '''data generator for fit_generator'''
    n = len(annotation_lines)
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            if i == 0:
                np.random.shuffle(annotation_lines)
            image, box = get_random_data(annotation_lines[i], input_shape, random=True)  # get the image and its boxes
            image_data.append(image)  # add the image
            box_data.append(box)  # add the boxes
            i = (i + 1) % n
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)  # ground truth
        yield [image_data] + y_true, np.zeros(batch_size)
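
The index bookkeeping in this loop (shuffle when i is 0, then advance i modulo n) can be checked on plain integers. A minimal sketch that uses the standard random module instead of NumPy; cyclic_batches is a hypothetical name, and the samples are just numbers standing in for annotation lines:

```python
import random


def cyclic_batches(samples, batch_size, rng):
    """Yield batches forever, reshuffling each time the index wraps to 0."""
    samples = list(samples)
    n = len(samples)
    i = 0
    while True:
        batch = []
        for _ in range(batch_size):
            if i == 0:
                rng.shuffle(samples)
            batch.append(samples[i])
            i = (i + 1) % n
        yield batch


gen = cyclic_batches(range(6), batch_size=3, rng=random.Random(0))
epoch = next(gen) + next(gen)  # 2 batches of 3 cover all 6 samples once
print(sorted(epoch))  # [0, 1, 2, 3, 4, 5]
```

When n is not a multiple of batch_size, a batch can straddle the reshuffle, so it may contain the same sample twice.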

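3. Images and annotation boxes

get_random_data separates the image image from its annotation boxes box. Its inputs are:

annotation_line: the image path plus the positions and classes of the boxes;
input_shape: the image size, e.g. (416, 416);
random: the switch that turns random augmentation on or off;

The call and the signature are: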

image, box = get_random_data(annotation_lines[i], input_shape, random=True)

def get_random_data(
        annotation_line, input_shape, random=True,
        max_boxes=20, jitter=.3, hue=.1, sat=1.5,
        val=1.5, proc_img=True):

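Step 1: parse the annotation_line data:

split annotation_line on whitespace into the list line;
read the image image with PIL;
get the width and height of the image, iw and ih;
get the height and width of the input size, h and w;
read the annotation boxes box of the image; each box has 5 values, 4 corner coordinates and 1 class id;

Implementation: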

line = annotation_line.split()
image = Image.open(line[0])
iw, ih = image.size
h, w = input_shape
box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])
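
To make the annotation format concrete, here is the box-parsing step on its own. A minimal sketch with a made-up annotation line (the file name and coordinates are hypothetical); it uses plain lists instead of NumPy and skips the image loading:

```python
annotation_line = "images/dog.jpg 184,299,191,310,0 20,30,120,200,1"

line = annotation_line.split()
image_path = line[0]
# Each remaining field is "xmin,ymin,xmax,ymax,class_id".
boxes = [list(map(int, field.split(","))) for field in line[1:]]

print(image_path)
print(boxes)
```

Because the line is split on whitespace, this format cannot represent image paths that contain spaces.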

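Step 2: the non-random case, i.e. if not random:

The image is resized to fit inside 416×416 while keeping its aspect ratio, and the remaining area is filled with gray, i.e. (128, 128, 128). The color values are also mapped to the 0~1 range by dividing each of them by 255;
The bounding boxes box are scaled by the same factor and then shifted by the padding offsets dx and dy, because the gray padding changes the coordinate system of the boxes. At most max_boxes boxes are kept, i.e. 20.

Implementation: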

scale = min(float(w) / float(iw), float(h) / float(ih))
nw = int(iw * scale)
nh = int(ih * scale)
dx = (w - nw) // 2
dy = (h - nh) // 2
image_data = 0
if proc_img:  # image
    image = image.resize((nw, nh), Image.BICUBIC)
    new_image = Image.new('RGB', (w, h), (128, 128, 128))
    new_image.paste(image, (dx, dy))
    image_data = np.array(new_image) / 255.

# annotation boxes
box_data = np.zeros((max_boxes, 5))
if len(box) > 0:
    np.random.shuffle(box)
    if len(box) > max_boxes: box = box[:max_boxes]  # keep at most 20 boxes
    box[:, [0, 2]] = box[:, [0, 2]] * scale + dx
    box[:, [1, 3]] = box[:, [1, 3]] * scale + dy
    box_data[:len(box)] = box

return image_data, box_data
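
The letterbox arithmetic is easy to verify by hand. A minimal sketch for a hypothetical 640×480 image and one hypothetical box, with the actual image resize left out; letterbox_params and letterbox_box are illustrative names, not functions from the repository:

```python
def letterbox_params(iw, ih, w, h):
    """Scale factor, resized size and padding offsets for a letterbox resize."""
    scale = min(w / iw, h / ih)
    nw, nh = int(iw * scale), int(ih * scale)
    dx, dy = (w - nw) // 2, (h - nh) // 2
    return scale, nw, nh, dx, dy


def letterbox_box(box, scale, dx, dy):
    """Map (xmin, ymin, xmax, ymax, class_id) into the padded image."""
    xmin, ymin, xmax, ymax, cls = box
    return (xmin * scale + dx, ymin * scale + dy,
            xmax * scale + dx, ymax * scale + dy, cls)


scale, nw, nh, dx, dy = letterbox_params(iw=640, ih=480, w=416, h=416)
print(scale, nw, nh, dx, dy)  # 0.65 416 312 0 52
new_box = letterbox_box((100, 200, 300, 400, 0), scale, dx, dy)
print(new_box)
```

Here the width is the limiting side, so the image fills the full 416 pixels horizontally and gets 52 pixels of gray padding above and below; the box is shifted down by the same 52 pixels.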

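Step 3: the random case:

Using the jitter parameter, a random aspect ratio new_ar and a random scale are drawn, which give a new nh and nw. The original image is then resized to nw × nh, which in general does not preserve its aspect ratio.

Implementation: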

new_ar = w / h * rand(1 - jitter, 1 + jitter) / rand(1 - jitter, 1 + jitter)
scale = rand(.25, 2.)
if new_ar < 1:
    nh = int(scale * h)
    nw = int(nh * new_ar)
else:
    nw = int(scale * w)
    nh = int(nw / new_ar)
image = image.resize((nw, nh), Image.BICUBIC)
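
The snippets in this step call a helper rand that is not shown in the article. A minimal sketch, assuming rand(a, b) draws a uniform sample between a and b (with defaults 0 and 1, matching the bare rand() calls used in this step); it relies on the standard random module, and random_size is a hypothetical wrapper around the resize arithmetic:

```python
import random


def rand(a=0.0, b=1.0):
    """Assumed helper: uniform random float between a and b."""
    return random.random() * (b - a) + a


def random_size(w, h, jitter=0.3):
    """Draw a jittered aspect ratio and scale, return the new (nw, nh)."""
    new_ar = w / h * rand(1 - jitter, 1 + jitter) / rand(1 - jitter, 1 + jitter)
    scale = rand(0.25, 2.0)
    if new_ar < 1:
        nh = int(scale * h)
        nw = int(nh * new_ar)
    else:
        nw = int(scale * w)
        nh = int(nw / new_ar)
    return nw, nh


random.seed(0)
sizes = [random_size(416, 416) for _ in range(1000)]
print(min(max(s) for s in sizes), max(max(s) for s in sizes))
```

With scale going up to 2, the resized image can be larger than the 416×416 canvas. In that case w - nw is negative, so the random offset dx is negative as well and the paste crops the image instead of padding it.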

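The resized image is then pasted onto a 416×416 canvas at a random offset, and the rest of the canvas is filled with gray.

Implementation: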

dx = int(rand(0, w - nw))
dy = int(rand(0, h - nh))
new_image = Image.new('RGB', (w, h), (128, 128, 128))
new_image.paste(image, (dx, dy))
image = new_image

Depending on the random value flip, the image is mirrored left to right with FLIP_LEFT_RIGHT.

Implementation:

flip = rand() < .5
if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)

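The colors of the image are then perturbed in HSV space: an offset is added to hue, while sat and val are multiplied by random factors. The image is first converted from RGB to HSV and then back from HSV to RGB, with a few range checks that keep the values within bounds.

Implementation: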

hue = rand(-hue, hue)
sat = rand(1, sat) if rand() < .5 else 1 / rand(1, sat)
val = rand(1, val) if rand() < .5 else 1 / rand(1, val)
x = rgb_to_hsv(np.array(image) / 255.)
x[..., 0] += hue
x[..., 0][x[..., 0] > 1] -= 1
x[..., 0][x[..., 0] < 0] += 1
x[..., 1] *= sat
x[..., 2] *= val
x[x > 1] = 1
x[x < 0] = 0
image_data = hsv_to_rgb(x)  # numpy array, 0 to 1
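
The same color jitter can be tried on a single pixel with the standard library. A minimal sketch that uses colorsys in place of the array-based rgb_to_hsv and hsv_to_rgb, with fixed rather than random hue, sat and val factors so that the result is reproducible; jitter_pixel is a hypothetical name:

```python
import colorsys


def jitter_pixel(rgb, hue, sat, val):
    """Shift hue and scale saturation/value of one RGB pixel in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + hue) % 1.0              # hue is cyclic, so wrap instead of clipping
    s = min(max(s * sat, 0.0), 1.0)  # saturation and value are clipped
    v = min(max(v * val, 0.0), 1.0)
    return colorsys.hsv_to_rgb(h, s, v)


pixel = (0.8, 0.4, 0.2)
same = jitter_pixel(pixel, hue=0.0, sat=1.0, val=1.0)     # unchanged, up to rounding
shifted = jitter_pixel(pixel, hue=0.05, sat=1.5, val=1.5)  # stays inside [0, 1]
print(same)
print(shifted)
```

Hue is wrapped around because it is an angle on the color wheel, whereas saturation and value are clipped, which mirrors the two kinds of range handling in the array version.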

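All of the image transformations are then applied to the annotation boxes as well. A few sanity checks keep the transformed values from becoming too large or too small, and invalid boxes are discarded.

Implementation: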

box_data = np.zeros((max_boxes, 5))
if len(box) > 0:
    np.random.shuffle(box)
    box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx
    box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy
    if flip: box[:, [0, 2]] = w - box[:, [2, 0]]
    box[:, 0:2][box[:, 0:2] < 0] = 0
    box[:, 2][box[:, 2] > w] = w
    box[:, 3][box[:, 3] > h] = h
    box_w = box[:, 2] - box[:, 0]
    box_h = box[:, 3] - box[:, 1]
    box = box[np.logical_and(box_w > 1, box_h > 1)]  # discard invalid box
    if len(box) > max_boxes: box = box[:max_boxes]
    box_data[:len(box)] = box
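
The flip line deserves a closer look: mirroring turns the old xmax into the new xmin, which is why the columns are read in the order [2, 0]. A minimal sketch with NumPy on three hypothetical boxes in a 416×416 image, followed by the same clipping and filtering steps:

```python
import numpy as np

w, h = 416, 416
# Hypothetical boxes: (xmin, ymin, xmax, ymax, class_id)
box = np.array([[10, 20, 110, 220, 0],
                [400, 50, 430, 90, 1],
                [415, 10, 500, 60, 2]], dtype=np.float64)

# Horizontal flip: new xmin = w - old xmax, new xmax = w - old xmin.
box[:, [0, 2]] = w - box[:, [2, 0]]

# Clip to the image, then drop boxes that are at most 1 pixel wide or high.
box[:, 0:2][box[:, 0:2] < 0] = 0
box[:, 2][box[:, 2] > w] = w
box[:, 3][box[:, 3] > h] = h
box_w = box[:, 2] - box[:, 0]
box_h = box[:, 3] - box[:, 1]
box = box[np.logical_and(box_w > 1, box_h > 1)]
print(box)
```

The third box ends up only 1 pixel wide after clipping, so it is discarded; the class ids in the last column are untouched.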

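Finally, the image data image_data and the box data box_data are returned. The 4 values of a box are (xmin, ymin, xmax, ymax); the 5th value is left unchanged and is the class id of the box, e.g. 0~n.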

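4. The ground truth y_true

The inputs of preprocess_true_boxes are:

true_boxes: the ground-truth boxes, with batch size 16, at most 20 boxes per image and 5 values per box (4 corner coordinates and 1 class id), e.g. shape (16, 20, 5);
input_shape: the image size, e.g. (416, 416);
anchors: the list of anchor boxes;
num_classes: the number of classes;

Signature: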

def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):

Check that every class id is smaller than the number of classes, to catch bad data:

assert (true_boxes[..., 4] < num_classes).all(), 'class id must be less than num_classes'

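num_layers is the number of output layers, i.e. the number of anchor boxes divided by 3. anchor_mask is the default anchor mask: anchors 6, 7, 8 for the 1st layer, anchors 3, 4, 5 for the 2nd layer and anchors 0, 1, 2 for the 3rd layer, i.e. in reverse order.

Implementation: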

num_layers = len(anchors) // 3  # default setting
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]
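
The mask simply partitions the 9 anchor indices among the three output layers. A minimal sketch of the lookup, using 0-based layer indices as in the code; the strides are taken from the grid_shapes computation later in this section, and layer_of_anchor is a hypothetical helper:

```python
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
strides = [32, 16, 8]  # layer 0 -> 13x13, layer 1 -> 26x26, layer 2 -> 52x52 at 416 input


def layer_of_anchor(n):
    """Return (layer index, position within that layer's mask) for anchor n."""
    for l, mask in enumerate(anchor_mask):
        if n in mask:
            return l, mask.index(n)
    raise ValueError("anchor index %d is not in any mask" % n)


for n in (0, 4, 8):
    l, k = layer_of_anchor(n)
    print(n, "-> layer", l, "slot", k, "grid", 416 // strides[l])
```

If the anchor list is sorted from small to large, as is usual for YOLO v3 (an assumption here, since the article does not list the anchors), this assigns the largest anchors to the coarsest grid.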

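Compute true_boxes:

true_boxes: the ground-truth boxes, given as the top-left and bottom-right corner coordinates plus 1 class id, e.g. [184, 299, 191, 310, 0.0]; the shape is (16, 20, 5), where 16 is the batch size, 20 is the maximum number of boxes and 5 is the number of values per box;
boxes_xy: xy is the center of the box, shape (16, 20, 2);
boxes_wh: wh is the width and height of the box, shape (16, 20, 2) as well;
input_shape: the input size, 416×416;

true_boxes is then overwritten in place: positions 0 and 1 are set to xy divided by 416 (normalized), and positions 2 and 3 are set to wh divided by 416 (normalized), e.g. [0.449, 0.730, 0.016, 0.026, 0.0].

Implementation: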

true_boxes = np.array(true_boxes, dtype='float32')
input_shape = np.array(input_shape, dtype='int32')
boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
true_boxes[..., 0:2] = boxes_xy / input_shape[::-1]
true_boxes[..., 2:4] = boxes_wh / input_shape[::-1]
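
The example values quoted in this step can be reproduced by hand. A minimal sketch for the single box [184, 299, 191, 310, 0] and a 416×416 input, in plain Python; to_center_format is a hypothetical name, and the floor division in the center computation is taken over from the code as is:

```python
def to_center_format(box, input_shape):
    """Convert (xmin, ymin, xmax, ymax, cls) to normalized (x, y, w, h, cls)."""
    xmin, ymin, xmax, ymax, cls = box
    h, w = input_shape
    cx = (xmin + xmax) // 2  # floor division, as in the article's code
    cy = (ymin + ymax) // 2
    bw = xmax - xmin
    bh = ymax - ymin
    return (cx / w, cy / h, bw / w, bh / h, cls)


x, y, bw, bh, cls = to_center_format((184.0, 299.0, 191.0, 310.0, 0.0), (416, 416))
print("%.4f %.4f %.4f %.4f" % (x, y, bw, bh))
```

input_shape is (height, width), so x and w are divided by the width while y and h are divided by the height; this is the role of input_shape[::-1] in the array version.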

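Set the initial value of y_true:

m is the batch size, 16;
grid_shapes is input_shape divided by the stride of each layer (32, 16, 8), i.e. [[13,13], [26,26], [52,52]];
y_true is a list of all-zero arrays (np.zeros), i.e. [(16,13,13,3,6), (16,26,26,3,6), (16,52,52,3,6)];

Implementation: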

m = true_boxes.shape[0]
grid_shapes = [input_shape // {0: 32, 1: 16, 2: 8}[l] for l in range(num_layers)]
y_true = [np.zeros((m, grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]), 5 + num_classes),
                   dtype='float32') for l in range(num_layers)]

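Set up the anchors:

anchors gets one extra dimension through expand_dims, going from (9,2) to (1,9,2);
anchor_maxes is anchors divided by 2;
anchor_mins is the negative of anchor_maxes, so every anchor is treated as a box centered at the origin;
valid_mask is True wherever the width w in boxes_wh is greater than 0, i.e. wherever a real box exists; its shape is (16,20);

Implementation: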

anchors = np.expand_dims(anchors, 0)
anchor_maxes = anchors / 2.
anchor_mins = -anchor_maxes
valid_mask = boxes_wh[..., 0] > 0

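Loop over m to process each image in the batch together with its annotation boxes:

keep only the wh of boxes that actually exist, e.g. wh has shape (7,2);
np.expand_dims(wh, -2) inserts a new axis at the second-to-last position of wh, i.e. (7,2)->(7,1,2);
box_maxes and box_mins are computed in the same way as anchor_maxes and anchor_mins.

Implementation: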

for b in range(m):
    # Discard zero rows.
    wh = boxes_wh[b, valid_mask[b]]
    if len(wh) == 0: continue
    # Expand dim to apply broadcasting.
    wh = np.expand_dims(wh, -2)
    box_maxes = wh / 2.
    box_mins = -box_maxes
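
The zero-row filtering and the extra axis can be seen on a tiny padded array. A minimal sketch with NumPy, using a hypothetical batch of one image that has 2 real boxes padded to 4 rows:

```python
import numpy as np

# Hypothetical boxes_wh for one image: 2 real boxes, 2 rows of zero padding.
boxes_wh = np.array([[[7., 11.], [120., 100.], [0., 0.], [0., 0.]]])  # (1, 4, 2)

valid_mask = boxes_wh[..., 0] > 0  # (1, 4): True where a box exists
wh = boxes_wh[0, valid_mask[0]]    # (2, 2): padding rows removed
wh = np.expand_dims(wh, -2)        # (2, 1, 2): ready to broadcast against (1, 9, 2)
box_maxes = wh / 2.
box_mins = -box_maxes

print(valid_mask.shape, wh.shape, box_mins.shape)  # (1, 4) (2, 1, 2) (2, 1, 2)
```

The padding rows must be removed first because a box with zero width and height would otherwise produce a 0/0-style IoU against nothing meaningful.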

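Compute the IoU between the annotation boxes box and the anchor boxes. The calculation is quite clever: every box and every anchor is centered at the origin, so only their shapes are compared, and broadcasting evaluates all pairs at once:

box_mins has shape (7,1,2) and anchor_mins has shape (1,9,2), so intersect_mins has shape (7,9,2), i.e. one value for each pairwise combination;
intersect_area has shape (7,9);
box_area has shape (7,1);
anchor_area has shape (1,9);
iou has shape (7,9);

The iou array holds the IoU of every pairing of an anchor box with an annotation box box.

Implementation: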

intersect_mins = np.maximum(box_mins, anchor_mins)
intersect_maxes = np.minimum(box_maxes, anchor_maxes)
intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
box_area = wh[..., 0] * wh[..., 1]
anchor_area = anchors[..., 0] * anchors[..., 1]
iou = intersect_area / (box_area + anchor_area - intersect_area)
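
Put together, the pairwise IoU can be run end to end on a small example. A minimal sketch with NumPy; the anchor values are an assumption (the 9 default YOLO v3 anchors, which this article does not list), and the two box sizes are hypothetical:

```python
import numpy as np

# Assumed anchors (width, height) in pixels: the default YOLO v3 set.
anchors = np.array([[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
                    [59, 119], [116, 90], [156, 198], [373, 326]], dtype=np.float64)
wh = np.array([[7., 11.], [120., 100.]])  # two hypothetical boxes, shape (2, 2)

anchors_b = np.expand_dims(anchors, 0)  # (1, 9, 2)
wh_b = np.expand_dims(wh, -2)           # (2, 1, 2)

# Boxes and anchors are all centered at the origin, so only sizes matter.
intersect_mins = np.maximum(-wh_b / 2., -anchors_b / 2.)      # (2, 9, 2)
intersect_maxes = np.minimum(wh_b / 2., anchors_b / 2.)
intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]  # (2, 9)
box_area = wh_b[..., 0] * wh_b[..., 1]                        # (2, 1)
anchor_area = anchors_b[..., 0] * anchors_b[..., 1]           # (1, 9)
iou = intersect_area / (box_area + anchor_area - intersect_area)

best_anchor = np.argmax(iou, axis=-1)
print(iou.shape, best_anchor)  # (2, 9) [0 6]
```

The small 7×11 box matches the smallest anchor (10, 13), while the 120×100 box matches (116, 90), so under these assumed anchors the two boxes would be assigned to different output layers.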

Next, select the index of the anchor with the largest IoU for each box:

best_anchor = np.argmax(iou, axis=-1)

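Set the values of y_true:

t is the index of the box; n is the index of its best anchor; l is the layer index;
if the best anchor belongs to layer l, the corresponding entries are set; otherwise they keep the default value 0;
true_boxes is (16, 20, 5), i.e. batch, box index, box values;
in true_boxes[b, t, 0], b is the index within the batch and t is the box index; position 0 is x and position 1 is y;
grid_shapes holds the sizes of the 3 detection feature maps; the normalized center is multiplied by the grid width and height and floored, which gives the grid cell indices i and j;
k is the position of the anchor within the anchor mask of the layer;
c is the class id, position 4 of true_boxes;
xy and wh are written into y_true, position 4 of y_true (the box confidence) is set to 1, and position 5 + c (the entry of class c) is set to 1;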

for t, n in enumerate(best_anchor):
    for l in range(num_layers):
        if n in anchor_mask[l]:
            i = np.floor(true_boxes[b, t, 0] * grid_shapes[l][1]).astype('int32')
            j = np.floor(true_boxes[b, t, 1] * grid_shapes[l][0]).astype('int32')
            k = anchor_mask[l].index(n)
            c = true_boxes[b, t, 4].astype('int32')
            y_true[l][b, j, i, k, 0:4] = true_boxes[b, t, 0:4]
            y_true[l][b, j, i, k, 4] = 1
            y_true[l][b, j, i, k, 5 + c] = 1
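
As a worked example of this assignment, take one hypothetical box with corners (110, 160, 230, 260) and class 0 in a 416×416 image, so its center is (170, 210) and its size is 120×100. The sketch assumes that its best anchor has index 6 (which would be the case for the default YOLO v3 anchors, an assumption since the article does not list them), so the box lands in layer 0, the 13×13 grid. A minimal sketch with NumPy for a single image and a single class:

```python
import numpy as np

num_classes = 1
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
grid_shapes = [(13, 13), (26, 26), (52, 52)]
y_true = [np.zeros((1, gh, gw, 3, 5 + num_classes), dtype="float32")
          for gh, gw in grid_shapes]

# Normalized (x, y, w, h, class): center (170, 210), size (120, 100).
true_box = np.array([170 / 416, 210 / 416, 120 / 416, 100 / 416, 0], dtype="float32")
n = 6  # assumed index of the best anchor for this box

for l in range(len(anchor_mask)):
    if n in anchor_mask[l]:
        i = int(np.floor(true_box[0] * grid_shapes[l][1]))  # column, from x
        j = int(np.floor(true_box[1] * grid_shapes[l][0]))  # row, from y
        k = anchor_mask[l].index(n)
        c = int(true_box[4])
        y_true[l][0, j, i, k, 0:4] = true_box[0:4]
        y_true[l][0, j, i, k, 4] = 1
        y_true[l][0, j, i, k, 5 + c] = 1
        print(l, j, i, k, c)  # 0 6 5 0 0
```

Note the index order: the row index j (from y) comes before the column index i (from x), and each box is written into exactly one cell of one layer while the other two layers stay all zero.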

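In y_true, positions 0 and 1 are the center xy in the range (0~1), positions 2 and 3 are the width and height wh in the range (0~1), position 4 is the confidence, 1 or 0, and positions 5~n are the classes, with 1 for the class of the box and 0 everywhere else.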

Appendix 1. Adding arrays of different shapes

NumPy supports adding arrays of different shapes through broadcasting, e.g. (1, 2) + (2, 1) = (2, 2):

import numpy as np

a = np.array([[1, 2]])
print(a.shape)  # (1, 2)
b = np.array([[1], [2]])
print(b.shape)  # (2, 1)
c = a + b
print(c.shape)  # (2, 2)
print(c)
"""
[[2 3]
 [3 4]]
"""


OK, that's all! Enjoy it!

By C. L. Wang @ Meitu (美圖) Vision Technology Department