Project Overview

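This section explains the relevant code. We implement the Region Proposal Network (RPN) of Faster R-CNN with five files:

utils.py: drawing and computing detection boxes;

demo.py: tests the functions in utils.py;

rpn.py: builds the RPN network;

train.py: trains the RPN network;

test.py: tests the RPN network.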

Functions in utils.py

1. wandhG

wandhG holds the widths and heights of the 9 anchor boxes (the result of clustering with the k-means algorithm).

```python
import cv2
import numpy as np
import tensorflow as tf

wandhG = np.array([[ 74., 149.],
                   [ 34., 149.],
                   [ 86.,  74.],
                   [109., 132.],
                   [172., 183.],
                   [103., 229.],
                   [149.,  91.],
                   [ 51., 132.],
                   [ 57., 200.]], dtype=np.float32)
```
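The text only says that wandhG comes from k-means and does not name the distance measure. Below is a minimal sketch of how such anchor sizes could be clustered. It assumes the commonly used distance d = 1 - IoU on (width, height) pairs. The names wh_iou and kmeans_anchors are hypothetical, and the toy data is made up for illustration.

```python
import numpy as np

def wh_iou(wh, centroids):
    """IoU between boxes and centroids with aligned top-left corners, so only (w, h) matter."""
    inter = (np.minimum(wh[:, None, 0], centroids[None, :, 0])
             * np.minimum(wh[:, None, 1], centroids[None, :, 1]))  # (n, k)
    area_boxes = (wh[:, 0] * wh[:, 1])[:, None]                     # (n, 1)
    area_centroids = (centroids[:, 0] * centroids[:, 1])[None, :]   # (1, k)
    return inter / (area_boxes + area_centroids - inter)

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Hypothetical k-means on (w, h) pairs using the distance d = 1 - IoU."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # The smallest 1 - IoU is the same as the largest IoU
        assign = np.argmax(wh_iou(wh, centroids), axis=1)
        new_centroids = np.array([
            wh[assign == c].mean(axis=0) if np.any(assign == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids

# Toy (w, h) pairs forming two obvious groups
wh = np.array([[10, 20], [11, 21], [9, 19],
               [100, 200], [101, 199], [99, 201]], dtype=np.float64)
anchors = kmeans_anchors(wh, k=2)
anchors = anchors[np.argsort(anchors[:, 0])]  # sort by width for readability
```

With a real dataset, wh would hold the (w, h) of every ground-truth box and k would be 9.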

2. load_gt_boxes

```python
def load_gt_boxes(path):
    # Skip the first (header) line of the label file
    with open(path) as f:
        bbs = f.readlines()[1:]
    roi = np.zeros([len(bbs), 4])
    for iter_, bb in enumerate(bbs):
        bb = bb.replace('\n', '').split(' ')
        bbtype = bb[0]
        bba = np.array([float(bb[i]) for i in range(1, 5)])  # x, y, w, h
        # Note: `ignore` is computed but never used below, so every box is returned
        ignore = int(bb[10])
        ignore = ignore or (bbtype != 'person')
        ignore = ignore or (bba[3] < 40)
        bba[2] += bba[0]  # w -> xmax
        bba[3] += bba[1]  # h -> ymax
        roi[iter_, :4] = bba
    return roi
```

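load_gt_boxes() returns an array of shape (-1, 4). Each row holds the top-left and bottom-right corners of the ground-truth box of one object in the image.

Its input is the path of a .txt label file that stores each ground-truth box as (x, y, w, h). Here x and y are the horizontal and vertical coordinates of the box's top-left corner, w is its width, and h is its height.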
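The sketch below parses an in-memory label file the same way load_gt_boxes does, to make the (x, y, w, h) to (xmin, ymin, xmax, ymax) conversion concrete. The header line and the 12-field layout (class, x, y, w, h, then further fields with an ignore flag at index 10) are assumptions chosen to match the indices load_gt_boxes reads. parse_gt_lines is a hypothetical helper.

```python
import io
import numpy as np

# Hypothetical label file: one header line, then "class x y w h ..." per object,
# with at least 11 fields so that index 10 (the ignore flag) exists.
SAMPLE_LABEL = """% header line (skipped)
person 100 50 40 120 0 0 0 0 0 0 0
person 300 200 60 80 0 0 0 0 0 0 0
"""

def parse_gt_lines(lines):
    """Return an (N, 4) array of (xmin, ymin, xmax, ymax) from (x, y, w, h) label lines."""
    rows = []
    for line in lines[1:]:                 # skip the header, as load_gt_boxes does
        fields = line.split()
        x, y, w, h = (float(v) for v in fields[1:5])
        rows.append([x, y, x + w, y + h])  # width/height -> bottom-right corner
    return np.array(rows, dtype=np.float64).reshape(-1, 4)

boxes = parse_gt_lines(io.StringIO(SAMPLE_LABEL).readlines())
print(boxes.tolist())  # [[100.0, 50.0, 140.0, 170.0], [300.0, 200.0, 360.0, 280.0]]
```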

3. plot_boxes_on_image

```python
def plot_boxes_on_image(show_image_with_boxes, boxes, color=(0, 0, 255), thickness=2):
    # Draw each (xmin, ymin, xmax, ymax) box in place; color is BGR, so (0, 0, 255) is red
    for box in boxes:
        cv2.rectangle(show_image_with_boxes,
                      pt1=(int(box[0]), int(box[1])),
                      pt2=(int(box[2]), int(box[3])), color=color, thickness=thickness)
    # Return an RGB copy for display with PIL; the input array itself stays BGR
    show_image_with_boxes = cv2.cvtColor(show_image_with_boxes, cv2.COLOR_BGR2RGB)
    return show_image_with_boxes
```

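plot_boxes_on_image() takes two main inputs: the original image to draw on, and the boxes given by their top-left and bottom-right corners. The color (BGR) and line thickness are optional. It draws the boxes onto the input image in place and returns an RGB copy of the image with the boxes drawn.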
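Two details are easy to trip over here. First, OpenCV points are (x, y), while NumPy arrays are indexed [row, column] = [y, x]. Second, OpenCV stores channels in BGR order. The NumPy-only sketch below draws a 1-pixel outline. It also shows that, for a 3-channel image, converting BGR to RGB amounts to reversing the channel axis. draw_box_outline is a hypothetical stand-in, not OpenCV's implementation.

```python
import numpy as np

def draw_box_outline(image, box, color=(0, 0, 255), thickness=1):
    """Draw an axis-aligned rectangle outline in place (NumPy-only sketch, not cv2.rectangle)."""
    x1, y1, x2, y2 = (int(v) for v in box)
    image[y1:y1 + thickness, x1:x2 + 1] = color           # top edge (rows are y)
    image[y2 - thickness + 1:y2 + 1, x1:x2 + 1] = color   # bottom edge
    image[y1:y2 + 1, x1:x1 + thickness] = color           # left edge (columns are x)
    image[y1:y2 + 1, x2 - thickness + 1:x2 + 1] = color   # right edge
    return image

bgr = np.zeros((10, 10, 3), dtype=np.uint8)
draw_box_outline(bgr, [2, 3, 7, 8])  # box (xmin=2, ymin=3, xmax=7, ymax=8), red in BGR
rgb = bgr[..., ::-1]                 # channel reversal, what COLOR_BGR2RGB does for 3 channels
```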

4. compute_iou

```python
def compute_iou(boxes1, boxes2):
    # Boxes are (xmin, ymin, xmax, ymax); leading dimensions broadcast against each other
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_wh = np.maximum(right_down - left_up, 0.0)  # width and height of the intersection
    inter_area = inter_wh[..., 0] * inter_wh[..., 1]  # intersection area
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])  # area of boxes1 (e.g. the anchor)
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])  # area of boxes2 (e.g. the ground-truth boxes)
    union_area = boxes1_area + boxes2_area - inter_area  # union area
    ious = inter_area / union_area
    return ious
```

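compute_iou() computes the IoU (intersection over union). The IoU is the area of the intersection of a ground-truth box and a predicted box (or, in general, any two boxes) divided by the area of their union. The larger the value, the closer the predicted box is to the ground-truth box. This is illustrated in the figure below:

Consider two overlapping boxes where box 2 lies to the lower right of box 1. Then left_up = [xmin2, ymin2] and right_down = [xmax1, ymax1]. From these we get the intersection and union areas of the two boxes, and hence their IoU. If the IoU is greater than the positive threshold, we call the anchor a positive anchor: it contains the target. If the IoU is less than the negative threshold, we call it a negative anchor: it contains background.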
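Here is a quick numeric check of the IoU computation. The snippet uses a copy of compute_iou so that it runs on its own. The boxes are made up for illustration. The thresholds 0.5 and 0.1 are the pos_thresh and neg_thresh values used later in demo.py.

```python
import numpy as np

def compute_iou(boxes1, boxes2):
    # Same logic as compute_iou in utils.py; boxes are (xmin, ymin, xmax, ymax)
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_wh = np.maximum(right_down - left_up, 0.0)
    inter_area = inter_wh[..., 0] * inter_wh[..., 1]
    area1 = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    area2 = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    return inter_area / (area1 + area2 - inter_area)

box1 = np.array([0., 0., 10., 10.])
box2 = np.array([5., 5., 15., 15.])  # box 2 lies to the lower right of box 1
# left_up = [5, 5] (box 2's top-left), right_down = [10, 10] (box 1's bottom-right)
# intersection = 5 * 5 = 25, union = 100 + 100 - 25 = 175
iou = compute_iou(box1, box2)        # 25 / 175, about 0.143

# Broadcasting: one anchor against several ground-truth boxes at once
gt = np.array([[0., 0., 10., 10.], [5., 5., 15., 15.], [50., 50., 60., 60.]])
ious = compute_iou(box1[None, :], gt)  # [1.0, 0.1428..., 0.0]
pos_thresh, neg_thresh = 0.5, 0.1
is_positive = bool(np.any(ious > pos_thresh))  # True: it matches the first box exactly
is_negative = bool(np.all(ious < neg_thresh))  # False
```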

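5. compute_regression

The center of every anchor (the center of each feature-map cell) is fixed, and so is its size (one of the 9 standard sizes). As a result, the anchors are inevitably inaccurate as detection boxes. We therefore want a mapping that takes a positive anchor and produces a regressed box closer to the ground-truth box.

Let the positive anchor be $(A_x, A_y, A_w, A_h)$: its top-left corner is $(A_x, A_y)$, its width is $A_w$, and its height is $A_h$. Let the ground-truth box be $(G_x, G_y, G_w, G_h)$ and the regressed box be $(G_x', G_y', G_w', G_h')$.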

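How do we turn a positive anchor into the regressed box?

First translate:

$$G_x' = A_w\,d_x(A) + A_x, \qquad G_y' = A_h\,d_y(A) + A_y$$

Then scale:

$$G_w' = A_w \exp\big(d_w(A)\big), \qquad G_h' = A_h \exp\big(d_h(A)\big)$$

Here $d_x(A)$, $d_y(A)$, $d_w(A)$ and $d_h(A)$ are the transformations to be learned, called the regression variables.

The translations $(t_x, t_y)$ and scale factors $(t_w, t_h)$ between a positive anchor and its ground-truth box are:

$$t_x = \frac{G_x - A_x}{A_w}, \qquad t_y = \frac{G_y - A_y}{A_h}, \qquad t_w = \log\frac{G_w}{A_w}, \qquad t_h = \log\frac{G_h}{A_h}$$

This implementation measures the translation between top-left corners, whereas the original Faster R-CNN paper uses box centers.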

```python
def compute_regression(box1, box2):
    # box1: ground-truth box, box2: anchor, both as (xmin, ymin, xmax, ymax)
    target_reg = np.zeros(shape=[4,])
    w1 = box1[2] - box1[0]
    h1 = box1[3] - box1[1]
    w2 = box2[2] - box2[0]
    h2 = box2[3] - box2[1]
    target_reg[0] = (box1[0] - box2[0]) / w2  # t_x
    target_reg[1] = (box1[1] - box2[1]) / h2  # t_y
    target_reg[2] = np.log(w1 / w2)           # t_w
    target_reg[3] = np.log(h1 / h2)           # t_h
    return target_reg
```
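decode_output applies the inverse of these formulas. So encoding a ground-truth box against an anchor and then decoding the result should give the ground-truth box back. Below is a minimal NumPy sketch of that round trip. encode_box mirrors compute_regression(gt, anchor), while decode_box is a hypothetical single-anchor version of the arithmetic in decode_output. The anchor size 74 x 149 is the first row of wandhG; the box coordinates are made up.

```python
import numpy as np

def encode_box(gt, anchor):
    """(t_x, t_y, t_w, t_h) of gt relative to anchor; same formulas as compute_regression(gt, anchor)."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    return np.array([(gt[0] - anchor[0]) / aw,
                     (gt[1] - anchor[1]) / ah,
                     np.log(gw / aw),
                     np.log(gh / ah)])

def decode_box(t, anchor):
    """Inverse mapping: shift the top-left corner, then rescale width and height."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    xmin = t[0] * aw + anchor[0]
    ymin = t[1] * ah + anchor[1]
    xmax = xmin + np.exp(t[2]) * aw
    ymax = ymin + np.exp(t[3]) * ah
    return np.array([xmin, ymin, xmax, ymax])

anchor = np.array([100., 100., 174., 249.])  # 74 x 149 anchor
gt = np.array([110., 90., 190., 260.])       # 80 x 170 ground-truth box
t = encode_box(gt, anchor)
recovered = decode_box(t, anchor)            # equals gt up to floating-point error
```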

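6. decode_output

decode_output takes, for one image, the translations and scale factors of all 45*60*9 anchors together with each anchor's scores. It returns the regressed box of every anchor whose object score exceeds the threshold. In practice, the regressed boxes that represent the same object nearly coincide.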

```python
def decode_output(pred_bboxes, pred_scores, score_thresh=0.5):
    # Cell indices of the 45 x 60 feature map
    grid_x, grid_y = tf.range(60, dtype=tf.int32), tf.range(45, dtype=tf.int32)
    grid_x, grid_y = tf.meshgrid(grid_x, grid_y)                             # both (45, 60)
    grid_x, grid_y = tf.expand_dims(grid_x, -1), tf.expand_dims(grid_y, -1)
    grid_xy = tf.stack([grid_x, grid_y], axis=-1)                            # (45, 60, 1, 2)
    # Each cell covers a 16 x 16 patch of the input image; take the patch center
    center_xy = grid_xy * 16 + 8
    center_xy = tf.cast(center_xy, tf.float32)
    anchor_xymin = center_xy - 0.5 * wandhG                                  # (45, 60, 9, 2)
    # Invert compute_regression: shift the top-left corner, then rescale width and height
    xy_min = pred_bboxes[..., 0:2] * wandhG[:, 0:2] + anchor_xymin
    xy_max = tf.exp(pred_bboxes[..., 2:4]) * wandhG[:, 0:2] + xy_min
    pred_bboxes = tf.concat([xy_min, xy_max], axis=-1)
    # Keep only boxes whose object (foreground) score exceeds the threshold
    pred_scores = pred_scores[..., 1]
    score_mask = pred_scores > score_thresh
    pred_bboxes = tf.reshape(tf.boolean_mask(pred_bboxes, score_mask), shape=[-1, 4]).numpy()
    pred_scores = tf.reshape(tf.boolean_mask(pred_scores, score_mask), shape=[-1,]).numpy()
    return pred_scores, pred_bboxes
```

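Inputs:

pred_bboxes: shape [1, 45, 60, 9, 4]. These are the 45*60*9 anchors, each with two translations and two scale factors.

pred_scores: shape [1, 45, 60, 9, 2]. For each of the 45*60*9 anchors, [0, i, j, k, 0] is the probability that the k-th anchor in row i, column j contains background. [0, i, j, k, 1] is the probability that it contains an object.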
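One way to see where these shapes could come from is a convolutional head that outputs 9*4 = 36 regression channels and 9*2 = 18 score channels per feature-map cell. Those channels can be reshaped into the shapes above, and a softmax over the last axis turns the two scores into background/object probabilities. This NumPy sketch works under that assumption, with random numbers standing in for network outputs. It is not taken from rpn.py.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for raw head outputs on a 45 x 60 feature map (batch size 1)
raw_bboxes = rng.normal(size=(1, 45, 60, 9 * 4)).astype(np.float32)
raw_scores = rng.normal(size=(1, 45, 60, 9 * 2)).astype(np.float32)

pred_bboxes = raw_bboxes.reshape(1, 45, 60, 9, 4)  # (t_x, t_y, t_w, t_h) per anchor
logits = raw_scores.reshape(1, 45, 60, 9, 2)
# Numerically stable softmax over the (background, object) axis
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
pred_scores = exp / exp.sum(axis=-1, keepdims=True)

# pred_scores[0, i, j, k, 1]: probability that anchor k in row i, column j contains an object
print(pred_bboxes.shape, pred_scores.shape)
```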

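After meshgrid(), grid_x and grid_y both have shape (45, 60). The difference is that grid_x consists of 45 rows, each equal to range(60), while grid_y consists of 60 columns, each equal to range(45).

After expand_dims() and stack(), grid_xy has shape (45, 60, 1, 2). It holds the top-left corner of every feature-map cell in feature-map units: (0, 0), (1, 0), ..., (59, 0), (0, 1), ..., (59, 44).

One feature-map cell represents a 16*16 region of the original image: a 1*1 cell of the feature map corresponds to a 16*16 patch of the input. So the center of each patch in the original image, center_xy, is simply grid_xy * 16 + 8.

The top-left corner of each anchor is then center_xy minus half of the predefined anchor width and height (wandhG).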
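The same anchor grid can be built with NumPy broadcasting, which makes the shapes easy to inspect. The sketch below uses a copy of wandhG and hypothetical variable names. It produces all 45*60*9 anchors as (xmin, ymin, xmax, ymax) and checks one of them against the per-anchor formula used in the demo.py loop.

```python
import numpy as np

wandhG = np.array([[74., 149.], [34., 149.], [86., 74.], [109., 132.], [172., 183.],
                   [103., 229.], [149., 91.], [51., 132.], [57., 200.]], dtype=np.float32)

grid_x, grid_y = np.meshgrid(np.arange(60), np.arange(45))      # both (45, 60)
grid_xy = np.stack([grid_x, grid_y], axis=-1)[:, :, None, :]     # (45, 60, 1, 2): cell (x, y) index
center_xy = (grid_xy * 16 + 8).astype(np.float32)                # center of each 16 x 16 patch
anchor_xymin = center_xy - 0.5 * wandhG                          # broadcasts to (45, 60, 9, 2)
anchor_xymax = center_xy + 0.5 * wandhG
anchors = np.concatenate([anchor_xymin, anchor_xymax], axis=-1)  # (45, 60, 9, 4)

# Cross-check one anchor (row i, column j, size k) with the loop formula from demo.py
i, j, k = 10, 20, 3
cx, cy = j * 16 + 8, i * 16 + 8
expected = [cx - wandhG[k][0] / 2, cy - wandhG[k][1] / 2,
            cx + wandhG[k][0] / 2, cy + wandhG[k][1] / 2]
```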

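xy_min and xy_max are the top-left and bottom-right corners of the regressed boxes. They are computed by inverting the formulas explained under compute_regression(). The input pred_bboxes has the same meaning as the output of compute_regression(): the translations and scale factors of each box. Concatenating xy_min and xy_max gives the new pred_bboxes, which holds the top-left and bottom-right corners of the regressed boxes.

pred_scores[..., 1] is the probability (the score) that each box contains an object. If the score exceeds the threshold, we consider an object detected in that box. We then keep its coordinates and score, forming the new pred_bboxes and pred_scores.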

decode_output returns, in this order:

pred_scores: shape [-1,], the probability that each kept box contains an object;

pred_bboxes: shape [-1, 4], the top-left and bottom-right corners of each kept box.
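The thresholding step can be reproduced in NumPy on a tiny example. A boolean mask over the first four axes selects whole 4-vectors from pred_bboxes, which is why the boxes come out with shape [-1, 4] and the scores with shape [-1,]. The 2 x 2 grid with one anchor per cell and the score values are invented for illustration; the real arrays are [1, 45, 60, 9, ...].

```python
import numpy as np

# Toy shapes: batch 1, a 2 x 2 grid, 1 anchor per cell
pred_bboxes = np.arange(16, dtype=np.float32).reshape(1, 2, 2, 1, 4)
pred_scores = np.array([[[[[0.9, 0.1]], [[0.2, 0.8]]],
                         [[[0.4, 0.6]], [[0.7, 0.3]]]]], dtype=np.float32)  # (1, 2, 2, 1, 2)

object_scores = pred_scores[..., 1]  # (1, 2, 2, 1): probability of "object"
score_mask = object_scores > 0.5     # same test as decode_output's default score_thresh
kept_boxes = pred_bboxes[score_mask].reshape(-1, 4)
kept_scores = object_scores[score_mask].reshape(-1)
```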

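7. nms

Non-Maximum Suppression (NMS), as the name says, suppresses elements that are not local maxima. In plain terms, it removes boxes that overlap heavily with a box of higher score.

nms() further filters the positive boxes so that, ideally, one box remains for each object in the image. The procedure is:

Take the box with the highest score out of the candidates, keep it, and compute its IoU with each of the remaining boxes;

Treat every box whose IoU with it is greater than the threshold (0.1) as a detection of the same object, and remove it;

Repeat until no candidate boxes remain.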

```python
def nms(pred_boxes, pred_score, iou_thresh=0.1):
    """
    pred_boxes shape: [-1, 4]
    pred_score shape: [-1,]
    Returns the kept boxes, highest score first.
    """
    selected_boxes = []
    while len(pred_boxes) > 0:
        # Pick the highest-scoring remaining box
        max_idx = np.argmax(pred_score)
        selected_box = pred_boxes[max_idx]
        selected_boxes.append(selected_box)
        # Remove it from the candidates
        pred_boxes = np.concatenate([pred_boxes[:max_idx], pred_boxes[max_idx+1:]])
        pred_score = np.concatenate([pred_score[:max_idx], pred_score[max_idx+1:]])
        # Drop the candidates that overlap the selected box by more than iou_thresh
        ious = compute_iou(selected_box, pred_boxes)
        iou_mask = ious <= iou_thresh
        pred_boxes = pred_boxes[iou_mask]
        pred_score = pred_score[iou_mask]
    selected_boxes = np.array(selected_boxes)
    return selected_boxes
```
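Here is a small usage example with made-up boxes: two heavily overlapping boxes around one object, and one box around another object. After NMS, one box should remain per object. The snippet carries self-contained copies of compute_iou and nms. This copy of nms uses np.delete to drop the selected box, which is equivalent to the two concatenate calls above.

```python
import numpy as np

def compute_iou(boxes1, boxes2):
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_wh = np.maximum(right_down - left_up, 0.0)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    area1 = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    area2 = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    return inter / (area1 + area2 - inter)

def nms(pred_boxes, pred_score, iou_thresh=0.1):
    selected = []
    while len(pred_boxes) > 0:
        max_idx = np.argmax(pred_score)  # best remaining box
        best = pred_boxes[max_idx]
        selected.append(best)
        pred_boxes = np.delete(pred_boxes, max_idx, axis=0)
        pred_score = np.delete(pred_score, max_idx)
        keep = compute_iou(best, pred_boxes) <= iou_thresh  # drop boxes overlapping the best one
        pred_boxes, pred_score = pred_boxes[keep], pred_score[keep]
    return np.array(selected)

boxes = np.array([[10., 10., 60., 110.],     # object A, score 0.9
                  [12., 8., 62., 112.],      # object A again, slightly shifted, score 0.8
                  [200., 50., 250., 150.]])  # object B, score 0.7
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # the second box has IoU 4800 / 5400 with the first, so it is removed
```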

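demo.py

At this point we can already check the functions in utils.py. We do this by drawing boxes around the objects in an image, using the provided manually annotated ground-truth boxes (top-left corner + width + height):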

1. Import the functions from utils.py

```python
import cv2
import numpy as np
import tensorflow as tf
from PIL import Image
from utils import compute_iou, plot_boxes_on_image, wandhG, load_gt_boxes, compute_regression, decode_output
```

2. Set the thresholds and related parameters

```python
pos_thresh = 0.5
neg_thresh = 0.1
iou_thresh = 0.5
grid_width = 16  # each grid cell is 16 x 16 because the feature map is 16x smaller than the original image
grid_height = 16
image_height = 720
image_width = 960
```
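Where do 16, 45 and 60 come from? A max-pooling layer with stride 2 and padding='same' (as used in rpn.py) maps a length n to ceil(n / 2). Four such layers therefore give an overall stride of 2**4 = 16. The sketch below works through that arithmetic. It assumes, as the 16x stride stated above implies, that four pooling layers are applied before the RPN head. pooled_size is a hypothetical helper.

```python
import math

def pooled_size(n, num_pools, stride=2):
    """Output length after num_pools max-pooling layers with padding='same' and the given stride."""
    for _ in range(num_pools):
        n = math.ceil(n / stride)
    return n

image_height, image_width = 720, 960
feat_h = pooled_size(image_height, 4)  # 720 -> 360 -> 180 -> 90 -> 45
feat_w = pooled_size(image_width, 4)   # 960 -> 480 -> 240 -> 120 -> 60
num_anchors = feat_h * feat_w * 9      # 9 anchor sizes per cell
```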

3. Read the image and the ground-truth box coordinates

```python
image_path = "./synthetic_dataset/synthetic_dataset/image/2.jpg"
label_path = "./synthetic_dataset/synthetic_dataset/imageAno/2.txt"
gt_boxes = load_gt_boxes(label_path)  # read the ground-truth box coordinates
raw_image = cv2.imread(image_path)    # read the image as (height, width, channels), BGR order
```

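We can try drawing the ground-truth boxes on the image:

```python
image_with_gt_boxes = np.copy(raw_image)  # copy the original image
# Draw the ground-truth boxes; plot_boxes_on_image returns an RGB copy suitable for PIL
image_with_gt_boxes = plot_boxes_on_image(image_with_gt_boxes, gt_boxes)
Image.fromarray(image_with_gt_boxes).show()  # show the image with the ground-truth boxes
```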

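Result:

Next, we make another copy of the original image. It is used to compute the score and regression variables (translations and scale factors) of each anchor.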

4. Score and regression targets of each anchor

```python
## The feature map is 1/16 of the original image in each dimension: 45 = 720/16, 60 = 960/16
target_scores = np.zeros(shape=[45, 60, 9, 2])  # 0: background, 1: foreground
target_bboxes = np.zeros(shape=[45, 60, 9, 4])  # t_x, t_y, t_w, t_h
target_masks = np.zeros(shape=[45, 60, 9])      # negative_samples: -1, positive_samples: 1

encoded_image = np.copy(raw_image)  # copy of the original image for visualizing the anchors

################################### ENCODE INPUT #################################
## Split the feature map into 45*60 cells
for i in range(45):
    for j in range(60):
        for k in range(9):
            center_x = j * grid_width + grid_width * 0.5    # x coordinate of this cell's center
            center_y = i * grid_height + grid_height * 0.5  # y coordinate of this cell's center
            xmin = center_x - wandhG[k][0] * 0.5  # wandhG holds the anchor widths/heights; xmin: x of the top-left corner
            ymin = center_y - wandhG[k][1] * 0.5  # ymin: y of the anchor's top-left corner
            xmax = center_x + wandhG[k][0] * 0.5  # xmax: x of the anchor's bottom-right corner
            ymax = center_y + wandhG[k][1] * 0.5  # ymax: y of the anchor's bottom-right corner
            # ignore cross-boundary anchors
            if (xmin > -5) and (ymin > -5) and (xmax < (image_width+5)) and (ymax < (image_height+5)):
                anchor_boxes = np.array([xmin, ymin, xmax, ymax])
                anchor_boxes = np.expand_dims(anchor_boxes, axis=0)
                # compute iou between this anchor and all ground-truth boxes in image.
                ious = compute_iou(anchor_boxes, gt_boxes)
                positive_masks = ious > pos_thresh
                negative_masks = ious < neg_thresh

                if np.any(positive_masks):
                    plot_boxes_on_image(encoded_image, anchor_boxes, thickness=1)  # draws in place
                    print("=> Encoding positive sample: %d, %d, %d" % (i, j, k))
                    cv2.circle(encoded_image, center=(int(0.5*(xmin+xmax)), int(0.5*(ymin+ymax))),
                               radius=1, color=[0, 0, 255], thickness=4)  # positive anchor centers: red dots (BGR)

                    target_scores[i, j, k, 1] = 1.  # contains an object
                    target_masks[i, j, k] = 1       # labeled as a positive sample
                    # find out which ground-truth box matches this anchor
                    max_iou_idx = np.argmax(ious)
                    selected_gt_boxes = gt_boxes[max_iou_idx]
                    target_bboxes[i, j, k] = compute_regression(selected_gt_boxes, anchor_boxes[0])

                if np.all(negative_masks):
                    target_scores[i, j, k, 0] = 1.  # background
                    target_masks[i, j, k] = -1      # labeled as a negative sample
                    cv2.circle(encoded_image, center=(int(0.5*(xmin+xmax)), int(0.5*(ymin+ymax))),
                               radius=1, color=[0, 0, 0], thickness=4)  # negative anchor centers: black dots

Image.fromarray(cv2.cvtColor(encoded_image, cv2.COLOR_BGR2RGB)).show()  # convert BGR to RGB for PIL
```

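Here we only consider anchors that extend no more than 5 pixels beyond the image border. An image can contain several ground-truth boxes, one per object. If an anchor's IoU with some ground-truth box is greater than the positive threshold, the anchor is a positive anchor for that ground-truth box. If its IoU with every ground-truth box is less than the negative threshold, it is a negative anchor.

For a positive anchor, we set its object score to 1 and mark it as a positive sample. We then compute the regression variables between it and the ground-truth box it overlaps most. For a negative anchor, we set its background score to 1 and mark it as a negative sample (mask value -1). Anchors that are neither keep a mask value of 0.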
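The labeling rule for a single anchor can be isolated into a small function. This sketch assumes the thresholds pos_thresh = 0.5 and neg_thresh = 0.1 from step 2. label_anchor is a hypothetical helper. It returns 1 for a positive sample, -1 for a negative sample, and 0 otherwise, together with the index of the best-matching ground-truth box for positives. The two if tests in the loop above are mutually exclusive, because a positive needs some IoU > 0.5 and a negative needs every IoU < 0.1. That is why an if/elif chain is equivalent here.

```python
import numpy as np

def compute_iou(boxes1, boxes2):
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_wh = np.maximum(right_down - left_up, 0.0)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    area1 = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    area2 = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    return inter / (area1 + area2 - inter)

def label_anchor(anchor, gt_boxes, pos_thresh=0.5, neg_thresh=0.1):
    """Return (mask_value, matched_gt_index) following the rules of the encoding loop."""
    ious = compute_iou(anchor[None, :], gt_boxes)
    if np.any(ious > pos_thresh):
        return 1, int(np.argmax(ious))  # positive: regress toward the best-overlapping box
    if np.all(ious < neg_thresh):
        return -1, None                 # negative: background for every ground-truth box
    return 0, None                      # neither positive nor negative

gt_boxes = np.array([[0., 0., 100., 100.], [300., 300., 400., 400.]])
candidates = [[0., 0., 100., 100.],      # identical to the first box
              [300., 310., 400., 410.],  # IoU 9000 / 11000 with the second box
              [600., 600., 650., 650.],  # far from both boxes
              [50., 0., 150., 100.]]     # IoU 1/3 with the first box
results = [label_anchor(np.array(a), gt_boxes) for a in candidates]
```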

The encoding loop above finally produces:

5. Get the regressed boxes from each anchor's scores and regression variables

```python
############################## FASTER DECODE OUTPUT ###############################
faster_decode_image = np.copy(raw_image)
pred_bboxes = np.expand_dims(target_bboxes, 0).astype(np.float32)  # (1, 45, 60, 9, 4)
pred_scores = np.expand_dims(target_scores, 0).astype(np.float32)  # (1, 45, 60, 9, 2)

pred_scores, pred_bboxes = decode_output(pred_bboxes, pred_scores)
# Draw in BGR ([0, 0, 255] is red) and display the RGB copy returned by plot_boxes_on_image
faster_decode_image = plot_boxes_on_image(faster_decode_image, pred_bboxes, color=[0, 0, 255])  # red bounding box
Image.fromarray(np.uint8(faster_decode_image)).show()
```

Result:

As can be seen, the regressed boxes are in nearly the same positions as the ground-truth boxes.

rpn.py

rpn.py builds the RPN network. We made some modifications to the original RPN; the code is as follows:

import tensorflow as tf


class RPNplus(tf.keras.Model):
    # VGG_MEAN = [103.939, 116.779, 123.68]

    def __init__(self):
        super(RPNplus, self).__init__()
        # conv1
        self.conv1_1 = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')
        self.conv1_2 = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')
        self.pool1 = tf.keras.layers.MaxPooling2D(2, strides=2, padding='same')

        # conv2
        self.conv2_1 = tf.keras.layers.Conv2D(128, 3, activation='relu', padding='same')
        self.conv2_2 = tf.keras.layers.Conv2D(128, 3, activation='relu', padding='same')
        self.pool2 = tf.keras.layers.MaxPooling2D(2, strides=2, padding='same')

        # conv3
        self.conv3_1 = tf.keras.layers.Conv2D(256, 3, activation='relu', padding='same')
        self.conv3_2 = tf.keras.layers.Conv2D(256, 3, activation='relu', padding='same')
        self.conv3_3 = tf.keras.layers.Conv2D(256, 3, activation='relu', padding='same')
        self.pool3 = tf.keras.layers.MaxPooling2D(2, strides=2, padding='same')

        # conv4
        self.conv4_1 = tf.keras.layers.Conv2D(512, 3, activation='relu', padding='same')
        self.conv4_2 = tf.keras.layers.Conv2D(512, 3, activation='relu', padding='same')
        self.conv4_3 = tf.keras.layers.Conv2D(512, 3, activation='relu', padding='same')
        self.pool4 = tf.keras.layers.MaxPooling2D(2, strides=2, padding='same')

        # conv5
        self.conv5_1 = tf.keras.layers.Conv2D(512, 3, activation='relu', padding='same')
        self.conv5_2 = tf.keras.layers.Conv2D(512, 3, activation='relu', padding='same')
        self.conv5_3 = tf.keras.layers.Conv2D(512, 3, activation='relu', padding='same')
        self.pool5 = tf.keras.layers.MaxPooling2D(2, strides=2, padding='same')

        ## region_proposal_conv
        self.region_proposal_conv1 = tf.keras.layers.Conv2D(256, kernel_size=[5,2],
                                                            activation=tf.nn.relu,