Implementing YOLOv3 in PyTorch (4): Non-Maximum Suppression (NMS)

Posted by sdu20112013 on 2019-07-08

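In the previous post we implemented the forward function and obtained prediction. At this point the network has predicted a very large number of boxes along with their class probabilities, and we now need to filter these down to our final predicted boxes.
Once you understand the format of YOLOv3's output and the meaning of each position, the source code is not hard to follow. My main difficulty while reading it was being unfamiliar with PyTorch, so in this post I have marked in bold the PyTorch functions involved and given the relevant links, test code, and so on.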

obj score threshold

We set an obj score threshold; only boxes whose obj score exceeds it are considered valid.

    conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
    prediction = prediction*conf_mask

prediction has shape 1*boxnum*boxattr.
prediction[:,:,4] has shape 1*boxnum; each element is the boxattr value at index 4, i.e. the obj score.
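
To make the masking step concrete, here is a minimal sketch on a toy tensor. The values and the 6-attribute layout (x, y, w, h, obj score, one class score) are made up for illustration; only the two masking lines come from the code above.

```python
import torch

# Hypothetical toy prediction: 1 image, 3 boxes, 6 attributes per box
# (x, y, w, h, obj_score, one class score). All values are invented.
prediction = torch.tensor([[[10., 10., 4., 4., 0.9, 0.8],
                            [20., 20., 6., 6., 0.3, 0.7],
                            [30., 30., 8., 8., 0.6, 0.5]]])
confidence = 0.5

# 1x3 -> 1x3x1 so the mask broadcasts across the attribute axis
conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)
masked = prediction * conf_mask

print(conf_mask.shape)  # torch.Size([1, 3, 1])
print(masked[0, 1])     # the second box (obj_score 0.3) is zeroed out entirely
```

Multiplying by the mask keeps the tensor shape unchanged, so the rows below the threshold are not removed yet; they are only set to zero.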

Tensor indexing in torch is similar to numpy; see the output of the code below.

import torch
x = torch.Tensor(1,3,10)    # Create an uninitialized Tensor of size 1x3x10
print(x)                        # Print out the Tensor
print(x.shape)                  # Print its shape

y = x[:,:,4]
print(y)
print(y.shape)

z = x[:,:,4:6]
print(z)
print(z.shape)

print((y>0.5).float().unsqueeze(2))

#### Output:
tensor([[[2.5226e-18, 1.6898e-04, 1.0413e-11, 7.7198e-10, 1.0549e-08,
          4.0516e-11, 1.0681e-05, 2.9575e-18, 6.7333e+22, 1.7591e+22],
         [1.7184e+25, 4.3222e+27, 6.1972e-04, 7.2443e+22, 1.7728e+28,
          7.0367e+22, 5.9018e-10, 2.6540e-09, 1.2972e-11, 5.3370e-08],
         [2.7001e-06, 2.6801e-09, 4.1292e-05, 2.1511e+23, 3.2770e-09,
          2.5125e-18, 7.7052e+31, 1.9447e+31, 5.0207e+28, 1.1492e-38]]])
torch.Size([1, 3, 10])
tensor([[1.0549e-08, 1.7728e+28, 3.2770e-09]])
torch.Size([1, 3])
tensor([[[1.0549e-08, 4.0516e-11],
         [1.7728e+28, 7.0367e+22],
         [3.2770e-09, 2.5125e-18]]])
torch.Size([1, 3, 2])

tensor([[[0.],
         [0.],
         [0.]]])

Squeeze and unsqueeze: remove a dimension of size 1, or insert one.

t = torch.ones(2,1,2,1) # Size 2x1x2x1
r = torch.squeeze(t)     # Size 2x2
r = torch.squeeze(t, 1)  # Squeeze dimension 1: Size 2x2x1

# Un-squeeze a dimension
x = torch.Tensor([1, 2, 3])
r = torch.unsqueeze(x, 0)       # Size: 1x3  inserts a new dimension at index 0
r = torch.unsqueeze(x, 1)       # Size: 3x1  inserts a new dimension at index 1

After this step, every row of prediction whose obj score is below the threshold has been set to 0.

nms

tensor.new() creates a new tensor with the same dtype (and device) as the existing tensor. See https://stackoverflow.com/questions/49263588/pytorch-beginner-tensor-new-method
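
As a small illustration of that behaviour, the sketch below uses `new_empty` and `new_zeros`, which are the explicitly named variants of the same idea in current PyTorch. The post itself uses `tensor.new(...)`; swapping in `new_empty` here is my substitution, not the author's code.

```python
import torch

src = torch.zeros(2, 3, dtype=torch.float64)

# new_empty keeps the dtype and device of src; its contents are uninitialized
buf = src.new_empty(src.shape)

# new_zeros does the same but fills the result with zeros
col = src.new_zeros(4, 1)

print(buf.dtype, buf.shape)  # torch.float64 torch.Size([2, 3])
print(col.dtype, col.shape)  # torch.float64 torch.Size([4, 1])
```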

    #get the box corners (top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y)
    box_corner = prediction.new(prediction.shape)
    box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
    box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
    box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2) 
    box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
    prediction[:,:,:4] = box_corner[:,:,:4]

The original prediction stores boxattr as x, y, w, h, ..., which is inconvenient to work with, so we convert it to (top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y).

Next we process the feature map of each image, one at a time.
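
A quick worked example of the conversion, assuming a single made-up box with center (50, 40), width 20 and height 10:

```python
import torch

# Hypothetical box in (center x, center y, width, height) form
boxes = torch.tensor([[50., 40., 20., 10.]])

corners = boxes.new_empty(boxes.shape)
corners[:, 0] = boxes[:, 0] - boxes[:, 2] / 2   # top-left x
corners[:, 1] = boxes[:, 1] - boxes[:, 3] / 2   # top-left y
corners[:, 2] = boxes[:, 0] + boxes[:, 2] / 2   # right-bottom x
corners[:, 3] = boxes[:, 1] + boxes[:, 3] / 2   # right-bottom y

print(corners)  # tensor([[40., 35., 60., 45.]])
```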

    batch_size = prediction.size(0)
    write = False

    for ind in range(batch_size):
        #image_pred.shape=boxnum*boxattr
        image_pred = prediction[ind]          #image Tensor  box_num*box_attr
        #confidence threshholding 
        #NMS
        #returns the maximum of each row and the column index where it occurs
        max_conf, max_conf_score = torch.max(image_pred[:,5:5+ num_classes], 1)
        #unsqueeze so both have the same number of dimensions as image_pred
        max_conf = max_conf.float().unsqueeze(1)
        max_conf_score = max_conf_score.float().unsqueeze(1)
        seq = (image_pred[:,:5], max_conf, max_conf_score)
        
        #concatenate along the column direction; image_pred is now boxnum*7
        image_pred = torch.cat(seq, 1)
        
        

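This uses torch.max; see https://blog.csdn.net/Z_lbj/article/details/79766690
torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor)
It returns the maximum along dimension dim. One way to remember it: the comparison runs along dimension dim. torch.max(a, 0) compares along the row direction, which gives the maximum of each column.
Assume input is a 2-D matrix (rows * columns): rows are dimension 0 and columns are dimension 1.

  • torch.max(a,0) returns the largest element of each column together with its index (the row index of that element within the column)
  • torch.max(a,1) returns the largest element of each row together with its index (the column index of that element within the row)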
c=torch.Tensor([[1,2,3],[6,5,4]])
print(c)
a,b=torch.max(c,1)
print(a)
print(b)

## Output:
tensor([[1., 2., 3.],
        [6., 5., 4.]])
tensor([3., 6.])
tensor([2, 0])
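
The example above only exercises dim=1. As a complementary sketch on the same tensor `c`, this is what dim=0 (the per-column maximum) returns:

```python
import torch

c = torch.Tensor([[1, 2, 3], [6, 5, 4]])

# dim=0: compare down the rows, one result per column
col_max, col_row_idx = torch.max(c, 0)
print(col_max)      # tensor([6., 5., 4.])
print(col_row_idx)  # tensor([1, 1, 1])
```

In the NMS code only the dim=1 form is needed, because each row is one box and the columns being compared are its class scores.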

For torch.cat usage, see https://pytorch.org/docs/stable/torch.html

torch.cat(tensors, dim=0, out=None) → Tensor
>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 0)
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 1)
tensor([[ 0.6580, -1.0969, -0.4614,  0.6580, -1.0969, -0.4614,  0.6580,
         -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497, -0.1034, -0.5790,  0.1497, -0.1034,
         -0.5790,  0.1497]])

Next we only process the rows whose obj_score is non-zero (rows with obj_score < obj_threshold were set to 0 earlier).

        non_zero_ind =  (torch.nonzero(image_pred[:,4]))
        try:
            image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
        except:
            continue

        #For PyTorch 0.4 compatibility
        #Since the above code will not raise exception for no detection 
        #as scalars are supported in PyTorch 0.4
        if image_pred_.shape[0] == 0:
            continue 
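
The following sketch shows the same filtering on a made-up 3x7 tensor, including the case where nothing survives. It uses `squeeze(1)` rather than the bare `squeeze()` from the snippet above; that is my variation, chosen so the index stays 1-D even when exactly one row is left.

```python
import torch

# Hypothetical image_pred with 7 columns; column 4 is obj_score
image_pred = torch.tensor([[1., 1., 5., 5., 0.9, 0.8, 2.],
                           [0., 0., 0., 0., 0.0, 0.0, 0.],
                           [2., 2., 6., 6., 0.7, 0.6, 1.]])

non_zero_ind = torch.nonzero(image_pred[:, 4])          # shape (2, 1)
kept = image_pred[non_zero_ind.squeeze(1), :].view(-1, 7)
print(kept.shape)  # torch.Size([2, 7])

# When every row has been zeroed, the result is an empty (0, 7) tensor,
# which is why the shape[0] == 0 check is needed in addition to try/except.
all_zero = torch.zeros(3, 7)
idx = torch.nonzero(all_zero[:, 4]).squeeze(1)
none_kept = all_zero[idx].view(-1, 7)
print(none_kept.shape)  # torch.Size([0, 7])
```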

OK, next we run NMS for each class.
First, find out which classes are present.

        #Get the various classes detected in the image
        img_classes = unique(image_pred_[:,-1])  # -1 index holds the class index
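
The `unique` helper called here is not defined in this post. As a hypothetical stand-in, assuming all it needs to do is return the distinct class indices of a 1-D tensor, it could simply wrap `torch.unique`:

```python
import torch

def unique(tensor):
    # Hypothetical stand-in for the helper used above:
    # returns the sorted distinct values of a 1-D tensor.
    return torch.unique(tensor)

# Made-up class-index column with repeated classes
class_col = torch.tensor([2., 0., 2., 1., 0.])
img_classes = unique(class_col)
print(img_classes)  # tensor([0., 1., 2.])
```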

Then process each class in turn.

        for cls in img_classes:
            #perform NMS

        
            #get the detections with one particular class
            #keep the rows whose class is cls and whose class prob != 0
            cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
            class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
            image_pred_class = image_pred_[class_mask_ind].view(-1,7)
            
            #sort the detections such that the entry with the maximum objectness
            #confidence is at the top
            #sort by obj score from high to low
            conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
            image_pred_class = image_pred_class[conf_sort_index]
            idx = image_pred_class.size(0)   #Number of detections
            
            for i in range(idx):
                #Get the IOUs of all boxes that come after the one we are looking at 
                #in the loop
                try:
                    #compute the IoU between row i and every row after it
                    ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
                except ValueError:
                    break
            
                except IndexError:
                    break
            
                #Zero out all the detections that have IoU > threshold
                #boxes whose IoU with row i exceeds nms_conf are treated as the same object and set to 0
                iou_mask = (ious < nms_conf).float().unsqueeze(1)
                image_pred_class[i+1:] *= iou_mask       
            
                #remove the rows that were zeroed out above (IoU > nms_conf)
                non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
                image_pred_class = image_pred_class[non_zero_ind].view(-1,7)
                
            batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)      #Repeat the batch_id for as many detections of the class cls in the image
            seq = batch_ind, image_pred_class

The code that computes the IoU is shown below; it needs little explanation. IoU = intersection area / union area.

def bbox_iou(box1, box2):
    """
    Returns the IoU of two bounding boxes 
    
    
    """
    #Get the coordinates of bounding boxes
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]
    
    #get the coordinates of the intersection rectangle
    inter_rect_x1 =  torch.max(b1_x1, b2_x1)
    inter_rect_y1 =  torch.max(b1_y1, b2_y1)
    inter_rect_x2 =  torch.min(b1_x2, b2_x2)
    inter_rect_y2 =  torch.min(b1_y2, b2_y2)
    
    #Intersection area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)

    #Union Area
    b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)
    
    iou = inter_area / (b1_area + b2_area - inter_area)
    
    return iou
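
To check the formula by hand, the sketch below restates the function compactly and applies it to three made-up boxes. With the +1 pixel convention used above, a box from (0, 0) to (9, 9) has area 10*10 = 100, and its overlap with a box from (5, 5) to (14, 14) is 5*5 = 25, so the IoU is 25 / (100 + 100 - 25).

```python
import torch

def bbox_iou(box1, box2):
    # Compact restatement of the function above (corner format, +1 pixel convention)
    x1 = torch.max(box1[:, 0], box2[:, 0])
    y1 = torch.max(box1[:, 1], box2[:, 1])
    x2 = torch.min(box1[:, 2], box2[:, 2])
    y2 = torch.min(box1[:, 3], box2[:, 3])
    inter = torch.clamp(x2 - x1 + 1, min=0) * torch.clamp(y2 - y1 + 1, min=0)
    area1 = (box1[:, 2] - box1[:, 0] + 1) * (box1[:, 3] - box1[:, 1] + 1)
    area2 = (box2[:, 2] - box2[:, 0] + 1) * (box2[:, 3] - box2[:, 1] + 1)
    return inter / (area1 + area2 - inter)

ref = torch.tensor([[0., 0., 9., 9.]])            # one 10x10 reference box
others = torch.tensor([[5., 5., 14., 14.],        # overlaps a 5x5 corner
                       [0., 0., 9., 9.],          # identical to ref
                       [20., 20., 29., 29.]])     # no overlap

# The single reference row broadcasts against the three candidate rows
ious = bbox_iou(ref, others)
print(ious)  # roughly 0.1429, 1.0 and 0.0
```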

For more on NMS, see https://blog.csdn.net/shuzfan/article/details/52711706

Tensor indexing works as follows:

image_pred_ = torch.Tensor([[1,2,3,4,9],[5,6,7,8,9]])
#print(image_pred_[:,-1] == 9)
has_9 = (image_pred_[:,-1] == 9)
print(has_9)

### Evaluation order: (image_pred_[:,-1] == 9).float().unsqueeze(1) is computed first, then the tensor multiplication
cls_mask = image_pred_*(image_pred_[:,-1] == 9).float().unsqueeze(1)
print(cls_mask)
class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
image_pred_class = image_pred_[class_mask_ind]

Output:
tensor([1, 1], dtype=torch.uint8)
tensor([[1., 2., 3., 4., 9.],
        [5., 6., 7., 8., 9.]])

torch.sort works as follows:

d=torch.Tensor([[1,2,3],[6,5,4]])
e=d[:,2]
print(e)
print(torch.sort(e))

Output:
tensor([3., 4.])

torch.return_types.sort(
values=tensor([3., 4.]),
indices=tensor([0, 1]))
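
The NMS code uses the descending form and keeps only the indices, then reorders whole rows with them. A minimal sketch with made-up detections (column 4 is the obj score):

```python
import torch

# Hypothetical detections: x1, y1, x2, y2, obj_score
dets = torch.tensor([[0., 0., 1., 1., 0.3],
                     [0., 0., 1., 1., 0.9],
                     [0., 0., 1., 1., 0.6]])

# [1] selects the indices from the (values, indices) pair
order = torch.sort(dets[:, 4], descending=True)[1]
dets_sorted = dets[order]

print(order)  # tensor([1, 2, 0])
```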

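To summarize the NMS procedure:
For each image the network predicts N detections, each holding 4+1+C values (4 coordinates, 1 obj score, and C class probabilities).

  • First, filter out the rows with obj_score < confidence
  • For each row, keep only the class with the highest class probability as the predicted class
  • Sort the predictions by obj_score from high to low (the code does this within each class)
  • Loop over each class and run NMS
    • Compare the first box with every box after it and delete those with IoU > threshold, i.e. remove all boxes similar to it
    • Compare the next remaining box with every box after it and delete all boxes similar to that box
    • Repeat until no similar boxes remain
    • At this point every remaining box of the current class is distinct.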
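
The per-class loop described above can be sketched in isolation as follows. This is a simplified illustration rather than the post's code: the function names are hypothetical, it removes rows with boolean indexing instead of zeroing them, and its IoU helper uses plain areas without the +1 offset.

```python
import torch

def iou_one_to_many(box, boxes):
    # box: (4,), boxes: (k, 4), both in corner format
    x1 = torch.max(box[0], boxes[:, 0])
    y1 = torch.max(box[1], boxes[:, 1])
    x2 = torch.min(box[2], boxes[:, 2])
    y2 = torch.min(box[3], boxes[:, 3])
    inter = torch.clamp(x2 - x1, min=0) * torch.clamp(y2 - y1, min=0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms_single_class(dets, iou_threshold):
    # dets: (n, 5) rows of x1, y1, x2, y2, score, all of the same class
    dets = dets[torch.sort(dets[:, 4], descending=True)[1]]
    kept = []
    while dets.size(0) > 0:
        kept.append(dets[0])                      # best remaining box survives
        ious = iou_one_to_many(dets[0, :4], dets[1:, :4])
        dets = dets[1:][ious < iou_threshold]     # drop boxes too similar to it
    return torch.stack(kept) if kept else dets

# Made-up detections: the first two overlap heavily, the third is far away
dets = torch.tensor([[0., 0., 10., 10., 0.9],
                     [1., 1., 11., 11., 0.8],
                     [50., 50., 60., 60., 0.7]])
result = nms_single_class(dets, iou_threshold=0.4)
print(result.shape)  # torch.Size([2, 5])
```

Here the first two boxes have an IoU of 81 / 119, well above 0.4, so only the higher-scoring one is kept alongside the distant third box.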

write_results ultimately returns an n*8 tensor, where the 8 columns are (batch_index, 4 coordinates, 1 obj score, 1 class prob, 1 class index).

def write_results(prediction, confidence, num_classes, nms_conf = 0.4):
    print("prediction.shape=",prediction.shape)

    #set the rows with obj_score < confidence to 0
    conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
    prediction = prediction*conf_mask

    #get the box corners (top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y)
    box_corner = prediction.new(prediction.shape)
    box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
    box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
    box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2) 
    box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
    #overwrite the first four columns along prediction's third dimension
    prediction[:,:,:4] = box_corner[:,:,:4]

    batch_size = prediction.size(0)
    write = False

    for ind in range(batch_size):
        #image_pred.shape=boxnum*boxattr
        image_pred = prediction[ind]          #image Tensor
        #confidence threshholding 
        #NMS

        ##take the largest class score of each row
        max_conf_score,max_conf = torch.max(image_pred[:,5:5+ num_classes], 1)
        max_conf = max_conf.float().unsqueeze(1)
        max_conf_score = max_conf_score.float().unsqueeze(1)
        seq = (image_pred[:,:5], max_conf_score, max_conf)
        image_pred = torch.cat(seq, 1) #now 7 columns: top-left x, top-left y, right-bottom x, right-bottom y, obj score, max class probability, corresponding class index
        print(image_pred.shape)

        non_zero_ind =  (torch.nonzero(image_pred[:,4]))
        try:
            image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
        except:
            continue

        #For PyTorch 0.4 compatibility
        #Since the above code will not raise exception for no detection 
        #as scalars are supported in PyTorch 0.4
        if image_pred_.shape[0] == 0:
            continue 

        #Get the various classes detected in the image
        img_classes = unique(image_pred_[:,-1])  # -1 index holds the class index
        
        
        for cls in img_classes:
            #perform NMS

            #get the detections with one particular class
            #keep the rows whose class is cls and whose class prob != 0
            cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
            class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
            image_pred_class = image_pred_[class_mask_ind].view(-1,7)
            
            #sort the detections such that the entry with the maximum objectness
            #confidence is at the top
            #sort by obj score from high to low
            conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
            image_pred_class = image_pred_class[conf_sort_index]
            idx = image_pred_class.size(0)   #Number of detections
            
            for i in range(idx):
                #Get the IOUs of all boxes that come after the one we are looking at 
                #in the loop
                try:
                    #compute the IoU between row i and every row after it
                    ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
                except ValueError:
                    break
            
                except IndexError:
                    break
            
                #Zero out all the detections that have IoU > threshold
                #boxes whose IoU with row i exceeds nms_conf are treated as the same object and set to 0
                iou_mask = (ious < nms_conf).float().unsqueeze(1)
                image_pred_class[i+1:] *= iou_mask       
            
                #remove the rows that were zeroed out above (IoU > nms_conf)
                non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
                image_pred_class = image_pred_class[non_zero_ind].view(-1,7)
                
            batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)      #Repeat the batch_id for as many detections of the class cls in the image
            seq = batch_ind, image_pred_class
            
            if not write:
                output = torch.cat(seq,1)  #concatenate along columns; shape k*8, k = detections kept for this class
                write = True
            else:
                out = torch.cat(seq,1)
                output = torch.cat((output,out)) #concatenate along rows; shape n*8

    try:
        return output
    except:
        return 0
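
As a sketch of how a caller might read the result, the example below builds a made-up n*8 tensor in the column layout described above and unpacks it. Since write_results returns the integer 0 when nothing is detected, a real caller would presumably check for that case before indexing.

```python
import torch

# Hypothetical output with two detections, laid out as
# batch_index, x1, y1, x2, y2, obj_score, class_prob, class_index
output = torch.tensor([[0., 40., 35., 60., 45., 0.9, 0.8, 2.],
                       [1., 10., 10., 30., 50., 0.7, 0.6, 0.]])

if not isinstance(output, int):          # 0 would mean "no detections"
    for row in output:
        batch_index = int(row[0])
        x1, y1, x2, y2 = row[1:5].tolist()
        class_index = int(row[7])
        print(batch_index, (x1, y1, x2, y2), class_index)

# Select only the detections that belong to image 1 of the batch
image1 = output[output[:, 0] == 1]
print(image1.shape)  # torch.Size([1, 8])
```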
