Two-Stage Object Detection Networks: Mask RCNN Explained

Published by 嵌入式視覺 (Embedded Vision) on 2022-12-19

Original URL: https://www.cnblogs.com/armcvai/p/16992202.html


Mask RCNN was proposed by Kaiming He et al. in a paper first published in 2017.

Differences between ROI Pooling and ROI Align

Understanding Region of Interest — (RoI Align and RoI Warp)
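To make the difference concrete, here is a minimal pure-Python sketch, not the code from ./mrcnn/model.py. It assumes a single-channel feature map stored as a nested list, RoI coordinates already expressed in feature-map units, and average aggregation over a regular grid of sample points per bin. The helper names (bilinear, quantized_lookup, roi_align) are hypothetical. ROI Pooling style lookups quantize coordinates to integer cells, while ROI Align keeps the real-valued coordinates and samples with bilinear interpolation.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D nested list at a real-valued (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the sample point to the valid range of the feature map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def quantized_lookup(feature, y, x):
    """ROI Pooling style: snap the coordinate to an integer cell first."""
    return feature[int(math.floor(y))][int(math.floor(x))]

def roi_align(feature, roi, out_size=2, samples=2):
    """Average samples x samples bilinear samples in each output bin.

    roi is (y1, x1, y2, x2) in feature-map coordinates, never rounded.
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            values = []
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    y = y1 + i * bin_h + (sy + 0.5) * bin_h / samples
                    x = x1 + j * bin_w + (sx + 0.5) * bin_w / samples
                    values.append(bilinear(feature, y, x))
            row.append(sum(values) / len(values))
        output.append(row)
    return output

# A 4x4 ramp whose value equals the x coordinate makes misalignment visible.
ramp = [[float(x) for x in range(4)] for _ in range(4)]
print(quantized_lookup(ramp, 0.25, 1.75))
print(bilinear(ramp, 0.25, 1.75))
print(roi_align(ramp, (0.0, 0.5, 2.0, 2.5)))
```

On this ramp the quantized lookup at x = 1.75 returns the value of cell 1, while bilinear sampling returns 1.75, which is the sub-pixel offset that ROI Align preserves.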

Mask R-CNN network architecture

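Mask RCNN builds on Faster RCNN with three main improvements:

Feature maps are extracted with the multi-scale FPN feature network.
ROI Pooling is replaced by ROI Align.
A mask segmentation branch with an FCN structure is added after the RPN, in parallel with the classification and box-regression heads.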

The network architecture is shown in the figure below:

Figure: Mask RCNN network architecture

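As the figure shows, Mask RCNN first detects objects and then segments them. The approach is simple and direct, and this formulation also makes the network easier to train.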

Backbone network: FPN

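An important property of convolutional networks is that deep layers tend to respond to semantic features, while shallow layers tend to respond to low-level image features. Mask RCNN uses ResNet combined with FPN as its feature extractor.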

The FPN code is in ./mrcnn/model.py. The core code is as follows:

if callable(config.BACKBONE):
    _, C2, C3, C4, C5 = config.BACKBONE(input_image, stage5=True,
                                        train_bn=config.TRAIN_BN)
else:
    _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE,
                                     stage5=True, train_bn=config.TRAIN_BN)
# Top-down Layers
# TODO: add assert to verify feature map sizes match what's in config
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
P4 = KL.Add(name="fpn_p4add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
P3 = KL.Add(name="fpn_p3add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
P2 = KL.Add(name="fpn_p2add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])
# Attach 3x3 conv to all P layers to get the final feature maps.
P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
# P6 is used for the 5th anchor scale in RPN. Generated by
# subsampling from P5 with stride of 2.
P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)

# Note that P6 is used in RPN, but not in the classifier heads.
rpn_feature_maps = [P2, P3, P4, P5, P6]
mrcnn_feature_maps = [P2, P3, P4, P5]
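The top-down pathway above can be sketched in plain Python to show how the pyramid levels relate in size. This is an illustration under stated assumptions, not part of model.py: the strides (4, 8, 16, 32, 64) are inferred from the ResNet stages and the stride-2 subsampling that produces P6, the input side is assumed to be a multiple of 64, the 256 channels are an assumed value for TOP_DOWN_PYRAMID_SIZE, and the lateral 1x1 convolution is treated as an identity on a single channel. The names pyramid_shapes, upsample2x, and merge are hypothetical.

```python
def pyramid_shapes(image_size, strides=(4, 8, 16, 32, 64), channels=256):
    """Return the (height, width, channels) of P2..P6 for a square input."""
    if image_size % 64 != 0:
        raise ValueError("this sketch assumes a side length divisible by 64")
    names = ("P2", "P3", "P4", "P5", "P6")
    return {name: (image_size // s, image_size // s, channels)
            for name, s in zip(names, strides)}

def upsample2x(grid):
    """Nearest-neighbour 2x upsampling, the default mode of UpSampling2D."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def merge(top, lateral):
    """Element-wise sum of the upsampled top-down map and the lateral map."""
    up = upsample2x(top)
    return [[u + l for u, l in zip(up_row, lat_row)]
            for up_row, lat_row in zip(up, lateral)]

shapes = pyramid_shapes(1024)
for name in ("P2", "P3", "P4", "P5", "P6"):
    print(name, shapes[name])

# One top-down step on toy data: a 2x2 "P5" merged into a 4x4 "C4".
p5 = [[1, 2], [3, 4]]
c4 = [[10] * 4 for _ in range(4)]
print(merge(p5, c4))
```

Each level halves the spatial size of the previous one, which is why a single 2x upsampling is enough to align a coarser level with the next finer one before the addition.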

The resnet_graph function is defined as follows:

def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    """Build a ResNet graph.
        architecture: Can be resnet50 or resnet101
        stage5: Boolean. If False, stage5 of the network is not created
        train_bn: Boolean. Train or freeze Batch Norm layers
    """
    assert architecture in ["resnet50", "resnet101"]
    # Stage 1
    x = KL.ZeroPadding2D((3, 3))(input_image)
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNorm(name='bn_conv1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)
    # Stage 2
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)
    # Stage 3
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)
    # Stage 4
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = {"resnet50": 5, "resnet101": 22}[architecture]
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    C4 = x
    # Stage 5
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]
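As a quick sanity check on the block_count values used in resnet_graph, the following sketch counts the bottleneck blocks per stage and reproduces the block names generated by chr(98 + i). The helper names are hypothetical. The layer count follows the usual naming convention, which includes the final fully connected layer of the classification ResNet even though resnet_graph does not build that layer.

```python
def stage_blocks(architecture):
    """Bottleneck blocks per stage: one conv_block plus the identity blocks."""
    stage4_identity = {"resnet50": 5, "resnet101": 22}[architecture]
    return {2: 3, 3: 4, 4: 1 + stage4_identity, 5: 3}

def nominal_depth(architecture):
    """conv1 + three convolutions per bottleneck block + the final FC layer."""
    blocks = sum(stage_blocks(architecture).values())
    return 1 + 3 * blocks + 1

def stage4_identity_names(architecture):
    """Block suffixes produced by chr(98 + i) in the stage 4 loop."""
    count = {"resnet50": 5, "resnet101": 22}[architecture]
    return [chr(98 + i) for i in range(count)]

for arch in ("resnet50", "resnet101"):
    names = stage4_identity_names(arch)
    print(arch, stage_blocks(arch), nominal_depth(arch), names[0], names[-1])
```

The only difference between the two architectures in this function is the number of identity blocks in stage 4.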

Anchor generation rules

In Faster RCNN, SCALE can be set to multiple values, whereas in Mask RCNN each feature level corresponds to only one SCALE, namely the value of 16 set above.
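The one-scale-per-level rule can be illustrated with a small pure-Python sketch. It is not the author's code. The aspect ratios (0.5, 1, 2), the feature-map size, the stride, and the convention of centring anchors at cell index times stride are assumptions chosen for illustration, and anchors_for_level is a hypothetical name. Heights and widths are derived so that every anchor keeps an area of scale squared.

```python
import math

def anchors_for_level(scale, ratios, feature_size, feature_stride):
    """Return (y1, x1, y2, x2) anchors for one pyramid level.

    A single scale is combined with every aspect ratio at every cell,
    where ratio is interpreted as width / height.
    """
    anchors = []
    for fy in range(feature_size):
        for fx in range(feature_size):
            # Anchor centre in image coordinates (assumed convention).
            cy, cx = fy * feature_stride, fx * feature_stride
            for ratio in ratios:
                h = scale / math.sqrt(ratio)
                w = scale * math.sqrt(ratio)
                anchors.append((cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
    return anchors

# Illustrative values only: scale 16 on a 4x4 feature map with stride 4.
level_anchors = anchors_for_level(16, (0.5, 1, 2), feature_size=4, feature_stride=4)
print(len(level_anchors))
print(level_anchors[1])
```

With one scale and three ratios, each cell contributes three anchors, so the anchor count per level is the number of cells times three.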

Experiments

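In the paper, Kaiming He and his co-authors ran many ablation experiments on individual modules and published the comparison tables.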

Figure: ablation experiment results

The table above shows the following:

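sigmoid vs. softmax: sigmoid gives a considerable improvement.
Backbone choice: deeper networks and networks that use FPN perform better, possibly because FPN combines information from feature maps of different sizes and can therefore capture finer details.
RoI Align vs. RoI Pooling: RoI Align gives a considerable improvement on both instance segmentation and object detection. Seen this way, RoIAlign is essentially a more precise RoIPooling, and using it in Faster RCNN should also help the results.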
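One way to see why the sigmoid variant in the first comparison behaves differently is to compare the two losses on a single mask pixel. The sketch below is a simplified illustration, not the training code: it assumes one logit per class for that pixel, applies binary cross-entropy only to the ground-truth class channel in the sigmoid case, and applies a multinomial cross-entropy across all class channels in the softmax case. The function names are hypothetical.

```python
import math

def sigmoid_mask_loss(logits, gt_class, target):
    """Binary cross-entropy on the ground-truth class channel only."""
    p = 1.0 / (1.0 + math.exp(-logits[gt_class]))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def softmax_mask_loss(logits, gt_class):
    """Multinomial cross-entropy over all class channels for one pixel."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[gt_class]

quiet = [2.0, 0.0, 0.0]     # other class channels stay neutral
noisy = [2.0, 5.0, -3.0]    # another class channel fires strongly

# The sigmoid loss ignores the other channels; the softmax loss does not.
print(sigmoid_mask_loss(quiet, 0, 1), sigmoid_mask_loss(noisy, 0, 1))
print(softmax_mask_loss(quiet, 0), softmax_mask_loss(noisy, 0))
```

With per-class sigmoids the mask channels do not compete with each other, so the class decision is left to the classification branch. With softmax, a strong response in another class channel raises the loss for the same pixel.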

References

Mask R-CNN paper
