Training and Inference with the TensorFlow Object Detection API

Posted by JYRoy on 2021-04-26

Original URL: https://www.cnblogs.com/jyroy/p/14704964.html

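Overall workflow (using PASCAL VOC as an example)

1. Download the PASCAL VOC2012 dataset and convert it to TFRecord format.

2. Choose and download a pretrained model.

3. Write the training configuration file (all training parameters are set through this config file).

4. Train the model.

5. Use TensorBoard to watch how loss, accuracy, and other curves change during training.

6. Freeze the model parameters.

7. Load the frozen .pb file to run prediction.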

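File layout

First create the directory structure shown below, then copy the label map (.pbtxt) file from models/research/object_detection/data into the data directory you created.

label_map.txt: defines the mapping between class IDs and class names. In this walkthrough the file is pascal_label_map.pbtxt.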
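
To make the class-ID-to-name mapping concrete, here is a minimal sketch that reads the item { id: ... name: ... } entries of a label map with a regular expression. The sample entries and the function name parse_label_map are illustrative assumptions; the Object Detection API ships its own label-map utilities, which this sketch does not use.

```python
import re


def parse_label_map(text):
    """Return {class_id: class_name} parsed from label-map text."""
    mapping = {}
    # Each entry is an `item { ... }` block holding an `id` and a `name` field.
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.DOTALL):
        id_match = re.search(r"\bid\s*:\s*(\d+)", block)
        name_match = re.search(r"\bname\s*:\s*['\"]([^'\"]+)['\"]", block)
        if id_match and name_match:
            mapping[int(id_match.group(1))] = name_match.group(1)
    return mapping


# Illustrative sample text, not copied from a real file.
SAMPLE_LABEL_MAP = """
item {
  id: 1
  name: 'aeroplane'
}
item {
  id: 2
  name: 'bicycle'
}
"""

label_map = parse_label_map(SAMPLE_LABEL_MAP)
print(label_map)
```

The largest key in the resulting dictionary is a quick sanity check against the num_classes value used later in the pipeline config.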

The directory structure looks like this:

.
├── data/
│   ├── eval-00000-of-00001.tfrecord   # file
│   ├── label_map.txt                  # file
│   ├── train-00000-of-00002.tfrecord  # file
│   └── train-00001-of-00002.tfrecord  # file
└── models/
    └── my_model_dir/
        ├── eval/                  # Created by evaluation job.
        ├── my_model.config        # pipeline config
        ├── model_ckpt-100-data@1  #
        ├── model_ckpt-100-index   # Created by training job.
        └── checkpoint             #

Copy the label map over (using PASCAL VOC2012 as an example):

cp /xxx/models/research/object_detection/data/pascal_label_map.pbtxt ./data/

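Preparing the input data

The TensorFlow Object Detection API consumes data in TFRecord format. It provides two scripts, create_pascal_tf_record.py and create_pet_tf_record.py, for converting the PASCAL VOC and Pet datasets to TFRecord.

Generating the PASCAL VOC TFRecord files

If you do not have the dataset locally, download it with the following commands: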

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar

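Convert PASCAL VOC to TFRecord format with the following commands.

Example (the paths below point at a VOC2007 copy; set data_dir and year to match the dataset you actually have):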

# From tensorflow/models/research/
python object_detection/dataset_tools/create_pascal_tf_record.py \
    --label_map_path=/root/data/pascal_label_map.pbtxt \
    --data_dir=/data2/VOC2007/VOCdevkit --year=VOC2007 --set=train \
    --output_path=/root/data/pascal_train.record
python object_detection/dataset_tools/create_pascal_tf_record.py \
    --label_map_path=/root/data/pascal_label_map.pbtxt \
    --data_dir=/data2/VOC2007/VOCdevkit --year=VOC2007 --set=val \
    --output_path=/root/data/pascal_val.record

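data_dir: path to the PASCAL VOC dataset
output_path: path of the TFRecord file to write

After running the commands above, pascal_train.record and pascal_val.record appear at the locations given by --output_path (/root/data in this example).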
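
For intuition about what the conversion stores, the sketch below reads one PASCAL VOC annotation and scales each bounding box by the image width and height so that coordinates fall in [0, 1]. It is a simplified illustration under stated assumptions: the sample XML and the function name normalized_boxes are made up for this example, and the sketch does not write TFRecord files.

```python
import xml.etree.ElementTree as ET


def normalized_boxes(xml_text):
    """Return [(class_name, ymin, xmin, ymax, xmax)] with coordinates in [0, 1]."""
    root = ET.fromstring(xml_text.strip())
    width = float(root.find("size/width").text)
    height = float(root.find("size/height").text)
    boxes = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        boxes.append((
            obj.find("name").text,
            float(box.find("ymin").text) / height,
            float(box.find("xmin").text) / width,
            float(box.find("ymax").text) / height,
            float(box.find("xmax").text) / width,
        ))
    return boxes


# Illustrative annotation, not taken from the dataset.
SAMPLE_XML = """
<annotation>
  <size><width>500</width><height>400</height></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>50</xmin><ymin>100</ymin><xmax>250</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""

voc_boxes = normalized_boxes(SAMPLE_XML)
print(voc_boxes)
```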

Generating the COCO TFRecord files.

The COCO dataset can be downloaded from the COCO website.
Convert COCO to TFRecord format with the following command.

Example (change the paths to your own):

# From tensorflow/models/research/
python object_detection/dataset_tools/create_coco_tf_record.py --logtostderr \
  --train_image_dir=/data2/datasets/coco/train2017 \
  --val_image_dir=/data2/datasets/coco/val2017 \
  --test_image_dir=/data2/datasets/coco/unlabeled2017 \
  --train_annotations_file=/data2/datasets/coco/annotations/instances_train2017.json \
  --val_annotations_file=/data2/datasets/coco/annotations/instances_val2017.json \
  --testdev_annotations_file=/data2/datasets/coco/annotations/image_info_test-dev2017.json \
  --output_dir=/root/data

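After running the command above, output_dir (/root/data in this example) contains a set of files whose names start with coco.

Also copy the COCO label map (.pbtxt) into output_dir.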
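
COCO annotations store each box as [x, y, width, height] in pixels, unlike the corner format of PASCAL VOC. The following sketch shows the corner-and-normalize arithmetic on a tiny in-memory annotation file. The sample data and the function name coco_boxes_normalized are assumptions for illustration; the sketch is not the conversion script itself.

```python
import json


def coco_boxes_normalized(annotations_json):
    """Return {image_id: [(category_id, ymin, xmin, ymax, xmax)]} in [0, 1]."""
    data = json.loads(annotations_json)
    sizes = {img["id"]: (img["width"], img["height"]) for img in data["images"]}
    result = {}
    for ann in data["annotations"]:
        width, height = sizes[ann["image_id"]]
        x, y, w, h = ann["bbox"]  # COCO stores [x, y, width, height] in pixels
        result.setdefault(ann["image_id"], []).append((
            ann["category_id"],
            y / height,
            x / width,
            (y + h) / height,
            (x + w) / width,
        ))
    return result


# Illustrative annotation data, not taken from the dataset.
SAMPLE_COCO = json.dumps({
    "images": [{"id": 7, "width": 640, "height": 480}],
    "annotations": [
        {"image_id": 7, "category_id": 18, "bbox": [160, 120, 320, 240]},
    ],
})

coco_boxes = coco_boxes_normalized(SAMPLE_COCO)
print(coco_boxes)
```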

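Training and inference with TensorFlow 1

Configuring the training pipeline

The TensorFlow Object Detection API uses protobuf files to configure the training and evaluation process. The schema for the training pipeline is in object_detection/protos/pipeline.proto, and ready-to-use sample configs are provided in the object_detection/samples/configs folder.

The rest of this section describes what goes into the config.

The config file is split into five parts:

model: which type of model is trained
train_config: the parameters used to train the model
eval_config: the metrics reported during evaluation
train_input_reader: the dataset the model is trained on
eval_input_reader: the dataset the model is evaluated on

The overall structure is: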

model {
(... Add model config here...)
}

train_config : {
(... Add train_config here...)
}

train_input_reader: {
(... Add train_input configuration here...)
}

eval_config: {
}

eval_input_reader: {
(... Add eval_input configuration here...)
}
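
As a quick sanity check on a hand-written config, the sketch below scans the text for blocks that open at brace depth 0 and reports which of the five parts are missing. It is a plain-text approximation under stated assumptions: it ignores braces inside strings and comments, and the names top_level_blocks, REQUIRED, and SAMPLE_CONFIG are invented for this example. It does not validate the config against pipeline.proto.

```python
def top_level_blocks(config_text):
    """Return the names of the blocks that open at brace depth 0."""
    names, depth, token = [], 0, ""
    for ch in config_text:
        if ch == "{":
            if depth == 0:
                # Drop the optional colon, as in "train_config: {".
                names.append(token.strip().rstrip(":").strip())
            depth += 1
            token = ""
        elif ch == "}":
            depth -= 1
            token = ""
        elif ch == "\n":
            token = ""
        else:
            token += ch
    return names


REQUIRED = ["model", "train_config", "train_input_reader",
            "eval_config", "eval_input_reader"]

SAMPLE_CONFIG = """
model {
  faster_rcnn { num_classes: 20 }
}
train_config: {
  batch_size: 1
}
train_input_reader: {
  label_map_path: "/usr/home/username/data/label_map.pbtxt"
}
eval_config: {
}
eval_input_reader: {
  label_map_path: "/usr/home/username/data/label_map.pbtxt"
}
"""

found = top_level_blocks(SAMPLE_CONFIG)
missing = [name for name in REQUIRED if name not in found]
print("missing blocks:", missing)
```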

Choosing model parameters

Be sure to change num_classes to the number of classes in your own task.
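
If you would rather script this change than edit the file by hand, one option is a plain text substitution, sketched below. Treating the config as text is an assumption made to keep the example dependency-free, and set_num_classes is a hypothetical helper name; it rewrites every num_classes field it finds.

```python
import re


def set_num_classes(config_text, num_classes):
    """Return config_text with every `num_classes: N` field set to num_classes."""
    return re.sub(
        r"(\bnum_classes\s*:\s*)\d+",
        rf"\g<1>{num_classes}",
        config_text,
    )


original_config = "model {\n  faster_rcnn {\n    num_classes: 90\n  }\n}\n"
updated_config = set_num_classes(original_config, 20)
print(updated_config)
```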

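Defining inputs

Input is read in TFRecord format. You must specify the locations of the training and evaluation files and of the label map. The training and evaluation datasets should share the same label map.

Example: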

tf_record_input_reader {
  input_path: "/usr/home/username/data/train.record"
}
label_map_path: "/usr/home/username/data/label_map.pbtxt"

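Configuring the trainer

train_config defines three parts of the training process:

Model parameter initialization
Input preprocessing (optional)
SGD parameters

Example: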

batch_size: 1
optimizer {
  momentum_optimizer: {
    learning_rate: {
      manual_step_learning_rate {
        initial_learning_rate: 0.0002
        schedule {
          step: 0
          learning_rate: .0002
        }
        schedule {
          step: 900000
          learning_rate: .00002
        }
        schedule {
          step: 1200000
          learning_rate: .000002
        }
      }
    }
    momentum_optimizer_value: 0.9
  }
  use_moving_average: false
}
fine_tune_checkpoint: "/usr/home/username/tmp/model.ckpt-#####"
from_detection_checkpoint: true
load_all_detection_checkpoint_vars: true
gradient_clipping_by_norm: 10.0
data_augmentation_options {
  random_horizontal_flip {
  }
}
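
The manual_step_learning_rate block above describes a piecewise-constant schedule. The sketch below reproduces that lookup in plain Python using the values from the example. It assumes each boundary is inclusive (the new rate applies from the listed step onward), and the function is an illustration rather than the library's implementation.

```python
def manual_step_learning_rate(step, initial_learning_rate, schedule):
    """Piecewise-constant rate: the last entry whose step <= `step` applies."""
    rate = initial_learning_rate
    for boundary, value in sorted(schedule):
        if step >= boundary:
            rate = value
    return rate


# (step, learning_rate) pairs copied from the config example above.
SCHEDULE = [(0, 0.0002), (900000, 0.00002), (1200000, 0.000002)]

for probe in (0, 899999, 900000, 1500000):
    print(probe, manual_step_learning_rate(probe, 0.0002, SCHEDULE))
```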

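Configuring the evaluator

The main settings in eval_config are num_examples and metrics_set.

num_examples: the number of batches (currently of batch size 1) used in one evaluation cycle, usually the total size of the evaluation dataset
metrics_set: which metrics to compute during evaluation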

Model Parameter Initialization

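This concerns how checkpoints are used. The train_config section of the config file provides two fields for specifying an existing checkpoint:

fine_tune_checkpoint: a path prefix to the checkpoint (e.g. "/usr/home/username/checkpoint/model.ckpt-#####")
fine_tune_checkpoint_type: classification or detection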

Classification checkpoints are listed with the TF-Slim pretrained models.

Detection checkpoints are listed in the TensorFlow detection model zoo.

Training

Single machine, single GPU

Template:

# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
NUM_TRAIN_STEPS=50000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=${NUM_TRAIN_STEPS} \
    --sample_1_of_n_eval_examples=${SAMPLE_1_OF_N_EVAL_EXAMPLES} \
    --alsologtostderr

Example:

python object_detection/model_main.py \
    --pipeline_config_path=/root/my_models/faster_rcnn_resnet101_voc07.config \
    --model_dir=/root/my_models/checkpoint \
    --num_train_steps=1

${PIPELINE_CONFIG_PATH}: path to the pipeline config
${MODEL_DIR}: directory where the checkpoints produced by training are saved
num_train_steps: number of training steps
num_workers (a flag of model_main_tf2.py, the TensorFlow 2 script, rather than of model_main.py):
- = 1: MirroredStrategy
- > 1: MultiWorkerMirroredStrategy

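Single machine, multiple GPUs

Multi-GPU training on a single machine uses a different launch script from single-GPU training: object_detection/legacy/train.py.

Example:

CUDA_VISIBLE_DEVICES=0,1 python object_detection/legacy/train.py \
    --pipeline_config_path=/root/my_models/faster_rcnn_resnet101_voc07.config \
    --train_dir=/root/my_models/checkpoint \
    --num_clones=2 \
    --ps_tasks=1

train_dir: directory where the checkpoints produced by training are saved
num_clones: usually the number of GPUs in use
ps_tasks: number of parameter servers. Default: 0, meaning no parameter server is used

Multiple machines, multiple GPUs

The official documentation does not describe multi-machine, multi-GPU training. The one approach found through a Google search implements distributed training on a Hadoop cluster.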

Evaluation

Single machine, single GPU

Template:

# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
CHECKPOINT_DIR=${MODEL_DIR}
python object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --checkpoint_dir=${CHECKPOINT_DIR} \
    --alsologtostderr

Example:

python object_detection/model_main.py \
    --pipeline_config_path=/root/my_models/faster_rcnn_resnet101_voc07.config \
    --model_dir=/root/my_models \
    --checkpoint_dir=/root/my_models/checkpoint

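${CHECKPOINT_DIR}: directory containing the checkpoints produced by training. When this flag is set, the script runs in eval-only mode and writes the evaluation metrics under model_dir.
${MODEL_DIR}/eval: directory where the evaluation event files are written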
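
Before starting an eval-only run it can help to confirm that the checkpoint directory actually holds a checkpoint. The sketch below reads the small text file named checkpoint that the training job writes (it also appears in the directory tree earlier) and returns the prefix recorded in its model_checkpoint_path line. The helper name latest_checkpoint_prefix and the temporary demo directory are assumptions for illustration; TensorFlow has its own function for this lookup, which the sketch does not call.

```python
import os
import re
import tempfile


def latest_checkpoint_prefix(checkpoint_dir):
    """Return the prefix recorded in <checkpoint_dir>/checkpoint, or None."""
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file, encoding="utf-8") as handle:
        match = re.search(
            r'^model_checkpoint_path:\s*"([^"]+)"',
            handle.read(),
            flags=re.MULTILINE,
        )
    if match is None:
        return None
    prefix = match.group(1)
    # Relative entries are resolved against the checkpoint directory.
    return prefix if os.path.isabs(prefix) else os.path.join(checkpoint_dir, prefix)


# Demo on a throwaway directory that mimics what a training job leaves behind.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "checkpoint"), "w", encoding="utf-8") as handle:
        handle.write('model_checkpoint_path: "model.ckpt-100"\n')
        handle.write('all_model_checkpoint_paths: "model.ckpt-100"\n')
    demo_prefix = latest_checkpoint_prefix(demo_dir)
    empty_result = latest_checkpoint_prefix(os.path.join(demo_dir, "missing"))

print(os.path.basename(demo_prefix))
```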

Single machine, multiple GPUs

Example:

CUDA_VISIBLE_DEVICES=0,1 python object_detection/legacy/eval.py \
    --checkpoint_dir=/root/my_models/checkpoint \
    --eval_dir=/root/my_models/eval \
    --pipeline_config_path=/root/my_models/faster_rcnn_resnet101_voc07.config

Training and inference with TensorFlow 2

Training

Template:

# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
python object_detection/model_main_tf2.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --alsologtostderr

Example:

python object_detection/model_main_tf2.py \
    --pipeline_config_path=/root/my_models/faster_rcnn_resnet101_voc07.config \
    --model_dir=/root/my_models/checkpoint

${PIPELINE_CONFIG_PATH}: path to the pipeline config
${MODEL_DIR}: directory where the checkpoints produced by training are saved

Note: under TensorFlow 2 the script uses MirroredStrategy() by default, which trains on all GPUs available on the machine. To use only some of them, specify the devices explicitly, for example strategy = tf.compat.v2.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"]), which uses GPUs 0 and 1.
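
An alternative that avoids editing the training script is to hide the unwanted GPUs from the process with CUDA_VISIBLE_DEVICES, the same variable used in the TensorFlow 1 multi-GPU examples. The sketch below only assembles the command and environment; training_command is a hypothetical helper, and the paths are the ones from the example above.

```python
import os


def training_command(pipeline_config_path, model_dir, gpu_ids):
    """Return (argv, env) for launching model_main_tf2.py on the given GPUs."""
    env = dict(os.environ)
    # Only the listed GPUs are visible to the process, so the default
    # strategy can mirror across those devices and no others.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    argv = [
        "python",
        "object_detection/model_main_tf2.py",
        f"--pipeline_config_path={pipeline_config_path}",
        f"--model_dir={model_dir}",
        "--alsologtostderr",
    ]
    return argv, env


argv, env = training_command(
    "/root/my_models/faster_rcnn_resnet101_voc07.config",
    "/root/my_models/checkpoint",
    gpu_ids=[0, 1],
)
print(env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
```

The returned pair can be handed to subprocess.run(argv, env=env, check=True) from the tensorflow/models/research/ directory.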

Evaluation

Template:

# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
CHECKPOINT_DIR=${MODEL_DIR}
python object_detection/model_main_tf2.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --checkpoint_dir=${CHECKPOINT_DIR} \
    --alsologtostderr

Example:

python object_detection/model_main_tf2.py \
    --pipeline_config_path=/root/my_models/faster_rcnn_resnet101_voc07.config \
    --model_dir=/root/my_models/checkpoint \
    --checkpoint_dir=/root/my_models/checkpoint

${CHECKPOINT_DIR}: directory containing the checkpoints produced by training
${MODEL_DIR}/eval: directory where the evaluation events are saved

Multiple machines, multiple GPUs

See the multi-machine, multi-GPU part of the TensorFlow 1 section.

Common problems

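Multi-GPU training on a single machine fails with: ValueError: not enough values to unpack (expected 7, got 0)

Cause: batch_size in the config file was set to 1. Set batch_size to the same value as num_clones so that every clone receives at least one example.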
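
The arithmetic behind this error can be sketched as follows, assuming the legacy trainer splits the configured batch evenly across clones with integer division (an assumption about its internals, not something stated in the original post). The helper name per_clone_batch_size is hypothetical.

```python
def per_clone_batch_size(batch_size, num_clones):
    """Examples each GPU clone receives when a batch is split evenly."""
    if num_clones < 1:
        raise ValueError("num_clones must be at least 1")
    return batch_size // num_clones


# batch_size 1 with two clones leaves every clone with an empty batch.
print(per_clone_batch_size(1, 2))
# Matching batch_size to num_clones gives each clone one example.
print(per_clone_batch_size(2, 2))
```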
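
Using a Faster R-CNN model under TensorFlow 2.x fails with: RuntimeError: Groundtruth tensor boxes has not been provided

This bug was introduced by an update to the TensorFlow Object Detection API some time after February 2021. As a workaround, check out an older commit (31e86e8) and then reinstall the Object Detection API.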
