This article is shared from the Huawei Cloud community post "Adapting YOLOV8 and YOLOV9 to Ascend CANN", by jackwangcumt.
1 Overview
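The Huawei Ascend CANN YOLOV8 inference example (a C++ sample) adapts sampleYOLOV7 from the official Ascend CANN Samples to YOLOV8. In general, a YOLOV7 model outputs data of shape [1,25200,85], whereas a YOLOV8 model outputs data of shape [1,84,8400], so the post-processing part of sampleYOLOV7 has to be modified to support YOLOV8/YOLOV9 models. For project development, our company purchased an Atlas 500 Pro intelligent edge server, installed Ubuntu 20.04 LTS Server on it, and, following the official documentation, installed Ascend-cann-toolkit_7.0.RC1_linux-aarch64.run and the related software. For details, see the separate blog post "Setting Up the Inference Environment on the Atlas 500 Pro Intelligent Edge Server"; they are not repeated here.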
2 Preparing the YOLOV8 Model
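Before adapting the YOLOV8 model, you first need the YOLOV8 model file. Taking the official YOLOV8n.pt model as an example, install a YOLOV8 environment on Windows and run the following Python script (pth2onnx.py) to convert the .pt model into an .onnx model: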
    import argparse

    from ultralytics import YOLO


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument('--pt', default="yolov8n", help='.pt file')
        args = parser.parse_args()
        # Load the PyTorch weights and export a static-shape, simplified ONNX model.
        model = YOLO(args.pt)
        onnx_model = model.export(format="onnx", dynamic=False, simplify=True, opset=11)


    if __name__ == '__main__':
        main()
For the detailed steps to set up the YOLOV8 environment, see https://github.com/ultralytics/ultralytics . When the script runs successfully, it generates the yolov8n.onnx model. Example output:
    (base) I:\yolov8\Yolov8_for_PyTorch>python pth2onnx.py --pt=yolov8n.pt
    Ultralytics YOLOv8.0.229 🚀 Python-3.11.5 torch-2.1.2 CPU (Intel Core(TM) i7-10700K 3.80GHz)
    YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs

    PyTorch: starting from 'yolov8n.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (6.2 MB)

    ONNX: starting export with onnx 1.15.0 opset 11...
    ONNX: simplifying with onnxsim 0.4.36...
    ONNX: export success ✅ 1.0s, saved as 'yolov8n.onnx' (12.2 MB)

    Export complete (3.2s)
    Results saved to I:\yolov8\Yolov8_for_PyTorch
    Predict:         yolo predict task=detect model=yolov8n.onnx imgsz=640
    Validate:        yolo val task=detect model=yolov8n.onnx imgsz=640 data=coco.yaml
    Visualize:       https://netron.app
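The output shows that the original yolov8n.pt model has an input shape of (1, 3, 640, 640) in BCHW format and an output shape of (1, 84, 8400). More information about the model can be inspected visually with the netron tool. Once netron is installed, run the following command to open yolov8n.onnx and view the network structure in a web browser: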
    (base) I:\yolov8\Yolov8_for_PyTorch>netron yolov8n.onnx
    Serving 'yolov8n.onnx' at http://localhost:8080
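As you can see, the input node of the converted yolov8n.onnx model is named images, and the input tensor has shape [1,3,640,640]. After uploading yolov8n.onnx to the Atlas 500 Pro server, run the following command to convert the model: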
atc --model=yolov8n.onnx --framework=5 --output=yolov8n --input_shape="images:1,3,640,640" --soc_version=Ascend310P3 --insert_op_conf=aipp.cfg
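In this command:
--soc_version=Ascend310P3: the chip name can be checked with the npu-smi info command. On my machine it prints 310P3, so --soc_version is the prefix Ascend followed by 310P3, that is, Ascend310P3.
--input_shape="images:1,3,640,640" is in NCHW order: a batch size of 1, 3 channels, and an image size of 640x640, consistent with the input node of the ONNX model.
--insert_op_conf=aipp.cfg: the aipp.cfg file comes from the official sampleYOLOV7 example. Because the original input image may not have the required size, it has to be scaled to 640x640. The configuration expects a 640x640 YUV420SP_U8 image, converts it to RGB, and scales the pixel values by 1/255. The contents of aipp.cfg are: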
    aipp_op{
        aipp_mode:static
        input_format : YUV420SP_U8
        src_image_size_w : 640
        src_image_size_h : 640

        csc_switch : true
        rbuv_swap_switch : false
        matrix_r0c0 : 256
        matrix_r0c1 : 0
        matrix_r0c2 : 359
        matrix_r1c0 : 256
        matrix_r1c1 : -88
        matrix_r1c2 : -183
        matrix_r2c0 : 256
        matrix_r2c1 : 454
        matrix_r2c2 : 0
        input_bias_0 : 0
        input_bias_1 : 128
        input_bias_2 : 128

        crop: true
        load_start_pos_h : 0
        load_start_pos_w : 0
        crop_size_w : 640
        crop_size_h : 640

        min_chn_0 : 0
        min_chn_1 : 0
        min_chn_2 : 0
        var_reci_chn_0: 0.0039215686274509803921568627451
        var_reci_chn_1: 0.0039215686274509803921568627451
        var_reci_chn_2: 0.0039215686274509803921568627451
    }
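The color space conversion and normalization settings in aipp.cfg can be read as arithmetic on a single pixel. The following is a minimal sketch, not the hardware implementation. It assumes that AIPP subtracts input_bias from Y, U and V, applies the matrix, shifts the result right by 8 bits (division by 256), clamps it to 0..255, and then computes (value - min_chn) * var_reci_chn. The function names yuv_to_rgb and normalize are hypothetical.

```python
# Coefficients copied from aipp.cfg (YUV -> RGB, fixed point with 8 fractional bits).
CSC_MATRIX = [
    [256, 0, 359],     # R row: matrix_r0c0..matrix_r0c2
    [256, -88, -183],  # G row: matrix_r1c0..matrix_r1c2
    [256, 454, 0],     # B row: matrix_r2c0..matrix_r2c2
]
INPUT_BIAS = [0, 128, 128]   # input_bias_0..2, subtracted from Y, U, V
MIN_CHN = [0, 0, 0]          # min_chn_0..2
VAR_RECI_CHN = 1.0 / 255.0   # the var_reci_chn_* value in aipp.cfg is 1/255


def yuv_to_rgb(y, u, v):
    """Assumed AIPP-style colour space conversion for one pixel."""
    centered = [y - INPUT_BIAS[0], u - INPUT_BIAS[1], v - INPUT_BIAS[2]]
    rgb = []
    for row in CSC_MATRIX:
        acc = sum(c * x for c, x in zip(row, centered))
        value = acc >> 8                      # divide by 256 (fixed-point scale)
        rgb.append(max(0, min(255, value)))   # clamp to the uint8 range
    return rgb


def normalize(rgb):
    """Scale each channel to [0, 1]: (value - min_chn) * var_reci_chn."""
    return [(c - m) * VAR_RECI_CHN for c, m in zip(rgb, MIN_CHN)]


# A mid-gray YUV pixel maps to mid-gray RGB, then to roughly 0.5 per channel.
gray = yuv_to_rgb(128, 128, 128)
print(gray, normalize(gray))
```

Under these assumptions, the model receives RGB values in the range 0 to 1, which matches the input that the exported ONNX model expects.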
Once the atc command completes successfully, it generates the offline model yolov8n.om.
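The atc options described in this section can also be assembled programmatically, which helps when several models are converted with the same settings. The sketch below only builds the command line and prints it; it does not run atc. The helper names soc_version_from_chip and build_atc_command are hypothetical, and only the flags shown in this article are used.

```python
import shlex


def soc_version_from_chip(chip_name):
    """Prefix the chip name reported by npu-smi info with 'Ascend'."""
    return "Ascend" + chip_name.strip()


def build_atc_command(onnx_path, output_name, input_name, shape, chip_name, aipp_cfg):
    """Assemble the atc argument list used in this article (hypothetical helper)."""
    dims = ",".join(str(d) for d in shape)
    return [
        "atc",
        f"--model={onnx_path}",
        "--framework=5",  # 5 selects ONNX as the source framework
        f"--output={output_name}",
        f"--input_shape={input_name}:{dims}",
        f"--soc_version={soc_version_from_chip(chip_name)}",
        f"--insert_op_conf={aipp_cfg}",
    ]


cmd = build_atc_command("yolov8n.onnx", "yolov8n", "images",
                        (1, 3, 640, 640), "310P3", "aipp.cfg")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a machine where the CANN toolkit environment is set up.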
3 Adaptation Code
The YOLOV8 sample adapted from the official sampleYOLOV7 example is open source at https://gitee.com/cumt/ascend-yolov8-sample . The post-processing method GetResult in the core file sampleYOLOV8.cpp is:
    Result SampleYOLOV8::GetResult(std::vector<InferenceOutput> &inferOutputs,
                                   string imagePath, size_t imageIndex, bool release)
    {
        uint32_t outputDataBufId = 0;
        float *classBuff = static_cast<float *>(inferOutputs[outputDataBufId].data.get());
        // confidence threshold
        float confidenceThreshold = 0.35;
        // class number
        size_t classNum = 80;
        // number of (x, y, width, height)
        size_t offset = 4;
        // total number of boxes, yolov8 [1,84,8400]
        size_t modelOutputBoxNum = 8400;

        // read source image from file
        cv::Mat srcImage = cv::imread(imagePath);
        int srcWidth = srcImage.cols;
        int srcHeight = srcImage.rows;

        // filter boxes by confidence threshold
        vector<BoundBox> boxes;
        size_t yIndex = 1;
        size_t widthIndex = 2;
        size_t heightIndex = 3;
        // size_t all_num = 1 * 84 * 8400; // 705,600
        for (size_t i = 0; i < modelOutputBoxNum; ++i) {
            // the largest class score of box i serves as its confidence
            float maxValue = 0;
            size_t maxIndex = 0;
            for (size_t j = 0; j < classNum; ++j) {
                float value = classBuff[(offset + j) * modelOutputBoxNum + i];
                if (value > maxValue) {
                    // index of class
                    maxIndex = j;
                    maxValue = value;
                }
            }
            if (maxValue > confidenceThreshold) {
                // rescale the box from model size to source image size
                BoundBox box;
                box.x = classBuff[i] * srcWidth / modelWidth_;
                box.y = classBuff[yIndex * modelOutputBoxNum + i] * srcHeight / modelHeight_;
                box.width = classBuff[widthIndex * modelOutputBoxNum + i] * srcWidth / modelWidth_;
                box.height = classBuff[heightIndex * modelOutputBoxNum + i] * srcHeight / modelHeight_;
                box.score = maxValue;
                box.classIndex = maxIndex;
                box.index = i;
                if (maxIndex < classNum) {
                    boxes.push_back(box);
                }
            }
        }
        ACLLITE_LOG_INFO("filter boxes by confidence threshold > %f success, boxes size is %ld",
                         confidenceThreshold, boxes.size());

        // filter boxes by NMS
        vector<BoundBox> result;
        result.clear();
        float NMSThreshold = 0.45;
        int32_t maxLength = modelWidth_ > modelHeight_ ? modelWidth_ : modelHeight_;
        std::sort(boxes.begin(), boxes.end(), sortScore);
        BoundBox boxMax;
        BoundBox boxCompare;
        while (boxes.size() != 0) {
            size_t index = 1;
            result.push_back(boxes[0]);
            while (boxes.size() > index) {
                boxMax.score = boxes[0].score;
                boxMax.classIndex = boxes[0].classIndex;
                boxMax.index = boxes[0].index;
                // translate point by maxLength * boxes[0].classIndex to
                // avoid bumping into two boxes of different classes
                boxMax.x = boxes[0].x + maxLength * boxes[0].classIndex;
                boxMax.y = boxes[0].y + maxLength * boxes[0].classIndex;
                boxMax.width = boxes[0].width;
                boxMax.height = boxes[0].height;

                boxCompare.score = boxes[index].score;
                boxCompare.classIndex = boxes[index].classIndex;
                boxCompare.index = boxes[index].index;
                // translate point by maxLength * boxes[index].classIndex to
                // avoid bumping into two boxes of different classes
                boxCompare.x = boxes[index].x + boxes[index].classIndex * maxLength;
                boxCompare.y = boxes[index].y + boxes[index].classIndex * maxLength;
                boxCompare.width = boxes[index].width;
                boxCompare.height = boxes[index].height;

                // the overlapping part of the two boxes
                float xLeft = max(boxMax.x, boxCompare.x);
                float yTop = max(boxMax.y, boxCompare.y);
                float xRight = min(boxMax.x + boxMax.width, boxCompare.x + boxCompare.width);
                float yBottom = min(boxMax.y + boxMax.height, boxCompare.y + boxCompare.height);
                float width = max(0.0f, xRight - xLeft);
                float hight = max(0.0f, yBottom - yTop);
                float area = width * hight;
                float iou = area / (boxMax.width * boxMax.height +
                                    boxCompare.width * boxCompare.height - area);
                // filter boxes by NMS threshold
                if (iou > NMSThreshold) {
                    boxes.erase(boxes.begin() + index);
                    continue;
                }
                ++index;
            }
            boxes.erase(boxes.begin());
        }
        ACLLITE_LOG_INFO("filter boxes by NMS threshold > %f success, result size is %ld",
                         NMSThreshold, result.size());

        // opencv draw label params
        const double fountScale = 0.5;
        const uint32_t lineSolid = 2;
        const uint32_t labelOffset = 11;
        const cv::Scalar fountColor(0, 0, 255); // BGR
        const vector<cv::Scalar> colors{
            cv::Scalar(255, 0, 0), cv::Scalar(0, 255, 0), cv::Scalar(0, 0, 255)};

        int half = 2;
        for (size_t i = 0; i < result.size(); ++i) {
            // x, y are the box center, so convert to corner points for drawing
            cv::Point leftUpPoint, rightBottomPoint;
            leftUpPoint.x = result[i].x - result[i].width / half;
            leftUpPoint.y = result[i].y - result[i].height / half;
            rightBottomPoint.x = result[i].x + result[i].width / half;
            rightBottomPoint.y = result[i].y + result[i].height / half;
            cv::rectangle(srcImage, leftUpPoint, rightBottomPoint,
                          colors[i % colors.size()], lineSolid);
            string className = label[result[i].classIndex];
            string markString = to_string(result[i].score) + ":" + className;
            ACLLITE_LOG_INFO("object detect [%s] success", markString.c_str());
            cv::putText(srcImage, markString,
                        cv::Point(leftUpPoint.x, leftUpPoint.y + labelOffset),
                        cv::FONT_HERSHEY_COMPLEX, fountScale, fountColor);
        }
        string savePath = "out_" + to_string(imageIndex) + ".jpg";
        cv::imwrite(savePath, srcImage);
        if (release) {
            free(classBuff);
            classBuff = nullptr;
        }
        return SUCCESS;
    }
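The NMS step in GetResult relies on a class-offset trick: every box is shifted by classIndex * maxLength, so boxes of different classes end up far apart and only boxes of the same class can suppress each other. The following Python sketch illustrates that idea under stated assumptions and is not a port of the C++ listing. It assumes boxes lie fully inside a max_length x max_length area, and it converts center coordinates to corners before computing the overlap, which is a choice of this sketch (the listing feeds x and y into the overlap computation unchanged). The names iou and class_aware_nms are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_aware_nms(boxes, iou_threshold=0.45, max_length=640):
    """Greedy NMS; boxes are dicts with x, y (center), width, height, score, class_index."""
    def shifted_corners(box):
        # Shift by class_index * max_length so boxes of different classes
        # do not overlap, then convert center/size to corner coordinates.
        shift = box["class_index"] * max_length
        cx, cy = box["x"] + shift, box["y"] + shift
        return (cx - box["width"] / 2, cy - box["height"] / 2,
                cx + box["width"] / 2, cy + box["height"] / 2)

    remaining = sorted(boxes, key=lambda b: b["score"], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)          # highest remaining score
        kept.append(best)
        best_corners = shifted_corners(best)
        # Drop every box that overlaps the kept box too much.
        remaining = [b for b in remaining
                     if iou(best_corners, shifted_corners(b)) <= iou_threshold]
    return kept


detections = [
    {"x": 100, "y": 100, "width": 50, "height": 50, "score": 0.9, "class_index": 0},
    {"x": 102, "y": 101, "width": 50, "height": 50, "score": 0.8, "class_index": 0},
    {"x": 100, "y": 100, "width": 50, "height": 50, "score": 0.7, "class_index": 1},
]
kept = class_aware_nms(detections)
print([(d["class_index"], d["score"]) for d in kept])  # [(0, 0.9), (1, 0.7)]
```

The second detection is suppressed because it overlaps the first one within the same class, while the third survives even though it has identical coordinates, because it belongs to a different class.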
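The output shape of YOLOV8 is (1, 84, 8400). The 8400 is the number of raw candidate boxes predicted by the model, represented in the code by size_t modelOutputBoxNum = 8400;. The 84 consists of the 4 bounding-box values (x, y, w, h) plus the scores of the 80 detection classes, that is, 84 = 4 + 80. Because the model output is stored in memory as a contiguous one-dimensional array, the required values have to be read at offsets derived from the meaning of each dimension of the YOLOV8 output. According to available material, the YOLOV8 model does not predict a separate confidence value; the largest class probability is used as the confidence instead. The 8400 is the result of concatenating the output feature maps of all scales (for a 640x640 input, 80x80 + 40x40 + 20x20 = 8400) and needs no special handling during ordinary inference. The mapping from the model output shape to the in-memory array works as follows: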
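Each of the other 83 rows is appended in turn to the end of the first row, giving a one-dimensional array of size 8400 x 84. When traversing the array, first obtain the confidence of each of the 8400 predictions: skip offset = 4 rows, take the largest value among the 80 class scores, and use that value and its index as the confidence and the class ID. The first 4 rows hold the box information x, y, w, h. For a custom-trained model, only size_t classNum = 80; needs to change. Derive it from the ONNX output shape: for [1,84,8400], 84 - 4 = 80; if a custom model outputs [1,26,8400], then size_t classNum = 22 (26 - 4).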
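The indexing rule can be checked on a tiny stand-in tensor. The sketch below decodes a flat, row-major [1, 6, 3] output (4 box rows plus 2 class rows, 3 candidate boxes) in the same way GetResult decodes [1, 84, 8400]: element (row r, box i) is read from position r * num_boxes + i. The function name decode_boxes and the parameters scale_x and scale_y (standing in for srcWidth / modelWidth_ and srcHeight / modelHeight_) are hypothetical.

```python
def decode_boxes(flat, num_boxes, num_rows, conf_threshold=0.35,
                 scale_x=1.0, scale_y=1.0):
    """Decode a [1, num_rows, num_boxes] output stored as a flat row-major list."""
    offset = 4                       # rows 0..3 hold x, y, w, h
    class_num = num_rows - offset    # e.g. 84 - 4 = 80, or 26 - 4 = 22
    boxes = []
    for i in range(num_boxes):
        # Element (row r, box i) lives at flat[r * num_boxes + i].
        scores = [flat[(offset + j) * num_boxes + i] for j in range(class_num)]
        class_index = max(range(class_num), key=scores.__getitem__)
        score = scores[class_index]  # largest class score acts as confidence
        if score > conf_threshold:
            boxes.append({
                "x": flat[0 * num_boxes + i] * scale_x,
                "y": flat[1 * num_boxes + i] * scale_y,
                "width": flat[2 * num_boxes + i] * scale_x,
                "height": flat[3 * num_boxes + i] * scale_y,
                "score": score,
                "class_index": class_index,
                "index": i,
            })
    return boxes


rows = [
    [10, 20, 30],      # x of box 0, 1, 2
    [11, 21, 31],      # y
    [4, 5, 6],         # w
    [7, 8, 9],         # h
    [0.1, 0.9, 0.2],   # score of class 0
    [0.6, 0.05, 0.3],  # score of class 1
]
flat = [value for row in rows for value in row]  # rows appended one after another
decoded = decode_boxes(flat, num_boxes=3, num_rows=6)
print([(b["index"], b["class_index"], b["score"]) for b in decoded])
# [(0, 1, 0.6), (1, 0, 0.9)]
```

Box 2 is dropped because its best class score (0.3) is below the 0.35 threshold. Deriving class_num from the row count is what allows the same loop to handle a custom model with a different number of classes.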
4 Build and Run
Download the open-source code, upload it to the server, and unzip it, then run the following commands to build the code:
    unzip ascend-yolov8-sample-master.zip -d ./kztech
    cd kztech/ascend-yolov8-sample-master/src
    # in the src directory
    cmake .
    make
    # If the build succeeds, the main executable is generated in ../out.
    # Run the sample from the src directory:
    ../out/main
    # If the following error is reported:
    #   ../out/main: error while loading shared libraries: libswresample.so.3: cannot open shared object file: No such file or directory
    # set the following environment variable and try again:
    export LD_LIBRARY_PATH=/usr/local/Ascend/thirdpart/aarch64/lib:$LD_LIBRARY_PATH
    # After a successful run, out_0.jpg is generated in the current directory
After a successful run, the console prints the following:
    root@atlas500ai:/home/kztech/ascend-yolov8-sample-master/src# ../out/main
    [INFO] Acl init ok
    [INFO] Open device 0 ok
    [INFO] Use default context currently
    [INFO] dvpp init resource ok
    [INFO] Load model ../model/yolov8n.om success
    [INFO] Create model description success
    [INFO] Create model(../model/yolov8n.om) output success
    [INFO] Init model ../model/yolov8n.om success
    [INFO] filter boxes by confidence threshold > 0.350000 success, boxes size is 10
    [INFO] filter boxes by NMS threshold > 0.450000 success, result size is 1
    [INFO] object detect [0.878906:dog] success
    [INFO] Inference elapsed time : 0.038817 s , fps is 25.761685
    [INFO] Unload model ../model/yolov8n.om success
    [INFO] destroy context ok
    [INFO] Reset device 0 ok
    [INFO] Finalize acl ok
5 Summary
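For each YOLO series, most of the adaptation work lies in handling the conversion of the input and output formats. With YOLOV7 as a reference, the YOLOV8 model can be adapted, and the same holds for YOLOV9. At present, YOLOV9 and YOLOV8 have the same output format, so only the yolov9xx.om model needs to be generated. The yolov9-c-converted.pt model (https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-c-converted.pt) is converted as follows. On Windows, set up a YOLOV9 environment and run the following commands to convert the .pt model into an .onnx model: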
    # Create a new environment yolov9 from the base environment
    conda create -n yolov9 --clone base
    # Activate the yolov9 virtual environment
    conda activate yolov9
    # Clone the yolov9 code
    git clone https://github.com/WongKinYiu/yolov9
    # Install the dependencies of the yolov9 project
    (yolov9) I:\yolov9-main>pip install -r requirements.txt
    # Export the model to onnx
    (yolov9) I:\yolov9-main>python export.py --weights yolov9-c-converted.pt --include onnx
    (yolov9) I:\yolov9-main>python export.py --weights yolov9-c-converted.pt --include onnx
    export: data=I:\yolov9-main\data\coco.yaml, weights=['yolov9-c-converted.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx']
    YOLO 2024-3-13 Python-3.11.5 torch-2.1.2 CPU

    Fusing layers...
    gelan-c summary: 387 layers, 25288768 parameters, 64944 gradients, 102.1 GFLOPs

    PyTorch: starting from yolov9-c-converted.pt with output shape (1, 84, 8400) (49.1 MB)

    ONNX: starting export with onnx 1.15.0...
    ONNX: export success 2.9s, saved as yolov9-c-converted.onnx (96.8 MB)

    Export complete (4.4s)
    Results saved to I:\yolov9-main
    Detect:          python detect.py --weights yolov9-c-converted.onnx
    Validate:        python val.py --weights yolov9-c-converted.onnx
    PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov9-c-converted.onnx')
    Visualize:       https://netron.app
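Then convert the exported ONNX model into an offline .om model with atc, using the same options as for YOLOV8:
atc --model=yolov9-c-converted.onnx --framework=5 --output=yolov9-c-converted --input_shape="images:1,3,640,640" --soc_version=Ascend310P3 --insert_op_conf=aipp.cfg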