Accelerating Computer Vision Models with OpenVINO

Posted by wydilearn on 2022-12-07

Introduction to OpenVINO

  • A computer vision deployment framework that supports a range of edge hardware platforms
  • A computer vision library developed and open-sourced by Intel
  • Ships quick demos for vision tasks across many scenarios

Four main modules:

1. Setting up the development environment

Install CMake, Miniconda3, Notepad++, PyCharm, and Visual Studio 2019.

Note: when installing Miniconda3, be sure to let the installer add the environment variables automatically; five entries are needed, and adding them by hand makes it easy to miss one, which is hard to troubleshoot.

Download and install OpenVINO: [Download Intel® Distribution of OpenVINO™ Toolkit](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download-previous-versions.html?operatingsystem=window&distributions=webdownload&version=2021 4.2 LTS&options=offline)

After installation, run the test program.

The following output indicates that installation and configuration succeeded.

Add the OpenVINO environment variables.

Configure the Visual Studio include directories, library directories, and additional dependencies.

Run the following script to list the additional dependencies automatically.

Add the additional dependencies.

With that, the development environment is ready!

2. SDK overview and development workflow

inference_engine.dll — the Inference Engine

Runtime dependencies: inference_engine_transformations.dll, tbb.dll, tbbmalloc.dll, ngraph.dll

All of these DLLs must be copied into C:/Windows/System32 (or otherwise made discoverable on the PATH) before OpenVINO programs will run.

Key InferenceEngine API types

  • InferenceEngine::Core
  • InferenceEngine::Blob, InferenceEngine::TBlob, InferenceEngine::NV12Blob
  • InferenceEngine::BlobMap
  • InferenceEngine::InputsDataMap, InferenceEngine::InputInfo
  • InferenceEngine::OutputsDataMap
  • Wrapper classes around the InferenceEngine core library (chained together in the sketch below)
    • InferenceEngine::CNNNetwork
    • InferenceEngine::ExecutableNetwork
    • InferenceEngine::InferRequest
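A minimal sketch of how these wrapper classes chain together in a typical application (the model path, device name, and blob handling are placeholders, not taken from the original post):

#include <inference_engine.hpp>
#include <iostream>

int main() {
	InferenceEngine::Core ie;                                                         // 1. one Core object per application
	InferenceEngine::CNNNetwork network = ie.ReadNetwork("model.xml", "model.bin");   // 2. read the IR files (placeholder paths)
	InferenceEngine::ExecutableNetwork exec_net = ie.LoadNetwork(network, "CPU");     // 3. compile the network for a device
	InferenceEngine::InferRequest request = exec_net.CreateInferRequest();            // 4. create an inference request
	auto input = request.GetBlob(network.getInputsInfo().begin()->first);             // 5. fetch the input blob and fill it (see the later sections)
	request.Infer();                                                                  // 6. synchronous inference
	auto output = request.GetBlob(network.getOutputsInfo().begin()->first);           // 7. read the results
	std::cout << "input size: " << input->size() << ", output size: " << output->size() << std::endl;
	return 0;
}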

Code implementation

#include <inference_engine.hpp>
#include <iostream>

using namespace InferenceEngine;

int main(int argc, char** argv) {

	InferenceEngine::Core ie;  // use the Inference Engine to list the available devices and the full CPU name
	std::vector<std::string> devices = ie.GetAvailableDevices();
	for (std::string name : devices) {
		std::cout << "device name: " << name << std::endl;
	}
	std::string cpuName = ie.GetMetric("CPU", METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();
	std::cout << "cpu full name: " << cpuName << std::endl;

	return 0;
}

Result:

3. Image classification with ResNet18

Pretrained model: ResNet18

  • Image preprocessing
  • mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]; scale the image to [0, 1], then subtract the mean and divide by the standard deviation (a worked example follows this list)
  • Input: NCHW = 1 * 3 * 224 * 224 (num, channels, height, width)
  • Output format: 1 * 1000
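A worked example of that preprocessing for a single red-channel value (the numbers are illustrative only):

#include <cstdio>

int main() {
	float r = 128.0f / 255.0f;              // 8-bit value 128 scaled to [0, 1]      -> 0.50196
	float r_norm = (r - 0.485f) / 0.229f;   // subtract the mean, divide by the std  -> 0.0740
	printf("%.4f -> %.4f\n", r, r_norm);    // the other channels use their own mean/std entries
	return 0;
}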

Overall steps

  • Initialize Core ie
  • ie.ReadNetwork
  • Get the input and output info and set the precision
  • Get an executable network bound to the target device
  • auto executable_network = ie.LoadNetwork(network, "CPU");
  • Create an inference request
  • auto infer_request = executable_network.CreateInferRequest();
  • Fill the input data (image preprocessing)
  • Run inference and parse the output

Code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // <fstream> for file I/O; <iostream> is for console I/O

using namespace InferenceEngine;
std::string labels_txt_file = "D:/projects/models/resnet18_ir/imagenet_classes.txt";
std::vector<std::string> readClassNames();

int main(int argc, char** argv) {

	InferenceEngine::Core ie;
	std::vector<std::string> devices = ie.GetAvailableDevices();
	for (std::string name : devices) {
		std::cout << "device name: " << name << std::endl;
	}
	std::string cpuName = ie.GetMetric("CPU", METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();
	std::cout << "cpu name: " << cpuName << std::endl;
	
	std::string xml = "D:/projects/models/resnet18_ir/resnet18.xml";
	std::string bin = "D:/projects/models/resnet18_ir/resnet18.bin";
	std::vector<std::string> labels = readClassNames();  // load the class labels
	cv::Mat src = cv::imread("D:/images/messi.jpg");  // load the test image
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the ResNet18 network

	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  // map of input name -> InputInfo
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  // map of output name -> DataPtr
	std::string input_name = "";
	for (auto item : inputs) {  // auto deduces the element type
		input_name = item.first;  // first is the name, second holds the precision/layout settings
		auto input_data = item.second;
		input_data->setPrecision(Precision::FP32);
		input_data->setLayout(Layout::NCHW);
		input_data->getPreProcess().setColorFormat(ColorFormat::RGB);
		std::cout << "input name: " << input_name << std::endl;
	}
	std::string output_name = "";
	for (auto item : outputs) {  // auto deduces the element type
		output_name = item.first;  // first is the name, second holds the precision settings
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);
		// note: do not set a layout on output_data
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  // choose the execution device
	auto infer_request = executable_network.CreateInferRequest();  // create the inference request
	
	// image preprocessing
	auto input = infer_request.GetBlob(input_name);  // input blob of the network
	size_t num_channels = input->getTensorDesc().getDims()[1];  // dims are NCHW, so [1] is the channel count
	size_t h = input->getTensorDesc().getDims()[2];
	size_t w = input->getTensorDesc().getDims()[3];
	size_t image_size = h * w;
	cv::Mat blob_image;
	cv::resize(src, blob_image, cv::Size(w, h));  // resize the image to the network input size
	blob_image.convertTo(blob_image, CV_32F);  // convert to float
	blob_image = blob_image / 255.0;
	cv::subtract(blob_image, cv::Scalar(0.485, 0.456, 0.406), blob_image);
	cv::divide(blob_image, cv::Scalar(0.229, 0.224, 0.225), blob_image);
	// HWC -> NCHW: copy the image into the blob plane by plane
	float* data = static_cast<float*>(input->buffer());  // raw pointer into the input blob
	for (size_t row = 0; row < h; row++) {
		for (size_t col = 0; col < w; col++) {
			for (size_t ch = 0; ch < num_channels; ch++) {
				// each channel becomes one plane, stored in channel order
				data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3f>(row, col)[ch];
			}
		}
	}

	infer_request.Infer();
	auto output = infer_request.GetBlob(output_name);
	// read back the output data
	const float* probs = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
	const SizeVector outputDims = output->getTensorDesc().getDims();  // output dims: 1 x 1000
	std::cout << outputDims[0] << "x" << outputDims[1] << std::endl;
	float max = probs[0];
	int max_index = 0;
	for (int i = 1; i < outputDims[1]; i++) {
		if (max < probs[i]) {  // track the index of the largest probability
			max = probs[i];
			max_index = i;
		}
	}
	std::cout << "class index: " << max_index << std::endl;
	std::cout << "class name: " << labels[max_index] << std::endl;
	cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
	cv::namedWindow("out", cv::WINDOW_FREERATIO);
	cv::imshow("out", src);
	cv::waitKey(0);
	return 0;
}

std::vector<std::string> readClassNames() {  // read the label file

	std::vector<std::string> classNames;
	std::ifstream fp(labels_txt_file);
	if (!fp.is_open()) {
		printf("could not open file...\n");
		exit(-1);
	}
	std::string name;
	while (!fp.eof()) {  // eof() becomes true once the end of the file is reached
		std::getline(fp, name);  // read the file line by line
		if (name.length()) {
			classNames.push_back(name);
		}
	}
	fp.close();
	return classNames;
}

Result:

4. Vehicle detection and license plate recognition

Model overview

  • vehicle-license-plate-detection-barrier-0106
  • Trained on the BIT-Vehicle dataset
  • Input: 1 * 3 * 300 * 300 (NCHW)
  • Output format: [1, 1, N, 7]
  • Seven values per detection: [image_id, label, conf, x_min, y_min, x_max, y_max], with box coordinates normalized to [0, 1] (see the decoding sketch below)
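A condensed view of how one detection is read out of that [1, 1, N, 7] blob; the full listing below does the same inside a loop. The helper name and parameters are illustrative, and `detection_out` stands for the model's FP32 output buffer:

#include <opencv2/opencv.hpp>

// Decode detection n from the [1, 1, N, 7] output of the detector.
cv::Rect decode_box(const float* detection_out, int n, int im_w, int im_h, int& label, float& conf) {
	const int object_size = 7;                       // [image_id, label, conf, x_min, y_min, x_max, y_max]
	const float* det = detection_out + n * object_size;
	label = static_cast<int>(det[1]);                // class id
	conf = det[2];                                   // confidence in [0, 1]
	int x_min = static_cast<int>(det[3] * im_w);     // corners are normalized to [0, 1],
	int y_min = static_cast<int>(det[4] * im_h);     // so scale them back to pixel coordinates
	int x_max = static_cast<int>(det[5] * im_w);
	int y_max = static_cast<int>(det[6] * im_h);
	return cv::Rect(cv::Point(x_min, y_min), cv::Point(x_max, y_max));
}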

Call flow

  • Load the model
  • Configure the inputs and outputs
  • Build the input
  • Run inference
  • Parse the output
  • Display the result

Downloading the vehicle and license plate detection model

cd C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\open_model_zoo\tools\downloader  #run cmd as administrator and change into the downloader folder

python downloader.py --name vehicle-license-plate-detection-barrier-0106  #run this script from that folder to download the model

The following output indicates the download succeeded:

Move the downloaded model files into your models folder:

Vehicle and license plate detection: code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  //fstream檔案讀寫操作,iostream為控制檯操作

using namespace InferenceEngine;

int main(int argc, char** argv) {

	InferenceEngine::Core ie;
	std::vector<std::string> devices = ie.GetAvailableDevices();
	for (std::string name : devices) {
		std::cout << "device name: " << name << std::endl;
	}
	std::string cpuName = ie.GetMetric("CPU", METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();
	std::cout << "cpu name: " << cpuName << std::endl;

	std::string xml = "D:/projects/models/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.xml";
	std::string bin = "D:/projects/models/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.bin";
	cv::Mat src = cv::imread("D:/images/car_1.bmp");  //讀取影像
	int im_h = src.rows;
	int im_w = src.cols;
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the vehicle/license-plate detection network

	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	std::string input_name = "";
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8
		input_data->setLayout(Layout::NCHW);
		//input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  預設就是BGR
		std::cout << "input name: " << input_name << std::endl;
	}
	std::string output_name = "";
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		//注意:output_data不要設定結構
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	auto infer_request = executable_network.CreateInferRequest();  //設定推理請求

	//影像預處理
	auto input = infer_request.GetBlob(input_name);  //獲取網路輸入影像資訊
	size_t num_channels = input->getTensorDesc().getDims()[1];  //size_t 型別表示C中任何物件所能達到的最大長度,它是無符號整數
	size_t h = input->getTensorDesc().getDims()[2];
	size_t w = input->getTensorDesc().getDims()[3];
	size_t image_size = h * w;
	cv::Mat blob_image;
	cv::resize(src, blob_image, cv::Size(w, h));  //將輸入圖片大小轉換為與網路輸入大小一致
	//cv::cvtColor(blob_image, blob_image, cv::COLOR_BGR2RGB);  //色彩空間轉換
	
	// HWC =》NCHW  將輸入影像從HWC格式轉換為NCHW格式
	unsigned char* data = static_cast<unsigned char*>(input->buffer());  //將影像放到buffer中,放入input中
	for (size_t row = 0; row < h; row++) {
		for (size_t col = 0; col < w; col++) {
			for (size_t ch = 0; ch < num_channels; ch++) {
				//將每個通道變成一張圖,按照通道順序
				data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
			}
		}
	}

	infer_request.Infer();
	auto output = infer_request.GetBlob(output_name);
	//轉換輸出資料
	const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
	//output:[1, 1, N, 7]
	//七個引數為:[image_id, label, conf, x_min, y_min, x_max, y_max]
	const SizeVector outputDims = output->getTensorDesc().getDims();  // output dims: [1, 1, N, 7]
	std::cout << outputDims[2] << "x" << outputDims[3] << std::endl;
	const int max_count = outputDims[2];  // maximum number of detections
	const int object_size = outputDims[3];  // values per detection, 7 here
	for (int n = 0; n < max_count; n++) {
		float label = detection_out[n * object_size + 1];
		float confidence = detection_out[n * object_size + 2];
		float xmin = detection_out[n * object_size + 3] * im_w;
		float ymin = detection_out[n * object_size + 4] * im_h;
		float xmax = detection_out[n * object_size + 5] * im_w;
		float ymax = detection_out[n * object_size + 6] * im_h;
		if (confidence > 0.5) {
			printf("label id: %d \n", static_cast<int>(label));
			cv::Rect box;
			box.x = static_cast<int>(xmin);
			box.y = static_cast<int>(ymin);
			box.width = static_cast<int>(xmax - xmin);
			box.height = static_cast<int>(ymax - ymin);
			cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8);
			//box.tl()返回矩形左上角座標
			cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
		}
	}
	
	//cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
	cv::namedWindow("out", cv::WINDOW_FREERATIO);
	cv::imshow("out", src);
	cv::waitKey(0);
	return 0;
}

Result:

License plate recognition

  • Model name: license-plate-recognition-barrier-0001
  • Input format: BGR
  • Two inputs: the plate image, 1 * 3 * 24 * 94, and an 88 * 1 sequence input filled with [0, 1, 1, 1, ..., 1]
  • Output format: 1 * 88 * 1 * 1

Download the model (license-plate-recognition-barrier-0001) the same way as before. The approach: 1) initialize the plate recognition network once and keep its input/output names in a wide enough scope to reuse; 2) run the vehicle and license plate detection model to locate plates; 3) pass each detected plate region to the recognition function, which runs the request initialized in step 1 and returns the recognized plate text. The sequence input and the output decoding are sketched below.
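The two parts that differ from a plain classifier are the second (sequence) input and the decoding of the 88 output values against the items table until a -1 terminator. A condensed sketch of just those steps, assuming the request and the input/output names have already been set up as in the listing below (the helper name is illustrative):

#include <inference_engine.hpp>
#include <algorithm>
#include <string>
#include <vector>

std::string decode_plate(InferenceEngine::InferRequest& req, const std::string& seq_input_name,
	const std::string& output_name, const std::vector<std::string>& items) {
	// second input: [0, 1, 1, ..., 1], 88 values for this model
	auto seq = req.GetBlob(seq_input_name);
	const int max_sequence = seq->getTensorDesc().getDims()[0];
	float* seq_data = seq->buffer().as<float*>();
	seq_data[0] = 0.0f;
	std::fill(seq_data + 1, seq_data + max_sequence, 1.0f);

	req.Infer();  // the image input is assumed to be filled already

	// output: 1 * 88 * 1 * 1; each value indexes the character table, -1 ends the plate
	const float* out = req.GetBlob(output_name)->buffer().as<float*>();
	std::string text;
	for (int i = 0; i < max_sequence; i++) {
		if (out[i] == -1) break;
		text += items[static_cast<size_t>(out[i])];
	}
	return text;
}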

License plate recognition: code implementation

#include <opencv2/opencv.hpp>
#include <inference_engine.hpp>
#include <fstream>

using namespace InferenceEngine;
static std::vector<std::string> items = {
	"0","1","2","3","4","5","6","7","8","9",
	"< Anhui >","< Beijing >","< Chongqing >","< Fujian >",
	"< Gansu >","< Guangdong >","< Guangxi >","< Guizhou >",
	"< Hainan >","< Hebei >","< Heilongjiang >","< Henan >",
	"< HongKong >","< Hubei >","< Hunan >","< InnerMongolia >",
	"< Jiangsu >","< Jiangxi >","< Jilin >","< Liaoning >",
	"< Macau >","< Ningxia >","< Qinghai >","< Shaanxi >",
	"< Shandong >","< Shanghai >","< Shanxi >","< Sichuan >",
	"< Tianjin >","< Tibet >","< Xinjiang >","< Yunnan >",
	"< Zhejiang >","< police >",
	"A","B","C","D","E","F","G","H","I","J",
	"K","L","M","N","O","P","Q","R","S","T",
	"U","V","W","X","Y","Z"
};

InferenceEngine::InferRequest plate_request;
std::string plate_input_name1;
std::string plate_input_name2;
std::string plate_output_name;

void load_plate_recog_model();
void fetch_plate_text(cv::Mat &image, cv::Mat &plateROI);

int main(int argc, char** argv) {

	InferenceEngine::Core ie;
	load_plate_recog_model();  // load the plate recognition model; its I/O names are stored in plate_input_name1/2 and plate_output_name

	std::string xml = "D:/projects/models/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.xml";
	std::string bin = "D:/projects/models/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.bin";
	cv::Mat src = cv::imread("D:/images/car_1.bmp");  //讀取影像
	int im_h = src.rows;
	int im_w = src.cols;
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the vehicle/license-plate detection network

	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	std::string input_name = "";
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8
		input_data->setLayout(Layout::NCHW);
		//input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  預設就是BGR
		std::cout << "input name: " << input_name << std::endl;
	}
	std::string output_name = "";
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		//注意:output_data不要設定結構
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	auto infer_request = executable_network.CreateInferRequest();  //設定推理請求

	//影像預處理
	auto input = infer_request.GetBlob(input_name);  //獲取網路輸入影像資訊
	size_t num_channels = input->getTensorDesc().getDims()[1];  //size_t 型別表示C中任何物件所能達到的最大長度,它是無符號整數
	size_t h = input->getTensorDesc().getDims()[2];
	size_t w = input->getTensorDesc().getDims()[3];
	size_t image_size = h * w;
	cv::Mat blob_image;
	cv::resize(src, blob_image, cv::Size(w, h));  //將輸入圖片大小轉換為與網路輸入大小一致
	//cv::cvtColor(blob_image, blob_image, cv::COLOR_BGR2RGB);  //色彩空間轉換

	// HWC =》NCHW  將輸入影像從HWC格式轉換為NCHW格式
	unsigned char* data = static_cast<unsigned char*>(input->buffer());  //將影像放到buffer中,放入input中
	for (size_t row = 0; row < h; row++) {
		for (size_t col = 0; col < w; col++) {
			for (size_t ch = 0; ch < num_channels; ch++) {
				//將每個通道變成一張圖,按照通道順序
				data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
			}
		}
	}

	infer_request.Infer();
	auto output = infer_request.GetBlob(output_name);
	//轉換輸出資料
	const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
	//output:[1, 1, N, 7]
	//七個引數為:[image_id, label, conf, x_min, y_min, x_max, y_max]
	const SizeVector outputDims = output->getTensorDesc().getDims();  // output dims: [1, 1, N, 7]
	std::cout << outputDims[2] << "x" << outputDims[3] << std::endl;
	const int max_count = outputDims[2];  //識別出的物件個數
	const int object_size = outputDims[3];  //獲取物件資訊的個數,此處為7個
	for (int n = 0; n < max_count; n++) {
		float label = detection_out[n * object_size + 1];
		float confidence = detection_out[n * object_size + 2];
		float xmin = detection_out[n * object_size + 3] * im_w;
		float ymin = detection_out[n * object_size + 4] * im_h;
		float xmax = detection_out[n * object_size + 5] * im_w;
		float ymax = detection_out[n * object_size + 6] * im_h;
		if (confidence > 0.5) {
			printf("label id: %d \n", static_cast<int>(label));
			cv::Rect box;
			box.x = static_cast<int>(xmin);
			box.y = static_cast<int>(ymin);
			box.width = static_cast<int>(xmax - xmin);
			box.height = static_cast<int>(ymax - ymin);

			if (label == 2) {  // label 2 is a license plate: draw it in green
				cv::rectangle(src, box, cv::Scalar(0, 255, 0), 2, 8);
				// recognize the plate: expand the box slightly and clamp it to the image
				cv::Rect plate_roi;
				plate_roi.x = box.x - 5;
				plate_roi.y = box.y - 5;
				plate_roi.width = box.width + 10;
				plate_roi.height = box.height + 10;
				plate_roi &= cv::Rect(0, 0, im_w, im_h);  // keep the ROI inside the image
				cv::Mat roi = src(plate_roi);
				// run the plate recognition request on the ROI
				fetch_plate_text(src, roi);
			}
			else {
				cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8);
			}

			//box.tl()返回矩形左上角座標
			cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
		}
	}

	//cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
	cv::namedWindow("out", cv::WINDOW_FREERATIO);
	cv::imshow("out", src);
	cv::waitKey(0);
	return 0;
}

void load_plate_recog_model() {
	InferenceEngine::Core ie;

	std::string xml = "D:/projects/models/license-plate-recognition-barrier-0001/FP32/license-plate-recognition-barrier-0001.xml";
	std::string bin = "D:/projects/models/license-plate-recognition-barrier-0001/FP32/license-plate-recognition-barrier-0001.bin";
	
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //讀取網路
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	
	int cnt = 0;
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		if (cnt == 0) {
			plate_input_name1 = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
			auto input_data = item.second;
			input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8
			input_data->setLayout(Layout::NCHW);
		}
		else if (cnt == 1) {
			plate_input_name2 = item.first;  // second input: the sequence indicator
			auto input_data = item.second;
			input_data->setPrecision(Precision::FP32);  // the sequence input is FP32
		}
		//input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  預設就是BGR
		std::cout << "input name: " << (cnt + 1) << ":" << item.first << std::endl;
		cnt++;
	}
	std::string output_name = "";
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		plate_output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		//注意:output_data不要設定結構
		std::cout << "output name: " << plate_output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	plate_request = executable_network.CreateInferRequest();  //設定推理請求
}

void fetch_plate_text(cv::Mat &image, cv::Mat &plateROI) {
	// preprocess the plate ROI using the I/O info captured in load_plate_recog_model()
	auto input1 = plate_request.GetBlob(plate_input_name1);  // image input blob of the recognition network
	size_t num_channels = input1->getTensorDesc().getDims()[1];  //size_t 型別表示C中任何物件所能達到的最大長度,它是無符號整數
	size_t h = input1->getTensorDesc().getDims()[2];
	size_t w = input1->getTensorDesc().getDims()[3];
	size_t image_size = h * w;
	cv::Mat blob_image;
	cv::resize(plateROI, blob_image, cv::Size(94, 24));  // resize the plate ROI to the network input size (94 x 24)
	//cv::cvtColor(blob_image, blob_image, cv::COLOR_BGR2RGB);  //色彩空間轉換

	// HWC =》NCHW  將輸入影像從HWC格式轉換為NCHW格式
	unsigned char* data = static_cast<unsigned char*>(input1->buffer());  //將影像放到buffer中,放入input中
	for (size_t row = 0; row < h; row++) {
		for (size_t col = 0; col < w; col++) {
			for (size_t ch = 0; ch < num_channels; ch++) {
				//將每個通道變成一張圖,按照通道順序
				data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
			}
		}
	}

	// fill the second input (the sequence indicator) before running the request
	auto input2 = plate_request.GetBlob(plate_input_name2);
	int max_sequence = input2->getTensorDesc().getDims()[0];  // maximum output sequence length (88)
	float* blob2 = input2->buffer().as<float*>();
	blob2[0] = 0.0;
	std::fill(blob2 + 1, blob2 + max_sequence, 1.0f);  // [0, 1, 1, ..., 1]

	plate_request.Infer();  // run inference
	auto output = plate_request.GetBlob(plate_output_name);  // fetch the result
	const float* plate_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());  // FP32 output values
	std::string result;
	for (int i = 0; i < max_sequence; i++) {
		if (plate_data[i] == -1) {  // -1 terminates the sequence
			break;
		}
		result += items[std::size_t(plate_data[i])];  // map the index into the character table
	}
	std::cout << result << std::endl;
	cv::putText(image, result.c_str(), cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
}

Result:

5. Pedestrian detection, face detection, and emotion recognition

Pedestrian detection in video

Model overview

  • pedestrian-detection-adas-0002
  • SSD MobileNetv1
  • Input format: [1 * 3 * 384 * 672]
  • Output format: [1, 1, N, 7]

Code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // <fstream> for file I/O; <iostream> is for console I/O

using namespace InferenceEngine;
void infer_process(cv::Mat &frame, InferenceEngine::InferRequest &request, std::string &input_name, std::string &output_name);
int main(int argc, char** argv) {

	InferenceEngine::Core ie;
	
	std::string xml = "D:/projects/models/pedestrian-detection-adas-0002/FP32/pedestrian-detection-adas-0002.xml";
	std::string bin = "D:/projects/models/pedestrian-detection-adas-0002/FP32/pedestrian-detection-adas-0002.bin";
	cv::Mat src = cv::imread("D:/images/pedestrians_test.jpg");  //讀取影像
	int im_h = src.rows;
	int im_w = src.cols;
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the pedestrian detection network

	//獲取網路輸入輸出資訊
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	std::string input_name = "";
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		// A->B 表示提取A中的成員B
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8
		input_data->setLayout(Layout::NCHW);
		//input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  預設就是BGR
		std::cout << "input name: " << input_name << std::endl;
	}
	std::string output_name = "";
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		//注意:output_data不要設定結構
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	auto infer_request = executable_network.CreateInferRequest();  //設定推理請求

	// open the video file / stream
	cv::VideoCapture capture("D:/images/video/pedestrians_test.mp4");
	cv::Mat frame;
	while (true) {
		bool ret = capture.read(frame);
		if (!ret) {  // stop when no more frames can be read
			break;
		}
		infer_process(frame, infer_request, input_name, output_name);
		cv::imshow("frame", frame);
		char c = cv::waitKey(1);
		if (c == 27) {  // ESC
			break;
		}
	}

	//cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
	cv::namedWindow("out", cv::WINDOW_FREERATIO);
	cv::imshow("out", src);
	cv::waitKey(0);  // keep the last frame on screen
	return 0;
}

void infer_process(cv::Mat& frame, InferenceEngine::InferRequest& request, std::string& input_name, std::string& output_name) {
	//影像預處理
	auto input = request.GetBlob(input_name);  //獲取網路輸入影像資訊
	int im_w = frame.cols;
	int im_h = frame.rows;
	size_t num_channels = input->getTensorDesc().getDims()[1];  //size_t 型別表示C中任何物件所能達到的最大長度,它是無符號整數
	size_t h = input->getTensorDesc().getDims()[2];
	size_t w = input->getTensorDesc().getDims()[3];
	size_t image_size = h * w;
	cv::Mat blob_image;
	cv::resize(frame, blob_image, cv::Size(w, h));  //將輸入圖片大小轉換為與網路輸入大小一致
	//cv::cvtColor(blob_image, blob_image, cv::COLOR_BGR2RGB);  //色彩空間轉換

	// HWC =》NCHW  將輸入影像從HWC格式轉換為NCHW格式
	unsigned char* data = static_cast<unsigned char*>(input->buffer());  //將影像放到buffer中,放入input中
	for (size_t row = 0; row < h; row++) {
		for (size_t col = 0; col < w; col++) {
			for (size_t ch = 0; ch < num_channels; ch++) {
				//將每個通道變成一張圖,按照通道順序
				data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
			}
		}
	}

	request.Infer();
	auto output = request.GetBlob(output_name);
	//轉換輸出資料
	const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
	//output:[1, 1, N, 7]
	//七個引數為:[image_id, label, conf, x_min, y_min, x_max, y_max]
	const SizeVector outputDims = output->getTensorDesc().getDims();  // output dims: [1, 1, N, 7]
	std::cout << outputDims[2] << "x" << outputDims[3] << std::endl;
	const int max_count = outputDims[2];  //識別出的物件個數
	const int object_size = outputDims[3];  //獲取物件資訊的個數,此處為7個
	for (int n = 0; n < max_count; n++) {
		float label = detection_out[n * object_size + 1];
		float confidence = detection_out[n * object_size + 2];
		float xmin = detection_out[n * object_size + 3] * im_w;
		float ymin = detection_out[n * object_size + 4] * im_h;
		float xmax = detection_out[n * object_size + 5] * im_w;
		float ymax = detection_out[n * object_size + 6] * im_h;
		if (confidence > 0.9) {
			printf("label id: %d \n", static_cast<int>(label));
			cv::Rect box;
			box.x = static_cast<int>(xmin);
			box.y = static_cast<int>(ymin);
			box.width = static_cast<int>(xmax - xmin);
			box.height = static_cast<int>(ymax - ymin);

			if (label == 2) {  // color by label (carried over from the vehicle demo)
				cv::rectangle(frame, box, cv::Scalar(0, 255, 0), 2, 8);
			}
			else {
				cv::rectangle(frame, box, cv::Scalar(0, 0, 255), 2, 8);
			}

			//cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8);
			//box.tl()返回矩形左上角座標
			cv::putText(frame, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
		}
	}
}

Result:

Real-time face detection with asynchronous inference

Model overview

  • Face detection: face-detection-0202, SSD-MobileNetv2
  • Input format: 1 * 3 * 384 * 384
  • Output format: [1, 1, N, 7]
  • OpenVINO ships face detection models 0202 through 0206

Synchronous vs. asynchronous execution
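The heading above contrasts synchronous execution (Infer blocks until the result is ready) with asynchronous execution (StartAsync returns immediately and Wait collects the result later). The listing below keeps two requests in flight so one frame is inferred while the previous one is displayed; a condensed sketch of that pattern, assuming `exec_net` is an already loaded ExecutableNetwork and `fill_input()` stands for the frameToBlob helper from the listing:

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>

// fill_input() stands for the frameToBlob helper from the full listing below.
void fill_input(InferenceEngine::InferRequest::Ptr& request, cv::Mat& frame);

void run_async(InferenceEngine::ExecutableNetwork& exec_net, const std::string& video_path) {
	cv::VideoCapture cap(video_path);
	auto curr = exec_net.CreateInferRequestPtr();  // request whose result is shown this iteration
	auto next = exec_net.CreateInferRequestPtr();  // request working on the following frame

	cv::Mat curr_frame, next_frame;
	cap.read(curr_frame);
	fill_input(curr, curr_frame);
	curr->StartAsync();                            // start inference on the first frame

	while (cap.read(next_frame)) {
		fill_input(next, next_frame);
		next->StartAsync();                        // infer the next frame in the background
		if (InferenceEngine::OK == curr->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY)) {
			// parse curr's output blob and draw on curr_frame here
		}
		cv::imshow("async", curr_frame);
		if (cv::waitKey(1) == 27) break;           // ESC
		next_frame.copyTo(curr_frame);             // the next frame becomes the current frame
		curr.swap(next);                           // and the in-flight request becomes current
	}
}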

Code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // <fstream> for file I/O; <iostream> is for console I/O

using namespace InferenceEngine;

// image preprocessing helper: copy a U8 cv::Mat into an NCHW blob
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
	InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
	const size_t width = blobSize[3];
	const size_t height = blobSize[2];
	const size_t channels = blobSize[1];
	InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
	if (!mblob) {
		THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
			<< "but by fact we were not able to cast inputBlob to MemoryBlob";
	}
	// locked memory holder should be alive all time while access to its buffer happens
	auto mblobHolder = mblob->wmap();

	T* blob_data = mblobHolder.as<T*>();

	cv::Mat resized_image(orig_image);
	if (static_cast<int>(width) != orig_image.size().width ||
		static_cast<int>(height) != orig_image.size().height) {
		cv::resize(orig_image, resized_image, cv::Size(width, height));
	}

	int batchOffset = batchIndex * width * height * channels;

	for (size_t c = 0; c < channels; c++) {
		for (size_t h = 0; h < height; h++) {
			for (size_t w = 0; w < width; w++) {
				blob_data[batchOffset + c * width * height + h * width + w] =
					resized_image.at<cv::Vec3b>(h, w)[c];
			}
		}
	}
}

void frameToBlob(std::shared_ptr<InferenceEngine::InferRequest>& request, cv::Mat& frame, std::string& input_name) {
	// preprocessing: fetch the request's input blob and fill it from the frame
	InferenceEngine::Blob::Ptr input = request->GetBlob(input_name);  // input blob of the network
	// matU8ToBlob is a template, so the element type has to be given explicitly
	matU8ToBlob<uchar>(frame, input);  // copy the frame into the blob
}

int main(int argc, char** argv) {

	InferenceEngine::Core ie;
	std::vector<std::string> devices = ie.GetAvailableDevices();
	for (std::string name : devices) {
		std::cout << "device name: " << name << std::endl;
	}
	std::string cpuName = ie.GetMetric("CPU", METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();
	std::cout << "cpu name: " << cpuName << std::endl;

	std::string xml = "D:/projects/models/face-detection-0202/FP32/face-detection-0202.xml";
	std::string bin = "D:/projects/models/face-detection-0202/FP32/face-detection-0202.bin";
	
	//cv::Mat src = cv::imread("D:/images/mmc2.jpg");  //讀取影像
	//int im_h = src.rows;
	//int im_w = src.cols;
	
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the face detection network

	//獲取網路輸入輸出資訊並設定
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	std::string input_name = "";
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		// A->B 表示提取A中的成員B
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8
		input_data->setLayout(Layout::NCHW);
		//input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  預設就是BGR
		std::cout << "input name: " << input_name << std::endl;
	}
	std::string output_name = "";
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		//注意:output_data不要設定結構
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  // choose the execution device
	// request pointers make the swap at the end of the loop possible
	auto curr_infer_request = executable_network.CreateInferRequestPtr();  // request for the current frame
	auto next_infer_request = executable_network.CreateInferRequestPtr();  // request for the next frame

	cv::VideoCapture capture("D:/images/video/pedestrians_test.mp4");
	cv::Mat curr_frame;
	cv::Mat next_frame;
	capture.read(curr_frame);  // read one frame as the current frame
	int im_h = curr_frame.rows;
	int im_w = curr_frame.cols;
	frameToBlob(curr_infer_request, curr_frame, input_name);
	bool first_frame = true;  // two flags control when requests are started
	bool last_frame = false;
	// two requests run in flight: curr is being displayed while next is being inferred,
	// and the two are swapped at the end of each iteration
	while (true) {
		int64 start = cv::getTickCount();  // timing
		bool ret = capture.read(next_frame);  // read one frame as the next frame
		if (!ret) {
			last_frame = true;  // no next frame: this is the last iteration
		}
		if (!last_frame) {  // preprocess the next frame if there is one
			frameToBlob(next_infer_request, next_frame, input_name);
		}
		if (first_frame) {  // on the first iteration start both requests, then clear the flag
			curr_infer_request->StartAsync();  // start asynchronous inference
			next_infer_request->StartAsync();
			first_frame = false;
		}
		else {  // afterwards only the next request needs to be started
			if (!last_frame) {
				next_infer_request->StartAsync();
			}
		}
		// wait until the current request's results are ready
		if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY)) {
			auto output = curr_infer_request->GetBlob(output_name);
			//轉換輸出資料
			const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
			//output:[1, 1, N, 7]
			//七個引數為:[image_id, label, conf, x_min, y_min, x_max, y_max]
			const SizeVector outputDims = output->getTensorDesc().getDims();  // output dims: [1, 1, N, 7]
			std::cout << outputDims[2] << "x" << outputDims[3] << std::endl;
			const int max_count = outputDims[2];  //識別出的物件個數
			const int object_size = outputDims[3];  //獲取物件資訊的個數,此處為7個
			for (int n = 0; n < max_count; n++) {
				float label = detection_out[n * object_size + 1];
				float confidence = detection_out[n * object_size + 2];
				float xmin = detection_out[n * object_size + 3] * im_w;
				float ymin = detection_out[n * object_size + 4] * im_h;
				float xmax = detection_out[n * object_size + 5] * im_w;
				float ymax = detection_out[n * object_size + 6] * im_h;
				if (confidence > 0.5) {
					printf("label id: %d \n", static_cast<int>(label));
					cv::Rect box;
					box.x = static_cast<int>(xmin);
					box.y = static_cast<int>(ymin);
					box.width = static_cast<int>(xmax - xmin);
					box.height = static_cast<int>(ymax - ymin);

					cv::rectangle(curr_frame, box, cv::Scalar(0, 0, 255), 2, 8);
					// ticks elapsed since start divided by getTickFrequency() (ticks per second) gives seconds per frame
					float t = (cv::getTickCount() - start) / static_cast<float>(cv::getTickFrequency());
					std::cout << 1.0 / t << std::endl;
					//box.tl()返回矩形左上角座標
					cv::putText(curr_frame, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
				}
			}
		}
		//顯示結果
		cv::imshow("人臉檢測非同步顯示", curr_frame);
		char c = cv::waitKey(1);
		if (c == 27) {  //ESC
			break;
		}
		if (last_frame) {  //如果last_frame為true表示下一幀為空,則跳出迴圈
			break;
		}

		// swap for the next iteration: the next frame becomes the current frame,
		// and the just-started request becomes the current request
		next_frame.copyTo(curr_frame);
		curr_infer_request.swap(next_infer_request);  // swap works because these are request pointers
	}

	cv::waitKey(0);
	return 0;
}

Result:

Real-time facial emotion recognition

Model overview

  • Face detection: face-detection-0202, SSD-MobileNetv2
  • Input format: 1 * 3 * 384 * 384
  • Output format: [1, 1, N, 7]
  • Emotion recognition: emotions-recognition-retail-0003
  • Input: 1 * 3 * 64 * 64
  • Output: [1, 5, 1, 1] - ('neutral', 'happy', 'sad', 'surprise', 'anger')
  • Download the emotions-recognition-retail-0003 model the same way as before

Synchronous vs. asynchronous execution

Code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // <fstream> for file I/O; <iostream> is for console I/O

using namespace InferenceEngine;

static const char *const items[] = {
	"neutral","happy","sad","surprise","anger"
};

//影像預處理函式
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
	InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
	const size_t width = blobSize[3];
	const size_t height = blobSize[2];
	const size_t channels = blobSize[1];
	InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
	if (!mblob) {
		THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
			<< "but by fact we were not able to cast inputBlob to MemoryBlob";
	}
	// locked memory holder should be alive all time while access to its buffer happens
	auto mblobHolder = mblob->wmap();

	T* blob_data = mblobHolder.as<T*>();

	cv::Mat resized_image(orig_image);
	if (static_cast<int>(width) != orig_image.size().width ||
		static_cast<int>(height) != orig_image.size().height) {
		cv::resize(orig_image, resized_image, cv::Size(width, height));
	}

	int batchOffset = batchIndex * width * height * channels;

	for (size_t c = 0; c < channels; c++) {
		for (size_t h = 0; h < height; h++) {
			for (size_t w = 0; w < width; w++) {
				blob_data[batchOffset + c * width * height + h * width + w] =
					resized_image.at<cv::Vec3b>(h, w)[c];
			}
		}
	}
}

void fetch_emotion(cv::Mat& image, InferenceEngine::InferRequest& request, cv::Rect& face_roi, std::string& e_input, std::string& e_output);
void frameToBlob(std::shared_ptr<InferenceEngine::InferRequest>& request, cv::Mat& frame, std::string& input_name) {
	//影像預處理,輸入資料 ->指標獲取成員方法
	InferenceEngine::Blob::Ptr input = request->GetBlob(input_name);  //獲取網路輸入影像資訊
	//該函式template模板型別,需要指定具體型別
	matU8ToBlob<uchar>(frame, input);  //使用該函式處理輸入資料
}

int main(int argc, char** argv) {

	InferenceEngine::Core ie;

	//load face model
	std::string xml = "D:/projects/models/face-detection-0202/FP32/face-detection-0202.xml";
	std::string bin = "D:/projects/models/face-detection-0202/FP32/face-detection-0202.bin";
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the face detection network
	//獲取網路輸入輸出資訊並設定
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列

	std::string input_name = "";
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		// A->B 表示提取A中的成員B
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8
		input_data->setLayout(Layout::NCHW);
		//input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  預設就是BGR
		std::cout << "input name: " << input_name << std::endl;
	}
	std::string output_name = "";
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		//注意:output_data不要設定結構
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	//建立指標型別便於後續操作
	auto curr_infer_request = executable_network.CreateInferRequestPtr();  //設定推理請求
	auto next_infer_request = executable_network.CreateInferRequestPtr();  //設定推理請求



	//load emotion model
	std::string em_xml = "D:/projects/models/emotions-recognition-retail-0003/FP32/emotions-recognition-retail-0003.xml";
	std::string em_bin = "D:/projects/models/emotions-recognition-retail-0003/FP32/emotions-recognition-retail-0003.bin";
	InferenceEngine::CNNNetwork em_network = ie.ReadNetwork(em_xml, em_bin);  // read the emotion recognition network
	//獲取網路輸入輸出資訊並設定
	InferenceEngine::InputsDataMap em_inputs = em_network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap em_outputs = em_network.getOutputsInfo();  //DataMap是一個Mat陣列
	
	std::string em_input_name = "";
	for (auto item : em_inputs) {
		em_input_name = item.first;
		//迴圈作用域內的變數可以不重新命名,為檢視更明確這裡重新命名
		auto em_input_data = item.second;
		em_input_data->setPrecision(Precision::U8);
		em_input_data->setLayout(Layout::NCHW);
	}
	std::string em_output_name = "";
	for (auto item : em_outputs) {  //auto可以自動推斷變數型別
		em_output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto em_output_data = item.second;
		em_output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
	}
	auto executable_em_network = ie.LoadNetwork(em_network, "CPU");  //設定執行的裝置
	//建立指標型別便於後續操作
	auto em_request = executable_em_network.CreateInferRequest();  //設定推理請求
	
	

	cv::VideoCapture capture("D:/images/video/face_detect.mp4");
	cv::Mat curr_frame;
	cv::Mat next_frame;
	capture.read(curr_frame);  //先讀取一幀作為當前幀
	int im_h = curr_frame.rows;
	int im_w = curr_frame.cols;
	frameToBlob(curr_infer_request, curr_frame, input_name);
	bool first_frame = true;  //設定兩個bool變數控制執行緒開啟
	bool last_frame = false;
	//開啟兩個執行緒,curr轉換顯示結果,next預處理影像,預處理後交換給curr
	while (true) {
		int64 start = cv::getTickCount();  //計時
		bool ret = capture.read(next_frame);  //讀取一幀作為下一幀
		if (!ret) {
			last_frame = true;  //如果下一幀為空,則last_frame為true
		}
		if (!last_frame) {  //如果last_frame為false則預處理下一幀影像
			frameToBlob(next_infer_request, next_frame, input_name);
		}
		if (first_frame) {  //如果first_frame為true則開啟兩個執行緒,同時修改first_frame為false,避免多次開啟執行緒
			curr_infer_request->StartAsync();  //開啟執行緒
			next_infer_request->StartAsync();
			first_frame = false;
		}
		else {  //如果first_frame與last_frame同為false表示只有下一幀不為空,則開啟一個next執行緒
			if (!last_frame) {
				next_infer_request->StartAsync();
			}
		}
		//判斷當前請求是否預處理完畢
		if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY)) {
			auto output = curr_infer_request->GetBlob(output_name);
			//轉換輸出資料
			const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
			//output:[1, 1, N, 7]
			//七個引數為:[image_id, label, conf, x_min, y_min, x_max, y_max]
			const SizeVector outputDims = output->getTensorDesc().getDims();  // output dims: [1, 1, N, 7]
			std::cout << outputDims[2] << "x" << outputDims[3] << std::endl;
			const int max_count = outputDims[2];  //識別出的物件個數
			const int object_size = outputDims[3];  //獲取物件資訊的個數,此處為7個
			for (int n = 0; n < max_count; n++) {
				float label = detection_out[n * object_size + 1];
				float confidence = detection_out[n * object_size + 2];
				float xmin = detection_out[n * object_size + 3] * im_w;
				float ymin = detection_out[n * object_size + 4] * im_h;
				float xmax = detection_out[n * object_size + 5] * im_w;
				float ymax = detection_out[n * object_size + 6] * im_h;
				if (confidence > 0.5) {
					printf("label id: %d \n", static_cast<int>(label));
					cv::Rect box;
					box.x = static_cast<int>(xmin);
					box.y = static_cast<int>(ymin);
					box.width = static_cast<int>(xmax - xmin);
					box.height = static_cast<int>(ymax - ymin);

					// clamp the box to the frame so the face ROI used below stays inside the image
					box &= cv::Rect(0, 0, im_w, im_h);

					cv::rectangle(curr_frame, box, cv::Scalar(0, 0, 255), 2, 8);

					fetch_emotion(curr_frame, em_request, box, em_input_name, em_output_name);  // classify the emotion for this face

					//getTickCount()相減得到cpu走過的時鐘週期數,getTickFrequency()得到cpu一秒走過的始終週期數
					float fps = static_cast<float>(cv::getTickFrequency()) / (cv::getTickCount() - start);
					
					cv::putText(curr_frame, cv::format("FPS:%.2f", fps), cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
				}
			}
		}
		//顯示結果
		cv::imshow("人臉檢測非同步顯示", curr_frame);
		char c = cv::waitKey(1);
		if (c == 27) {  //ESC
			break;
		}
		if (last_frame) {  //如果last_frame為true表示下一幀為空,則跳出迴圈
			break;
		}

		//非同步交換,下一幀複製到當前幀,當前幀請求與下一幀請求交換
		next_frame.copyTo(curr_frame);
		curr_infer_request.swap(next_infer_request);  //指標可以使用swap方法,否則不行
	}

	cv::waitKey(0);
	return 0;
}

// classify the emotion for a detected face
void fetch_emotion(cv::Mat& image, InferenceEngine::InferRequest& request, cv::Rect& face_roi, std::string& e_input, std::string& e_output) {
	
	cv::Mat faceROI = image(face_roi);  // crop the face region
	// preprocess the face ROI into the emotion network's input blob
	auto blob = request.GetBlob(e_input);  // input blob of the emotion network
	matU8ToBlob<uchar>(faceROI, blob);

	request.Infer();  //執行推理

	auto output = request.GetBlob(e_output);
	// read back the output data
	const float* probs = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
	const SizeVector outputDims = output->getTensorDesc().getDims();  // output dims: 1 x 5
	std::cout << outputDims[0] << "x" << outputDims[1] << std::endl;
	float max = probs[0];
	int max_index = 0;
	for (int i = 1; i < outputDims[1]; i++) {
		if (max < probs[i]) {  //找到結果probs中的最大值,獲取其下標
			max = probs[i];
			max_index = i;
		}
	}
	std::cout << items[max_index] << std::endl;
	cv::putText(image, items[max_index], face_roi.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
}

Result:

Facial landmark detection

Model overview

  • face-detection-0202 - face detection
  • facial-landmarks-35-adas-0002 - landmark extraction
  • Input format: [1 * 3 * 60 * 60]
  • Output format: [1, 70]
  • Outputs 35 facial landmarks as floating-point coordinates normalized to the face box

Program flow
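A condensed view of the flow for one detected face: the 70 output values are (x, y) pairs normalized to the face box, so each point is scaled by the box size and offset by the box origin. The helper below is illustrative; the full listing does the same inside the async loop:

#include <opencv2/opencv.hpp>
#include <vector>

// Decode the [1, 70] landmark output into pixel coordinates inside the face box.
std::vector<cv::Point> decode_landmarks(const float* land_out, int num_values, const cv::Rect& box) {
	std::vector<cv::Point> points;
	for (int i = 0; i < num_values; i += 2) {                          // (x0, y0, x1, y1, ..., x34, y34)
		int x = static_cast<int>(land_out[i] * box.width + box.x);
		int y = static_cast<int>(land_out[i + 1] * box.height + box.y);
		points.emplace_back(x, y);
	}
	return points;                                                     // 35 points for facial-landmarks-35-adas-0002
}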

Code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // <fstream> for file I/O; <iostream> is for console I/O

using namespace InferenceEngine;

//影像預處理函式
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
	InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
	const size_t width = blobSize[3];
	const size_t height = blobSize[2];
	const size_t channels = blobSize[1];
	InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
	if (!mblob) {
		THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
			<< "but by fact we were not able to cast inputBlob to MemoryBlob";
	}
	// locked memory holder should be alive all time while access to its buffer happens
	auto mblobHolder = mblob->wmap();

	T* blob_data = mblobHolder.as<T*>();

	cv::Mat resized_image(orig_image);
	if (static_cast<int>(width) != orig_image.size().width ||
		static_cast<int>(height) != orig_image.size().height) {
		cv::resize(orig_image, resized_image, cv::Size(width, height));
	}

	int batchOffset = batchIndex * width * height * channels;

	for (size_t c = 0; c < channels; c++) {
		for (size_t h = 0; h < height; h++) {
			for (size_t w = 0; w < width; w++) {
				blob_data[batchOffset + c * width * height + h * width + w] =
					resized_image.at<cv::Vec3b>(h, w)[c];
			}
		}
	}
}

void frameToBlob(std::shared_ptr<InferenceEngine::InferRequest>& request, cv::Mat& frame, std::string& input_name) {
	//影像預處理,輸入資料 ->指標獲取成員方法
	InferenceEngine::Blob::Ptr input = request->GetBlob(input_name);  //獲取網路輸入影像資訊
	//該函式template模板型別,需要指定具體型別
	matU8ToBlob<uchar>(frame, input);  //使用該函式處理輸入資料
}

InferenceEngine::InferRequest landmark_request;  // file-scope so the landmark request can be reused
void loadLandmarksRequest(Core& ie, std::string& land_input_name, std::string& land_output_name);
int main(int argc, char** argv) {

	InferenceEngine::Core ie;
	std::vector<std::string> devices = ie.GetAvailableDevices();
	for (std::string name : devices) {
		std::cout << "device name: " << name << std::endl;
	}
	std::string cpuName = ie.GetMetric("CPU", METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();
	std::cout << "cpu name: " << cpuName << std::endl;

	std::string xml = "D:/projects/models/face-detection-0202/FP32/face-detection-0202.xml";
	std::string bin = "D:/projects/models/face-detection-0202/FP32/face-detection-0202.bin";

	//cv::Mat src = cv::imread("D:/images/mmc2.jpg");  //讀取影像
	//int im_h = src.rows;
	//int im_w = src.cols;

	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the face detection network

	//獲取網路輸入輸出資訊並設定
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	std::string input_name = "";
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		// A->B 表示提取A中的成員B
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8
		input_data->setLayout(Layout::NCHW);
		//input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  預設就是BGR
		std::cout << "input name: " << input_name << std::endl;
	}
	std::string output_name = "";
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		//注意:output_data不要設定結構
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	//建立指標型別便於後續操作
	auto curr_infer_request = executable_network.CreateInferRequestPtr();  //設定推理請求
	auto next_infer_request = executable_network.CreateInferRequestPtr();  //設定推理請求

	// load the landmark model
	std::string land_input_name = "";
	std::string land_output_name = "";
	loadLandmarksRequest(ie, land_input_name, land_output_name);

	cv::VideoCapture capture("D:/images/video/emotion_detect.mp4");
	cv::Mat curr_frame;
	cv::Mat next_frame;
	capture.read(curr_frame);  //先讀取一幀作為當前幀
	int im_h = curr_frame.rows;
	int im_w = curr_frame.cols;
	frameToBlob(curr_infer_request, curr_frame, input_name);
	bool first_frame = true;  //設定兩個bool變數控制執行緒開啟
	bool last_frame = false;
	//開啟兩個執行緒,curr轉換顯示結果,next預處理影像,預處理後交換給curr
	while (true) {
		int64 start = cv::getTickCount();  //計時
		bool ret = capture.read(next_frame);  //讀取一幀作為下一幀
		if (!ret) {
			last_frame = true;  //如果下一幀為空,則last_frame為true
		}
		if (!last_frame) {  //如果last_frame為false則預處理下一幀影像
			frameToBlob(next_infer_request, next_frame, input_name);
		}
		if (first_frame) {  //如果first_frame為true則開啟兩個執行緒,同時修改first_frame為false,避免多次開啟執行緒
			curr_infer_request->StartAsync();  //開啟執行緒
			next_infer_request->StartAsync();
			first_frame = false;
		}
		else {  //如果first_frame與last_frame同為false表示只有下一幀不為空,則開啟一個next執行緒
			if (!last_frame) {
				next_infer_request->StartAsync();
			}
		}
		//判斷當前請求是否預處理完畢
		if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY)) {
			auto output = curr_infer_request->GetBlob(output_name);
			//轉換輸出資料
			const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
			//output:[1, 1, N, 7]
			//七個引數為:[image_id, label, conf, x_min, y_min, x_max, y_max]
			const SizeVector outputDims = output->getTensorDesc().getDims();  // output dims: [1, 1, N, 7]
			std::cout << outputDims[2] << "x" << outputDims[3] << std::endl;
			const int max_count = outputDims[2];  //識別出的物件個數
			const int object_size = outputDims[3];  //獲取物件資訊的個數,此處為7個
			for (int n = 0; n < max_count; n++) {
				float label = detection_out[n * object_size + 1];
				float confidence = detection_out[n * object_size + 2];
				float xmin = detection_out[n * object_size + 3] * im_w;
				float ymin = detection_out[n * object_size + 4] * im_h;
				float xmax = detection_out[n * object_size + 5] * im_w;
				float ymax = detection_out[n * object_size + 6] * im_h;
				if (confidence > 0.5) {
					printf("label id: %d \n", static_cast<int>(label));
					cv::Rect box;

					float x1 = std::min(std::max(0.0f, xmin), static_cast<float>(im_w));  // clamp the box to the image
					float y1 = std::min(std::max(0.0f, ymin), static_cast<float>(im_h));
					float x2 = std::min(std::max(0.0f, xmax), static_cast<float>(im_w));
					float y2 = std::min(std::max(0.0f, ymax), static_cast<float>(im_h));

					box.x = static_cast<int>(x1);
					box.y = static_cast<int>(y1);
					box.width = static_cast<int>(x2 - x1);
					box.height = static_cast<int>(y2 - y1);

					cv::Mat face_roi = curr_frame(box);
					auto face_input_blob = landmark_request.GetBlob(land_input_name);
					matU8ToBlob<uchar>(face_roi, face_input_blob);
					landmark_request.Infer();  // run landmark inference on the face region

					auto land_output = landmark_request.GetBlob(land_output_name);
					const float* blob_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(land_output->buffer());
					const SizeVector land_dims = land_output->getTensorDesc().getDims();
					const int b = land_dims[0];
					const int cc = land_dims[1];

					// 70 values laid out as (x0, y0, x1, y1, ..., x34, y34), hence the step of 2
					for (int i = 0; i < cc; i += 2) {
						float x = blob_out[i] * box.width + box.x;
						float y = blob_out[i + 1] * box.height + box.y;
						cv::circle(curr_frame, cv::Point(x, y), 3, cv::Scalar(255, 0, 0), 2, 8, 0);
					}

					cv::rectangle(curr_frame, box, cv::Scalar(0, 0, 255), 2, 8);
					//getTickCount()相減得到cpu走過的時鐘週期數,getTickFrequency()得到cpu一秒走過的始終週期數
					float t = (cv::getTickCount() - start) / static_cast<float>(cv::getTickFrequency());
					std::cout << 1.0 / t << std::endl;
					//box.tl()返回矩形左上角座標
					cv::putText(curr_frame, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
				}
			}
		}
		//顯示結果
		cv::imshow("人臉檢測非同步顯示", curr_frame);
		char c = cv::waitKey(1);
		if (c == 27) {  //ESC
			break;
		}
		if (last_frame) {  //如果last_frame為true表示下一幀為空,則跳出迴圈
			break;
		}

		//非同步交換,下一幀複製到當前幀,當前幀請求與下一幀請求交換
		next_frame.copyTo(curr_frame);
		curr_infer_request.swap(next_infer_request);  //指標可以使用swap方法,否則不行
	}

	cv::waitKey(0);
	return 0;
}

void loadLandmarksRequest(Core& ie, std::string& land_input_name, std::string& land_output_name) {
	// the model is downloaded the same way as before
	std::string xml = "D:/projects/models/facial-landmarks-35-adas-0002/FP32/facial-landmarks-35-adas-0002.xml";
	std::string bin = "D:/projects/models/facial-landmarks-35-adas-0002/FP32/facial-landmarks-35-adas-0002.bin";

	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //讀取網路
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列

	int cnt = 0;
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		land_input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8
		input_data->setLayout(Layout::NCHW);
	}
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		land_output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	landmark_request = executable_network.CreateInferRequest();  //設定推理請求
}

Result:

6. Semantic segmentation and instance segmentation

Real-time road segmentation

  • Classifies each pixel as one of four categories: road, background, curb, or lane marking

Road segmentation model overview

  • Model: road-segmentation-adas-0001
  • Input format: [B, C=3, H=512, W=896], BGR
  • Output format: [B, C=4, H=512, W=896]
  • Four classes: BG, road, curb, mark

Program flow
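The core of the flow is a per-pixel argmax over the four class planes of the [B, 4, H, W] output, followed by a color lookup; a condensed sketch of that step (the full listing below does the same inside the async loop):

#include <opencv2/opencv.hpp>
#include <vector>

// Turn the [1, C, H, W] segmentation scores into a color mask, one color per class.
cv::Mat decode_segmentation(const float* scores, int out_c, int out_h, int out_w,
	const std::vector<cv::Vec3b>& colors) {
	cv::Mat mask = cv::Mat::zeros(cv::Size(out_w, out_h), CV_8UC3);
	const int step = out_h * out_w;                                    // size of one class plane
	for (int row = 0; row < out_h; row++) {
		for (int col = 0; col < out_w; col++) {
			int best = 0;                                              // class with the highest score
			for (int cn = 1; cn < out_c; cn++) {
				if (scores[cn * step + row * out_w + col] > scores[best * step + row * out_w + col]) {
					best = cn;
				}
			}
			mask.at<cv::Vec3b>(row, col) = colors[best];               // paint the winning class
		}
	}
	return mask;  // resize to the frame size and blend with addWeighted afterwards
}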

Code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // <fstream> for file I/O; <iostream> is for console I/O

using namespace InferenceEngine;

//影像預處理函式
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
	InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
	const size_t width = blobSize[3];
	const size_t height = blobSize[2];
	const size_t channels = blobSize[1];
	InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
	if (!mblob) {
		THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
			<< "but by fact we were not able to cast inputBlob to MemoryBlob";
	}
	// locked memory holder should be alive all time while access to its buffer happens
	auto mblobHolder = mblob->wmap();

	T* blob_data = mblobHolder.as<T*>();

	cv::Mat resized_image(orig_image);
	if (static_cast<int>(width) != orig_image.size().width ||
		static_cast<int>(height) != orig_image.size().height) {
		cv::resize(orig_image, resized_image, cv::Size(width, height));
	}

	int batchOffset = batchIndex * width * height * channels;

	for (size_t c = 0; c < channels; c++) {
		for (size_t h = 0; h < height; h++) {
			for (size_t w = 0; w < width; w++) {
				blob_data[batchOffset + c * width * height + h * width + w] =
					resized_image.at<cv::Vec3b>(h, w)[c];
			}
		}
	}
}

void frameToBlob(std::shared_ptr<InferenceEngine::InferRequest>& request, cv::Mat& frame, std::string& input_name) {
	//影像預處理,輸入資料 ->指標獲取成員方法
	InferenceEngine::Blob::Ptr input = request->GetBlob(input_name);  //獲取網路輸入影像資訊
	//該函式template模板型別,需要指定具體型別
	matU8ToBlob<uchar>(frame, input);  //使用該函式處理輸入資料
}

int main(int argc, char** argv) {

	InferenceEngine::Core ie;

	std::string xml = "D:/projects/models/road-segmentation-adas-0001/FP32/road-segmentation-adas-0001.xml";
	std::string bin = "D:/projects/models/road-segmentation-adas-0001/FP32/road-segmentation-adas-0001.bin";

	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the road segmentation network

	//獲取網路輸入輸出資訊並設定
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	std::string input_name = "";
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		// A->B 表示提取A中的成員B
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8
		input_data->setLayout(Layout::NCHW);
		//input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  預設就是BGR
		std::cout << "input name: " << input_name << std::endl;
	}
	std::string output_name = "";
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		//注意:output_data不要設定結構
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	//建立指標型別便於後續操作
	auto curr_infer_request = executable_network.CreateInferRequestPtr();  //設定推理請求
	auto next_infer_request = executable_network.CreateInferRequestPtr();  //設定推理請求

	cv::VideoCapture capture("D:/images/video/road_segmentation.mp4");
	cv::Mat curr_frame;
	cv::Mat next_frame;
	capture.read(curr_frame);  //先讀取一幀作為當前幀
	int im_h = curr_frame.rows;
	int im_w = curr_frame.cols;
	frameToBlob(curr_infer_request, curr_frame, input_name);
	bool first_frame = true;  //設定兩個bool變數控制執行緒開啟
	bool last_frame = false;

	std::vector<cv::Vec3b> color_tab;  // one overlay color per segmentation class
	color_tab.push_back(cv::Vec3b(0, 0, 0));  // background
	color_tab.push_back(cv::Vec3b(255, 0, 0));  // road
	color_tab.push_back(cv::Vec3b(0, 0, 255));  // curb
	color_tab.push_back(cv::Vec3b(0, 255, 255));  // lane markings

	//開啟兩個執行緒,curr轉換顯示結果,next預處理影像,預處理後交換給curr
	while (true) {
		int64 start = cv::getTickCount();  //計時
		bool ret = capture.read(next_frame);  //讀取一幀作為下一幀
		if (!ret) {
			last_frame = true;  //如果下一幀為空,則last_frame為true
		}
		if (!last_frame) {  //如果last_frame為false則預處理下一幀影像
			frameToBlob(next_infer_request, next_frame, input_name);
		}
		if (first_frame) {  //如果first_frame為true則開啟兩個執行緒,同時修改first_frame為false,避免多次開啟執行緒
			curr_infer_request->StartAsync();  //開啟執行緒
			next_infer_request->StartAsync();
			first_frame = false;
		}
		else {  //如果first_frame與last_frame同為false表示只有下一幀不為空,則開啟一個next執行緒
			if (!last_frame) {
				next_infer_request->StartAsync();
			}
		}
		//判斷當前請求是否預處理完畢
		if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY)) {
			auto output = curr_infer_request->GetBlob(output_name);
			//轉換輸出資料
			const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
			
			//output:[B, C, H, W]
			const SizeVector outputDims = output->getTensorDesc().getDims();  //獲取輸出維度資訊 [B, C, H, W]
			
			//每個畫素針對每種分類分別有一個識別結果數值,數值最大的為該畫素的分類
			//結果矩陣格式為:每種分類各有一個輸出影像大小的矩陣,每個畫素位置對應其在該分類的可能性
			const int out_c = outputDims[1];  //分割識別的型別個數,此處為4
			const int out_h = outputDims[2];  //分割網路輸出影像的高
			const int out_w = outputDims[3];  //分割網路輸出影像的寬
			cv::Mat result = cv::Mat::zeros(cv::Size(out_w, out_h), CV_8UC3);
			int step = out_h * out_w;
			for (int row = 0; row < out_h; row++) {
				for (int col = 0; col < out_w; col++) {
					int max_index = 0;  //定義一個變數儲存最大分類結果數值的下標
					float max_prob = detection_out[row * out_w + col];
					for (int cn = 1; cn < out_c; cn++) {
						//比較每個畫素在四種不同分類矩陣中的可能性,找到最大可能性的分類
						float prob = detection_out[cn * step + row * out_w + col];
						if (prob > max_prob) {
							max_prob = prob;
							max_index = cn;
						}
					}
					//在結果矩陣中對應畫素位置儲存原圖中該畫素分類對應的顏色
					result.at<cv::Vec3b>(row, col) = color_tab[max_index];
				}
			}
			//先初始化一個網路輸出結果大小的矩陣儲存每個畫素點對應的顏色,再將結果矩陣恢復到原圖大小,以便最終結果顯示
			cv::resize(result, result, cv::Size(im_w, im_h));
			//在輸入影像中對應位置按比例增加結果矩陣中對應的顏色
			cv::addWeighted(curr_frame, 0.5, result, 0.5, 0, curr_frame);
		}
		//getTickCount()相減得到cpu走過的時鐘週期數,getTickFrequency()得到cpu一秒走過的時鐘週期數
		float t = (cv::getTickCount() - start) / static_cast<float>(cv::getTickFrequency());
		cv::putText(curr_frame, cv::format("FPS: %.2f", 1.0 / t), cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
		//顯示結果
		cv::imshow("道路分割非同步顯示", curr_frame);
		char c = cv::waitKey(1);
		if (c == 27) {  //ESC
			break;
		}
		if (last_frame) {  //如果last_frame為true表示下一幀為空,則跳出迴圈
			break;
		}

		//非同步交換,下一幀複製到當前幀,當前幀請求與下一幀請求交換
		next_frame.copyTo(curr_frame);
		curr_infer_request.swap(next_infer_request);  //指標可以使用swap方法,否則不行
	}

	cv::waitKey(0);
	return 0;
}

效果:

黃色為路面標誌,紅色為路邊,藍色為道路,其餘部分為背景

例項分割

例項分割模型介紹(Mask R-CNN)

  • instance-segmentation-security-0050
  • 有兩個輸入層:
  • im_data: [1 * 3 * 480 * 480],影像資料 N * C * H * W(num、channels、height、width)
  • im_info: [1 * 3],影像資訊,寬、高和scale
  • 輸出格式:
  • classes: [100, ],最多100個例項,屬於不超過80個分類
  • scores: [100, ],每個檢測到物件不是背景的機率
  • Boxes: [100, 4],每個檢測到的物件的位置(左上角及右下角座標)
  • raw_masks: [100, 81, 28, 28],實際是對每個例項都生成一組mask:對每個例項獲取81個類別(80個類別+背景)的機率值,即輸出81個 28 * 28 大小的矩陣
  • 實際記憶體中的結果矩陣按 100 * 81 * 28 * 28(例項、類別、高、寬)的順序連續儲存,定位方式見下方示意
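下面是一個極簡的指標偏移示意(假設輸出已轉為FP32指標 raw_masks_data,函式名 instance_mask_ptr 為說明用的假設名稱),按上述 [100, 81, 28, 28] 的記憶體佈局定位第 n 個例項、第 label 類的 28 * 28 mask,與後文完整程式碼中 box_stride 的算法一致:

#include <cstddef>

//假設性示意:raw_masks的記憶體佈局為 [N, C, H, W] = [100, 81, 28, 28]
//回傳第 n 個例項、第 label 類的 28 * 28 mask 起始指標
const float* instance_mask_ptr(const float* raw_masks_data, int n, int label,
	int num_classes = 81, int mask_h = 28, int mask_w = 28) {
	size_t box_stride = static_cast<size_t>(num_classes) * mask_h * mask_w;  //相鄰兩個例項之間的步長
	return raw_masks_data + static_cast<size_t>(n) * box_stride
		+ static_cast<size_t>(label) * mask_h * mask_w;  //再偏移到該例項的第 label 類
}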

程式碼實現

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  //fstream檔案讀寫操作,iostream為控制檯操作

using namespace InferenceEngine;
/*
void read_coco_labels(std::vector<std::string>& labels) {
	std::string label_file = "D:/projects/models/coco_labels.txt";
	std::ifstream fp(label_file);
	if (!fp.is_open())
	{
		printf("could not open file...\n");
		exit(-1);
	}
	std::string name;
	while (!fp.eof())
	{
		std::getline(fp, name);
		if (name.length())
			labels.push_back(name);
	}
	fp.close();
}
*/

//影像預處理函式
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
	InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
	const size_t width = blobSize[3];
	const size_t height = blobSize[2];
	const size_t channels = blobSize[1];
	InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
	if (!mblob) {
		THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
			<< "but by fact we were not able to cast inputBlob to MemoryBlob";
	}
	// locked memory holder should be alive all time while access to its buffer happens
	auto mblobHolder = mblob->wmap();

	T* blob_data = mblobHolder.as<T*>();

	cv::Mat resized_image(orig_image);
	if (static_cast<int>(width) != orig_image.size().width ||
		static_cast<int>(height) != orig_image.size().height) {
		cv::resize(orig_image, resized_image, cv::Size(width, height));
	}

	int batchOffset = batchIndex * width * height * channels;

	for (size_t c = 0; c < channels; c++) {
		for (size_t h = 0; h < height; h++) {
			for (size_t w = 0; w < width; w++) {
				blob_data[batchOffset + c * width * height + h * width + w] =
					resized_image.at<cv::Vec3b>(h, w)[c];
			}
		}
	}
}

int main(int argc, char** argv) {

	InferenceEngine::Core ie;
	std::vector<std::string> coco_labels;
	//read_coco_labels(coco_labels);
	cv::RNG rng(12345);
	
	std::string xml = "D:/projects/models/instance-segmentation-security-0050/FP32/instance-segmentation-security-0050.xml";
	std::string bin = "D:/projects/models/instance-segmentation-security-0050/FP32/instance-segmentation-security-0050.bin";
	cv::Mat src = cv::imread("D:/images/instance_segmentation.jpg");  //讀取影像
	int im_h = src.rows;
	int im_w = src.cols;
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //讀取例項分割網路

	//獲取網路輸入輸出資訊
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	std::string image_input_name = "";
	std::string image_info_name = "";
	int in_index = 0;
	
	//設定兩個網路輸入資料的引數
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		if (in_index == 0) {
			image_input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
			auto input_data = item.second;
			// A->B 表示提取A中的成員B
			input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8,浮點型別則為FP32
			input_data->setLayout(Layout::NCHW);
		}
		else {
			image_info_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
			auto input_data = item.second;
			// A->B 表示提取A中的成員B
			input_data->setPrecision(Precision::FP32);  //預設為unsigned char對應U8,浮點型別則為FP32
		}
		in_index++;
	}
	
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		std::string output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		//注意:output_data不要設定結構
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	auto infer_request = executable_network.CreateInferRequest();  //設定推理請求

	//影像預處理
	auto input = infer_request.GetBlob(image_input_name);  //獲取網路輸入影像資訊
	//將輸入影像轉換為網路的輸入格式
	matU8ToBlob<uchar>(src, input);

	//設定網路的第二個輸入
	auto input2 = infer_request.GetBlob(image_info_name);
	auto imInforDim = inputs.find(image_info_name)->second->getTensorDesc().getDims()[1];
	InferenceEngine::MemoryBlob::Ptr minput2 = InferenceEngine::as<InferenceEngine::MemoryBlob>(input2);
	auto minput2Holder = minput2->wmap();
	float* p = minput2Holder.as<InferenceEngine::PrecisionTrait<InferenceEngine::Precision::FP32>::value_type*>();
	p[0] = static_cast<float>(inputs[image_input_name]->getTensorDesc().getDims()[2]);  //輸入影像的高
	p[1] = static_cast<float>(inputs[image_input_name]->getTensorDesc().getDims()[3]);  //輸入影像的寬
	p[2] = 1.0f;  //scale,前面影像已經轉換為480*480,這裡保持為1.0就可以

	infer_request.Infer();

	float w_rate = static_cast<float>(im_w) / 480.0;  //用於透過網路輸出中的座標獲取原圖的座標
	float h_rate = static_cast<float>(im_h) / 480.0;

	auto scores = infer_request.GetBlob("scores");  //獲取網路輸出中的資訊
	auto boxes = infer_request.GetBlob("boxes");
	auto classes = infer_request.GetBlob("classes");
	auto raw_masks = infer_request.GetBlob("raw_masks");
	//轉換輸出資料
	const float* scores_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(scores->buffer());  //強制轉換資料型別為浮點型
	const float* boxes_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(boxes->buffer());
	const float* classes_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(classes->buffer());
	const auto raw_masks_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(raw_masks->buffer());
	const SizeVector scores_outputDims = scores->getTensorDesc().getDims();  //獲取輸出維度資訊 [100]
	const SizeVector boxes_outputDims = boxes->getTensorDesc().getDims();  //獲取輸出維度資訊 [100, 4]
	const SizeVector raw_masks_outputDims = raw_masks->getTensorDesc().getDims();  //[100, 81, 28, 28]
	const int max_count = scores_outputDims[0];  //識別出的物件個數
	const int object_size = boxes_outputDims[1];  //獲取物件資訊的個數,此處為4個
	printf("mask NCHW=[%d, %d, %d, %d]\n", raw_masks_outputDims[0], raw_masks_outputDims[1], raw_masks_outputDims[2], raw_masks_outputDims[3]);
	int mask_h = raw_masks_outputDims[2];
	int mask_w = raw_masks_outputDims[3];
	size_t box_stride = mask_h * mask_w * raw_masks_outputDims[1];  //兩個mask之間的距離
	for (int n = 0; n < max_count; n++) {
		float confidence = scores_data[n];
		float xmin = boxes_data[n * object_size] * w_rate;  //轉換為原圖中的座標
		float ymin = boxes_data[n * object_size + 1] * h_rate;
		float xmax = boxes_data[n * object_size + 2] * w_rate;
		float ymax = boxes_data[n * object_size + 3] * h_rate;
		if (confidence > 0.5) {
			cv::Scalar color(rng.uniform(0, 255), rng.uniform(0, 255), rng.uniform(0, 255));
			cv::Rect box;
			float x1 = std::min(std::max(0.0f, xmin), static_cast<float>(im_w));  //避免越界
			float y1 = std::min(std::max(0.0f, ymin), static_cast<float>(im_h));
			float x2 = std::min(std::max(0.0f, xmax), static_cast<float>(im_w));
			float y2 = std::min(std::max(0.0f, ymax), static_cast<float>(im_h));
			box.x = static_cast<int>(x1);
			box.y = static_cast<int>(y1);
			box.width = static_cast<int>(x2 - x1);
			box.height = static_cast<int>(y2 - y1);
			int label = static_cast<int>(classes_data[n]);  //該例項的類別標籤(COCO 80類之一)
			//std::cout << "confidence: " << confidence << "class name: " << coco_labels[label] << std::endl;
			//解析mask,raw_masks_data表示所有mask起始位置,box_stride*n表示跳過遍歷的例項
			float* mask_arr = raw_masks_data + box_stride * n + mask_h * mask_w * label;  //找到當前例項當前分類mask的起始指標
			cv::Mat mask_mat(mask_h, mask_w, CV_32FC1, mask_arr);  //從mask_arr指標開始取值構建Mat
			cv::Mat roi_img = src(box);  //取出原圖中box對應的ROI(與原圖共享記憶體)
			cv::Mat resized_mask_mat(box.height, box.width, CV_32FC1);
			cv::resize(mask_mat, resized_mask_mat, cv::Size(box.width, box.height));
			cv::Mat uchar_resized_mask(box.height, box.width, CV_8UC3, color);
			roi_img.copyTo(uchar_resized_mask, resized_mask_mat <= 0.5);  //resized_mask_mat中畫素值<=0.5的畫素都不會複製到uchar_resized_mask上
			cv::addWeighted(uchar_resized_mask, 0.7, roi_img, 0.3, 0.0f, roi_img);

			//cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8);
			//box.tl()返回矩形左上角座標
			cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 1, 8);
		}
	}

	//cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
	cv::namedWindow("out", cv::WINDOW_AUTOSIZE);
	cv::imshow("out", src);
	cv::waitKey(0);
	return 0;
}

效果:

7、場景文字檢測與識別

場景文字檢測

模型介紹

  • text-detection-0003
  • PixelLink模型庫,BGR順序
  • 1個輸入層:[B, C, H, W] [1 * 3 * 768 * 1280]
  • 2個輸出層:
  • model/link_logits_/add:[1x16x192x320] - 畫素與周圍畫素的聯絡
  • model/segm_logits/add:[1x2x192x320] - 每個畫素所屬分類(文字/非文字),只要解析第二個輸出就可以獲取文字區域

程式碼實現

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  //fstream檔案讀寫操作,iostream為控制檯操作

using namespace InferenceEngine;

//影像預處理函式
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
	InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
	const size_t width = blobSize[3];
	const size_t height = blobSize[2];
	const size_t channels = blobSize[1];
	InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
	if (!mblob) {
		THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
			<< "but by fact we were not able to cast inputBlob to MemoryBlob";
	}
	// locked memory holder should be alive all time while access to its buffer happens
	auto mblobHolder = mblob->wmap();

	T* blob_data = mblobHolder.as<T*>();

	cv::Mat resized_image(orig_image);
	if (static_cast<int>(width) != orig_image.size().width ||
		static_cast<int>(height) != orig_image.size().height) {
		cv::resize(orig_image, resized_image, cv::Size(width, height));
	}

	int batchOffset = batchIndex * width * height * channels;

	for (size_t c = 0; c < channels; c++) {
		for (size_t h = 0; h < height; h++) {
			for (size_t w = 0; w < width; w++) {
				blob_data[batchOffset + c * width * height + h * width + w] =
					resized_image.at<cv::Vec3b>(h, w)[c];
			}
		}
	}
}

int main(int argc, char** argv) {

	InferenceEngine::Core ie;

	std::string xml = "D:/projects/models/text-detection-0003/FP32/text-detection-0003.xml";
	std::string bin = "D:/projects/models/text-detection-0003/FP32/text-detection-0003.bin";
	cv::Mat src = cv::imread("D:/images/text_detection.png");  //讀取影像
	cv::imshow("input", src);
	int im_h = src.rows;
	int im_w = src.cols;
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //讀取文字檢測網路

	//獲取網路輸入輸出資訊
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	std::string image_input_name = "";

	//設定兩個網路輸入資料的引數
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		image_input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		// A->B 表示提取A中的成員B
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8,浮點型別則為FP32
		input_data->setLayout(Layout::NCHW);
	}
	std::string output_name1 = "";
	std::string output_name2 = "";
	int out_index = 0;
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		if (out_index == 1) {
			output_name2 = item.first;
		}
		else {
			output_name1 = item.first;
		}
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		out_index++;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	auto infer_request = executable_network.CreateInferRequest();  //設定推理請求

	//影像預處理
	auto input = infer_request.GetBlob(image_input_name);  //獲取網路輸入影像資訊
	//將輸入影像轉換為網路的輸入格式
	matU8ToBlob<uchar>(src, input);

	infer_request.Infer();

	auto output = infer_request.GetBlob(output_name2);  //只解析第二個輸出即可
	//轉換輸出資料
	const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());

	//output:[B, C, H, W] [1, 2, 192, 320]
	const SizeVector outputDims = output->getTensorDesc().getDims();  //獲取輸出維度資訊 [1, 2, 192, 320]

	//每個畫素針對每種分類分別有一個識別結果數值,數值最大的為該畫素的分類
	//結果矩陣格式為:每種分類各有一個輸出影像大小的矩陣,每個畫素位置對應其在該分類的可能性
	const int out_c = outputDims[1];  //分割識別的型別個數,此處為2
	const int out_h = outputDims[2];  //分割網路輸出影像的高
	const int out_w = outputDims[3];  //分割網路輸出影像的寬
	cv::Mat mask = cv::Mat::zeros(cv::Size(out_w, out_h), CV_32F);
	int step = out_h * out_w;
	for (int row = 0; row < out_h; row++) {
		for (int col = 0; col < out_w; col++) {
			float p1 = detection_out[row * out_w + col];  //該畫素屬於非文字(背景)類的得分
			float p2 = detection_out[step + row * out_w + col];
			if (p1 < p2) {
				mask.at<float>(row, col) = p2;
			}
		}
	}
	//先初始化一個網路輸出結果大小的矩陣儲存每個畫素點對應的顏色,再將結果矩陣恢復到原圖大小,以便最終結果顯示
	cv::resize(mask, mask, cv::Size(im_w, im_h));
	mask = mask * 255;
	mask.convertTo(mask, CV_8U);  //把mask從浮點數轉換為整數,並將範圍轉換為0-255
	cv::threshold(mask, mask, 100, 255, cv::THRESH_BINARY);  //將mask按指定範圍進行二值化分割
	std::vector<std::vector<cv::Point>> contours;
	cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
	for (size_t t = 0; t < contours.size(); t++) {  //繪製每個前景區域外輪廓,遍歷這些外輪廓並繪製到輸入影像中
		cv::Rect box = cv::boundingRect(contours[t]);
		cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8, 0);
	}
	//cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
	cv::namedWindow("mask", cv::WINDOW_AUTOSIZE);
	cv::imshow("mask", mask);
	cv::imshow("場景文字檢測", src);
	cv::waitKey(0);
	return 0;
}

效果:

場景文字識別

模型介紹

  • 模型名稱:text-recognition-0012
  • 輸入格式 - BCHW = [1 * 1 * 32 * 120],輸入的是單通道灰度圖
  • 輸出層 - WBL = [30, 1, 37], W表示序列長度,每個字元佔一行,共30行,每個字元有37種可能,所以佔37列
  • 其中L為:0123456789abcdefghijklmnopqrstuvwxyz#
  • 其中'#'表示CTC解碼時的空白字元(blank):CTC的輸出裡相鄰的重複字元會被合併,因此真正相鄰的相同字元之間必須用空白字元隔開,例如逐時間步取最大值得到的序列「hel#lo」會解碼為「hello」,而「hello」會被合併成「helo」(見下方示意),可參見以下部落格:超詳細講解CTC理論和實戰 - 簡書 (jianshu.com)
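下面用一個極簡示意函式(ctc_collapse 為說明用的假設名稱)演示這個合併規則:對已經逐時間步取最大值得到的字元序列,先合併相鄰重複字元,再去掉空白字元'#';完整的逐機率貪心解碼見後面程式碼中的 ctc_decode:

#include <string>

//CTC貪心解碼的合併規則示意:相鄰重複字元只保留一個,空白字元'#'只用來分隔相同字元
//例如 "hhe#el##lo" -> 合併重複 "he#el#lo" -> 去掉空白 "heello"
std::string ctc_collapse(const std::string& raw) {
	std::string res;
	char prev = '\0';
	for (char ch : raw) {
		if (ch != prev) {  //與前一個字元不同才處理(合併相鄰重複)
			if (ch != '#') {
				res += ch;  //非空白字元才輸出
			}
			prev = ch;
		}
	}
	return res;
}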

程式碼實現

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  //fstream檔案讀寫操作,iostream為控制檯操作

using namespace InferenceEngine;

//影像預處理函式
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
	InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
	const size_t width = blobSize[3];
	const size_t height = blobSize[2];
	const size_t channels = blobSize[1];
	InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
	if (!mblob) {
		THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
			<< "but by fact we were not able to cast inputBlob to MemoryBlob";
	}
	// locked memory holder should be alive all time while access to its buffer happens
	auto mblobHolder = mblob->wmap();

	T* blob_data = mblobHolder.as<T*>();

	cv::Mat resized_image(orig_image);
	if (static_cast<int>(width) != orig_image.size().width ||
		static_cast<int>(height) != orig_image.size().height) {
		cv::resize(orig_image, resized_image, cv::Size(width, height));
	}

	int batchOffset = batchIndex * width * height * channels;

	for (size_t c = 0; c < channels; c++) {
		for (size_t h = 0; h < height; h++) {
			for (size_t w = 0; w < width; w++) {
				blob_data[batchOffset + c * width * height + h * width + w] =
					resized_image.at<cv::Vec3b>(h, w)[c];
			}
		}
	}
}

//文字識別預處理
void loadTextRecogRequest(Core& ie, std::string& reco_input_name, std::string& reco_output_name);
std::string alphabet = "0123456789abcdefghijklmnopqrstuvwxyz#";  //用於匹配的字元表
std::string ctc_decode(const float* blob_out, int seq_w, int seq_l);  //CTC字元匹配函式
InferenceEngine::InferRequest reco_request;
int main(int argc, char** argv) {

	InferenceEngine::Core ie;

	std::string xml = "D:/projects/models/text-detection-0003/FP32/text-detection-0003.xml";
	std::string bin = "D:/projects/models/text-detection-0003/FP32/text-detection-0003.bin";
	cv::Mat src = cv::imread("D:/images/text_detection02.png");  //讀取影像
	cv::imshow("input", src);
	int im_h = src.rows;
	int im_w = src.cols;
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //讀取文字檢測網路

	//獲取網路輸入輸出資訊
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	std::string image_input_name = "";

	//設定兩個網路輸入資料的引數
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		image_input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		// A->B 表示提取A中的成員B
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8,浮點型別則為FP32
		input_data->setLayout(Layout::NCHW);
	}
	std::string output_name1 = "";
	std::string output_name2 = "";
	int out_index = 0;
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		if (out_index == 1) {
			output_name2 = item.first;
		}
		else {
			output_name1 = item.first;
		}
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		out_index++;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	auto infer_request = executable_network.CreateInferRequest();  //設定推理請求

	//影像預處理
	auto input = infer_request.GetBlob(image_input_name);  //獲取網路輸入影像資訊
	//將輸入影像轉換為網路的輸入格式
	matU8ToBlob<uchar>(src, input);

	infer_request.Infer();

	auto output = infer_request.GetBlob(output_name2);  //只解析第二個輸出即可
	//轉換輸出資料
	const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());

	//output:[B, C, H, W] [1, 2, 192, 320]
	const SizeVector outputDims = output->getTensorDesc().getDims();  //獲取輸出維度資訊 [1, 2, 192, 320]

	//每個畫素針對每種分類分別有一個識別結果數值,數值最大的為該畫素的分類
	//結果矩陣格式為:每種分類各有一個輸出影像大小的矩陣,每個畫素位置對應其在該分類的可能性
	const int out_c = outputDims[1];  //分割識別的型別個數,此處為2
	const int out_h = outputDims[2];  //分割網路輸出影像的高
	const int out_w = outputDims[3];  //分割網路輸出影像的寬
	cv::Mat mask = cv::Mat::zeros(cv::Size(out_w, out_h), CV_8U);
	int step = out_h * out_w;
	for (int row = 0; row < out_h; row++) {
		for (int col = 0; col < out_w; col++) {
			float p1 = detection_out[row * out_w + col];  //該畫素屬於非文字(背景)類的得分
			float p2 = detection_out[step + row * out_w + col];
			if (p2 >= 1.0) {
				mask.at<uchar>(row, col) = 255;
			}
		}
	}
	//先初始化一個網路輸出結果大小的矩陣儲存每個畫素點對應的顏色,再將結果矩陣恢復到原圖大小,以便最終結果顯示
	cv::resize(mask, mask, cv::Size(im_w, im_h));
	
	std::vector<std::vector<cv::Point>> contours;  //初始化一個容器儲存輪廓點集
	cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

	cv::Mat gray;
	cv::cvtColor(src, gray, cv::COLOR_BGR2GRAY);
	std::string reco_input_name = "";
	std::string reco_output_name = "";
	loadTextRecogRequest(ie, reco_input_name, reco_output_name);
	std::cout << "text input: " << reco_input_name << "text output: " << reco_output_name << std::endl;

	for (size_t t = 0; t < contours.size(); t++) {  //繪製每個前景區域外輪廓,遍歷這些外輪廓並繪製到輸入影像中
		cv::Rect box = cv::boundingRect(contours[t]);
		cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8, 0);
		box.x = std::max(0, box.x - 4);  //擴大文字檢測區域減少漏檢誤檢,同時避免越界
		box.y = std::max(0, box.y - 4);
		box.width = std::min(box.width + 8, im_w - box.x);
		box.height = std::min(box.height + 8, im_h - box.y);

		cv::Mat roi = gray(box);

		auto reco_input_blob = reco_request.GetBlob(reco_input_name);
		size_t num_channels = reco_input_blob->getTensorDesc().getDims()[1];
		size_t h = reco_input_blob->getTensorDesc().getDims()[2];
		size_t w = reco_input_blob->getTensorDesc().getDims()[3];
		size_t image_size = h * w;
		cv::Mat blob_image;
		cv::resize(roi, blob_image, cv::Size(w, h));  //轉換影像為網路輸入大小

		//HWC =》NCHW
		unsigned char* data = static_cast<unsigned char*>(reco_input_blob->buffer());
		for (size_t row = 0; row < h; row++) {
			for (size_t col = 0; col < w; col++) {
				data[row * w + col] = blob_image.at<uchar>(row, col);  //uchar型別無符號 0-255
			}
		}
		reco_request.Infer();

		auto reco_output = reco_request.GetBlob(reco_output_name);
		//獲取輸出資料的指標
		const float* blob_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(reco_output->buffer());
		const SizeVector reco_dims = reco_output->getTensorDesc().getDims();
		const int RW = reco_dims[0];  //30
		const int RB = reco_dims[1];  //1
		const int RL = reco_dims[2];  //37
		//透過CTC解碼來處理網路輸出的資料
		std::string ocr_txt = ctc_decode(blob_out, RW, RL);  //識別輸出的資料為字元
		std::cout << ocr_txt << std::endl;
		cv::putText(src, ocr_txt, box.tl(), cv::FONT_HERSHEY_PLAIN, 1.0, cv::Scalar(255, 0, 0), 1);
	}
	//cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
	cv::namedWindow("mask", cv::WINDOW_AUTOSIZE);
	cv::imshow("mask", mask);
	cv::imshow("場景文字檢測", src);
	cv::waitKey(0);
	return 0;
}

void loadTextRecogRequest(Core& ie, std::string& reco_input_name, std::string& reco_output_name) {

	std::string xml = "D:/projects/models/text-recognition-0012/FP32/text-recognition-0012.xml";
	std::string bin = "D:/projects/models/text-recognition-0012/FP32/text-recognition-0012.bin";
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);

	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
	
	for (auto item : inputs) {
		reco_input_name = item.first;
		auto input_data = item.second;
		input_data->setPrecision(Precision::U8);
		input_data->setLayout(Layout::NCHW);
	}
	for (auto item : outputs) {
		reco_output_name = item.first;
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);
	}

	auto exec_network = ie.LoadNetwork(network, "CPU");
	reco_request = exec_network.CreateInferRequest();
}

std::string ctc_decode(const float* blob_out, int seq_w, int seq_l) {
	printf("seq width:%d,seq length:%d\n", seq_w, seq_l);
	std::string res = "";
	bool prev_pad = false;
	const int num_classes = alphabet.length();
	int seq_len = seq_w * seq_l;
	for (int i = 0; i < seq_w; i++) {
		int argmax = 0;
		float max_prob = blob_out[i * seq_l];  //用float儲存機率,避免被截斷成整數
		for (int j = 0; j < num_classes; j++) {
			if (blob_out[i * seq_l + j] > max_prob) {
				max_prob = blob_out[i * seq_l + j];
				argmax = j;
			}
		}
		auto symbol = alphabet[argmax];  //遍歷查詢每個字元的最大可能字元
		if (symbol == '#') {  //去除字串中的空字元
			//透過prev_pad來控制空字元之後的字元一定會新增到結果字串中,而兩個連續相同字元的第二個不會被新增到結果字串中
			prev_pad = true;
		}
		else {
			if (res.empty() || prev_pad || (!res.empty() && symbol != res.back())) {  //back()方法獲取字串最後一個字元;front()獲取第一個字元
				prev_pad = false;
				res += symbol;  //字串拼接
			}
		}
	}
	return res;
}

效果:

8、模型轉換與部署

pytorch模型轉換與部署

  • ONNX轉換與支援
  • 首先需要儲存pth檔案,然後轉化為內ONNX格式檔案
  • OpenVINO支援直接讀取ONNX格式檔案解析
  • ONNX轉換為IR檔案

pytorch模型轉換為onnx模型

從pytorch官網安裝:Start Locally | PyTorch

import torch
import torchvision

def main():
    model = torchvision.models.resnet18(pretrained=True).eval()  #模型的推理模式
    dummy_input = torch.randn((1,3,224,224))  #tensor張量,多維陣列,此處模型輸入為3通道,224*224大小的影像
    torch.onnx.export(model,dummy_input,"resnet18.onnx")

if __name__ == '__main__':
    main()

執行後獲取的onnx模型檔案:

onnx模型轉換為IR模型

  1. 進入OpenVINO安裝路徑下的model_optimizer資料夾,路徑如下:C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\model_optimizer

  2. 可以透過執行該資料夾中的install_prerequisites資料夾中的bat指令碼來安裝onnx及tensorflow環境,也可手動根據requirements_onnx.txt檔案中的環境要求安裝,安裝完環境後,以管理員身份執行cmd命令提示符並進入到model_optimizer資料夾下

  3. 執行model_optimizer資料夾下mo_onnx.py指令碼將onnx模型轉換為IR模型,執行後該資料夾下會生成xml及bin兩個檔案

執行指令碼如下:

python mo_onnx.py --input_model D:/projects/models/resnet18_ir/resnet18.onnx

轉換獲得的onnx模型及IR模型測試程式碼

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>

using namespace InferenceEngine;
std::string labels_txt_file = "D:/projects/models/resnet18_ir/imagenet_classes.txt";
std::vector<std::string> readClassNames();
int main(int argc, char** argv) {
	InferenceEngine::Core ie;
	std::vector<std::string> devices = ie.GetAvailableDevices();
	for (std::string name : devices) {
		std::cout << "device name: " << name << std::endl;
	}
	std::string cpuName = ie.GetMetric("CPU", METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();
	std::cout << "cpu full name: " << cpuName << std::endl;
	//std::string xml = "D:/projects/models/resnet18_ir/resnet18.xml";  //IR模型
	//std::string bin = "D:/projects/models/resnet18_ir/resnet18.bin";
	std::string onnx = "D:/projects/models/resnet18_ir/resnet18.onnx";  //ONNX模型
	std::vector<std::string> labels = readClassNames();
	cv::Mat src = cv::imread("D:/images/messi02.jpg");

	//IR和ONNX格式的模型都可以被InferenceEngine讀取
	// InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(onnx);
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();

	std::string input_name = "";
	for (auto item : inputs) {
		input_name = item.first;
		auto input_data = item.second;
		input_data->setPrecision(Precision::FP32);
		input_data->setLayout(Layout::NCHW);
		input_data->getPreProcess().setColorFormat(ColorFormat::RGB);
		std::cout << "input name: " << input_name << std::endl;
	}

	std::string output_name = "";
	for (auto item : outputs) {
		output_name = item.first;
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");
	auto infer_request = executable_network.CreateInferRequest();

	auto input = infer_request.GetBlob(input_name);
	size_t num_channels = input->getTensorDesc().getDims()[1];
	size_t h = input->getTensorDesc().getDims()[2];
	size_t w = input->getTensorDesc().getDims()[3];
	size_t image_size = h * w;
	cv::Mat blob_image;
	cv::resize(src, blob_image, cv::Size(w, h));
	cv::cvtColor(blob_image, blob_image, cv::COLOR_BGR2RGB);
	blob_image.convertTo(blob_image, CV_32F);
	blob_image = blob_image / 255.0;
	cv::subtract(blob_image, cv::Scalar(0.485, 0.456, 0.406), blob_image);
	cv::divide(blob_image, cv::Scalar(0.229, 0.224, 0.225), blob_image);

	// HWC =》NCHW
	float* data = static_cast<float*>(input->buffer());
	for (size_t row = 0; row < h; row++) {
		for (size_t col = 0; col < w; col++) {
			for (size_t ch = 0; ch < num_channels; ch++) {
				data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3f>(row, col)[ch];
			}
		}
	}

	infer_request.Infer();

	auto output = infer_request.GetBlob(output_name);
	const float* probs = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
	const SizeVector outputDims = output->getTensorDesc().getDims();
	std::cout << outputDims[0] << "x" << outputDims[1] << std::endl;
	float max = probs[0];
	int max_index = 0;
	for (int i = 1; i < outputDims[1]; i++) {
		if (max < probs[i]) {
			max = probs[i];
			max_index = i;
		}
	}

	std::cout << "class index : " << max_index << std::endl;
	std::cout << "class name : " << labels[max_index] << std::endl;
	cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
	cv::imshow("輸入影像", src);
	cv::waitKey(0);
	return 0;
}


std::vector<std::string> readClassNames()
{
	std::vector<std::string> classNames;

	std::ifstream fp(labels_txt_file);
	if (!fp.is_open())
	{
		printf("could not open file...\n");
		exit(-1);
	}
	std::string name;
	while (!fp.eof())
	{
		std::getline(fp, name);
		if (name.length())
			classNames.push_back(name);
	}
	fp.close();
	return classNames;
}

效果:

tensorflow模型轉換與部署

  • 通用引數設定
    • --input_model <path_to_frozen.pb>
    • --transformations_config <path_to_subgraph_replacement_configuration_file.json>
    • --tensorflow_object_detection_api_pipeline_config <path_topipeline.config>
    • --input_shape
    • --reverse_input_channels(反轉模型輸入的通道順序,讓以BGR讀圖的OpenCV影像可以直接餵給以RGB訓練的模型)
  • 版本資訊要求
  • tensorflow:required:>=1.15.2
  • numpy:required:<1.19.0
  • pip install tensorflow-gpu==1.15.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
  • pip install tensorflow-gpu==1.15.2 -i https://pypi.doubanio.com/simple/
  • networkx>=1.11
  • numpy>=1.14.0,<1.19.0
  • test-generator==0.1.1
  • defusedxml>=0.5.0

獲取tensorflow預訓練模型及檢視OpenVINO模型轉換文件

使用mobilenetv2版本pb轉換為IR並呼叫推理

COCO-trained models連結:models/tf1_detection_zoo.md at master · tensorflow/models · GitHub

OpenVINO中的tesorflow模型轉換連結:https://docs.openvino.ai/2021.2/openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models.html

獲取的預訓練模型資料夾中的pipeline.config文件可以對模型進行設定,比如image_resizer引數可以保持原有固定輸入影像大小300 * 300(fixed_shape_resizer),也可以設定為保持原影像比例並將影像大小限制在一個範圍內(keep_aspect_ratio_resizer)

pb模型轉換為IR模型程式碼:

python mo_tf.py --input_model=D:/tensorflow/ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb --transformations_config extensions/front/tf/ssd_v2_support.json --tensorflow_object_detection_api_pipeline_config D:/tensorflow/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config --reverse_input_channels --input_shape [1,300,300,3]

tensorflow模型轉換環境搭建及執行

python版本必須為3.8以下才能安裝tensorflow 1.15.2,同時1.0版的tensorflow分cpu版與gpu版兩種;

本機安裝的python版本是3.8,所以使用conda建立python版本為3.6的虛擬環境用於模型轉換:

conda create -n py36 python==3.6.5
conda activate py36
pip install tensorflow==1.15.2 -i https://pypi.doubanio.com/simple/
pip install tensorflow-gpu==1.15.2 -i https://pypi.doubanio.com/simple/
pip install networkx==1.11
pip install numpy==1.18.4
pip install test-generator==0.1.1
pip install defusedxml==0.5.0
cd C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\model_optimizer
python mo_tf.py --input_model=D:/tensorflow/ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb --transformations_config extensions/front/tf/ssd_v2_support.json --tensorflow_object_detection_api_pipeline_config D:/tensorflow/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config --reverse_input_channels --input_shape [1,300,300,3]

模型轉換成功及轉換得到的xml與bin檔案

模型轉換後的IR模型測試程式碼

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  //fstream檔案讀寫操作,iostream為控制檯操作

void read_coco_labels(std::vector<std::string>& labels) {
	std::string label_file = "D:/projects/models/object_detection_classes_coco.txt";
	std::ifstream fp(label_file);
	if (!fp.is_open())
	{
		printf("could not open file...\n");
		exit(-1);
	}
	std::string name;
	while (!fp.eof())
	{
		std::getline(fp, name);
		if (name.length())
			labels.push_back(name);
	}
	fp.close();
}

using namespace InferenceEngine;

int main(int argc, char** argv) {

	InferenceEngine::Core ie;

	std::string xml = "D:/projects/models/tf_ssdv2_ir/frozen_inference_graph.xml";
	std::string bin = "D:/projects/models/tf_ssdv2_ir/frozen_inference_graph.bin";

	std::vector<std::string> coco_labels;
	read_coco_labels(coco_labels);

	cv::Mat src = cv::imread("D:/images/dog_bike_car.jpg");  //讀取影像
	int im_h = src.rows;
	int im_w = src.cols;
	InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //讀取SSD目標檢測網路

	//獲取網路輸入輸出資訊
	InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一個Mat陣列
	InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一個Mat陣列
	std::string input_name = "";
	for (auto item : inputs) {  //auto可以自動推斷變數型別
		input_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto input_data = item.second;
		// A->B 表示提取A中的成員B
		input_data->setPrecision(Precision::U8);  //預設為unsigned char對應U8
		input_data->setLayout(Layout::NCHW);
		//input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  預設就是BGR
		std::cout << "input name: " << input_name << std::endl;
	}
	std::string output_name = "";
	for (auto item : outputs) {  //auto可以自動推斷變數型別
		output_name = item.first;  //第一個引數是name,第二個引數是結構,第二個引數設定精度與結構
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);  //輸出還是浮點數
		//注意:output_data不要設定結構
		std::cout << "output name: " << output_name << std::endl;
	}

	auto executable_network = ie.LoadNetwork(network, "CPU");  //設定執行的裝置
	auto infer_request = executable_network.CreateInferRequest();  //設定推理請求

	//影像預處理
	auto input = infer_request.GetBlob(input_name);  //獲取網路輸入影像資訊
	size_t num_channels = input->getTensorDesc().getDims()[1];  //size_t 型別表示C中任何物件所能達到的最大長度,它是無符號整數
	size_t h = input->getTensorDesc().getDims()[2];
	size_t w = input->getTensorDesc().getDims()[3];
	size_t image_size = h * w;
	cv::Mat blob_image;
	cv::resize(src, blob_image, cv::Size(w, h));  //將輸入圖片大小轉換為與網路輸入大小一致
	//cv::cvtColor(blob_image, blob_image, cv::COLOR_BGR2RGB);  //色彩空間轉換

	// HWC =》NCHW  將輸入影像從HWC格式轉換為NCHW格式
	unsigned char* data = static_cast<unsigned char*>(input->buffer());  //將影像放到buffer中,放入input中
	for (size_t row = 0; row < h; row++) {
		for (size_t col = 0; col < w; col++) {
			for (size_t ch = 0; ch < num_channels; ch++) {
				//將每個通道變成一張圖,按照通道順序
				data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
			}
		}
	}

	infer_request.Infer();
	auto output = infer_request.GetBlob(output_name);
	//轉換輸出資料
	const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
	//output:[1, 1, N, 7]
	//七個引數為:[image_id, label, conf, x_min, y_min, x_max, y_max]
	const SizeVector outputDims = output->getTensorDesc().getDims();  //獲取輸出維度資訊 [1, 1, N, 7]
	std::cout << outputDims[2] << "x" << outputDims[3] << std::endl;
	const int max_count = outputDims[2];  //識別出的物件個數
	const int object_size = outputDims[3];  //獲取物件資訊的個數,此處為7個
	for (int n = 0; n < max_count; n++) {
		float label = detection_out[n * object_size + 1];
		float confidence = detection_out[n * object_size + 2];
		float xmin = detection_out[n * object_size + 3] * im_w;
		float ymin = detection_out[n * object_size + 4] * im_h;
		float xmax = detection_out[n * object_size + 5] * im_w;
		float ymax = detection_out[n * object_size + 6] * im_h;
		if (confidence > 0.7) {
			printf("label id: %d,label name: %s \n", static_cast<int>(label), coco_labels[static_cast<int>(label)]);
			cv::Rect box;
			box.x = static_cast<int>(xmin);
			box.y = static_cast<int>(ymin);
			box.width = static_cast<int>(xmax - xmin);
			box.height = static_cast<int>(ymax - ymin);

			cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8);
			//box.tl()返回矩形左上角座標
			cv::putText(src, coco_labels[static_cast<int>(label)], box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 255, 0), 2, 8);
		}
	}

	//cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
	cv::namedWindow("out", cv::WINDOW_FREERATIO);
	cv::imshow("out", src);
	cv::waitKey(0);
	return 0;
}

效果:

9、YOLOv5模型部署與推理

  • Pytorch版本YOLOv5安裝與配置
  • YOLOv5轉ONNX格式生成
  • OpenVINO部署支援

YOLOv5安裝與配置

強烈建議使用pycharm中的terminal命令列進行相關環境安裝,速度快並且降低失敗機率

  • pytorch三件套安裝(cuda版本11.6)
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

下載YOLOv5專案地址:GitHub - ultralytics/yolov5: YOLOv5 in PyTorch > ONNX > CoreML > TFLite

從命令列進入YOLOv5專案解壓資料夾路徑安裝相關依賴環境並測試(下載模型需FQ):

pip install -r requirements.txt  #根據requirements文字檔案的環境需求進行自動安裝,若有套件安裝失敗則先手動安裝,再重新執行該命令直到全部成功
python detect.py --source data/images --weights yolov5s.pt --conf 0.25  #使用yolov5最新模型進行圖片識別
python detect.py --source D:/images/video/Boogie_Up.mp4 --weights yolov5s.pt --conf 0.25  #使用yolov5最新模型進行影片識別

效果:

1、圖片

2、影片

YOLOv5轉ONNX格式

  • YOLOv5轉換為ONNX程式碼
python export.py --weights yolov5s.pt --img 640 --batch 1 --include onnx  #include引數為轉換之後的模型型別
  • 轉IR格式檔案,與pytorch模型轉換中ONNX轉IR一樣
python mo_onnx.py --input_model D:/python/yolov5/yolov5s.onnx

OpenVINO+YOLOv5部署

YOLO識別原理概述:

1、YOLO識別原理

YOLO將圖片分割成 s² 個網格。每個網格的大小相同,並且讓這 s² 個網格每個都可以預測出B個邊界箱(預測框)。預測出來的每個邊界箱都有5個資訊量:物體的中心位置(x,y)、物體的高h、物體的寬w,以及這次預測的置信度conf(預測這個網格里是否有目標的置信度)。每個網格不僅預測B個邊界箱,還預測這個網格是什麼類別:假設我們要預測C類物體,則有C個置信度(預測是某一類目標的置信度)。那麼這次預測的資訊總共有 s² * (5*B + C) 個。例如經典YOLOv1中取 s=7、B=2、C=20,輸出即為 7 * 7 * (5*2 + 20) = 1470 個數值。

2、NMS非極大值抑制概念
方案一:只保留置信度(預測這個網格里是否有目標的置信度)最高的預測框,其餘的預測都刪除。
方案二:把置信度最高的那個網格的邊界箱作為極大邊界箱,計算極大邊界箱和其他幾個網格的邊界箱的IOU,如果超過一個閾值(例如0.5),就認為這兩個網格實際上預測的是同一個物體,把其中置信度較小的刪除,如下方示意。
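下面給出一個極簡的IoU計算與貪心NMS示意(僅為說明方案二的流程,並非OpenCV內部實現;後面的完整程式碼直接呼叫 cv::dnn::NMSBoxes 完成同樣的工作):

#include <algorithm>
#include <numeric>
#include <vector>
#include <opencv2/core.hpp>

//計算兩個矩形框的IoU(交併比)
float iou(const cv::Rect& a, const cv::Rect& b) {
	float inter = static_cast<float>((a & b).area());  //交集面積
	float uni = a.area() + b.area() - inter;  //併集面積
	return uni > 0 ? inter / uni : 0.f;
}

//貪心NMS:按置信度從高到低保留框,剔除與已保留框IoU超過閾值的其餘框
std::vector<int> greedy_nms(const std::vector<cv::Rect>& boxes, const std::vector<float>& scores, float iou_thresh) {
	std::vector<int> order(boxes.size());
	std::iota(order.begin(), order.end(), 0);
	std::sort(order.begin(), order.end(), [&](int i, int j) { return scores[i] > scores[j]; });
	std::vector<int> keep;
	std::vector<bool> removed(boxes.size(), false);
	for (int idx : order) {
		if (removed[idx]) continue;
		keep.push_back(idx);  //當前分數最高且未被剔除的框保留
		for (int other : order) {
			if (!removed[other] && other != idx && iou(boxes[idx], boxes[other]) > iou_thresh) {
				removed[other] = true;  //與保留框重疊過大,視為同一物體,剔除
			}
		}
	}
	return keep;
}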

YOLOv5s結構圖:

  • 載入IR/ONNX格式檔案
  • 設定輸入格式 RGB - 640 * 640
  • YOLOv5的輸出層是三層,對應的降取樣倍數是32,16,8
  • 以輸入 640 * 640大小的影像為例,得到三個輸出層大小分別是20、40、80,每層上對應三個尺度的anchor(檢測框)如下表
| 尺度 | anchor(x * y) |
| --- | --- |
| 80 | 10 * 13、16 * 30、33 * 23 |
| 40 | 30 * 61、62 * 45、59 * 119 |
| 20 | 116 * 90、156 * 198、373 * 326 |

模型的預測是在 20 * 20,40 * 40,80 * 80 每個輸出層的每個特徵點上預測三個anchor框,每個框預測分類,每個框的維度大小(不是輸出維度)為:cx,cy,w,h,conf(置信度,表示框中物件不是背景的機率及框出物件的自信程度,當多個檢測框檢測到的物件重合時可根據大小值來判斷選擇哪一個檢測框) + num of class(框中物件為COCO_labels中80類中每一類的機率),共5+80,圖示如下(此處Hout和Wout可以看成是每層影像的高和寬):

輸出結果解析時:先依次對每一個特徵點遍歷anchor0,再對每一個特徵點遍歷anchor1,即先保持anchor檢測框不變,遍歷特徵點,遍歷完所有特徵點,再移動至下一個anchor檢測框重新開始遍歷特徵點。

參考部落格:目標檢測之詳解yolo的anchor、置信度和類別機率_專注於計算機視覺的AndyJiang的部落格-CSDN部落格_yolo置信度計算

output格式:[1, 3, 80, 80, 85]、[1, 3, 40, 40, 85]、[1, 3, 20, 20, 85],其中85是上面的(cx, cy, w, h, conf + number of class)。

每層每個特徵點對應 3 個檢測框、每個檢測框 85 個引數;記憶體按檢測框順序儲存,即先依次儲存所有特徵點對應的第一個檢測框,再儲存所有特徵點對應的第二個檢測框。原始輸出到實際框座標的解碼公式見下方示意。
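下面是一個最小的解碼函式示意(decode_yolov5 為說明用的假設名稱),把特徵點 (row, col) 上某個anchor的原始輸出 (tx, ty, tw, th) 還原為 640 * 640 輸入座標系下的中心點與寬高,公式與後面完整程式碼中的計算一致:

#include <cmath>

struct DecodedBox { float cx, cy, w, h; };

float sigmoid(float x) { return 1.f / (1.f + std::exp(-x)); }

//依據YOLOv5的解碼公式,將特徵點(row, col)上一個anchor的原始輸出還原為輸入影像座標
DecodedBox decode_yolov5(float tx, float ty, float tw, float th,
	int row, int col, float stride, float anchor_w, float anchor_h) {
	DecodedBox box;
	box.cx = (sigmoid(tx) * 2.f - 0.5f + col) * stride;  //中心點x
	box.cy = (sigmoid(ty) * 2.f - 0.5f + row) * stride;  //中心點y
	box.w = std::pow(sigmoid(tw) * 2.f, 2.f) * anchor_w;  //寬
	box.h = std::pow(sigmoid(th) * 2.f, 2.f) * anchor_h;  //高
	return box;
}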

程式碼實現

6.0版YOLOv5

#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <inference_engine.hpp>

using namespace std;
using namespace cv;
using namespace cv::dnn;
using namespace InferenceEngine;

class YOLOObjectDetection {
public:
	void detect(std::string xml, std::string bin, std::string filePath, int camera_index);
private:
	void inferAndOutput(cv::Mat &frame, InferenceEngine::InferRequest &request, InferenceEngine::InputsDataMap & input_info,
		InferenceEngine::OutputsDataMap &output_info, float sx, float sy);
};

/*====================================================================*/

#include <yolo_object_detection.h>

using namespace std;
using namespace cv;

int main(int argc, char** argv) {
	std::string xml = "D:/python/yolov5/yolov5s.xml";
	std::string bin = "D:/python/yolov5/yolov5s.bin";
	std::string onnx_yolov5s = "D:/python/yolov5/yolov5s.onnx";
	std::string image_file = "D:/python/yolov5/data/images/zidane.jpg";
	std::string video_file = "D:/images/video/Boogie_Up.mp4";
	YOLOObjectDetection yolo_detector;
	yolo_detector.detect(xml, bin, video_file, 1);
	return 0;
}

/*====================================================================*/

#include <yolo_object_detection.h>

using namespace std;
using namespace cv;
using namespace cv::dnn;
using namespace InferenceEngine;

std::vector<float> anchors = {
	10,13, 16,30, 33,23,
	30,61, 62,45, 59,119,
	116,90, 156,198, 373,326
};

int get_anchor_index(int scale_w, int scale_h) {
	if (scale_w == 20) {
		return 12;
	}
	if (scale_w == 40) {
		return 6;
	}
	if (scale_w == 80) {
		return 0;
	}
	return -1;
}

float get_stride(int scale_w, int scale_h) {
	if (scale_w == 20) {
		return 32.0;
	}
	if (scale_w == 40) {
		return 16.0;
	}
	if (scale_w == 80) {
		return 8.0;
	}
	return -1;
}

float sigmoid_function(float a)
{
	float b = 1. / (1. + exp(-a));
	return b;
}

void YOLOObjectDetection::detect(std::string xml, std::string bin, std::string filePath, int camera_index) {
	VideoCapture cap;
	Mat frame;
	if (camera_index == 0) {
		cap.open(0);
	}
	if (camera_index == 1) {
		cap.open(filePath);
	}
	if (camera_index == -1) {
		frame = imread(filePath);
	}
	if (frame.empty()) {
		cap.read(frame);
	}
	int image_height = frame.rows;
	int image_width = frame.cols;

	// 建立IE外掛, 查詢支援硬體裝置
	Core ie;
	vector<string> availableDevices = ie.GetAvailableDevices();
	for (int i = 0; i < availableDevices.size(); i++) {
		printf("supported device name : %s \n", availableDevices[i].c_str());
	}

	//  載入檢測模型
	auto network = ie.ReadNetwork(xml, bin);
	// auto network = ie.ReadNetwork(xml);

	// 請求網路輸入與輸出資訊
	InferenceEngine::InputsDataMap input_info(network.getInputsInfo());
	InferenceEngine::OutputsDataMap output_info(network.getOutputsInfo());

	// 設定輸入格式
	for (auto &item : input_info) {
		auto input_name = item.first;
		auto input_data = item.second;
		input_data->setPrecision(Precision::FP32);
		input_data->setLayout(Layout::NCHW);
		input_data->getPreProcess().setResizeAlgorithm(RESIZE_BILINEAR);
		input_data->getPreProcess().setColorFormat(ColorFormat::RGB);
	}

	// 設定輸出格式
	for (auto &item : output_info) {
		auto input_name = item.first;
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);
	}
	auto executable_network = ie.LoadNetwork(network, "CPU");

	// 請求推斷圖
	auto infer_request = executable_network.CreateInferRequest();  //生成推理請求
	float scale_x = image_width / 640.0;
	float scale_y = image_height / 640.0;

	if (camera_index == -1) {
		inferAndOutput(frame, infer_request, input_info, output_info, scale_x, scale_y);
		cv::imshow("OpenVINO2021R2+YOLOv5物件檢測", frame);
	}
	else {
		while (true) {
			bool ret = cap.read(frame);
			if (frame.empty()) {
				break;
			}
			inferAndOutput(frame, infer_request, input_info, output_info, scale_x, scale_y);
			cv::imshow("YOLOv5s+OpenVINO2021R02+Demo", frame);
			char c = cv::waitKey(1);
			if (c == 27) {
				break;
			}
		}
	}
	waitKey(0);
	destroyAllWindows();
}

void YOLOObjectDetection::inferAndOutput(cv::Mat &frame, InferenceEngine::InferRequest &infer_request,
	InferenceEngine::InputsDataMap & input_info, InferenceEngine::OutputsDataMap &output_info, float sx, float sy) {
	int64 start = getTickCount();

	// 處理解析輸出結果
	vector<Rect> boxes;  //建立三個容器用於儲存檢測框、分類id及置信度
	vector<int> classIds;
	vector<float> confidences;
	/** Iterating over all input blobs **/
	for (auto & item : input_info) {
		auto input_name = item.first;

		/** Getting input blob **/
		auto input = infer_request.GetBlob(input_name);
		size_t num_channels = input->getTensorDesc().getDims()[1];
		size_t h = input->getTensorDesc().getDims()[2];
		size_t w = input->getTensorDesc().getDims()[3];
		size_t image_size = h*w;
		Mat blob_image;
		resize(frame, blob_image, Size(w, h));
		cvtColor(blob_image, blob_image, COLOR_BGR2RGB);

		// NCHW
		float* data = static_cast<float*>(input->buffer());
		for (size_t row = 0; row < h; row++) {
			for (size_t col = 0; col < w; col++) {
				for (size_t ch = 0; ch < num_channels; ch++) {
					data[image_size*ch + row*w + col] = float(blob_image.at<Vec3b>(row, col)[ch]) / 255.0;
				}
			}
		}
	}

	// 執行預測
	infer_request.Infer();

	//迴圈遍歷三個輸出
	for (auto &item : output_info) {
		auto output_name = item.first;
		auto output = infer_request.GetBlob(output_name);

		const float* output_blob = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
		const SizeVector outputDims = output->getTensorDesc().getDims();
		const int out_n = outputDims[0];  //1張圖
		const int out_c = outputDims[1];  //3個檢測框
		const int side_h = outputDims[2];  //當前層的height
		const int side_w = outputDims[3];  //當前層的width
		const int side_data = outputDims[4];  //num of class + 5 = 85

		float stride = get_stride(side_w, side_h);  //獲取該輸出層的降取樣倍數
		int anchor_index = get_anchor_index(side_w, side_h);
		int side_square = side_h*side_w;  //當前層的面積
		int side_data_square = side_square*side_data;  //每個檢測框都有85個值(cx, cy, w, h, conf + number of class)
		int side_data_w = side_w*side_data;

		//每層每個特徵點都有三個檢測框,每個檢測框都有85個值(cx, cy, w, h, conf + number of class)
		for (int i = 0; i < side_square; ++i) {  //遍歷每個特徵點
			for (int c = 0; c < out_c; c++) {  //遍歷每層的三個檢測框
				int row = i / side_h;  //當前輸出層中每一個特徵點的縱座標
				int col = i % side_h;  //當前輸出層中每一個特徵點的橫座標
				//獲取每一個anchor檢測框的起始位置
				//先保持檢測框不變,依次遍歷完當前層特徵點,再移動至下一個檢測框遍歷所有特徵點
				int object_index = c*side_data_square + row*side_data_w + col*side_data;

				//獲取每一個anchor的conf置信度並進行過濾
				float conf = sigmoid_function(output_blob[object_index + 4]);
				if (conf < 0.25) {
					continue;
				}

				// 置信度滿足要求則解析cx, cy, width, height
				float x = (sigmoid_function(output_blob[object_index]) * 2 - 0.5 + col)*stride;  //stride為當前層放縮倍數,col是橫座標,-0.5可以獲得左上角座標
				float y = (sigmoid_function(output_blob[object_index + 1]) * 2 - 0.5 + row)*stride;
				float w = pow(sigmoid_function(output_blob[object_index + 2]) * 2, 2)*anchors[anchor_index + (c * 2)];
				float h = pow(sigmoid_function(output_blob[object_index + 3]) * 2, 2)*anchors[anchor_index + (c * 2) + 1];
				float max_prob = -1;
				int class_index = -1;

				// 解析類別
				for (int d = 5; d < 85; d++) {
					float prob = sigmoid_function(output_blob[object_index + d]);
					if (prob > max_prob) {
						max_prob = prob;
						class_index = d - 5;
					}
				}

				// 轉換為top-left, bottom-right座標
				int x1 = saturate_cast<int>((x - w / 2) * sx);  // top left x
				int y1 = saturate_cast<int>((y - h / 2) * sy);  // top left y
				int x2 = saturate_cast<int>((x + w / 2) * sx);  // bottom right x
				int y2 = saturate_cast<int>((y + h / 2) * sy); // bottom right y

				// 解析輸出
				classIds.push_back(class_index);
				confidences.push_back((float)conf);
				boxes.push_back(Rect(x1, y1, x2 - x1, y2 - y1));
			}
		}
	}

	vector<int> indices;
	cv::dnn::NMSBoxes(boxes, confidences, 0.25, 0.5, indices);  //非最大抑制,去除同一個物件上的多個輸出框
	for (size_t i = 0; i < indices.size(); ++i)
	{
		int idx = indices[i];
		Rect box = boxes[idx];
		rectangle(frame, box, Scalar(140, 199, 0), 4, 8, 0);
	}
	float fps = getTickFrequency() / (getTickCount() - start);
	float time = (getTickCount() - start) / getTickFrequency();

	ostringstream ss;
	ss << "FPS : " << fps << " detection time: " << time * 1000 << " ms";
	cv::putText(frame, ss.str(), Point(20, 50), 0, 1.0, Scalar(0, 0, 255), 2);
}

7.0版YOLOv5

#include <fstream>                   //C++ 檔案操作
#include <iostream>                  //C++ input & output stream
#include <sstream>                   //C++ String stream, 讀寫記憶體中的string物件
#include <opencv2\opencv.hpp>        //OpenCV 標頭檔案

#include <openvino\openvino.hpp>     //OpenVINO >=2022.1

using namespace std;
using namespace ov;
using namespace cv;
// COCO資料集的標籤
vector<string> class_names = { "person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat", "traffic light","fire hydrant",
"stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe","backpack", "umbrella",
"handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove","skateboard", "surfboard",
"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange","broccoli", "carrot",
"hot dog", "pizza", "donut", "cake", "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse","remote",
"keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush" };
//OpenVINO IR模型檔案路徑
string ir_filename = "D:/yolov5/yolov5s.xml";

// @brief 對網路的輸入為圖片資料的節點進行賦值,實現圖片資料輸入網路
// @param input_tensor 輸入節點的tensor
// @param inpt_image 輸入圖片資料
void fill_tensor_data_image(ov::Tensor& input_tensor, const cv::Mat& input_image) {
	// 獲取輸入節點要求的輸入圖片資料的大小
	ov::Shape tensor_shape = input_tensor.get_shape();
	const size_t width = tensor_shape[3]; // 要求輸入圖片資料的寬度
	const size_t height = tensor_shape[2]; // 要求輸入圖片資料的高度
	const size_t channels = tensor_shape[1]; // 要求輸入圖片資料的通道數
	// 讀取節點資料記憶體指標
	float* input_tensor_data = input_tensor.data<float>();
	// 將圖片資料填充到網路中
	// 原有圖片資料為 H、W、C 格式,輸入要求的為 C、H、W 格式
	for (size_t c = 0; c < channels; c++) {
		for (size_t h = 0; h < height; h++) {
			for (size_t w = 0; w < width; w++) {
				input_tensor_data[c * width * height + h * width + w] = input_image.at<cv::Vec<float, 3>>(h, w)[c];
			}
		}
	}
}

int main(int argc, char** argv) {

	//建立OpenVINO Core
	Core core;
	CompiledModel compiled_model = core.compile_model(ir_filename, "AUTO");
	InferRequest infer_request = compiled_model.create_infer_request();

	// 預處理輸入資料 - 格式化操作
	VideoCapture cap;
	cap.open(0);
	if (!cap.isOpened()) {
		cout << "Exit!webcam fails to open!" << endl;
		return -1;
	}

	// 獲取輸入節點tensor
	Tensor input_image_tensor = infer_request.get_tensor("images");
	int input_h = input_image_tensor.get_shape()[2]; //獲得"images"節點的Height
	int input_w = input_image_tensor.get_shape()[3]; //獲得"images"節點的Width
	cout << "input_h:" << input_h << "; input_w:" << input_w << endl;
	cout << "input_image_tensor's element type:" << input_image_tensor.get_element_type() << endl;
	cout << "input_image_tensor's shape:" << input_image_tensor.get_shape() << endl;
	// 獲取輸出節點tensor
	Tensor output_tensor = infer_request.get_tensor("output");
	int out_rows = output_tensor.get_shape()[1]; //獲得"output"節點的out_rows
	int out_cols = output_tensor.get_shape()[2]; //獲得"output"節點的Width
	cout << "out_cols:" << out_cols << "; out_rows:" << out_rows << endl;

	//連續採集處理迴圈
	while (true) {

		Mat frame;
		cap >> frame;

		int64 start = cv::getTickCount();
		int w = frame.cols;
		int h = frame.rows;
		int _max = std::max(h, w);
		cv::Mat image = cv::Mat::zeros(cv::Size(_max, _max), CV_8UC3);
		cv::Rect roi(0, 0, w, h);
		frame.copyTo(image(roi));

		float x_factor = image.cols / static_cast<float>(input_w);  //避免整數相除丟失小數精度
		float y_factor = image.rows / static_cast<float>(input_h);

		cv::Mat blob_image;
		resize(image, blob_image, cv::Size(input_w, input_h));
		blob_image.convertTo(blob_image, CV_32F);
		blob_image = blob_image / 255.0;

		// 將圖片資料填充到tensor資料記憶體中
		fill_tensor_data_image(input_image_tensor, blob_image);

		// 執行推理計算
		infer_request.infer();

		// 獲得推理結果
		const ov::Tensor& output_tensor = infer_request.get_tensor("output");

		// 解析推理結果,YOLOv5 output format: cx,cy,w,h,score
		cv::Mat det_output(out_rows, out_cols, CV_32F, (float*)output_tensor.data());

		std::vector<cv::Rect> boxes;
		std::vector<int> classIds;
		std::vector<float> confidences;

		for (int i = 0; i < det_output.rows; i++) {
			float confidence = det_output.at<float>(i, 4);
			if (confidence < 0.4) {
				continue;
			}
			cv::Mat classes_scores = det_output.row(i).colRange(5, 85);
			cv::Point classIdPoint;
			double score;
			minMaxLoc(classes_scores, 0, &score, 0, &classIdPoint);

			// 置信度 0~1之間
			if (score > 0.5)
			{
				float cx = det_output.at<float>(i, 0);
				float cy = det_output.at<float>(i, 1);
				float ow = det_output.at<float>(i, 2);
				float oh = det_output.at<float>(i, 3);
				int x = static_cast<int>((cx - 0.5 * ow) * x_factor);
				int y = static_cast<int>((cy - 0.5 * oh) * y_factor);
				int width = static_cast<int>(ow * x_factor);
				int height = static_cast<int>(oh * y_factor);
				cv::Rect box;
				box.x = x;
				box.y = y;
				box.width = width;
				box.height = height;

				boxes.push_back(box);
				classIds.push_back(classIdPoint.x);
				confidences.push_back(score);
			}
		}
		// NMS
		std::vector<int> indexes;
		cv::dnn::NMSBoxes(boxes, confidences, 0.25, 0.45, indexes);
		for (size_t i = 0; i < indexes.size(); i++) {
			int index = indexes[i];
			int idx = classIds[index];
			cv::rectangle(frame, boxes[index], cv::Scalar(0, 0, 255), 2, 8);
			cv::rectangle(frame, cv::Point(boxes[index].tl().x, boxes[index].tl().y - 20),
				cv::Point(boxes[index].br().x, boxes[index].tl().y), cv::Scalar(0, 255, 255), -1);
			cv::putText(frame, class_names[idx], cv::Point(boxes[index].tl().x, boxes[index].tl().y - 10), cv::FONT_HERSHEY_SIMPLEX, .5, cv::Scalar(0, 0, 0));
		}

		// 計算FPS render it
		float t = (cv::getTickCount() - start) / static_cast<float>(cv::getTickFrequency());
		cout << "Infer time(ms): " << t * 1000 << "ms; Detections: " << indexes.size() << endl;
		putText(frame, cv::format("FPS: %.2f", 1.0 / t), cv::Point(20, 40), cv::FONT_HERSHEY_PLAIN, 2.0, cv::Scalar(255, 0, 0), 2, 8);
		cv::imshow("YOLOv5-6.1 + OpenVINO 2022.1 C++ Demo", frame);

		char c = cv::waitKey(1);
		if (c == 27) { // ESC
			break;
		}
	}

	cv::waitKey(0);
	cv::destroyAllWindows();

	return 0;
}

10、Python版本SDK配置與YOLOv5部署推理

Python版本環境配置

  • 環境變數與DLL載入支援
  • VS2019支援
  • Python的PYTHONPATH支援與配置

  • 測試安裝與配置
  • 控制檯python環境下匯入openvino測試:
from openvino.inference_engine import IECore

  • Pycharm測試:
from openvino.inference_engine import IECore

ie = IECore()
devices = ie.available_devices

for device in devices:
    print(device)

ResNet18影像分類部署推理

程式碼實現

from openvino.inference_engine import IECore
import numpy as np
import cv2 as cv

ie = IECore()
for device in ie.available_devices:
    print(device)

with open('imagenet_classes.txt') as f:
    labels = [line.strip() for line in f.readlines()]

model_xml = "resnet18.xml"
model_bin = "resnet18.bin"

net = ie.read_network(model=model_xml, weights= model_bin)
input_blob = next(iter(net.input_info))
out_blob = next(iter(net.outputs))

n, c, h, w = net.input_info[input_blob].input_data.shape
print(n, c, h, w)

src = cv.imread("D:/images/messi.jpg")
image = cv.resize(src, (w, h))
image = np.float32(image) / 255.0
image[:, :, ] -= (np.float32(0.485), np.float32(0.456), np.float32(0.406))
image[:, :, ] /= (np.float32(0.229), np.float32(0.224), np.float32(0.225))
image = image.transpose(2, 0, 1)

exec_net = ie.load_network(network=net, device_name="CPU")
res = exec_net.infer(inputs={input_blob:[image]})

res = res[out_blob]
print(res.shape)
label_index = np.argmax(res, 1)[0]
print(label_index, labels[label_index])
cv.putText(src, labels[label_index], (50, 50), cv.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2, 8)
cv.namedWindow("image classification", cv.WINDOW_FREERATIO)
cv.imshow("image classification", src)
cv.waitKey(0)

效果:

Python版本YOLOv5部署推理

程式碼實現:

# YOLOv5 by Ultralytics, GPL-3.0 license

import argparse
import os
import platform
import sys
from pathlib import Path

import torch

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

from models.common import DetectMultiBackend
from utils.dataloaders import IMG_FORMATS, VID_FORMATS, LoadImages, LoadScreenshots, LoadStreams
from utils.general import (LOGGER, Profile, check_file, check_img_size, check_imshow, check_requirements, colorstr, cv2,
                           increment_path, non_max_suppression, print_args, scale_boxes, strip_optimizer, xyxy2xywh)
from utils.plots import Annotator, colors, save_one_box
from utils.torch_utils import select_device, smart_inference_mode


@smart_inference_mode()
def run(
        weights=ROOT / 'yolov5s.onnx',  # model path or triton URL
        # weights=ROOT / 'yolov5s.pt',  # model path or triton URL
        source= 'D:/images/video/Boogie_Up.mp4',  # file/dir/URL/glob/screen/0(webcam)
        # source=ROOT / 'data/images',  # file/dir/URL/glob/screen/0(webcam)
        data=ROOT / 'data/coco128.yaml',  # dataset.yaml path
        imgsz=(640, 640),  # inference size (height, width)
        conf_thres=0.25,  # confidence threshold
        iou_thres=0.45,  # NMS IOU threshold
        max_det=1000,  # maximum detections per image
        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        view_img=False,  # show results
        save_txt=False,  # save results to *.txt
        save_conf=False,  # save confidences in --save-txt labels
        save_crop=False,  # save cropped prediction boxes
        nosave=False,  # do not save images/videos
        classes=None,  # filter by class: --class 0, or --class 0 2 3
        agnostic_nms=False,  # class-agnostic NMS
        augment=False,  # augmented inference
        visualize=False,  # visualize features
        update=False,  # update all models
        project=ROOT / 'runs/detect',  # save results to project/name
        name='exp',  # save results to project/name
        exist_ok=False,  # existing project/name ok, do not increment
        line_thickness=3,  # bounding box thickness (pixels)
        hide_labels=False,  # hide labels
        hide_conf=False,  # hide confidences
        half=False,  # use FP16 half-precision inference
        dnn=False,  # use OpenCV DNN for ONNX inference
        vid_stride=1,  # video frame-rate stride
):
    source = str(source)
    save_img = not nosave and not source.endswith('.txt')  # save inference images
    is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)
    is_url = source.lower().startswith(('rtsp://', 'rtmp://', 'http://', 'https://'))
    webcam = source.isnumeric() or source.endswith('.streams') or (is_url and not is_file)
    screenshot = source.lower().startswith('screen')
    if is_url and is_file:
        source = check_file(source)  # download

    # Directories
    save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Load model
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
    stride, names, pt = model.stride, model.names, model.pt
    imgsz = check_img_size(imgsz, s=stride)  # check image size

    # Dataloader
    bs = 1  # batch_size
    if webcam:
        view_img = check_imshow(warn=True)
        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
        bs = len(dataset)
    elif screenshot:
        dataset = LoadScreenshots(source, img_size=imgsz, stride=stride, auto=pt)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
    vid_path, vid_writer = [None] * bs, [None] * bs

    # Run inference
    model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))  # warmup
    seen, windows, dt = 0, [], (Profile(), Profile(), Profile())
    for path, im, im0s, vid_cap, s in dataset:
        with dt[0]:
            im = torch.from_numpy(im).to(model.device)
            im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
            im /= 255  # 0 - 255 to 0.0 - 1.0
            if len(im.shape) == 3:
                im = im[None]  # expand for batch dim

        # Inference
        with dt[1]:
            visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
            pred = model(im, augment=augment, visualize=visualize)

        # NMS
        with dt[2]:
            pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

        # Second-stage classifier (optional)
        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            seen += 1
            if webcam:  # batch_size >= 1
                p, im0, frame = path[i], im0s[i].copy(), dataset.count
                s += f'{i}: '
            else:
                p, im0, frame = path, im0s.copy(), getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # im.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # im.txt
            s += '%gx%g ' % im.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            imc = im0.copy() if save_crop else im0  # for save_crop
            annotator = Annotator(im0, line_width=line_thickness, example=str(names))
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, 5].unique():
                    n = (det[:, 5] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
                        with open(f'{txt_path}.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img or save_crop or view_img:  # Add bbox to image
                        c = int(cls)  # integer class
                        label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                        annotator.box_label(xyxy, label, color=colors(c, True))
                    if save_crop:
                        save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

            # Stream results
            im0 = annotator.result()
            if view_img:
                if platform.system() == 'Linux' and p not in windows:
                    windows.append(p)
                    cv2.namedWindow(str(p), cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)  # allow window resize (Linux)
                    cv2.resizeWindow(str(p), im0.shape[1], im0.shape[0])
                cv2.imshow(str(p), im0)
                cv2.waitKey(1)  # 1 millisecond

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'image':
                    cv2.imwrite(save_path, im0)
                else:  # 'video' or 'stream'
                    if vid_path[i] != save_path:  # new video
                        vid_path[i] = save_path
                        if isinstance(vid_writer[i], cv2.VideoWriter):
                            vid_writer[i].release()  # release previous video writer
                        if vid_cap:  # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                        save_path = str(Path(save_path).with_suffix('.mp4'))  # force *.mp4 suffix on results videos
                        vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
                    vid_writer[i].write(im0)

        # Print time (inference-only)
        LOGGER.info(f"{s}{'' if len(det) else '(no detections), '}{dt[1].dt * 1E3:.1f}ms")

    # Print results
    t = tuple(x.t / seen * 1E3 for x in dt)  # speeds per image
    LOGGER.info(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}' % t)
    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}{s}")
    if update:
        strip_optimizer(weights[0])  # update model (to fix SourceChangeWarning)


def parse_opt():  # configure the relevant defaults here
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.onnx', help='model path or triton URL')
    # parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.pt', help='model path or triton URL')
    parser.add_argument('--source', type=str, default='D:/images/video/Boogie_Up.mp4', help='file/dir/URL/glob/screen/0(webcam)')
    # parser.add_argument('--source', type=str, default=ROOT / 'data/images', help='file/dir/URL/glob/screen/0(webcam)')
    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='(optional) dataset.yaml path')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    # parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-crop', action='store_true', help='save cropped prediction boxes')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --classes 0, or --classes 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--visualize', action='store_true', help='visualize features')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default=ROOT / 'runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--line-thickness', default=3, type=int, help='bounding box thickness (pixels)')
    parser.add_argument('--hide-labels', default=False, action='store_true', help='hide labels')
    parser.add_argument('--hide-conf', default=False, action='store_true', help='hide confidences')
    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')
    parser.add_argument('--dnn', action='store_true', help='use OpenCV DNN for ONNX inference')
    parser.add_argument('--vid-stride', type=int, default=1, help='video frame-rate stride')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    print_args(vars(opt))
    return opt


def main(opt):
    check_requirements(exclude=('tensorboard', 'thop'))
    run(**vars(opt))


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)
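
Outside the full detect.py pipeline, the exported model can also be fed straight through the Inference Engine Python API. A minimal sketch, assuming `yolov5s.onnx` was produced with YOLOv5's export.py and a test image path of my own choosing; it only covers loading and one forward pass, not the letterboxing, confidence filtering and NMS that detect.py performs:

from openvino.inference_engine import IECore
import cv2 as cv
import numpy as np

ie = IECore()
# read_network accepts the exported ONNX directly; a converted IR (.xml/.bin) works the same way
net = ie.read_network(model="yolov5s.onnx")
input_blob = next(iter(net.input_info))
out_blob = next(iter(net.outputs))
n, c, h, w = net.input_info[input_blob].input_data.shape  # typically 1 x 3 x 640 x 640

exec_net = ie.load_network(network=net, device_name="CPU")

frame = cv.imread("D:/images/messi.jpg")
# plain resize for brevity; detect.py uses letterbox padding to keep the aspect ratio,
# and YOLOv5 is trained on RGB images, so convert from BGR with cv.cvtColor if needed
blob = cv.resize(frame, (w, h)).transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0
res = exec_net.infer(inputs={input_blob: blob})[out_blob]
print(res.shape)  # raw predictions, e.g. 1 x 25200 x 85; confidence filtering and NMS still required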

Result:

Summary mind map:
