The Supervision library is an excellent low-code Python computer vision tool, designed to provide a convenient and efficient interface for processing datasets and visualizing detection results. The official open-source repository is: supervision; the official documentation is: supervision-doc.
Supervision requires Python 3.8 or later. If you need the OpenCV GUI components for displaying images and videos, install supervision as follows:
pip install supervision[desktop]
If you only want to deploy an application and do not need a GUI, install supervision as follows:
pip install supervision
Note that supervision releases frequently, and the interfaces it provides change accordingly.
import supervision as sv
# Print the supervision version
sv.__version__
'0.19.0'
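Because the API changes between releases, it may be safer to pin the version used throughout this article (a hypothetical pip command, pinning the 0.19.0 release shown above):
pip install supervision==0.19.0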
- 1 Handling different tasks
- 1.1 Object detection and semantic segmentation
- 1.1.1 Analyzing results
- 1.1.2 Helper functions
- 1.2 Object tracking
- 1.3 Image classification
- 2 Data visualization and auxiliary processing
- 2.1 Color settings
- 2.2 Visualization examples for recognition results
- 2.3 Helper functions
- 2.3.1 Video utilities
- 2.3.2 Image utilities
- 2.4 Other functions
- 3 Tools for practical tasks
- 3.1 Counting objects crossing a line
- 3.2 Detecting and tracking in specific regions
- 3.3 Sliced inference
- 3.4 Trajectory smoothing
- 4 References
1 Handling different tasks
1.1 Object detection and semantic segmentation
1.1.1 Analyzing results
supervision provides a variety of interfaces for analyzing object detection and semantic segmentation results. supervision.Detections offers loaders for the outputs of mainstream detection and segmentation models; the most commonly used ones are:
- from_ultralytics(Ultralytics, YOLOv8)
- from_detectron2(Detectron2)
- from_mmdetection(MMDetection)
- from_yolov5(YOLOv5)
- from_sam(Segment Anything Model)
- from_transformers(HuggingFace Transformers)
- from_paddledet(PaddleDetection)
See supervision-doc-detection for details on these interfaces. The following uses YOLOv8 result analysis as an example; the input image for the code below is as follows:
import cv2
import supervision as sv
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
# model = YOLO("yolov8n-seg.pt")
image = cv2.imread("img/dog.png")
results = model(image, verbose=False)[0]
# Load the results from YOLOv8
detections = sv.Detections.from_ultralytics(results)
# Inspect the output
detections
Detections(xyxy=array([[ 255.78, 432.86, 612.01, 1078.5],
[ 255.98, 263.57, 1131.1, 837.2],
[ 927.72, 143.31, 1375.6, 338.81],
[ 926.96, 142.04, 1376.7, 339.12]], dtype=float32), mask=None, confidence=array([ 0.91621, 0.85567, 0.56325, 0.52481], dtype=float32), class_id=array([16, 1, 2, 7]), tracker_id=None, data={'class_name': array(['dog', 'bicycle', 'car', 'truck'], dtype='<U7')})
# Number of predicted boxes
len(detections)
4
# Inspect the first box
detections[0]
Detections(xyxy=array([[ 255.78, 432.86, 612.01, 1078.5]], dtype=float32), mask=None, confidence=array([ 0.91621], dtype=float32), class_id=array([16]), tracker_id=None, data={'class_name': array(['dog'], dtype='<U7')})
# Area of each bounding box
detections.box_area
array([ 2.3001e+05, 5.0201e+05, 87569, 88643], dtype=float32)
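As a quick check, the dog box spans (612.01 − 255.78) × (1078.5 − 432.86) ≈ 356.2 × 645.6 ≈ 2.3e5 pixels, matching the first entry above.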
# Visualize the detections
# Configure the annotators
bounding_box_annotator = sv.BoundingBoxAnnotator()
label_annotator = sv.LabelAnnotator()
labels = [
model.model.names[class_id]
for class_id
in detections.class_id
]
annotated_image = bounding_box_annotator.annotate(
scene=image, detections=detections)
annotated_image = label_annotator.annotate(
scene=annotated_image, detections=detections, labels=labels)
sv.plot_image(annotated_image)
The image above shows three annotated boxes, yet the raw results contain four. This is because the car in the image is detected both as a truck and as a car.
len(detections)
detections
Detections(xyxy=array([[ 255.78, 432.86, 612.01, 1078.5],
[ 255.98, 263.57, 1131.1, 837.2],
[ 927.72, 143.31, 1375.6, 338.81],
[ 926.96, 142.04, 1376.7, 339.12]], dtype=float32), mask=None, confidence=array([ 0.91621, 0.85567, 0.56325, 0.52481], dtype=float32), class_id=array([16, 1, 2, 7]), tracker_id=None, data={'class_name': array(['dog', 'bicycle', 'car', 'truck'], dtype='<U7')})
labels
['dog', 'bicycle', 'car', 'truck']
The fix is to run class-agnostic non-maximum suppression (NMS) on the detection results:
detections = detections.with_nms(threshold=0.5, class_agnostic=True)
# Print the remaining classes
for class_id in detections.class_id:
    print(model.model.names[class_id])
dog
bicycle
car
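Only three boxes remain: the truck box is suppressed because it overlaps the car box almost completely (their IoU is far above the 0.5 threshold) and has the lower confidence (0.52481 vs. 0.56325).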
1.1.2 Helper functions
Computing Intersection over Union (IoU)
import supervision as sv
import numpy as np
box1 = np.array([[50, 50, 150, 150]]) # (x_min, y_min, x_max, y_max)
box2 = np.array([[100, 100, 200, 200]])
print(sv.box_iou_batch(box1,box2))
[[ 0.14286]]
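As a sanity check: the boxes intersect over the region [100, 100, 150, 150], so the intersection area is 50 × 50 = 2500, each box has area 100 × 100 = 10000, and IoU = 2500 / (10000 + 10000 − 2500) = 1/7 ≈ 0.14286.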
Computing Non-Maximum Suppression (NMS)
import supervision as sv
import numpy as np
box = np.array([[50, 50, 150, 150, 0.2],[100, 100, 200, 200, 0.5]]) # (x_min, y_min, x_max, y_max, score)
# Returns which boxes should be kept
# Arguments: the input box array and the IoU threshold
print(sv.box_non_max_suppression(box,0.1))
[False True]
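Here the two boxes have IoU ≈ 0.143, which exceeds the 0.1 threshold, so they are treated as duplicates: the lower-scoring box (score 0.2) is suppressed and the higher-scoring box (score 0.5) is kept.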
Generating a mask from a polygon
import cv2
import supervision as sv
import numpy as np
# Polygon vertices
vertices = np.array([(50, 50), (30, 50), (60,20), (70, 50), (90, 10)])
# Create the mask
# Arguments: the polygon vertices and the output mask's (width, height)
mask = sv.polygon_to_mask(vertices, (100,60))
# In the mask, the polygon area is white (pixel value 1) and everything else is 0
sv.plot_image(mask)
# Generate polygons from a mask
# vertices = sv.mask_to_polygons(mask)
Filtering polygons by area
import supervision as sv
import numpy as np
# Create a sample list of polygons
polygon1 = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
polygon2 = np.array([[0, 0], [0, 2], [2, 2], [2, 0]])
polygon3 = np.array([[0, 0], [0, 3], [3, 3], [3, 0]])
polygons = [polygon1, polygon2, polygon3]
# Arguments: the polygon list, the minimum area, and the maximum area (None means no upper limit)
filtered_polygons = sv.filter_polygons_by_area(polygons, 2.5, None)
print("Number of polygons before filtering:", len(polygons))
print("Number of polygons after filtering:", len(filtered_polygons))
Number of polygons before filtering: 3
Number of polygons after filtering: 2
Scaling bounding boxes
import numpy as np
import supervision as sv
boxes = np.array([[10, 10, 20, 20], [30, 30, 40, 40]])
# The factor by which box dimensions are scaled: a factor greater than 1 enlarges the boxes, a factor smaller than 1 shrinks them
factor = 1.2
scaled_bb = sv.scale_boxes(boxes, factor)
print(scaled_bb)
[[ 9 9 21 21]
[ 29 29 41 41]]
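The scaling preserves each box's center: the 10 × 10 box [10, 10, 20, 20] becomes a 12 × 12 box around the same center, i.e. [9, 9, 21, 21].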
1.2 Object tracking
supervision has a built-in ByteTrack tracker. Unlike tracking methods that match objects using ReID features, ByteTrack relies mainly on the bounding boxes provided by the object detector, so the detector's accuracy and stability directly affect the tracking quality.
A tracker is initialized via the supervision.ByteTrack class, whose initialization parameters are:
- track_thresh (float, optional, default 0.25): detection confidence threshold
- track_buffer (int, optional, default 30): the number of frames to buffer when a track is lost.
- match_thresh (float, optional, default 0.8): the threshold for matching tracks with detections.
- frame_rate (int, optional, default 30): the frame rate of the video.
The main methods of supervision.ByteTrack are:
- reset(): resets the internal state of the ByteTrack tracker.
- update_with_detections(detections): updates the tracker with the provided detections and returns the updated detections, where detections is a supervision Detections result.
Example code:
import supervision as sv
from ultralytics import YOLO
import numpy as np
model = YOLO("yolov8n.pt")
# Initialize the tracker
tracker = sv.ByteTrack()
bounding_box_annotator = sv.BoundingBoxAnnotator()
label_annotator = sv.LabelAnnotator()
def callback(frame: np.ndarray, index: int) -> np.ndarray:
    results = model(frame)[0]
    # Obtain the Detections result
    detections = sv.Detections.from_ultralytics(results)
    # Update the tracks
    detections = tracker.update_with_detections(detections)
    labels = [f"#{tracker_id}" for tracker_id in detections.tracker_id]
    annotated_frame = bounding_box_annotator.annotate(scene=frame.copy(), detections=detections)
    annotated_frame = label_annotator.annotate(scene=annotated_frame, detections=detections, labels=labels)
    return annotated_frame
sv.process_video(
source_path="https://media.roboflow.com/supervision/video-examples/people-walking.mp4",
# For the output see: https://media.roboflow.com/supervision/video-examples/how-to/track-objects/annotate-video-with-traces.mp4
target_path="output.mp4",
callback=callback
)
Because the example uses yolov8n.pt, the tracking will be rather unstable; consider using a more capable detection model.
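reset() comes in handy when one tracker instance is reused across several clips. A minimal sketch reusing the model and tracker from the example above (the video paths are placeholders):
for path in ["video_a.mp4", "video_b.mp4"]:
    # Clear the tracker's internal state so track ids do not carry over between videos
    tracker.reset()
    for frame in sv.get_video_frames_generator(source_path=path):
        detections = sv.Detections.from_ultralytics(model(frame, verbose=False)[0])
        detections = tracker.update_with_detections(detections)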
1.3 Image classification
supervision can parse the outputs of clip, timm, and YOLOv8 classification models, but the functionality is limited: it only returns the top-k classes and their probabilities.
The input image for the code below is as follows:
import cv2
from ultralytics import YOLO
import supervision as sv
# Load the image and the model
image = cv2.imread("img/cat.png")
# Load the classification model
model = YOLO('yolov8n-cls.pt')
output = model(image)[0]
# Import the YOLO classification output into supervision
classifications = sv.Classifications.from_ultralytics(output)
# from_clip and from_timm are also supported
# Print the top-2 classes and their probabilities
print(classifications.get_top_k(2))
0: 224x224 tiger_cat 0.29, tabby 0.23, Egyptian_cat 0.15, Siamese_cat 0.05, Pembroke 0.03, 36.7ms
Speed: 6.3ms preprocess, 36.7ms inference, 0.0ms postprocess per image at shape (1, 3, 224, 224)
(array([282, 281]), array([ 0.29406, 0.22982], dtype=float32))
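To map the returned class ids back to readable names, the ids can be looked up in the model's class-name mapping; a minimal sketch reusing model and classifications from above:
class_ids, confidences = classifications.get_top_k(2)
for class_id, confidence in zip(class_ids, confidences):
    # model.names maps class ids to ImageNet class names
    print(model.names[int(class_id)], round(float(confidence), 4))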
2 Data visualization and auxiliary processing
2.1 Color settings
supervision provides the Color and ColorPalette classes for defining colors (and palettes) and converting between color formats. Specifically:
# Get a predefined color
import supervision as sv
# WHITE BLACK RED GREEN BLUE YELLOW ROBOFLOW
sv.Color.ROBOFLOW
Color(r=163, g=81, b=251)
# Get the BGR value of a color
sv.Color(r=255, g=255, b=0).as_bgr()
# Get the RGB value
# sv.Color(r=255, g=255, b=0).as_rgb()
(0, 255, 255)
# Get the hex value of a color
sv.Color(r=255, g=255, b=0).as_hex()
'#ffff00'
# Create a Color object from a hex value
sv.Color.from_hex('#ff00ff')
Color(r=255, g=0, b=255)
# Return the default palette
sv.ColorPalette.DEFAULT
# sv.ColorPalette.ROBOFLOW
# sv.ColorPalette.LEGACY
ColorPalette(colors=[Color(r=163, g=81, b=251), Color(r=255, g=64, b=64), Color(r=255, g=161, b=160), Color(r=255, g=118, b=51), Color(r=255, g=182, b=51), Color(r=209, g=212, b=53), Color(r=76, g=251, b=18), Color(r=148, g=207, b=26), Color(r=64, g=222, b=138), Color(r=27, g=150, b=64), Color(r=0, g=214, b=193), Color(r=46, g=156, b=170), Color(r=0, g=196, b=255), Color(r=54, g=71, b=151), Color(r=102, g=117, b=255), Color(r=0, g=25, b=239), Color(r=134, g=58, b=255), Color(r=83, g=0, b=135), Color(r=205, g=58, b=255), Color(r=255, g=151, b=202), Color(r=255, g=57, b=201)])
# Return the i-th color of a palette
color_palette = sv.ColorPalette.from_hex(['#ff0000', '#00ff00', '#0000ff'])
color_palette.by_idx(1)
Color(r=0, g=255, b=0)
# Import a palette from matplotlib
sv.ColorPalette.from_matplotlib('tab20', 5)
ColorPalette(colors=[Color(r=31, g=119, b=180), Color(r=152, g=223, b=138), Color(r=140, g=86, b=75), Color(r=199, g=199, b=199), Color(r=158, g=218, b=229)])
2.2 Visualization examples for recognition results
supervision provides many functions for visualizing recognition results (mainly for object detection and tracking tasks). This article focuses on the various ways of rendering detection bounding boxes. For examples of all supervision annotators see: supervision-doc-annotators.
The main examples follow:
# Get the detection results
import cv2
import supervision as sv
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
image = cv2.imread("img/person.png")
results = model(image, verbose=False)[0]
# Load the results from YOLOv8
detections = sv.Detections.from_ultralytics(results)
# Number of detections
len(detections)
32
Drawing bounding boxes
import supervision as sv
# Configure the bounding box annotator
# Arguments: color sets the color, thickness the line width, color_lookup the color mapping strategy (options are INDEX, CLASS, and TRACK).
bounding_box_annotator = sv.BoundingBoxAnnotator(color= sv.ColorPalette.DEFAULT, thickness = 2, color_lookup = sv.ColorLookup.CLASS)
annotated_frame = bounding_box_annotator.annotate(
scene=image.copy(),
detections=detections
)
sv.plot_image(annotated_frame)
Drawing rounded bounding boxes
# roundness: the percentage roundness of the box edges
round_box_annotator = sv.RoundBoxAnnotator(color_lookup = sv.ColorLookup.INDEX, roundness=0.6)
annotated_frame = round_box_annotator.annotate(
scene=image.copy(),
detections=detections
)
sv.plot_image(annotated_frame)
Drawing box corners
import supervision as sv
# corner_length: the length of each corner line
corner_annotator = sv.BoxCornerAnnotator(corner_length=12, color=sv.Color(r=255, g=255, b=0))
annotated_frame = corner_annotator.annotate(
scene=image.copy(),
detections=detections
)
sv.plot_image(annotated_frame)
Drawing filled boxes
# Opacity of the color mask
color_annotator = sv.ColorAnnotator(opacity=0.4)
annotated_frame = color_annotator.annotate(
scene=image.copy(),
detections=detections
)
sv.plot_image(annotated_frame)
Drawing circles
circle_annotator = sv.CircleAnnotator(color=sv.Color(r=255, g=255, b=128))
annotated_frame = circle_annotator.annotate(
scene=image.copy(),
detections=detections
)
sv.plot_image(annotated_frame)
Drawing dots
supervision provides the DotAnnotator class for drawing a dot at a specific position of each detection box. It has two specific parameters: radius (the dot radius) and position (where on the box the dot is drawn). The available position values are:
- CENTER = "CENTER"
- CENTER_LEFT = "CENTER_LEFT"
- CENTER_RIGHT = "CENTER_RIGHT"
- TOP_CENTER = "TOP_CENTER"
- TOP_LEFT = "TOP_LEFT"
- TOP_RIGHT = "TOP_RIGHT"
- BOTTOM_LEFT = "BOTTOM_LEFT"
- BOTTOM_CENTER = "BOTTOM_CENTER"
- BOTTOM_RIGHT = "BOTTOM_RIGHT"
- CENTER_OF_MASS = "CENTER_OF_MASS"
The position options can also be listed in code:
for i in sv.Position:
    print(i)
dot_annotator = sv.DotAnnotator(radius=4)
annotated_frame = dot_annotator.annotate(
scene=image.copy(),
detections=detections
)
sv.plot_image(annotated_frame)
Drawing triangles
# base/height: the width and height of the triangle, position: its placement
triangle_annotator = sv.TriangleAnnotator(base = 30, height = 30, position = sv.Position['TOP_CENTER'])
annotated_frame = triangle_annotator.annotate(
scene=image.copy(),
detections=detections
)
sv.plot_image(annotated_frame)
Drawing ellipses
# start_angle/end_angle: the start/end angle of the ellipse
ellipse_annotator = sv.EllipseAnnotator(start_angle=-45, end_angle=215)
annotated_frame = ellipse_annotator.annotate(
scene=image.copy(),
detections=detections
)
sv.plot_image(annotated_frame)
Drawing confidence bars
# Shows the confidence as a percentage bar
# border_color: the border color of the bar
# position: placement
# width/height: the width/height of the bar
percentage_bar_annotator = sv.PercentageBarAnnotator(border_color = sv.Color(r=128, g=0, b=0), position=sv.Position['BOTTOM_CENTER'],
width = 100, height = 20)
annotated_frame = percentage_bar_annotator.annotate(
scene=image.copy(),
detections=detections
)
sv.plot_image(annotated_frame)
Drawing text labels
# color: text background color, text_color: text color, text_scale: text size
# text_position: text placement, text_thickness: text weight, text_padding: text padding
label_annotator = sv.LabelAnnotator(color=sv.Color(r=255, g=255, b=255),text_color=sv.Color(r=128, g=0, b=128), text_scale=2,
text_position=sv.Position.TOP_CENTER, text_thickness=2,text_padding=10)
# Get a label for each box
labels = [
model.model.names[class_id]
for class_id
in detections.class_id
]
annotated_frame = label_annotator.annotate(
scene=image.copy(),
detections=detections,
labels=labels
)
sv.plot_image(annotated_frame)
Pixelating objects
# pixel_size: the size of the pixelation.
pixelate_annotator = sv.PixelateAnnotator(pixel_size=12)
annotated_frame = pixelate_annotator.annotate(
scene=image.copy(),
detections=detections
)
# Overlay another annotator on top
annotated_frame = label_annotator.annotate(
scene=annotated_frame.copy(),
detections=detections,
labels=labels
)
sv.plot_image(annotated_frame)
2.3 Helper functions
2.3.1 Video utilities
Reading video information
import supervision as sv
# Read the width, height, fps, and total frame count of a video file.
video_info = sv.VideoInfo.from_video_path(video_path="https://media.roboflow.com/supervision/video-examples/people-walking.mp4")
video_info
VideoInfo(width=1920, height=1080, fps=25, total_frames=341)
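VideoInfo also exposes a resolution_wh property that combines width and height into one tuple; it is used later in the PolygonZone example:
# Width and height as a single tuple
video_info.resolution_wh
(1920, 1080)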
Reading and writing video
import supervision as sv
from tqdm import tqdm
video_path="https://media.roboflow.com/supervision/video-examples/people-walking.mp4"
video_info = sv.VideoInfo.from_video_path(video_path)
# Get a generator that yields video frames
# stride: the interval between returned frames, default 1
# start: the starting frame index, default 0
# end: the ending frame index, default None (run until the end of the video)
frames_generator = sv.get_video_frames_generator(source_path=video_path, stride=10, start=0, end=100)
TARGET_VIDEO_PATH = "out.avi"
# target_path: the output path
with sv.VideoSink(target_path=TARGET_VIDEO_PATH, video_info=video_info) as sink:
    for frame in tqdm(frames_generator):
        sink.write_frame(frame=frame)
10it [00:24, 2.47s/it]
Computing FPS
import supervision as sv
frames_generator = sv.get_video_frames_generator(source_path="https://media.roboflow.com/supervision/video-examples/people-walking.mp4")
# Initialize the FPS monitor
fps_monitor = sv.FPSMonitor()
for frame in frames_generator:
    # Record a timestamp
    fps_monitor.tick()
# Compute and return the average FPS from the stored timestamps.
fps = fps_monitor.fps
fps
174.4186046525204
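Since no inference runs inside the loop, the value reflects how fast the loop reads frames, not a model's inference speed.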
2.3.2 Image utilities
# Save images
import supervision as sv
# Create the image sink
# target_dir_path: the output directory
# overwrite: whether to overwrite the output directory, default False
# image_name_pattern: the image file name pattern, default "image_{:05d}.png".
with sv.ImageSink(target_dir_path='output', overwrite=True, image_name_pattern= "img_{:05d}.png") as sink:
    for image in sv.get_video_frames_generator(source_path='out.avi', stride=2):
        sink.save_image(image=image)
# Crop an image to the given bounding boxes.
import cv2
import supervision as sv
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
image = cv2.imread("img/person.png")
results = model(image)[0]
# Load the results from YOLOv8
detections = sv.Detections.from_ultralytics(results)
with sv.ImageSink(target_dir_path='output') as sink:
    for xyxy in detections.xyxy:
        # Crop the image to the bounding box
        cropped_image = sv.crop_image(image=image, xyxy=xyxy)
        sink.save_image(image=cropped_image)
0: 384x640 31 persons, 1 bird, 76.8ms
Speed: 2.5ms preprocess, 76.8ms inference, 2.7ms postprocess per image at shape (1, 3, 384, 640)
2.4 Other functions
supervision contains other commonly used classes that this article does not cover in detail:
- supervision provides classes for converting datasets (mainly object detection and image classification) between different formats: supervision-doc-datasets.
- supervision provides classes for computing various evaluation metrics on detection results: supervision-doc-metrics
- supervision provides classes for drawing various shapes: supervision-doc-draw-utils
3 Tools for practical tasks
3.1 Counting objects crossing a line
supervision provides the LineZone class for counting objects that cross a line. The principle is simple: object detection plus tracking, then judging from the center of each vehicle's bounding box whether it has crossed a preset line, which yields the crossing counts. The code is as follows:
import supervision as sv
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
tracker = sv.ByteTrack()
frames_generator = sv.get_video_frames_generator("https://media.roboflow.com/supervision/video-examples/vehicles.mp4",start=0,end=500)
video_info = sv.VideoInfo.from_video_path("https://media.roboflow.com/supervision/video-examples/vehicles.mp4")
w = video_info.width
h = video_info.height
# Define the preset line (left to right)
start, end = sv.Point(x=0, y=int(h/2)), sv.Point(x=w, y=int(h/2))
# Initialize the line zone counter
line_zone = sv.LineZone(start=start, end=end)
# Initialize the annotators
trace_annotator = sv.TraceAnnotator()
label_annotator = sv.LabelAnnotator(text_scale=2,text_color= sv.Color.BLACK)
line_zone_annotator = sv.LineZoneAnnotator(thickness=4, text_thickness=4, text_scale=1)
with sv.ImageSink(target_dir_path='output', overwrite=False, image_name_pattern= "img_{:05d}.png") as sink:
    for frame in frames_generator:
        result = model(frame)[0]
        detections = sv.Detections.from_ultralytics(result)
        # Update the tracker
        detections = tracker.update_with_detections(detections)
        # Update the line zone; crossed_in marks objects crossing in, crossed_out objects crossing out
        crossed_in, crossed_out = line_zone.trigger(detections)
        # Get a label for each box
        labels = [
            f"#{tracker_id} {model.model.names[class_id]}"
            for class_id, tracker_id
            in zip(detections.class_id, detections.tracker_id)
        ]
        # Draw the traces
        annotated_frame = trace_annotator.annotate(scene=frame.copy(), detections=detections)
        # Draw the labels
        annotated_frame = label_annotator.annotate(scene=annotated_frame, detections=detections, labels=labels)
        # Draw the preset line
        annotated_frame = line_zone_annotator.annotate(annotated_frame, line_counter=line_zone)
        # Display
        # sv.plot_image(annotated_frame)
        # Save the visualization
        # sink.save_image(image=annotated_frame)
        # Number of objects that crossed the line inward, and outward.
        print(line_zone.in_count, line_zone.out_count)
# For the output see: https://media.roboflow.com/supervision/cookbooks/count-objects-crossing-the-line-result-1280x720.mp4
3.2 Detecting and tracking in specific regions
supervision provides the PolygonZone class for detecting and tracking objects in specific regions. The principle is simple: object detection, optionally with tracking, then selecting a specific region to determine whether objects lie inside it and to count the objects currently in that region. The code is as follows:
import numpy as np
import supervision as sv
from ultralytics import YOLO
model = YOLO('yolov8n.pt')
# Video path
video_path = "https://media.roboflow.com/supervision/video-examples/vehicles-2.mp4"
# Inspect the video info
video_info = sv.VideoInfo.from_video_path(video_path)
print(video_info)
# Read the video
generator = sv.get_video_frames_generator(video_path)
# Define the regions to monitor
polygons = [
np.array([
[718, 595],[927, 592],[851, 1062],[42, 1059]
]),
np.array([
[987, 595],[1199, 595],[1893, 1056],[1015, 1062]
])
]
# Set the palette
colors = sv.ColorPalette.DEFAULT
zones = [
# Define polygon zones for detecting objects.
sv.PolygonZone(
polygon=polygon, # The input polygon
frame_resolution_wh=video_info.resolution_wh # The full-frame resolution
)
for polygon in polygons
]
# Initialize the annotators
zone_annotators = [
# Visualize each monitored region separately
sv.PolygonZoneAnnotator(
zone=zone,
color=colors.by_idx(index), # Color
thickness=4, # Line width
text_thickness=8, # Text weight
text_scale=4, # Text scale
display_in_zone_count=False # Whether to display the count of objects in the zone
)
for index, zone in enumerate(zones)
]
# Define a separate box annotator for each zone
box_annotators = [
sv.BoxAnnotator(
color=colors.by_idx(index),
thickness=4,
text_thickness=4,
text_scale=2
)
for index in range(len(polygons))
]
with sv.ImageSink(target_dir_path='output', overwrite=False, image_name_pattern= "img_{:05d}.png") as sink:
    for frame in generator:
        # To improve accuracy, run the model at a larger input size
        results = model(frame, imgsz=1280, verbose=False)[0]
        detections = sv.Detections.from_ultralytics(results)
        for zone, zone_annotator, box_annotator in zip(zones, zone_annotators, box_annotators):
            # Determine which detections fall inside the polygon zone
            mask = zone.trigger(detections=detections)
            detections_filtered = detections[mask]
            frame = box_annotator.annotate(scene=frame, detections=detections_filtered)
            frame = zone_annotator.annotate(scene=frame)
        # Display
        sv.plot_image(frame, (16, 16))
        # Save the visualization
        # sink.save_image(image=annotated_frame)
# For the output see: https://blog.roboflow.com/content/media/2023/03/trim-counting.mp4
3.3 Sliced inference
supervision supports sliced inference over an image to improve small object detection, i.e. SAHI (Slicing Aided Hyper Inference), which detects small objects by slicing the image. The SAHI process can be described as follows: the image is split into several regions with a sliding window and each region is predicted separately, while inference is also run on the whole image; the per-region predictions and the full-image predictions are then merged, and finally filtered with NMS (non-maximum suppression). For details on using SAHI see: Optimizing small object detection with the slicing aided hyper inference library SAHI.
Example code for sliced inference with supervision:
import cv2
import supervision as sv
from ultralytics import YOLO
import numpy as np
model = YOLO("yolov8n.pt")
image = cv2.imread("img/person.png")
results = model(image,verbose=False)[0]
# Load the results from YOLOv8
detections = sv.Detections.from_ultralytics(results)
# Number of detections
print("before slicer",len(detections))
# Callback applied to each slice
def callback(image_slice: np.ndarray) -> sv.Detections:
    result = model(image_slice, verbose=False)[0]
    return sv.Detections.from_ultralytics(result)
# Configure the InferenceSlicer (SAHI, Slicing Aided Hyper Inference) object
# callback: the function applied to each slice
# slice_wh: the size of each slice
# overlap_ratio_wh: the overlap ratio between consecutive slices
# iou_threshold: the IoU threshold used by NMS when merging slice results
# thread_workers: the number of worker threads
slicer = sv.InferenceSlicer(callback = callback, slice_wh=(320,320),
overlap_ratio_wh=(0.3,0.3), iou_threshold=0.4, thread_workers=4)
detections = slicer(image)
# Number of detections
print("after slicer",len(detections))
before slicer 32
after slicer 53
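Slicing raises the count from 32 to 53: people that appear small at the full-image inference size become large enough within the 320 × 320 slices to be detected.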
3.4 Trajectory smoothing
supervision provides the DetectionsSmoother utility class for smoothing tracking trajectories in video. DetectionsSmoother keeps a detection history for each track and uses it to produce smoothed predictions. The code is as follows:
import supervision as sv
from ultralytics import YOLO
video_path = "https://media.roboflow.com/supervision/video-examples/grocery-store.mp4"
video_info = sv.VideoInfo.from_video_path(video_path=video_path)
frame_generator = sv.get_video_frames_generator(source_path=video_path)
model = YOLO("yolov8n.pt")
tracker = sv.ByteTrack(frame_rate=video_info.fps)
# Track smoother; length is the maximum number of frames considered when smoothing detections
smoother = sv.DetectionsSmoother(length=4)
annotator = sv.BoundingBoxAnnotator()
with sv.VideoSink("output.mp4", video_info=video_info) as sink:
    for frame in frame_generator:
        result = model(frame)[0]
        detections = sv.Detections.from_ultralytics(result)
        detections = tracker.update_with_detections(detections)
        # Smooth the tracking trajectories
        detections = smoother.update_with_detections(detections)
        annotated_frame = annotator.annotate(frame.copy(), detections)
        # Display
        sv.plot_image(annotated_frame, (16, 16))
        # sink.write_frame(annotated_frame)
# For the output see: https://media.roboflow.com/supervision-detection-smoothing.mp4
4 References
- supervision
- supervision-doc
- supervision-doc-detection
- supervision-doc-annotators
- supervision-doc-datasets
- supervision-doc-metrics
- supervision-doc-draw-utils
- ByteTrack
- Optimizing small object detection with the slicing aided hyper inference library SAHI