DeepSort Source Code Walkthrough

This article is a bit long...

The tracker is a class that manages multiple tracks; its two operations are prediction and update.

self.tracker.predict()
self.tracker.update(detections)

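In the prediction stage, the tracker runs a prediction on every track. For each track this includes:

Kalman filter prediction
track age: age + 1
time_since_update + 1; this variable counts the frames since the track was last updated

The per-track code is as follows:

def predict(self, kf):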
        """Propagate the state distribution to the current time step using a
        Kalman filter prediction step.

        Parameters
        ----------
        kf : kalman_filter.KalmanFilter
            The Kalman filter.

        """
        self.mean, self.covariance = kf.predict(self.mean, self.covariance)
        self.age += 1
        self.time_since_update += 1
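To make the prediction step more concrete, here is a minimal sketch of a constant-velocity Kalman prediction for a single coordinate. It is not the library's KalmanFilter, which tracks an 8-dimensional state; the function name `predict_1d` and the process-noise value `q` are assumptions chosen for illustration.

```python
def predict_1d(mean, cov, q=0.01):
    """One constant-velocity prediction step for the state (position, velocity).

    mean: (pos, vel); cov: 2x2 covariance as nested lists.
    q: hypothetical process-noise variance added to both diagonal terms.
    """
    pos, vel = mean
    # x' = F x with F = [[1, 1], [0, 1]] (a time step of one frame)
    new_mean = (pos + vel, vel)
    (a, b), (c, d) = cov
    # P' = F P F^T + Q, expanded by hand for the 2x2 case
    new_cov = [
        [a + b + c + d + q, b + d],
        [c + d, d + q],
    ]
    return new_mean, new_cov


# The position advances by the velocity and the uncertainty grows.
mean, cov = predict_1d((100.0, 2.0), [[1.0, 0.0], [0.0, 1.0]])
```

The uncertainty growing at every prediction is why a track that keeps missing detections gets a wider and wider gate.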

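The tracker update operates on all tracks and consists of:

matching tracks to detections
updating the tracks
updating the distance metric

The code is as follows:

def update(self, detections):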
        """Perform measurement update and track management.

        Parameters
        ----------
        detections : List[deep_sort.detection.Detection]
            A list of detections at the current time step.

        """
        # Run matching cascade.
        matches, unmatched_tracks, unmatched_detections = \
            self._match(detections)
        print("matches:",matches, "unmatched_tracks:",unmatched_tracks, "unmatched_detections:", unmatched_detections)

        # Update track set.
        for track_idx, detection_idx in matches:
            self.tracks[track_idx].update(
                self.kf, detections[detection_idx])
        for track_idx in unmatched_tracks:
            self.tracks[track_idx].mark_missed()
        for detection_idx in unmatched_detections:
            self._initiate_track(detections[detection_idx])
        self.tracks = [t for t in self.tracks if not t.is_deleted()]

        # Update distance metric.
        active_targets = [t.track_id for t in self.tracks if t.is_confirmed()]
        features, targets = [], []
        for track in self.tracks:
            if not track.is_confirmed():
                continue
            features += track.features
            print("1 features",track.track_id,np.array(features).shape)
            targets += [track.track_id for _ in track.features]
            print("1 targets_id",track.track_id,targets)
            track.features = []
        self.metric.partial_fit(
            np.asarray(features), np.asarray(targets), active_targets)

Frame 1

The detections are:

det [array([307,  97, 105, 345]), array([546, 151,  72, 207]), array([215, 154,  59, 184]), array([400, 181,  45, 126])]

Once the detections are available, the tracker enters the predict stage. In the first frame there are no tracks yet, so the prediction produces nothing.

#code1
for track in self.tracks:
            track.predict(self.kf)

Next comes the tracker update stage, which starts by matching the detections:

matches, unmatched_tracks, unmatched_detections = \
            self._match(detections)

During matching, the tracks are first split into confirmed_track and unconfirmed_track:

confirmed_tracks = [
    i for i, t in enumerate(self.tracks) if t.is_confirmed()]
unconfirmed_tracks = [
    i for i, t in enumerate(self.tracks) if not t.is_confirmed()]

Since no tracks exist yet, both lists are empty:

confirmed_track: [] unconfirmed_track: []

Cascade matching is run on the confirmed tracks:

        matches_a, unmatched_tracks_a, unmatched_detections = \
            linear_assignment.matching_cascade(
                gated_metric, self.metric.matching_threshold, self.max_age,
                self.tracks, detections, confirmed_tracks)

With no tracks, there are no matched tracks and no unmatched tracks, only unmatched detections.

matches_a [] unmatched_track_a [] unmatched_detections  [0, 1, 2, 3]

A new track is created for every detection:

for detection_idx in unmatched_detections:
            self._initiate_track(detections[detection_idx])

The initialization code is as follows:

def _initiate_track(self, detection):
        mean, covariance = self.kf.initiate(detection.to_xyah())
        self.tracks.append(Track(
            mean, covariance, self._next_id, self.n_init, self.max_age,
            detection.feature))
        self._next_id += 1

A track is initialized through the Track class. self._next_id is then incremented so that the next new track receives a fresh id.
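The Kalman filter is initialized from the detection in xyah format, that is (center x, center y, aspect ratio, height). A minimal sketch of that conversion from a tlwh box follows; the standalone function name `tlwh_to_xyah` is an assumption for illustration (in the repository the conversion is a method on the detection).

```python
def tlwh_to_xyah(tlwh):
    """Convert (top-left x, top-left y, width, height) to
    (center x, center y, aspect ratio width/height, height)."""
    x, y, w, h = (float(v) for v in tlwh)
    return (x + w / 2.0, y + h / 2.0, w / h, h)


# First detection of frame 1 from the log above
xyah = tlwh_to_xyah([307, 97, 105, 345])
```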

Each track is initialized with the following attributes:

self.mean = mean
self.covariance = covariance
self.track_id = track_id
self.hits = 1
self.age = 1
self.time_since_update = 0

self.state = TrackState.Tentative
self.features = []
if feature is not None:
    self.features.append(feature)

self._n_init = n_init
self._max_age = max_age

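A newly initialized track is in the Tentative state with age=1 and time_since_update=0, and features holds the feature of the detection that created it (if one was provided).

By default a track stays Tentative until it has accumulated n_init=3 hits, after which it becomes Confirmed. A confirmed track that goes more than max_age=30 frames without an update becomes Deleted.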
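The state rules can be condensed into a small simulation. This is a sketch under stated assumptions: `ToyTrack`, `State` and `simulate` are hypothetical names, and the class keeps only the bookkeeping counters (no Kalman state, no features).

```python
from enum import Enum


class State(Enum):
    TENTATIVE = 1
    CONFIRMED = 2
    DELETED = 3


class ToyTrack:
    """Hypothetical stand-in for a track that keeps only the counters."""

    def __init__(self, n_init=3, max_age=30):
        self.hits = 1
        self.age = 1
        self.time_since_update = 0
        self.state = State.TENTATIVE
        self.n_init = n_init
        self.max_age = max_age

    def predict(self):
        self.age += 1
        self.time_since_update += 1

    def update(self):
        self.hits += 1
        self.time_since_update = 0
        if self.state is State.TENTATIVE and self.hits >= self.n_init:
            self.state = State.CONFIRMED

    def mark_missed(self):
        # A tentative track is dropped on its first miss; a confirmed
        # track is dropped only after more than max_age frames.
        if self.state is State.TENTATIVE:
            self.state = State.DELETED
        elif self.time_since_update > self.max_age:
            self.state = State.DELETED


def simulate(events, **kwargs):
    """Per frame: predict, then update on 'hit' or mark_missed on 'miss'."""
    track = ToyTrack(**kwargs)
    for event in events:
        track.predict()
        if event == "hit":
            track.update()
        else:
            track.mark_missed()
    return track
```

For example, `simulate(["hit", "hit"])` ends Confirmed (3 hits counting the creating detection), while `simulate(["miss"])` ends Deleted straight away.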

Frame 2

Detections:

det [array([227, 152,  52, 189]), array([546, 153,  66, 203]), array([ 35,  52, 114, 466]), array([339, 130,  92, 278]), array([273, 134,  90, 268])]

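The first frame produced 4 tracks. Each of them now goes through the predict stage: the Kalman prediction is run, and age and time_since_update are each incremented by 1.

def predict(self, kf):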
        """Propagate the state distribution to the current time step using a
        Kalman filter prediction step.

        Parameters
        ----------
        kf : kalman_filter.KalmanFilter
            The Kalman filter.

        """
        self.mean, self.covariance = kf.predict(self.mean, self.covariance)
        self.age += 1
        self.time_since_update += 1

After this step, age and time_since_update for each track are:

track update age 2 time_since_update 1
track update age 2 time_since_update 1
track update age 2 time_since_update 1
track update age 2 time_since_update 1

After prediction comes the tracker update stage.

The detections are matched against the existing tracks. First the tracks are split into confirmed_tracks and unconfirmed_tracks, which gives:

confirmed_track [] unconfirmed_track [0, 1, 2, 3]

Because confirmed_track is empty, cascade matching returns:

matches_a [] unmatched_track_a [] unmatched_detections  [0, 1, 2, 3,4]

Next, the unconfirmed tracks are combined with those tracks in unmatched_track_a (left over from cascade matching) whose time_since_update equals 1, i.e. tracks that were updated in the previous frame, to form the IOU candidate tracks.

iou_track_candidates = unconfirmed_tracks + [
    k for k in unmatched_tracks_a if
    self.tracks[k].time_since_update == 1]

unmatched_tracks_a = [
    k for k in unmatched_tracks_a if
    self.tracks[k].time_since_update != 1]

The candidate tracks and unmatched_track_a are then:

iou_track_candidates [0, 1, 2, 3],unmatched_track_a []
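The candidate split can be tried in isolation with a small sketch. The function name `split_iou_candidates` and the plain dict standing in for the tracks' time_since_update values are assumptions for illustration.

```python
def split_iou_candidates(unconfirmed_tracks, unmatched_tracks_a,
                         time_since_update):
    """time_since_update: hypothetical dict, track index -> counter value."""
    iou_candidates = unconfirmed_tracks + [
        k for k in unmatched_tracks_a if time_since_update[k] == 1]
    still_unmatched = [
        k for k in unmatched_tracks_a if time_since_update[k] != 1]
    return iou_candidates, still_unmatched


# Frame 2 of the walkthrough: nothing is confirmed yet.
frame2 = split_iou_candidates([0, 1, 2, 3], [], {0: 1, 1: 1, 2: 1, 3: 1})

# A made-up later frame: confirmed tracks 1 and 2 failed the cascade,
# but only track 1 was updated in the previous frame.
later = split_iou_candidates([4], [1, 2], {1: 1, 2: 3, 4: 1})
```

In the made-up case, track 1 gets a second chance through IOU matching while track 2 goes straight to the unmatched list.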

IOU matching is run between the candidate tracks and the unmatched detections:

matches_b, unmatched_tracks_b, unmatched_detections = \
            linear_assignment.min_cost_matching(
                iou_matching.iou_cost, self.max_iou_distance, self.tracks,
                detections, iou_track_candidates, unmatched_detections)

The IOU matching result is:

matches_b [(0, 0), (1, 1), (2, 2), (3, 3)] unmatches_track_b [] unmatched_detections [4]

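Finally the results are merged. The tracks matched by the cascade and the tracks matched by IOU together form the final matches; the cascade leftovers with time_since_update != 1 and the tracks that IOU matching could not match together form the final unmatched tracks. A confirmed track that was updated in the previous frame therefore gets two chances: cascade matching and, if that fails, IOU matching. A confirmed track that was not updated in the previous frame only gets cascade matching; if it fails there, it goes straight to the unmatched list. The intuition is that a track updated in the previous frame is more likely to be updated again in the current frame.

matches = matches_a + matches_b
unmatched_tracks = list(set(unmatched_tracks_a + unmatched_tracks_b))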

The final result is:

matches: [(0, 0), (1, 1), (2, 2), (3, 3)] unmatched_tracks: [] unmatched_detections: [4]

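Matching produces three outputs: matched track/detection pairs, unmatched tracks, and unmatched detections.

Next comes the track data update stage.

For each matched pair, the following is executed:

for track_idx, detection_idx in matches: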
            self.tracks[track_idx].update(
                self.kf, detections[detection_idx])

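Each matched track is updated as follows:

Kalman filter measurement update
detection feature: each track stores a series of features that are later used for appearance matching
hits + 1
time_since_update reset to 0
track state: check whether the track can be promoted to Confirmed

def update(self, kf, detection):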
        """Perform Kalman filter measurement update step and update the feature
        cache.

        Parameters
        ----------
        kf : kalman_filter.KalmanFilter
            The Kalman filter.
        detection : Detection
            The associated detection.

        """
        self.mean, self.covariance = kf.update(
            self.mean, self.covariance, detection.to_xyah())
        self.features.append(detection.feature)

        self.hits += 1
        self.time_since_update = 0
        if self.state == TrackState.Tentative and self.hits >= self._n_init:
            self.state = TrackState.Confirmed

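In this frame every track found a match, so time_since_update is 0 for all of them.

Unmatched tracks have their state re-evaluated. If the track is currently Tentative, it becomes Deleted. If it has gone too long without an update (time_since_update > max_age), it also becomes Deleted.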

for track_idx in unmatched_tracks:
            self.tracks[track_idx].mark_missed()

def mark_missed(self):
        """Mark this track as missed (no association at the current time step).
        """
        if self.state == TrackState.Tentative:
            self.state = TrackState.Deleted
        elif self.time_since_update > self._max_age:
            self.state = TrackState.Deleted

For each unmatched detection, a new track is created:

for detection_idx in unmatched_detections:
            self._initiate_track(detections[detection_idx])

Finally, all tracks are checked and those in the Deleted state are removed.

self.tracks = [t for t in self.tracks if not t.is_deleted()]

Frame 3

Detections:

[array([307, 105, 108, 325]), array([547, 148,  70, 211]), array([216, 151,  59, 190]), array([402, 183,  43, 124]), array([ 35,  87,  70, 376])]

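Tracking proceeds much as in the previous frame, and here every existing track is matched to a detection. After prediction, the track ages and time_since_update values are: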

track update age 3 time_since_update 1
track update age 3 time_since_update 1
track update age 3 time_since_update 1
track update age 3 time_since_update 1
track update age 2 time_since_update 1

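After matching, the track set update promotes some tracks to Confirmed: the four tracks created in frame 1 now reach hits = 3.

Let's go straight to frame 4.

Frame 4

Detections: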

[array([318, 119, 105, 301]), array([545, 146,  71, 215]), array([216, 151,  59, 192]), array([ 30,  75,  82, 398]), array([403, 185,  41, 121])]

After the detections are obtained, the predict stage runs: each track performs the Kalman prediction and increments age and time_since_update.

track update age 4 time_since_update 1
track update age 4 time_since_update 1
track update age 4 time_since_update 1
track update age 4 time_since_update 1
track update age 3 time_since_update 1

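After prediction comes the tracker update stage.

First the detections are matched against the tracks. During matching the tracks are split into confirmed_track and unconfirmed_track, with the following result: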

confirmed_t [0, 1, 2, 3] unconfirmed [4]

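Track 4 was only created from a detection in frame 2, so it has not yet reached n_init hits and is still unconfirmed.

Cascade matching is then run on the confirmed tracks.

It starts by building the index lists. The tracker passes confirmed_tracks as track_indices, and detection_indices falls back to the default that covers all detections:

if track_indices is None:
    track_indices = list(range(len(tracks)))
if detection_indices is None:
    detection_indices = list(range(len(detections)))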

Result:

track_indices [0, 1, 2, 3] detection_indices [0, 1, 2, 3, 4]

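At level=0, track_indices_l contains the tracks whose time_since_update is 1, and matching them yields matches_l. At level=1, track_indices_l contains the tracks whose time_since_update is 2; their matches are merged with the earlier ones, and so on. In other words, the most recently updated tracks are matched first and older ones later, which gives recently updated tracks priority.

    unmatched_detections = detection_indices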
    matches = []
    for level in range(cascade_depth):
        if len(unmatched_detections) == 0:  # No detections left
            break

        track_indices_l = [
            k for k in track_indices
            if tracks[k].time_since_update == 1 + level
        ]
        if len(track_indices_l) == 0:  # Nothing to match at this level
            continue

        matches_l, _, unmatched_detections = \
            min_cost_matching(
                distance_metric, max_distance, tracks, detections,
                track_indices_l, unmatched_detections)
        matches += matches_l
    unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches))
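The level ordering of the cascade can be isolated from the matching itself. The sketch below only shows which tracks are offered to the matcher at each level; the generator name `cascade_levels` and the dict of counters are assumptions for illustration.

```python
def cascade_levels(track_indices, time_since_update, cascade_depth):
    """Yield (level, tracks considered at that level), most recent first.

    time_since_update: hypothetical dict, track index -> counter value.
    """
    for level in range(cascade_depth):
        level_tracks = [k for k in track_indices
                        if time_since_update[k] == 1 + level]
        if level_tracks:  # levels with no tracks are skipped
            yield level, level_tracks


# Made-up counters: tracks 0 and 2 were updated last frame,
# track 3 missed one frame, track 1 missed two frames.
levels = list(cascade_levels([0, 1, 2, 3], {0: 1, 1: 3, 2: 1, 3: 2}, 30))
```

Because detections consumed at an earlier level are removed from unmatched_detections, tracks 0 and 2 get first pick in this example.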

After cascade matching, the result is:

matches_a [(0, 0), (1, 1), (2, 2), (3, 4)] unmatched_track_a [] unmatched_detections  [3]

One detection is left unmatched.

The unconfirmed tracks, together with any tracks left unmatched by the cascade whose time_since_update is 1, form the candidate tracks.

iou_track_candidates [4]

The candidate tracks are IOU-matched against the unmatched detections, with the following result:

matches_b [(4, 3)] unmatches_track_b [] unmatched_detections []

The final result is:

matches: [(0, 0), (1, 1), (2, 2), (3, 4), (4, 3)] unmatched_tracks: [] unmatched_detections: []

After matching, the features collected on the confirmed tracks in this frame are added to the metric's sample map (track_id -> features).

        active_targets = [t.track_id for t in self.tracks if t.is_confirmed()]
        features, targets = [], []
        for track in self.tracks:
            if not track.is_confirmed():
                continue
            features += track.features
            targets += [track.track_id for _ in track.features]
            track.features = []
        self.metric.partial_fit(
            np.asarray(features), np.asarray(targets), active_targets)

def partial_fit(self, features, targets, active_targets):
        for feature, target in zip(features, targets):
            self.samples.setdefault(target, []).append(feature)
            if self.budget is not None:
                self.samples[target] = self.samples[target][-self.budget:]
        
        self.samples = {k: self.samples[k] for k in active_targets}
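The effect of the budget and of active_targets is easy to see on a toy gallery. In this sketch `toy_partial_fit` is a hypothetical free-function version of the method above, and short strings stand in for feature vectors.

```python
def toy_partial_fit(samples, features, targets, active_targets, budget=None):
    """Append each feature to its track's gallery, keep only the newest
    `budget` entries, and drop galleries of tracks that are not active."""
    for feature, target in zip(features, targets):
        samples.setdefault(target, []).append(feature)
        if budget is not None:
            samples[target] = samples[target][-budget:]
    return {k: samples[k] for k in active_targets}


# Track 1 gains a feature, track 3 is new, track 2 is no longer active.
gallery = {1: ["f1a", "f1b"], 2: ["f2a"]}
gallery = toy_partial_fit(gallery, ["f1c", "f3a"], [1, 3],
                          active_targets=[1, 3], budget=2)
```

With budget=2, the oldest feature of track 1 is discarded and the gallery of track 2 disappears.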

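That is the whole DeepSORT process. Now let's look at some of the finer details.

IOU matching

How is the cost matrix obtained?

The cost matrix is initialized so that entry (i, j) is the cost between track i and detection j. The IOU between the Kalman-predicted bounding box and each detection is computed, and cost = 1 - IOU. If a track has missed one or more frames (time_since_update > 1), its whole row is set to a very large cost, INFTY_COST (1e+5).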

def iou_cost(tracks, detections, track_indices=None,
             detection_indices=None):
    
    if track_indices is None:
        track_indices = np.arange(len(tracks))
    if detection_indices is None:
        detection_indices = np.arange(len(detections))

    cost_matrix = np.zeros((len(track_indices), len(detection_indices)))
    for row, track_idx in enumerate(track_indices):
        if tracks[track_idx].time_since_update > 1:
            cost_matrix[row, :] = linear_assignment.INFTY_COST
            continue

        bbox = tracks[track_idx].to_tlwh()
        candidates = np.asarray([detections[i].tlwh for i in detection_indices])
        cost_matrix[row, :] = 1. - iou(bbox, candidates)
    return cost_matrix
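The `iou` helper called above is not shown in this article. A minimal pure-Python sketch of an IOU for tlwh boxes follows; the names `iou_tlwh` and `iou_cost_row` are assumptions for illustration, and the repository's version is vectorized with NumPy.

```python
def iou_tlwh(box_a, box_b):
    """IOU of two boxes given as (top-left x, top-left y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero when the boxes are disjoint
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def iou_cost_row(track_box, det_boxes):
    """One row of the cost matrix: 1 - IOU against every detection."""
    return [1.0 - iou_tlwh(track_box, det) for det in det_boxes]


row = iou_cost_row((0, 0, 10, 10),
                   [(0, 0, 10, 10), (20, 20, 5, 5), (5, 0, 10, 10)])
```

An identical box costs 0, a disjoint box costs 1, and the half-overlapping box has IOU 1/3 and therefore cost 2/3.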

Once the cost matrix is computed, every element greater than max_distance is set to max_distance + 1e-5:

cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5

The cost matrix for frame 2 is:

[[0.04281178 1.         1.         0.96899767 1.        ]
 [1.         0.03566279 1.         1.         1.        ]
 [1.         1.         0.04389799 1.         1.        ]
 [0.95802783 1.         1.         0.08525083 1.        ]]
 
 # after thresholding
 [[0.04281178 0.70001    0.70001    0.70001    0.70001   ]
 [0.70001    0.03566279 0.70001    0.70001    0.70001   ]
 [0.70001    0.70001    0.04389799 0.70001    0.70001   ]
 [0.70001    0.70001    0.70001    0.08525083 0.70001   ]]
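The thresholding can be reproduced without NumPy. This sketch applies the same rule to the frame-2 matrix printed above; the function name `clip_costs` is an assumption for illustration.

```python
def clip_costs(cost_matrix, max_distance):
    """Replace every entry above max_distance with max_distance + 1e-5."""
    return [[c if c <= max_distance else max_distance + 1e-5 for c in row]
            for row in cost_matrix]


raw = [
    [0.04281178, 1.0, 1.0, 0.96899767, 1.0],
    [1.0, 0.03566279, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.04389799, 1.0, 1.0],
    [0.95802783, 1.0, 1.0, 0.08525083, 1.0],
]
clipped = clip_costs(raw, max_distance=0.7)
```

Flattening every implausible pair to the same value just above the threshold keeps the assignment solver from ranking pairs that will be rejected anyway.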

The cost matrix is then fed to the Hungarian algorithm:

row_indices, col_indices = linear_assignment(cost_matrix)

Not every track and detection ends up matched: in IOU matching, an assigned pair whose cost is greater than max_distance is treated as unmatched.

    matches, unmatched_tracks, unmatched_detections = [], [], []
    for col, detection_idx in enumerate(detection_indices):
        if col not in col_indices:
            unmatched_detections.append(detection_idx)
    for row, track_idx in enumerate(track_indices):
        if row not in row_indices:
            unmatched_tracks.append(track_idx)
    for row, col in zip(row_indices, col_indices):
        track_idx = track_indices[row]
        detection_idx = detection_indices[col]
        if cost_matrix[row, col] > max_distance:
            unmatched_tracks.append(track_idx)
            unmatched_detections.append(detection_idx)
        else:
            matches.append((track_idx, detection_idx))
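To see the assignment and the rejection step end to end, the sketch below solves the thresholded frame-2 matrix. It deliberately swaps the Hungarian algorithm for a brute-force search over permutations, which is only practical for toy sizes; the function name `toy_min_cost_matching` is hypothetical, and it assumes there are no more tracks than detections.

```python
from itertools import permutations


def toy_min_cost_matching(cost_matrix, max_distance):
    """Brute-force assignment for a small matrix with rows <= columns.

    Returns (matches, unmatched_rows, unmatched_cols). A stand-in for the
    Hungarian algorithm, not the library implementation.
    """
    n_rows, n_cols = len(cost_matrix), len(cost_matrix[0])
    best = min(
        permutations(range(n_cols), n_rows),
        key=lambda cols: sum(cost_matrix[r][c] for r, c in enumerate(cols)),
    )
    matches, unmatched_rows = [], []
    unmatched_cols = [c for c in range(n_cols) if c not in best]
    for row, col in enumerate(best):
        if cost_matrix[row][col] > max_distance:
            # Assigned by the solver but too expensive: reject the pair
            unmatched_rows.append(row)
            unmatched_cols.append(col)
        else:
            matches.append((row, col))
    return matches, unmatched_rows, sorted(unmatched_cols)


m = 0.70001
clipped = [
    [0.04281178, m, m, m, m],
    [m, 0.03566279, m, m, m],
    [m, m, 0.04389799, m, m],
    [m, m, m, 0.08525083, m],
]
matches, unmatched_tracks, unmatched_dets = toy_min_cost_matching(clipped, 0.7)
```

On this matrix the result agrees with the matches_b log for frame 2: the four tracks pair with detections 0 to 3 and detection 4 is left over.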

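Cascade matching

Let's look at how its cost matrix is obtained. A track stores the features of several past detections, so there are several cosine distances between the track and a given detection in the current frame. The minimum is taken as the final cosine distance between that track and that detection, and the result is then gated using the Mahalanobis distance.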

def gated_metric(tracks, dets, track_indices, detection_indices):
            features = np.array([dets[i].feature for i in detection_indices])
            targets = np.array([tracks[i].track_id for i in track_indices])
            cost_matrix = self.metric.distance(features, targets)  # compute the cost matrix
            cost_matrix = linear_assignment.gate_cost_matrix(  # gate it with the Mahalanobis distance
                self.kf, cost_matrix, tracks, dets, track_indices,
                detection_indices)
            return cost_matrix

def distance(self, features, targets):
        cost_matrix = np.zeros((len(targets), len(features)))
        for i, target in enumerate(targets):
            cost_matrix[i, :] = self._metric(self.samples[target], features)
        return cost_matrix

def _nn_cosine_distance(x, y):
    distances = _cosine_distance(x, y)
    return distances.min(axis=0)  # take the minimum over the stored samples
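The `_cosine_distance` helper is not shown in this article. As a rough pure-Python sketch of the same idea, with hypothetical names `cosine_distance` and `nn_cosine_distance` and tiny 2-D vectors in place of real appearance features:

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nn_cosine_distance(samples, features):
    """For each detection feature, the smallest distance to any stored sample."""
    return [min(cosine_distance(s, f) for s in samples) for f in features]


# A track with two stored samples, compared against two detections.
track_samples = [(1.0, 0.0), (0.0, 1.0)]
dists = nn_cosine_distance(track_samples, [(1.0, 0.0), (1.0, 1.0)])
```

The first detection coincides with a stored sample, so its distance is 0; the second lies at 45 degrees to both samples, giving 1 - 1/sqrt(2).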

Inside gate_cost_matrix, the detections are first converted to xyah format:

measurements = np.asarray(
        [detections[i].to_xyah() for i in detection_indices])

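Then the Mahalanobis distance between each track's prediction and the detections is computed, and every cost whose Mahalanobis distance exceeds gating_threshold (9.4877) is set to gated_cost (100000.0):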

for row, track_idx in enumerate(track_indices):
        track = tracks[track_idx]
        gating_distance = kf.gating_distance(
            track.mean, track.covariance, measurements, only_position)
        cost_matrix[row, gating_distance > gating_threshold] = gated_cost
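The gating rule can be illustrated with a simplified distance. The sketch below assumes a diagonal covariance so that the squared Mahalanobis distance reduces to a sum of normalized squared differences; the real filter uses the full projected covariance. The names `squared_mahalanobis_diag` and `gate_row` and all numbers in the example are made up for illustration. The threshold 9.4877 is the 0.95 quantile of the chi-square distribution with 4 degrees of freedom.

```python
CHI2_95_4DOF = 9.4877  # 0.95 chi-square quantile, 4 degrees of freedom
GATED_COST = 1e5


def squared_mahalanobis_diag(mean, variances, measurement):
    """Squared Mahalanobis distance under a diagonal-covariance assumption."""
    return sum((z - m) ** 2 / v
               for z, m, v in zip(measurement, mean, variances))


def gate_row(cost_row, mean, variances, measurements,
             threshold=CHI2_95_4DOF, gated_cost=GATED_COST):
    """Overwrite the cost of every detection that falls outside the gate."""
    return [gated_cost
            if squared_mahalanobis_diag(mean, variances, z) > threshold
            else cost
            for cost, z in zip(cost_row, measurements)]


# Predicted xyah state and per-dimension variances for one track (made up)
mean = (100.0, 200.0, 0.5, 80.0)
variances = (25.0, 25.0, 0.01, 25.0)
detections_xyah = [(102.0, 199.0, 0.5, 82.0), (150.0, 200.0, 0.5, 80.0)]
gated = gate_row([0.02, 0.15], mean, variances, detections_xyah)
```

The nearby detection keeps its cosine cost, while the one 50 pixels away is overwritten with gated_cost even though its appearance cost was low.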

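Finally, entries of the cost matrix greater than max_distance (set to 0.2 for cascade matching) are set to max_distance + 1e-5.

In frame 4, the cost matrix from the cosine distance is: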

[[0.02467382 0.29672492 0.14992237 0.20593166 0.25746107]
 [0.27289903 0.01389802 0.2490201  0.26275396 0.18523771]
 [0.1549592  0.25630915 0.00923228 0.10906434 0.27596951]
 [0.26783013 0.19509423 0.26934785 0.24842238 0.01052856]]

The Mahalanobis distance is computed and applied to the cosine distances: wherever it exceeds gating_threshold, the cosine cost is set to gated_cost (100000.0).

The result is:

[[2.46738195e-02 1.00000000e+05 1.00000000e+05 1.00000000e+05
  1.00000000e+05]
 [1.00000000e+05 1.38980150e-02 1.00000000e+05 1.00000000e+05
  1.00000000e+05]
 [1.00000000e+05 1.00000000e+05 9.23228264e-03 1.00000000e+05
  1.00000000e+05]
 [1.00000000e+05 1.00000000e+05 1.00000000e+05 1.00000000e+05
  1.05285645e-02]]

Entries greater than max_distance (0.2 for cascade matching) are set to max_distance + 1e-5, giving the final cost matrix:

[[0.02467382 0.20001    0.20001    0.20001    0.20001   ]
 [0.20001    0.01389802 0.20001    0.20001    0.20001   ]
 [0.20001    0.20001    0.00923228 0.20001    0.20001   ]
 [0.20001    0.20001    0.20001    0.20001    0.01052856]]

The cost matrix is then passed to the Hungarian algorithm.

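The DeepSORT steps are as follows:

Split the tracks into unconfirmed_track and confirmed_track.
Run cascade matching between confirmed_track and the detections:
- 1. Compute the cost matrix of appearance-feature cosine distances between tracks and detections.
- 2. Compute the Mahalanobis distance and apply it to the cost matrix: where it exceeds gating_threshold, set the corresponding cost to gated_cost.
- 3. Set entries of the cost matrix greater than max_distance to max_distance + 1e-5.
- 4. Solve with the Hungarian algorithm and discard assigned pairs whose cost exceeds max_distance.
- Loop over steps 1-4 by the tracks' time_since_update and merge the results.
Form the candidate tracks from unconfirmed_track plus the tracks left unmatched by the cascade whose time_since_update=1, then IOU-match the candidates against the unmatched detections:
- Compute the IOU cost matrix between predictions and detections.
- Solve with the Hungarian algorithm.
Merge the cascade matching and IOU matching results.
For each track that is finally matched:
- Kalman update
- store the bounding-box feature
- hits+1
- reset time_since_update to 0
- update the track state, checking whether it can be set to confirmed
For each track that is finally unmatched:
- decide whether to keep or delete it: a tentative track is deleted immediately, and a confirmed track is deleted once it has gone more than max_age (30) frames without an update.
For each detection that is finally unmatched, create a new track.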

The overall flow is shown in the figure below.

ref:

Deep Sort Algorithm Code Walkthrough

[SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC](https:/
