ECCV 2020 | Robust Re-Identification by Multiple Views Knowledge Distillation

Few-shot Learning · Published on 2020-12-15

Paper link: https://link.springer.com/chapter/10.1007%2F978-3-030-58607-2_6 | Code link: https://github.com/aimagelab/VKD

Introduction


Motivation

As observed in [10], a large gap in Re-ID performance still subsists between V2V and I2V.

VKD

we propose Views Knowledge Distillation (VKD), which transfers the knowledge lying in several views in a teacher-student fashion. VKD devises a two-stage procedure, which pins the visual variety as a teaching signal for a student who has to recover it using fewer views.

Main contributions

  • i) the student outperforms its teacher by a large margin, especially in the Image-To-Video setting;
  • ii) a thorough investigation shows that the student focuses more on the target compared to its teacher and discards uninformative details;
  • iii) importantly, we do not limit our analysis to a single domain, but instead achieve strong results on Person, Vehicle and Animal Re-ID.

Related works

  • Image-To-Video Re-Identification.
  • Knowledge Distillation

Method

Fig. 2: Overview of VKD. The student network is optimised to mimic the behaviour of its teacher using only a few views.

 

Our proposal frames the training algorithm as a two-stage procedure, as follows:

  • First step (Sect. 3.1): the backbone network is trained for the standard Video-To-Video setting.
  • Second step (Sect. 3.2): we appoint it as the teacher and freeze its parameters. Then, a new network with the role of the student is instantiated. As depicted in Fig. 2, we feed frames representing different views as input to the teacher and ask the student to mimic the same outputs from fewer frames.

Teacher Network

The network weights are initialised from ImageNet, and a few modifications are made to the architecture.

First, we discard the last ReLU activation and the final classification layer in favour of BNNeck. Second, to benefit from fine-grained spatial details, the stride of the last residual block is reduced from 2 to 1.
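For concreteness, here is a minimal PyTorch sketch of these two tweaks, assuming torchvision's ResNet-50; the class name ReIDBackbone and the pooling layer are illustrative, and removing the very last ReLU inside the block is omitted for brevity:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ReIDBackbone(nn.Module):
    def __init__(self, num_ids: int, feat_dim: int = 2048):
        super().__init__()
        base = resnet50(pretrained=True)              # ImageNet initialisation
        base.layer4[0].conv2.stride = (1, 1)          # last residual block: stride 2 -> 1
        base.layer4[0].downsample[0].stride = (1, 1)
        self.backbone = nn.Sequential(*list(base.children())[:-2])  # drop avgpool and fc
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.bnneck = nn.BatchNorm1d(feat_dim)        # BNNeck before the classifier
        self.classifier = nn.Linear(feat_dim, num_ids, bias=False)

    def forward(self, x):
        feat = self.pool(self.backbone(x)).flatten(1)   # features used by the triplet loss
        logits = self.classifier(self.bnneck(feat))     # logits used by the cross-entropy loss
        return feat, logits
```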

Set Representation.

Here, we naively compute the set-level embedding $\mathcal{F}(S)$ through temporal average pooling. While we acknowledge that better aggregation modules exist, we do not focus on devising a new one, but rather on improving the underlying feature extractor.
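A hedged sketch of this set-level representation, reusing the hypothetical ReIDBackbone from the previous sketch:

```python
import torch

def set_forward(model, frames: torch.Tensor):
    """frames: (B, K, C, H, W) tracklets of K views each."""
    B, K = frames.shape[:2]
    feats, _ = model(frames.flatten(0, 1))           # per-frame features
    embs = feats.view(B, K, -1).mean(dim=1)          # temporal average pooling -> F(S)
    logits = model.classifier(model.bnneck(embs))    # set-level identity logits
    return embs, logits
```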

Teacher Optimisation.

We train the base network - which will be the teacher during the following stage - by combining a classification term $L_{CE}$ (cross-entropy) with the triplet loss $L_{TR}$. The first can be formulated as:

$$L_{CE} = -\sum_{c} y_c \log \hat{y}_c$$

where $\mathbf{y}$ and $\hat{\mathbf{y}}$ denote the one-hot label and the softmax output, respectively. $L_{TR}$ encourages distance constraints in the feature space, pulling sets of the same identity closer together and pushing those of different identities further apart. It is formalised as:

$$L_{TR} = \left[ m + D\big(\mathcal{F}(S_a), \mathcal{F}(S_p)\big) - D\big(\mathcal{F}(S_a), \mathcal{F}(S_n)\big) \right]_+$$

where $D(\cdot,\cdot)$ is the distance between set-level embeddings, $m$ the margin, and $S_p$ and $S_n$ are the hardest positive and the hardest negative for the anchor $S_a$ within the batch.
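As a reference, a minimal PyTorch sketch of the teacher objective; the margin value and the mean reduction are illustrative choices, not necessarily those of the paper:

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet(embeddings: torch.Tensor, labels: torch.Tensor,
                       margin: float = 0.3) -> torch.Tensor:
    dist = torch.cdist(embeddings, embeddings)           # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)    # same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    hardest_pos = dist.masked_fill(~same, float('-inf')).max(dim=1).values
    hardest_neg = dist.masked_fill(same | eye, float('inf')).min(dim=1).values
    return F.relu(margin + hardest_pos - hardest_neg).mean()

def teacher_loss(logits, embeddings, labels):
    # L_CE + L_TR on the set-level logits and embeddings
    return F.cross_entropy(logits, labels) + batch_hard_triplet(embeddings, labels)
```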

Views Knowledge Distillation (VKD)

Views Knowledge Distillation (VKD) stresses this idea by forcing a student network $\mathcal{F}_{\theta_S}(\cdot)$ to match the outputs of the teacher $\mathcal{F}_{\theta_T}(\cdot)$. In doing so, we: i) allow the teacher to access frames $\hat{S}^T = (\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_N)$ from different viewpoints; ii) force the student to mimic the teacher output starting from a subset $\hat{S}^S = (\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_M)$ with cardinality $M < N$ (in our experiments, $M = 2$ and $N = 8$). The frames in $\hat{S}^S$ are uniformly sampled from $\hat{S}^T$ without replacement. This asymmetry between the teacher and the student leads to a self-distillation objective, where the latter can achieve better solutions despite inheriting the same architecture as the former.

VKD formulates the knowledge distillation loss $L_{KD}$ (Eq. 3 of the paper) as the standard temperature-smoothed divergence between the teacher's and the student's output distributions:

$$L_{KD} = \tau^2 \sum_{c} p^T_c \log \frac{p^T_c}{p^S_c}, \qquad p^T = \mathrm{softmax}(z^T/\tau), \;\; p^S = \mathrm{softmax}(z^S/\tau)$$
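A minimal sketch of this term, assuming the standard temperature-smoothed formulation; the temperature value below is a placeholder:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, tau: float = 10.0):
    # KL divergence between softened teacher and student distributions
    log_p_s = F.log_softmax(student_logits / tau, dim=1)
    p_t = F.softmax(teacher_logits / tau, dim=1)
    return F.kl_div(log_p_s, p_t, reduction='batchmean') * tau ** 2
```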

In addition to fitting the output distribution of the teacher (Eq. 3), our proposal devises additional constraints on the embedding space learnt by the student. In detail, VKD encourages the student to mirror the pairwise distances spanned by the teacher. Indicating with $D^T[i,j]$ the distance induced by the teacher between the i-th and j-th sets (the same notation $D^S[i,j]$ also holds for the student), VKD seeks to minimise:

$$L_{DP} = \sum_{i=1}^{B} \sum_{j=1}^{B} \left( D^T[i,j] - D^S[i,j] \right)^2$$

where $B$ equals the batch size.
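A hedged sketch of the distance-preservation term; a plain MSE between the two distance matrices is used here, and the exact normalisation in the paper may differ:

```python
import torch
import torch.nn.functional as F

def dp_loss(student_embs, teacher_embs):
    d_s = torch.cdist(student_embs, student_embs)   # D^S, shape (B, B)
    d_t = torch.cdist(teacher_embs, teacher_embs)   # D^T, shape (B, B)
    return F.mse_loss(d_s, d_t)
```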

Since the teacher has access to multiple views, we argue that the distances spanned in its embedding space provide a strong characterisation of the corresponding identities. From the student's perspective, distance preservation supplies additional semantic information. It therefore constitutes an effective supervision signal, whose optimisation is more challenging because fewer images are available to the student.

Student Optimisation.

The VKD overall objective combines the distillation terms ($L_{KD}$ and $L_{DP}$) with the ones optimised by the teacher - $L_{CE}$ and $L_{TR}$ - which promote a higher conditional likelihood w.r.t. the ground-truth labels. To sum up, VKD aims at strengthening the features of a CNN in Re-ID settings through the following optimisation problem:

$$\min_{\theta_S} \; L_{CE} + L_{TR} + \alpha L_{KD} + \beta L_{DP}$$

where $\alpha$ and $\beta$ are hyperparameters balancing the contributions of the two distillation terms. Empirically, we found it beneficial to initialise the student from the teacher's weights, except for the last convolutional block, which is re-initialised from its ImageNet pre-training. We argue this represents a good trade-off between exploring new configurations and exploiting the abilities already acquired by the teacher.
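Putting everything together, a hedged sketch of a single student update, reusing the set_forward, batch_hard_triplet, kd_loss and dp_loss helpers sketched above (alpha, beta and M are placeholders):

```python
import torch
import torch.nn.functional as F

def student_step(student, teacher, frames, labels, optimizer,
                 M: int = 2, alpha: float = 1.0, beta: float = 1.0):
    # frames: (B, N, C, H, W) tracklets; labels: (B,) identity labels
    B, N = frames.shape[:2]
    with torch.no_grad():                              # teacher is frozen
        t_embs, t_logits = set_forward(teacher, frames)
    idx = torch.randperm(N)[:M]                        # student's views, sampled without replacement
    s_embs, s_logits = set_forward(student, frames[:, idx])

    loss = (F.cross_entropy(s_logits, labels)
            + batch_hard_triplet(s_embs, labels)
            + alpha * kd_loss(s_logits, t_logits)
            + beta * dp_loss(s_embs, t_embs))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```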

Experiments

Datasets

Person Re-ID

  • MARS
  • Duke-Video-ReID

Vehicle Re-ID

  • VeRi-776

Animal Re-ID

  • Amur Tiger

Self-distillation

Table 1 reports the comparisons for different backbones: in the vast majority of settings, the student outperforms its teacher.

As an additional proof, plots in Fig. 3 draw a comparison between models before and after distillation. VKD improves metrics considerably on all three datasets, as highlighted by the gap between the teachers and their corresponding students. Surprisingly, this often applies when comparing lighter students with deeper teachers: as an example, ResVKD-34 scores better than even ResNet-101 on VeRi-776, regardless of the number of images sampled for a gallery tracklet.

Comparison with State-Of-The-Art

Image-To-Video.


Tables 2, 3 and 4 report a thorough comparison with current state-of-the-art (SOTA) methods on MARS, Duke and VeRi-776 respectively. As is common practice [3, 10, 32], we focus our analysis on ResNet-50, and in particular on its distilled variants ResVKD-50 and ResVKD-50bam. Our method clearly outperforms the other competitors, with an increase in mAP w.r.t. the top scorers of 6.3% on MARS, 8.6% on Duke and 5% on VeRi-776. This result is fully in line with our goal of conferring robustness when just a single image is provided as the query. In doing so, we make no task-specific assumptions, which renders our proposal easily applicable to both person and vehicle Re-ID.

Video-To-Video.


Analogously, we conduct experiments on the V2V setting and report results in Table 5 (MARS) and Table 6 (Duke). Here, VKD yields the following results: on the one hand, on MARS it pushes a baseline architecture such as ResVKD-50 close to NVAN and STE-NVAN [22], the latter being tailored for the V2V setting. Moreover, when exploiting spatial attention modules (ResVKD-50bam), it establishes new SOTA results, suggesting that a positive transfer also occurs when matching tracklets. On the other hand, the same does not hold true for Duke, where exploiting video features as in STA [8] and NVAN appears rewarding. We leave the investigation of further improvements on V2V to future works. As of today, our proposal is the only one guaranteeing consistent and stable results under both I2V and V2V settings.

Analysis on VKD

In the Absence of Camera Information.


Distilling Viewpoints vs. Time.


VKD Reduces the Camera Bias.


Can Performance of the Student be Obtained Without Distillation?


Student Explanation.


Cross-distillation.


On the Impact of Loss Terms.


Conclusion

Effective Re-ID methods require visual descriptors that are robust to changes in both background appearance and viewpoint. Moreover, their effectiveness should hold even when the query consists of a single image. To meet these goals, we propose Views Knowledge Distillation (VKD), a teacher-student approach in which the student observes only a small subset of the input views. This strategy encourages the student to discover better representations: as a result, it outperforms its teacher at the end of training. Importantly, VKD proves robust across diverse domains (person, vehicle and animal), surpassing the Image-To-Video state of the art by a wide margin. Thanks to an extensive analysis, we highlight that the student exhibits stronger focus on the target and reduces the camera bias.
