Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey

Posted by Phoenixtree_Zhao on 2020-12-22

Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey

[paper]

 

Abstract

In many practical applications, it is often difficult and expensive to obtain enough large-scale labeled data to train deep neural networks to their full capability. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance decay due to domain shift. Domain adaptation (DA) addresses this problem by minimizing the impact of domain shift between the source and target domains. Multi-source domain adaptation (MDA) is a powerful extension in which the labeled data may be collected from multiple sources with different distributions. Due to the success of DA methods and the prevalence of multi-source data, MDA has attracted increasing attention in both academia and industry. In this survey, we define various MDA strategies and summarize available datasets for evaluation. We also compare modern MDA methods in the deep learning era, including latent space transformation and intermediate domain generation. Finally, we discuss future research directions for MDA.

Introduction to the MDA concept:

In many practical applications, obtaining enough large-scale labeled data to train deep neural networks to their full capability is difficult and expensive, so transferring knowledge learned from a separate, labeled source domain to an unlabeled or sparsely labeled target domain is an appealing alternative. Direct transfer, however, suffers a significant performance drop due to domain shift. Domain adaptation (DA) tackles this by minimizing the impact of the shift between the source and target domains; multi-source domain adaptation (MDA) is a powerful extension in which the labeled data is collected from multiple sources with different distributions, and it has attracted growing attention in both academia and industry.

What this survey does:

It defines and categorizes the various MDA strategies, summarizes the datasets available for evaluation, compares deep-learning-era MDA methods, including latent space transformation and intermediate domain generation, and closes with future research directions for MDA.

 

Background and Motivation

The availability of large-scale labeled training data, such as ImageNet, has enabled deep neural networks (DNNs) to achieve remarkable success in many learning tasks, ranging from computer vision to natural language processing. For example, the classification error of the “Classification + localization with provided training data” task in the Large Scale Visual Recognition Challenge dropped from 0.28 in 2010 to 0.0225 in 2017, outperforming even human classification. However, in many practical applications, obtaining labeled training data is often expensive, time-consuming, or even impossible. For example, in fine-grained recognition, only the experts can provide reliable labels [Gebru et al., 2017]; in semantic segmentation, it takes about 90 minutes to label each Cityscapes image [Cordts et al., 2016]; in autonomous driving, it is difficult to label point-wise 3D LiDAR point clouds [Wu et al., 2019].

The practical problem DA addresses: in real applications, labeling data is complex and difficult. Three examples are given: fine-grained recognition (reliable labels require experts), Cityscapes (about 90 minutes to label one image for semantic segmentation), and 3D LiDAR (point-wise point-cloud labeling is extremely hard).

 

One potential solution is to transfer a model trained on a separate, labeled source domain to the desired unlabeled or sparsely labeled target domain. But as Figure 1 demonstrates, the direct transfer of models across domains leads to poor performance. Figure 1(a) shows that even for the simple task of digit recognition, training on the MNIST source domain [LeCun et al., 1998] for digit classification in the MNIST-M target domain [Ganin and Lempitsky, 2015] leads to a digit classification accuracy decrease from 96.0% to 52.3% when training a LeNet-5 model [LeCun et al., 1998]. Figure 1(b) shows a more realistic example of training a semantic segmentation model on a synthetic source dataset GTA [Richter et al., 2016] and conducting pixel-wise segmentation on a real target dataset Cityscapes [Cordts et al., 2016] using the FCN model [Long et al., 2015a]. If we train on the real data, we obtain a mean intersection-over-union (mIoU) of 62.6%; but if we train on synthetic data, the mIoU drops significantly to 21.7%.

Figure 1: An example of domain shift in the single-source scenario. The models trained on the labeled source domain do not perform well when directly transferring to the target domain.

The algorithmic problem DA faces: to sidestep labeling, the natural idea is to train on a labeled domain (dataset) and test on the target domain (another dataset), i.e. direct transfer. But direct transfer degrades performance. As Figure 1(a) shows, a model trained on MNIST reaches only 52.3% accuracy on MNIST-M, far below the 96.0% of a model trained on MNIST-M itself. Figure 1(b) shows the same effect for semantic segmentation.
 

The poor performance from directly transferring models across domains stems from a phenomenon known as domain shift [Torralba and Efros, 2011; Zhao et al., 2018b]: the joint probability distributions of observed data and labels are different in the two domains. Domain shift exists in many forms, such as from dataset to dataset, from simulation to real-world, from RGB images to depth, and from CAD models to real images.

Why Figure 1 happens:

Direct transfer fails because of domain shift: the joint probability distributions of data and labels differ between the two domains, and the shift takes many forms (dataset to dataset, simulation to real world, RGB images to depth, CAD models to real images).
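In symbols (our shorthand, consistent with the definition above): writing P_S(x, y) and P_T(x, y) for the joint distributions of data x and labels y in the source and target domains, domain shift means

    P_S(x, y) ≠ P_T(x, y),

even though the task itself stays the same.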

 

The phenomenon of domain shift motivates the research on domain adaptation (DA), which aims to learn a model from a labeled source domain that can generalize well to a different, but related, target domain. Existing DA methods mainly focus on the single-source scenario. In the deep learning era, recent single-source DA (SDA) methods usually employ a conjoined architecture with two branches that respectively represent the models for the source and target domains. One branch learns a task model from the labeled source data using corresponding task losses, such as cross-entropy loss for classification. The other deals with the domain shift by aligning the target and source domains. Based on the alignment strategies, deep SDA methods can be classified into four categories:

1. Discrepancy-based methods try to align the features by explicitly measuring the discrepancy on corresponding activation layers, such as maximum mean discrepancy (MMD) [Long et al., 2015b], correlation alignment [Sun et al., 2017], and contrastive domain discrepancy [Kang et al., 2019].

2. Adversarial generative methods generate fake data to align the source and target domains at pixel-level based on Generative Adversarial Network (GAN) [Goodfellow et al., 2014] and its variants, such as CycleGAN [Zhu et al., 2017; Zhao et al., 2019b].

3. Adversarial discriminative methods employ an adversarial objective with a domain discriminator to align the features [Tzeng et al., 2017; Tsai et al., 2018].

4. Reconstruction based methods aim to reconstruct the target input from the extracted features using the source task model [Ghifary et al., 2016].

Recap of the single-source DA taxonomy:

1. Discrepancy-based methods: align features by explicitly measuring the discrepancy at corresponding activation layers, e.g. maximum mean discrepancy (MMD) [Long et al., 2015b], correlation alignment [Sun et al., 2017], and contrastive domain discrepancy [Kang et al., 2019] (a code sketch follows after this list).

2. Adversarial generative methods: generate fake data to align source and target at the pixel level, based on GAN [Goodfellow et al., 2014] and variants such as CycleGAN [Zhu et al., 2017; Zhao et al., 2019b].

3. Adversarial discriminative methods: align features using an adversarial objective with a domain discriminator [Tzeng et al., 2017; Tsai et al., 2018].

4. Reconstruction-based methods: reconstruct the target input from the extracted features using the source task model [Ghifary et al., 2016].
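As a concrete illustration of the discrepancy losses in category 1, here is a minimal sketch of linear-kernel MMD between a source and a target feature batch; the function name and tensor shapes are our own choices, not from the survey.

    import torch

    def mmd_linear(feat_src: torch.Tensor, feat_tgt: torch.Tensor) -> torch.Tensor:
        # feat_src: (n_s, d) source features; feat_tgt: (n_t, d) target features.
        # With a linear kernel, MMD^2 reduces to the squared distance
        # between the two batch means.
        delta = feat_src.mean(dim=0) - feat_tgt.mean(dim=0)
        return (delta * delta).sum()

Richer kernels (e.g. Gaussian mixtures) replace the mean difference with kernel evaluations, but the idea of penalizing a distribution distance is the same.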

 

In practice, the labeled data may be collected from multiple sources with different distributions [Sun et al., 2015; Bhatt et al., 2016]. In such cases, the aforementioned SDA methods could be trivially applied by combining the sources into a single source: an approach we refer to as source-combined DA. However, source-combined DA oftentimes results in poorer performance than simply using one of the sources and discarding the others. As illustrated in Figure 2, the accuracy on the best single source digit recognition adaptation using DANN [Ganin et al., 2016] is 71.3%, while the source-combined accuracy drops to 70.8%. For segmentation adaptation using CyCADA [Hoffman et al., 2018b], the mIoU of source-combined DA (37.3%) is also lower than that of SDA from GTA (38.7%). Because the domain shift not only exists between each source and target, but also exists among different sources, the source-combined data from different sources may interfere with each other during the learning process [Riemer et al., 2019]. Therefore, multi-source domain adaptation (MDA) is needed in order to leverage all of the available data.

Figure 2: An example of domain shift in the multi-source scenario. Combining multiple sources into one source and directly performing single-source domain adaptation on the entire dataset does not guarantee better performance compared to just using the best individual source domain.

In practice there may be more than one labeled dataset (source domain); this is what gives rise to multi-source domain adaptation, MDA.

With multiple source domains, is it enough to simply mix them (pool the datasets into one training set) and train a single model? Intuitively, a model trained on the mixture should generalize better, but the experiments say otherwise.

Take the digit recognition example in Figure 2(a): train four models on the four labeled source domains separately and test each on the MNIST-M target; the best single source reaches 71.3% accuracy. Now mix the four sources and train one model. Intuitively this model should generalize better, yet on the target it underperforms the best single-source model. Simply pooling the datasets is therefore not a viable approach.

The semantic segmentation example makes the same point.

This shows that domain shift, including the shift among the sources themselves, is an important problem to address.

 

The early MDA methods mainly focus on shallow models [Sun et al., 2015], either learning a latent feature space for different domains [Sun et al., 2011; Duan et al., 2012] or combining pre-learned source classifiers [Schweikert et al., 2009]. Recently, the emphasis on MDA has shifted to deep learning architectures. In this paper, we systematically survey recent progress on deep learning based MDA, summarize and compare similarities and differences in the approaches, and discuss potential future research directions.

In short: early MDA methods targeted shallow models [Sun et al., 2015], either learning a latent feature space for the different domains [Sun et al., 2011; Duan et al., 2012] or combining pre-learned source classifiers [Schweikert et al., 2009]; the emphasis has since shifted to deep architectures, whose recent progress this survey systematically reviews, compares, and extends with possible future directions.

 

Problem Definition

In a typical MDA setup, the notation is as follows:

Multiple source domains: {S_1, S_2, ..., S_M}, where M is the total number of source domains.

Samples in source S_i: (X_i, Y_i) = {(x_i^j, y_i^j)}_{j=1}^{N_i}, where X denotes the data, Y the labels (ground truth), i indexes the domain, N_i is the number of samples in that domain, and j indexes the samples within it.

Samples in the target domain T: X_T = {x_T^j}_{j=1}^{N_T} and, when available, the corresponding labels Y_T, where N_T is the number of target samples.

 

A taxonomy of MDA:

1. By the number of labeled samples in the target domain: unsupervised, fully supervised, or semi-supervised MDA.

Suppose the number of labeled target samples is N_{TL}; the MDA problem can be classified into:

unsupervised MDA, when N_{TL} = 0;

fully supervised MDA, when N_{TL} = N_T;

semi-supervised MDA, otherwise.

 

2. By whether all observations live in the same data space: homogeneous or heterogeneous MDA.

Suppose x_i ∈ R^{d_i} and x_T ∈ R^{d_T} are observations in source S_i and target T; we can classify MDA into:

homogeneous MDA, when d_1 = · · · = d_M = d_T;

heterogeneous MDA, otherwise.

 

3. By how the source label sets relate to the target label set: closed set, open set, partial, or universal MDA (a toy illustration follows after this list).

Suppose C_i and C_T are the label sets for source S_i and target T; we can define different MDA strategies:

closed set MDA, when C_1 = · · · = C_M = C_T;

open set MDA, when for at least one C_i, C_i ∩ C_T ⊂ C_T;

partial MDA, when for at least one C_i, C_T ⊂ C_i;

universal MDA, when no prior knowledge of the label sets is available;

where ∩ and ⊂ denote set intersection and proper subset, respectively.

 

4. By whether all source samples are labeled: strongly or weakly supervised MDA.

Suppose the number of labeled source samples is N_{iL} for source S_i; the MDA problem can be classified into:

strongly supervised MDA, when N_{iL} = N_i for i = 1, · · · , M;

weakly supervised MDA, otherwise.
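To make the label-set relations in category 3 concrete, here is a toy illustration (ours, not from the survey) using Python sets, where '<' tests for a proper subset:

    C_T = {"car", "bus", "truck"}                # target label set

    C_closed  = {"car", "bus", "truck"}          # C_i = C_T       -> closed set
    C_open    = {"car", "bus"}                   # C_i ∩ C_T ⊊ C_T -> open set
    C_partial = {"car", "bus", "truck", "bike"}  # C_T ⊊ C_i       -> partial

    assert C_closed == C_T
    assert (C_open & C_T) < C_T   # some target classes are unseen in this source
    assert C_T < C_partial        # this source has extra, target-irrelevant classes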

 

When adapting to multiple target domains simultaneously, the task becomes multi-target MDA. When the target data is unavailable during training [Yue et al., 2019], the task is often called multi-source domain generalization or zero-shot MDA.


 

 

Datasets

This part is skipped in these notes: the content is easy to follow, and it is best consulted selectively according to one's own research. See the original paper: https://arxiv.org/pdf/2002.12169.pdf

Let's jump straight to the methods section.

 

Deep Multi-source Domain Adaptation

Existing methods on deep MDA primarily focus on the unsupervised, homogeneous, closed set, strongly supervised, one target, and target data available settings. That is, there is one target domain, the target data is unlabeled but available during the training process, the source data is fully labeled, the source and target data are observed in the same data space, and the label sets of all sources and the target are the same. In this paper, we focus on MDA methods under these settings.

In other words, these notes cover only MDA methods under the settings above; note in particular that the target data is unlabeled, yet available during training.

 

There is some theoretical analysis to support existing MDA algorithms. Most theories are based on the seminal theoretical model [Blitzer et al., 2008; Ben-David et al., 2010]. Mansour et al. [2009] assumed that the target distribution can be approximated by a mixture of the M source distributions. Therefore, weighted combination of source classifiers has been widely employed for MDA. Moreover, tighter cross-domain generalization bounds and more accurate measurements of domain discrepancy can provide intuitions to derive effective MDA algorithms. Hoffman et al. [2018a] derived a novel bound using DC-programming and calculated more accurate combination weights. Zhao et al. [2018a] extended the generalization bound of the seminal theoretical model to multiple sources under both classification and regression settings. Besides the domain discrepancy between the target and each source [Hoffman et al., 2018a; Zhao et al., 2018a], Li et al. [2018] also considered the relationship between pairwise sources and derived a tighter bound on weighted multi-source discrepancy. Based on this bound, more relevant source domains can be picked out.

In short: most MDA theory builds on the seminal model of [Blitzer et al., 2008; Ben-David et al., 2010]. Mansour et al. [2009] assume the target distribution can be approximated by a mixture of the M source distributions, which is why weighted combinations of source classifiers are so widely used. Tighter cross-domain generalization bounds and sharper discrepancy measures then guide algorithm design: Hoffman et al. [2018a] derive a novel bound via DC-programming with more accurate combination weights; Zhao et al. [2018a] extend the seminal bound to multiple sources for both classification and regression; and Li et al. [2018] additionally model pairwise-source relationships, deriving a tighter bound on the weighted multi-source discrepancy that helps pick the more relevant source domains.
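The mixture assumption of Mansour et al. [2009] can be written (our plain-text rendering of the standard form) as

    D_T = Σ_{i=1}^{M} λ_i D_{S_i},  with λ_i ≥ 0 and Σ_{i=1}^{M} λ_i = 1,

where D_T and D_{S_i} denote the target and source distributions; the λ_i are precisely the combination weights that the classifier-weighting schemes discussed below try to estimate.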

 

Typically, some task models (e.g. classifiers) are learned based on the labeled source data with corresponding task loss, such as cross-entropy loss for classification. Meanwhile, specific alignments among the source and target domains are conducted to bridge the domain shift so that the learned task models can be better transferred to the target domain. Based on the different alignment strategies, we can classify MDA into different categories. Latent space transformation tries to align the latent space (e.g. features) of different domains based on optimizing the discrepancy loss or adversarial loss. Intermediate domain generation explicitly generates an intermediate adapted domain for each source that is indistinguishable from the target domain. The task models are then trained on the adapted domain. Figure 3 summarizes the common overall framework of existing MDA methods.

Figure 3: Illustration of the widely employed framework for MDA. The solid arrows and dashed dot arrows indicate the training of latent space transformation and intermediate domain generation, respectively. The dashed arrows represent the inference process. Most existing MDA methods can be obtained by employing different component details, enforcing some constraints, or slightly changing the architecture. Best viewed in color.

In short: task models (e.g. classifiers) are trained on the labeled source data with the corresponding task losses (e.g. cross-entropy for classification), while explicit alignment between the source and target domains bridges the domain shift so that the learned models transfer better to the target. By alignment strategy, MDA methods split into latent space transformation, which aligns the latent spaces (e.g. features) of different domains by optimizing a discrepancy or adversarial loss, and intermediate domain generation, which explicitly generates, for each source, an adapted intermediate domain indistinguishable from the target and trains the task models on it. Figure 3 summarizes the common overall framework.

 

Latent Space Transformation

The two common methods for aligning the latent spaces of different domains are discrepancy-based methods and adversarial methods. We discuss these two methods below, and Table 2 summarizes key examples of each method.

The two common ways to align the latent spaces of different domains are discrepancy-based methods and adversarial methods. Both are discussed below; Table 2 summarizes key examples of each.

 

Discrepancy-based methods explicitly measure the discrepancy of the latent spaces (typically features) from different domains by optimizing some specific discrepancy losses, such as maximum mean discrepancy (MMD) [Guo et al., 2018; Zhu et al., 2019], Rényi divergence [Hoffman et al., 2018a], L2 distance [Rakshit et al., 2019], and moment distance [Peng et al., 2019]. Guo et al. [2020] claimed that different discrepancies or distances can only provide specific estimates of domain similarities and that each distance has its pathological cases. Therefore, they consider the mixture of several distances [Guo et al., 2020], including L2 distance, Cosine distance, MMD, Fisher linear discriminant, and Correlation alignment. Minimizing the discrepancy to align the features among the source and target domains does not introduce any new parameters that must be learned.

In short: discrepancy-based methods align features across domains by minimizing an explicit distance, such as MMD [Guo et al., 2018; Zhu et al., 2019], Rényi divergence [Hoffman et al., 2018a], L2 distance [Rakshit et al., 2019], or moment distance [Peng et al., 2019]. Guo et al. [2020] mix several distances (L2, cosine, MMD, Fisher linear discriminant, correlation alignment), since any single distance has its pathological cases. Discrepancy minimization introduces no new learnable parameters.
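A hedged sketch of how such a discrepancy term is typically combined with the task loss in the multi-source setting; all names, shapes, and the choice of linear MMD are our assumptions, not the survey's:

    import torch
    import torch.nn.functional as F

    def mda_objective(feats_src, logits_src, labels_src, feat_tgt, lam=1.0):
        # feats_src / logits_src / labels_src: per-source lists of tensors;
        # feat_tgt: (n_t, d) unlabeled target features.
        task = sum(F.cross_entropy(lg, y) for lg, y in zip(logits_src, labels_src))
        mu_t = feat_tgt.mean(dim=0)
        # align every source's feature mean with the target's (linear MMD)
        align = sum(((fs.mean(dim=0) - mu_t) ** 2).sum() for fs in feats_src)
        return task + lam * align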

 

Adversarial methods try to align the features by making them indistinguishable to a discriminator. Some representative optimized objectives include GAN loss [Xu et al., 2018], H-divergence [Zhao et al., 2018a], and Wasserstein distance [Li et al., 2018; Wang et al., 2019; Zhao et al., 2020]. These methods aim to confuse the discriminator so that it cannot distinguish whether the features from multiple sources were drawn from the same distribution. Compared with GAN loss and H-divergence, Wasserstein distance can provide more stable gradients even when the target and source distributions do not overlap [Zhao et al., 2020]. The discriminator is often implemented as a network, which leads to new parameters that must be learned.

There are many modular implementation details for both types of methods, such as how to align the target and multiple sources, whether the feature extractors are shared, how to select the more relevant sources, and how to combine the multiple predictions from different classifiers.

In short: adversarial methods align features by making them indistinguishable to a discriminator, optimizing objectives such as GAN loss [Xu et al., 2018], H-divergence [Zhao et al., 2018a], or Wasserstein distance [Li et al., 2018; Wang et al., 2019; Zhao et al., 2020]; Wasserstein distance yields more stable gradients even when the source and target distributions do not overlap, but the discriminator network brings new parameters to learn.
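A minimal GAN-style feature-alignment sketch (ours): a discriminator D learns to separate source from target features, and the feature extractor is then updated to fool it, making the features indistinguishable. Training alternates between the two losses.

    import torch
    import torch.nn as nn

    bce = nn.BCEWithLogitsLoss()

    def discriminator_loss(D, feat_src, feat_tgt):
        # D outputs one logit per feature vector; detach so only D updates here
        real = torch.ones(feat_src.size(0), 1)
        fake = torch.zeros(feat_tgt.size(0), 1)
        return bce(D(feat_src.detach()), real) + bce(D(feat_tgt.detach()), fake)

    def alignment_loss(D, feat_tgt):
        # extractor update: push target features to be classified as "source"
        return bce(D(feat_tgt), torch.ones(feat_tgt.size(0), 1))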

 

Alignment domains. There are different ways to align the target and multiple sources. The most common method is to pairwise align the target with each source [Xu et al., 2018; Guo et al., 2018; Zhao et al., 2018a; Hoffman et al., 2018a; Zhu et al., 2019; Zhao et al., 2020; Guo et al., 2020]. Since domain shift also exists among different sources, several methods enforce pairwise alignment between every domain in both the source and target domains [Li et al., 2018; Rakshit et al., 2019; Peng et al., 2019; Wang et al., 2019].

In short: the most common choice is to pairwise align the target with each source; several methods additionally align every pair of domains, because domain shift also exists among the sources themselves.

 

Weight sharing of feature extractor. Most methods employ shared feature extractors to learn domain-invariant features. However, domain invariance may be detrimental to discriminative power. In contrast, Rakshit et al. [2019] adopted one feature extractor with unshared weights for each source and target pair, while Zhao et al. [2020] first pretrained one feature extractor for each source and then mapped the target into the feature space of each source. Correspondingly, there are M and 2M feature extractors. Although unshared feature extractors can better align the target and sources in the latent space, this substantially increases the number of parameters in the model.

In short: shared feature extractors learn domain-invariant features, but invariance can hurt discriminability; unshared extractors (M of them in Rakshit et al. [2019], 2M in Zhao et al. [2020]) align the domains better at the cost of far more parameters.

 

Classifier alignment. Intuitively, the classifiers trained on different sources may produce misaligned predictions for target samples that are close to the domain boundary. By minimizing a specific classifier discrepancy, such as L1 loss [Zhu et al., 2019; Peng et al., 2019], the classifiers are better aligned and can learn a more generalized classification boundary for such target samples. Instead of explicitly training one classifier for each source, many methods focus on training a compound classifier based on a specific combined task loss, such as normalized activations [Mancini et al., 2018] and bandit controller [Guo et al., 2020].

In short: minimizing an inter-classifier discrepancy (e.g. L1) aligns the per-source classifiers near the domain boundary, yielding a more generalized decision boundary; alternatively, a single compound classifier is trained with a combined task loss.
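A sketch of the L1 classifier-discrepancy term mentioned above (our formulation): the average absolute difference between two source classifiers' softmax outputs on the same target batch.

    import torch

    def classifier_discrepancy(prob1: torch.Tensor, prob2: torch.Tensor) -> torch.Tensor:
        # prob1, prob2: (batch, num_classes) softmax outputs on target samples
        return (prob1 - prob2).abs().mean()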

 

Target prediction. After aligning the features of target and source domains in the latent space, the classifiers trained based on the labeled source samples can be used to predict the labels of a target sample. Since there are multiple sources, it is possible that they will yield different target predictions. One way to reconcile these different predictions is to uniformly average the predictions from different source classifiers [Zhu et al., 2019]. However, different sources may have different relationships with the target, e.g. one source might better align with the target, so a non-uniform, weighted averaging of the predictions leads to better results. Weighting strategies, known as a source selection process, include uniform weight [Zhu et al., 2019], perplexity score based on adversarial loss [Xu et al., 2018], point-to-set (PoS) metric using Mahalanobis distance [Guo et al., 2018], relative error based on source-only accuracy [Peng et al., 2019], and Wasserstein distance based weights [Zhao et al., 2020].

In short: after alignment, every source classifier can predict the target labels, and the predictions must be reconciled. Options are uniform averaging [Zhu et al., 2019] or non-uniform source weighting, e.g. a perplexity score based on adversarial loss [Xu et al., 2018], a point-to-set (PoS) metric using Mahalanobis distance [Guo et al., 2018], relative error based on source-only accuracy [Peng et al., 2019], or Wasserstein-distance-based weights [Zhao et al., 2020].
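A sketch of the weighted combination step (ours): uniform weights recover simple averaging, while any of the source-selection scores above can supply non-uniform weights.

    import torch

    def combine_predictions(probs_per_source, weights):
        # probs_per_source: list of (batch, classes) softmax outputs;
        # weights: one importance score per source
        w = torch.tensor(weights, dtype=torch.float32).view(-1, 1, 1)
        return (torch.stack(probs_per_source) * w).sum(dim=0) / w.sum()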

 

Besides the source importance, Zhao et al. [2020] also considered the sample importance, i.e. different samples from the same source may still have different similarities to the target samples. The source samples that are closer to the target are distilled (based on a manually selected Wasserstein distance threshold) to fine-tune the source classifiers. Automatically and adaptively selecting the most relevant training samples for each source remains an open research problem.

In short: beyond source importance, Zhao et al. [2020] also weight sample importance: source samples closer to the target, under a manually chosen Wasserstein distance threshold, are distilled to fine-tune the source classifiers; selecting these samples automatically and adaptively remains an open problem.
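A sketch of the threshold-based distillation (ours; the per-sample distances, e.g. Wasserstein estimates, are assumed precomputed):

    def distill_indices(sample_distances, threshold):
        # keep source samples whose distance to the target domain
        # falls below the manually chosen threshold
        return [i for i, d in enumerate(sample_distances) if d < threshold]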

 

Intermediate Domain Generation

Feature-level alignment only aligns high-level information, which is insufficient for fine-grained predictions, such as pixel-wise semantic segmentation [Zhao et al., 2019a]. Generating an intermediate adapted domain with pixel-level alignment, typically via GANs [Goodfellow et al., 2014], can help address this problem.

In short: feature-level alignment only captures high-level information, which is insufficient for fine-grained tasks such as pixel-wise semantic segmentation; generating an intermediate, pixel-level-aligned adapted domain, typically with GANs, addresses this.

Domain generator. Since the original GAN is highly under-constrained, some improved versions are employed, such as Coupled GAN (CoGAN) in [Russo et al., 2019] and CycleGAN in MADAN [Zhao et al., 2019a]. Instead of directly taking the original source data as input to the generator [Russo et al., 2019; Zhao et al., 2019a], Lin et al. [2020] used a variational autoencoder to map all source and target domains to a latent space and then generated an adapted domain from the latent space. Russo et al. [2019] then tried to align the target and each adapted domain, while Lin et al. [2020] aligned the target and combined adapted domain from the latent space. Zhao et al. [2019a] proposed to aggregate different adapted domains using a sub-domain aggregation discriminator and cross-domain cycle discriminator, where the pixel-level alignment is then conducted between the aggregated and target domains. Zhao et al. [2019a] and Lin et al. [2020] showed that the semantics might change in the intermediate representation, and that enforcing a semantic consistency before and after generation can help preserve the labels.

In short: because the original GAN is highly under-constrained, improved variants are used (CoGAN in Russo et al. [2019], CycleGAN in MADAN [Zhao et al., 2019a]). Lin et al. [2020] instead map all source and target domains into a latent space with a variational autoencoder and generate one adapted domain from it. Russo et al. [2019] align the target with each adapted domain, while Lin et al. [2020] align the target with the combined adapted domain from the latent space. Zhao et al. [2019a] aggregate the per-source adapted domains with a sub-domain aggregation discriminator and a cross-domain cycle discriminator before pixel-level alignment with the target. Both Zhao et al. [2019a] and Lin et al. [2020] show that the semantics may change in the intermediate representation, and that enforcing semantic consistency before and after generation helps preserve the labels.
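A hedged sketch of such a semantic-consistency constraint: the task model should predict the same semantics for a source image before and after pixel-level adaptation. We use a KL term for illustration; the exact loss differs across the cited papers.

    import torch.nn.functional as F

    def semantic_consistency(task_model, x_src, x_adapted):
        # compare predictions on the original and the adapted (generated) image
        log_p_src = F.log_softmax(task_model(x_src), dim=1)
        p_adapted = F.softmax(task_model(x_adapted), dim=1)
        return F.kl_div(log_p_src, p_adapted, reduction="batchmean")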

 

Feature alignment and target prediction. Feature-level alignment is often jointly considered with pixel-level alignment. Both alignments are usually achieved by minimizing the GAN loss with a discriminator. One classifier is trained on each adapted domain [Russo et al., 2019] and the multiple predictions for a given target sample are averaged. Only one classifier is trained on the aggregated domain [Zhao et al., 2019a] or on the combined adapted domain [Lin et al., 2020], which is obtained by a unique generator from the latent space for all source domains. The comparison of these methods is summarized in Table 3.

In short: pixel-level and feature-level alignment are usually trained jointly, each via a GAN loss with a discriminator; either one classifier is trained per adapted domain and the predictions are averaged [Russo et al., 2019], or a single classifier is trained on the aggregated or combined adapted domain [Zhao et al., 2019a; Lin et al., 2020]. Table 3 summarizes the comparison.
