Paper Notes (DFCN): 《Deep Fusion Clustering Network》

Posted by cute_Learner on 2022-01-27

Paper information

Title:Deep Fusion Clustering Network

Authors:Wenxuan Tu, Sihang Zhou, Xinwang Liu, Xifeng Guo, Zhiping Cai, En Zhu, Jieren Cheng

Sources:2020, AAAI

Code:Download

Paper:Download

Others:4 Citations, 41 References


Abstract

  The disadvantages of prior literature:

    • It lacks a dynamic fusion mechanism to selectively integrate and refine the information of graph structure and node attributes for consensus representation learning;
    • it fails to extract information from both sides for robust target distribution generation.
  Steps of the DFCN method:
    • Firstly, we integrate the two kinds of sample embeddings from both the local and global perspectives for consensus representation learning.
    • After that, by estimating the similarity between sample points and pre-calculated cluster centers in the latent embedding space with Student's t-distribution, we acquire a more precise target distribution.
    • Finally, we design a triplet self-supervision mechanism that uses the target distribution to provide more dependable guidance for the AE, GAE, and the information fusion part simultaneously.

  We develop an improved graph autoencoder (IGAE) with a symmetric structure and reconstruct the adjacency matrix with both the latent representations and the feature representations reconstructed by the graph decoder.

    The key contributions of this paper are listed as follows:

    • Propose a deep fusion clustering network (DFCN).
    • Develop a symmetric graph autoencoder.
    • Extensive experimental results on six public benchmark datasets demonstrate state-of-the-art performance.

2 Related Work

2.1 Attributed Graph Clustering

  • GCN-based clustering methods jointly learn graph structure and node attributes.
  • Graph autoencoder (GAE) and variational graph autoencoder (VGAE) were proposed to integrate the graph structure into node attributes by iteratively aggregating neighborhood representations around each central node.
  • SDCN shows that integrating an autoencoder with a GCN module yields better representation learning.
    • The autoencoder provides complementary attribute information and helps relieve the over-smoothing phenomenon of the GCN module.
    • The GCN module provides high-order structure information to the autoencoder.

   Since SDCN treats the GCN module as a regularization trick and does not fully exploit the representations learned by the self-optimizing network (presumably the AE part) for refinement, this paper proposes a framework that fuses AE and IGAE.

2.2 Target Distribution Generation 

  • Many deep clustering methods seek to generate the target distribution for discriminative representation learning in a self-optimizing manner. (The generated target distribution guides the training of the other networks, so it generally needs to be of high confidence; see the notes on 《SDCN》.)


3 The Proposed Method

  Our model can be divided into four parts:

    • an autoencoder.
    • an improved graph autoencoder.
    • a fusion module.
    • the optimization targets

    

3.1 Notations

  • undirected graph $\mathcal{G}=\{\mathcal{V}, \mathcal{E}\}$ with $K$ cluster centers.
  • $\mathcal{V}=\left\{v_{1}, v_{2}, \ldots, v_{N}\right\}$ is the node set and $\mathcal{E}$ is the edge set.
  • attribute matrix $\mathbf{X} \in \mathbb{R}^{N \times d}$.
  • original adjacency matrix $\mathbf{A}=\left(a_{i j}\right)_{N \times N} \in \mathbb{R}^{N \times N}$. Here, $a_{i j}=1$ if $\left(v_{i}, v_{j}\right) \in \mathcal{E}$, otherwise $a_{i j}=0$.
  • degree matrix $\mathbf{D}=\operatorname{diag}\left(d_{1}, d_{2}, \ldots, d_{N}\right) \in \mathbb{R}^{N \times N}$ with $d_{i}=\sum\limits_{v_{j} \in \mathcal{V}} a_{i j}$.
  • $\widetilde{\mathbf{A}}= \mathbf{D}^{-\frac{1}{2}}(\mathbf{A}+\mathbf{I}) \mathbf{D}^{-\frac{1}{2}} \in \mathbb{R}^{N \times N}$, where the identity matrix $\mathbf{I} \in \mathbb{R}^{N \times N}$ indicates that each node in $\mathcal{V}$ is linked with a self-loop.

  All notations are summarized in Table 1.
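  To make the notation concrete, here is a minimal NumPy sketch that builds $\widetilde{\mathbf{A}}$ for a small hypothetical graph (the 3-node graph and variable names are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 3-node undirected graph with edges (0,1) and (1,2)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# Degree matrix D with d_i = sum_j a_ij, as defined above
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

# A_tilde = D^{-1/2} (A + I) D^{-1/2}; adding the identity matrix
# gives every node a self-loop before normalization
A_tilde = D_inv_sqrt @ (A + np.eye(A.shape[0])) @ D_inv_sqrt

print(np.round(A_tilde, 3))
```

  Note that many GCN implementations normalize with the degrees of $\mathbf{A}+\mathbf{I}$ instead; the sketch follows the formula exactly as stated in this section.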

    

3.2 Fusion-based Autoencoders

3.2.1 Input of the Decoder.

  • In our proposed method, given the compressed representations from AE and GAE, we first integrate the information from both sources into a consensus latent representation.
  • Then, with this embedding as input, the decoders of both AE and GAE reconstruct the inputs of the two sub-networks.

3.2.2 Improved Graph Autoencoder.

  • Classic AEs are usually symmetric in structure, while GCNs are usually asymmetric.
  • Existing GAEs require only the latent representation to reconstruct the adjacency information, overlooking that the structure-based attribute information can also be exploited to improve the generalization capability of the corresponding network.
  • To solve this problem, we design a symmetric improved graph autoencoder (IGAE) that reconstructs both the weighted attribute matrix and the adjacency matrix simultaneously.

   In the proposed IGAE, a layer in the encoder and decoder is formulated as:

    $\mathbf{Z}^{(l)}=\sigma\left(\widetilde{\mathbf{A}} \mathbf{Z}^{(l-1)} \mathbf{W}^{(l)}\right)\quad \quad \quad (1)$

    $\widehat{\mathbf{Z}}^{(h)}=\sigma\left(\widetilde{\mathbf{A}} \widehat{\mathbf{Z}}^{(h-1)} \widehat{\mathbf{W}}^{(h)}\right)\quad \quad \quad (2)$

  where

    • $\mathbf{W}^{(l)}$ and $\widehat{\mathbf{W}}^{(h)}$ denote the learnable parameters of the $l$-th encoder layer and the $h$-th decoder layer;
    • $\sigma$ is a non-linear activation function, such as ReLU or Tanh.
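  Eqs. (1) and (2) are both standard graph-convolution layers. A minimal sketch of one such layer (the shapes, random weights, and identity adjacency are placeholders, not the paper's settings):

```python
import numpy as np

def igae_layer(A_tilde, Z_prev, W, activation=np.tanh):
    """One IGAE encoder/decoder layer: sigma(A_tilde @ Z @ W)  (Eqs. 1-2)."""
    return activation(A_tilde @ Z_prev @ W)

rng = np.random.default_rng(0)
N, d_in, d_out = 4, 5, 3
A_tilde = np.eye(N)                       # placeholder normalized adjacency
Z0 = rng.standard_normal((N, d_in))       # layer input
W1 = rng.standard_normal((d_in, d_out))   # learnable weights

Z1 = igae_layer(A_tilde, Z0, W1)
print(Z1.shape)
```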

  Our IGAE is designed to minimize a hybrid loss function:

    $L_{I G A E}=L_{w}+\gamma L_{a}\quad \quad \quad (3)$

   where

    • $\gamma$  is a pre-defined hyper-parameter that balances the weight of the two reconstruction loss functions.

  Specifically, $L_{w}$ and $L_{a}$ are defined as follows:

    $L_{w} =\frac{1}{2 N}\|\widetilde{\mathbf{A}} \mathbf{X}-\widehat{\mathbf{Z}}\|_{F}^{2}\quad \quad \quad (4)$

    $L_{a} =\frac{1}{2 N}\|\widetilde{\mathbf{A}}-\widehat{\mathbf{A}}\|_{F}^{2}\quad \quad \quad (5)$

    • In Eq.(4),  $\widehat{\mathbf{Z}} \in \mathbb{R}^{N \times d}$  is the reconstructed weighted attribute matrix.
    • In Eq.( 5 ), $\widehat{\mathbf{A}} \in \mathbb{R}^{N \times N}$  is the reconstructed adjacency matrix generated by an inner product operation with multi-level representations of the network.

  IGAE is trained to minimize the reconstruction losses over the weighted attribute matrix and the adjacency matrix at the same time.
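  Under the definitions above, Eqs. (3)-(5) can be sketched directly (toy data; `gamma` plays the role of $\gamma$, and the perfect-reconstruction inputs are purely illustrative):

```python
import numpy as np

def igae_loss(A_tilde, X, Z_hat, A_hat, gamma=0.1):
    """L_IGAE = L_w + gamma * L_a  (Eqs. 3-5)."""
    N = X.shape[0]
    L_w = np.linalg.norm(A_tilde @ X - Z_hat, 'fro') ** 2 / (2 * N)
    L_a = np.linalg.norm(A_tilde - A_hat, 'fro') ** 2 / (2 * N)
    return L_w + gamma * L_a

rng = np.random.default_rng(0)
N, d = 4, 3
A_tilde = np.eye(N)
X = rng.standard_normal((N, d))

# Perfect reconstructions drive both terms to zero
loss = igae_loss(A_tilde, X, A_tilde @ X, A_tilde)
print(loss)  # 0.0
```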

3.2.3 Structure and Attribute Information Fusion

  The structure and attribute information fusion (SAIF) module has two parts:

    • a cross-modality dynamic fusion mechanism;
    • a triplet self-supervised strategy.

  The overall structure of SAIF is illustrated in Fig. 2.

    

Part 1: Cross-modality Dynamic Fusion Mechanism

  This module includes four steps:

  • First, we combine the latent embedding of  $\mathrm{AE}   \left(\mathbf{Z}_{A E} \in \mathbb{R}^{N \times d^{\prime}}\right)$ and IGAE  $\left(\mathbf{Z}_{I G A E} \in \mathbb{R}^{N \times d^{\prime}}\right)$  with a linear combination operation:

    $\mathbf{Z}_{I}=\alpha \mathbf{Z}_{A E}+(1-\alpha) \mathbf{Z}_{I G A E}\quad \quad \quad (6)$

     In Eq.(6)

    • $d^{\prime}$  is the latent embedding dimension, and 
    • $\alpha$ is a learnable coefficient, initialized as $0.5$ and then tuned automatically with a gradient descent method.
  • Secondly, we process the combined information $\mathbf{Z}_{I}$ with a graph convolution-like operation. With this operation, we enhance the initial fused embedding $\mathbf{Z}_{I} \in \mathbb{R}^{N \times d^{\prime}}$ by considering the local structure within data:

    $ \mathbf{Z}_{L}=\widetilde{\mathbf{A}} \mathbf{Z}_{I}\quad \quad \quad (7)$

    In Eq.(7)

    • $\mathbf{Z}_{L} \in \mathbb{R}^{N \times d^{\prime}}$ denotes the local structure enhanced $\mathbf{Z}_{I}$ .
  • Thirdly, we further introduce a self-correlated learning mechanism to exploit the non-local relationships among samples in the preliminary information fusion space. We first calculate the normalized self-correlation matrix $\mathbf{S} \in \mathbb{R}^{N \times N}$ through Eq.(8):

    ${\large \mathbf{S}_{i j}=\frac{e^{\left(\mathbf{Z}_{L} \mathbf{Z}_{L}^{\mathrm{T}}\right)_{i j}}}{\sum\limits _{k=1}^{N} e^{\left(\mathbf{Z}_{L} \mathbf{Z}_{L}^{\mathrm{T}}\right)_{i k}}}} \quad \quad \quad (8)$

    Then we calculate the global correlation:

    $\mathbf{Z}_{G}=\mathbf{S} \mathbf{Z}_{L}$ .

  • Finally, we adopt a skip connection to encourage information to pass smoothly within the fusion mechanism:

    $\widetilde{\mathbf{Z}}=\beta \mathbf{Z}_{G}+\mathbf{Z}_{L} \quad \quad \quad (9)$

  In Eq.(9)

    • $\beta$  is a scale parameter. We initialize it as $0$ and learn its weight while training the network.

  Conclusion of this part:

    • Technically, our cross-modality dynamic fusion mechanism considers the sample correlations from both the local and global perspectives. Thus, it helps finely fuse and refine the information from both $\mathrm{AE}$ and IGAE for learning consensus latent representations.
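  The four steps above (Eqs. 6-9) compose as follows; in this sketch `alpha` and `beta` are fixed scalars, whereas the paper learns them during training, and the embeddings are random placeholders:

```python
import numpy as np

def fuse(Z_ae, Z_igae, A_tilde, alpha=0.5, beta=0.0):
    """Cross-modality dynamic fusion (Eqs. 6-9)."""
    Z_I = alpha * Z_ae + (1 - alpha) * Z_igae     # Eq. 6: linear combination
    Z_L = A_tilde @ Z_I                           # Eq. 7: local enhancement
    corr = Z_L @ Z_L.T                            # Eq. 8: row-wise softmax of
    S = np.exp(corr - corr.max(axis=1, keepdims=True))  # the self-correlation
    S = S / S.sum(axis=1, keepdims=True)          # (max-shift for stability)
    Z_G = S @ Z_L                                 # global correlation
    return beta * Z_G + Z_L                       # Eq. 9: skip connection

rng = np.random.default_rng(0)
N, d = 4, 3
Z_ae = rng.standard_normal((N, d))
Z_igae = rng.standard_normal((N, d))

Z_tilde = fuse(Z_ae, Z_igae, np.eye(N))
print(Z_tilde.shape)
```

  With `beta` initialized to 0, the fused output starts out as the locally enhanced embedding $\mathbf{Z}_{L}$, and the global branch is phased in as $\beta$ grows during training.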

Part 2: Triplet Self-supervised Strategy

  The goal is to generate more reliable guidance for clustering network training.

  The target distribution is generated through the following steps:

  • Step 1: In Eq.(10), we calculate the similarity between the $i$-th sample $\tilde{z}_{i}$ and the $j$-th pre-calculated clustering center $u_{j}$ in the fused embedding space, using Student's t-distribution as the kernel.

    ${\large q_{i j}=\frac{\left(1+\left\|\tilde{z}_{i}-u_{j}\right\|^{2} / v\right)^{-\frac{v+1}{2}}}{\sum_{j^{\prime}}\left(1+\left\|\tilde{z}_{i}-u_{j^{\prime}}\right\|^{2} / v\right)^{-\frac{v+1}{2}}}}\quad \quad \quad (10)$

  Where

    • $v$ is the degree of freedom of Student's t-distribution.
    • $q_{i j}$ indicates the probability of assigning the $i$-th node to the $j$-th center.

  The soft assignment matrix  $\mathbf{Q} \in \mathbb{R}^{N \times K}$  reflects the distribution of all samples.

  • Step 2: To increase the confidence of the cluster assignments, we introduce Eq.(11) to drive all samples closer to their cluster centers.

    ${\large p_{i j}=\frac{q_{i j}^{2} / \sum\limits _{i} q_{i j}}{\sum\limits _{j^{\prime}}\left(q_{i j^{\prime}}^{2} / \sum\limits _{i} q_{i j^{\prime}}\right)}} \quad \quad \quad (11)$

  Where

    • $0 \leq p_{i j} \leq 1$ is an element of the generated target distribution $\mathbf{P} \in \mathbb{R}^{N \times K} $.

  We then calculate the soft assignment distributions of $\mathrm{AE}$ and IGAE by applying Eq.(10) to the latent embeddings of the two sub-networks, respectively. We denote the soft assignment distributions of IGAE and $\mathrm{AE}$ as $\mathrm{Q}^{\prime}$ and $\mathrm{Q}^{\prime \prime}$.

  Unified training framework:To improve the representative capability of each component, we design a triplet clustering loss by adapting the KL-divergence in the following form:

     $L_{K L}=\sum\limits _{i} \sum\limits_{j} p_{i j} \log \frac{p_{i j}}{\left(q_{i j}+q_{i j}^{\prime}+q_{i j}^{\prime \prime}\right) / 3}\quad \quad \quad (12)$
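  Eqs. (10)-(12) can be put together in a few lines (toy embeddings and centers; $v=1$ as is common in DEC-style methods, and the three soft assignments are set identical purely for illustration):

```python
import numpy as np

def soft_assign(Z, centers, v=1.0):
    """Student's t soft assignment Q  (Eq. 10)."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / v) ** (-(v + 1) / 2)
    return q / q.sum(axis=1, keepdims=True)

def target_dist(Q):
    """Sharpened target distribution P  (Eq. 11)."""
    w = Q ** 2 / Q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def triplet_kl(P, Q, Q1, Q2):
    """Triplet clustering loss  (Eq. 12)."""
    Q_avg = (Q + Q1 + Q2) / 3.0
    return float((P * np.log(P / Q_avg)).sum())

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 2))        # 6 samples in a 2-d latent space
centers = rng.standard_normal((3, 2))  # 3 cluster centers
Q = soft_assign(Z, centers)
P = target_dist(Q)
loss = triplet_kl(P, Q, Q, Q)          # identical Q, Q', Q'' for illustration
print(round(loss, 4))
```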

3.2.4 Joint loss and Optimization

  The overall learning objective consists of two main parts, i.e., the reconstruction loss of AE and IGAE, and the clustering loss which is correlated with the target distribution:

    $L=\underbrace{L_{\text {AE }}+L_{\text {IGAE }}}_{\text {Reconstruction }}+\underbrace{\lambda L_{K L}}_{\text {Clustering }} .\quad \quad \quad (13)$

  Where

    • $L_{AE}$ is the mean square error (MSE) reconstruction loss of AE.

  The detailed learning procedure of the proposed DFCN is shown in Algorithm 1.
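  As a schematic, the joint objective in Eq. (13) simply sums the three terms; the scalar loss values below are placeholders, and the default $\lambda = 10$ follows the hyper-parameter analysis later in this post:

```python
def dfcn_loss(L_ae, L_igae, L_kl, lam=10.0):
    """L = (L_AE + L_IGAE) + lambda * L_KL  (Eq. 13)."""
    return (L_ae + L_igae) + lam * L_kl

# Placeholder loss values, purely illustrative
print(dfcn_loss(0.2, 0.3, 0.05))
```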

    

4 Experiments

  

  The clustering performance of our method and 10 baseline methods on six benchmark datasets is summarized in Table 3. Based on the results, we have the following observations:

  • DFCN shows superior performance against the compared methods in most circumstances.
  • It is obvious that GCN-based methods such as GAE, VGAE, ARGA, and DAEGC are not comparable to ours, because these methods under-utilize the abundant information of the data itself and may suffer from the over-smoothing phenomenon.
  • DFCN achieves better clustering results than the strongest baseline methods $SDCN_{Q}$ and SDCN in the majority of cases, especially on HHAR, DBLP, and CITE datasets.

4.1 Ablation Studies

Effectiveness of IGAE

    

  We conduct ablation studies to verify the effectiveness of IGAE and report the results in Fig. 3.

  GAE-$L_{w}$ or GAE-$L_{a}$ denotes the method optimized with only the reconstruction loss of the weighted attribute matrix or of the adjacency matrix, respectively.

  We can observe that:

    • GAE-$L_{w}$ consistently performs better than GAE-$L_{a}$ on all six datasets;
    • IGAE clearly improves the clustering performance over the method that reconstructs only the adjacency matrix.

  Both observations illustrate that our proposed reconstruction measure is able to exploit more comprehensive information for improving the generalization capability of the deep clustering network. By this means, the latent embedding inherits more properties from the attribute space of the original graph, preserving representative features that generate better clustering decisions.

Analysis of the SAIF Module

  

   As summarized in Fig. 4, we observe that

    • Compared with the baseline, the Baseline-C method achieves about 0.5% to 5.0% performance improvement, indicating that exploring graph structure and node attributes from both the local and global perspectives helps learn consensus latent representations for better clustering;
    • The performance of the Baseline-C-T method is consistently better than that of the Baseline-C-S method on all datasets. The reason is that our triplet self-supervised strategy successfully generates more reliable guidance for the training of AE, IGAE, and the fusion part, making them benefit from each other.

Influence of Exploiting Both-source Information

    

  We compare our method with two variants to validate the effectiveness of complementary two-modality (structure and attribute) information learning for target distribution generation.

  As reported in Table 4, +AE or +IGAE refers to the DFCN with only AE or IGAE part, respectively. 

  We found that:

    • +AE and +IGAE each achieve better performance on different datasets, indicating that information from either AE or IGAE alone cannot consistently outperform that of its counterpart; combining the both-source information can potentially improve the robustness of the hybrid method.
    • DFCN encodes both DNN- and GCN-based representations and consistently outperforms the single-source methods. This shows that 1) both-source information is equally essential for the performance improvement of DFCN; 2) DFCN can exploit the complementary two-modality information to make the target distribution more reliable and robust for better clustering.

Analysis of Hyper-parameter λ

  As can be seen in Eq.(13), DFCN introduces a hyper-parameter $\lambda$ to trade off between the reconstruction and clustering objectives.

  Fig. 5 illustrates the performance variation of DFCN when  $\lambda$  varies from  $0.01$  to $100$ .

    

  From these figures, we observe that

  1. the hyper-parameter  $\lambda$  is effective in improving the clustering performance;
  2. the performance of the method is stable in a wide range of  $\lambda$ ;
  3. $ \mathrm{DFCN}$  tends to perform well by setting  $\lambda$  to $10$ across all datasets.

Visualization of Clustering Results

  

  To intuitively verify the effectiveness of DFCN, we visualize the distribution of the learned clustering embedding $\widetilde{\mathbf{Z}}$ in two-dimensional space by employing the t-SNE algorithm (Maaten and Hinton 2008). As illustrated in Fig. 6, DFCN better reveals the intrinsic clustering structure of the data.


Conclusion

   This paper proposes a novel neural-network-based clustering method, the deep fusion clustering network (DFCN).

  In our method, the SAIF module exploits the graph structure and node attributes through a dynamic cross-modality fusion mechanism and a triplet self-supervised strategy. In this way, more consensus and discriminative information from both sides is encoded to construct a robust target distribution, which effectively provides precise guidance for network training. In addition, the proposed IGAE helps improve the generalization capability of the method. Experiments on six benchmark datasets show that DFCN consistently outperforms state-of-the-art baseline methods. In the future, we plan to further improve our method and adapt it to multi-view graph clustering and incomplete multi-view graph clustering applications.

  Last modified: 2022-01-27 17:59:09

 
