Paper Information

Title: Attributed Graph Clustering: A Deep Attentional Embedding Approach
Authors: Chun Wang, Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, C. Zhang
Source: 2019, IJCAI
Other: 96 Citations, 42 References
Task: Graph Clustering, Graph Embedding, Node Clustering

Abstract

　　The method targets attributed graphs and uses an attention network to capture the importance of neighboring nodes to a target node.

1 Introduction

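　　Current state of research: methods based on graph representation learning are all two-step approaches.

　　Summary of existing work: methods that reconstruct the topological structure and methods that reconstruct node representations.

　　Shortcoming: the mechanism for fusing topological structure with node representations is imperfect.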

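　　Proposed model: $\text{DAEGC}$ [ a goal-directed graph attentional autoencoder based attributed graph clustering framework ]

　　Node representation: a $\text{graph attentional autoencoder}$ is used:

- $\text{Encoder}$: learns node content and graph structure jointly;
- $\text{Decoder}$: reconstructs the graph topology;

　　Training: self-training [ a high-confidence distribution guides model training ]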

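　　Figure 1 compares the proposed model with traditional $\text{two-step}$ methods:

- The proposed model learns the $\text{node embedding}$ and the clustering in one unified framework.
- $\text{Two-step}$ methods first learn the $\text{node embedding}$ and then cluster it.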

　　Contributions:

- The first to propose a graph attentional autoencoder;
- A $\text{goal-directed}$ framework for graph clustering;

2 Related Work

2.1 Graph Clustering

　　This part describes the limitations of early methods and credits deep learning methods with advancing graph clustering.

2.2 Deep Clustering Algorithms

　　The key method to remember here is DEC, a deep clustering algorithm.

3 Problem Definition and Overall Framework

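　　$\text{Graph basic definition}$: omitted.

　　Given a graph $G$, graph clustering aims to partition the nodes of $G$ into $k$ disjoint $\text{groups}$ $\{G_1, G_2, \cdots, G_k\}$ such that nodes in the same $\text{group}$ satisfy two conditions:

- they are close to each other in graph structure; [ similar community structure ]
- they have similar node attributes;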

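　　The framework consists of two parts, as shown in Fig 2:

- Graph Attentional Autoencoder: the AE takes the attribute values and the graph structure as input and learns a latent representation by minimizing the reconstruction loss;
- Self-training Clustering: clusters the learned representation and, in turn, adjusts the latent representation according to the clustering result;

　　Because the framework learns the graph embedding and performs clustering in one unified framework, each component can benefit the other.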

4 Proposed Method

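　　This section first describes the graph attentional autoencoder [ which effectively learns graph structure and content information ] that generates the latent representation, and then the self-training module that guides clustering.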

4.1 Graph Attentional Autoencoder

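　　Graph Attentional Autoencoder: learns the latent representation of each node by attending over its neighbors, so that attribute values and graph structure are both folded into the latent representation.

　　First, measure the influence of the neighbors $N_i$ of $\text{node}$ $i$ on $\text{node}$ $i$. Different neighbors influence $\text{node}$ $i$ to different degrees, which is expressed by assigning them different weights.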

　　　　$z_{i}^{l+1}=\sigma\left(\sum\limits _{j \in N_{i}} \alpha_{i j} W z_{j}^{l}\right)\quad\quad\quad(1)$

　　where:

- $z_{i}^{l+1}$ denotes the output representation of node $i$;
- $N_{i}$ denotes the neighbors of $i$;
- $\alpha_{i j}$ is the attention coefficient that indicates the importance of neighbor node $j$ to node $i$;
- $\sigma$ is a nonlinearity function;

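　　The attention coefficient $\alpha_{i j}$ [ importance ] takes two aspects into account:

1. attribute values;
2. topological distance;

　　Aspect 1: attribute values

　　The unnormalized attention score $c_{i j}$ behind $\alpha_{i j}$ is computed by a single-layer feedforward neural network applied to the concatenation of the transformed $x_i$ and $x_j$: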

　　　　$c_{i j}=\vec{a}^{T}\left[W x_{i} \| W x_{j}\right]\quad \quad \quad(2)$

　　where:

- $\vec{a} \in R^{2 m^{\prime}}$ is a weight vector;

　　Aspect 2: topological distance

　　The $\text{Encoder}$ of the AE takes high-order neighbor information into account (here, $\text{t-order}$ neighbors) through the $\text{proximity matrix}$:

　　　　$M=\left(B+B^{2}+\cdots+B^{t}\right) / t\quad \quad\quad(3)$

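　　where:

- $B$ is the transition matrix: $B_{i j}=1 / d_{i}$ if $e_{i j} \in E$ (an edge connects the two nodes) and $B_{i j}=0$ otherwise, where $d_{i}$ is the degree of $\text{node}$ $i$.
- $M_{i j}$ denotes the topological relevance between $\text{node}$ $i$ and $\text{node}$ $j$ within $t$ orders. If $\text{node}$ $i$ and $\text{node}$ $j$ are neighbors (within $t$ orders), then $M_{i j}>0$.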
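Eq. 3 can be checked by hand on a small graph. The following is a minimal sketch in plain Python, assuming a dense 0/1 adjacency matrix stored as a list of lists; the helper names (`transition_matrix`, `matmul`, `proximity_matrix`) and the toy graph are illustrative and do not come from the paper.

```python
def transition_matrix(adj):
    """Row-normalize a 0/1 adjacency matrix: B[i][j] = 1/d_i if (i, j) is an edge."""
    B = []
    for row in adj:
        d = sum(row)  # degree of node i
        B.append([a / d if d else 0.0 for a in row])
    return B


def matmul(X, Y):
    """Dense matrix product for small lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def proximity_matrix(adj, t):
    """M = (B + B^2 + ... + B^t) / t, as in Eq. 3."""
    B = transition_matrix(adj)
    n = len(B)
    power = B                       # current power B^k, starting at k = 1
    total = [row[:] for row in B]   # running sum of the powers
    for _ in range(t - 1):
        power = matmul(power, B)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return [[v / t for v in row] for row in total]


# Toy path graph 0 - 1 - 2 (hypothetical example data).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
M = proximity_matrix(adj, t=2)
```

With `t=1` nodes 0 and 2 are unrelated (`M[0][2] == 0`); with `t=2` the two-hop path through node 1 makes `M[0][2]` positive, which is how higher-order neighbors enter the attention computation.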

　　The coefficients are then normalized over the neighborhood of each $\text{node}$ with a $\text{softmax function}$:

　　　　${\large \alpha_{i j}=\operatorname{softmax}_{j}\left(c_{i j}\right)=\frac{\exp \left(c_{i j}\right)}{\sum_{r \in N_{i}} \exp \left(c_{i r}\right)}} \quad \quad \quad(4)$

　　Substituting $c_{ij}$ from $\text{Eq.2}$ and adding the $\text{topological weights}$ $M$ and an activation function $\delta$, the $\text{attention}$ coefficient becomes:

　　　　${\large \alpha_{i j}=\frac{\exp \left(\delta M_{i j}\left(\vec{a}^{T}\left[W x_{i} \| W x_{j}\right]\right)\right)}{\sum_{r \in N_{i}} \exp \left(\delta M_{i r}\left(\vec{a}^{T}\left[W x_{i} \| W x_{r}\right]\right)\right)}} \quad\quad\quad(5)$

　　where:

- the activation function $\delta$ is $LeakyReLU$;
- $x_{i}=z_{i}^{0}$ is the input to the problem;
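To make Eq. 5 concrete, here is a minimal sketch for a single node $i$. It assumes the transformed features $W x_j$ are already available as rows of `H`, that the neighbors of $i$ are the nodes with $M_{ij} > 0$, and that $i$ has at least one neighbor. The LeakyReLU slope of 0.2 and all example values are assumptions, not values from the paper.

```python
import math


def leaky_relu(v, slope=0.2):
    # slope=0.2 is an assumed default, not a value stated in the text
    return v if v > 0 else slope * v


def attention_row(i, H, a, M):
    """Eq. 5 for one node i: returns {j: alpha_ij} over the neighbors of i.

    H[j] is the transformed feature W x_j, a is a weight vector of length
    2 * len(H[j]), and M is the proximity matrix.
    """
    neighbors = [j for j in range(len(H)) if M[i][j] > 0]
    scores = {}
    for j in neighbors:
        concat = H[i] + H[j]                          # [W x_i || W x_j]
        c_ij = sum(w * h for w, h in zip(a, concat))  # Eq. 2
        scores[j] = leaky_relu(M[i][j] * c_ij)        # topological weighting
    top = max(scores.values())                        # shift for numerical stability
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    norm = sum(exps.values())
    return {j: e / norm for j, e in exps.items()}


# Hypothetical example data: three nodes with 2-dimensional transformed features.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = [0.5, -0.5, 1.0, 0.25]
M = [[0.25, 0.5, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.5, 0.25]]
alpha_0 = attention_row(0, H, a, M)
```

The returned weights are positive and sum to 1 over the neighborhood, as the softmax in Eq. 4 requires.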

　　Two $\text{graph attention layers}$ are stacked:

　　　　$z_{i}^{(1)}=\sigma\left(\sum \limits _{j \in N_{i}} \alpha_{i j} W^{(0)} x_{j}\right)\quad \quad \quad (6)$

　　　　$z_{i}^{(2)}=\sigma\left(\sum\limits _{j \in N_{i}} \alpha_{i j} W^{(1)} z_{j}^{(1)}\right)\quad \quad\quad(7)$

　　At this point the Encoder has encoded both the structure information and the attribute information (node attributes), and the final embedding is $z_{i}=z_{i}^{(2)}$.
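The two stacked layers of Eqs. 6 and 7 share one aggregation step. A minimal sketch follows, assuming the attention coefficients are passed in as one dictionary per node and using `tanh` as a stand-in for the unspecified nonlinearity $\sigma$; the function name, weights, and inputs are hypothetical.

```python
import math


def attention_layer(X, W, alpha, act=math.tanh):
    """z_i = act( sum_j alpha[i][j] * (W x_j) ) for every node i.

    X: node features (n rows), W: weight matrix (out_dim rows, in_dim columns),
    alpha: list of dicts, alpha[i] maps neighbor j to the coefficient alpha_ij.
    """
    # Transform every node feature once: H[j] = W x_j.
    H = [[sum(w * x for w, x in zip(w_row, x_j)) for w_row in W] for x_j in X]
    Z = []
    for alpha_i in alpha:
        agg = [0.0] * len(W)
        for j, a_ij in alpha_i.items():
            for k in range(len(W)):
                agg[k] += a_ij * H[j][k]
        Z.append([act(v) for v in agg])
    return Z


# Hypothetical two-node example. For brevity the same coefficients are reused
# in both layers; in the model they depend on each layer's own weights.
X = [[1.0, 2.0], [3.0, 4.0]]
alpha = [{0: 0.5, 1: 0.5}, {1: 1.0}]
W0 = [[0.5, -0.5], [0.25, 0.25], [1.0, 0.0]]   # maps 2 -> 3 dimensions
W1 = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5]]       # maps 3 -> 2 dimensions
Z1 = attention_layer(X, W0, alpha)   # Eq. 6
Z2 = attention_layer(Z1, W1, alpha)  # Eq. 7, the final embedding
```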

Inner product decoder

　　The paper uses a simple $\text{Inner product decoder}$ [ its input already contains both attribute values and topology ] to predict the links between nodes:

　　　　$\hat{A}_{i j}=\operatorname{sigmoid}\left(z_{i}{ }^{\top} z_{j}\right)\quad \quad \quad (8)$

　　where:

- $\hat{A}$ is the reconstructed structure matrix of the graph;
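Eq. 8 amounts to a sigmoid over all pairwise dot products of the embeddings. A minimal sketch, with hypothetical embeddings chosen so that nodes 0 and 1 point in similar directions and node 2 does not:

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def inner_product_decoder(Z):
    """A_hat[i][j] = sigmoid(z_i . z_j) for every pair of nodes (Eq. 8)."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in Z] for zi in Z]


Z = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]  # hypothetical embeddings
A_hat = inner_product_decoder(Z)
```

Here `A_hat[0][1]` is above 0.5 (a likely edge) and `A_hat[0][2]` is below 0.5, because the dot product is positive in the first case and negative in the second.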

Reconstruction loss

　　The reconstruction error between $A$ and $\hat{A}$ is minimized:

　　　　$L_{r}=\sum\limits _{i=1}^{n} \operatorname{loss}\left(A_{i, j}, \hat{A}_{i j}\right)\quad\quad \quad (9)$
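Eq. 9 leaves $\operatorname{loss}$ unspecified. The sketch below assumes binary cross-entropy averaged over all node pairs, a common choice for adjacency reconstruction; that choice and the example matrices are assumptions rather than something the text states.

```python
import math


def reconstruction_loss(A, A_hat, eps=1e-12):
    """Mean binary cross-entropy between adjacency A and its reconstruction A_hat.

    Binary cross-entropy is an assumed instance of the generic loss in Eq. 9.
    """
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = min(max(A_hat[i][j], eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= A[i][j] * math.log(p) + (1 - A[i][j]) * math.log(1.0 - p)
    return total / (n * n)


A = [[0, 1], [1, 0]]                       # hypothetical two-node graph
good = [[0.01, 0.99], [0.99, 0.01]]        # close to A
bad = [[0.99, 0.01], [0.01, 0.99]]         # far from A
loss_good = reconstruction_loss(A, good)
loss_bad = reconstruction_loss(A, bad)
```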

4.2 Self-optimizing Embedding

　　Besides optimizing the reconstruction error, the hidden embedding is fed into a self-optimizing clustering module that minimizes the following objective:

　　　　$L_{c}=K L(P \| Q)=\sum\limits_{i} \sum\limits _{u} p_{i u} \log \frac{p_{i u}}{q_{i u}}\quad\quad\quad(10)$

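　　where:

- $q_{iu}$ measures the similarity between the node embedding $z_{i}$ and the cluster center embedding $\mu_{u}$, using Student's t-distribution as the kernel. It can also be read as a soft clustering assignment distribution for each node;
- $p_{iu}$ is the target distribution. Soft assignments with high probability in $Q$ (nodes close to a cluster center) are considered trustworthy, so $Q$ is raised to the second power to put more weight on these high-confidence assignments;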

　　　　${\large q_{i u}=\frac{\left(1+\left\|z_{i}-\mu_{u}\right\|^{2}\right)^{-1}}{\sum\limits _{k}\left(1+\left\|z_{i}-\mu_{k}\right\|^{2}\right)^{-1}}} \quad\quad\quad(11)$

　　　　${\large p_{i u}=\frac{q_{i u}^{2} / \sum_{i} q_{i u}}{\sum_{k}\left(q_{i k}^{2} / \sum_{i} q_{i k}\right)}}\quad \quad\quad(12) $
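Eqs. 10 to 12 can be traced on a few points. The following is a minimal sketch with hypothetical embeddings and cluster centers; the function names are illustrative only.

```python
import math


def soft_assignment(Z, centers):
    """Eq. 11: Student's t kernel between embeddings and cluster centers."""
    Q = []
    for z in Z:
        sims = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(z, mu)))
                for mu in centers]
        s = sum(sims)
        Q.append([v / s for v in sims])
    return Q


def target_distribution(Q):
    """Eq. 12: square Q, divide by the soft cluster frequency, renormalize per node."""
    freq = [sum(col) for col in zip(*Q)]  # sum_i q_iu for each cluster u
    P = []
    for q in Q:
        w = [qu ** 2 / f for qu, f in zip(q, freq)]
        s = sum(w)
        P.append([v / s for v in w])
    return P


def kl_divergence(P, Q):
    """Eq. 10: sum_i sum_u p_iu * log(p_iu / q_iu)."""
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0)


# Hypothetical data: two nodes near the first center, one on the second.
Z = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0]]
centers = [[0.0, 0.0], [3.0, 3.0]]
Q = soft_assignment(Z, centers)
P = target_distribution(Q)
L_c = kl_divergence(P, Q)
```

On this data every row of `P` is more peaked than the matching row of `Q`, which is the sharpening effect that squaring is meant to produce; `L_c` is the gap that training then closes.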

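　　The clustering loss pushes the current distribution $Q$ towards the target distribution $P$, so these “confident assignments” act as soft labels that supervise the embedding learning behind $Q$.

　　Algorithm overview

- First, train the autoencoder without the self-optimizing clustering part to obtain the initial embedding;
- Next, to compute Eq.11, run $k$-means to obtain the initial cluster centers $\mu$;
- Then update $\mu$ and $z$ with SGD according to $L_c$.

　　Note: $P$ is updated every $5$ iterations, while $Q$ is updated at every iteration.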

　　Algorithm steps: see Algorithm 1 in the paper.

4.3 Joint Embedding and Clustering Optimization

　　The reconstruction loss of the autoencoder and the clustering loss are optimized jointly. The total objective is:

　　　　$L=L_{r}+\gamma L_{c}\quad \quad\quad (13)$

　　where:

- $L_{r}$ is the reconstruction loss;
- $L_{c}$ is the clustering loss;
- $\gamma \geq 0$ controls the balance between $L_{r}$ and $L_{c}$;

　　Finally, the soft label of $v_{i}$ is obtained from $Q$:

　　　　$s_{i}=\arg \underset{u}{\text{max}} \; q_{i u}\quad \quad\quad(14)$
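Eqs. 13 and 14 are short enough to write out directly. In this minimal sketch the loss values, the value of `gamma`, and the soft assignments are hypothetical numbers chosen only for illustration.

```python
def total_loss(l_r, l_c, gamma):
    """Eq. 13: L = L_r + gamma * L_c."""
    return l_r + gamma * l_c


def cluster_labels(Q):
    """Eq. 14: s_i = argmax_u q_iu for every node i."""
    return [max(range(len(q)), key=q.__getitem__) for q in Q]


Q = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]   # hypothetical soft assignments
labels = cluster_labels(Q)                  # [0, 1, 0]
L = total_loss(1.0, 0.5, gamma=10.0)        # 6.0
```

Setting `gamma` to 0 reduces the objective to plain autoencoder training, which matches the pretraining step in the algorithm overview.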

　　The algorithm has the following advantages:

- Interplay Exploitation: structure and content information;
- Clustering Specialized Embedding: self-training clustering component;
- Joint Learning: jointly optimizes the two parts of the loss function;

5 Experiments

5.1 Results

6 Conclusion

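　　This paper proposes DAEGC, an unsupervised deep attentional embedding algorithm that jointly performs graph clustering and learns a graph embedding in a unified framework. The learned graph embedding integrates both structure and content information and is specialized for the clustering task. Although graph clustering is naturally unsupervised, the paper proposes a self-training clustering component that generates soft labels from “confident” assignments to supervise the updating of the embedding. The clustering loss and the autoencoder reconstruction loss are optimized jointly to obtain the graph embedding and the clustering result at the same time. A comparison of the experimental results with various state-of-the-art algorithms validates the graph clustering performance of DAEGC.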
