Paper Information
Title: Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning
Authors: Ming Jin, Yizhen Zheng, Yuan-Fang Li, Chen Gong, Chuan Zhou, Shirui Pan
Venue: IJCAI 2021
Paper: download
Code: download
1 Introduction
Contribution: the model (MERIT) fuses cross-view contrastive learning with cross-network contrastive learning.
2 Method
The overall framework is illustrated in Figure 1 of the paper. The model consists of three components:
- Graph augmentations
- Cross-network contrastive learning
- Cross-view contrastive learning
2.1 Graph Augmentations
- Graph Diffusion (GD)
$S=\sum\limits _{k=0}^{\infty} \theta_{k} T^{k} \in \mathbb{R}^{N \times N}\quad\quad\quad(1)$
where $T \in \mathbb{R}^{N \times N}$ is the generalized transition matrix and the $\theta_{k}$ are weighting coefficients. This paper adopts the PPR (Personalized PageRank) kernel, whose closed-form solution is
$S=\alpha\left(I-(1-\alpha) D^{-1 / 2} A D^{-1 / 2}\right)^{-1}\quad\quad\quad(2)$
where $\alpha$ is the teleport probability of the random walk and $D$ is the degree matrix of $A$ (see the code sketch after this list).
- Edge Modification (EM)
Given a modification ratio $P$, we first randomly drop a fraction $P/2$ of the existing edges and then randomly add $P/2$ new edges, with both the dropped and the added edges sampled from uniform distributions.
- Subsampling (SS)
We randomly select a node index in the adjacency matrix as a split point and use it to crop the original graph, producing a fixed-size subgraph as the augmented graph view.
- Node Feature Masking (NFM)
Given the feature matrix $X$ and an augmentation ratio $P$, we randomly select a fraction $P$ of the feature dimensions in $X$ and mask them with zeros.
In this paper, SS+EM+NFM is applied to generate the first view, and SS+GD+NFM to generate the second view.
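The following is a minimal NumPy sketch of the four augmentations. The function names, the fixed seed, and details such as editing only the upper triangle of an undirected adjacency matrix are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def graph_diffusion_ppr(A: np.ndarray, alpha: float = 0.15) -> np.ndarray:
    """GD with the PPR kernel, Eq. (2): S = alpha * (I - (1-alpha) * D^{-1/2} A D^{-1/2})^{-1}."""
    d = A.sum(axis=1).astype(float)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.clip(d, 1e-12, None)), 0.0)
    T = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # symmetrically normalized adjacency
    return alpha * np.linalg.inv(np.eye(A.shape[0]) - (1 - alpha) * T)

def edge_modification(A: np.ndarray, p: float) -> np.ndarray:
    """EM: uniformly drop P/2 of the existing edges, then uniformly add P/2 new edges."""
    A = A.copy()
    rows, cols = np.triu_indices_from(A, k=1)           # undirected: edit the upper triangle
    present = np.flatnonzero(A[rows, cols] > 0)
    n_mod = int(len(present) * p / 2)
    drop = rng.choice(present, size=n_mod, replace=False)
    A[rows[drop], cols[drop]] = A[cols[drop], rows[drop]] = 0
    absent = np.flatnonzero(A[rows, cols] == 0)
    add = rng.choice(absent, size=n_mod, replace=False)
    A[rows[add], cols[add]] = A[cols[add], rows[add]] = 1
    return A

def subsampling(A: np.ndarray, X: np.ndarray, size: int):
    """SS: pick a random split index and crop a fixed-size subgraph from it."""
    start = rng.integers(0, A.shape[0] - size + 1)
    s = slice(start, start + size)
    return A[s, s], X[s]

def node_feature_masking(X: np.ndarray, p: float) -> np.ndarray:
    """NFM: zero out a randomly chosen fraction P of the feature dimensions."""
    X = X.copy()
    masked = rng.choice(X.shape[1], size=int(X.shape[1] * p), replace=False)
    X[:, masked] = 0.0
    return X
```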
2.2 Cross-Network Contrastive Learning
MERIT introduces a Siamese network architecture consisting of an online network and a target network with identical structures, each comprising an encoder and a projector (i.e., $g_{\theta}$, $p_{\theta}$ and $g_{\zeta}$, $p_{\zeta}$), plus an additional predictor $q_{\theta}$ on top of the online network, as shown in Figure 1.
This cross-network contrastive scheme is illustrated in Figure 2(a), where:
- $H^{1}=q_{\theta}\left(Z^{1}\right)$
- $H^{2}=q_{\theta}\left(Z^{2}\right)$
- $Z^{1}=p_{\theta}\left(g_{\theta}\left(\tilde{X}_{1}, \tilde{A}_{1}\right)\right)$
- $Z^{2}=p_{\theta}\left(g_{\theta}\left(\tilde{X}_{2}, \tilde{A}_{2}\right)\right)$
- $\hat{Z}^{1}=p_{\zeta}\left(g_{\zeta}\left(\tilde{X}_{1}, \tilde{A}_{1}\right)\right)$
- $\hat{Z}^{2}=p_{\zeta}\left(g_{\zeta}\left(\tilde{X}_{2}, \tilde{A}_{2}\right)\right)$
The target network parameters are updated by a momentum (exponential moving average) mechanism:
$\zeta^{t}=m \cdot \zeta^{t-1}+(1-m) \cdot \theta^{t}\quad\quad\quad(3)$
where $m$, $\zeta$, and $\theta$ denote the momentum coefficient, the target network parameters, and the online network parameters, respectively. A code sketch of this update follows.
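A minimal PyTorch sketch of Eq. (3), assuming `online` and `target` are two architecturally identical `nn.Module`s (encoder plus projector); the default $m=0.99$ is an illustrative value, not one from the paper.

```python
import torch

@torch.no_grad()  # the target network receives no gradients
def momentum_update(online: torch.nn.Module, target: torch.nn.Module, m: float = 0.99) -> None:
    """Eq. (3): zeta^t = m * zeta^{t-1} + (1 - m) * theta^t."""
    for theta, zeta in zip(online.parameters(), target.parameters()):
        zeta.data.mul_(m).add_(theta.data, alpha=1 - m)
```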
The cross-network contrastive loss is defined as:
$\mathcal{L}_{c n}=\frac{1}{2 N} \sum\limits _{i=1}^{N}\left(\mathcal{L}_{c n}^{1}\left(v_{i}\right)+\mathcal{L}_{c n}^{2}\left(v_{i}\right)\right)\quad\quad\quad(6)$
where:
$\mathcal{L}_{c n}^{1}\left(v_{i}\right)=-\log {\large \frac{\exp \left(\operatorname{sim}\left(h_{v_{i}}^{1}, \hat{z}_{v_{i}}^{2}\right)\right)}{\sum_{j=1}^{N} \exp \left(\operatorname{sim}\left(h_{v_{i}}^{1}, \hat{z}_{v_{j}}^{2}\right)\right)}}\quad\quad\quad(4) $
$\mathcal{L}_{c n}^{2}\left(v_{i}\right)=-\log {\large \frac{\exp \left(\operatorname{sim}\left(h_{v_{i}}^{2}, \hat{z}_{v_{i}}^{1}\right)\right)}{\sum_{j=1}^{N} \exp \left(\operatorname{sim}\left(h_{v_{i}}^{2}, \hat{z}_{v_{j}}^{1}\right)\right)}}\quad\quad\quad(5) $
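A sketch of Eqs. (4)-(6) in PyTorch: for each node the online prediction is contrasted against the target projection of the other view, with the remaining target embeddings as negatives. Taking $\operatorname{sim}$ to be cosine similarity (and omitting any temperature scaling) is an assumption here.

```python
import torch
import torch.nn.functional as F

def cross_network_loss(h1: torch.Tensor, h2: torch.Tensor,
                       z1_hat: torch.Tensor, z2_hat: torch.Tensor) -> torch.Tensor:
    """Eqs. (4)-(6): (h^1_i, zhat^2_i) is the positive pair for node v_i; the other
    target embeddings of that view serve as negatives (and symmetrically)."""
    def nce(h: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        sim = F.normalize(h, dim=1) @ F.normalize(z, dim=1).T   # N x N cosine similarities
        labels = torch.arange(h.size(0), device=h.device)       # positives on the diagonal
        return F.cross_entropy(sim, labels)                     # mean of -log softmax(diag)
    return 0.5 * (nce(h1, z2_hat) + nce(h2, z1_hat))            # Eq. (6)
```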
2.3 Cross-View Contrastive Learning
The overall cross-view contrastive loss averages a per-node loss over both views:
$\mathcal{L}_{c v}=\frac{1}{2 N} \sum\limits _{i=1}^{N}\left(\mathcal{L}_{c v}^{1}\left(v_{i}\right)+\mathcal{L}_{c v}^{2}\left(v_{i}\right)\right)\quad\quad\quad(9)$
where each per-node loss combines an intra-view and an inter-view term:
$\mathcal{L}_{c v}^{k}\left(v_{i}\right)=\mathcal{L}_{\text {intra }}^{k}\left(v_{i}\right)+\mathcal{L}_{\text {inter }}^{k}\left(v_{i}\right), \quad k \in\{1,2\}\quad\quad\quad(10)$
Taking the first view as an example, the inter-view term contrasts a node against all nodes in the other view, while the intra-view term keeps the same positive pair but draws its negatives from the other nodes in the same view:
$\mathcal{L}_{\text {inter }}^{1}\left(v_{i}\right)=-\log {\large \frac{\exp \left(\operatorname{sim}\left(h_{v_{i}}^{1}, h_{v_{i}}^{2}\right)\right)}{\sum_{j=1}^{N} \exp \left(\operatorname{sim}\left(h_{v_{i}}^{1}, h_{v_{j}}^{2}\right)\right)}}\quad\quad\quad(7) $
$\begin{aligned}\mathcal{L}_{i n t r a}^{1}\left(v_{i}\right) &=-\log \frac{\exp \left(\operatorname{sim}\left(h_{v_{i}}^{1}, h_{v_{i}}^{2}\right)\right)}{\exp \left(\operatorname{sim}\left(h_{v_{i}}^{1}, h_{v_{i}}^{2}\right)\right)+\Phi} \\\Phi &=\sum\limits_{j=1}^{N} \mathbb{1}_{i \neq j} \exp \left(\operatorname{sim}\left(h_{v_{i}}^{1}, h_{v_{j}}^{1}\right)\right)\end{aligned}\quad\quad\quad(8)$
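Continuing the sketch, Eqs. (7)-(10) can be written as follows, again under the cosine-similarity assumption:

```python
import torch
import torch.nn.functional as F

def cross_view_loss(h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
    """Eqs. (7)-(10): inter-view and intra-view contrast within the online network."""
    n = h1.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=h1.device)
    h1n, h2n = F.normalize(h1, dim=1), F.normalize(h2, dim=1)
    s12 = h1n @ h2n.T                     # cross-view similarities; positives on the diagonal
    s11, s22 = h1n @ h1n.T, h2n @ h2n.T   # same-view similarities

    def per_view(s_cross: torch.Tensor, s_same: torch.Tensor) -> torch.Tensor:
        pos = s_cross.diagonal()
        # Eq. (7): negatives are all nodes in the other view.
        inter = torch.logsumexp(s_cross, dim=1) - pos
        # Eq. (8): denominator is the positive pair plus the same-view negatives (j != i).
        den = torch.cat([pos.unsqueeze(1), s_same.masked_fill(eye, float("-inf"))], dim=1)
        intra = torch.logsumexp(den, dim=1) - pos
        return (inter + intra).mean()                  # Eq. (10), averaged over nodes

    return 0.5 * (per_view(s12, s11) + per_view(s12.T, s22))   # Eq. (9)
```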
2.4 Model Training
The final training objective combines the two contrastive losses:
$\mathcal{L}=\beta \mathcal{L}_{c v}+(1-\beta) \mathcal{L}_{c n}\quad\quad\quad(11)$
where $\beta \in[0,1]$ trades off the cross-view and cross-network terms. A sketch of the combined objective follows.
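Putting the pieces together under Eq. (11); this builds on the sketches above, and `online`, `target`, `optimizer`, and $\beta=0.6$ in the commented usage are illustrative placeholders.

```python
def merit_loss(h1, h2, z1_hat, z2_hat, beta: float = 0.5) -> torch.Tensor:
    """Eq. (11): convex combination of the two contrastive objectives."""
    return beta * cross_view_loss(h1, h2) + (1 - beta) * cross_network_loss(h1, h2, z1_hat, z2_hat)

# One illustrative training step (embeddings come from the two augmented views):
# loss = merit_loss(h1, h2, z1_hat, z2_hat, beta=0.6)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
# momentum_update(online, target, m=0.99)   # Eq. (3), after each gradient step
```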
3 Experiment
Datasets
Baseline experiments