iGraph: Graph Mining for Social Network Analysis

Posted by 周慶成 on 2012-05-08

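Social networks such as Facebook and Twitter capture a great deal of people's lives. People interact with one another in many different ways, and much of that interaction is recorded in the network. Mining useful information from such a site can help an organization become more competitive.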

I recently came across a tool called "iGraph" that provides some very effective graph mining capabilities. Here are a few that I found interesting.

Creating a graph

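A graph is made up of nodes and edges, and both can carry a set of attributes (key/value pairs). An edge can be directed or undirected, and it can also be given a weight.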

> library(igraph)
> # Create a directed graph
> g <- graph(c(0,1, 0,2, 1,3, 0,3), directed=T)
> g
Vertices: 4
Edges: 4
Directed: TRUE
Edges:

[0] 0 -> 1
[1] 0 -> 2
[2] 1 -> 3
[3] 0 -> 3
> # Create a directed graph using adjacency matrix
> m <- matrix(runif(4*4), nrow=4)
> m
          [,1]      [,2]      [,3]      [,4]
[1,] 0.4086389 0.2160924 0.1557989 0.2896239
[2,] 0.4669456 0.1071071 0.1290673 0.3715809
[3,] 0.2031678 0.3911691 0.5906273 0.7417764
[4,] 0.8808119 0.7687493 0.9734323 0.4487252
> g <- graph.adjacency(m > 0.5)
> g
Vertices: 4
Edges: 5
Directed: TRUE
Edges:

[0] 2 -> 2
[1] 2 -> 3
[2] 3 -> 0
[3] 3 -> 1
[4] 3 -> 2
> plot(g, layout=layout.fruchterman.reingold)
>
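The session above does not exercise the attributes and weights mentioned earlier. The following is a minimal sketch, assuming a recent igraph release (1.0 or later) in which vertex ids start at 1 and the constructor is named `make_graph()`. The session above uses the older 0-based API. The labels and weights are made up for illustration.

```r
library(igraph)

# Same four directed edges as the first example, shifted to 1-based ids.
g <- make_graph(c(1,2, 1,3, 2,4, 1,4), directed = TRUE)

# Attributes are key/value pairs set through V() and E().
V(g)$label  <- c("ann", "bob", "cat", "dan")  # hypothetical vertex attribute
E(g)$weight <- c(0.5, 2, 1, 3)                # hypothetical edge weights

vertex_attr_names(g)  # "label"
is_weighted(g)        # TRUE, because a "weight" edge attribute exists
```

Algorithms such as shortest paths and minimum spanning trees pick up the `weight` edge attribute by default, which is why that particular key is used here.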


iGraph also provides simple ways to create graphs with a variety of regular shapes.

> #Create a full graph
> g1 <- graph.full(4)
> g1
Vertices: 4
Edges: 6
Directed: FALSE
Edges:

[0] 0 -- 1
[1] 0 -- 2
[2] 0 -- 3
[3] 1 -- 2
[4] 1 -- 3
[5] 2 -- 3
> #Create a ring graph
> g2 <- graph.ring(3)
> g2
Vertices: 3
Edges: 3
Directed: FALSE
Edges:

[0] 0 -- 1
[1] 1 -- 2
[2] 0 -- 2
> #Combine 2 graphs
> g <- g1 %du% g2
> g
Vertices: 7
Edges: 9
Directed: FALSE
Edges:

[0] 0 -- 1
[1] 0 -- 2
[2] 0 -- 3
[3] 1 -- 2
[4] 1 -- 3
[5] 2 -- 3
[6] 4 -- 5
[7] 5 -- 6
[8] 4 -- 6
> graph.difference(g, graph(c(0,1,0,2), directed=F))
Vertices: 7
Edges: 7
Directed: FALSE
Edges:

[0] 0 -- 3
[1] 1 -- 3
[2] 1 -- 2
[3] 2 -- 3
[4] 4 -- 6
[5] 4 -- 5
[6] 5 -- 6
> # Create a lattice
> g1 = graph.lattice(c(3,4,2))
> # Create a tree
> g2 = graph.tree(12, children=2)
> plot(g1, layout=layout.fruchterman.reingold)
> plot(g2, layout=layout.reingold.tilford)


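iGraph also provides two further ways to generate graphs. A "random graph" places edges between arbitrary pairs of nodes. "Preferential attachment" gives additional edges to nodes that already have a high degree (the rich get richer).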

# Generate random graph, fixed probability
> g <- erdos.renyi.game(20, 0.3)
> plot(g, layout=layout.fruchterman.reingold, vertex.label=NA, vertex.size=5)

# Generate random graph, fixed number of arcs
> g <- erdos.renyi.game(20, 15, type='gnm')

# Generate preferential attachment graph
> g <- barabasi.game(60, power=1, zero.appeal=1.3)
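To make the difference between the two generators concrete, here is a small sketch that checks a few structural facts about each. It assumes a recent igraph release, where `erdos.renyi.game(..., type='gnm')` and `barabasi.game()` are available as `sample_gnm()` and `sample_pa()`. The seed is arbitrary.

```r
library(igraph)
set.seed(42)  # arbitrary seed so repeated runs draw the same graphs

# G(n, m): exactly 15 edges placed uniformly at random among 20 vertices.
g_random <- sample_gnm(20, 15)

# Preferential attachment: every new vertex adds one edge and favours
# vertices that already have a high degree.
g_pa <- sample_pa(60, power = 1, m = 1, directed = FALSE)

ecount(g_random)      # 15
ecount(g_pa)          # 59: one edge for each vertex after the first
is_connected(g_pa)    # TRUE: with m = 1 the result is a tree

# Compare the largest hub in each graph (values depend on the seed).
max(degree(g_random))
max(degree(g_pa))
```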


Basic graph algorithms

This section shows how to use iGraph to run some basic graph algorithms.

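The minimum spanning tree algorithm finds a set of edges that connects every node in the graph while keeping the total edge weight as small as possible.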

# Create the graph and assign random edge weights
> g <- erdos.renyi.game(12, 0.35)
> E(g)$weight <- round(runif(length(E(g))),2) * 50
> plot(g, layout=layout.fruchterman.reingold, edge.label=E(g)$weight)
# Compute the minimum spanning tree
> mst <- minimum.spanning.tree(g)
> plot(mst, layout=layout.reingold.tilford,  edge.label=E(mst)$weight)
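Because the graph above is random, its spanning tree differs on every run. A small hand-built graph makes the result easy to check by eye. This sketch assumes a recent igraph release, where `minimum.spanning.tree()` is available as `mst()` and vertex ids are 1-based. The weights are made up.

```r
library(igraph)

# A square 1-2-3-4 with one diagonal (1-3); weights chosen by hand.
g <- make_graph(c(1,2, 2,3, 3,4, 4,1, 1,3), directed = FALSE)
E(g)$weight <- c(1, 2, 3, 4, 5)

# mst() uses the "weight" edge attribute when one is present.
tree <- mst(g)

ecount(tree)          # 3: a spanning tree on 4 vertices has n - 1 edges
sum(E(tree)$weight)   # 6: the edges with weights 1, 2 and 3
```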


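The connected components algorithm finds the islands of nodes that are connected to one another, that is, groups in which one node can be reached from another along a path. Note that connectivity is symmetric in an undirected graph, but not necessarily in a directed graph, where node A may be able to reach node B while node B cannot reach node A. Directed graphs therefore distinguish two notions. Under "strong" connectivity, two nodes count as connected only if each can reach the other. Under "weak" connectivity, two nodes count as connected if a path joins them when edge directions are ignored.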

> g <- graph(c(0, 1, 1, 2, 2, 0, 1, 3, 3, 4, 4, 5, 5, 3, 4, 6, 6, 7, 7, 8, 8, 6, 9, 10, 10, 11, 11, 9))
# Nodes reachable from node4
> subcomponent(g, 4, mode="out")
[1] 4 5 6 3 7 8
# Nodes who can reach node4
> subcomponent(g, 4, mode="in")
[1] 4 3 1 5 0 2

> clusters(g, mode="weak")
$membership
 [1] 0 0 0 0 0 0 0 0 0 1 1 1
$csize
[1] 9 3
$no
[1] 2

> myc <- clusters(g, mode="strong")
> myc
$membership
 [1] 1 1 1 2 2 2 3 3 3 0 0 0
$csize
[1] 3 3 3 3
$no
[1] 4

> mycolor <- c('green', 'yellow', 'red', 'skyblue')
> V(g)$color <- mycolor[myc$membership + 1]
> plot(g, layout=layout.fruchterman.reingold)
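The smallest case that separates the two notions is a single directed edge. This is a minimal sketch, assuming a recent igraph release where `clusters()` is available as `components()` and vertex ids are 1-based.

```r
library(igraph)

# One directed edge: vertex 1 can reach vertex 2, but not the other way round.
g <- make_graph(c(1,2), directed = TRUE)

components(g, mode = "weak")$no     # 1: joined once direction is ignored
components(g, mode = "strong")$no   # 2: no path back from 2 to 1

# Adding the reverse edge makes the pair strongly connected as well.
g2 <- add_edges(g, c(2,1))
components(g2, mode = "strong")$no  # 1
```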

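Shortest path is among the most widely used graph algorithms: it finds the shortest path from node A to node B. iGraph uses breadth-first search when the graph is unweighted (that is, every edge has weight 1), Dijkstra's algorithm when all edge weights are positive, and the Bellman-Ford algorithm when some edge weights are negative.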

> g <- erdos.renyi.game(12, 0.25)
> plot(g, layout=layout.fruchterman.reingold)
> pa <- get.shortest.paths(g, 5, 9)[[1]]
> pa
[1] 5 0 4 9
> V(g)[pa]$color <- 'green'
> E(g)$color <- 'grey'
> E(g, path=pa)$color <- 'red'
> E(g, path=pa)$width <- 3
> plot(g, layout=layout.fruchterman.reingold)
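The example above only covers the unweighted case. The sketch below shows how weights change the answer, including a negative weight. It assumes a recent igraph release, where the distance matrix function is `distances()` and accepts an `algorithm` argument. The weights are made up.

```r
library(igraph)

# Directed triangle: 1->2 directly, or 1->3->2 as a detour.
g <- make_graph(c(1,2, 1,3, 3,2), directed = TRUE)
E(g)$weight <- c(4, 3, -1)   # hypothetical weights, one of them negative

# Ignoring weights (weights = NA) counts hops, as breadth-first search does.
d_hops <- distances(g, v = 1, to = 2, mode = "out", weights = NA)

# With weights, the detour costs 3 + (-1) = 2 and beats the direct edge (4).
d_weighted <- distances(g, v = 1, to = 2, mode = "out")

# The algorithm can also be requested explicitly.
d_bf <- distances(g, v = 1, to = 2, mode = "out", algorithm = "bellman-ford")

d_hops[1, 1]      # 1
d_weighted[1, 1]  # 2
d_bf[1, 1]        # 2
```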


Graph statistics

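A number of statistics give a general idea of the shape of a graph. At the highest level, we can look at summary statistics for the graph as a whole, including:
- The size of the graph (the number of nodes and edges)
- The density of the graph: is it dense (|E| proportional to |V|^2) or sparse (|E| proportional to |V|)?
- Whether the graph is well connected (most nodes can reach one another) or disconnected (many isolated islands)
- The diameter of the graph (the longest shortest-path distance between any two nodes)
- The reciprocity of a directed graph, which measures how symmetric it is
- The distribution of in/out degree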

> # Create a random graph 
> g <- erdos.renyi.game(200, 0.01)
> plot(g, layout=layout.fruchterman.reingold, vertex.label=NA, vertex.size=3)
> # No of nodes
> length(V(g))
[1] 200
> # No of edges
> length(E(g))
[1] 197
> # Density (No of edges / possible edges)
> graph.density(g)
[1] 0.009899497
> # Number of islands
> clusters(g)$no
[1] 34
> # Global clustering coefficient:
> # (closed triplets / all triplets)
> transitivity(g, type="global")
[1] 0.015
> # Edge connectivity, 0 since graph is disconnected
> edge.connectivity(g)
[1] 0
> # Same as graph adhesion
> graph.adhesion(g)
[1] 0
> # Diameter of the graph
> diameter(g)
[1] 18
> # Reciprocity of the graph
> reciprocity(g)
[1] 1
> degree.distribution(g)
[1] 0.135 0.280 0.315 0.110 0.095 0.050 0.005 0.010
> plot(degree.distribution(g), xlab="node degree")
> lines(degree.distribution(g))
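The density reported above can be reproduced by hand, which also shows what "possible edges" means for an undirected graph without self-loops. This sketch uses base R only and plugs in the node and edge counts from the session above.

```r
n_vertices <- 200
n_edges    <- 197

# An undirected simple graph has at most n * (n - 1) / 2 edges.
possible_edges <- choose(n_vertices, 2)   # 19900

density <- n_edges / possible_edges
density   # about 0.0099, matching graph.density(g) above
```

With roughly one edge per node, this graph sits firmly at the sparse end: |E| grows like |V| rather than |V|^2.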


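Drilling down a level, we can also look at statistics for each pair of nodes, such as:
- The connectivity between two nodes: the number of paths between them that share no edges (in other words, how many edges must be removed to disconnect them)
- The shortest path between two nodes
- The number and length of the paths between two nodes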

> # Create a random graph
> g <- erdos.renyi.game(9, 0.5)
> plot(g, layout=layout.fruchterman.reingold)
> # Compute the shortest path matrix
> shortest.paths(g)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    0    1    3    1    2    2    1    3    2
 [2,]    1    0    2    2    3    2    2    2    1
 [3,]    3    2    0    2    1    2    2    2    1
 [4,]    1    2    2    0    3    1    2    2    1
 [5,]    2    3    1    3    0    3    1    3    2
 [6,]    2    2    2    1    3    0    2    1    1
 [7,]    1    2    2    2    1    2    0    2    1
 [8,]    3    2    2    2    3    1    2    0    1
 [9,]    2    1    1    1    2    1    1    1    0
> # Compute the connectivity matrix
> M <- matrix(rep(0, 81), nrow=9)
> M
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    0    0    0    0    0    0    0    0    0
 [2,]    0    0    0    0    0    0    0    0    0
 [3,]    0    0    0    0    0    0    0    0    0
 [4,]    0    0    0    0    0    0    0    0    0
 [5,]    0    0    0    0    0    0    0    0    0
 [6,]    0    0    0    0    0    0    0    0    0
 [7,]    0    0    0    0    0    0    0    0    0
 [8,]    0    0    0    0    0    0    0    0    0
 [9,]    0    0    0    0    0    0    0    0    0
> for (i in 0:8) {
+   for (j in 0:8) {
+     if (i == j) {
+       M[i+1, j+1] <- -1
+     } else {
+       M[i+1, j+1] <- edge.connectivity(g, i, j)
+     }
+   }
+ }
> M
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]   -1    2    2    3    2    3    3    2    3
 [2,]    2   -1    2    2    2    2    2    2    2
 [3,]    2    2   -1    2    2    2    2    2    2
 [4,]    3    2    2   -1    2    3    3    2    3
 [5,]    2    2    2    2   -1    2    2    2    2
 [6,]    3    2    2    3    2   -1    3    2    3
 [7,]    3    2    2    3    2    3   -1    2    3
 [8,]    2    2    2    2    2    2    2   -1    2
 [9,]    3    2    2    3    2    3    3    2   -1
>
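Pairwise edge connectivity is easiest to see on a ring, where any two nodes are joined by exactly two edge-disjoint routes. This is a minimal sketch, assuming a recent igraph release where the function is named `edge_connectivity()` and vertex ids are 1-based.

```r
library(igraph)

# A 5-cycle: 1-2-3-4-5-1.
g <- make_ring(5)

# Two edge-disjoint routes join 1 and 3 (clockwise and counter-clockwise),
# so two edges must be cut to separate them.
edge_connectivity(g, source = 1, target = 3)    # 2

# After removing edge 1-2 only one route is left.
cut <- delete_edges(g, "1|2")
edge_connectivity(cut, source = 1, target = 3)  # 1
```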


Centrality measures

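At the finest level of detail, we can look at statistics for individual nodes. These numbers measure how "central" a node is:
- Nodes with a high in/out degree have high "degree centrality"
- Nodes with short paths to all other nodes have high "closeness centrality"
- Nodes that lie on the shortest paths between many pairs of other nodes have high "betweenness"
- Nodes connected to many nodes that are themselves highly central have high "eigenvector centrality"
- The local clustering coefficient measures how interconnected a node's neighbors are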

> # Degree
> degree(g)
[1] 2 2 2 2 2 3 3 2 6
> # Closeness (inverse of average dist)
> closeness(g)
[1] 0.4444444 0.5333333 0.5333333 0.5000000
[5] 0.4444444 0.5333333 0.6153846 0.5000000
[9] 0.8000000
> # Betweenness
> betweenness(g)
[1]  0.8333333  2.3333333  2.3333333
[4]  0.0000000  0.8333333  0.5000000
[7]  6.3333333  0.0000000 18.8333333
> # Local cluster coefficient
> transitivity(g, type="local")
[1] 0.0000000 0.0000000 0.0000000 1.0000000
[5] 0.0000000 0.6666667 0.0000000 1.0000000
[9] 0.1333333
> # Eigenvector centrality
> evcent(g)$vector
[1] 0.3019857 0.4197153 0.4197153 0.5381294
[5] 0.3019857 0.6693142 0.5170651 0.5381294
[9] 1.0000000
> # Now rank them
> order(degree(g))
[1] 1 2 3 4 5 8 6 7 9
> order(closeness(g))
[1] 1 5 4 8 2 3 6 7 9
> order(betweenness(g))
[1] 4 8 6 1 5 2 3 7 9
> order(evcent(g)$vector)
[1] 1 5 2 3 7 4 8 6 9
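On a random graph it is hard to tell whether these numbers are plausible. A star graph gives values that can be worked out by hand. The sketch below assumes a recent igraph release with 1-based vertex ids. Note that `closeness()` there is unnormalized by default, so `normalized = TRUE` is passed to get the "inverse of average distance" used above.

```r
library(igraph)

# Star with 5 vertices: vertex 1 is the hub, vertices 2 to 5 are leaves.
g <- make_star(5, mode = "undirected")

deg <- degree(g)                       # 4 1 1 1 1
btw <- betweenness(g)                  # 6 0 0 0 0
clo <- closeness(g, normalized = TRUE)

# The hub lies on the shortest path of every leaf pair: choose(4, 2) = 6.
# The hub is at distance 1 from every other vertex, so its closeness is 1.
# A leaf is at distance 1 from the hub and 2 from three leaves:
# average distance 7/4, closeness 4/7.
clo
```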

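Drew Conway found that people with low eigenvector centrality but high betweenness are important connectors, while people with high eigenvector centrality but low betweenness are linked to important people. Let's plot eigenvector centrality against betweenness.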

> # Create a graph
> g1 <- barabasi.game(100, directed=F)
> g2 <- barabasi.game(100, directed=F)
> g <- g1 %u% g2
> lay <- layout.fruchterman.reingold(g)
> # Plot the eigenvector and betweenness centrality
> plot(evcent(g)$vector, betweenness(g))
> text(evcent(g)$vector, betweenness(g), 0:99, cex=0.6, pos=4)
> V(g)[12]$color <- 'red'
> V(g)[8]$color <- 'green'
> plot(g, layout=lay, vertex.size=8, vertex.label.cex=0.6)
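The "low eigenvector centrality, high betweenness" pattern can be reproduced on a small hand-built graph: two tight groups joined only through one intermediary. This is an illustrative sketch, not taken from Drew Conway's work. It assumes a recent igraph release, where `evcent()` is available as `eigen_centrality()` and vertex ids are 1-based.

```r
library(igraph)

# Two 4-cliques (vertices 1-4 and 5-8) with no edges between them.
g <- make_full_graph(4) %du% make_full_graph(4)

# Vertex 9 is the only link between the two groups.
g <- add_vertices(g, 1)
g <- add_edges(g, c(4,9, 9,5))

btw <- betweenness(g)
evc <- eigen_centrality(g)$vector

btw[9]          # 16: every one of the 4 x 4 cross-group pairs passes through 9
which.max(btw)  # 9: highest betweenness in the graph
which.min(evc)  # 9: lowest eigenvector centrality in the graph
```

Vertex 9 has only two neighbors, so it scores low on eigenvector centrality, yet removing it would split the network in two.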


In later posts I will walk through some specific examples of social network analysis.

Original article: Basic graph analytics using igraph (access may require bypassing a firewall)
