技術分享 | 用圖資料庫來降低 MySQL 處理多層關係的延遲（一）

作者：楊濤濤
資深資料庫專家，專研 MySQL 十餘年。擅長 MySQL、PostgreSQL、MongoDB 等開源資料庫相關的備份恢復、SQL 調優、監控運維、高可用架構設計等。目前任職於愛可生，為各大運營商及銀行金融企業提供 MySQL 相關技術支援、MySQL 相關課程培訓等工作。
本文來源：原創投稿
*愛可生開源社群出品，原創內容未經授權不得隨意使用，轉載請聯絡小編並註明來源。

MySQL 資料庫可以用來處理絕大多數線上業務場景，而且能處理得很好，無論從單節點效能，或者說從多機擴充套件後的總體吞吐量來講，都很佔優勢。不過，萬事無絕對，MySQL 在某些場景下效能並不能達到預期。例如在各種繁雜的關係處理方面，MySQL處理起來就有些吃力。此類場景下，NoSQL 就比關係型資料庫要更加合適。本篇我們用圖資料庫Neo4J（NoSQL的一種）來舉例說明在這種場景下如何彌補 MySQL 的不足。

這裡先用簡單的人物關係來舉例說明。

對於簡單的人物關係，比如說小宋、小李、小楊、小愛、小徐、小娜、小喬幾個人之間的關係鏈條可能如下：

小宋 “認識” 小李。
小李 ”認識“ 小楊。
小楊 “認識“ 小愛。
小愛 ”認識” 小徐。
小徐 “認識“ 小娜。
小娜 ”認識” 小喬。

其中 “認識” 即為幾個人之間的關係。這樣的關係有很多種，比如 “認識”、“見過”、”好友“、”同事“，”暗戀“、”戀人“ 等等。本篇我們先來看基本的關係：”認識“。也就是說關係僅僅“認識”而已，別無其他。

假設有如下三個需求：

找出使用者總數目。
找出“認識” 小楊並且小楊又 “認識” 的人。
找出小楊 “認識” 的 “認識” 的 “認識” 的 “認識” 的人。

對於這樣幾個需求，我們先基於 MySQL 來設計兩張表：（如果僅僅實現最後兩個需求，只需要表 2 即可。）

表1：使用者表，儲存使用者記錄；表2：使用者關係表，儲存使用者之間的關係。為了簡化起見，兩張表無主鍵，由MySQL 內部分配。

mysql:ytt>create table user1 (name varchar(64));
Query OK, 0 rows affected (0.09 sec)

mysql:ytt>insert user1 values ("小宋"),("小李"),("小楊"),("小愛"),("小徐"),("小娜"),("小喬");
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0


mysql:ytt>create table relation1 (user_name varchar(64),friend_name varchar(64));
Query OK, 0 rows affected (0.07 sec)

mysql:ytt>insert relation1 values ("小宋","小李"),("小李","小楊"),("小楊","小愛"),("小愛","小徐"),("小徐","小娜"),("小娜","小喬");
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

那我們來實現以上三個需求：

找出使用者總數目：非常簡單，直接求 count(*) 即可。

   mysql:ytt>select count(*) as total_number from user1;
   +--------------+
   | total_number |
   +--------------+
   |            7 |
   +--------------+
   1 row in set (0.00 sec)

找出“認識” 小楊並且小楊又 “認識” 的人：由於記錄條數少，直接全表過濾。

   mysql:ytt>select * from (select (case when friend_name = "小楊" then user_name when user_name = "小楊" then friend_name end) as user_name from relation1) as sub_table where user_name is not null;
   +-----------+
   | user_name |
   +-----------+
   | 小李      |
   | 小愛      |
   +-----------+
   2 rows in set (0.00 sec)

找出小楊 “認識” 的 “認識” 的 “認識” 的 “認識” 的人：也就是找到以小楊為起點的四層關係網的終端使用者名。

   mysql:ytt>select d.friend_name from relation1 b 
       -> inner join relation1 a on  b.user_name = "小楊" and b.friend_name = a.user_name
       -> inner join relation1 c on a.friend_name = c.user_name
       -> inner join relation1 d on c.friend_name = d.user_name;
   +-------------+
   | friend_name |
   +-------------+
   | 小喬        |
   +-------------+
   1 row in set (0.00 sec)

以上三個需求，特別是第三個。找出以小楊為起點的關係網，層數越多，需要 join 的表個數越多。在 MySQL 裡，表關聯數量越多，效能也就越差，後期我將會在“SQL 優化” 主題裡繼續延伸探討這個問題。

我們用圖資料庫 Neo4J 來解決同樣的需求。

建立使用者節點以及使用者之間的關係，

neo4j@ytt> create (x1:user {name:"小宋"}),
(x2:user {name:"小李"}),
(x3:user {name:"小楊"}),
(x4:user {name:"小愛"}),
(x5:user {name:"小徐"}),
(x6:user {name:"小娜"}),
(x7:user {name:"小喬"})
with x1,x2,x3,x4,x5,x6,x7
create (x1)-[:認識]->(x2),
(x2)-[:認識]->(x3),
(x3)-[:認識]->(x4),
(x4)-[:認識]->(x5),
(x5)-[:認識]->(x6),
(x6)-[:認識]->(x7);
0 rows
ready to start consuming query after 269 ms, results consumed after another 0 ms
Added 7 nodes, Created 6 relationships, Set 7 properties, Added 7 labels

對應的關係圖展示如下：

以上需求在 Neo4J 裡的實現如下：

需求一：

   neo4j@ytt> match(x:user) return count(*) as total_number;
   +--------------+
   | total_number |
   +--------------+
   | 7            |
   +--------------+
   
   1 row
   ready to start consuming query after 21 ms, results consumed after another 1 ms

需求二：

   neo4j@ytt> match (y1:user)-[:認識]->(x:user {name:"小楊"})-[:認識]->(y2:user) return y1.name,y2.name;
   +-------------------+
   | y1.name | y2.name |
   +-------------------+
   | "小李"    | "小愛"    |
   +-------------------+
   
   1 row
   ready to start consuming query after 95 ms, results consumed after another 2 ms

需求三：

   neo4j@ytt> match(x:user {name:"小楊"})-[:認識*4]->(y) return y.name;
   +--------+
   | y.name |
   +--------+
   | "小喬"   |
   +--------+
   
   1 row
   ready to start consuming query after 398 ms, results consumed after another 174 ms

單從三個需求的查詢結果上看，MySQL 和 Neo4J 效能差不多，區別只是寫法上有些差異。但是如果把資料量放大，特別是對需求三的處理，MySQL 就有點吃力了。

我們來把資料放大數倍，使用者表記錄數增加到千條，關係表記錄數增加到十萬條。

分別給使用者表和關係表造點資料：(user1.csv 裡包含 1100個使用者，relation1.csv 裡包含 1W 條記錄，每個使用者大概“認識” 100 個人，並且給關係表加上索引。)

mysql:ytt>truncate table user1;
Query OK, 0 rows affected (0.19 sec)

mysql:ytt>load data infile '/var/lib/mysql-files/user1.csv' into table user1 fields terminated by ',' enclosed by '"';
Query OK, 1100 rows affected (0.10 sec)
Records: 1100  Deleted: 0  Skipped: 0  Warnings: 0


mysql:ytt>truncate table relation1;
Query OK, 0 rows affected (0.11 sec)

mysql:ytt>load data infile '/var/lib/mysql-files/relation1.csv' into table relation1 fields terminated by ',' enclosed by '"';
Query OK, 100000 rows affected (1.60 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

mysql:ytt>alter table relation1 add key idx_user_name (user_name), add key idx_friend_name(friend_name);
Query OK, 0 rows affected (4.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

表部分資料如下：

mysql:ytt>table user1 limit 2;
+-------+
| name  |
+-------+
| user1 |
| user2 |
+-------+
2 rows in set (0.00 sec)

mysql:ytt>table relation1 limit 2;
+-----------+-------------+
| user_name | friend_name |
+-----------+-------------+
| user1     | user101     |
| user2     | user101     |
+-----------+-------------+
2 rows in set (0.00 sec)

接下來實現第三個需求：

這裡把使用者“小楊” 替換為“user1”，由於結果很多，所以只求個總數即可，花費時間4分多鐘，對於使用者端來講，已經無法接受。

mysql:ytt>select count(distinct d.friend_name) as cnt from relation1 b 
    -> inner join relation1 a on  b.user_name = "user1" and b.friend_name = a.user_name
    -> inner join relation1 c on a.friend_name = c.user_name
    -> inner join relation1 d on c.friend_name = d.user_name;


+-----+
| cnt |
+-----+
| 100 |
+-----+
1 row in set (4 min 15.47 sec)

接下來把 MySQL 資料匯入到 Neo4J ，來繼續實現同樣的需求：

匯入節點：

neo4j@ytt> load csv from  "file:///user1.csv" as x create (a:user {name:x[0]});
0 rows
ready to start consuming query after 523 ms, results consumed after another 0 ms
Added 1100 nodes, Set 1100 properties, Added 1100 labels

匯入關係：

neo4j@ytt> load csv from "file:///relation1.csv" as x with x match (a:user {name:x[0]}), (b:user {name:x[1]}) merge (a)-[:認識]-(b);

Created 100000 relationships, completed after 94636 ms.

給節點加上索引：

neo4j@ytt> create index on :user(name);
0 rows
ready to start consuming query after 31 ms, results consumed after another 0 ms
Added 1 indexes

Neo4J 實現第三個需求：執行時間小於1秒，已經比 MySQL 快了很多倍。

neo4j@ytt> match(x:user {name:"user1"})-[*4]->(y) 
return count(distinct y.name) as total_friend;
+--------------+
| total_friend |
+--------------+
| 100          |
+--------------+

1 row
ready to start consuming query after 44 ms,
results consumed after another 692 ms

總結：

本篇基於圖資料庫在處理人物關係上優於關係型資料庫的基礎上做了簡單介紹，更多關係處理，請繼續閱讀後續篇章。

技術分享 | 用圖資料庫來降低 MySQL 處理多層關係的延遲（一）

假設有如下三個需求：

找出使用者總數目。

找出“認識” 小楊 並且 小楊 又 “認識” 的人。

找出小楊 “認識” 的 “認識” 的 “認識” 的 “認識” 的人。

對於這樣幾個需求，我們先基於 MySQL 來設計兩張表：（如果僅僅實現最後兩個需求，只需要表 2 即可。）

我們用圖資料庫 Neo4J 來解決同樣的需求。

相關文章

找出“認識” 小楊並且小楊又 “認識” 的人。