分層資料 Hierarchical Data 探索 (3.巢狀集合模型) 無限極分類

arunfung發表於2019-05-16

分層資料Hierarchical Data探索(例如:無限級分類、多級選單、省份城市)

引言

第一篇 分層資料Hierarchical Data探索(1.遞迴) 已經介紹了分層資料以及使用遞迴演算法實現了無限極分類,但是遞迴即浪費時間,又浪費空間(記憶體),尤其是在資料量大的情況下效率顯著下降。
第二篇 分層資料Hierarchical Data探索(2.鄰接表模型) 介紹了一種資料模型鄰接表模型來實現,但在檢索路徑的過程中,除了本層外,每一層都會對應一個LEFT JOIN,那麼如果層數不定怎麼辦?或者層數過多?

鄰接表模型的侷限性

用純SQL編碼實現鄰接表模型有一定的難度。在我們檢索某分類的路徑之前,我們需要知道該分類所在的層次。在刪除中間層的節點時,需要同時刪除該節點下的所有節點,否則會出現孤立節點。

那麼,在MySQL中如何更好的處理分層資料呢?下面我們來說一說巢狀集合模型

巢狀集合模型(Nested Set Model)

更多 巢狀集合模型(Nested Set Model)的介紹請見:wiki

在巢狀集合模型中,我們將以一種新的方式來理解我們的分層資料,不再是線與點了,而是巢狀容器。下圖以巢狀容器的方式畫出了electronics分類圖:

通過集合的包含關係,巢狀結合模型可以表示分層結構,每一個分層可以用一個Set來表示(一個圈),父節點所在的圈包含所有子節點所在的圈。

為了用MySQL來表示集合關係,需要定義連個欄位 lftrgt (表示一個集合的範圍)。

# 為了模擬,我們建立一個表category包含三個欄位:id,title,lft,rgt如下:
CREATE TABLE category (
  id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title varchar(255) NOT NULL,
  lft int(10) NOT NULL,
  rgt int(10) NOT NULL
);

# 插入模擬資料
INSERT INTO category(title,lft,rgt) VALUES('Electronics',1,28);

INSERT INTO category(title,lft,rgt) VALUES('Laptops & PC',2,7);

INSERT INTO category(title,lft,rgt) VALUES('Laptops',3,4);
INSERT INTO category(title,lft,rgt) VALUES('PC',5,6);

INSERT INTO category(title,lft,rgt) VALUES('Cameras & photo',8,11);
INSERT INTO category(title,lft,rgt) VALUES('Camera',9,10);

INSERT INTO category(title,lft,rgt) VALUES('Phones & Accessories',12,27);
INSERT INTO category(title,lft,rgt) VALUES('Smartphones',13,20);

INSERT INTO category(title,lft,rgt) VALUES('Android',14,15);
INSERT INTO category(title,lft,rgt) VALUES('iOS',16,17);
INSERT INTO category(title,lft,rgt) VALUES('Other Smartphones',18,19);

INSERT INTO category(title,lft,rgt) VALUES('Batteries',21,22);
INSERT INTO category(title,lft,rgt) VALUES('Headsets',23,24);
INSERT INTO category(title,lft,rgt) VALUES('Screen Protectors',25,26);

select * from category;
+----+----------------------+-----+-----+
| id | title                | lft | rgt |
+----+----------------------+-----+-----+
|  1 | Electronics          |   1 |  28 |
|  2 | Laptops & PC         |   2 |   7 |
|  3 | Laptops              |   3 |   4 |
|  4 | PC                   |   5 |   6 |
|  5 | Cameras & photo      |   8 |  11 |
|  6 | Camera               |   9 |  10 |
|  7 | Phones & Accessories |  12 |  27 |
|  8 | Smartphones          |  13 |  20 |
|  9 | Android              |  14 |  15 |
| 10 | iOS                  |  16 |  17 |
| 11 | Other Smartphones    |  18 |  19 |
| 12 | Batteries            |  21 |  22 |
| 13 | Headsets             |  23 |  24 |
| 14 | Screen Protectors    |  25 |  26 |
+----+----------------------+-----+-----+
14 rows in set (0.00 sec)
  • 檢索分層路徑

由於子節點的 lft 值總在父節點的 lft 和 rgt 值之間,所以可以通過父節點連線到子節點上來檢索整棵樹

SELECT node.id,node.title,node.lft,node.rgt
FROM category AS node,
     category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.title = 'Electronics'
ORDER BY node.lft;
+----+----------------------+-----+-----+
| id | title                | lft | rgt |
+----+----------------------+-----+-----+
|  1 | Electronics          |   1 |  28 |
|  2 | Laptops & PC         |   2 |   7 |
|  3 | Laptops              |   3 |   4 |
|  4 | PC                   |   5 |   6 |
|  5 | Cameras & photo      |   8 |  11 |
|  6 | Camera               |   9 |  10 |
|  7 | Phones & Accessories |  12 |  27 |
|  8 | Smartphones          |  13 |  20 |
|  9 | Android              |  14 |  15 |
| 10 | iOS                  |  16 |  17 |
| 11 | Other Smartphones    |  18 |  19 |
| 12 | Batteries            |  21 |  22 |
| 13 | Headsets             |  23 |  24 |
| 14 | Screen Protectors    |  25 |  26 |
+----+----------------------+-----+-----+
14 rows in set (0.05 sec)

不像之前鄰接表模型的例子,這個查詢語句不管樹的層次有多深都能很好的工作。在BETWEEN的子句中我們沒有去關心node的rgt值,是因為使用node的rgt值得出的父節點總是和使用lft值得出的是相同的。

  • 檢索所有葉子節點

檢索出所有的葉子節點,使用巢狀集合模型的方法比鄰接表模型的LEFT JOIN方法簡單多了。如果你仔細得看了category表,你可能已經注意到葉子節點的左右值是連續的。要檢索出葉子節點,我們只要查詢滿足 rgt=lft+1 的節點:

SELECT id,title,lft,rgt
FROM category
WHERE rgt = lft + 1;
+----+-------------------+-----+-----+
| id | title             | lft | rgt |
+----+-------------------+-----+-----+
|  3 | Laptops           |   3 |   4 |
|  4 | PC                |   5 |   6 |
|  6 | Camera            |   9 |  10 |
|  9 | Android           |  14 |  15 |
| 10 | iOS               |  16 |  17 |
| 11 | Other Smartphones |  18 |  19 |
| 12 | Batteries         |  21 |  22 |
| 13 | Headsets          |  23 |  24 |
| 14 | Screen Protectors |  25 |  26 |
+----+-------------------+-----+-----+
9 rows in set (0.00 sec)

查詢

  • 檢索單一路徑

在巢狀集合模型中,我們可以不用多個自連線就可以檢索出單一路徑:

SELECT parent.id,parent.title,parent.lft,parent.rgt
FROM category AS node,
     category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.title = 'PC'
ORDER BY parent.lft;

+----+--------------+-----+-----+
| id | title        | lft | rgt |
+----+--------------+-----+-----+
|  1 | Electronics  |   1 |  28 |
|  2 | Laptops & PC |   2 |   7 |
|  4 | PC           |   5 |   6 |
+----+--------------+-----+-----+
3 rows in set (0.00 sec)
  • 檢索節點的深度

我們已經知道怎樣去呈現一棵整樹,但是為了更好的標識出節點在樹中所處層次,我們怎樣才能檢索出節點在樹中的層級呢?我們可以在之前的查詢語句上增加COUNT函式和GROUP BY子句來實現:

SELECT node.title,(COUNT(parent.title) - 1) AS lev
FROM category AS node,
     category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.title
ORDER BY node.lft;

+----------------------+-----+
| title                | lev |
+----------------------+-----+
| Electronics          |   0 |
| Laptops & PC         |   1 |
| Laptops              |   2 |
| PC                   |   2 |
| Cameras & photo      |   1 |
| Camera               |   2 |
| Phones & Accessories |   1 |
| Smartphones          |   2 |
| Android              |   3 |
| iOS                  |   3 |
| Other Smartphones    |   3 |
| Batteries            |   2 |
| Headsets             |   2 |
| Screen Protectors    |   2 |
+----------------------+-----+
14 rows in set (0.01 sec)

如果當前MySQL版本是5.7或者以上可能會出現 1055 的報錯,下面是是解決辦法

報錯:
ERROR 1055 (42000): Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'test.node.lft' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

原因:In 5.7 the sqlmode is set by default to:
ONLY_FULL_GROUP_BY,NO_AUTO_CREATE_USER,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION

解決:To remove the clause ONLY_FULL_GROUP_BY you can do this:
SET sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));

This supposed you need to make that GROUP BY with non aggregated columns.

我們可以根據 lev 值來縮排分類名字,使用 CONCAT 和 REPEAT 字串函式:

SELECT CONCAT( REPEAT(' ', COUNT(parent.title) - 1), node.title) AS name,(COUNT(parent.title) - 1) AS lev
FROM category AS node,
     category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.title
ORDER BY node.lft;
+-----------------------+-----+
| name                  | lev |
+-----------------------+-----+
| Electronics           |   0 |
|  Laptops & PC         |   1 |
|   Laptops             |   2 |
|   PC                  |   2 |
|  Cameras & photo      |   1 |
|   Camera              |   2 |
|  Phones & Accessories |   1 |
|   Smartphones         |   2 |
|    Android            |   3 |
|    iOS                |   3 |
|    Other Smartphones  |   3 |
|   Batteries           |   2 |
|   Headsets            |   2 |
|   Screen Protectors   |   2 |
+-----------------------+-----+
14 rows in set (0.01 sec)
  • 檢索子樹的深度
SELECT node.title, (COUNT(parent.title) - (sub_tree.lev + 1)) AS lev
FROM category AS node,
    category AS parent,
    category AS sub_parent,
    (
        SELECT node.title, (COUNT(parent.title) - 1) AS lev
        FROM category AS node,
        category AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.title = 'Phones & Accessories'
        GROUP BY node.title
        ORDER BY node.lft
    ) AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
    AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
    AND sub_parent.title = sub_tree.title
GROUP BY node.title
ORDER BY node.lft;

這個查詢語句可以檢索出任一節點子樹的深度值,包括根節點。這裡的深度值跟你指定的節點有關。

  • 檢索節點的直接子節點

可以想象一下,你在零售網站上呈現電子產品的分類。當使用者點選分類後,你將要呈現該分類下的產品,同時也需列出該分類下的直接子分類,而不是該分類下的全部分類。為此,我們只呈現該節點及其直接子節點,不再呈現更深層次的節點。
要實現它非常的簡單,在先前的查詢語句上新增 HAVING 子句:

SELECT node.title, (COUNT(parent.title) - (sub_tree.lev + 1)) AS lev
FROM category AS node,
    category AS parent,
    category AS sub_parent,
    (
        SELECT node.title, (COUNT(parent.title) - 1) AS lev
        FROM category AS node,
        category AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.title = 'Phones & Accessories'
        GROUP BY node.title
        ORDER BY node.lft
    ) AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
    AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
    AND sub_parent.title = sub_tree.title
GROUP BY node.title
HAVING lev <= 1
ORDER BY node.lft;

如果你不希望呈現父節點,你可以更改 HAVING lev <= 1HAVING lev = 1

新增節點

  • 新增同一層次的節點

到現在,我們已經知道了如何去查詢我們的樹,是時候關注一下如何增加一個新節點來更新我們的樹了。
當我們想要在 Laptops & PCCameras & photo節點之間新增一個節點,新節點的 lft 和 rgt 的 值為8和9,所有該節點的右邊節點的lft和rgt值都將加2,之後我們再新增新節點並賦相應的lft和rgt值。我使用了鎖表(LOCK TABLES)語句來隔離查詢:

LOCK TABLE category WRITE;

SELECT @myRight := rgt FROM category WHERE title = 'Laptops & PC';

UPDATE category SET rgt = rgt + 2 WHERE rgt > @myRight;
UPDATE category SET lft = lft + 2 WHERE lft > @myRight;

INSERT INTO category(title, lft, rgt) VALUES('Game Consoles', @myRight + 1, @myRight + 2);

UNLOCK TABLES;

我們可以檢驗一下新節點插入的正確性:
SELECT CONCAT( REPEAT(' ', COUNT(parent.title) - 1), node.title) AS name,(COUNT(parent.title) - 1) AS lev
FROM category AS node,
     category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.title
ORDER BY node.lft;

+-----------------------+-----+
| name                  | lev |
+-----------------------+-----+
| Electronics           |   0 |
|  Laptops & PC         |   1 |
|   Laptops             |   2 |
|   PC                  |   2 |
|  Game Consoles        |   1 |
|  Cameras & photo      |   1 |
|   Camera              |   2 |
|  Phones & Accessories |   1 |
|   Smartphones         |   2 |
|    Android            |   3 |
|    iOS                |   3 |
|    Other Smartphones  |   3 |
|   Batteries           |   2 |
|   Headsets            |   2 |
|   Screen Protectors   |   2 |
+-----------------------+-----+
15 rows in set (0.00 sec)
  • 新增葉子節點

如果我們想要在葉子節點下增加節點,我們得稍微修改一下查詢語句。讓我們在 Camera 葉子節點下新增 SLR 節點:

LOCK TABLE category WRITE;

SELECT @myLeft := lft FROM category WHERE title = 'Camera';

UPDATE category SET rgt = rgt + 2 WHERE rgt > @myLeft;
UPDATE category SET lft = lft + 2 WHERE lft > @myLeft;

INSERT INTO category(title, lft, rgt) VALUES('SLR', @myLeft + 1, @myLeft + 2);

UNLOCK TABLES;

刪除節點

最後刪除節點。刪除節點的處理過程跟節點在分層資料中所處的位置有關,刪除一個葉子節點比刪除一個子節點要簡單得多,因為刪除子節點的時候,我們需要去處理孤立節點。

  • 刪除葉子節點

刪除一個葉子節點的過程正好是新增一個葉子節點的逆過程,我們在刪除節點的同時該節點右邊所有節點的左右值和該父節點的右值都會減去該節點的寬度值:

LOCK TABLE category WRITE;

SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1 FROM category WHERE title = 'Game Consoles';

DELETE FROM category WHERE lft BETWEEN @myLeft AND @myRight;

UPDATE category SET rgt = rgt - @myWidth WHERE rgt > @myRight;
UPDATE category SET lft = lft - @myWidth WHERE lft > @myRight;

UNLOCK TABLES;
  • 刪除子節點以及整顆子樹
LOCK TABLE category WRITE;

SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1 FROM category WHERE title = 'Cameras & photo';

DELETE FROM category WHERE lft BETWEEN @myLeft AND @myRight;

UPDATE category SET rgt = rgt - @myWidth WHERE rgt > @myRight;
UPDATE category SET lft = lft - @myWidth WHERE lft > @myRight;

UNLOCK TABLES;
  • 刪除該節點,而不刪除該節點的子節點
LOCK TABLE category WRITE;

SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1 FROM category WHERE title = 'Cameras & photo';

DELETE FROM category WHERE lft = @myLeft;

UPDATE category SET rgt = rgt - 1, lft = lft - 1 WHERE lft BETWEEN @myLeft AND @myRight;
UPDATE category SET rgt = rgt - 2 WHERE rgt > @myRight;
UPDATE category SET lft = lft - 2 WHERE lft > @myRight;

UNLOCK TABLES;

在這個例子中,我們對該節點所有右邊節點的左右值都減去了2(因為不考慮其子節點,該節點的寬度為2),對該節點的子節點的左右值都減去了1(彌補由於失去父節點的左值造成的裂縫)

參考資源

  • 著作權歸作者所有。商業轉載請聯絡作者獲得授權,非商業轉載請註明出處。
  • 文章來源https://blog.arunfung.com

相關文章