This section briefly introduces GiST indexes in PostgreSQL, covering their basic concepts and structure.

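Introduction
GiST stands for generalized search tree. Like a B-tree, it is a balanced search tree. Unlike a B-tree, which is tightly bound to comparison semantics and supports operators such as greater than, less than, and equal to, GiST can handle data such as geographic data, text documents, and images, for which a B-tree is of no help. The GiST index method lets you define a rule for distributing data of an arbitrary type across a balanced tree, and a method for using that representation to serve access by some operator.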

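Structure
GiST is a height-balanced tree made up of node pages. Each node consists of index rows.
Each row of a leaf node (a leaf row) generally contains some predicate (a Boolean expression) and a reference to a table row (a TID). The indexed data must satisfy this predicate.
Each row of an internal node (an internal row) also contains a predicate, together with a reference to a child node; all indexed data in the child subtree must satisfy this predicate. In other words, the predicate of an internal row covers the predicates of all its child rows. This important property of GiST indexes replaces the simple ordering of a B-tree.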
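To make the "internal predicate covers all child predicates" property concrete, here is a minimal Python sketch that uses bounding boxes as predicates for point data. The Box class, the helper names, and the split of six points into two leaf pages are all assumptions made for illustration; this is not PostgreSQL's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle read as the predicate 'the point lies inside this box'."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains_point(self, p):
        x, y = p
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def bounding_box(points):
    """Smallest box covering the given points, so every point satisfies it."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def union(boxes):
    """Predicate of an internal row: the smallest box covering all child boxes."""
    return Box(min(b.x1 for b in boxes), min(b.y1 for b in boxes),
               max(b.x2 for b in boxes), max(b.y2 for b in boxes))


# Hypothetical layout: six points spread over two leaf pages.
leaf_a = [(1, 1), (3, 2), (6, 3)]
leaf_b = [(5, 5), (7, 8), (8, 6)]

box_a = bounding_box(leaf_a)      # predicate stored in the internal row for leaf_a
box_b = bounding_box(leaf_b)      # predicate stored in the internal row for leaf_b
root_box = union([box_a, box_b])  # predicate one level further up

# The invariant: every indexed point satisfies the predicate of each ancestor row.
invariant_holds = (all(box_a.contains_point(p) for p in leaf_a)
                   and all(box_b.contains_point(p) for p in leaf_b)
                   and all(root_box.contains_point(p) for p in leaf_a + leaf_b))
print(box_a, box_b, root_box, invariant_holds)
```

Unlike B-tree key ranges, nothing here forces sibling boxes to be disjoint, which is why a search may have to descend into more than one subtree.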

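Searching a GiST tree relies on a dedicated consistency function (consistent). It is one of the interface functions defined by PostgreSQL, and each supported operator family provides its own implementation. The consistent function is called for every index row to determine whether that row's predicate is consistent with the search predicate (an expression of the form "indexed-field operator expression"). For an internal row, the function in effect decides whether the search needs to descend into the corresponding subtree.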
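The role of the consistency function can be sketched in Python for the search predicate "p <@ box" (point contained in box). This is a simplified illustration only: boxes are assumed to be tuples (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2, and the function signature is hypothetical. The real support function is written against PostgreSQL's C interface and also receives, among other things, a strategy number identifying the operator.

```python
def point_in_box(p, box):
    """Exact test used for leaf rows: is the point inside the box?"""
    x, y = p
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def boxes_overlap(a, b):
    """Test used for internal rows: do the two rectangles share any area or edge?"""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


def consistent(entry, query_box, is_leaf):
    """Return True if the index row may satisfy 'p <@ query_box'.

    Leaf row: entry is the indexed point, so the answer is exact.
    Internal row: entry is the bounding box of a subtree; True only means
    the subtree might contain matches and has to be visited.
    """
    if is_leaf:
        return point_in_box(entry, query_box)
    return boxes_overlap(entry, query_box)


query = (2, 1, 7, 4)
print(consistent((3, 2), query, is_leaf=True))         # leaf row inside the box
print(consistent((5, 5), query, is_leaf=True))         # leaf row outside the box
print(consistent((1, 1, 6, 3), query, is_leaf=False))  # subtree worth descending into
print(consistent((5, 5, 8, 8), query, is_leaf=False))  # subtree that can be skipped
```

For an internal row a True result is only a "maybe", while a False result is a guarantee that nothing below can match; that asymmetry is what makes pruning safe.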

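The search starts at the root node. The consistency function determines which child nodes (there may be several) need to be entered and which can be skipped, and each entered child node is processed recursively. If the node is a leaf, the rows selected by the function are returned as results.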

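The search is depth-first: the algorithm first tries to reach a leaf node, so that the first result rows can be returned as early as possible (similar to Oracle's FIRST_ROWS hint).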
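The depth-first descent can be sketched as a recursive Python generator. Everything in this sketch is an assumption made for illustration: the Node class, the two-level tree, the TID labels, and the simplified consistency test for "point contained in box" with boxes written as (x1, y1, x2, y2). It shows the idea, not PostgreSQL's code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_leaf: bool
    # Leaf page: (point, tid) pairs. Internal page: (bounding_box, child Node) pairs.
    rows: list = field(default_factory=list)


def consistent(pred, query, is_leaf):
    """Simplified consistency test for 'point contained in query box'."""
    qx1, qy1, qx2, qy2 = query
    if is_leaf:
        x, y = pred
        return qx1 <= x <= qx2 and qy1 <= y <= qy2
    x1, y1, x2, y2 = pred
    return x1 <= qx2 and qx1 <= x2 and y1 <= qy2 and qy1 <= y2


def search(node, query, trace):
    """Depth-first search yielding TIDs; trace records the pages actually visited."""
    trace.append(node.name)
    for pred, target in node.rows:
        if not consistent(pred, query, node.is_leaf):
            continue                      # prune this row (and its subtree, if any)
        if node.is_leaf:
            yield target                  # a matching table row reference
        else:
            yield from search(target, query, trace)  # descend before trying siblings


# Hypothetical two-level tree; the TIDs are made-up labels.
leaf_a = Node("leaf_a", True, [((1, 1), "tid1"), ((3, 2), "tid2"), ((6, 3), "tid3")])
leaf_b = Node("leaf_b", True, [((5, 5), "tid4"), ((7, 8), "tid5"), ((8, 6), "tid6")])
root = Node("root", False, [((1, 1, 6, 3), leaf_a), ((5, 5, 8, 8), leaf_b)])

trace = []
result = list(search(root, (2, 1, 7, 4), trace))
print(result)  # ['tid2', 'tid3']
print(trace)   # ['root', 'leaf_a']
```

Because search is a generator, a caller that needs only the first row can call next() on it and stop, which mirrors how reaching a leaf early lets the first result be returned quickly.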

The following uses the point type as an example. In this example, the index has three levels:
Level one: two large intersecting rectangles are visible

Level two: large rectangles are split into smaller areas

Level three: each rectangle contains as many points as to fit one index page

Below is an example with only one level:

The table and its data are as follows:


testdb=# drop table if exists points;
DROP TABLE
testdb=# create table points(p point);
CREATE TABLE
testdb=# insert into points(p) values
testdb-#   (point '(1,1)'), (point '(3,2)'), (point '(6,3)'),
testdb-#   (point '(5,5)'), (point '(7,8)'), (point '(8,6)');
INSERT 0 6
testdb=# 
testdb=# create index on points using gist(p);
CREATE INDEX

The index structure is as follows:

Run the query:


testdb=# set enable_seqscan = off;
SET
testdb=# explain(costs off) select * from points where p <@ box '(2,1),(7,4)';
                  QUERY PLAN                  
----------------------------------------------
 Index Only Scan using points_p_idx on points
   Index Cond: (p <@ '(7,4),(2,1)'::box)
(2 rows)

The expected result is the two points (3,2) and (6,3), the only ones of the six that lie inside the box:
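The rows the query should return can be checked independently of the index with a brute-force test over the six inserted points. The Python sketch below mimics "p <@ box" under two assumptions: the box is given by any two opposite corners, and points on the boundary count as inside (which makes no difference for this data). The helper name contained_in is made up for illustration.

```python
# The six points inserted into the points table.
points = [(1, 1), (3, 2), (6, 3), (5, 5), (7, 8), (8, 6)]


def contained_in(p, corner_a, corner_b):
    """Mimic 'p <@ box' for a box given by two opposite corners."""
    x, y = p
    lo_x, hi_x = sorted((corner_a[0], corner_b[0]))
    lo_y, hi_y = sorted((corner_a[1], corner_b[1]))
    return lo_x <= x <= hi_x and lo_y <= y <= hi_y


# box '(2,1),(7,4)' from the query
matches = [p for p in points if contained_in(p, (2, 1), (7, 4))]
print(matches)  # [(3, 2), (6, 3)]
```

Swapping the two corners gives the same answer, which is consistent with the plan printing the box as '(7,4),(2,1)'.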

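Index scan process (with a single level, the root page is itself a leaf page, so each of its entries is checked against the search predicate):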

References
Indexes in PostgreSQL — 5 (GiST)

PostgreSQL DBA(48) - Index(GiST)
