PostgreSQL儲存智慧-空間聚集儲存

德哥發表於2017-09-11

背景

單身漢的房間可能是這樣的，凌亂得很。

可能會有人表示不服，不是應該這樣的嗎？

實際上大多數情況下資料庫儲存也可能是這樣的，資料凌亂。索引的目的是在凌亂的資料中找到目標資料，但是從凌亂的資料中按索引順序掃描一批資料，會有什麼後果呢：

沒錯，IO放大，我在以前的文章中有詳細介紹過其原理。

《索引順序掃描引發的堆掃描IO放大背後的統計學原理與解決辦法 – PostgreSQL index scan enlarge heap page scans when index and column correlation small.》

對於亂序存放的資料，如果按索引順序來調整儲存，聚集後，可以減少IO放大。

對於空間資料，應該如何存放呢？

內窺BLOCK – pageinspect

PostgreSQL 外掛有一個外掛pageinspect，可以解析BLOCK檢視索引、堆表的內容。

但是一直沒有支援gist, sp-gist的內窺，不過馬上就會支援了，看這個patch。

https://www.postgresql.org/message-id/flat/accae316-5e4d-8963-0c3d-277ef13c396c%40postgrespro.ru#accae316-5e4d-8963-0c3d-277ef13c396c@postgrespro.ru

Hi all,    
    
the following patch transfers functionality from gevel module    
(http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for    
analyzing GIN and GiST indexes to pageinspect. Gevel was originally    
designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and    
GIN indexes.    
    
Functions added:    
  - gist_stat(text) - shows statistics on GiST Tree    
  - gist_tree(text) - shows GiST tree    
  - gist_tree(text, int4) - shows GiST tree up to MAXLEVEL    
  - gist_print(text) - prints objects stored in GiST tree    
  - spgist_stat(text) - shows statistics on SP-GiST    
  - spgist_print(text) - prints objects stored in index    
  - gin_value_count() - originally gin_stat(text) - prints estimated counts    
for index values    
  - gin_stats() - originally gin_statpage(text) - shows statistics    
  - gin_count_estimate(text, tsquery) - shows number of indexed rows matched query    
    
Tests also transferred, docs for new functions are added.     
I run pgindent over the code,     
but the result is different from those I expected,     
so I leave pgindented one.    
The patch is applicable to the commit     
866f4a7c210857aa342bf901558d170325094dde.    
    
--     
Alexey Chernyshov    
Postgres Professional: http://www.postgrespro.com    
The Russian Postgres Company

內窺GiST, SP-GiST索引

1、打補丁

patch -p1 < ../0001-gevel_to_pageinspect-v1.patch     
patching file contrib/pageinspect/Makefile    
patching file contrib/pageinspect/brinfuncs.c    
patching file contrib/pageinspect/data/rect.data    
patching file contrib/pageinspect/data/test__int.data    
patching file contrib/pageinspect/expected/gin.out    
patching file contrib/pageinspect/expected/gist.out    
patching file contrib/pageinspect/expected/spgist.out    
patching file contrib/pageinspect/ginfuncs.c    
patching file contrib/pageinspect/gistfuncs.c    
patching file contrib/pageinspect/hashfuncs.c    
patching file contrib/pageinspect/pageinspect--1.6--1.7.sql    
patching file contrib/pageinspect/pageinspect.control    
patching file contrib/pageinspect/spgistfuncs.c    
patching file contrib/pageinspect/sql/gin.sql    
patching file contrib/pageinspect/sql/gist.sql    
patching file contrib/pageinspect/sql/spgist.sql    
patching file doc/src/sgml/pageinspect.sgml    
Hunk #2 succeeded at 830 (offset 5 lines).    
    
    
cd contrib/pageinspect/    
make    
make install

2、載入pageinspect外掛

postgres=# create extension pageinspect;    
CREATE EXTENSION

3、幾個函式介面的講解

3.1 檢視GiST索引概貌，比如層級，佔用多少個PAGE，多少葉子節點，多少TUPLE等。

gist_stat(INDEXNAME) – show some statistics about GiST tree

regression=# select gist_stat(`pix`);    
 Number of levels:          2    
Number of pages:           30    
Number of leaf pages:      29    
Number of tuples:          3129    
Number of leaf tuples:     3100    
Total size of tuples:      137676 bytes    
Total size of leaf tuples: 136400 bytes    
Total size of index:       245760 bytes

3.2 列出GiST索引的每個層級的概貌，直到某個層級的節點。

gist_tree(INDEXNAME,MAXLEVEL) – show GiST tree up to MAXLEVEL

regression=# select gist_tree(`pix`,0);    
 0(l:0) blk: 0 numTuple: 29 free: 6888b(15.63%)    
Designation (from left to right):    
    
0 - page number    
(l:0) - tree level    
blk: 0 - block number    
numTuple: 29 - the number of tuples    
free: 6888b - free space in bytes    
(15.63%) - occupied space in percents    
gist_tree(INDEXNAME) - show full GiST tree    
regression=# select gist_tree(`pix`);    
 0(l:0) blk: 0 numTuple: 29 free: 6888b(15.63%)    
    1(l:1) blk: 13 numTuple: 180 free: 244b(97.01%)    
    2(l:1) blk: 11 numTuple: 175 free: 464b(94.32%)    
    3(l:1) blk: 2 numTuple: 101 free: 3720b(54.43%)    
    4(l:1) blk: 17 numTuple: 111 free: 3280b(59.82%)    
    5(l:1) blk: 18 numTuple: 101 free: 3720b(54.43%)    
    6(l:1) blk: 10 numTuple: 98 free: 3852b(52.82%)    
    7(l:1) blk: 19 numTuple: 111 free: 3280b(59.82%)    
    8(l:1) blk: 9 numTuple: 97 free: 3896b(52.28%)    
    9(l:1) blk: 20 numTuple: 104 free: 3588b(56.05%)    
    10(l:1) blk: 14 numTuple: 96 free: 3940b(51.74%)    
    11(l:1) blk: 21 numTuple: 106 free: 3500b(57.13%)    
    12(l:1) blk: 7 numTuple: 103 free: 3632b(55.51%)    
    13(l:1) blk: 1 numTuple: 101 free: 3720b(54.43%)    
    14(l:1) blk: 16 numTuple: 97 free: 3896b(52.28%)    
    15(l:1) blk: 24 numTuple: 103 free: 3632b(55.51%)    
    16(l:1) blk: 4 numTuple: 98 free: 3852b(52.82%)    
    17(l:1) blk: 25 numTuple: 98 free: 3852b(52.82%)    
    18(l:1) blk: 3 numTuple: 97 free: 3896b(52.28%)    
    19(l:1) blk: 26 numTuple: 96 free: 3940b(51.74%)    
    20(l:1) blk: 6 numTuple: 103 free: 3632b(55.51%)    
    21(l:1) blk: 8 numTuple: 162 free: 1036b(87.31%)    
    22(l:1) blk: 23 numTuple: 94 free: 4028b(50.66%)    
    23(l:1) blk: 12 numTuple: 82 free: 4556b(44.19%)    
    24(l:1) blk: 27 numTuple: 105 free: 3544b(56.59%)    
    25(l:1) blk: 5 numTuple: 90 free: 4204b(48.51%)    
    26(l:1) blk: 28 numTuple: 100 free: 3764b(53.90%)    
    27(l:1) blk: 22 numTuple: 101 free: 3720b(54.43%)    
    28(l:1) blk: 15 numTuple: 95 free: 3984b(51.20%)    
    29(l:1) blk: 29 numTuple: 95 free: 3984b(51.20%)

3.3 列印GiST索引的詳細內容，這裡包含了每一個索引的區間，例如國家、省、市。。。BOX邊界。

gist_print(INDEXNAME) – prints objects stored in GiST tree, works only if objects in index have textual representation (type_out functions should be implemented for given object type).

It`s known to work with R-tree GiST based index (contrib/rtree_gist).

Note, in example below, objects are of type box.

for 8.1 and later this query should looks like    
    
# select * from gist_print(`pix`) as t(level int, valid bool, a box) where level =1;    
 level | valid |              a    
-------+-------+-----------------------------    
     1 | t     | (37357,50073),(34242,357)  -- 一個BOX的左下，右上POINT，R-Tree的下一級節點的資料都在這個box內。    
     1 | t     | (43499,49770),(40358,43)    
     1 | t     | (31193,24679),(25047,12410)    
     1 | t     | (31018,12142),(25083,6)    
     1 | t     | (49944,25174),(43471,12802)    
     1 | t     | (12577,49757),(6302,37534)    
     1 | t     | (12528,37333),(6171,24861)    
     1 | t     | (50027,49751),(46817,25462)    
     1 | t     | (46870,49912),(43664,25722)    
     1 | t     | (24855,25574),(12447,19263)    
     1 | t     | (25054,19126),(12403,12796)    
     1 | t     | (32737,49923),(31178,1038)      
     1 | t     | (3184,24465),(15,81)    
     1 | t     | (24951,49983),(12740,44000)    
     1 | t     | (24919,43956),(12617,37901)    
     1 | t     | (40387,49852),(37338,25217)    
     1 | t     | (40325,24963),(37375,491)      
     1 | t     | (24919,12698),(12654,6518)    
     1 | t     | (25002,6338),(12350,51)       
     1 | t     | (49985,12554),(43447,222)      
     1 | t     | (25003,37769),(12552,25573)    
     1 | t     | (34270,49382),(32763,594)      
     1 | t     | (6205,50012),(3,37527)    
     1 | t     | (6163,37358),(120,25034)      
     1 | t     | (12343,24542),(9295,294)    
     1 | t     | (9308,24151),(6234,620)    
     1 | t     | (6230,24629),(3169,108)    
     1 | t     | (31179,50040),(28113,25556)    
     1 | t     | (28048,49694),(25000,25000)    
(29 rows)    
    
Using Gevel module it`s possible to visualize index tree, see for example Rtree Index.

GiST索引影像

我們可以把GIST索引的內容，按層級繪製出影像來。

例如一級長這樣，就是大BOX。

下一級（葉子節點），長這樣，就是大BOX下的一個個小BOX。

聚集儲存對效能的提升分析

如果我們按照GiST索引的層級排序，聚集儲存，那麼在基於空間掃描資料的時候，掃描的塊就更少。

假設落在杭州地區有100萬條資料，按照單身漢的亂序儲存，杭州的資料可能散落分佈在很多個PAGE中，我們掃描時也許掃描的是全量資料，而且是離散掃，效能損耗比較大。如果按照GiST聚集儲存，那麼會有極大的效能提升。

空間聚集後的效能驗證

1、建立測試表

postgres=# create table test(id int, pos point);    
CREATE TABLE

2、寫入1000萬測試記錄，隨機點位

postgres=# insert into test select id, point(round(random()*1000), round(random()*1000)) from generate_series(1,10000000) t(id);    
INSERT 0 10000000    
    
postgres=# select * from test limit 10;    
 id |    pos        
----+-----------    
  1 | (335,286)    
  2 | (948,616)    
  3 | (591,851)    
  4 | (695,158)    
  5 | (728,801)    
  6 | (544,191)    
  7 | (851,940)    
  8 | (720,487)    
  9 | (790,344)    
 10 | (50,956)    
(10 rows)

3、建立空間索引

postgres=# create index idx_test_pos on test using gist(pos);    
CREATE INDEX

4、內窺空間索引

postgres=# select * from gist_stats(`idx_test_pos`);    
                 gist_stats                     
--------------------------------------------    
 Number of levels:          4              +    
 Number of pages:           87396          +    
 Number of leaf pages:      86626          +    
 Number of tuples:          10087395       +    
 Number of invalid tuples:  0              +    
 Number of leaf tuples:     10000000       +    
 Total size of tuples:      444894132 bytes+    
 Total size of leaf tuples: 441039512 bytes+    
 Total size of index:       715948032 bytes+    
     
(1 row)    
    
postgres=# select * from gist_tree(`idx_test_pos`,1);    
                                             gist_tree                                                 
---------------------------------------------------------------------------------------------------    
 0(l:0) blk: 0 numTuple: 7 free: 7840b(3.92%) rightlink:4294967295 (InvalidBlockNumber)           +    
     1(l:1) blk: 46732 numTuple: 108 free: 3396b(58.38%) rightlink:28877 (OK)                     +    
     2(l:1) blk: 18831 numTuple: 107 free: 3440b(57.84%) rightlink:56294 (OK)                     +    
     3(l:1) blk: 56294 numTuple: 159 free: 1152b(85.88%) rightlink:4294967295 (InvalidBlockNumber)+    
     4(l:1) blk: 18830 numTuple: 109 free: 3352b(58.92%) rightlink:71045 (OK)                     +    
     5(l:1) blk: 71045 numTuple: 90 free: 4188b(48.68%) rightlink:46732 (OK)                      +    
     6(l:1) blk: 28877 numTuple: 72 free: 4980b(38.97%) rightlink:78677 (OK)                      +    
     7(l:1) blk: 78677 numTuple: 117 free: 3000b(63.24%) rightlink:18831 (OK)                     +    
     
(1 row)    
    
postgres=# select * from gist_print(`idx_test_pos`) as t(level int, valid bool, a box) where level =1;    
 level | valid |           a               
-------+-------+-----------------------    
     1 | t     | (655,626),(422,0)    
     1 | t     | (1000,411),(647,0)    
     1 | t     | (1000,1000),(647,411)    
     1 | t     | (425,337),(0,0)    
     1 | t     | (425,626),(0,331)    
     1 | t     | (248,1000),(0,617)    
     1 | t     | (655,1000),(245,617)    
(7 rows)

4.1、末端（葉子）節點的BOX如下。

postgres=# select * from (select * from gist_print(`idx_test_pos`) as t(level int, valid bool, a box) where level =4) t limit 10;    
 level | valid |          a              
-------+-------+---------------------    
     4 | t     | (630,107),(630,107)    
     4 | t     | (631,105),(631,105)    
     4 | t     | (629,107),(629,107)    
     4 | t     | (629,105),(629,105)    
     4 | t     | (631,109),(631,109)    
     4 | t     | (629,105),(629,105)    
     4 | t     | (631,105),(631,105)    
     4 | t     | (630,106),(630,106)    
     4 | t     | (629,105),(629,105)    
     4 | t     | (630,108),(630,108)    
(10 rows)

5、空間聚集前，也就是從單身漢的凌絡臥室空間掃描落在某一個BOX內的資料。

返回101518條記錄，掃描了50914個HEAP資料塊。

postgres=# explain (analyze,verbose,timing,costs) select * from test where box(`(100,100),(200,200)`) @> pos;    
                                                            QUERY PLAN                                                                
----------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.test  (cost=183.52..10728.49 rows=10000 width=20) (actual time=22.710..96.798 rows=101518 loops=1)    
   Output: id, pos    
   Recheck Cond: (`(200,200),(100,100)`::box @> test.pos)    
   Heap Blocks: exact=50914    
   ->  Bitmap Index Scan on idx_test_pos  (cost=0.00..181.02 rows=10000 width=0) (actual time=15.422..15.422 rows=101518 loops=1)    
         Index Cond: (`(200,200),(100,100)`::box @> test.pos)    
 Planning time: 0.044 ms    
 Execution time: 104.395 ms    
(8 rows)

6、按GiST空間聚集，整理資料。

PostgreSQL提供了一個聚集語法：

Command:     CLUSTER  
Description: cluster a table according to an index  
Syntax:  
CLUSTER [VERBOSE] table_name [ USING index_name ]  
CLUSTER [VERBOSE]

使用GiST索引對資料進行聚集：

postgres=# cluster test USING idx_test_pos ;  
CLUSTER

驗證聚集後的塊掃描效能。

返回101518條記錄，掃描了691個HEAP資料塊。

explain (analyze,verbose,timing,costs,buffers) select * from test where box(`(100,100),(200,200)`) @> pos;    
    
                                                          QUERY PLAN                                                               
-------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.test  (cost=193.12..10549.51 rows=10000 width=16) (actual time=8.092..21.497 rows=101518 loops=1)    
   Output: pos    
   Recheck Cond: (`(200,200),(100,100)`::box @> test.pos)    
   Heap Blocks: exact=691    
   Buffers: shared hit=1727    
   ->  Bitmap Index Scan on idx_test_pos  (cost=0.00..190.62 rows=10000 width=0) (actual time=7.997..7.997 rows=101518 loops=1)    
         Index Cond: (`(200,200),(100,100)`::box @> test.pos)    
         Buffers: shared hit=1036    
 Planning time: 0.137 ms    
 Execution time: 29.650 ms    
(10 rows)

使用空間聚集，按某個空間條件查詢並返回101518記錄，對比聚集前後，掃描的HEAP資料塊數目從50914降到了691。

參考

http://www.sai.msu.su/~megera/wiki/Gevel

http://www.sai.msu.su/~megera/wiki/Rtree_Index

【儲存管理】表空間概念
2016-10-23
Synology群暉NAS儲存正確建立儲存池和儲存空間的方法
2020-08-04
禪道 - 儲存空間釋放
2016-11-17
【儲存管理】建立永久表空間
2016-10-23
innodb表空間儲存結構
2018-06-11
oracle 修改表空間儲存路徑
2013-11-28
Oracle
dbms_space 評估儲存空間
2010-10-11
SQL Server 深入解析索引儲存(聚集索引)
2015-03-23
SQLServer索引
IBM DS8700儲存維護和管理二：儲存空間劃分
2012-09-09
IBM
掃描儲存上的新加空間
2015-04-15
set unused 是否會釋放儲存空間
2012-03-23
塊儲存檔案儲存物件儲存
2020-05-28
物件
SQL Server 深入解析索引儲存(非聚集索引)
2015-03-26
SQLServer索引
PostgreSQL DBA(60) - 列式儲存zedstore
2019-07-22
SQLZed
PostgreSQL：表的儲存屬性
2020-12-14
SQL
win10保留的儲存空間怎麼檢視_win10保留的儲存空間的檢視方法
2020-06-24
Win10
鐵威馬NAS建立儲存空間的方法
2020-01-08
UniCloud空間雲端儲存圖床原始碼
2021-12-07
Cloud圖床原始碼
多名稱空間結合cgroup保障儲存QoS
2018-07-09
表在表空間中的儲存情況
2006-12-19
給資料庫新增儲存空間的案例
2007-03-21
資料庫
行式儲存列式儲存
2024-04-23
儲存—物件儲存_Minio
2024-07-15
物件
物件儲存、檔案儲存、塊儲存這三者之間有什麼區別？
2021-09-16
物件
PostgreSQL儲存引擎之page結構
2018-11-11
SQL儲存引擎
自動儲存、靜態儲存和動態儲存
2017-04-18
【SQL Server2005頁面儲存4之--非聚集索引行在葉級別儲存】
2010-05-08
SQLServer索引
PG的物理儲存結構、版本控制、空間回收
2019-06-06
如何刪除 Mac 儲存空間的其他選項？
2020-09-23
Mac
伺服器儲存空間不足怎麼處理？
2022-07-07
伺服器
儲存管理之段收縮、可恢復空間
2011-12-05
小儲存容量手機有救了！Android 新歸檔功能可釋放 60% 儲存空間
2022-03-17
Android
浪潮儲存基於智慧運維技術，加速儲存自治
2021-11-14
運維
MyISAM 儲存引擎,Innodb 儲存引擎
2015-02-28
儲存引擎
資料儲存--檔案儲存
2024-05-26
[DB2]表空間之DMS、自動儲存的DMS表空間
2010-07-26
DB2
【SQL Server2005頁面儲存5之--非聚集索引行在非葉級別儲存】
2010-07-17
SQLServer索引
3-04. 實現箱子儲物空間的儲存和資料交換
2024-04-17