PostgreSQL儲存智慧-空間聚集儲存
標籤
PostgreSQL , 空間資料 , R-Tree , 聚集儲存
背景
單身漢的房間可能是這樣的,凌亂得很。
可能會有人表示不服,不是應該這樣的嗎?
實際上大多數情況下資料庫儲存也可能是這樣的,資料凌亂。索引的目的是在凌亂的資料中找到目標資料,但是從凌亂的資料中按索引順序掃描一批資料,會有什麼後果呢:
沒錯,IO放大,我在以前的文章中有詳細介紹過其原理。
對於亂序存放的資料,如果按索引順序來調整儲存,聚集後,可以減少IO放大。
對於空間資料,應該如何存放呢?
內窺BLOCK – pageinspect
PostgreSQL 外掛有一個外掛pageinspect,可以解析BLOCK檢視索引、堆表的內容。
但是一直沒有支援gist, sp-gist的內窺,不過馬上就會支援了,看這個patch。
Hi all,
the following patch transfers functionality from gevel module
(http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for
analyzing GIN and GiST indexes to pageinspect. Gevel was originally
designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and
GIN indexes.
Functions added:
- gist_stat(text) - shows statistics on GiST Tree
- gist_tree(text) - shows GiST tree
- gist_tree(text, int4) - shows GiST tree up to MAXLEVEL
- gist_print(text) - prints objects stored in GiST tree
- spgist_stat(text) - shows statistics on SP-GiST
- spgist_print(text) - prints objects stored in index
- gin_value_count() - originally gin_stat(text) - prints estimated counts
for index values
- gin_stats() - originally gin_statpage(text) - shows statistics
- gin_count_estimate(text, tsquery) - shows number of indexed rows matched query
Tests also transferred, docs for new functions are added.
I run pgindent over the code,
but the result is different from those I expected,
so I leave pgindented one.
The patch is applicable to the commit
866f4a7c210857aa342bf901558d170325094dde.
--
Alexey Chernyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
內窺GiST, SP-GiST索引
1、打補丁
patch -p1 < ../0001-gevel_to_pageinspect-v1.patch
patching file contrib/pageinspect/Makefile
patching file contrib/pageinspect/brinfuncs.c
patching file contrib/pageinspect/data/rect.data
patching file contrib/pageinspect/data/test__int.data
patching file contrib/pageinspect/expected/gin.out
patching file contrib/pageinspect/expected/gist.out
patching file contrib/pageinspect/expected/spgist.out
patching file contrib/pageinspect/ginfuncs.c
patching file contrib/pageinspect/gistfuncs.c
patching file contrib/pageinspect/hashfuncs.c
patching file contrib/pageinspect/pageinspect--1.6--1.7.sql
patching file contrib/pageinspect/pageinspect.control
patching file contrib/pageinspect/spgistfuncs.c
patching file contrib/pageinspect/sql/gin.sql
patching file contrib/pageinspect/sql/gist.sql
patching file contrib/pageinspect/sql/spgist.sql
patching file doc/src/sgml/pageinspect.sgml
Hunk #2 succeeded at 830 (offset 5 lines).
cd contrib/pageinspect/
make
make install
2、載入pageinspect外掛
postgres=# create extension pageinspect;
CREATE EXTENSION
3、幾個函式介面的講解
3.1 檢視GiST索引概貌,比如層級,佔用多少個PAGE,多少葉子節點,多少TUPLE等。
gist_stat(INDEXNAME) – show some statistics about GiST tree
regression=# select gist_stat(`pix`);
Number of levels: 2
Number of pages: 30
Number of leaf pages: 29
Number of tuples: 3129
Number of leaf tuples: 3100
Total size of tuples: 137676 bytes
Total size of leaf tuples: 136400 bytes
Total size of index: 245760 bytes
3.2 列出GiST索引的每個層級的概貌,直到某個層級的節點。
gist_tree(INDEXNAME,MAXLEVEL) – show GiST tree up to MAXLEVEL
regression=# select gist_tree(`pix`,0);
0(l:0) blk: 0 numTuple: 29 free: 6888b(15.63%)
Designation (from left to right):
0 - page number
(l:0) - tree level
blk: 0 - block number
numTuple: 29 - the number of tuples
free: 6888b - free space in bytes
(15.63%) - occupied space in percents
gist_tree(INDEXNAME) - show full GiST tree
regression=# select gist_tree(`pix`);
0(l:0) blk: 0 numTuple: 29 free: 6888b(15.63%)
1(l:1) blk: 13 numTuple: 180 free: 244b(97.01%)
2(l:1) blk: 11 numTuple: 175 free: 464b(94.32%)
3(l:1) blk: 2 numTuple: 101 free: 3720b(54.43%)
4(l:1) blk: 17 numTuple: 111 free: 3280b(59.82%)
5(l:1) blk: 18 numTuple: 101 free: 3720b(54.43%)
6(l:1) blk: 10 numTuple: 98 free: 3852b(52.82%)
7(l:1) blk: 19 numTuple: 111 free: 3280b(59.82%)
8(l:1) blk: 9 numTuple: 97 free: 3896b(52.28%)
9(l:1) blk: 20 numTuple: 104 free: 3588b(56.05%)
10(l:1) blk: 14 numTuple: 96 free: 3940b(51.74%)
11(l:1) blk: 21 numTuple: 106 free: 3500b(57.13%)
12(l:1) blk: 7 numTuple: 103 free: 3632b(55.51%)
13(l:1) blk: 1 numTuple: 101 free: 3720b(54.43%)
14(l:1) blk: 16 numTuple: 97 free: 3896b(52.28%)
15(l:1) blk: 24 numTuple: 103 free: 3632b(55.51%)
16(l:1) blk: 4 numTuple: 98 free: 3852b(52.82%)
17(l:1) blk: 25 numTuple: 98 free: 3852b(52.82%)
18(l:1) blk: 3 numTuple: 97 free: 3896b(52.28%)
19(l:1) blk: 26 numTuple: 96 free: 3940b(51.74%)
20(l:1) blk: 6 numTuple: 103 free: 3632b(55.51%)
21(l:1) blk: 8 numTuple: 162 free: 1036b(87.31%)
22(l:1) blk: 23 numTuple: 94 free: 4028b(50.66%)
23(l:1) blk: 12 numTuple: 82 free: 4556b(44.19%)
24(l:1) blk: 27 numTuple: 105 free: 3544b(56.59%)
25(l:1) blk: 5 numTuple: 90 free: 4204b(48.51%)
26(l:1) blk: 28 numTuple: 100 free: 3764b(53.90%)
27(l:1) blk: 22 numTuple: 101 free: 3720b(54.43%)
28(l:1) blk: 15 numTuple: 95 free: 3984b(51.20%)
29(l:1) blk: 29 numTuple: 95 free: 3984b(51.20%)
3.3 列印GiST索引的詳細內容,這裡包含了每一個索引的區間,例如國家、省、市。。。BOX邊界。
gist_print(INDEXNAME) – prints objects stored in GiST tree, works only if objects in index have textual representation (type_out functions should be implemented for given object type).
It`s known to work with R-tree GiST based index (contrib/rtree_gist).
Note, in example below, objects are of type box.
for 8.1 and later this query should looks like
# select * from gist_print(`pix`) as t(level int, valid bool, a box) where level =1;
level | valid | a
-------+-------+-----------------------------
1 | t | (37357,50073),(34242,357) -- 一個BOX的左下,右上POINT,R-Tree的下一級節點的資料都在這個box內。
1 | t | (43499,49770),(40358,43)
1 | t | (31193,24679),(25047,12410)
1 | t | (31018,12142),(25083,6)
1 | t | (49944,25174),(43471,12802)
1 | t | (12577,49757),(6302,37534)
1 | t | (12528,37333),(6171,24861)
1 | t | (50027,49751),(46817,25462)
1 | t | (46870,49912),(43664,25722)
1 | t | (24855,25574),(12447,19263)
1 | t | (25054,19126),(12403,12796)
1 | t | (32737,49923),(31178,1038)
1 | t | (3184,24465),(15,81)
1 | t | (24951,49983),(12740,44000)
1 | t | (24919,43956),(12617,37901)
1 | t | (40387,49852),(37338,25217)
1 | t | (40325,24963),(37375,491)
1 | t | (24919,12698),(12654,6518)
1 | t | (25002,6338),(12350,51)
1 | t | (49985,12554),(43447,222)
1 | t | (25003,37769),(12552,25573)
1 | t | (34270,49382),(32763,594)
1 | t | (6205,50012),(3,37527)
1 | t | (6163,37358),(120,25034)
1 | t | (12343,24542),(9295,294)
1 | t | (9308,24151),(6234,620)
1 | t | (6230,24629),(3169,108)
1 | t | (31179,50040),(28113,25556)
1 | t | (28048,49694),(25000,25000)
(29 rows)
Using Gevel module it`s possible to visualize index tree, see for example Rtree Index.
GiST索引影像
我們可以把GIST索引的內容,按層級繪製出影像來。
例如一級長這樣,就是大BOX。
下一級(葉子節點),長這樣,就是大BOX下的一個個小BOX。
聚集儲存對效能的提升分析
如果我們按照GiST索引的層級排序,聚集儲存,那麼在基於空間掃描資料的時候,掃描的塊就更少。
假設落在杭州地區有100萬條資料,按照單身漢的亂序儲存,杭州的資料可能散落分佈在很多個PAGE中,我們掃描時也許掃描的是全量資料,而且是離散掃,效能損耗比較大。如果按照GiST聚集儲存,那麼會有極大的效能提升。
空間聚集後的效能驗證
1、建立測試表
postgres=# create table test(id int, pos point);
CREATE TABLE
2、寫入1000萬測試記錄,隨機點位
postgres=# insert into test select id, point(round(random()*1000), round(random()*1000)) from generate_series(1,10000000) t(id);
INSERT 0 10000000
postgres=# select * from test limit 10;
id | pos
----+-----------
1 | (335,286)
2 | (948,616)
3 | (591,851)
4 | (695,158)
5 | (728,801)
6 | (544,191)
7 | (851,940)
8 | (720,487)
9 | (790,344)
10 | (50,956)
(10 rows)
3、建立空間索引
postgres=# create index idx_test_pos on test using gist(pos);
CREATE INDEX
4、內窺空間索引
postgres=# select * from gist_stats(`idx_test_pos`);
gist_stats
--------------------------------------------
Number of levels: 4 +
Number of pages: 87396 +
Number of leaf pages: 86626 +
Number of tuples: 10087395 +
Number of invalid tuples: 0 +
Number of leaf tuples: 10000000 +
Total size of tuples: 444894132 bytes+
Total size of leaf tuples: 441039512 bytes+
Total size of index: 715948032 bytes+
(1 row)
postgres=# select * from gist_tree(`idx_test_pos`,1);
gist_tree
---------------------------------------------------------------------------------------------------
0(l:0) blk: 0 numTuple: 7 free: 7840b(3.92%) rightlink:4294967295 (InvalidBlockNumber) +
1(l:1) blk: 46732 numTuple: 108 free: 3396b(58.38%) rightlink:28877 (OK) +
2(l:1) blk: 18831 numTuple: 107 free: 3440b(57.84%) rightlink:56294 (OK) +
3(l:1) blk: 56294 numTuple: 159 free: 1152b(85.88%) rightlink:4294967295 (InvalidBlockNumber)+
4(l:1) blk: 18830 numTuple: 109 free: 3352b(58.92%) rightlink:71045 (OK) +
5(l:1) blk: 71045 numTuple: 90 free: 4188b(48.68%) rightlink:46732 (OK) +
6(l:1) blk: 28877 numTuple: 72 free: 4980b(38.97%) rightlink:78677 (OK) +
7(l:1) blk: 78677 numTuple: 117 free: 3000b(63.24%) rightlink:18831 (OK) +
(1 row)
postgres=# select * from gist_print(`idx_test_pos`) as t(level int, valid bool, a box) where level =1;
level | valid | a
-------+-------+-----------------------
1 | t | (655,626),(422,0)
1 | t | (1000,411),(647,0)
1 | t | (1000,1000),(647,411)
1 | t | (425,337),(0,0)
1 | t | (425,626),(0,331)
1 | t | (248,1000),(0,617)
1 | t | (655,1000),(245,617)
(7 rows)
4.1、末端(葉子)節點的BOX如下。
postgres=# select * from (select * from gist_print(`idx_test_pos`) as t(level int, valid bool, a box) where level =4) t limit 10;
level | valid | a
-------+-------+---------------------
4 | t | (630,107),(630,107)
4 | t | (631,105),(631,105)
4 | t | (629,107),(629,107)
4 | t | (629,105),(629,105)
4 | t | (631,109),(631,109)
4 | t | (629,105),(629,105)
4 | t | (631,105),(631,105)
4 | t | (630,106),(630,106)
4 | t | (629,105),(629,105)
4 | t | (630,108),(630,108)
(10 rows)
5、空間聚集前,也就是從單身漢的凌絡臥室空間掃描落在某一個BOX內的資料。
返回101518條記錄,掃描了50914個HEAP資料塊。
postgres=# explain (analyze,verbose,timing,costs) select * from test where box(`(100,100),(200,200)`) @> pos;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.test (cost=183.52..10728.49 rows=10000 width=20) (actual time=22.710..96.798 rows=101518 loops=1)
Output: id, pos
Recheck Cond: (`(200,200),(100,100)`::box @> test.pos)
Heap Blocks: exact=50914
-> Bitmap Index Scan on idx_test_pos (cost=0.00..181.02 rows=10000 width=0) (actual time=15.422..15.422 rows=101518 loops=1)
Index Cond: (`(200,200),(100,100)`::box @> test.pos)
Planning time: 0.044 ms
Execution time: 104.395 ms
(8 rows)
6、按GiST空間聚集,整理資料。
PostgreSQL提供了一個聚集語法:
Command: CLUSTER
Description: cluster a table according to an index
Syntax:
CLUSTER [VERBOSE] table_name [ USING index_name ]
CLUSTER [VERBOSE]
使用GiST索引對資料進行聚集:
postgres=# cluster test USING idx_test_pos ;
CLUSTER
驗證聚集後的塊掃描效能。
返回101518條記錄,掃描了691個HEAP資料塊。
explain (analyze,verbose,timing,costs,buffers) select * from test where box(`(100,100),(200,200)`) @> pos;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.test (cost=193.12..10549.51 rows=10000 width=16) (actual time=8.092..21.497 rows=101518 loops=1)
Output: pos
Recheck Cond: (`(200,200),(100,100)`::box @> test.pos)
Heap Blocks: exact=691
Buffers: shared hit=1727
-> Bitmap Index Scan on idx_test_pos (cost=0.00..190.62 rows=10000 width=0) (actual time=7.997..7.997 rows=101518 loops=1)
Index Cond: (`(200,200),(100,100)`::box @> test.pos)
Buffers: shared hit=1036
Planning time: 0.137 ms
Execution time: 29.650 ms
(10 rows)
使用空間聚集,按某個空間條件查詢並返回101518記錄,對比聚集前後,掃描的HEAP資料塊數目從50914降到了691。
參考
http://www.sai.msu.su/~megera/wiki/Gevel
http://www.sai.msu.su/~megera/wiki/Rtree_Index
相關文章
- 【儲存管理】表空間概念
- Synology群暉NAS儲存正確建立儲存池和儲存空間的方法
- 禪道 - 儲存空間釋放
- 【儲存管理】建立永久表空間
- innodb表空間儲存結構
- oracle 修改表空間儲存路徑Oracle
- dbms_space 評估儲存空間
- SQL Server 深入解析索引儲存(聚集索引)SQLServer索引
- IBM DS8700儲存維護和管理二:儲存空間劃分IBM
- 掃描儲存上的新加空間
- set unused 是否會釋放儲存空間
- 塊儲存 檔案儲存 物件儲存物件
- SQL Server 深入解析索引儲存(非聚集索引)SQLServer索引
- PostgreSQL DBA(60) - 列式儲存zedstoreSQLZed
- PostgreSQL:表的儲存屬性SQL
- win10保留的儲存空間怎麼檢視_win10保留的儲存空間的檢視方法Win10
- 鐵威馬NAS建立儲存空間的方法
- UniCloud空間雲端儲存圖床原始碼Cloud圖床原始碼
- 多名稱空間結合cgroup保障儲存QoS
- 表在表空間中的儲存情況
- 給資料庫新增儲存空間的案例資料庫
- 行式儲存 列式儲存
- 儲存—物件儲存_Minio物件
- 物件儲存、檔案儲存、塊儲存這三者之間有什麼區別?物件
- PostgreSQL儲存引擎之page結構SQL儲存引擎
- 自動儲存、靜態儲存和動態儲存
- 【SQL Server2005頁面儲存4之--非聚集索引行在葉級別儲存】SQLServer索引
- PG的物理儲存結構、版本控制、空間回收
- 如何刪除 Mac 儲存空間的其他選項?Mac
- 伺服器儲存空間不足怎麼處理?伺服器
- 儲存管理之段收縮、可恢復空間
- 小儲存容量手機有救了!Android 新歸檔功能可釋放 60% 儲存空間Android
- 浪潮儲存基於智慧運維技術,加速儲存自治運維
- MyISAM 儲存引擎,Innodb 儲存引擎儲存引擎
- 資料儲存--檔案儲存
- [DB2]表空間之DMS、自動儲存的DMS表空間DB2
- 【SQL Server2005頁面儲存5之--非聚集索引行在非葉級別儲存】SQLServer索引
- 3-04. 實現箱子儲物空間的儲存和資料交換