PostgreSQLGIN單列聚集索引應用

德哥發表於2017-02-21

SQL索引

背景

聚集儲存比較好理解，資料按照聚集KEY儲存在一個或相鄰的資料塊中，對聚集KEY的軌跡、行為資料檢索可以大幅減少IO。

那麼聚集索引呢？我們知道通常BTREE索引中儲存的是KEY以及對應資料的堆錶行號，每條記錄一個索引條目。

而GIN索引也是樹結構，只不過它對於單個KEY只儲存一個條目，所有行號會儲存到一個posting list或者posting tree中。

因此這樣的場景能體現GIN的幾個優勢

1. 對於帶重複KEY的軌跡、行為資料，建立GIN索引的話，可以節約一些空間。

2. 在使用GIN索引掃描單個KEY時，如果要輸出所有的軌跡資料，也能獲得非常好的查詢效率。

我們使用這裡同樣的例子，來測試一下就知道了

《PostgreSQL 聚集儲存與 BRIN索引 – 高併發行為、軌跡類大吞吐資料查詢場景解說》

正文

1. 構建離散儲存測試資料，建立GIN索引

create unlogged table test(id int, info text, crt_time timestamp);    

insert into test select generate_series(1,10000), md5(id::text), clock_timestamp() from generate_series(1,10000) t(id);    

postgres=# dt+    
                           List of relations    
 Schema |        Name        | Type  |  Owner   |  Size   | Description     
--------+--------------------+-------+----------+---------+-------------    
 public | test               | table | postgres | 7303 MB |     

set maintenance_work_mem=`32GB`;  

create index idx_test_id on test using gin (id);  

di+ idx_test_id  

 public | idx_test_id              | index | postgres | test               | 391 MB     |   

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where id=1;  
                                                         QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test  (cost=84.79..12541.74 rows=9767 width=45) (actual time=3.808..17.915 rows=10000 loops=1)  
   Output: id, info, crt_time  
   Recheck Cond: (test.id = 1)  
   Heap Blocks: exact=10000  
   Buffers: shared hit=10008  
   ->  Bitmap Index Scan on idx_test_id  (cost=0.00..82.35 rows=9767 width=0) (actual time=1.962..1.962 rows=10000 loops=1)  
         Index Cond: (test.id = 1)  
         Buffers: shared hit=8  
 Planning time: 0.092 ms  
 Execution time: 18.480 ms  
(10 rows)

測試

$ vi test.sql    

set id random(1,10000)    
select * from test where id=:id;    

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100000

2. 構建聚集儲存測試資料，建立GIN索引

create unlogged table cluster_test_gin (like test);    

insert into cluster_test_gin select * from test order by id;    

set maintenance_work_mem =`32GB`;    

create index idx_cluster_test_gin_id on cluster_test_gin using gin (id);    

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cluster_test_gin where id=1;  
                                                               QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.cluster_test_gin  (cost=90.83..13732.45 rows=10714 width=45) (actual time=1.037..2.236 rows=10000 loops=1)  
   Output: id, info, crt_time  
   Recheck Cond: (cluster_test_gin.id = 1)  
   Heap Blocks: exact=94  
   Buffers: shared hit=100  
   ->  Bitmap Index Scan on idx_cluster_test_gin_id  (cost=0.00..88.16 rows=10714 width=0) (actual time=1.010..1.010 rows=10000 loops=1)  
         Index Cond: (cluster_test_gin.id = 1)  
         Buffers: shared hit=6  
 Planning time: 0.092 ms  
 Execution time: 2.791 ms  
(10 rows)

測試

$ vi test.sql    

set id random(1,10000)    
select * from cluster_test_gin where id=:id;    

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100000

測試結果

儲存格式	按KEY查詢軌跡 TPS	輸出吞吐	CPU利用率	索引大小	表大小
離散儲存 BTREE索引	2184	2184 萬行/s	99.8%	2.1 GB	7.3 GB
離散儲存 GIN索引	1620	1620 萬行/s	99.8%	391 MB	7.3 GB
聚集儲存 BTREE索引	4000	4000 萬行/s	99.8%	2.1 GB	7.3 GB
聚集儲存 GIN索引	3770	3770 萬行/s	99.8%	391 MB	7.3 GB
聚集儲存 BRIN索引	2255	2255 萬行/s	99.8%	232 KB	7.3 GB
行列變換 array	850	850 行/s	99.8%	248 KB	4.5 GB
行列變換 jsonb	1650	1650 行/s	99.8%	248 KB	4.5 GB

參考

《寶劍贈英雄 – 任意組合欄位等效查詢, 探探PostgreSQL多列展開式B樹》

《PostgreSQL GIN索引實現原理》

《PostgreSQL GIN multi-key search 優化》

《PostgreSQL 聚集儲存與 BRIN索引 – 高併發行為、軌跡類大吞吐資料查詢場景解說》

PostgreSQLGIN索引實現原理
2017-02-05
SQL索引
[轉]聚集索引和非聚集索引的區別
2017-08-15
索引
使用聚集索引和非聚集索引的區別
2009-12-25
索引
mysql關於聚集索引、非聚集索引的總結
2018-11-23
MySql索引
SQL Server 聚集索引和非聚集索引的區別
2009-05-27
SQLServer索引
[zt] 聚集索引和非聚集索引(sql server索引結構及其使用)
2009-06-02
索引SQLServer
SQL Server索引 - 非聚集索引
2019-01-08
SQLServer索引
FAQ系列|MySQL索引之聚集索引
2017-08-02
MySql索引
從InnoDB 索引執行簡述聚集索引和非聚集索引、覆蓋索引、回表、索引下推
2021-12-28
索引
一文總結分析聚集索引、非聚集索引、覆蓋索引的工作原理！
2020-04-07
索引
資料庫的聚集索引和非聚集索引很好的詳細說明
2012-10-18
資料庫索引
索引與null（一）：單列索引
2019-03-04
索引Null
SQL Server 深入解析索引儲存(聚集索引)
2015-03-23
SQLServer索引
SQL Server 深入解析索引儲存(非聚集索引)
2015-03-26
SQLServer索引
主鍵索引 (聚集索引) 和普通索引 (輔助索引) 的區別
2019-04-06
索引
SQLSERVER的非聚集索引結構
2013-04-25
SQLServer索引
SQLT 最佳化SQL 用複合索引代替單列索引的案例
2014-06-06
SQL索引
認識SQLServer索引以及單列索引和多列索引的不同
2014-12-03
SQLServer索引
SQL Server 索引和表體系結構（聚集索引）
2014-06-10
SQLServer索引
SQL Server 索引和表體系結構（非聚集索引）
2014-06-12
SQLServer索引
MySQL單列索引和組合索引的區別
2017-05-27
MySql索引
redis訊息佇列簡單應用
2019-05-06
Redis佇列
軟體開發人員真的瞭解SQL索引嗎(聚集索引)
2010-12-21
SQL索引
SQL Server 2008 建立非聚集索引
2008-12-18
SQLServer索引
ZwQuerySystemInformation列舉核心模組及簡單應用
2014-08-12
ORM
MySQL單列索引和組合索引的區別介紹
2015-12-10
MySql索引
PostgreSQL10.0preview功能增強-OLAP增強向量聚集索引(列儲存擴充套件)
2017-03-14
SQLView索引套件
InnoDB事務鎖之行鎖-聚集索引加鎖流程
2018-01-31
索引
SQL Server 2008 非聚集索引設計
2008-12-18
SQLServer索引
SAP Fiori應用索引大全
2020-01-17
索引
聯合索引和多個單列索引使用中的索引命中情況及索引建立原則
2022-10-10
索引
HDU3530 單調佇列的應用
2014-11-12
佇列
MySQL索引理解和應用
2022-11-29
MySql索引
SqlServer索引的原理與應用
2014-04-18
SQLServer索引
oracle的逆序鍵索引應用！
2010-05-08
Oracle索引
從效能的角度談SQL Server聚集索引鍵的選擇
2015-12-21
SQLServer索引
SQL Server調優實戰亂建聚集索引的後果
2009-04-20
SQLServer索引
常數複合索引應用案例
2011-05-19
索引

PostgreSQLGIN單列聚集索引應用

標籤

背景

正文

測試結果

參考

相關文章