PostgreSQL 11 preview - BRIN index access method extensions (bloom filter, multi-range minmax)

Published by 德哥 (digoal) on 2018-04-18

Tags

PostgreSQL , BRIN index , bloom filter , multi-range minmax


Background

BRIN is a block-range index access method in PostgreSQL: it stores summary metadata about the indexed column for each range of consecutive data blocks.

https://www.postgresql.org/docs/devel/static/brin.html

Currently the metadata BRIN stores is the MIN and MAX of the indexed column within each range of consecutive data blocks. For data that is scattered relative to the heap's physical order this works poorly; for columns whose values track insertion order (i.e. have strong physical correlation, as time-series data typically does) it works extremely well.
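Conceptually, a minmax BRIN scan only keeps the block ranges whose stored [min, max] summary could contain the search key, and rechecks the heap for those. A minimal Python sketch (the function and variable names are illustrative, not PostgreSQL internals):

```python
# Toy model of a BRIN minmax scan: each block range stores only the
# (min, max) of the indexed column; the scan keeps every range whose
# summary might contain the key and rechecks the heap pages there.

def brin_minmax_candidates(summaries, key):
    """summaries: list of (range_id, min_val, max_val) tuples."""
    return [rid for rid, lo, hi in summaries if lo <= key <= hi]

summaries = [(0, 1, 100), (1, 101, 200), (2, 201, 300)]
print(brin_minmax_candidates(summaries, 150))  # -> [1]
```

When the heap is physically ordered by the column, the summaries barely overlap and most ranges are pruned; when it is not, nearly every range survives this test, which is exactly the weakness discussed below.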

《HTAP database PostgreSQL scenario and performance tests, part 24 - (OLTP) IoT - concurrent time-series writes (with the time-series BRIN index)》

《Optimizing the BRIN index pages_per_range option, with thoughts on kernel code optimization》

《Trillion-scale e-commerce advertising - BRIN black magic for lowest-cost, millisecond-level audience selection (companion to the visual mining article) - Alibaba Cloud RDS PostgreSQL, HybridDB for PostgreSQL best practices》

《PostGIS spatial index (GiST, BRIN, R-Tree) selection and optimization - Alibaba Cloud RDS PostgreSQL best practices》

《How to automatically choose the right index access method (btree, hash, gin, gist, sp-gist, brin, bitmap, ...)》

《Concurrent writes to a PostgreSQL heap table: keeping time-series data physically ordered - BRIN index optimization》

《PostgreSQL 10.0 preview feature enhancement - smoother BRIN index updates》

《PostgreSQL clustered storage and BRIN indexes - high-concurrency behavioral and trajectory data at high query throughput》

《PostgreSQL IoT black magic - an index hundreds of times smaller (BRIN index)》

《PostgreSQL 9.5 new feature – lets BRIN be used with R-Tree-like indexing strategies For “inclusion” opclasses》

《PostgreSQL 9.5 new feature – BRIN (block range index) index》

Areas where BRIN can still be improved:

When the data distribution correlates poorly with the heap's physical storage order, BRIN performs badly. How can that be improved?

Storing multiple MIN/MAX segments could be a very effective improvement. As an example, imagine a very large residential compound with many buildings, where for each building we record the ages of the youngest and oldest residents. Suppose the real distribution is that every building houses a number of residents aged 1-35 plus one 80-year-old.

Now we want to find a 40-year-old resident. With a plain BRIN index, every building is returned, because every building's summary range is 1-80.

With multi-segment storage the summary would instead be [1-35, 80], and a BRIN lookup for a 40-year-old resident would immediately return 0 rows.
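The building example above can be sketched in Python. A single [min, max] pair per block range produces a false positive for age 40, while a multi-segment summary (an illustrative stand-in for the patch's minmax-multi behavior, not its actual on-disk format) prunes the range entirely:

```python
# One block range whose values are ages 1..35 plus a single
# 80-year-old resident.
values = list(range(1, 36)) + [80]

def single_range_matches(vals, key):
    # Classic BRIN minmax: one (min, max) pair per block range.
    return min(vals) <= key <= max(vals)

def multi_range_matches(ranges, key):
    # Multi-segment summary: a list of (lo, hi) segments instead.
    return any(lo <= key <= hi for lo, hi in ranges)

print(single_range_matches(values, 40))              # True: false positive
print(multi_range_matches([(1, 35), (80, 80)], 40))  # False: range pruned
```

The outlier value 80 gets its own tiny segment, so it no longer stretches the summary across the whole [1, 80] span.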

1. The patch about to be committed for PostgreSQL 11 includes exactly this multi minmax optimization:

https://commitfest.postgresql.org/17/1348/

2. The second improvement adds a bloom filter opclass to BRIN. A bloom filter uses a small number of bits to record whether an indexed value may exist: inserting a value sets its bits to 1, and if the corresponding bits are not all 1, the value is definitely absent. To save space, bit positions are allowed to collide, so the bits of one value may be covered by the bits of one or more other values.

This leads to one-sided answers: the index may tell you a value is present when it actually is not (a false positive), but when it tells you a value is absent, it is definitely absent.


So for a query like:

    select * from tbl where a=? and b=? and c=? or d=?  

the bloom index returns a superset of the matching block ranges; the heap is then visited and a filter removes the rows that do not satisfy the conditions.
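A minimal Python sketch of that one-sided behavior (the bit-array size, hash construction, and function names are arbitrary illustrations, not the layout used by PostgreSQL's bloom code):

```python
import hashlib

M = 64  # number of bits; deliberately small so collisions are plausible
K = 3   # number of hash functions per value

def positions(value):
    # Derive K bit positions from a salted SHA-256 of the value.
    return [int(hashlib.sha256(f"{i}:{value}".encode()).hexdigest(), 16) % M
            for i in range(K)]

def add(bits, value):
    for p in positions(value):
        bits |= 1 << p
    return bits

def might_contain(bits, value):
    # False means definitely absent; True means "maybe" - recheck the heap.
    return all(bits >> p & 1 for p in positions(value))

bits = 0
for v in ["alice", "bob", "carol"]:
    bits = add(bits, v)

print(might_contain(bits, "alice"))  # True: stored values are never missed
print(might_contain(bits, "dave"))   # may be True (collision) or False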

https://en.wikipedia.org/wiki/Bloom_filter

https://www.postgresql.org/docs/devel/static/bloom.html

Today you can already create a bloom index with the bloom extension; PostgreSQL 11 will add this capability to the BRIN index access method.

Multi-range minmax

This is the example from the POC. You can see that with multi-segment MIN/MAX, the BRIN index filters far more effectively.

Patch links

https://commitfest.postgresql.org/17/1348/

https://www.postgresql.org/message-id/flat/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com#c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com

To illustrate the improvement, consider this table:

    create table a (val float8) with (fillfactor = 90);  
    insert into a select i::float from generate_series(1,10000000) s(i);  
    update a set val = 1 where random() < 0.01;  
    update a set val = 10000000 where random() < 0.01;  

Which means the column `val` is almost perfectly correlated with its
position in the table (which would be great for BRIN minmax indexes),
but then roughly 1% of the values are set to 1 and another 1% to
10,000,000. That means pretty much every block range ends up as
[1, 10000000], which makes this BRIN index mostly useless, as
illustrated by these explain plans:

    create index on a using brin (val) with (pages_per_range = 16);  
  
    explain analyze select * from a where val = 100;  
                                  QUERY PLAN  
    --------------------------------------------------------------------  
     Bitmap Heap Scan on a  (cost=54.01..10691.02 rows=8 width=8)  
                            (actual time=5.901..785.520 rows=1 loops=1)  
       Recheck Cond: (val = '100'::double precision)  
       Rows Removed by Index Recheck: 9999999  
       Heap Blocks: lossy=49020  
       ->  Bitmap Index Scan on a_val_idx  
             (cost=0.00..54.00 rows=3400 width=0)  
             (actual time=5.792..5.792 rows=490240 loops=1)  
             Index Cond: (val = '100'::double precision)  
     Planning time: 0.119 ms  
     Execution time: 785.583 ms  
    (8 rows)  
  
    explain analyze select * from a where val between 100 and 10000;  
                                  QUERY PLAN  
    ------------------------------------------------------------------  
     Bitmap Heap Scan on a  (cost=55.94..25132.00 rows=7728 width=8)  
                      (actual time=5.939..858.125 rows=9695 loops=1)  
       Recheck Cond: ((val >= '100'::double precision) AND  
                      (val <= '10000'::double precision))  
       Rows Removed by Index Recheck: 9990305  
       Heap Blocks: lossy=49020  
       ->  Bitmap Index Scan on a_val_idx  
             (cost=0.00..54.01 rows=10200 width=0)  
             (actual time=5.831..5.831 rows=490240 loops=1)  
             Index Cond: ((val >= '100'::double precision) AND  
                          (val <= '10000'::double precision))  
     Planning time: 0.139 ms  
     Execution time: 871.132 ms  
    (8 rows)  

Obviously, the queries do scan the whole table and then eliminate most
of the rows in the "Index Recheck" step. Decreasing pages_per_range does
not really make a measurable difference in this case – it eliminates
maybe 10% of the rechecks, but most pages still have a very wide minmax
range.

With the patch, it looks about like this:

    create index on a using brin (val float8_minmax_multi_ops)  
                            with (pages_per_range = 16);  
  
    explain analyze select * from a where val = 100;  
                                  QUERY PLAN  
    -------------------------------------------------------------------  
     Bitmap Heap Scan on a  (cost=830.01..11467.02 rows=8 width=8)  
                            (actual time=7.772..8.533 rows=1 loops=1)  
       Recheck Cond: (val = '100'::double precision)  
       Rows Removed by Index Recheck: 3263  
       Heap Blocks: lossy=16  
       ->  Bitmap Index Scan on a_val_idx  
             (cost=0.00..830.00 rows=3400 width=0)  
             (actual time=7.729..7.729 rows=160 loops=1)  
             Index Cond: (val = '100'::double precision)  
     Planning time: 0.124 ms  
     Execution time: 8.580 ms  
    (8 rows)  
  
  
    explain analyze select * from a where val between 100 and 10000;  
                                 QUERY PLAN  
    ------------------------------------------------------------------  
     Bitmap Heap Scan on a  (cost=831.94..25908.00 rows=7728 width=8)  
                        (actual time=9.318..23.715 rows=9695 loops=1)  
       Recheck Cond: ((val >= '100'::double precision) AND  
                      (val <= '10000'::double precision))  
       Rows Removed by Index Recheck: 3361  
       Heap Blocks: lossy=64  
       ->  Bitmap Index Scan on a_val_idx  
             (cost=0.00..830.01 rows=10200 width=0)  
             (actual time=9.274..9.274 rows=640 loops=1)  
             Index Cond: ((val >= '100'::double precision) AND  
                          (val <= '10000'::double precision))  
     Planning time: 0.138 ms  
     Execution time: 36.100 ms  
    (8 rows)  

Bloom filter

https://www.postgresql.org/docs/devel/static/bloom.html

References

https://commitfest.postgresql.org/17/1348/

https://www.postgresql.org/message-id/flat/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com#c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com

