PostgreSQL11preview-索引優化。filter智慧消除、分割槽索引智慧合併

德哥發表於2018-04-18

原文網址 : https://flycode.co/archives/185841

SQLView索引優化Filter

背景

當資料量較大時，對資料分割槽是一種比較不錯的資料管理手段。

每個分割槽都有對應的邊界定義，在PostgreSQL中我們可以把它理解為分割槽的constraint。

目前PostgreSQL在優化器的細節方面進行打磨，constraint可以被優化器利用來做一些優化的工作。

1、例如：

每個班級一個分割槽，你要找在某個班級並且學號=1的同學。

假設你在學號上建立了一個索引。

那麼真實的執行計劃會怎麼樣呢？在沒有優化的情況下，可能是這樣的

    ->  Index Scan using pi1 on 班級101分割槽表  (cost=0.29..8.31 rows=1 width=8)  
          Index Cond: (學號 = 1)  
          Filter: (班級=101)

由於索引上沒有班級這個欄位的內容，所以被放到了filter裡面，意味著每一條符合學號=1的記錄都需要再次被filter一遍。而實際上這是板上釘釘的，因為你已經在這個分割槽裡面了。不需要再filter。

顯然，優化器可以對這樣的場景進行優化，去除Filter。

再看下面這個例子，分割槽上面都有K欄位的範圍約束，選擇分割槽表時優化器根據輸入條件，將資料收斂到了對應的分割槽，但是在索引掃描時，FILTRER沒有被去除。

create table base (k integer primary key, v integer);  
create table part1 (check (k between 1 and 10000)) inherits (base);  
create table part2 (check (k between 10001 and 20000)) inherits (base);  
create index pi1 on part1(v);  
create index pi2 on part2(v);  
insert int part1 values (generate series(1,10000), random());  
insert into part2 values (generate_series(10001,20000), random());  
explain select * from base where k between 1 and 20000 and v = 100;  
                               QUERY PLAN  
-----------------------------------------------------------------------  
  Append  (cost=0.00..15.65 rows=3 width=8)  
    ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=8)  
          Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))  
    ->  Index Scan using pi1 on part1  (cost=0.29..8.31 rows=1 width=8)  
          Index Cond: (v = 100)  
          Filter: ((k >= 1) AND (k <= 20000))  
    ->  Index Scan using pi2 on part2  (cost=0.29..7.34 rows=1 width=8)  
          Index Cond: (v = 100)  
          Filter: ((k >= 1) AND (k <= 20000))

2、對於partial index也是一樣的道理，當可以選擇多個partial index時，PostgreSQL目前的版本，不能智慧的進行最優索引的選擇。

create table t (k integer primary key, v integer);  
insert into t values (generate_series(1,20000),random());  
create index i1 on t(v) where k between 1 and 10000;  
create index i2 on t(v) where k between 10001 and 20000;  
postgres=# explain select * from t where k between 1 and 10000 and v = 100;  
                          QUERY PLAN  
------------------------------------------------------------  
  Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)  
    Index Cond: (v = 100)  
(2 rows)  
  
  
Here we get perfect plan. Let`s try to extend search interval:  
  
  
postgres=# explain select * from t where k between 1 and 20000 and v = 100;  
                             QUERY PLAN  
------------------------------------------------------------------  
  Index Scan using t_pkey on t  (cost=0.29..760.43 rows=1 width=8)  
    Index Cond: ((k >= 1) AND (k <= 20000))  
    Filter: (v = 100)  
(3 rows)

實際上可以使用多個partial index，從而實現最優的成本。

Unfortunately in this case Postgres is not able to apply partial indexes.  
And this is what I expected to get:  
  
postgres=# explain select * from t where k between 1 and 10000 and v =   
100 union all select * from t where k between 10001 and 20000 and v = 100;  
                               QUERY PLAN  
----------------------------------------------------------------------  
  Append  (cost=0.29..14.58 rows=2 width=8)  
    ->  Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)  
          Index Cond: (v = 100)  
    ->  Index Scan using i2 on t t_1  (cost=0.29..7.28 rows=1 width=8)  
          Index Cond: (v = 100)

PostgreSQL 11 在優化器方面，可能針對這個場景會進行優化。patch如下。

https://commitfest.postgresql.org/17/1264/

https://www.postgresql.org/message-id/flat/27516421-5afa-203c-e22a-8407e9187327@postgrespro.ru#27516421-5afa-203c-e22a-8407e9187327@postgrespro.ru

Hi hackers,  
  
I am trying to compare different ways of optimizing work with huge   
append-only tables in PostgreSQL where primary key is something like   
timestamp and queries are usually accessing most recent data using some   
secondary keys. Size of secondary index is one of the most critical   
factors limiting  insert/search performance. As far as data is inserted   
in timestamp ascending order, access to primary key is well localized   
and accessed tables are present in memory. But if we create secondary   
key for the whole table, then access to it will require random reads   
from the disk and significantly decrease performance.  
  
There are two well known solutions of the problem:  
1. Table partitioning  
2. Partial indexes  
  
This approaches I want to compare. First of all I want to check if   
optimizer is able to generate efficient query execution plan covering   
different time intervals.  
Unfortunately in both cases generated plan is not optimal.  
  
1. Table partitioning:  
  
create table base (k integer primary key, v integer);  
create table part1 (check (k between 1 and 10000)) inherits (base);  
create table part2 (check (k between 10001 and 20000)) inherits (base);  
create index pi1 on part1(v);  
create index pi2 on part2(v);  
insert int part1 values (generate series(1,10000), random());  
insert into part2 values (generate_series(10001,20000), random());  
explain select * from base where k between 1 and 20000 and v = 100;  
                               QUERY PLAN  
-----------------------------------------------------------------------  
  Append  (cost=0.00..15.65 rows=3 width=8)  
    ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=8)  
          Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))  
    ->  Index Scan using pi1 on part1  (cost=0.29..8.31 rows=1 width=8)  
          Index Cond: (v = 100)  
          Filter: ((k >= 1) AND (k <= 20000))  
    ->  Index Scan using pi2 on part2  (cost=0.29..7.34 rows=1 width=8)  
          Index Cond: (v = 100)  
          Filter: ((k >= 1) AND (k <= 20000))  
  
Questions:  
- Is there some way to avoid sequential scan of parent table? Yes, it is   
empty and so sequential scan will not take much time, but ... it still   
requires some additional actions and so increasing query execution time.  
- Why index scan of partition indexes includes filter condition if it is   
guaranteed by check constraint that all records of this partition match   
search predicate?  
  
  
2. Partial indexes:  
  
create table t (k integer primary key, v integer);  
insert into t values (generate_series(1,20000),random());  
create index i1 on t(v) where k between 1 and 10000;  
create index i2 on t(v) where k between 10001 and 20000;  
postgres=# explain select * from t where k between 1 and 10000 and v = 100;  
                          QUERY PLAN  
------------------------------------------------------------  
  Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)  
    Index Cond: (v = 100)  
(2 rows)  
  
  
Here we get perfect plan. Let`s try to extend search interval:  
  
  
postgres=# explain select * from t where k between 1 and 20000 and v = 100;  
                             QUERY PLAN  
------------------------------------------------------------------  
  Index Scan using t_pkey on t  (cost=0.29..760.43 rows=1 width=8)  
    Index Cond: ((k >= 1) AND (k <= 20000))  
    Filter: (v = 100)  
(3 rows)  
  
Unfortunately in this case Postgres is not able to apply partial indexes.  
And this is what I expected to get:  
  
postgres=# explain select * from t where k between 1 and 10000 and v =   
100 union all select * from t where k between 10001 and 20000 and v = 100;  
                               QUERY PLAN  
----------------------------------------------------------------------  
  Append  (cost=0.29..14.58 rows=2 width=8)  
    ->  Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)  
          Index Cond: (v = 100)  
    ->  Index Scan using i2 on t t_1  (cost=0.29..7.28 rows=1 width=8)  
          Index Cond: (v = 100)  
  
  
I wonder if there are some principle problems in supporting this two   
things in optimizer:  
1. Remove search condition for primary key if it is fully satisfied by   
derived table check constraint.  
2. Append index scans of several partial indexes if specified interval   
is covered by their conditions.  
  
I wonder if someone is familiar with this part of optimizer ad can   
easily fix it.  
Otherwise I am going to spend some time on solving this problems (if   
community think that such optimizations will be useful).  
  
--   
  
Konstantin Knizhnik  
Postgres Professional: http://www.postgrespro.com  
The Russian Postgres Company

小結

PostgreSQL 11在優化器細節方面進行了很多優化，包括前面提到的分割槽表的智慧並行JOIN，group, 聚合等。

以及本文提到的，在索引掃描時，

1、根據分割槽表的約束條件，如果索引欄位未包含分割槽鍵，會自動消除filter。

2、在輸入條件可以用到多個partial index時，自動選擇union all掃描。（這項優化，將來也能用在分割槽索引上面，partial index是分割槽索引的一個雛形。）

參考

https://commitfest.postgresql.org/17/1264/

https://www.postgresql.org/message-id/flat/27516421-5afa-203c-e22a-8407e9187327@postgrespro.ru#27516421-5afa-203c-e22a-8407e9187327@postgrespro.ru

SQL優化案例-分割槽索引之無字首索引（六）
2018-08-21
SQL優化索引
Oracle優化案例-分割槽索引之無字首索引（六）
2018-08-06
Oracle優化索引
關於numpy的索引、合併、分割
2020-10-18
索引
SQL最佳化案例-分割槽索引之無字首索引（六）
2018-11-28
SQL索引
MySQL 優化之 index_merge （索引合併）
2018-05-11
MySql優化Index索引
mysql 索引合併
2021-09-10
MySql索引
理解索引：索引優化
2018-06-04
索引優化
移動分割槽表和分割槽索引的表空間
2018-05-23
索引
SQLServer索引優化（3）：對於建組合索引的要求
2022-01-05
SQLServer索引優化
【INDEX】Oracle分割槽索引技術詳解
2022-03-08
IndexOracle索引
Oracle分割槽表基礎運維-06分割槽表索引
2020-05-18
Oracle運維索引
win10怎麼合併分割槽_win10合併分割槽的方法
2019-11-25
Win10
12.索引合併-indexMerge
2024-10-28
索引Index
mac分割槽合併APFS容器
2019-01-17
Mac
Elasitcsearch索引優化
2018-11-15
索引優化
MySQL索引優化
2021-01-05
MySql索引優化
mysql優化索引
2020-12-23
MySql優化索引
增加表分割槽時，為local分割槽索引指定不同表空間的方法
2019-04-23
索引
PostgreSQL11preview-索引增強彙總
2018-07-28
SQLView索引
index_oracle索引梳理系列及分割槽表梳理
2018-11-08
IndexOracle索引
mongo 建立使用者建hash 分割槽建索引
2018-08-30
Go索引
Mysql索引優化之索引的分類
2019-07-04
MySql索引優化
win10u盤分割槽怎麼合併 win10如何把U分割槽合併
2020-10-06
Win10
PostgreSQL11preview-分割槽表增強彙總
2018-07-28
SQLView
Mysql索引優化（一）
2020-12-02
MySql索引優化
使用Elasticsearch的動態索引和索引優化
2019-03-28
Elasticsearch索引優化
MySQL調優之索引優化
2021-11-09
MySql索引優化
淺談MySQL的B樹索引與索引優化
2018-03-26
MySql索引優化
事務、全域性索引、透明分散式，再見，分割槽健！
2022-06-02
索引分散式
MongoDB索引優化詳解
2019-02-23
MongoDB索引優化
MySQL 筆記 - 索引優化
2018-08-12
MySql筆記索引優化
MySQL——索引優化實戰
2018-08-08
MySql索引優化
效能優化：索引下推
2020-11-18
優化索引
MySQL 字串索引優化方案
2020-07-01
MySql字串索引優化
MySQL 索引原理以及優化
2019-06-03
MySql索引優化
MySQL優化之索引解析
2022-03-15
MySql優化索引
mysql索引原理及優化
2021-06-20
MySql索引優化
MySQL 配置索引頁的合併閾值
2022-03-17
MySql索引

PostgreSQL11preview-索引優化。filter智慧消除、分割槽索引智慧合併

標籤

背景

小結

參考

相關文章