PostgreSQL9.x,10,11hash分割槽表用法舉例

德哥發表於2018-07-28

標籤

PostgreSQL , 分割槽表 , 優化器 , 分割槽過濾 , hash 分割槽


背景

PostgreSQL 10開始內建分割槽表語法,當時只支援了range,list兩種分割槽,實際上可以通過LIST實現HASH分割槽。

PostgreSQL 10 hash 分割槽表

使用list支援hash分割槽

postgres=# create table p (id int , info text, crt_time timestamp) partition by list (abs(mod(id,4)));  
CREATE TABLE  
  
postgres=# create table p0 partition of p for values in (0);  
CREATE TABLE  
postgres=# create table p1 partition of p for values in (1);  
CREATE TABLE  
postgres=# create table p2 partition of p for values in (2);  
CREATE TABLE  
postgres=# create table p3 partition of p for values in (3);  
CREATE TABLE  

分割槽表如下

postgres=# d+ p  
                                                Table "public.p"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           |          |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Partition key: LIST (abs(mod(id, 4)))  
Partitions: p0 FOR VALUES IN (0),  
            p1 FOR VALUES IN (1),  
            p2 FOR VALUES IN (2),  
            p3 FOR VALUES IN (3)  

寫入資料

postgres=# insert into p select generate_series(1,1000),md5(random()::text),now();  
INSERT 0 1000  
postgres=# select tableoid::regclass,id from p limit 10;  
 tableoid | id   
----------+----  
 p0       |  4  
 p0       |  8  
 p0       | 12  
 p0       | 16  
 p0       | 20  
 p0       | 24  
 p0       | 28  
 p0       | 32  
 p0       | 36  
 p0       | 40  
(10 rows)  

普通的查詢,無法做到分割槽的過濾

postgres=# explain select * from p where id=1 ;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..96.50 rows=24 width=44)  
   ->  Seq Scan on p0  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
   ->  Seq Scan on p1  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
   ->  Seq Scan on p2  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
   ->  Seq Scan on p3  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
(9 rows)  

一定要帶上分割槽條件,才可以做到分割槽過濾

postgres=# explain select * from p where id=1 and abs(mod(id, 4))=abs(mod(1, 4));  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..32.60 rows=1 width=44)  
   ->  Seq Scan on p1  (cost=0.00..32.60 rows=1 width=44)  
         Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))  
(3 rows)  

PostgreSQL 11 hash 分割槽表

PostgreSQL 11同樣可以使用與10一樣的方法,LIST來實現HASH分割槽,但是有一個更加優雅的方法,直接使用HASH分割槽。

postgres=# create table p (id int , info text, crt_time timestamp) partition by hash (id);  
CREATE TABLE  
  
postgres=# create table p0 partition of p  for values WITH (MODULUS 4, REMAINDER 0);  
CREATE TABLE  
postgres=# create table p1 partition of p  for values WITH (MODULUS 4, REMAINDER 1);  
CREATE TABLE  
postgres=# create table p2 partition of p  for values WITH (MODULUS 4, REMAINDER 2);  
CREATE TABLE  
postgres=# create table p3 partition of p  for values WITH (MODULUS 4, REMAINDER 3);  
CREATE TABLE  

表結構如下

postgres=# d+ p  
                                                Table "public.p"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           |          |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Partition key: HASH (id)  
Partitions: p0 FOR VALUES WITH (modulus 4, remainder 0),  
            p1 FOR VALUES WITH (modulus 4, remainder 1),  
            p2 FOR VALUES WITH (modulus 4, remainder 2),  
            p3 FOR VALUES WITH (modulus 4, remainder 3)  

表分割槽定義,內建的約束是一個HASH函式的返回值

postgres=# d+ p0  
                                                Table "public.p0"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           |          |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Partition of: p FOR VALUES WITH (modulus 4, remainder 0)  
Partition constraint: satisfies_hash_partition(`180289`::oid, 4, 0, id)  
  
  
postgres=# d+ p1  
                                                Table "public.p1"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           |          |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Partition of: p FOR VALUES WITH (modulus 4, remainder 1)  
Partition constraint: satisfies_hash_partition(`180289`::oid, 4, 1, id)  

這個hash函式的定義如下,他一定是一個immutable 函式,所以可以用於分割槽過濾

postgres=# x  
Expanded display is on.  
postgres=# df+ satisfies_hash_partition  
List of functions  
-[ RECORD 1 ]-------+--------------------------------------  
Schema              | pg_catalog  
Name                | satisfies_hash_partition  
Result data type    | boolean  
Argument data types | oid, integer, integer, VARIADIC "any"  
Type                | func  
Volatility          | immutable  
Parallel            | safe  
Owner               | postgres  
Security            | invoker  
Access privileges   |   
Language            | internal  
Source code         | satisfies_hash_partition  
Description         | hash partition CHECK constraint  

PostgreSQL 11終於可以只輸入分割槽欄位值就可以做到分割槽過濾了

postgres=# explain select * from p where id=1;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..24.16 rows=6 width=44)  
   ->  Seq Scan on p0  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
(3 rows)  
  
postgres=# explain select * from p where id=0;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..24.16 rows=6 width=44)  
   ->  Seq Scan on p0  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 0)  
(3 rows)  
  
postgres=# explain select * from p where id=2;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..24.16 rows=6 width=44)  
   ->  Seq Scan on p2  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 2)  
(3 rows)  
  
postgres=# explain select * from p where id=3;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..24.16 rows=6 width=44)  
   ->  Seq Scan on p1  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 3)  
(3 rows)  

它受控於一個開關,當關閉後,就無法只通過分割槽值來過濾分割槽。

postgres=# set enable_partition_pruning =off;  
SET  
postgres=# explain select * from p where id=0;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..96.62 rows=24 width=44)  
   ->  Seq Scan on p0  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 0)  
   ->  Seq Scan on p1  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 0)  
   ->  Seq Scan on p2  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 0)  
   ->  Seq Scan on p3  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 0)  
(9 rows)  

PostgreSQL 繼承表 hash 分割槽表實現

《PostgreSQL 傳統 hash 分割槽方法和效能》

postgres=# explain select * from tbl where abs(mod(id,4)) = abs(mod(1,4)) and id=1;    
                                QUERY PLAN                                    
--------------------------------------------------------------------------    
 Append  (cost=0.00..979127.84 rows=3 width=45)    
   ->  Seq Scan on tbl  (cost=0.00..840377.67 rows=2 width=45)    
         Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))    
   ->  Seq Scan on tbl1  (cost=0.00..138750.17 rows=1 width=45)    
         Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))    
(5 rows)    

pg_pathman分割槽方法

支援9.5以上的版本

《PostgreSQL 9.5+ 高效分割槽表實現 – pg_pathman》

小結

1、PostgreSQL 10內建分割槽表,為了HASH分割槽,可以使用LIST分割槽的方法,但是為了讓資料庫可以自動過濾分割槽,一定要帶上HASH分割槽條件表示式到SQL中。

2、PostgreSQL 11內建分割槽表,內建了HASH分割槽,並且支援只按照HASH分割槽條件,自動過濾分割槽。

3、繼承表的方法,同樣可以實現HASH分割槽,需要建立觸發器,同時主表在查詢時依舊會被查詢到。

以上三種方法,必須保證constraint_exclusion引數設定為partition或者on, 否則無法做到分割槽自動過濾。

對於PostgreSQL 11,為了實現只輸入分割槽欄位的值就能夠滿足分割槽自動過濾,還需要設定enable_partition_pruning為on.

索性這些引數預設都是OK的。

4、pg_pathman是通過custom scan介面實現的分割槽,是目前為止,效能最好的,鎖粒度最低的方法。

參考

《PostgreSQL 11 preview – 分割槽表 增強 彙總》

《PostgreSQL 自動建立分割槽實踐 – 寫入觸發器》

《PostgreSQL 11 preview – 分割槽過濾控制引數 – enable_partition_pruning》

《Greenplum 計算能力估算 – 暨多大表需要分割槽,單個分割槽多大適宜》

《PostgreSQL 11 preview – 分割槽表智慧並行聚合、分組計算(已類似MPP架構,效能暴增)》

《PostgreSQL 並行vacuum patch – 暨為什麼需要並行vacuum或分割槽表》

《分割槽表鎖粒度差異 – pg_pathman VS native partition table》

《PostgreSQL 11 preview – 分割槽表用法及增強 – 增加HASH分割槽支援 (hash, range, list)》

《PostgreSQL 11 preview – Parallel Append(包括 union all分割槽查詢) (多表平行計算) sharding架構平行計算核心功能之一》

《PostgreSQL 11 preview – 分割槽表智慧並行JOIN (已類似MPP架構,效能暴增)》

《PostgreSQL 查詢涉及分割槽表過多導致的效能問題 – 效能診斷與優化(大量BIND, spin lock, SLEEP程式)》

《PostgreSQL 商用版本EPAS(阿里雲ppas(Oracle 相容版)) – 分割槽表效能優化 (堪比pg_pathman)》

《PostgreSQL 傳統 hash 分割槽方法和效能》

《HTAP資料庫 PostgreSQL 場景與效能測試之 45 – (OLTP) 資料量與效能的線性關係(10億+無衰減), 暨單表多大需要分割槽》

《PostgreSQL 10 內建分割槽 vs pg_pathman perf profiling》

《PostgreSQL 10.0 preview 功能增強 – 內建分割槽表》

《PostgreSQL 9.5+ 高效分割槽表實現 – pg_pathman》


相關文章