Greenplum with massive data: row store vs column store for very wide tables

By 德哥 (digoal), published 2018-05-06

Tags

PostgreSQL, Greenplum, wide table, row store, column store


Background

GPDB supports both row-oriented and column-oriented storage. Earlier articles analyzed in detail how to choose between them, and we have also published a number of performance comparisons.

On very wide tables, however, we had never measured just how large the gap can get. The difference comes mainly from tuple deform cost and from how much storage a query must scan: in theory, the larger the data volume and the fewer the columns accessed, the greater the column store's advantage; likewise, the further back the accessed column sits in the row, the greater the advantage.
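The deform effect can be seen directly on the 1000-column row-store table built in the test below, by comparing the runtime (e.g. with psql's \timing) of an early-column versus a late-column access (a sketch; timings are illustrative):

select count(c1) from tbl_1000_row_nonc;     -- first column: deform stops at column 1
select count(c1000) from tbl_1000_row_nonc;  -- last column: deform walks past c1..c999 in every row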

Some background reading on the principles:

《PostgreSQL VOPS 向量計算 + DBLINK非同步並行 – 單例項 10億 聚合計算跑進2秒》

《PostgreSQL 向量化執行外掛(瓦片式實現-vops) 10x提速OLAP》

For guidance on choosing between the two, see the articles linked at the end of this post.

Test

1. Create a function that builds a 1000-column table (every column of type INT8)

create or replace function crt_tbl(name, text, text) returns void as $$  
declare  
  sql text;  
begin  
  sql := 'create table '||$1||'(';  
  for i in 1..1000 loop  
    sql := sql||'c'||i||' int8 default random()*10000,';  
  end loop;  
  sql := rtrim(sql, ',');  
  sql := sql||') with (APPENDONLY=true, ORIENTATION='||$2||', COMPRESSTYPE='||$3||')';  
  execute sql;  
end;  
$$ language plpgsql strict;  
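Note that the concatenation above does not quote the table name. A slightly safer variant of the same idea, using quote_ident() (available in every PostgreSQL version Greenplum is based on), might look like this (a sketch with the hypothetical name crt_tbl_safe, not the original code):

create or replace function crt_tbl_safe(name, text, text) returns void as $$
declare
  sql text;
begin
  -- quote_ident() guards against odd characters in the table name;
  -- $2/$3 are WITH-clause keywords (row|column, none|zlib), not identifiers
  sql := 'create table '||quote_ident($1)||'(';
  for i in 1..1000 loop
    sql := sql||'c'||i||' int8 default random()*10000,';
  end loop;
  sql := rtrim(sql, ',');
  sql := sql||') with (APPENDONLY=true, ORIENTATION='||$2||', COMPRESSTYPE='||$3||')';
  execute sql;
end;
$$ language plpgsql strict;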

2. Create the row-oriented tables (uncompressed and zlib-compressed)

select crt_tbl('tbl_1000_row_nonc', 'row', 'none');  
  
select crt_tbl('tbl_1000_row_c', 'row', 'zlib');  

3. Create the column-oriented tables (uncompressed and zlib-compressed)

select crt_tbl('tbl_1000_column_nonc', 'column', 'none');  
  
select crt_tbl('tbl_1000_column_c', 'column', 'zlib');  
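To confirm that the orientation and compression settings actually took effect, the AO catalog can be inspected (a sketch; the column names follow the Greenplum 4/5 pg_appendonly layout and differ in newer major versions):

select c.relname, a.columnstore, a.compresstype
from pg_appendonly a
join pg_class c on c.oid = a.relid
where c.relname like 'tbl_1000_%'
order by 1;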

4. Insert 1 million rows of test data

postgres=# insert into tbl_1000_row_nonc (c1) select generate_series(1,1000000);  
INSERT 0 1000000  
Time: 221003.467 ms  
postgres=# insert into tbl_1000_row_c select * from tbl_1000_row_nonc;  
INSERT 0 1000000  
Time: 12298.931 ms  
postgres=# insert into tbl_1000_column_nonc select * from tbl_1000_row_nonc;  
INSERT 0 1000000  
Time: 23332.039 ms  
postgres=# insert into tbl_1000_column_c select * from tbl_1000_row_nonc;  
INSERT 0 1000000  
Time: 17017.119 ms  
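The first INSERT is far slower because every row must evaluate random() for the 999 defaulted columns; the other three statements simply copy the already materialized rows. Since crt_tbl specifies no distribution key, Greenplum falls back to distributing by the first column; the per-segment spread can be sanity-checked with the gp_segment_id pseudo column (a quick sketch):

select gp_segment_id, count(*) from tbl_1000_row_nonc group by 1 order by 1;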

5. Storage size comparison

postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_row_nonc'));  
 pg_size_pretty   
----------------  
 7653 MB  
(1 row)  
  
Time: 282.180 ms  
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_column_nonc'));  
 pg_size_pretty   
----------------  
 7647 MB  
(1 row)  
  
Time: 55.315 ms  
  
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_row_c'));  
 pg_size_pretty   
----------------  
 2522 MB  
(1 row)  
  
Time: 56.017 ms  
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_column_c'));  
 pg_size_pretty   
----------------  
 2520 MB  
(1 row)  
  
Time: 55.557 ms  
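The roughly 3x compression ratio can also be read with Greenplum's built-in helper for AO tables (a sketch; availability of get_ao_compression_ratio() may vary by version):

select get_ao_compression_ratio('tbl_1000_row_c');
select get_ao_compression_ratio('tbl_1000_column_c');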

6. Query comparison: first column

postgres=# explain analyze select c1,count(*) from tbl_1000_row_nonc group by c1;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=259714.00..272214.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 352 ms to end, start offset by 1.483 ms.  
   ->  HashAggregate  (cost=259714.00..272214.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.005 ms to first row, 101 ms to end, start offset by 63 ms.  
         ->  Append-only Scan on tbl_1000_row_nonc  (cost=0.00..254714.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 19 ms to end, start offset by 82 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 500K bytes avg x 48 workers, 500K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 353.793 ms  
(15 rows)  
  
postgres=# explain analyze select c1,count(*) from tbl_1000_row_c group by c1;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=95537.00..108037.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 1128 ms to end, start offset by 1.360 ms.  
   ->  HashAggregate  (cost=95537.00..108037.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.003 ms to first row, 699 ms to end, start offset by 42 ms.  
         ->  Append-only Scan on tbl_1000_row_c  (cost=0.00..90537.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 13 ms to end, start offset by 42 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 500K bytes avg x 48 workers, 500K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 1130.006 ms  
(15 rows)  
  
  
postgres=# explain analyze select c1,count(*) from tbl_1000_column_nonc group by c1;  
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=259518.00..272018.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 245 ms to end, start offset by 1.365 ms.  
   ->  HashAggregate  (cost=259518.00..272018.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.005 ms to first row, 9.968 ms to end, start offset by 60 ms.  
         ->  Append-only Columnar Scan on tbl_1000_column_nonc  (cost=0.00..254518.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 16 ms to end, start offset by 60 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 821K bytes avg x 48 workers, 821K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 246.967 ms  
(15 rows)  
  
  
postgres=# explain analyze select c1,count(*) from tbl_1000_column_c group by c1;  
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=259518.00..272018.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 249 ms to end, start offset by 1.450 ms.  
   ->  HashAggregate  (cost=259518.00..272018.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.004 ms to first row, 8.861 ms to end, start offset by 46 ms.  
         ->  Append-only Columnar Scan on tbl_1000_column_c  (cost=0.00..254518.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 16 ms to end, start offset by 45 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 853K bytes avg x 48 workers, 853K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 250.583 ms  
(15 rows)  

7. Query comparison: last column

postgres=# explain analyze select c1000,count(*) from tbl_1000_row_nonc group by c1000;  
                                                                      QUERY PLAN                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=260048.04..260167.34 rows=9544 width=16)  
   Rows out:  10001 rows at destination with 296 ms to end, start offset by 1.806 ms.  
   ->  HashAggregate  (cost=260048.04..260167.34 rows=199 width=16)  
         Group By: tbl_1000_row_nonc.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.002 ms to first row, 133 ms to end, start offset by 21 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=259714.00..259904.88 rows=199 width=16)  
               Hash Key: tbl_1000_row_nonc.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 141 ms to end, start offset by 21 ms.  
               ->  HashAggregate  (cost=259714.00..259714.00 rows=199 width=16)  
                     Group By: tbl_1000_row_nonc.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.003 ms to first row, 209 ms to end, start offset by 63 ms.  
                     ->  Append-only Scan on tbl_1000_row_nonc  (cost=0.00..254714.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 16 ms to end, start offset by 64 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1818K bytes avg x 48 workers, 1818K bytes max (seg0).  
   (slice2)    Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 298.222 ms  
(22 rows)  
  
postgres=# explain analyze select c1000,count(*) from tbl_1000_row_c group by c1000;  
                                                                       QUERY PLAN                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=95886.65..96011.53 rows=9990 width=16)  
   Rows out:  10001 rows at destination with 1244 ms to end, start offset by 1.760 ms.  
   ->  HashAggregate  (cost=95886.65..96011.53 rows=209 width=16)  
         Group By: tbl_1000_row_c.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.002 ms to first row, 432 ms to end, start offset by 32 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=95537.00..95736.80 rows=209 width=16)  
               Hash Key: tbl_1000_row_c.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 755 ms to end, start offset by 57 ms.  
               ->  HashAggregate  (cost=95537.00..95537.00 rows=209 width=16)  
                     Group By: tbl_1000_row_c.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.004 ms to first row, 1085 ms to end, start offset by 63 ms.  
                     ->  Append-only Scan on tbl_1000_row_c  (cost=0.00..90537.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 22 ms to end, start offset by 59 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1818K bytes avg x 48 workers, 1818K bytes max (seg0).  
   (slice2)    Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 1246.276 ms  
(22 rows)  
  
postgres=# explain analyze select c1000,count(*) from tbl_1000_column_c group by c1000;  
                                                                       QUERY PLAN                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=259870.80..259996.80 rows=10080 width=16)  
   Rows out:  10001 rows at destination with 78 ms to end, start offset by 1.783 ms.  
   ->  HashAggregate  (cost=259870.80..259996.80 rows=210 width=16)  
         Group By: tbl_1000_column_c.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.004 ms to first row, 24 ms to end, start offset by 26 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=259518.00..259719.60 rows=210 width=16)  
               Hash Key: tbl_1000_column_c.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 31 ms to end, start offset by 23 ms.  
               ->  HashAggregate  (cost=259518.00..259518.00 rows=210 width=16)  
                     Group By: tbl_1000_column_c.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.004 ms to first row, 5.962 ms to end, start offset by 42 ms.  
                     ->  Append-only Columnar Scan on tbl_1000_column_c  (cost=0.00..254518.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 28 ms to end, start offset by 30 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1787K bytes avg x 48 workers, 1787K bytes max (seg0).  
   (slice2)    Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 80.152 ms  
(22 rows)  
  
postgres=# explain analyze select c1000,count(*) from tbl_1000_column_nonc group by c1000;  
                                                                       QUERY PLAN                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=259856.48..259977.37 rows=9671 width=16)  
   Rows out:  10001 rows at destination with 77 ms to end, start offset by 2.019 ms.  
   ->  HashAggregate  (cost=259856.48..259977.37 rows=202 width=16)  
         Group By: tbl_1000_column_nonc.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.003 ms to first row, 26 ms to end, start offset by 35 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=259518.00..259711.42 rows=202 width=16)  
               Hash Key: tbl_1000_column_nonc.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 20 ms to end, start offset by 30 ms.  
               ->  HashAggregate  (cost=259518.00..259518.00 rows=202 width=16)  
                     Group By: tbl_1000_column_nonc.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.006 ms to first row, 5.476 ms to end, start offset by 25 ms.  
                     ->  Append-only Columnar Scan on tbl_1000_column_nonc  (cost=0.00..254518.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 27 ms to end, start offset by 25 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1755K bytes avg x 48 workers, 1755K bytes max (seg0).  
   (slice2)    Executor memory: 1330K bytes avg x 48 workers, 1330K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 78.976 ms  
(22 rows)  

8. Update comparison

postgres=# update tbl_1000_row_nonc set c1000=c1000+1;  
UPDATE 1000000  
Time: 7085.645 ms  
postgres=# update tbl_1000_row_c set c1000=c1000+1;  
UPDATE 1000000  
Time: 6734.279 ms  
postgres=# update tbl_1000_column_nonc set c1000=c1000+1;  
UPDATE 1000000  
Time: 13514.749 ms  
postgres=# update tbl_1000_column_c set c1000=c1000+1;  
UPDATE 1000000  
Time: 10629.104 ms  
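Keep in mind that UPDATE on AO tables appends new row versions and marks the old ones in a visibility map; the space is only reclaimed by VACUUM. Re-checking the sizes after the updates makes this visible (a sketch, not part of the original run):

select pg_size_pretty(pg_total_relation_size('tbl_1000_row_nonc'));
vacuum tbl_1000_row_nonc;
select pg_size_pretty(pg_total_relation_size('tbl_1000_row_nonc'));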

Summary

                               Size     First-column query  Last-column query  Full-table update
AO row store, uncompressed     7653 MB  353.793 ms          298.222 ms         7085.645 ms
AO row store, zlib             2522 MB  1130.006 ms         1246.276 ms        6734.279 ms
AO column store, uncompressed  7647 MB  246.967 ms          80.152 ms          13514.749 ms
AO column store, zlib          2520 MB  250.583 ms          78.976 ms          10629.104 ms

For a wide table, column storage is the clear winner on query performance, and its performance stays consistent whether the first or the last column is queried. Its write and update performance is somewhat worse, because every row has to touch many more data files.

The remaining difference between the first-column and last-column queries on the column store comes from the number of groups returned: every column except c1 draws its values from a space of 10,000, while c1 has 1 million distinct values.
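The cardinality gap is easy to verify (a quick sketch):

select count(distinct c1) as c1_card, count(distinct c1000) as c1000_card
from tbl_1000_column_c;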

If you are interested, try varying other parameters as well, such as compression level, block size, or a heap table; the available storage options are listed below, with a sketch after the list:

where storage_parameter is:
   APPENDONLY={TRUE|FALSE}
   BLOCKSIZE={8192-2097152}
   ORIENTATION={COLUMN|ROW}
   COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
   COMPRESSLEVEL={0-9}
   CHECKSUM={TRUE|FALSE}
   FILLFACTOR={10-100}
   OIDS[=TRUE|FALSE]
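For example, a zlib level-5 column-store table with the maximum block size could be declared like this (a sketch with a hypothetical table name, trimmed to two columns for brevity; crt_tbl above could also be extended to pass these options through):

create table tbl_1000_column_c5 (c1 int8, c2 int8)
with (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, COMPRESSLEVEL=5, BLOCKSIZE=2097152);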

References

《Greenplum 行存、列存,堆表、AO表效能對比 – 阿里雲HDB for PostgreSQL最佳實踐》

《Greenplum 行存、列存,堆表、AO表的原理和選擇》

《Greenplum 最佳實踐 – 行存與列存的選擇以及轉換方法》

《PostgreSQL VOPS 向量計算 + DBLINK非同步並行 – 單例項 10億 聚合計算跑進2秒》

《PostgreSQL 向量化執行外掛(瓦片式實現-vops) 10x提速OLAP》
