Greenplum with massive data: row store vs column store for very wide tables

By 德哥 (digoal), published 2018-05-06

Tags

PostgreSQL, Greenplum, wide table, row store, column store


Background

GPDB supports both row-oriented and column-oriented storage. Earlier articles analyzed in detail how to choose between them, and we have also published a number of performance comparisons.

On very wide tables, however, we had never measured just how large the gap can get. The difference comes mainly from tuple deform cost and from how much storage a query must scan: in theory, the larger the data volume and the fewer the columns accessed, the greater the column store's advantage; likewise, the further back the accessed column sits in the row, the greater the advantage.
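The deform effect can be seen directly on the 1000-column row-store table built in the test below, by comparing the runtime (e.g. with psql's \timing) of an early-column versus a late-column access (a sketch; timings are illustrative):

select count(c1) from tbl_1000_row_nonc;     -- first column: deform stops at column 1
select count(c1000) from tbl_1000_row_nonc;  -- last column: deform walks past c1..c999 in every row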

Some background reading on the principles:

《PostgreSQL VOPS 向量計算 + DBLINK非同步並行 – 單例項 10億 聚合計算跑進2秒》

《PostgreSQL 向量化執行外掛(瓦片式實現-vops) 10x提速OLAP》

For guidance on choosing between the two, see the articles linked at the end of this post.

Test

1. Create a function that builds a 1000-column table (every column of type INT8)

create or replace function crt_tbl(name, text, text) returns void as $$  
declare  
  sql text;  
begin  
  sql := 'create table '||$1||'(';  
  for i in 1..1000 loop  
    sql := sql||'c'||i||' int8 default random()*10000,';  
  end loop;  
  sql := rtrim(sql, ',');  
  sql := sql||') with (APPENDONLY=true, ORIENTATION='||$2||', COMPRESSTYPE='||$3||')';  
  execute sql;  
end;  
$$ language plpgsql strict;  
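Note that the concatenation above does not quote the table name. A slightly safer variant of the same idea, using quote_ident() (available in every PostgreSQL version Greenplum is based on), might look like this (a sketch with the hypothetical name crt_tbl_safe, not the original code):

create or replace function crt_tbl_safe(name, text, text) returns void as $$
declare
  sql text;
begin
  -- quote_ident() guards against odd characters in the table name;
  -- $2/$3 are WITH-clause keywords (row|column, none|zlib), not identifiers
  sql := 'create table '||quote_ident($1)||'(';
  for i in 1..1000 loop
    sql := sql||'c'||i||' int8 default random()*10000,';
  end loop;
  sql := rtrim(sql, ',');
  sql := sql||') with (APPENDONLY=true, ORIENTATION='||$2||', COMPRESSTYPE='||$3||')';
  execute sql;
end;
$$ language plpgsql strict;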

2. Create the row-oriented tables (uncompressed and zlib-compressed)

select crt_tbl('tbl_1000_row_nonc', 'row', 'none');  
  
select crt_tbl('tbl_1000_row_c', 'row', 'zlib');  

3. Create the column-oriented tables (uncompressed and zlib-compressed)

select crt_tbl('tbl_1000_column_nonc', 'column', 'none');  
  
select crt_tbl('tbl_1000_column_c', 'column', 'zlib');  
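To confirm that the orientation and compression settings actually took effect, the AO catalog can be inspected (a sketch; the column names follow the Greenplum 4/5 pg_appendonly layout and differ in newer major versions):

select c.relname, a.columnstore, a.compresstype
from pg_appendonly a
join pg_class c on c.oid = a.relid
where c.relname like 'tbl_1000_%'
order by 1;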

4. Insert 1 million rows of test data

postgres=# insert into tbl_1000_row_nonc (c1) select generate_series(1,1000000);  
INSERT 0 1000000  
Time: 221003.467 ms  
postgres=# insert into tbl_1000_row_c select * from tbl_1000_row_nonc;  
INSERT 0 1000000  
Time: 12298.931 ms  
postgres=# insert into tbl_1000_column_nonc select * from tbl_1000_row_nonc;  
INSERT 0 1000000  
Time: 23332.039 ms  
postgres=# insert into tbl_1000_column_c select * from tbl_1000_row_nonc;  
INSERT 0 1000000  
Time: 17017.119 ms  
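The first INSERT is far slower because every row must evaluate random() for the 999 defaulted columns; the other three statements simply copy the already materialized rows. Since crt_tbl specifies no distribution key, Greenplum falls back to distributing by the first column; the per-segment spread can be sanity-checked with the gp_segment_id pseudo column (a quick sketch):

select gp_segment_id, count(*) from tbl_1000_row_nonc group by 1 order by 1;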

5. Storage size comparison

postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_row_nonc'));  
 pg_size_pretty   
----------------  
 7653 MB  
(1 row)  
  
Time: 282.180 ms  
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_column_nonc'));  
 pg_size_pretty   
----------------  
 7647 MB  
(1 row)  
  
Time: 55.315 ms  
  
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_row_c'));  
 pg_size_pretty   
----------------  
 2522 MB  
(1 row)  
  
Time: 56.017 ms  
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_column_c'));  
 pg_size_pretty   
----------------  
 2520 MB  
(1 row)  
  
Time: 55.557 ms  
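The roughly 3x compression ratio can also be read with Greenplum's built-in helper for AO tables (a sketch; availability of get_ao_compression_ratio() may vary by version):

select get_ao_compression_ratio('tbl_1000_row_c');
select get_ao_compression_ratio('tbl_1000_column_c');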

6. Query comparison: first column

postgres=# explain analyze select c1,count(*) from tbl_1000_row_nonc group by c1;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=259714.00..272214.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 352 ms to end, start offset by 1.483 ms.  
   ->  HashAggregate  (cost=259714.00..272214.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.005 ms to first row, 101 ms to end, start offset by 63 ms.  
         ->  Append-only Scan on tbl_1000_row_nonc  (cost=0.00..254714.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 19 ms to end, start offset by 82 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 500K bytes avg x 48 workers, 500K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 353.793 ms  
(15 rows)  
  
postgres=# explain analyze select c1,count(*) from tbl_1000_row_c group by c1;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=95537.00..108037.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 1128 ms to end, start offset by 1.360 ms.  
   ->  HashAggregate  (cost=95537.00..108037.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.003 ms to first row, 699 ms to end, start offset by 42 ms.  
         ->  Append-only Scan on tbl_1000_row_c  (cost=0.00..90537.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 13 ms to end, start offset by 42 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 500K bytes avg x 48 workers, 500K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 1130.006 ms  
(15 rows)  
  
  
postgres=# explain analyze select c1,count(*) from tbl_1000_column_nonc group by c1;  
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=259518.00..272018.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 245 ms to end, start offset by 1.365 ms.  
   ->  HashAggregate  (cost=259518.00..272018.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.005 ms to first row, 9.968 ms to end, start offset by 60 ms.  
         ->  Append-only Columnar Scan on tbl_1000_column_nonc  (cost=0.00..254518.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 16 ms to end, start offset by 60 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 821K bytes avg x 48 workers, 821K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 246.967 ms  
(15 rows)  
  
  
postgres=# explain analyze select c1,count(*) from tbl_1000_column_c group by c1;  
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=259518.00..272018.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 249 ms to end, start offset by 1.450 ms.  
   ->  HashAggregate  (cost=259518.00..272018.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.004 ms to first row, 8.861 ms to end, start offset by 46 ms.  
         ->  Append-only Columnar Scan on tbl_1000_column_c  (cost=0.00..254518.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 16 ms to end, start offset by 45 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 853K bytes avg x 48 workers, 853K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 250.583 ms  
(15 rows)  

7. Query comparison: last column

postgres=# explain analyze select c1000,count(*) from tbl_1000_row_nonc group by c1000;  
                                                                      QUERY PLAN                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=260048.04..260167.34 rows=9544 width=16)  
   Rows out:  10001 rows at destination with 296 ms to end, start offset by 1.806 ms.  
   ->  HashAggregate  (cost=260048.04..260167.34 rows=199 width=16)  
         Group By: tbl_1000_row_nonc.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.002 ms to first row, 133 ms to end, start offset by 21 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=259714.00..259904.88 rows=199 width=16)  
               Hash Key: tbl_1000_row_nonc.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 141 ms to end, start offset by 21 ms.  
               ->  HashAggregate  (cost=259714.00..259714.00 rows=199 width=16)  
                     Group By: tbl_1000_row_nonc.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.003 ms to first row, 209 ms to end, start offset by 63 ms.  
                     ->  Append-only Scan on tbl_1000_row_nonc  (cost=0.00..254714.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 16 ms to end, start offset by 64 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1818K bytes avg x 48 workers, 1818K bytes max (seg0).  
   (slice2)    Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 298.222 ms  
(22 rows)  
  
postgres=# explain analyze select c1000,count(*) from tbl_1000_row_c group by c1000;  
                                                                       QUERY PLAN                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=95886.65..96011.53 rows=9990 width=16)  
   Rows out:  10001 rows at destination with 1244 ms to end, start offset by 1.760 ms.  
   ->  HashAggregate  (cost=95886.65..96011.53 rows=209 width=16)  
         Group By: tbl_1000_row_c.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.002 ms to first row, 432 ms to end, start offset by 32 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=95537.00..95736.80 rows=209 width=16)  
               Hash Key: tbl_1000_row_c.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 755 ms to end, start offset by 57 ms.  
               ->  HashAggregate  (cost=95537.00..95537.00 rows=209 width=16)  
                     Group By: tbl_1000_row_c.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.004 ms to first row, 1085 ms to end, start offset by 63 ms.  
                     ->  Append-only Scan on tbl_1000_row_c  (cost=0.00..90537.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 22 ms to end, start offset by 59 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1818K bytes avg x 48 workers, 1818K bytes max (seg0).  
   (slice2)    Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 1246.276 ms  
(22 rows)  
  
postgres=# explain analyze select c1000,count(*) from tbl_1000_column_c group by c1000;  
                                                                       QUERY PLAN                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=259870.80..259996.80 rows=10080 width=16)  
   Rows out:  10001 rows at destination with 78 ms to end, start offset by 1.783 ms.  
   ->  HashAggregate  (cost=259870.80..259996.80 rows=210 width=16)  
         Group By: tbl_1000_column_c.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.004 ms to first row, 24 ms to end, start offset by 26 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=259518.00..259719.60 rows=210 width=16)  
               Hash Key: tbl_1000_column_c.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 31 ms to end, start offset by 23 ms.  
               ->  HashAggregate  (cost=259518.00..259518.00 rows=210 width=16)  
                     Group By: tbl_1000_column_c.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.004 ms to first row, 5.962 ms to end, start offset by 42 ms.  
                     ->  Append-only Columnar Scan on tbl_1000_column_c  (cost=0.00..254518.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 28 ms to end, start offset by 30 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1787K bytes avg x 48 workers, 1787K bytes max (seg0).  
   (slice2)    Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 80.152 ms  
(22 rows)  
  
postgres=# explain analyze select c1000,count(*) from tbl_1000_column_nonc group by c1000;  
                                                                       QUERY PLAN                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=259856.48..259977.37 rows=9671 width=16)  
   Rows out:  10001 rows at destination with 77 ms to end, start offset by 2.019 ms.  
   ->  HashAggregate  (cost=259856.48..259977.37 rows=202 width=16)  
         Group By: tbl_1000_column_nonc.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.003 ms to first row, 26 ms to end, start offset by 35 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=259518.00..259711.42 rows=202 width=16)  
               Hash Key: tbl_1000_column_nonc.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 20 ms to end, start offset by 30 ms.  
               ->  HashAggregate  (cost=259518.00..259518.00 rows=202 width=16)  
                     Group By: tbl_1000_column_nonc.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.006 ms to first row, 5.476 ms to end, start offset by 25 ms.  
                     ->  Append-only Columnar Scan on tbl_1000_column_nonc  (cost=0.00..254518.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 27 ms to end, start offset by 25 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1755K bytes avg x 48 workers, 1755K bytes max (seg0).  
   (slice2)    Executor memory: 1330K bytes avg x 48 workers, 1330K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 78.976 ms  
(22 rows)  

8. Update comparison

postgres=# update tbl_1000_row_nonc set c1000=c1000+1;  
UPDATE 1000000  
Time: 7085.645 ms  
postgres=# update tbl_1000_row_c set c1000=c1000+1;  
UPDATE 1000000  
Time: 6734.279 ms  
postgres=# update tbl_1000_column_nonc set c1000=c1000+1;  
UPDATE 1000000  
Time: 13514.749 ms  
postgres=# update tbl_1000_column_c set c1000=c1000+1;  
UPDATE 1000000  
Time: 10629.104 ms  
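Keep in mind that UPDATE on AO tables appends new row versions and marks the old ones in a visibility map; the space is only reclaimed by VACUUM. Re-checking the sizes after the updates makes this visible (a sketch, not part of the original run):

select pg_size_pretty(pg_total_relation_size('tbl_1000_row_nonc'));
vacuum tbl_1000_row_nonc;
select pg_size_pretty(pg_total_relation_size('tbl_1000_row_nonc'));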

Summary

                               Size     First-column query  Last-column query  Full-table update
AO row store, uncompressed     7653 MB  353.793 ms          298.222 ms         7085.645 ms
AO row store, zlib             2522 MB  1130.006 ms         1246.276 ms        6734.279 ms
AO column store, uncompressed  7647 MB  246.967 ms          80.152 ms          13514.749 ms
AO column store, zlib          2520 MB  250.583 ms          78.976 ms          10629.104 ms

For a wide table, column storage is the clear winner on query performance, and its performance stays consistent whether the first or the last column is queried. Its write and update performance is somewhat worse, because every row has to touch many more data files.

The remaining difference between the first-column and last-column queries on the column store comes from the number of groups returned: every column except c1 draws its values from a space of 10,000, while c1 has 1 million distinct values.
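The cardinality gap is easy to verify (a quick sketch):

select count(distinct c1) as c1_card, count(distinct c1000) as c1000_card
from tbl_1000_column_c;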

If you are interested, try varying other parameters as well, such as compression level, block size, or a heap table; the available storage options are listed below, with a sketch after the list:

where storage_parameter is:
   APPENDONLY={TRUE|FALSE}
   BLOCKSIZE={8192-2097152}
   ORIENTATION={COLUMN|ROW}
   COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
   COMPRESSLEVEL={0-9}
   CHECKSUM={TRUE|FALSE}
   FILLFACTOR={10-100}
   OIDS[=TRUE|FALSE]
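For example, a zlib level-5 column-store table with the maximum block size could be declared like this (a sketch with a hypothetical table name, trimmed to two columns for brevity; crt_tbl above could also be extended to pass these options through):

create table tbl_1000_column_c5 (c1 int8, c2 int8)
with (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, COMPRESSLEVEL=5, BLOCKSIZE=2097152);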

References

《Greenplum 行存、列存,堆表、AO表效能對比 – 阿里雲HDB for PostgreSQL最佳實踐》

《Greenplum 行存、列存,堆表、AO表的原理和選擇》

《Greenplum 最佳實踐 – 行存與列存的選擇以及轉換方法》

《PostgreSQL VOPS 向量計算 + DBLINK非同步並行 – 單例項 10億 聚合計算跑進2秒》

《PostgreSQL 向量化執行外掛(瓦片式實現-vops) 10x提速OLAP》
