HybridDBforPostgreSQL,Greenplum寫入效能優化實踐

德哥發表於2018-10-05

原文網址 : https://flycode.co/archives/185284

SQL優化

背景

Greenplum寫入資料的效能優化實踐。

1 鏈路

儘量縮短客戶端和資料庫的鏈路，不要太多的跳數，比如NAT，PROXY，等越多，效能越差。

2 連線方式

儘量使用長連線，不要使用短連線，短連線的開啟開銷非常大。

3 儲存屬性

where storage_parameter is:  
   APPENDONLY={TRUE|FALSE}  
   BLOCKSIZE={8192-2097152}  
   ORIENTATION={COLUMN|ROW}  
   COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}  
   COMPRESSLEVEL={0-9}  
   CHECKSUM={TRUE|FALSE}  
   FILLFACTOR={10-100}  
   OIDS[=TRUE|FALSE]

行存與列存的選擇方面，需要權衡插入與查詢的風格與需求的效能，同時它們支援相互轉換。

列存在單條INSERT時，效能較差，原因和列存的可靠性機制有關。

4 插入方法

1、單條INSERT，單步提交，效能最差

2、單條INSERT，批量提交

3、批量INSERT

4、COPY

5、segment並行，效能最好

5 索引

索引越多，寫入效能越差。

6 繫結變數

如果是使用INSERT的單條寫入方法，可以考慮使用繫結變數。減少CPU硬解析。

7 併發數

INSERT的寫入，也可以使用並行（開多個連線）的方式，但是並不是開越多越好，通常不建議超過CPU核數。

8 傾斜

傾斜，資料分佈一定不要出現傾斜，否則就會導致木桶效應，導致某些SEGMENT繁忙，某些空閒，不利於整體效能。

例子

1 效能較差的例子

使用列存，壓縮，大BLOCK，一條一條INSERT，單步提交，使用短連線。

每秒插入19行。

create table t_bad(  
id int,   
c1 int default (random()*100)::int,   
c2 int default (random()*100)::int,   
c3 int default (random()*100)::int,   
c4 int default (random()*100)::int,   
c5 int default (random()*100)::int,   
c6 int default (random()*100)::int,   
c7 int default (random()*100)::int,   
c8 int default (random()*100)::int,   
c9 int default (random()*100)::int,   
c10 int default (random()*100)::int,   
c11 int default (random()*100)::int,   
c12 int default (random()*100)::int,   
c13 int default (random()*100)::int,   
c14 int default (random()*100)::int,   
c15 int default (random()*100)::int,   
c16 int default (random()*100)::int,   
c17 int default (random()*100)::int,   
c18 int default (random()*100)::int,   
c19 int default (random()*100)::int,   
c20 int default (random()*100)::int,   
c21 int default (random()*100)::int,   
c22 int default (random()*100)::int,   
c23 int default (random()*100)::int,   
c24 int default (random()*100)::int,   
c25 int default (random()*100)::int,   
c26 int default (random()*100)::int,   
c27 int default (random()*100)::int,   
c28 int default (random()*100)::int,   
c29 int default (random()*100)::int,   
c30 int default (random()*100)::int,   
c31 int default (random()*100)::int,   
c32 int default (random()*100)::int,   
crt_time timestamp  
)   
with (APPENDONLY=true, BLOCKSIZE=2097152, ORIENTATION=column, COMPRESSTYPE=zlib, CHECKSUM=true);

vi test.sql  
set id random(1,10000000)  
insert into t_bad (id, crt_time) values (:id, now());

/home/digoal/pgsql10.4/bin/pgbench -M simple -n -r -P 1 -f ./test.sql -C -c 1 -j 1 -T 120  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: simple  
number of clients: 1  
number of threads: 1  
duration: 120 s  
number of transactions actually processed: 2190  
latency average = 50.341 ms  
latency stddev = 1.752 ms  
tps = 18.243126 (including connections establishing)  
tps = 19.855318 (excluding connections establishing)  
script statistics:  
 - statement latencies in milliseconds:  
         0.003  set id random(1,10000000)  
        50.338  insert into t_bad (id, crt_time) values (:id, now());

2 效能較好的例子

使用行存，批量INSERT，使用長連線，使用併發INSERT。

每秒插入3.41萬行。

create table t_good(  
id int,   
c1 int default (random()*100)::int,   
c2 int default (random()*100)::int,   
c3 int default (random()*100)::int,   
c4 int default (random()*100)::int,   
c5 int default (random()*100)::int,   
c6 int default (random()*100)::int,   
c7 int default (random()*100)::int,   
c8 int default (random()*100)::int,   
c9 int default (random()*100)::int,   
c10 int default (random()*100)::int,   
c11 int default (random()*100)::int,   
c12 int default (random()*100)::int,   
c13 int default (random()*100)::int,   
c14 int default (random()*100)::int,   
c15 int default (random()*100)::int,   
c16 int default (random()*100)::int,   
c17 int default (random()*100)::int,   
c18 int default (random()*100)::int,   
c19 int default (random()*100)::int,   
c20 int default (random()*100)::int,   
c21 int default (random()*100)::int,   
c22 int default (random()*100)::int,   
c23 int default (random()*100)::int,   
c24 int default (random()*100)::int,   
c25 int default (random()*100)::int,   
c26 int default (random()*100)::int,   
c27 int default (random()*100)::int,   
c28 int default (random()*100)::int,   
c29 int default (random()*100)::int,   
c30 int default (random()*100)::int,   
c31 int default (random()*100)::int,   
c32 int default (random()*100)::int,   
crt_time timestamp  
) ;

vi test.sql  
  
set id random(1,10000000)  
insert into t_good (id, crt_time) select random()*100000000,now() from generate_series(1,100) t(id);

/home/digoal/pgsql10.4/bin/pgbench -M prepared -n -r -P 1 -f ./test.sql -c 32 -j 32 -T 120  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 32  
number of threads: 32  
duration: 120 s  
number of transactions actually processed: 41006  
latency average = 93.658 ms  
latency stddev = 22.133 ms  
tps = 341.492522 (including connections establishing)  
tps = 341.562788 (excluding connections establishing)  
script statistics:  
 - statement latencies in milliseconds:  
         0.005  set id random(1,10000000)  
        93.713  insert into t_good (id, crt_time) select random()*100000000,now() from generate_series(1,100) t(id);