Greenplum點查詢的優化(分佈鍵)

德哥發表於2017-10-28

原文網址 : https://flycode.co/archives/186466

背景

Greenplum是分散式資料庫，在建表時，可以指定隨機分佈、或按指定欄位或多個欄位進行分佈。

因此在做點查時，Greenplum可以根據WHERE條件來判斷是需要在所有節點查詢，還是隻需要到若干節點查詢。

假設tbl按id分佈，那麼下面幾種情況應該到哪個節點查詢呢：

-- 到單個節點查詢  
select * from tbl where id=?  
  
-- 到若干個節點查詢  
select * from tbl where id in (?,?,?)  
  
-- 到所有節點查詢  
select * from tbl where id >= ? and id < ?;  
  
-- 到所有節點查詢  
select * from tbl where col1 ...;

到多個節點查詢和到一個節點查詢，效能是不一樣的。因此我們就有了優化的方法。

點查的優化與分散式QUERY

測試

1、構建測試表和資料

create table tbl2 (id int, c1 int) DISTRIBUTED BY (id);  
  
insert into tbl2 select id,id from generate_series(1,10000000) t(id);

2、建立索引

create index idx_tbl2_1 on tbl2(id);  
create index idx_tbl2_2 on tbl2(c1);

3、按分佈鍵，點查，只需要查詢一個節點。

postgres=# explain analyze select * from tbl2 where id=1;  
                                                QUERY PLAN                                                   
-----------------------------------------------------------------------------------------------------------  
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..200.28 rows=1 width=8)  
   Rows out:  1 rows at destination with 2.421 ms to first row, 2.422 ms to end, start offset by 0.212 ms.  
   ->  Index Scan using idx_tbl2_1 on tbl2  (cost=0.00..200.28 rows=1 width=8)  
         Index Cond: id = 1  
         Rows out:  1 rows with 0.017 ms to first row, 0.019 ms to end, start offset by 2.576 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 226K bytes.  
   (slice1)    Executor memory: 172K bytes (seg42).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 2.732 ms  
(13 rows)

4、按非分佈鍵查詢，需要查詢所有節點。

postgres=# explain analyze select * from tbl2 where c1=1;  
                                              QUERY PLAN                                                 
-------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=0.00..200.28 rows=1 width=8)  
   Rows out:  1 rows at destination with 58 ms to first row, 74 ms to end, start offset by 0.341 ms.  
   ->  Index Scan using idx_tbl2_2 on tbl2  (cost=0.00..200.28 rows=1 width=8)  
         Index Cond: c1 = 1  
         Rows out:  1 rows (seg42) with 0.294 ms to first row, 0.297 ms to end, start offset by 58 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 226K bytes.  
   (slice1)    Executor memory: 172K bytes avg x 48 workers, 172K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 74.553 ms  
(13 rows)

分析

在單個節點執行和在多個節點執行效能完全不一樣對吧，從現象來看：

1、首先master需要生成分散式執行計劃，耗費若干毫秒。

2、建立master與segment的連線，這一步可能是序列的，節點越多，建立連線耗時越久。

這個是在所有節點執行的，可以看到offset 58 ms，看起來是序列帶來的問題。

Rows out:  1 rows (seg42) with 0.294 ms to first row, 0.297 ms to end, start offset by 58 ms.

3、執行，執行實際上是並行的，而且也沒有花多少時間。

優化

1、如果需要經常點查，並且條件欄位分佈均勻的話。可以選擇這個欄位作為分佈鍵，提高查詢效率。（並且這個表的UK或PK也是這個列，或者沒有UK或者PK列的需求時。）

2、如果不能做到按查詢條件欄位進行分佈，引數可以優化（效果不明顯）。

gp_cached_segworkers_threshold

When a user starts a session with Greenplum Database and issues a query, the system creates groups or  
  
`gangs` of worker processes on each segment to do the work. After the work is done, the segment worker  
  
processes are destroyed except for a cached number which is set by this parameter. A lower setting  
  
conserves system resources on the segment hosts, but a higher setting may improve performance for  
  
power-users that want to issue many complex queries in a row.

gp_interconnect_type

Sets the networking protocol used for Greenplum Database interconnect traffic. With the TCP protocol,  
  
Greenplum Database has an upper limit of 1000 segment instances - less than that if the query workload  
  
involves complex, multi-slice queries.  
  
The UDP protocol allows for greater interconnect scalability. Note that the Greenplum Database software  
  
does the additional packet verification and checking not performed by UDP, so reliability and performance  
  
is equivalent to TCP.  
  
UDPIFC specifies using UDP with flow control for interconnect traffic. Specify the interconnect flow control  
  
method with gp_interconnect_fc_method.  
  
  
Note: The Greenplum Database interconnect types TCP and UDP are deprecated. In the next  
  
major release, only the UDPIFC interconnect type will be supported by Greenplum Database.

3、在master與segment之間使用連線池，也可能是一種優化方法。

https://www.linkedin.com/pulse/scaling-greenplum-pgbouncer-sandeep-katta-/?articleId=6128769027482402816

https://greenplum.org/docs/admin_guide/access_db/topics/pgbouncer.html

greenplum分佈鍵的hash值計算分析
2021-11-01
分頁查詢優化
2020-04-05
優化
MySQL分頁查詢優化
2020-10-13
MySql優化
MySQL——優化巢狀查詢和分頁查詢
2018-11-01
MySql優化巢狀
MySQL分優化之超大頁查詢
2020-11-12
MySql優化
資料庫全表查詢之-分頁查詢優化
2020-12-31
資料庫優化
查詢優化
2020-10-09
優化
MySQL 的查詢優化
2020-01-07
MySql優化
pgsql查詢優化之模糊查詢
2019-07-20
SQL優化
HBase查詢優化
2018-08-05
優化
Oracle in 查詢優化
2021-06-07
Oracle優化
join 查詢優化
2021-01-28
優化
MySQL查詢優化
2021-03-04
MySql優化
MySQL基本知識點梳理和查詢優化
2018-11-28
MySql優化
關於樹結構的查詢優化，及許可權樹的查詢優化
2021-01-01
優化
mysql查詢優化檢查 explain
2018-05-04
MySql優化AI
MySQL優化COUNT()查詢
2019-01-14
MySql優化
EntityFramework優化：查詢WITH(NOLOCK)
2018-05-17
Framework優化
EntityFramework優化：查詢效能
2018-05-19
Framework優化
優化sql查詢速度
2020-10-25
優化SQL
MySQL 慢查詢優化
2019-06-23
MySql優化
KunlunDB 查詢優化（一）
2022-01-17
優化
MySQL調優之查詢優化
2021-11-09
MySql優化
資料量很大，分頁查詢很慢，該怎麼優化？
2020-03-26
優化
TiDB 查詢優化及調優系列（四）查詢執行計劃的調整及優化原理
2022-05-24
TiDB優化
再議包含DBLINK的查詢優化
2019-05-31
優化
MySQL-效能優化-索引和查詢優化
2018-06-03
MySql優化索引
exists與in子查詢優化
2019-04-09
優化
效能優化之分頁查詢
2018-11-19
優化
MySQL索引與查詢優化
2019-04-13
MySql索引優化
APP查詢圖片優化
2018-10-15
APP優化
MySQL查詢優化利刃-EXPLAIN
2020-06-08
MySql優化AI
資料庫查詢優化
2020-10-05
資料庫優化
10分鐘掌握資料型別、索引、查詢的MySQL優化技巧
2019-05-09
資料型別索引MySql優化
分塊查詢【大規模資料查詢演算法優化】【索引順序查詢】演算法 PHP 版
2019-07-31
演算法優化索引PHP
【資料庫】查詢優化之子連線優化
2022-01-12
資料庫優化
優化GAN的分佈的梯度問題，WGAN
2020-12-22
優化梯度
TiDB 查詢優化及調優系列（二）TiDB 查詢計劃簡介
2022-04-27
TiDB優化
MySQL查詢中分頁思路的優化BF
2022-03-21
MySql優化

Greenplum點查詢的優化(分佈鍵)

標籤

背景

點查的優化與分散式QUERY

測試

分析

優化

相關文章