Greenplum點查詢的優化(分佈鍵)
標籤
PostgreSQL , Greenplum , 分散式查詢 , 分散式執行計劃 , 點查
背景
Greenplum是分散式資料庫,在建表時,可以指定隨機分佈、或按指定欄位或多個欄位進行分佈。
因此在做點查時,Greenplum可以根據WHERE條件來判斷是需要在所有節點查詢,還是隻需要到若干節點查詢。
假設tbl按id分佈,那麼下面幾種情況應該到哪個節點查詢呢:
-- 到單個節點查詢
select * from tbl where id=?
-- 到若干個節點查詢
select * from tbl where id in (?,?,?)
-- 到所有節點查詢
select * from tbl where id >= ? and id < ?;
-- 到所有節點查詢
select * from tbl where col1 ...;
到多個節點查詢和到一個節點查詢,效能是不一樣的。因此我們就有了優化的方法。
點查的優化與分散式QUERY
測試
1、構建測試表和資料
create table tbl2 (id int, c1 int) DISTRIBUTED BY (id);
insert into tbl2 select id,id from generate_series(1,10000000) t(id);
2、建立索引
create index idx_tbl2_1 on tbl2(id);
create index idx_tbl2_2 on tbl2(c1);
3、按分佈鍵,點查,只需要查詢一個節點。
postgres=# explain analyze select * from tbl2 where id=1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Gather Motion 1:1 (slice1; segments: 1) (cost=0.00..200.28 rows=1 width=8)
Rows out: 1 rows at destination with 2.421 ms to first row, 2.422 ms to end, start offset by 0.212 ms.
-> Index Scan using idx_tbl2_1 on tbl2 (cost=0.00..200.28 rows=1 width=8)
Index Cond: id = 1
Rows out: 1 rows with 0.017 ms to first row, 0.019 ms to end, start offset by 2.576 ms.
Slice statistics:
(slice0) Executor memory: 226K bytes.
(slice1) Executor memory: 172K bytes (seg42).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 2.732 ms
(13 rows)
4、按非分佈鍵查詢,需要查詢所有節點。
postgres=# explain analyze select * from tbl2 where c1=1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice1; segments: 48) (cost=0.00..200.28 rows=1 width=8)
Rows out: 1 rows at destination with 58 ms to first row, 74 ms to end, start offset by 0.341 ms.
-> Index Scan using idx_tbl2_2 on tbl2 (cost=0.00..200.28 rows=1 width=8)
Index Cond: c1 = 1
Rows out: 1 rows (seg42) with 0.294 ms to first row, 0.297 ms to end, start offset by 58 ms.
Slice statistics:
(slice0) Executor memory: 226K bytes.
(slice1) Executor memory: 172K bytes avg x 48 workers, 172K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 74.553 ms
(13 rows)
分析
在單個節點執行和在多個節點執行效能完全不一樣對吧,從現象來看:
1、首先master需要生成分散式執行計劃,耗費若干毫秒。
2、建立master與segment的連線,這一步可能是序列的,節點越多,建立連線耗時越久。
這個是在所有節點執行的,可以看到offset 58 ms,看起來是序列帶來的問題。
Rows out: 1 rows (seg42) with 0.294 ms to first row, 0.297 ms to end, start offset by 58 ms.
3、執行,執行實際上是並行的,而且也沒有花多少時間。
優化
1、如果需要經常點查,並且條件欄位分佈均勻的話。可以選擇這個欄位作為分佈鍵,提高查詢效率。(並且這個表的UK或PK也是這個列,或者沒有UK或者PK列的需求時。)
2、如果不能做到按查詢條件欄位進行分佈,引數可以優化(效果不明顯)。
gp_cached_segworkers_threshold
When a user starts a session with Greenplum Database and issues a query, the system creates groups or
`gangs` of worker processes on each segment to do the work. After the work is done, the segment worker
processes are destroyed except for a cached number which is set by this parameter. A lower setting
conserves system resources on the segment hosts, but a higher setting may improve performance for
power-users that want to issue many complex queries in a row.
gp_interconnect_type
Sets the networking protocol used for Greenplum Database interconnect traffic. With the TCP protocol,
Greenplum Database has an upper limit of 1000 segment instances - less than that if the query workload
involves complex, multi-slice queries.
The UDP protocol allows for greater interconnect scalability. Note that the Greenplum Database software
does the additional packet verification and checking not performed by UDP, so reliability and performance
is equivalent to TCP.
UDPIFC specifies using UDP with flow control for interconnect traffic. Specify the interconnect flow control
method with gp_interconnect_fc_method.
Note: The Greenplum Database interconnect types TCP and UDP are deprecated. In the next
major release, only the UDPIFC interconnect type will be supported by Greenplum Database.
3、在master與segment之間使用連線池,也可能是一種優化方法。
https://greenplum.org/docs/admin_guide/access_db/topics/pgbouncer.html
相關文章
- greenplum分佈鍵的hash值計算分析
- 分頁查詢優化優化
- MaxCompute複雜資料分佈的查詢優化實踐優化
- MySQL分頁查詢優化MySql優化
- MySQL——優化巢狀查詢和分頁查詢MySql優化巢狀
- Greenplum點查(按PK查詢)效能與提升空間
- 關於分頁查詢的優化思路優化
- 一次分頁查詢的優化優化
- MySQL分優化之超大頁查詢MySql優化
- 資料庫全表查詢之-分頁查詢優化資料庫優化
- 左百分號模糊查詢的優化優化
- 查詢優化優化
- MySQL 的查詢優化MySql優化
- oracle 跨庫查詢greenplum慢Oracle
- pgsql查詢優化之模糊查詢SQL優化
- Oracle in 查詢優化Oracle優化
- MySQL查詢優化MySql優化
- join 查詢優化優化
- HBase查詢優化優化
- 查詢優化器優化
- SQL查詢優化SQL優化
- 全文查詢的效能優化優化
- SQL查詢優化的方法SQL優化
- MySQL基本知識點梳理和查詢優化MySql優化
- mysql大資料量分頁查詢方法及其優化MySql大資料優化
- [推薦]無限分級數量查詢優化優化
- 關於樹結構的查詢優化,及許可權樹的查詢優化優化
- MySQL 慢查詢優化MySql優化
- KunlunDB 查詢優化(一)優化
- MySQL優化COUNT()查詢MySql優化
- 優化sql查詢速度優化SQL
- EntityFramework優化:查詢效能Framework優化
- EntityFramework優化:查詢WITH(NOLOCK)Framework優化
- 優化星型查詢優化
- mysql查詢優化檢查 explainMySql優化AI
- 使用點陣圖連線索引優化OLAP查詢索引優化
- MySQL子查詢的優化薦MySql優化
- MySQL調優之查詢優化MySql優化