Greenplum自動統計資訊收集-暨統計資訊不準引入的broadcastmotion一例
標籤
PostgreSQL , Greenplum , 統計資訊 , 自動統計資訊 , broadcast motion , 執行計劃
背景
資料庫執行計劃的好壞,與資料庫的SQL優化器息息相關。Greenplum有兩套優化器,legacy query optimizer 與 ORCA。
這兩個優化器都是CBO優化器,都需要依賴統計資訊,如果統計資訊不準確,可能生成的執行計劃就不準確。
例如我們有一個這樣的QUERY,發現怎麼跑都跑不出來。
觀察執行計劃,發現有一個節點用到了broadcast motion,也就是廣播。
-> Broadcast Motion 512:512 (slice1; segments: 512) (cost=0.00..6.13 rows=1 width=16)
-> Append-only Columnar Scan on xxxx (cost=0.00..1.00 rows=1 width=16)
當JOIN欄位並非分佈鍵時,Greenplum會根據表的大小,選擇重分佈或廣播。(小表廣播,大表的話多階段JOIN)。
《HybridDB PostgreSQL “Sort、Group、distinct 聚合、JOIN” 不懼怕資料傾斜的黑科技和原理 – 多階段聚合》
而這個執行計劃跑偏的SQL,恰恰是在一個大表上觸發了廣播(broadcast motion),這不正常。
select count(*) xxxx;
返回有31億資料。
查詢pg_class,值有1條記錄,佔用0個BLOCK
select * from pg_class where relname=`xxxx`;
relpages | 1
reltuples | 0
執行analyze,收集統計資訊,執行計劃恢復。
(Greenplum對於超級大表,收集統計資訊時,會構建臨時表來進行取樣分析)
digoal=> analyze verbose xxxxxxxx;
INFO: Executing SQL: select sum(gp_statistics_estimate_reltuples_relpages_oid(c.oid))::float4[] from gp_dist_random(`pg_class`) c where c.oid=112293
INFO: ANALYZE estimated reltuples=3091824896.000000, relpages=47509120.000000 for table xxxxxxxx
INFO: ANALYZE building sample table of size 173762 on table xxxxxxxx because it has too many rows.
INFO: Executing SQL: create table pg_temp.pg_analyze_112293_59 as ( select Ta.xx,....Ta.xxx from public.xxxxxxxx as Ta where random() < 0.00005620053343591280281543731689453125 limit 173762 ) distributed randomly
INFO: Created sample table pg_temp.pg_analyze_112293_59 with nrows=173762
INFO: ANALYZE computing statistics on attribute xx
INFO: Executing SQL: select count(*)::float4 from pg_temp_440803.pg_analyze_112293_59 as Ta where Ta.xx is null
INFO: nullfrac = 0.178474
INFO: Executing SQL: select avg(pg_column_size(Ta.xx))::float4 from pg_temp_440803.pg_analyze_112293_59 as Ta where Ta.xx is not null
INFO: avgwidth = 21.418087
INFO: Executing SQL: select count(*)::float4 from (select Ta.xx from pg_temp_440803.pg_analyze_112293_59 as Ta group by Ta.xx) as Tb
INFO: count(ndistinct()) gives 142751.000000 values.
INFO: Executing SQL: select count(v)::float4 from (select Ta.xx as v, count(Ta.xx) as f from pg_temp_440803.pg_analyze_112293_59 as Ta group by Ta.xx) as foo where f > 1
INFO: ndistinct = -1.000000
..........
收集統計資訊後,執行計劃恢復,沒有broadcast了,執行也秒級返回了。
讓Greenplum自動收集統計資訊
對於在函式內 或 函式外執行DML時,核心會跟蹤表的記錄數變更影響的資料量,我們可以設定什麼時候收集統計資訊:
none:不收集
on_no_stats:沒有統計資訊時,收集
on_change:當寫入、更新量超過閾值(gp_autostats_on_change_threshold引數設定的行數,預設為20億)後,自動收集統計資訊。
Automatic Statistics Collection
Greenplum Database can be set to automatically run ANALYZE on a table that either has no statistics or has
changed significantly when certain operations are performed on the table. For partitioned tables, automatic
statistics collection is only triggered when the operation is run directly on a leaf table, and then only the leaf
table is analyzed.
Automatic statistics collection has three modes:
• none disables automatic statistics collection.
• on_no_stats triggers an analyze operation for a table with no existing statistics when any of the
commands CREATE TABLE AS SELECT, INSERT, or COPY are executed on the table.
• on_change triggers an analyze operation when any of the commands CREATE TABLE AS SELECT,
UPDATE, DELETE, INSERT, or COPY are executed on the table and the number of rows affected exceeds
the threshold defined by the gp_autostats_on_change_threshold configuration parameter.
The automatic statistics collection mode is set separately for commands that occur within a procedural
language function and commands that execute outside of a function:
• The gp_autostats_mode configuration parameter controls automatic statistics collection behavior
outside of functions and is set to on_no_stats by default.
• The gp_autostats_mode_in_functions parameter controls the behavior when table operations are
performed within a procedural language function and is set to none by default.
With the on_change mode, ANALYZE is triggered only if the number of rows affected exceeds the threshold
defined by the gp_autostats_on_change_threshold configuration parameter. The default value for this
parameter is a very high value, 2147483647, which effectively disables automatic statistics collection;
you must set the threshold to a lower number to enable it. The on_change mode could trigger large,
unexpected analyze operations that could disrupt the system, so it is not recommended to set it globally. It
could be useful in a session, for example to automatically analyze a table following a load.
To disable automatic statistics collection outside of functions, set the gp_autostats_mode parameter to
none:
gpconfigure -c gp_autostats_mode -v none
To enable automatic statistics collection in functions for tables that have no statistics, change
gp_autostats_mode_in_functions to on_no_stats:
gpconfigure -c gp_autostats_mode_in_functions -v on_no_stats
Set the log_autostats system configuration parameter to on if you want to log automatic statistics
collection operations.
為了讓資料庫產生準確的執行計劃,建議要麼使用者自己排程analyZE收集統計資訊,要麼自動收集。
相關文章
- [統計資訊系列7] Oracle 11g的自動統計資訊收集Oracle
- 手動收集——收集統計資訊
- 【統計資訊】Oracle常用的收集統計資訊方式Oracle
- 啟用與禁用統計資訊自動收集
- 收集統計資訊方案
- Oracle收集統計資訊Oracle
- 收集全庫統計資訊
- 微課sql最佳化(3)、統計資訊收集(2)-如何收集統計資訊SQL
- 6 收集資料庫統計資訊資料庫
- 最佳實踐:解讀GaussDB(DWS) 統計資訊自動收集方案
- 【統計資訊】Oracle統計資訊Oracle
- Oracle統計資訊的收集和維護Oracle
- Oracle 11g系統自動收集統計資訊的一些知識Oracle
- oracle 統計資訊檢視與收集Oracle
- ORACLE表統計資訊與列統計資訊、索引統計資訊Oracle索引
- 微課sql最佳化(8)、統計資訊收集(6)-統計資訊查詢SQL
- Oracle運維指令碼-收集統計資訊Oracle運維指令碼
- ORACLE analyse table方式收集表統計資訊導致SQL執行計劃不準確而效能下降OracleSQL
- 【統計資訊】全面檢視錶所有統計資訊SQLSQL
- 【統計資訊】如何備份和還原統計資訊
- Sqlserver關於統計資訊自動建立自動更新的知識點SQLServer
- Nebula Graph 特性講解——RocksDB 統計資訊的收集和展示
- 修改oracle 的統計資訊Oracle
- ORACLE19c新特性-實時統計資訊收集Oracle
- MySQL統計資訊系列MySql
- 【TUNE_ORACLE】定製化收集統計資訊SQL參考OracleSQL
- SQL Server 更新統計資訊SQLServer
- Oracle 統計資訊介紹Oracle
- PostgreSQL DBA(10) - 統計資訊SQL
- [20200819]12c Global Temporary table 統計資訊的收集的疑問.txt
- MySQL對所有表收集統計資訊(Python 2指令碼)MySqlPython指令碼
- 達夢資料庫手動管理統計資訊方法資料庫
- 微課sql最佳化(6)、統計資訊收集(4)-關於動態取樣SQL
- 收集 Kubernetes 資源統計資料的新工具
- 微課sql最佳化(2)-為什麼需要收集統計資訊SQL
- ansible 統計 ssh 登入資訊
- Win10系統關閉“小娜自動收集個人資訊”功能的方法Win10
- MySQL中Innodb如何計算索引的統計資訊?MySql索引
- 職工資訊管理系統的設計