Greenplum explain analyze interpretation + deep-detail switches

Posted by 德哥 on 2017-12-04

Tags

PostgreSQL , Greenplum , explain analyze , interpretation , deep detail , explain_memory_verbosity , gp_enable_explain_allstat


Background

This article interprets the output of Greenplum's explain analyze, and introduces two deep-detail switches: explain_memory_verbosity and gp_enable_explain_allstat.

Example

If the detail switches are not enabled, the per-segment detail after "Rows out" in the statistics actually reports the segment that returned the most rows, which is not necessarily the slowest segment (please keep this in mind). This is why some offsets higher up in the plan can even be smaller than offsets further down.

The fields are explained as follows:

Here, start offset is the time from when the query dispatcher (QD) starts the query to when this node returns its first valid tuple.

This tells us several things:

1. The QD dispatches the execution plan to the segments, which then start their worker processes. There is some delay in between, and it shows up in start offset.

The starting point is when the QD dispatches the distributed plan; if some offsets are large, the cause may be slow forking of the worker processes.

2. The end point is when the current node obtains its first valid tuple; if some offsets are large, it may also be that the worker is slow and takes a long time to produce its first valid tuple.

3. Or the data is skewed, and many non-qualifying rows have to be filtered out before the first valid tuple is obtained.

postgres=# explain analyze select count(*),c1 from a group by c1 limit 1;  
                                                                            QUERY PLAN                                                                              
------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=1561155.53..1561155.57 rows=1 width=12)  
   Rows out:  1 rows with 0.003 ms to first row, 0.031 ms to end, start offset by 522 ms.  
   ->  Gather Motion 48:1  (slice2; segments: 48)  (cost=1561155.53..1561155.57 rows=1 width=12)  
         Rows out:  1 rows at destination with 3994 ms to end, start offset by 522 ms.  
         ->  Limit  (cost=1561155.53..1561155.55 rows=1 width=12)  
               Rows out:  Avg 1.0 rows x 41 workers.  Max 1 rows (seg0) with 0.006 ms to end, start offset by 538 ms.  
               ->  HashAggregate  (cost=1561155.53..1561156.80 rows=3 width=12)  
                     Group By: a.c1  
                     Rows out:  Avg 1.0 rows x 41 workers.  Max 1 rows (seg0) with 0.003 ms to first row, 1252 ms to end, start offset by 538 ms.  
                     ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=1561152.00..1561154.02 rows=3 width=12)  
                           Hash Key: a.c1  
                            -- average rows returned, and the segment that returned the most rows: Rows out:  Avg 118.2 rows x 41 workers at destination.  Max 192 rows  
                            -- from this worker's first entry into this plan node to this node's first valid tuple: with xxx ms to first row  
                            -- from this worker's first entry into this plan node to this node's last valid tuple: (seg9) with 2669 ms to end  
                            -- from query dispatch by the QD to this node's first valid tuple: start offset by 595 ms.  
                            Rows out:  Avg 118.2 rows x 41 workers at destination.  Max 192 rows (seg9) with 2669 ms to end, start offset by 595 ms.  
                           ->  HashAggregate  (cost=1561152.00..1561152.00 rows=3 width=12)  
                                 Group By: a.c1  
                                 -- appendStringInfo(str,  
                                 --                  "Rows out:  Avg %.1f rows x %d workers."  
                                 --                  "  Max %.0f rows%s",  
                                 --                  ntuples_avg,  
                                 --                  ns->ntuples.vcnt,  
                                 --                  ns->ntuples.vmax,  
                                 --                  segbuf);  
                                 -- Rows out:  Avg {ntuples_avg} rows x {ns->ntuples.vcnt} workers.  Max {ns->ntuples.vmax} rows{segbuf}  
                                 -- with {time from this worker's first InstrStartNode() to its first result row} to first row  
                                 -- {time from this worker's first InstrStartNode() to the end of its results} to end  
                                 -- start offset by {time from start of query on qDisp to this worker's first result row}  
                                 Rows out:  Avg 101.0 rows x 48 workers.  Max 101 rows (seg0) with 0.005 ms to first row, 2706 ms to end, start offset by 572 ms.  
                                 ->  Append-only Columnar Scan on a  (cost=0.00..1061152.00 rows=2083334 width=4)  
                                       -- the output differs by planstate->type; see cdbexplain_showExecStats  
                                       -- Rows out:  {noRowRequested}{ns->ntuples.vmax} rows{segbuf}  
                                       -- with {time from this worker's first InstrStartNode() to the end of its results} to end  
                                       -- start offset by {time from start of query on qDisp to this worker's first result row}  
                                       Rows out:  0 rows (seg0) with 34 ms to end, start offset by 568 ms.  
 -- memory usage per slice; a slice is a contiguous set of plan steps that a segment can execute independently (see the terminology section below)  
 Slice statistics:  
   (slice0)    Executor memory: 327K bytes.  
   (slice1)    Executor memory: 828K bytes avg x 48 workers, 828K bytes max (seg0).  
   (slice2)    Executor memory: 353K bytes avg x 48 workers, 356K bytes max (seg0).  
 -- memory statistics, printed when gp_resqueue_memory_policy != RESQUEUE_MEMORY_POLICY_NONE  
 Statement statistics:  
   --   /* What is the memory reserved for this query's execution? */  
   --    uint64          query_mem;  
   --   PlannedStmt @ pgsql/src/include/nodes/plannodes.h   
   Memory used: 128000K bytes  
 -- non-default settings that affect the execution plan  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 -- total runtime  
 Total runtime: 4518.864 ms  
(26 rows)  

For the explanation above, see the code in cdbexplain_showExecStats:

  
        switch (planstate->type)  
        {  
                case T_BitmapAndState:  
                case T_BitmapOrState:  
                case T_BitmapIndexScanState:  
                        s_row = "";  
                        if (ns->ntuples.vcnt > 1)  
                                appendStringInfo(str,  
                                                                 "Bitmaps out:  Avg %.1f x %d workers."  
                                                                 "  Max %.0f%s",  
                                                                 ntuples_avg,  
                                                                 ns->ntuples.vcnt,  
                                                                 ns->ntuples.vmax,  
                                                                 segbuf);  
                        else  
                                appendStringInfo(str,  
                                                                 "Bitmaps out:  %s%.0f%s",  
                                                                 noRowRequested,  
                                                                 ns->ntuples.vmax,  
                                                                 segbuf);  
                        break;  
                case T_HashState:  
                        if (ns->ntuples.vcnt > 1)  
                                appendStringInfo(str,  
                                                                 "Rows in:  Avg %.1f rows x %d workers."  
                                                                 "  Max %.0f rows%s",  
                                                                 ntuples_avg,  
                                                                 ns->ntuples.vcnt,  
                                                                 ns->ntuples.vmax,  
                                                                 segbuf);  
                        else  
                                appendStringInfo(str,  
                                                                 "Rows in:  %s%.0f rows%s",  
                                                                 noRowRequested,  
                                                                 ns->ntuples.vmax,  
                                                                 segbuf);  
                        break;  
                case T_MotionState:  
                        if (ns->ntuples.vcnt > 1)  
                                appendStringInfo(str,  
                                                                 "Rows out:  Avg %.1f rows x %d workers"  
                                                                 " at destination.  Max %.0f rows%s",  
                                                                 ntuples_avg,  
                                                                 ns->ntuples.vcnt,  
                                                                 ns->ntuples.vmax,  
                                                                 segbuf);  
                        else  
                                appendStringInfo(str,  
                                                                 "Rows out:  %s%.0f rows at destination%s",  
                                                                 noRowRequested,  
                                                                 ns->ntuples.vmax,  
                                                                 segbuf);  
                        break;  
                default:  
                        if (ns->ntuples.vcnt > 1)  
                                appendStringInfo(str,  
                                                                 "Rows out:  Avg %.1f rows x %d workers."  
                                                                 "  Max %.0f rows%s",  
                                                                 ntuples_avg,  
                                                                 ns->ntuples.vcnt,  
                                                                 ns->ntuples.vmax,  
                                                                 segbuf);  
                        else  
                                appendStringInfo(str,  
                                                                 "Rows out:  %s%.0f rows%s",  
                                                                 noRowRequested,  
                                                                 ns->ntuples.vmax,  
                                                                 segbuf);  
        }  
  
  
        /*  
         * Show elapsed time just once if they are the same or if we don't have  
         * any valid elapsed time for first tuple.  
         */  
        if ((instr->ntuples > 0) && (strcmp(firstbuf, totalbuf) != 0))  
                appendStringInfo(str,  
                                                 " with %s to first%s, %s to end",  
                                                 firstbuf,  
                                                 s_row,  
                                                 totalbuf);  
        else  
                appendStringInfo(str,  
                                                 " with %s to end",  
                                                 totalbuf);  
  
  
        /* Number of rescans */  
        if (instr->nloops > 1)  
                appendStringInfo(str, " of %.0f scans", instr->nloops);  
  
        /* Time from start of query on qDisp to this worker's first result row */  
        if (!(INSTR_TIME_IS_ZERO(instr->firststart)))  
        {  
                INSTR_TIME_SET_ZERO(timediff);  
                INSTR_TIME_ACCUM_DIFF(timediff, instr->firststart, ctx->querystarttime);  
                cdbexplain_formatSeconds(startbuf, sizeof(startbuf), INSTR_TIME_GET_DOUBLE(timediff));  
                appendStringInfo(str, ", start offset by %s", startbuf);  
        }  

Greenplum Query Plans terminology

Taken from the "Understanding Greenplum Query Plans" section of the Greenplum Administrator Guide.

[figure from the Greenplum Administrator Guide]

1. tree

Plans are read and executed from bottom to top.

2. node

Each node or step in the plan represents a database operation such as a table scan, join, aggregation, or sort.

3. slice

To achieve maximum parallelism during query execution, Greenplum divides the work of the query plan into slices. A slice is a portion of the plan that segments can work on independently. A query plan is sliced wherever a motion operation occurs in the plan, with one slice on each side of the motion.

4. motion

Greenplum Database has an additional operation type called motion. A motion operation involves moving tuples between the segments during query processing.

4.1. redistribute motion

The query plan for this example has a redistribute motion that moves tuples between the segments to complete the join. The redistribute motion is necessary because the customer table is distributed across the segments by cust_id, but the sales table is distributed across the segments by sale_id. To perform the join, the sales tuples must be redistributed by cust_id. The plan is sliced on either side of the redistribute motion, creating slice 1 and slice 2.

4.2. gather motion

This query plan has another type of motion operation called a gather motion. A gather motion is when the segments send results back up to the master for presentation to the client.

Not all query plans involve a gather motion. For example, a CREATE TABLE x AS SELECT… statement would not have a gather motion because tuples are sent to the newly created table, not to the master.

[figure from the Greenplum Administrator Guide]

1. query dispatcher (QD)

Greenplum creates a number of database processes to handle the work of a query. On the master, the query worker process is called the query dispatcher (QD). The QD is responsible for creating and dispatching the query plan. It also accumulates and presents the final results.

2. query executor (QE)

On the segments, a query worker process is called a query executor (QE). A QE is responsible for completing its portion of work and communicating its intermediate results to the other worker processes.

3. worker process

There is at least one worker process assigned to each slice of the query plan. A worker process works on its assigned portion of the query plan independently. During query execution, each segment will have a number of processes working on the query in parallel.

4. gangs

Related processes that are working on the same slice of the query plan but on different segments are called gangs. As a portion of work is completed, tuples flow up the query plan from one gang of processes to the next.

5. interconnect

This inter-process communication between the segments is referred to as the interconnect component of Greenplum Database.

Deep-detail parameters

https://discuss.pivotal.io/hc/en-us/articles/205752208-Options-to-gather-more-information-from-EXPLAIN-ANALYZE

There are two GUCs that can add additional information to the EXPLAIN ANALYZE output. Both of these can be set at the session level and can be used to find skew, excessive processing time or memory consumption:

gp_enable_explain_allstat

Set it to true or false. It prints per-segment execution timing details.

This is a useful way to see if there is any skew during the processing of your query. For each operator, this lists each individual segment, the number of rows it processed, and the time it took to do so.

Below is one of the lines that the gp_enable_explain_allstat option adds to the EXPLAIN ANALYZE output:

allstat: seg_firststart_total_ntuples/seg0_19 ms_1.028 ms_1334/seg1_19 ms_2.283 ms_3333//end  

Let's break it down: for each segment, the format is as follows:

segment number _ time to start ms _ time to complete ms _ number of tuples  

Using this we see that:

seg0 waited 19 ms before starting  
  
it took 1.028 ms to complete its work  
  
and it processed 1334 tuples  
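
Since a single allstat line can list dozens of segments, it is handy to pick it apart programmatically. Below is a minimal sketch (not part of the original article; it assumes only the field layout described above) that parses one allstat line and ranks the segments by elapsed time, so skewed segments stand out:

# a minimal sketch, assuming the allstat field layout described above  
def parse_allstat(line):  
    """Return (segment, start_ms, total_ms, ntuples) tuples from an allstat line."""  
    body = line.split("allstat:", 1)[1].strip()  
    stats = []  
    for entry in body.split("/"):  
        # skip the header field, the trailing "end" marker and empty pieces  
        if entry in ("", "end", "seg_firststart_total_ntuples"):  
            continue  
        seg, start, total, ntuples = entry.split("_")  
        stats.append((seg,  
                      float(start.replace("ms", "").strip()),  
                      float(total.replace("ms", "").strip()),  
                      int(ntuples)))  
    return stats  
  
line = "allstat: seg_firststart_total_ntuples/seg0_19 ms_1.028 ms_1334/seg1_19 ms_2.283 ms_3333//end"  
# slowest segments (largest total time) first  
for seg, start_ms, total_ms, ntuples in sorted(parse_allstat(line), key=lambda s: -s[2]):  
    print(f"{seg}: start {start_ms} ms, total {total_ms} ms, {ntuples} tuples")  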

explain_memory_verbosity

Set it to SUPPRESS, SUMMARY, or DETAIL. It prints memory usage details.

This option uses the new memory accounting framework introduced in GPDB 4.3.2.0 and will show the memory usage per operator node.

This GUC takes three values: SUPPRESS, SUMMARY, and DETAIL with SUPPRESS being the default.

With the SUMMARY setting, only one additional line is added per node, the `Memory` line:

Memory:  47K bytes avg, 47K bytes max (seg0).  

This option will add the average and max memory used for each operator.
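
When explain_memory_verbosity is turned on, the plan output can easily grow to over a thousand lines (the full example below returns 1430 rows), so it helps to pull out just the per-node Memory lines. The sketch below is my own illustration (not from the article) and assumes only the "Memory: ...K bytes avg, ...K bytes max (segN)" format shown above:

import re  
  
# rough sketch: list per-node Memory lines from an EXPLAIN ANALYZE plan,  
# largest max usage first, so memory-hungry operators stand out  
MEM_RE = re.compile(r"Memory:\s+(\d+)K bytes avg, (\d+)K bytes max \((seg\d+)\)")  
  
def memory_hotspots(plan_text):  
    hits = []  
    for lineno, text in enumerate(plan_text.splitlines(), 1):  
        m = MEM_RE.search(text)  
        if m:  
            avg_kb, max_kb, seg = int(m.group(1)), int(m.group(2)), m.group(3)  
            hits.append((max_kb, avg_kb, seg, lineno))  
    return sorted(hits, reverse=True)  
  
plan = """Memory:  153K bytes avg, 153K bytes max (seg0).  
Memory:  82K bytes avg, 82K bytes max (seg0)."""  
for max_kb, avg_kb, seg, lineno in memory_hotspots(plan):  
    print(f"line {lineno}: max {max_kb}K on {seg}, avg {avg_kb}K")  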

Example using the deep-detail parameters

postgres=# set explain_memory_verbosity =detail;  
SET  
postgres=# set gp_enable_explain_allstat=on;  
SET  
The output is very long; only part of it is shown below.  
postgres=# explain analyze select count(*),c1 from a group by c1 limit 1;  
  
 Limit  (cost=1561155.53..1561155.57 rows=1 width=12)  
   Rows out:  1 rows with 0.002 ms to first row, 0.021 ms to end, start offset by 242 ms.  
   Memory:  4K bytes.  
   ->  Gather Motion 48:1  (slice2; segments: 48)  (cost=1561155.53..1561155.57 rows=1 width=12)  
         Rows out:  1 rows at destination with 3997 ms to end, start offset by 242 ms.  
         slice 2, seg 0  
           Root: Peak 0K bytes. Quota: 0K bytes.  
             Top: Peak 0K bytes. Quota: 0K bytes.  
               Main: Peak 3K bytes. Quota: 0K bytes.  
                 Executor: Peak 246K bytes. Quota: 0K bytes.  
                 Deserializer: Peak 16K bytes. Quota: 0K bytes.  
                 Deserializer: Peak 15K bytes. Quota: 0K bytes.  
               X_Alien: Peak 209K bytes. Quota: 0K bytes.  
                 X_Motion: Peak 5K bytes. Quota: 100K bytes.  
                   X_Limit: Peak 3K bytes. Quota: 100K bytes.  
                     X_Agg: Peak 10K bytes. Quota: 63900K bytes.  
                       X_Motion: Peak 5K bytes. Quota: 100K bytes.  
             MemAcc: Peak 0K bytes. Quota: 0K bytes.  
             Rollover: Peak 887K bytes. Quota: 0K bytes.  
             SharedHeader: Peak 16K bytes. Quota: 0K bytes.  
         slice 2, seg 1  
.....................................  
  
                               SharedHeader: Peak 18K bytes. Quota: 0K bytes.  
                           slice 1, seg 47  
                             Root: Peak 0K bytes. Quota: 0K bytes.  
                               Top: Peak 0K bytes. Quota: 0K bytes.  
                                 Main: Peak 3K bytes. Quota: 0K bytes.  
                                   Executor: Peak 809K bytes. Quota: 0K bytes.  
                                   Deserializer: Peak 16K bytes. Quota: 0K bytes.  
                                   Deserializer: Peak 15K bytes. Quota: 0K bytes.  
                                 X_Alien: Peak 20K bytes. Quota: 0K bytes.  
                                   X_Motion: Peak 5K bytes. Quota: 100K bytes.  
                                     X_Agg: Peak 152K bytes. Quota: 63900K bytes.  
                                       X_TableScan: Peak 81K bytes. Quota: 100K bytes.  
                               MemAcc: Peak 0K bytes. Quota: 0K bytes.  
                               Rollover: Peak 887K bytes. Quota: 0K bytes.  
                               SharedHeader: Peak 18K bytes. Quota: 0K bytes.  
                           Memory:  6K bytes avg, 6K bytes max (seg0).  
                           allstat: seg_firststart_total_ntuples/seg0_248 ms_2727 ms_144/seg1_248 ms_2727 ms_48/seg2_254 ms_2721 ms_96/seg3_254 ms_2721 ms_96/seg4_248 ms_2727 ms_96/seg5_254 ms_2721 ms_144/seg6_270 ms_2705 ms_96/seg7_254   
ms_2721 ms_144/seg8_270 ms_2705 ms_48/seg9_254 ms_2721 ms_192/seg10_270 ms_2705 ms_48/seg11_254 ms_2721 ms_192/seg12_254 ms_3963 ms_0/seg13_254 ms_2721 ms_144/seg14_254 ms_3963 ms_0/seg15_271 ms_2705 ms_144/seg16_272 ms_3945 ms_0/seg17_2  
85 ms_2691 ms_192/seg18_308 ms_3909 ms_0/seg19_302 ms_2673 ms_192/seg20_303 ms_2672 ms_96/seg21_286 ms_2690 ms_144/seg22_286 ms_2690 ms_48/seg23_286 ms_2689 ms_144/seg24_308 ms_2668 ms_48/seg25_284 ms_2691 ms_96/seg26_288 ms_2688 ms_96/s  
eg27_287 ms_2689 ms_96/seg28_285 ms_2691 ms_96/seg29_285 ms_2691 ms_96/seg30_291 ms_2684 ms_144/seg31_286 ms_2690 ms_48/seg32_287 ms_2688 ms_144/seg33_291 ms_2684 ms_48/seg34_286 ms_2690 ms_192/seg35_285 ms_3932 ms_0/seg36_286 ms_2690 ms  
_144/seg37_285 ms_2691 ms_48/seg38_286 ms_2690 ms_144/seg39_286 ms_3932 ms_0/seg40_287 ms_2689 ms_192/seg41_287 ms_2689 ms_48/seg42_286 ms_2692 ms_192/seg43_290 ms_3928 ms_0/seg44_286 ms_2690 ms_192/seg45_286 ms_2693 ms_48/seg46_288 ms_2  
688 ms_144/seg47_295 ms_2681 ms_144//end  
                           ->  HashAggregate  (cost=1561152.00..1561152.00 rows=3 width=12)  
                                 Group By: a.c1  
                                 Rows out:  Avg 101.0 rows x 48 workers.  Max 101 rows (seg0) with 0.006 ms to first row, 2506 ms to end, start offset by 295 ms.  
                                 Memory:  153K bytes avg, 153K bytes max (seg0).  
                                 allstat: seg_firststart_total_ntuples/seg0_295 ms_2506 ms_101/seg1_287 ms_2697 ms_101/seg2_285 ms_1907 ms_101/seg3_288 ms_2622 ms_101/seg4_290 ms_2632 ms_101/seg5_286 ms_2482 ms_101/seg6_293 ms_2861 ms_10  
1/seg7_287 ms_2094 ms_101/seg8_287 ms_2676 ms_101/seg9_293 ms_2644 ms_101/seg10_286 ms_2865 ms_101/seg11_287 ms_2631 ms_101/seg12_287 ms_2714 ms_101/seg13_287 ms_1913 ms_101/seg14_286 ms_2640 ms_101/seg15_288 ms_1922 ms_101/seg16_285 ms_  
2689 ms_101/seg17_286 ms_1946 ms_101/seg18_284 ms_2655 ms_101/seg19_286 ms_2709 ms_101/seg20_289 ms_1890 ms_101/seg21_291 ms_1894 ms_101/seg22_286 ms_2742 ms_101/seg23_288 ms_1903 ms_101/seg24_283 ms_2636 ms_101/seg25_286 ms_1948 ms_101/  
seg26_286 ms_2094 ms_101/seg27_283 ms_2181 ms_101/seg28_285 ms_2628 ms_101/seg29_282 ms_2503 ms_101/seg30_285 ms_2636 ms_101/seg31_290 ms_2636 ms_101/seg32_293 ms_2621 ms_101/seg33_297 ms_2552 ms_101/seg34_293 ms_2704 ms_101/seg35_293 ms  
_2633 ms_101/seg36_293 ms_1972 ms_101/seg37_305 ms_2615 ms_101/seg38_298 ms_2783 ms_101/seg39_293 ms_2624 ms_101/seg40_297 ms_2618 ms_101/seg41_295 ms_2183 ms_101/seg42_295 ms_1905 ms_101/seg43_298 ms_1955 ms_101/seg44_297 ms_1904 ms_101  
/seg45_294 ms_2781 ms_101/seg46_296 ms_2434 ms_101/seg47_298 ms_2632 ms_101//end  
                                 ->  Append-only Columnar Scan on a  (cost=0.00..1061152.00 rows=2083334 width=4)  
                                       Rows out:  0 rows (seg0) with 28 ms to end, start offset by 283 ms.  
                                       Memory:  82K bytes avg, 82K bytes max (seg0).  
                                       allstat: seg_firststart_total_ntuples/seg0_295 ms_22 ms_0/seg1_287 ms_16 ms_0/seg2_285 ms_13 ms_0/seg3_288 ms_16 ms_0/seg4_290 ms_22 ms_0/seg5_286 ms_16 ms_0/seg6_293 ms_27 ms_0/seg7_287 ms_18 ms_0/  
seg8_287 ms_16 ms_0/seg9_293 ms_23 ms_0/seg10_286 ms_17 ms_0/seg11_287 ms_26 ms_0/seg12_287 ms_19 ms_0/seg13_287 ms_23 ms_0/seg14_286 ms_16 ms_0/seg15_288 ms_17 ms_0/seg16_285 ms_17 ms_0/seg17_286 ms_23 ms_0/seg18_284 ms_19 ms_0/seg19_28  
6 ms_22 ms_0/seg20_289 ms_19 ms_0/seg21_291 ms_27 ms_0/seg22_286 ms_17 ms_0/seg23_288 ms_25 ms_0/seg24_283 ms_28 ms_0/seg25_286 ms_17 ms_0/seg26_286 ms_17 ms_0/seg27_283 ms_26 ms_0/seg28_285 ms_16 ms_0/seg29_282 ms_19 ms_0/seg30_285 ms_1  
8 ms_0/seg31_290 ms_16 ms_0/seg32_293 ms_24 ms_0/seg33_297 ms_16 ms_0/seg34_293 ms_16 ms_0/seg35_293 ms_18 ms_0/seg36_293 ms_15 ms_0/seg37_305 ms_15 ms_0/seg38_298 ms_21 ms_0/seg39_293 ms_16 ms_0/seg40_297 ms_16 ms_0/seg41_295 ms_20 ms_0  
/seg42_295 ms_14 ms_0/seg43_298 ms_20 ms_0/seg44_297 ms_22 ms_0/seg45_294 ms_23 ms_0/seg46_296 ms_16 ms_0/seg47_298 ms_16 ms_0//end  
 Slice statistics:  
   (slice0)    Executor memory: 327K bytes.  Peak memory: 1827K bytes.  
   (slice1)    Executor memory: 828K bytes avg x 48 workers, 828K bytes max (seg0).  Peak memory: 1939K bytes avg x 48 workers, 1939K bytes max (seg10).  
   (slice2)    Executor memory: 353K bytes avg x 48 workers, 356K bytes max (seg0).  Peak memory: 1375K bytes avg x 48 workers, 1375K bytes max (seg11).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 4219.754 ms  
(1430 rows)  

References

src/backend/cdb/cdbexplain.c

/* EXPLAIN ANALYZE statistics for one plan node of a slice */  
typedef struct CdbExplain_StatInst  
{  
        NodeTag         pstype;                 /* PlanState node type */  
        bool            running;                /* True if we've completed first tuple */  
        instr_time      starttime;              /* Start time of current iteration of node */  
        instr_time      counter;                /* Accumulated runtime for this node */  
        double          firsttuple;             /* Time for first tuple of this cycle */  
        double          startup;                /* Total startup time (in seconds) */  
        double          total;                  /* Total total time (in seconds) */  
        double          ntuples;                /* Total tuples produced */  
        double          nloops;                 /* # of run cycles for this node */  
        double          execmemused;    /* executor memory used (bytes) */  
        double          workmemused;    /* work_mem actually used (bytes) */  
        double          workmemwanted;  /* work_mem to avoid workfile i/o (bytes) */  
        bool            workfileCreated;        /* workfile created in this node */  
        instr_time      firststart;             /* Start time of first iteration of node */  
        double          peakMemBalance; /* Max mem account balance */  
        int                     numPartScanned; /* Number of part tables scanned */  
        int                     bnotes;                 /* Offset to beginning of node's extra text */  
        int                     enotes;                 /* Offset to end of node's extra text */  
} CdbExplain_StatInst;  
  
  
/* EXPLAIN ANALYZE statistics for one process working on one slice */  
typedef struct CdbExplain_SliceWorker  
{  
        double          peakmemused;    /* bytes alloc in per-query mem context tree */  
        double          vmem_reserved;  /* vmem reserved by a QE */  
        double          memory_accounting_global_peak;  /* peak memory observed during  
                                                                                                 * memory accounting */  
} CdbExplain_SliceWorker;  
  
  
/* Header of EXPLAIN ANALYZE statistics message sent from qExec to qDisp */  
typedef struct CdbExplain_StatHdr  
{  
        NodeTag         type;                   /* T_CdbExplain_StatHdr */  
        int                     segindex;               /* segment id */  
        int                     nInst;                  /* num of StatInst entries following StatHdr */  
        int                     bnotes;                 /* offset to extra text area */  
        int                     enotes;                 /* offset to end of extra text area */  
  
        int                     memAccountCount;        /* How many mem account we serialized */  
        int                     memAccountStartOffset;  /* Where in the header our memory  
                                                                                 * account array is serialized */  
  
        CdbExplain_SliceWorker worker;          /* qExec's overall stats for slice */  
  
        /*  
         * During serialization, we use this as a temporary StatInst and save  
         * "one-at-a-time" StatInst into this variable. We then write this  
         * variable into buffer (serialize it) and then "recycle" the same inst  
         * for next plan node's StatInst. During deserialization, an Array  
         * [0..nInst-1] of StatInst entries is appended starting here.  
         */  
        CdbExplain_StatInst inst[1];  
  
        /* extra text is appended after that */  
} CdbExplain_StatHdr;  
  
  
/* One node's EXPLAIN ANALYZE statistics for all the workers of its segworker group */  
typedef struct CdbExplain_NodeSummary  
{  
        /* Summary over all the node`s workers */  
        CdbExplain_Agg ntuples;  
        CdbExplain_Agg execmemused;  
        CdbExplain_Agg workmemused;  
        CdbExplain_Agg workmemwanted;  
        CdbExplain_Agg totalWorkfileCreated;  
        CdbExplain_Agg peakMemBalance;  
        /* Used for DynamicTableScan, DynamicIndexScan and DynamicBitmapTableScan */  
        CdbExplain_Agg totalPartTableScanned;  
  
        /* insts array info */  
        int                     segindex0;              /* segment id of insts[0] */  
        int                     ninst;                  /* num of StatInst entries in inst array */  
  
        /* Array [0..ninst-1] of StatInst entries is appended starting here */  
        CdbExplain_StatInst insts[1];           /* variable size - must be last */  
} CdbExplain_NodeSummary;  
  
  
/* One slice's statistics for all the workers of its segworker group */  
typedef struct CdbExplain_SliceSummary  
{  
        Slice      *slice;  
  
        /* worker array */  
        int                     nworker;                /* num of SliceWorker slots in worker array */  
        int                     segindex0;              /* segment id of workers[0] */  
        CdbExplain_SliceWorker *workers;        /* -> array [0..nworker-1] of  
                                                                                 * SliceWorker */  
  
        /*  
         * We use void ** as we don't have access to MemoryAccount struct, which  
         * is private to memory accounting framework  
         */  
        void      **memoryAccounts; /* Array of pointers to serialized memory  
                                                                 * accounts array, one array per worker  
                                                                 * [0...nworker-1]. */  
        MemoryAccountIdType *memoryAccountCount;        /* Array of memory account  
                                                                                                 * counts, one per slice */  
  
        CdbExplain_Agg peakmemused; /* Summary of SliceWorker stats over all of  
                                                                 * the slice's workers */  
  
        CdbExplain_Agg vmem_reserved;           /* vmem reserved by QEs */  
  
        CdbExplain_Agg memory_accounting_global_peak;           /* Peak memory  
                                                                                                                 * accounting balance by  
                                                                                                                 * QEs */  
  
        /* Rollup of per-node stats over all of the slice's workers and nodes */  
        double          workmemused_max;  
        double          workmemwanted_max;  
  
        /* How many workers were dispatched and returned results? (0 if local) */  
        CdbExplain_DispatchSummary dispatchSummary;  
} CdbExplain_SliceSummary;  

