Question: does the avg aggregate function in Hive make use of the combiner?

Posted by 菜鳥coder on 2018-11-23

For example, the following SQL statement definitely makes use of the combiner:

select deptno, sum(sal) as sum_sal from emp group by deptno

hive (test)> explain select deptno, sum(sal) as sum_sal from emp group by deptno;
OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: emp
            Statistics: Num rows: 5 Data size: 603 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: deptno (type: int), sal (type: decimal(22,2))
              outputColumnNames: deptno, sal
              Statistics: Num rows: 5 Data size: 603 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(sal)
                keys: deptno (type: int)
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5 Data size: 603 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: int)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: int)
                  Statistics: Num rows: 5 Data size: 603 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: decimal(32,2))
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          keys: KEY._col0 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 2 Data size: 241 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 2 Data size: 241 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

What about the following SQL: can it use the combiner as well? When I was learning this, I was told that a combiner cannot handle a function like avg.

select deptno, avg(sal) as avg_sal from emp group by deptno

As far as I can see, the execution plan is no different from the one for the sum aggregate:

hive (test)> explain select deptno, avg(sal) as avg_sal from emp group by deptno;
OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: emp
            Statistics: Num rows: 5 Data size: 603 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: deptno (type: int), sal (type: decimal(22,2))
              outputColumnNames: deptno, sal
              Statistics: Num rows: 5 Data size: 603 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: avg(sal)
                keys: deptno (type: int)
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5 Data size: 603 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: int)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: int)
                  Statistics: Num rows: 5 Data size: 603 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: struct<count:bigint,sum:decimal(32,2),input:decimal(22,2)>)
      Reduce Operator Tree:
        Group By Operator
          aggregations: avg(VALUE._col0)
          keys: KEY._col0 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 2 Data size: 241 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 2 Data size: 241 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
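
One detail in the avg plan is worth a closer look: the map-side value expression is a struct carrying count and sum, not a finished average, and the reduce-side Group By runs in mode mergepartial. The Python sketch below is only an illustration of why that shape makes avg safe to pre-aggregate. It is not Hive's implementation, and the class name, method names, and salary figures are all made up for the example. Averaging the per-split averages gives the wrong answer, while merging (count, sum) partials and dividing once at the end gives the right one.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class AvgPartial:
    """Hypothetical partial state for avg, mirroring the count/sum struct fields."""
    count: int = 0
    total: Decimal = Decimal("0")

    def iterate(self, value):
        # Map side: fold one raw row into the partial state (NULLs are skipped).
        if value is not None:
            self.count += 1
            self.total += value

    def merge(self, other):
        # Reduce side: combine two partial states for the same key.
        self.count += other.count
        self.total += other.total

    def terminate(self):
        # Divide only once, after every partial has been merged.
        return self.total / self.count if self.count else None


def partial_for(values):
    """Build the partial state one map task would emit for one group key."""
    state = AvgPartial()
    for v in values:
        state.iterate(v)
    return state


# Made-up salaries for a single deptno, split across two map tasks.
split_a = [Decimal("1000.00"), Decimal("2000.00"), Decimal("3000.00")]
split_b = [Decimal("5000.00")]

# Correct: merge the (count, sum) partials, then divide.
merged = partial_for(split_a)
merged.merge(partial_for(split_b))
merged_avg = merged.terminate()

# Incorrect: average the per-split averages (what a naive avg combiner would do).
naive_avg = (partial_for(split_a).terminate() + partial_for(split_b).terminate()) / 2

# Reference value computed over all rows at once.
true_avg = sum(split_a + split_b) / len(split_a + split_b)

print(merged_avg, naive_avg, true_avg)
```

So "a combiner cannot handle avg" holds only when the intermediate value is the average itself. When the intermediate value is a (count, sum) pair, as the struct type in the plan suggests, partial aggregation is safe.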

