Spark2 Dataset: multidimensional aggregation with cube and rollup

Posted by 智慧先行者 on 2016-11-25
val df6 = spark.sql("select gender,children,max(age),avg(age),count(age) from Affairs group by Cube(gender,children) order by 1,2")
df6.show
+------+--------+--------+--------+----------+
|gender|children|max(age)|avg(age)|count(age)|
+------+--------+--------+--------+----------+
|  null|    null|    57.0|    34.0|        10|
|  null|      no|    37.0|    27.0|         6|
|  null|     yes|    57.0|    44.5|         4|
|female|    null|    32.0|    29.0|         5|
|female|      no|    32.0|    27.0|         3|
|female|     yes|    32.0|    32.0|         2|
|  male|    null|    57.0|    39.0|         5|
|  male|      no|    37.0|    27.0|         3|
|  male|     yes|    57.0|    57.0|         2|
+------+--------+--------+--------+----------+
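The title refers to the Dataset API, and the same cube can be built without SQL by calling `cube` on a DataFrame and passing the aggregates to `agg`. The following is a minimal sketch, not the author's code. It assumes a spark-sql 2.x dependency and local mode. Because the article does not show the contents of `Affairs`, it uses a few hypothetical rows (`sample`) with the same `gender`, `children` and `age` columns. `CubeDatasetExample` and `cubeStats` are illustrative names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, count, max}

object CubeDatasetExample {
  // Same aggregation as the SQL query above, expressed with Dataset.cube.
  // Sorting ascending puts the null (subtotal) rows first, like "order by 1,2".
  def cubeStats(df: DataFrame): DataFrame =
    df.cube("gender", "children")
      .agg(max("age"), avg("age"), count("age"))
      .orderBy("gender", "children")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CubeDatasetExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows, NOT the article's Affairs data.
    val sample = Seq(
      ("female", "no", 25.0),
      ("female", "yes", 35.0),
      ("male", "no", 30.0),
      ("male", "yes", 41.0),
      ("male", "yes", 45.0)
    ).toDF("gender", "children", "age")

    cubeStats(sample).show()
    spark.stop()
  }
}
```

`cube(gender, children)` aggregates over every combination of the two columns, over each column on its own, and over all rows. A `null` in a grouping column means "all values", so the `(null, null)` row is the grand total. With the five sample rows the result has 4 + 2 + 2 + 1 = 9 rows, the same shape as the output above.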


val df7 = spark.sql("select gender,children,max(age),avg(age),count(age) from Affairs group by rollup(gender,children) order by 1,2")

df7.show
+------+--------+--------+--------+----------+
|gender|children|max(age)|avg(age)|count(age)|
+------+--------+--------+--------+----------+
|  null|    null|    57.0|    34.0|        10|
|female|    null|    32.0|    29.0|         5|
|female|      no|    32.0|    27.0|         3|
|female|     yes|    32.0|    32.0|         2|
|  male|    null|    57.0|    39.0|         5|
|  male|      no|    37.0|    27.0|         3|
|  male|     yes|    57.0|    57.0|         2|
+------+--------+--------+--------+----------+
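The rollup result has two fewer rows than the cube result because the `(null, no)` and `(null, yes)` rows are missing. The difference comes from the grouping sets each operator expands to. `cube` uses every subset of the columns (2^n sets). `rollup` uses only the prefixes of the column list, from all columns down to none (n + 1 sets), so it never produces a subtotal by `children` alone. The plain-Scala sketch below lists both expansions for the columns used here. It needs no Spark, and `GroupingSets` is a hypothetical helper, not a Spark API.

```scala
object GroupingSets {
  // CUBE: every subset of the grouping columns (2^n grouping sets).
  def cube(cols: List[String]): List[List[String]] =
    cols match {
      case Nil => List(Nil)
      case head :: tail =>
        val rest = cube(tail)
        // Subsets that include `head`, followed by those that do not.
        rest.map(head :: _) ++ rest
    }

  // ROLLUP: each prefix of the column list, longest first (n + 1 grouping sets).
  def rollup(cols: List[String]): List[List[String]] =
    (cols.length to 0 by -1).toList.map(n => cols.take(n))

  def main(args: Array[String]): Unit = {
    val cols = List("gender", "children")
    println(cube(cols))
    println(rollup(cols))
  }
}
```

For `(gender, children)` this prints `List(List(gender, children), List(gender), List(children), List())` for cube and `List(List(gender, children), List(gender), List())` for rollup. For this reason the column order matters for `rollup` but not for the set of groupings `cube` produces: `rollup(children, gender)` would keep the per-`children` subtotals and drop the per-`gender` ones. In the Dataset API the equivalent call is `df.rollup("gender", "children").agg(...)`. Subtotal rows are marked with `null`, so data that already contains nulls in these columns becomes ambiguous. Spark's `grouping()` and `grouping_id()` functions can tell the two cases apart.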
