SparkSql 06 開窗函式

BF-LoneSilverWind發表於2019-12-31

row_number()

row_number() 開窗函式是按照某個欄位分組,然後取另一欄位的前幾個的值,相當於 分組取topN

開窗函式格式:
row_number() over (partitin by xxx order by xxx )
java程式碼示例:

SparkSession sparkSession = SparkSession
                .builder()
                .appName("window")
                .master("local")
                //開啟hive的支援,接下來就可以操作hive表了
                // 前提需要是需要開啟hive metastore 服務
                .enableHiveSupport()
                .getOrCreate();

   sparkSession.sql("use spark");
   sparkSession.sql("drop table if exists sales");
   sparkSession.sql("create table if not exists sales (riqi string,leibie string,jine Int) "
      + "row format delimited fields terminated by '\t'");
   sparkSession.sql("load data local inpath '/root/test/sales' into table sales");
   /**
    * 開窗函式格式:
    * 【 rou_number() over (partitin by XXX order by XXX) 】
    */
   Dataset<Row> result = sparkSession.sql("select riqi,leibie,jine "
         	+ "from ("
            + "select riqi,leibie,jine,"
            + "row_number() over (partition by leibie order by jine desc) rank "
            + "from sales) t "
         + "where t.rank<=3");
   result.show();
   sparkSession.stop();

相關文章