Hive之同比環比的計算

大資料技術派發表於2022-02-17

Hive系列文章

  1. Hive表的基本操作
  2. Hive中的集合資料型別
  3. Hive動態分割槽詳解
  4. hive中orc格式表的資料匯入
  5. Java通過jdbc連線hive
  6. 通過HiveServer2訪問Hive
  7. SpringBoot連線Hive實現自助取數
  8. hive關聯hbase表
  9. Hive udf 使用方法
  10. Hive基於UDF進行文字分詞
  11. Hive視窗函式row number的用法
  12. 資料倉儲之拉鍊表

關注公眾號:大資料技術派,回覆: 資料,領取1024G資料。

同比環比的計算

測試資料

1,2020-04-20,420
2,2020-04-04,800
3,2020-03-28,500
4,2020-03-13,100
5,2020-02-27,300
6,2020-01-07,450
7,2019-04-07,800
8,2019-03-15,1200
9,2019-02-17,200
10,2019-02-07,600
11,2019-01-13,300
CREATE TABLE ods_saleorder  (
  order_id int ,
  order_time date ,
  order_num int
)ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
;
LOAD DATA LOCAL INPATH '/Users/liuwenqiang/workspace/hive/saleorder.txt' OVERWRITE INTO TABLE ods.ods_saleorder;

銷售量的月年佔比

關聯實現

select
    a.m_num,a.cmonth,b.y_num,b.cyear,round( m_num / y_num, 2 ) AS ratio
from(
        select
            sum(order_num) as m_num,
            DATE_FORMAT(order_time,'yyyy-MM') as cmonth
        from
            ods_saleorder
        group by
            DATE_FORMAT(order_time,'yyyy-MM')
    ) a
        inner join
    (
        select
            sum(order_num) as y_num,
            DATE_FORMAT(order_time,'yyyy') as cyear
        from
            ods_saleorder
        group by
            DATE_FORMAT(order_time,'yyyy')
    ) b
 on
    substring(a.cmonth,1,4)=b.cyear
;

image-20210114192005253

視窗實現

SELECT
    order_month,
    num,
    total,
    round( num / total, 2 ) AS ratio
FROM
    (
        select
            substr(order_time, 1, 7) as order_month,
            sum(order_num) over (partition by substr(order_time, 1, 7)) as num,
            sum(order_num) over (partition by substr( order_time, 1, 4 ) ) total,
            row_number() over (partition by substr(order_time, 1, 7)) as rk
        from ods_saleorder
    ) temp
where rk = 1;

同比環比

與上年度資料對比稱"同比",與上月資料對比稱"環比"。
相關公式如下:

同比增長率計算公式
(當年值-上年值)/上年值x100% 

環比增長率計算公式
(當月值-上月值)/上月值x100% 

lead lag 的實現

這裡我們就用環比做個例子,同比類似

select
    now_month,
    now_num,
    last_num,
    round( (now_num-last_num) / last_num, 2 ) as ratio
FROM(
    select
        now_month,
        now_num,
        lag( t1.now_num, 1) over (order by t1.now_month ) as last_num
    from
        (
            select
                substr(order_time, 1, 7) as now_month,
                sum(order_num) as now_num
            from ods_saleorder
            group by
                substr(order_time, 1, 7)
        ) t1
) t2;

image-20210114203453599

我們看到有null 值,這裡我們可以使用,lag的預設值做一次優化

select
    now_month,
    now_num,
    last_num,
    -- 分母是0的話返回值是null
    nvl(round( (now_num-last_num) / last_num, 2 ),0)as ratio
FROM(
    select
        now_month,
        now_num,
        lag( t1.now_num, 1,0) over (order by t1.now_month ) as last_num
    from
        (
            select
                substr(order_time, 1, 7) as now_month,
                sum(order_num) as now_num
            from ods_saleorder
            group by
                substr(order_time, 1, 7)
        ) t1
) t2;

image-20210114203935322

其實到這裡我們就處理完了,但是這樣真的對嗎,我們看到'2020-01' 的last_num 是800 也就是'2019-04',其實到這裡我們就明白了,我們的資料是不連續的,所以我們這樣計算是不行的,如果每個月都齊全,都有資料lag(num,12)就可以。

那就只能做自關聯了,這樣的話我們可以對時間做精準的限制

自關聯的實現

with a as (
    select
        now_month,
        now_num,
        substr(date(concat(now_month,'-','01')) - INTERVAL '1' month, 1, 7) as last_month
    from(
         select
             substr(order_time, 1, 7) as now_month,
             sum(order_num) as now_num
         from ods_saleorder
         group by
             substr(order_time, 1, 7)
    ) tmp
)
select
    a1.now_month,a1.now_num,a1.last_month,a2.now_num,
    nvl(round( (a1.now_num-a2.now_num) / a2.now_num, 2 ),0) as ratio
from
    a  a1
inner join
    a a2
on
    a1.last_month=a2.now_month
;

image-20210114210717600

這裡的時間計算INTERVAL 你也可以換成其他函式

with a as (
    select
        now_month,
        now_num,
        substr(add_months(concat(now_month,'-','01'),-1), 1, 7) as last_month
    from(
         select
             substr(order_time, 1, 7) as now_month,
             sum(order_num) as now_num
         from ods_saleorder
         group by
             substr(order_time, 1, 7)
    ) tmp
)
select
    a1.now_month,a1.now_num,a1.last_month,nvl(a2.now_num,0),
    nvl(round( (a1.now_num-a2.now_num) / a2.now_num, 2 ),0) as ratio
from
    a  a1
left join
    a a2
on
    a1.last_month=a2.now_month
;

猜你喜歡

Hadoop3資料容錯技術(糾刪碼)

Hadoop 資料遷移用法詳解

Flink實時計算topN熱榜

數倉建模分層理論

數倉建模方法論

相關文章