GaussDB(DWS)效能調優,解決DM區大記憶體佔用問題

华为云开发者联盟發表於2024-07-04

本文分享自華為雲社群《GaussDB(DWS)效能調優:DM區最佳化案例——維度表關聯條件存在會計期》,作者: O泡果奶~。

當前DM(P1、P3、CBGDM)存在維度表與主表關聯時使用會計期作為關聯條件,會導致出現大記憶體佔用或未識別資料傾斜的問題

【場景一】f.period_id = 維度表.period_id

1.1、【問題描述】

主表和維度表關聯過程中將會計期作為關聯條件,導致維度表未進行分割槽剪枝,可能會產生大記憶體佔用的情況

1.2、【原始SQL】

僅呈現SQL中的問題,詳細SQL見附件

FROM
        DMACC.dm_adp_ar_trx_dtl_tmp F
        INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY
        INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY 
        AND T9.PROD_POV_ID = 1
        INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY
        INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY
        LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY 
        AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY
        LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID 
        AND F.PERIOD_ID = PP.PERIOD_ID
        LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID
        INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID
        LEFT JOIN (
        SELECT C
            .CONTRACT_KEY,
            D.COMPANY_KEY,
            R.FIRST_SHIP_DATE 
        FROM
            DMDIM.dm_dim_contract_d C,
            DMDIM.DM_DIM_COMPANY_D D,
            DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R 
        WHERE
            C.CONTRACT_ID = R.CONTRACT_ID 
            AND D.COMPANY_ID = R.COMPANY_ID 
        ) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY 
        AND F.COA_COMPANY_KEY = FR.COMPANY_KEY
        INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY
        JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID
        JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID 

1.3、【效能分析】

image.png
image.png
image.png
從上圖的執行計劃可以看出,由於用會計期作為關聯條件,導致維度表未進行分割槽剪枝,資料量大,不但產生了資料傾斜,同時還由於資料量大出現了關聯下盤,大大降低了sql執行效能。
主表只有一個會計期,可以識別出對應的會計期,然後對SQL進行如下改寫:

FROM
        DMACC.dm_adp_ar_trx_dtl_tmp F
        INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY
        INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY 
        AND T9.PROD_POV_ID = 1
        INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY
        INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY
        LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY 
        AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY
        LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID 
        AND PP.PERIOD_ID = '202406'
        LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID
        INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID
        LEFT JOIN (
        SELECT C
            .CONTRACT_KEY,
            D.COMPANY_KEY,
            R.FIRST_SHIP_DATE 
        FROM
            DMDIM.dm_dim_contract_d C,
            DMDIM.DM_DIM_COMPANY_D D,
            DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R 
        WHERE
            C.CONTRACT_ID = R.CONTRACT_ID 
            AND D.COMPANY_ID = R.COMPANY_ID 
        ) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY 
        AND F.COA_COMPANY_KEY = FR.COMPANY_KEY
        INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY
        JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID
        JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID 

經最佳化後,執行計劃如下圖所示,維度表進行了分割槽剪枝,資料量減少,緩解了資料傾斜,也避免了關聯下盤的問題。
image.png
image.png

【場景二】f left join 維度表 on f.period_id = 維度表.period_id and 維度表.period_id = ‘會計期’

2.1、【問題描述】

主表和維度表關聯過程中將會計期作為關聯條件,同時還為維度表會計期進行賦值,可能會產生資料傾斜未識別的情況

2.2、【原始SQL】

FROM
        dmdp.dm_dpc_inv_m_dtl_f_TEM_A LT1
        LEFT JOIN dmcommon.dm_dim_prod_key_r LT2 ON LT1.prod_key = LT2.old_key 
        AND LT1.period_id = LT2.period_id 
        AND LT2.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_reg_key_r LT3 ON LT1.period_id = LT3.period_id 
        AND LT1.geo_pc_key = LT3.old_key 
        AND LT3.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT4 ON LT1.period_id = LT4.period_id 
        AND LT1.account_dept_cust_key = LT4.old_key 
        AND LT4.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_proj_key_r LT5 ON LT1.period_id = LT5.period_id 
        AND LT1.proj_key = LT5.old_key 
        AND LT5.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT6 ON LT1.period_id = LT6.period_id 
        AND LT1.enterprise_cust_key = LT6.old_key 
        AND LT6.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_rep_key_r LT7 ON LT1.period_id = LT7.period_id 
        AND LT1.report_item_id = LT7.old_key 
        AND LT7.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT8 ON LT1.period_id = LT8.period_id 
        AND LT1.supply_center_key = LT8.old_key 
        AND LT8.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_key_r LT9 ON LT1.period_id = LT9.period_id 
        AND LT1.inventory_class_key = LT9.old_key 
        AND LT9.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_bus_key_r LT10 ON LT1.period_id = LT10.period_id 
        AND LT1.business_status_key = LT10.old_key 
        AND LT10.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_hisi_key_r LT11 ON LT1.period_id = LT11.period_id 
        AND LT1.hisi_prod_key = LT11.old_key 
        AND LT11.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_org_key_r LT12 ON LT1.period_id = LT12.period_id 
        AND LT1.inventory_org_key = LT12.old_key 
        AND LT12.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT13 ON LT1.period_id = LT13.period_id 
        AND LT1.end_cust_key = LT13.old_key 
        AND LT13.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT14 ON LT1.period_id = LT14.period_id 
        AND LT1.sign_cust_key = LT14.old_key 
        AND LT14.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT15 ON LT1.period_id = LT15.period_id 
        AND LT1.agent_distribution_cust_key = LT15.old_key 
        AND LT15.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_com_key_r LT16 ON LT1.period_id = LT16.period_id 
        AND LT1.company_key = LT16.old_key 
        AND LT16.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT17 ON LT1.period_id = LT17.period_id 
        AND LT1.contract_key = LT17.old_key 
        AND LT17.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT18 ON LT1.period_id = LT18.period_id 
        AND LT1.loan_contract_key = LT18.old_key 
        AND LT18.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT19 ON LT1.period_id = LT19.period_id 
        AND LT1.target_supply_center_key = LT19.old_key 
        AND LT19.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_subinventory_key_r LT20 ON LT1.period_id = LT20.period_id 
        AND LT1.subinventory_key = LT20.old_key 
        AND LT20.PERIOD_ID = 202406 
    WHERE
        1 = 1 
    AND partition_value IN ( 0, 1 )

2.3、【效能分析】

image.png
image.png
上圖的執行計劃可以看出,在主表一開始關聯過程中就存在資料傾斜,導致SQL執行效能差。
image.png
image.png
詳細執行計劃中,雖然維度表進行了分割槽剪枝,但由於使用了 left join,導致關聯條件中維度表的常量period_id不能直接賦值給主表period_id,主表關聯後的結果重分佈時將period_id作為了分佈鍵之一,這會影響最佳化器的傾斜最佳化。
可以將f.period_id = 維度表.period_id這一關聯條件刪掉,對sql進行如下改寫

FROM
        dmdp.dm_dpc_inv_m_dtl_f_TEM_A LT1
        LEFT JOIN dmcommon.dm_dim_prod_key_r LT2 ON LT1.prod_key = LT2.old_key 
        AND LT2.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_reg_key_r LT3 ON LT1.geo_pc_key = LT3.old_key 
        AND LT3.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT4 ON LT1.account_dept_cust_key = LT4.old_key 
        AND LT4.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_proj_key_r LT5 ON LT1.proj_key = LT5.old_key 
        AND LT5.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT6 ON LT1.enterprise_cust_key = LT6.old_key 
        AND LT6.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_rep_key_r LT7 ON LT1.report_item_id = LT7.old_key 
        AND LT7.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT8 ON LT1.supply_center_key = LT8.old_key 
        AND LT8.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_key_r LT9 ON LT1.inventory_class_key = LT9.old_key 
        AND LT9.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_bus_key_r LT10 ON LT1.business_status_key = LT10.old_key 
        AND LT10.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_hisi_key_r LT11 ON LT1.hisi_prod_key = LT11.old_key 
        AND LT11.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_org_key_r LT12 ON LT1.inventory_org_key = LT12.old_key 
        AND LT12.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT13 ON LT1.end_cust_key = LT13.old_key 
        AND LT13.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT14 ON LT1.sign_cust_key = LT14.old_key 
        AND LT14.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT15 ON LT1.agent_distribution_cust_key = LT15.old_key 
        AND LT15.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_com_key_r LT16 ON LT1.company_key = LT16.old_key 
        AND LT16.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT17 ON LT1.contract_key = LT17.old_key 
        AND LT17.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT18 ON LT1.loan_contract_key = LT18.old_key 
        AND LT18.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT19 ON LT1.target_supply_center_key = LT19.old_key 
        AND LT19.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_subinventory_key_r LT20 ON LT1.subinventory_key = LT20.old_key 
        AND LT20.PERIOD_ID = 202406 
    WHERE
        1 = 1 
    AND partition_value IN ( 0, 1 )

改寫後,執行計劃如下所示

image.png

可以看出,執行計劃不但進行了分割槽剪枝,同時最佳化器還進行了傾斜最佳化,提高了SQL執行效能

點選關注,第一時間瞭解華為雲新鮮技術~

相關文章