瞭解GaussDB效能調優之隱式轉換,解決慢SQL問題

华为云开发者联盟發表於2024-12-10

一、問題現象

某專案上一段慢SQL,相關技術人員反饋條件中用到a.systemdate >= 20240422 and a.systemdate <= 20240422 執行耗時,而把條件改為 a.systemdate = 20240422這樣的點查詢後,執行耗時就變快了。使用者認為這兩個條件是一樣的,那最佳化器生成的執行計劃也應該要一樣。

二、技術背景

SQL執行計劃是一個節點樹,顯示GaussDB執行一條SQL語句時執行的詳細步驟。每一個步驟為一個資料庫運算子。

使用EXPLAIN命令可以檢視最佳化器為每個查詢生成的具體執行計劃。EXPLAIN給每個執行節點都輸出一行,顯示基本的節點型別和最佳化器為執行這個節點預計的開銷值。影響SQL語句計劃的優劣有很多方面,其中表的欄位型別應使用一致的資料型別,表關聯列儘量使用相同的資料型別。如果表關聯列資料型別不同,資料庫必須動態地轉化為相同的資料型別進行比較,這種轉換會帶來一定的效能開銷,也稱為 “隱式轉換”,也會影響到最佳化器生成的執行計劃的不同。

三、處理過程

3.1 建立表結構DDL定義

CREATE TABLE test1 (
	systemdate int4 NOT NULL,
	cbsserverid varchar(4) NOT NULL,
	busidate int4 NOT NULL,
	sno int4 NOT NULL,
	busiflowid varchar(32),
	trdsysid varchar(4),
	coreid varchar(4),
	operdate int4,
	cleardate int4,
	relativesno int4,
	busitype varchar(16),
	accountingpoint bpchar(1),
	bookkeepingpoint bpchar(1),
	settgroup varchar(32),
	settunit varchar(32) NOT NULL,
	settname varchar(256),
	fundacct varchar(32),
	fundunit varchar(32),
	fortuneid int8,
	orgid varchar(32),
	brhid varchar(32),
	custkind varchar(8),
	custgroup varchar(8),
	fundkind varchar(8),
	fundlevel varchar(8),
	fundgroup varchar(8),
	curcode varchar(3),
	bankcode varchar(10),
	fundeffect numeric(20,2),
	fundbal numeric(20,2),
	secuid varchar(32),
	market varchar(2),
	mainseat varchar(16),
	trdseat varchar(16),
	stkcode varchar(32),
	stktype varchar(2),
	stkname varchar(128),
	stkeffect numeric(20,4),
	stkbal numeric(20,4)
	CONSTRAINT cbs_node_logasset_pkey PRIMARY KEY (systemdate,custid,cbsserverid,sno,busidate)
);
CREATE INDEX idx_cbs_node_logasset_custid ON fs_das.fs_das_his.cbs_node_logasset (custid);
CREATE INDEX idx_cbs_node_logasset_fundacct ON fs_das.fs_das_his.cbs_node_logasset (fundacct);
CREATE TABLE test2(
	systemdate int4 NOT NULL,
	cbsserverid varchar(4) NOT NULL,
	custid int8 NOT NULL,
	corpid int4,
	serverid int4 NOT NULL,
	bizdate int4 NOT NULL,
	sno varchar(32) NOT NULL,
	fee_one_yhs numeric(12,2),
	fee_one_ghf numeric(12,2),
	fee_one_qsf numeric(12,2),
	fee_one_jygf numeric(12,2),
	fee_one_jsf numeric(12,2),
	fee_one_zgf numeric(12,2),
	fee_one_qtf numeric(12,2),
	fee_one_fxj numeric(12,2),
	remark varchar(1024),
	settrate numeric(12,8),
	moneytype_wb bpchar(1),
	clearedamt_wb numeric(19,2),
	fee_sxf_wb numeric(12,2),
	fee_jsxf_wb numeric(12,2),
	fee_two_yhs_wb numeric(12,2),
	fee_two_ghf_wb numeric(12,2),
	fee_two_qsf_wb numeric(12,2),
	fee_two_jygf_wb numeric(12,2),
	fee_two_jsf_wb numeric(12,2),
	fee_two_zgf_wb numeric(12,2),
	fee_two_qtf_wb numeric(12,2),
	CONSTRAINT cbs_node_logasset_ext_pk PRIMARY KEY (systemdate,custid,cbsserverid,sno,bizdate)
);

3.2 執行計劃分析

explain analyze
select 
rownum + 0 as posstr, 
t.* 
from 
(
select  
a.busidate as bizdate,
a.cleardate,
a.orderdate, 
a.orgid,
a.settunit as custid,
a.settname as custname, 
a.fundacct as fundid,
case when a.fortuneid = 0 then '' else a.fortuneid end as fortuneid,
a.secuid, 
LPAD(CAST(a.busitype AS VARCHAR), 6, '0') as cbsbusitype,
a.curcode as moneytype,
a.market, 
a.stktype,
a.stkcode,
a.stkname, 
a.fundeffect,
a.fundbal,
a.stkbal, 
a.orderid, 
a.orderqty,
cast(a.orderprice as numeric(12,3)) as orderprice, 
a.matchtime, 
a.matchcode,
a.matchtimes,a.matchqty, 
cast(a.matchprice as numeric(12,3)) as matchprice,
a.matchamt,
cast(a.fee_yhs as numeric(19,3)) as feeyhs,
cast(a.fee_seat as numeric(19,3)) as feesxf, 
cast(a.fee_ghf as numeric(19,3)) as feeghf,
cast(a.fee_qsf as numeric(19,3)) as feeqsf,
cast(a.fee_jygf as numeric(19,3)) as feejygf,
cast(a.fee_jsf as numeric(19,3)) as feejsf,
cast(a.fee_zgf as numeric(19,3)) as feezgf, 
cast(b.fee_one_fxj as numeric(19,3)) as feeonefxj, 
cast(a.fee_chjzf as numeric(19,3)) as feechjzf,a.bankcode,
'' as bankname,COALESCE(d.businame,'') digestname, 
left(cast(to_date(a.orderdate,'yyyy-mm-dd') as varchar(32)), 10) || ' ' || case when 10 > cast(a.ordertime/1000000 as int) then '0' || cast(cast(a.ordertime/1000000 as int) as varchar(8)) else cast(cast(a.ordertime/1000000 as int) as varchar(8)) end || ':' || case when 10 > cast(a.ordertime/10000%100 as int) then '0' || cast(cast(a.ordertime/10000%100 as int) as varchar(8)) else cast(cast(a.ordertime/10000%100 as int) as varchar(8)) end || ':' || case when 10 > cast(a.ordertime/100%100 as int) then '0' || cast(cast(a.ordertime/100%100 as int) as varchar(8)) else cast(cast(a.ordertime/100%100 as int) as varchar(8)) end as ordertime, a.fee_jjgsglf as jjglffee, 
a.stkrealsettdate, a.fundrealsettdate 
from test1  a        -- 主表
left join test2 b   -- 主表的擴充套件表(主表左關聯他的擴充套件表取它放在擴充套件表中的一些資訊)
on a.systemdate = b.systemdate    -- 按年分割槽的鍵
 and a.custid = b.custid 
 and a.cbsserverid = b.cbsserverid 
 and a.sno = b.sno  -- 兩邊型別改為一致,執行計劃走的是引數化路徑,這樣就快了。
 and a.busidate = b.bizdate
     -- (systemdate,custid,cbsserverid,sno,busidate) 主表的聯合主鍵
     -- (systemdate,custid,cbsserverid,sno,bizdate)  擴充套件表的聯合主鍵
left join cbs_comm_busitype d   -- d表沒有過濾條件
  on a.busitype = d.busitype 
WHERE 
--a.systemdate >= 20240422 and a.systemdate <= 20240422  
--分割槽鍵                                             
-- actual time = 38564.818
  a.systemdate = 20240422  -- 分割槽的鍵 上面一句的條件,應該跟這個條件是等價的。為什麼執行時間差那麼久呢?   -- actual time = 110.894
  and a.fundacct = '880300015593' 
  and a.custid = 106931062
  and a.fortuneid ='0' 
order by a.systemdate ,a.sno desc
limit 10 offset 0 ) t;
  • 針對上面 systemdate >= 20240422 and a.systemdate <= 20240422 這個條件計劃最佳化器估算的不準,導致執行運算元NestedLoop。

  • 而對於條件 a.systemdate = 20240422 這種寫法,最佳化器的估算讓它執行運算元hasjoin。

  • 條件的這兩種寫法,看起來是一樣的,執行結果等價的啊,為什麼GaussDB最佳化器解析的執行計劃又不同了呢?

  • 第一種條件a.systemdate >= 20240422 and a.systemdate <= 20240422下,客戶提供的的執行計劃如下:

注:上圖可以看到最佳化器基於cost估算的值,讓它走了Nested Loop的方式,導致最終actual time=38564.806ms。上面有一個Select Partition 1..2是由於d表沒有條件,並不是只有分割槽鍵一個欄位關聯了,就一定會走分割槽剪枝的,所以走了全表掃描(要想走分割槽剪枝,必須得具備有剪枝的條件)。

  • 第二種條件a.systemdate = 20240422 點查下的執行計劃如下:

注:上圖可以看到最佳化器基於成本估算的值,讓他走了hashjoin的方式,導致最終actual time=110.894ms。

四、處理結果

把test1表欄位sno 跟關聯表test2表欄位sno的資料型別修改為一致後測試,避免進行隱式轉換,計劃執行規則路徑,計劃執行時間由actual time=38564.806ms,提升到actual time=110.894ms。

五、總結

條件1:a.systemdate >= 20240422 and a.systemdate <= 20240422 -- actual time =38564.806ms

條件2:a.systemdate = 20240422 -- actual time = 110.894ms

上面兩個條件寫法雖然在邏輯上是等價的,最佳化器可能會認為有多個可能的結果(儘管實際上只有一個),兩個條件的成本估算的值不同,走的join方式就不一樣了,進一步導致執行計劃的時間就有差別。

華為開發者空間,匯聚鴻蒙、昇騰、鯤鵬、GaussDB、尤拉等各項根技術的開發資源及工具,致力於為每位開發者提供一臺雲主機、一套開發工具及雲上儲存空間,讓開發者基於華為根生態創新。點選連結,免費領取您的專屬雲主機

相關文章