Teradata SQL調優

miguelmin發表於2008-11-17

1.最佳化過程:依照執行時間,資料量和複雜度來定位瓶頸。檢視sql執行計劃,判斷其合理性。

效能監控 ==》目標選取 ==》效能分析 ==》過程最佳化 ==》執行跟蹤(效能監控)

注意:每個過程中都會產生必須的文件

[@more@]2.效能分析:

• Review PDM

--表定義 --PI的選擇 --表的記錄數與空間佔用

• Review SQL

--關聯的表 --邏輯處理複雜度 --整體邏輯 --多餘的處理

• 測試執行

--響應時間

• 檢視EXPLAIN

--瓶頸定位

3.過程最佳化:

• 業務規則理解

--合理選取資料訪問路徑

• PDM設計

--調整PDM

• SQL寫法不最佳化,忽略了Teradata的機理與特性

--調整SQL

• Teradata最佳化器未得到足夠的統計資訊

--Collect Statistics

4.Multiple Insert/select --&gt Multi-Statement Insert/Select

* 並行插入空表不記錄Transient Journal

* 充分利用Teradata向空表Insert較快以及並行操作的特性如:

• 現狀 INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )

SELECT … FROM SRC1 ;

INSERT INTO ${TARGETDB}.DES

(Party_Id ,Party_Name ... )

SELECT … FROM SRC2 ;

INSERT INTO ${TARGETDB}.DES

(Party_Id ,Party_Name ... )

SELECT … FROM SRC3 ;

說明:序列執行,多個Transaction

• 最佳化後: INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )

SELECT … FROM SRC1

;INSERT INTO ${TARGETDB}.DES

(Party_Id ,Party_Name ... )

SELECT … FROM SRC2

;INSERT INTO ${TARGETDB}.DES

(Party_Id ,Party_Name ... )

SELECT … FROM SRC3 ;

說明:並行執行,單個Transaction

5.Insert/Select with Union/Union all --&gt Multi-Statement Insert/Select

* Union 需要排除重複記錄,Union all雖不需要排重,但都需要佔用大量的Spool空間,都需要進行重新組織資料

如:現狀:

INSERT INTO ${TARGETDB}.DES

(Party_Id ,Party_Name ... )

SELECT … FROM SRC1 ;

UNION ALL SELECT … FROM SRC2 ;

UNION ALL SELECT … FROM SRC3 ;

調整後:

INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )

SELECT … FROM SRC1

;INSERT INTO ${TARGETDB}.T01_DES

(Party_Id ,Party_Name ... )

SELECT … FROM SRC2

;INSERT INTO ${TARGETDB}.T01_DES

(Party_Id ,Party_Name ... )

SELECT … FROM SRC3 ;

6.排除重複記錄

* 針對單表內的重複記錄使用ROW_ NUMBER函式排重

* 排重方式多了一層子查詢

* 增加了大量的資料重新分佈的時間

現狀:

……

INSERT INTO ${TARGETDB}.T01_INDIV (Party_Id ,Party_Name ... )

SELECT COALESCE(b1.Party_Id,'-1') ,

COALESCE(TRIM(b1.Party_name),'') ...

FROM

( select party_id party_name, … ,

ROW_NUMBER() OVER

(PARTITION BY Party_Id ORDER BY Party_Name )

as rownum from ${TEMPDB}.T01_INDIV b1 … ) AA

where AA.rownum = 1 ……

建議做法:

INSERT INTO ${TEMPDB}.T01_INDIV …

INSERT INTO ${TEMPDB}.T01_INDIV …

……

INSERT INTO ${TARGETDB}.T01_INDIV (Party_Id ,Party_Name ... )

SELECT party_id party_name, …

From ${TEMPDB}.T01_INDIV b1

Qualify ROW_NUMBER() OVER

(PARTITION BY Party_Id ORDER BY Party_Name ) = 1

• 運用Qualify + ROW_ NUMBER函式

• SQL語句簡潔明瞭

• 避免子查詢

最佳化前explain:

……

4) We do an all-AMPs STAT FUNCTION step from PTEMP.VT_T01_INDIV_cur by way of an all-rows scan with no residual conditions into Spool 5 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 3 (all_amps), which is built locally on the AMPs.

5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 16.01 seconds.

6) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan with a condition of ("ROWNUMBER = 1") into Spool 8 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 8 by row hash. The result spool file will not be cached in memory. The size of Spool 8 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 1 minute.

7) We do an all-AMPs MERGE into PDATA.T01_INDIV from Spool 8 (Last Use).

最佳化後explain:

……

4) We do an all-AMPs STAT FUNCTION step from PTEMP.VT_T01_INDIV_cur by way of an all-rows scan with no residual conditions into Spool 5 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 3 (all_amps), which is built locally on the AMPs.

5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan with a condition of ("Field_10 = 1") into Spool 1 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 1 by row hash. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 1 minute.

6) We do an all-AMPs MERGE into PDATA.T01_INDIV from Spool 1 (Last Use).

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/16723161/viewspace-1013482/,如需轉載,請註明出處,否則將追究法律責任。

相關文章