Today I noticed that the system load was relatively high at around 1 p.m., so I pulled the latest AWR report.

	Snap Id	Snap Time	Sessions	Cursors/Session
Begin Snap:	20892	26-Nov-14 13:20:17	3623	5.4
End Snap:	20893	26-Nov-14 13:30:17	3602	5.4
Elapsed:		10.01 (mins)
DB Time:		365.56 (mins)
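To put "relatively high load" into numbers, DB Time divided by Elapsed time gives the average number of sessions active in the database during the snapshot interval. Here is a small Python sketch using the header values above. The helper name is hypothetical.

```python
# Values copied from the AWR header above
ELAPSED_MIN = 10.01   # Elapsed (mins)
DB_TIME_MIN = 365.56  # DB Time (mins)


def average_active_sessions(db_time_min: float, elapsed_min: float) -> float:
    """DB Time / Elapsed = number of sessions active in the database, on average."""
    if elapsed_min <= 0:
        raise ValueError("elapsed time must be positive")
    return db_time_min / elapsed_min


aas = average_active_sessions(DB_TIME_MIN, ELAPSED_MIN)
print(f"average active sessions: {aas:.1f}")
```

On average about 36.5 sessions were active. Whether that counts as high depends on the host's CPU count, which this excerpt does not show.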

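Let's look at the top wait events. I already know the other events well, and some of those issues have recommendations that are still being implemented. The third event, "enq: TX - allocate ITL entry", was less familiar to me, so I decided to analyze it in depth.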

Event	Waits	Time(s)	Avg wait (ms)	% DB time	Wait Class
DB CPU		7,322		33.38
db file sequential read	3,833,628	5,031	1	22.94	User I/O
enq: TX - allocate ITL entry	187	3,607	19287	16.44	Configuration
direct path read temp	1,017,363	1,640	2	7.48	User I/O
read by other session	3,873,525	1,612	0	7.35	User I/O
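The ITL event stands out because each wait is long, not because it happens often: there were only 187 waits. The Avg wait and % DB time columns can be re-derived from Waits, Time(s) and the DB Time in the header. A minimal sketch with hypothetical helper names:

```python
DB_TIME_S = 365.56 * 60  # DB Time from the AWR header, converted to seconds


def avg_wait_ms(time_s: float, waits: int) -> float:
    """Average duration of one wait, in milliseconds."""
    return time_s / waits * 1000


def pct_db_time(time_s: float, db_time_s: float = DB_TIME_S) -> float:
    """Share of total DB Time spent on this event, in percent."""
    return time_s / db_time_s * 100


# 'enq: TX - allocate ITL entry': 187 waits, 3,607 s
itl_avg = avg_wait_ms(3607, 187)
itl_pct = pct_db_time(3607)
print(f"avg wait ~ {itl_avg:,.0f} ms, {itl_pct:.2f}% of DB time")
```

The results match the report to within rounding. AWR displays Time(s) and DB Time as rounded values, so small differences in the last digit are expected.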

Because this was the most recent AWR report, I also took a quick look in SQL*Plus. Two sessions were waiting on this event:
SQL> select EVENT,sid,serial# from v$session where event='enq: TX - allocate ITL entry';
EVENT SID SERIAL#
---------------------------------------------------------------- ---------- ----------
enq: TX - allocate ITL entry 5592 25063
enq: TX - allocate ITL entry 11533 51179
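V$SESSION also exposes P1, P2 and P3 for waiting sessions. For enqueue waits, P1 packs the two-character lock type into its high-order bytes and the requested lock mode into its low 16 bits. An ITL wait shows up as a TX request in mode 4 (share). Below is a minimal Python decoder. The function name is hypothetical, and the sample values are illustrative rather than taken from this system.

```python
from typing import Tuple


def decode_enqueue_p1(p1: int) -> Tuple[str, int]:
    """Split an enqueue wait's P1 into (lock type, requested mode)."""
    lock_type = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return lock_type, mode


# 0x54 = 'T', 0x58 = 'X'; 1415053316 == 0x54580004
print(decode_enqueue_p1(1415053316))  # ('TX', 4)
```

A decoded value of TX mode 4 confirms this class of enqueue wait. On its own it does not distinguish an ITL shortage from other mode-4 TX waits; here, the event name does that.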

The session details show that both sessions come from the same client (same OS user, machine and process):

SID SERIAL# USERNAME OSUSER MACHINE PROCESS TERMINAL TYPE LOGIN_TIME
---------- ---------- --------------- --------------- -------------------- --------------- --------------- ---------- -------------------
5592 25063 APPC pappwrk01 prod_client_db 1234 unknown USER 2014-11-26 13:03:45
11533 51179 APPC pappwrk01 prod_client_db 1234 unknown USER 2014-11-26 13:03:46

The SQL these sessions were running was, in both cases, an UPDATE on the same table, similar to:
UPDATE DIRECT_DEBIT_REQUEST SET DIRECT_DEBIT_REQUEST.ACCOUNT_ID=..........

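At this point the problem seemed clear: the ITL setting of table DIRECT_DEBIT_REQUEST could not meet the demand of concurrent transactions, and that caused the wait. The data block is the smallest unit of I/O that Oracle performs, and the ITL (Interested Transaction List) in the block header is critical.
Whenever a transaction needs to modify a data block, it must obtain a free ITL slot in the block header. The slot records the transaction ID, the undo block address it uses, the corresponding SCN, whether the transaction has committed, and so on. If a new transaction finds all existing ITL slots in use, it allocates a new slot. This happens dynamically, by taking space from the block's free space. If no free space is left, the transaction must wait until another transaction in the block releases its slot, and that wait is "enq: TX - allocate ITL entry".
The initial and maximum numbers of ITL slots are governed by INITRANS and MAXTRANS (INI_TRANS and MAX_TRANS in the dictionary views). Since 10g, MAXTRANS is ignored.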
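The slot allocation just described can be illustrated with a toy Python model of a single block. It is a simplification, not Oracle's actual code path. It assumes that each ITL entry costs 24 bytes of free space (the commonly quoted size) and that there is a hard cap of 255 slots. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ITL_SLOT_BYTES = 24   # assumed size of one ITL entry
MAX_ITL_SLOTS = 255   # effective MAXTRANS from 10g onward


@dataclass
class Block:
    """Toy model of one data block's ITL (hypothetical, heavily simplified)."""
    initrans: int
    free_bytes: int
    itl: List[Optional[str]] = field(init=False)  # owning txn per slot, None = free

    def __post_init__(self) -> None:
        # INITRANS slots are pre-allocated when the block is formatted
        self.itl = [None] * self.initrans

    def begin(self, txn: str) -> str:
        # 1) take any free slot (pre-allocated, or released by a committed txn)
        for i, owner in enumerate(self.itl):
            if owner is None:
                self.itl[i] = txn
                return "free slot"
        # 2) otherwise grow the ITL out of the block's free space
        if len(self.itl) < MAX_ITL_SLOTS and self.free_bytes >= ITL_SLOT_BYTES:
            self.free_bytes -= ITL_SLOT_BYTES
            self.itl.append(txn)
            return "new slot"
        # 3) no free slot and no room to add one: the real session waits here
        return "wait: enq: TX - allocate ITL entry"

    def commit(self, txn: str) -> None:
        # committing frees the slot for reuse; the slot's space is not given back
        self.itl = [None if owner == txn else owner for owner in self.itl]


blk = Block(initrans=1, free_bytes=30)  # an almost full block
print(blk.begin("T1"))  # free slot  (the INITRANS slot)
print(blk.begin("T2"))  # new slot   (30 bytes free >= 24)
print(blk.begin("T3"))  # wait: enq: TX - allocate ITL entry (6 bytes left)
blk.commit("T1")
print(blk.begin("T3"))  # free slot  (T1's slot is reused)
```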
SQL> create table trans_test(id number,name varchar2(100)) initrans 1 maxtrans 1;
Table created.

SQL> set linesize 100
SQL> col table_name format a30
SQL> select ini_trans,max_trans,table_name from user_tables where table_name='TRANS_TEST';
INI_TRANS MAX_TRANS TABLE_NAME
---------- ---------- ------------------------------
1 255 TRANS_TEST

The Oracle SQL Reference explains this parameter in detail:

MAXTRANS Parameter

In earlier releases, the MAXTRANS parameter determined the maximum number of concurrent update transactions allowed for each data block in the segment. This parameter has been deprecated. Oracle now automatically allows up to 255 concurrent update transactions for any data block, depending on the available space in the block.

Existing objects for which a value of MAXTRANS has already been set retain that setting. However, if you attempt to change the value for MAXTRANS, Oracle ignores the new specification and substitutes the value 255 without returning an error.

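INITRANS defaults to 1 for tables and 2 for indexes, and MAXTRANS is effectively fixed at 255, as shown above. Usually no special setting is needed, and the values can be configured according to business needs.
Back to the AWR report: the "Segments by ITL Waits" section gives a fuller picture.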

Owner	Tablespace Name	Object Name	Subobject Name	Obj. Type	ITL Waits	% of Capture
APPO	INDXH01	MST_LOG_1IX	I2	INDEX PARTITION	12	66.67
APPO	DATAS01	DIRECT_DEBIT_REQUEST		TABLE	3	16.67
APPO	INDXS01	PAYMENT_2IX	A23_B2	INDEX PARTITION	1	5.56
APPO	INDXH01	ACTIVITY_HISTORY_1IX	C90	INDEX PARTITION	1	5.56
SYS	SYSAUX	WRH$_SQL_PLAN_PK		INDEX	1	5.56
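% of Capture is each segment's ITL Waits divided by the total captured in this section, which is 18 here. This segment-level statistic is separate from the 187 event waits in the top-events table. The column can be re-derived in Python (the dictionary keys simply label the rows above):

```python
from typing import Dict

itl_waits: Dict[str, int] = {
    "MST_LOG_1IX (I2)": 12,
    "DIRECT_DEBIT_REQUEST": 3,
    "PAYMENT_2IX (A23_B2)": 1,
    "ACTIVITY_HISTORY_1IX (C90)": 1,
    "WRH$_SQL_PLAN_PK": 1,
}


def capture_shares(waits: Dict[str, int]) -> Dict[str, float]:
    """Each segment's share of the captured ITL waits, in percent."""
    total = sum(waits.values())
    return {name: round(n / total * 100, 2) for name, n in waits.items()}


for name, pct in sorted(capture_shares(itl_waits).items(), key=lambda kv: -kv[1]):
    print(f"{name:<28}{pct:6.2f}%")
```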

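The list above gives a clear picture of the ITL waits and illustrates two points.
First, by the time we notice a problem, some of the operations involved may already have finished, so what we find is not necessarily the complete picture. That is what happened in the analysis above. We pinned the problem on table DIRECT_DEBIT_REQUEST, but over the whole interval a large partitioned index (MST_LOG_1IX, partition I2) accounted for about 67% of the ITL waits.
Second, the AWR report gave us detailed information, but it is still not enough for a final conclusion. Treating each symptom in isolation is inefficient. Working outward from one finding to the wider system is likely to pay off much better in the end.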

My Oracle Support (formerly Metalink) also explains how to handle this wait event in detail. See:

Troubleshooting waits for 'enq: TX - allocate ITL entry' (Doc ID 1472175.1)


There are three approaches:

Increase INITRANS

A)

1) Depending on the number of transactions in the table we need to alter the value of INITRANS. here it has been changed to 50:

		alter table <table_name> INITRANS 50;
	

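2) Then re-organize the table using move (alter table <table_name> move;). ALTER TABLE ... INITRANS only affects blocks formatted afterwards, so the move is what applies the new value to the existing blocks.

3) The move changes ROWIDs and leaves the table's indexes UNUSABLE, so rebuild all the indexes of this table as below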

		alter index <index_name> rebuild INITRANS 50;
	
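INITRANS slots are pre-allocated in every block whether or not they are used, so a value such as 50 has a space cost. Below is a rough Python estimate. It assumes 24 bytes per ITL entry and an 8 KB block; the source does not state the block size, and the helper name is hypothetical.

```python
from typing import Tuple

ITL_SLOT_BYTES = 24   # assumed size of one ITL entry
BLOCK_SIZE = 8192     # assumed 8 KB block; not stated in the AWR excerpt


def initrans_overhead(initrans: int, block_size: int = BLOCK_SIZE) -> Tuple[int, float]:
    """Bytes reserved up front in every block for INITRANS slots, and that as % of the block."""
    reserved = initrans * ITL_SLOT_BYTES
    return reserved, reserved / block_size * 100


for n in (1, 2, 8, 16, 50):
    used, pct = initrans_overhead(n)
    print(f"INITRANS {n:>2}: {used:>5} bytes ({pct:.1f}% of the block)")
```

At INITRANS 50, roughly 1.2 KB (about 15%) of every block is reserved up front under these assumptions. That is one reason to size INITRANS from the actual concurrency rather than picking a large number.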

Increase PCTFREE

B)

If the issue is not resolved by increasing INITRANS, try increasing PCTFREE. Increasing PCTFREE holds more space back and so spreads the same number of rows over more blocks. This means that there are more ITL slots available overall:

1) Spreading rows over more blocks also helps to reduce this wait event.

		alter table <table_name> PCTFREE 40;
	

2) Then re-organize the table using move (alter table <table_name> move;)

3) Rebuild index

		alter index <index_name> rebuild PCTFREE 40;
	
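The PCTFREE reasoning comes down to arithmetic. Holding back more of each block means fewer rows per block, so the same data needs more blocks and fewer transactions compete for any one block's ITL. The sketch below uses hypothetical figures: 8,000 usable bytes per block, 200-byte rows and 1,000,000 rows. Real row packing also depends on block overhead and row growth.

```python
import math


def rows_per_block(row_bytes: int, pctfree: int, usable_bytes: int = 8000) -> int:
    """Rows that fit before PCTFREE % of the block is held back (rough estimate)."""
    fill_limit = usable_bytes * (100 - pctfree) // 100
    return max(1, fill_limit // row_bytes)


def blocks_needed(rows: int, row_bytes: int, pctfree: int) -> int:
    """Blocks required to hold `rows` rows at the given PCTFREE."""
    return math.ceil(rows / rows_per_block(row_bytes, pctfree))


for pf in (10, 40):
    print(f"PCTFREE {pf}: {rows_per_block(200, pf)} rows/block, "
          f"{blocks_needed(1_000_000, 200, pf):,} blocks")
```

In this hypothetical case, going from PCTFREE 10 to 40 cuts the rows per block from 36 to 24 and raises the block count from 27,778 to 41,667. The price is a correspondingly larger segment.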

A Combination of increasing both INITRANS and PCTFREE

C)

1) Set INITRANS to 50 and PCTFREE to 40

		alter table <table_name> PCTFREE 40 INITRANS 50;
	

2) Re-organize the table using move (alter table <table_name> move;)

3) Then rebuild all the indexes of the table as below

		alter index <index_name> rebuild PCTFREE 40 INITRANS 50;
	
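The three variants differ only in the storage clause, so the statement sequence can be generated. Below is a Python sketch. The function name and the index name in the usage line are hypothetical, and the generated script is meant for review, not blind execution.

```python
from typing import List, Optional


def reorg_script(table: str, indexes: List[str],
                 initrans: Optional[int] = None,
                 pctfree: Optional[int] = None) -> List[str]:
    """Build the ALTER / MOVE / REBUILD sequence for one non-partitioned table."""
    attrs = []
    if pctfree is not None:
        attrs.append(f"PCTFREE {pctfree}")
    if initrans is not None:
        attrs.append(f"INITRANS {initrans}")
    if not attrs:
        raise ValueError("specify initrans and/or pctfree")
    clause = " ".join(attrs)
    stmts = [f"ALTER TABLE {table} {clause};",   # new setting for future blocks
             f"ALTER TABLE {table} MOVE;"]       # rewrite existing blocks
    # the move leaves indexes UNUSABLE, so rebuild each with the same clause
    stmts += [f"ALTER INDEX {ix} REBUILD {clause};" for ix in indexes]
    return stmts


for s in reorg_script("DIRECT_DEBIT_REQUEST", ["DDR_IX1"], initrans=50, pctfree=40):
    print(s)
```

For a partitioned index such as MST_LOG_1IX, Oracle does not allow rebuilding the index as a whole. Each partition has to be rebuilt with ALTER INDEX ... REBUILD PARTITION ..., so the script would need one statement per partition.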

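Going a step further, I checked the product team's documentation. INITRANS recommendations had been made at the start of the project, but for some reason they were dropped in the end. As a result, INITRANS has to be changed on a number of important tables. The AWR report points to only 4 objects (tables and indexes) that may have insufficient INITRANS. Across the whole project, however, the change could involve hundreds of objects, so it needs careful evaluation.
The following guideline can serve as a reference:
For large tables (10 million rows or more), an INITRANS of 8 to 16 is suggested
For medium tables (1 million to 10 million rows), an INITRANS of 4 to 8 is suggested
For ordinary tables, an INITRANS of 1 to 4 is suggested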
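The rule of thumb above maps row counts to an INITRANS range. Below it is encoded as a Python helper with a hypothetical name. The row count could come from NUM_ROWS in DBA_TABLES, which is only as current as the last statistics gather.

```python
from typing import Tuple


def suggested_initrans(row_count: int) -> Tuple[int, int]:
    """INITRANS range from the rule of thumb above; a starting point, not a rule."""
    if row_count >= 10_000_000:   # large tables: 10 million rows or more
        return (8, 16)
    if row_count >= 1_000_000:    # medium tables: 1 to 10 million rows
        return (4, 8)
    return (1, 4)                 # ordinary tables


print(suggested_initrans(25_000_000))
```

Row count is only a proxy. What actually drives ITL demand is how many transactions modify the same block at the same time, so segments that show up under "Segments by ITL Waits" may warrant more.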
