Oracle alter index rebuild 說明

在ITPUB 論壇上看到的一個帖子，很不錯。根據論壇的帖子重做整理了一下。原文連結如下：

alter index rebuild online引發的血案

http://www.itpub.net/thread-1445427-1-1.html

一. 官網說明

在MOS 上的一篇文章講到了rebuild online 和offline的區別：

Index Rebuild Is Hanging Or Taking Too Long [ID 272762.1]

Symptoms:
=========
       Performance issues while rebuilding very large indexes. The offline rebuilds of their index is relatively quick -finishes in 15 minutes. Issuing index rebuild ONLINE statement => finishes in about an hour. This behavior. of ONLINE index rebuilds makes it a non-option for large tables as it just takes too long to scan the table to rebuild the index. The offline may not be feasible due to due to the 24/7 nature of the database. This may be a loss of functionality for such situations. If we attempt to simultaneously ONLINE rebuild the same indexes we may encounter hanging behavior. indefinitely (or more than 6 hours).

DIAGNOSTIC ANALYSIS:
--------------------
       We can trace the sessions rebuilding the indexes with 10046 level 12. Comparing the IO reads for the index-rebuild and the index-rebuild-online reveals the following:
       ONLINE index rebuilds： It scans the base table and it doesn't scan the blocks of the index.
       OFFLINE index rebuilds：It scans the index for the build operation.

This behaviour is across all versions.

Cause/Explanation
=============
When you rebuild index online, it will do a full table scan on the base table. At the same time it will maintain a journal table for DML data, which has changed during this index rebuilding operation. So it should take longer time, specially if you do lots of DML on the same table,while rebuilding index online.
-- rebuild index online的時候，會選擇全表掃描，同時會維護一箇中間日誌表，用來記錄在rebuild 期間的增量資料，原理類似於物化檢視日誌，日誌表是一個索引組織表(IOT)，這張中間表只有插入，不會有刪除和修改操作，而且只有主鍵條件查詢，正是IOT最合適的場景。

On the other hand, while rebuilding the index without online option, Oracle will grab the index in X-mode and rebuild a new index segment by selecting the data from the old index. So here we are not allowing any DML on the table hence there is no journal table involved and it is doing an index scan. Hence it will be pretty fast.

--rebuild offline時，選擇的6模式的X 鎖，它根據old index 來rebuild。因此不允許進行DML，也就沒有中間表。因此也比較塊。

Solution/Conclusion:
===========
- The ONLINE index rebuild reads the base table, and this is by design.
- Rebuilding index ONLINE is pretty slow.
- Rebuilding index offline is very fast, but it prevents any DML on the base table.

二. rebuild index 說明

有關鎖的模式資訊如下：

鎖模式	鎖描述	解釋	SQL操作
0	none
1	NULL	空	Select
2	SS(Row-S)	行級共享鎖，其他物件只能查詢這些資料行	Select for update、Lock for update、Lock row share
3	SX(Row-X)	行級排它鎖，在提交前不允許做DML操作	Insert、Update、 Delete、Lock row share
4	S(Share)	共享鎖: 阻止其他DML操作	Create index、Lock share
5	SSX(S/Row-X)	共享行級排它鎖：阻止其他事務操作	Lock share row exclusive
6	X(Exclusive)	排它鎖：獨立訪問使用	Alter table、Drop able、Drop index、Truncate table 、Lock exclusive

鎖死鎖阻塞 Latch 等待詳解

http://blog.csdn.net/tianlesoftware/archive/2010/08/20/5822674.aspx

DML操作一般要加兩個鎖，一個是對錶加模式為3的TM鎖，一個是對資料行的模式為6的TX鎖。只要操作的不是同一行資料，是互不阻塞的。

在rebuild index online 的開始和結束階段時，需要短暫的對錶持有模式為4的TM鎖的，當獲取到4級別的鎖之後，才降為2級。如果rebuild online一直沒獲取到4級別的鎖,那麼相關的DML全部產生等待。在執行期間只持有模式2的TM鎖，不會阻塞DML操作。在Oracle 11g之後,oracle做了特殊處理,後續的dml不會被rebuild online的4級別鎖阻塞.

所以如果在執行rebuild index online前長事務，並且併發量比較大，則一旦執行alter index rebuild online，可能因為長事務阻塞，可能導致系統瞬間出現大量的鎖，對於壓力比較大的系統，這是一個不小的風險。這是需要迅速找出導致阻塞的會話kill掉，rebuild index online一旦執行，不可輕易中斷，否則可能遇到ORA-08104。

MOS 的文件：

Session Was Killed During The Rebuild Of Index ORA-08104 [ID 375856.1]

While running an online index rebuild your session was killed or otherwise terminated abnormally. You are now attempting to run the index rebuild again and is throwing the error:
ORA-08104: this index object ##### is being online built or rebuilt

關於這個錯誤NiGoo 同學的blog 有說明，連結如下：

http://www.ningoo.net/html/2007/dba_memo_online_rebuild_index_encounter_ora-08104.html

根據以上說明，我們可以知道在進行online rebuild 的時候，Oracle 會修改如下資訊：

（1）修改ind$中索引的flags，將該flags+512. 關於這個flags的含義，在下面的實驗中進行說明。

（2）在該使用者下建立一個journal table 來儲存在rebuild期間的增量資料。該表明名稱: sys_journal_.

如果異常結束online rebuild操作，那麼oracle就沒及時清理journal table和ind$的flags標誌位，系統會認為online rebuild還在操作。

當然SMON 程式會來處理這些臨時段。在maclean 同學(10g,11g OCM)的Blog裡提到了功能：

瞭解你所不知道的SMON功能(一):清理臨時段

http://www.oracledatabase12g.com/archives/smon-cleanup-temporary-segment.html

對於永久表空間上的temporary segment，SMON會三分鐘清理一次(前提是接到post)，如果SMON過於繁忙那麼可能temporary segment長期不被清理。temporary segment長期不被清理可能造成一個典型的問題是:在rebuild index online失敗後，後續執行的rebuild index命令要求之前產生的temporary segment已被cleanup，如果cleanup沒有完成那麼就需要一直等下去。

如果SMON 不能及時清理，在操作時就會報ORA-08104的錯誤。

在Oracle10gR2中可以使用dbms_repair.online_index_clean手工清理這些資訊，在Oracle 9i下，需要打Bug 3805539 後才可以使用該工具。

手工處理的步驟如下：

（1）先檢視ind$ flags 標誌，如果不正確，就減去512.

sql>update ind$ set flags=flags-512 where obj#=

（2）drop journal table，這個步驟可能會報資源忙，因為有大量的日誌正在插入，可以反覆重試一下。

sql>drop table .sys_journal_;

注意：

這個步驟不能反，如果先刪除sys_journal_臨時表，然後再修改index的flags狀態，則會報出ora-600 [4610]號錯誤，即資料字典不一致的錯誤。

官閘道器於dbms_repair.online_index_clean 的說明：

ONLINE_INDEX_CLEAN Function

This function performs a manual cleanup of failed or interrupted online index builds or rebuilds. This action is also performed periodically by SMON, regardless of user-initiated cleanup.

This function returns TRUE if all indexes specified were cleaned up and FALSE if one or more indexes could not be cleaned up.

Syntax

DBMS_REPAIR.ONLINE_INDEX_CLEAN (

object_id IN BINARY_INTEGER DEFAULT ALL_INDEX_ID,

wait_for_lock IN BINARY_INTEGER DEFAULT LOCK_WAIT)

RETURN BOOLEAN;

Parameters

Parameter	Description
object_id	Object id of index to be cleaned up. The default cleans up all object ids that qualify.
wait_for_lock	This parameter specifies whether to try getting DML locks on underlying table [[sub]partition] object. The default retries up to an internal retry limit, after which the lock get will give up. If LOCK_NOWAIT is specified, then the lock get does not retry.

因此在做rebuild index online的時候，一定要在開始和結束階段觀察系統中是否有長事務的儲存，對於併發量較大的系統，最嚴重的後果，可能在這兩個關鍵點導致資料庫產生大量鎖等待，系統負載飆升，甚至當機。

三. rebuild index 的一些測試

NiGoo 的blog 上示例：

http://www.ningoo.net/html/2008/lock_mechanism_of_rebuild_index_online.html

3.1 準備工作

先建立一個測試表：

SYS@anqing2(rac2)> create table rb_test(id number,con varchar2(20));

Table created.

插入一些測試資料：

SYS@anqing2(rac2)> begin

2 for i in 1..1000000 loop

3 insert into rb_test values(i,'hello DBA');

4 if mod(i,1000)= 0 then

5 commit;

6 end if;

7 end loop;

8 end;

9 /

PL/SQL procedure successfully completed.

在ID 欄位上建立索引：

SYS@anqing2(rac2)> create index idx_rbt_id on rb_test(id);

Index created.

3.2 測試預設的rebuild,即rebuild offline

3.2.1 場景1

session 1:

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild;

因為表比較大，執行需要一定的時間，我們到session 2上檢視lock。

session 2：

SYS@anqing2(rac2)> select sid,type,id1,id2,lmode,request,block from v$lock where type in('DL','TM','TX');

SID TY ID1 ID2 LMODE REQUEST BLOCK

---------- -- ---------- ---------- ---------- ---------- ----------

147 DL 53366 0 3 0 2

147 TM 53366 0 4 0 2

這時候持有的LMODE 4的鎖，即在rebuild 期間我們不能執行DML 操作。

這裡的lock 型別說明：

TM : DML enqueue 表級鎖

TX : Transaction enqueue 行級鎖

DL : Direct loader parallel index create

更多資訊，參考聯機文件：

V$LOCK

http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/dynviews_1147.htm#REFRN30121

3.2.2 場景2

我們update 一下rb_test 表，但不提交，在rebuild offline 看一下：

session 1：

SYS@anqing2(rac2)> update rb_test set con='hello Dave!' where id=168;

1 row updated.

session 2:

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild;

alter index idx_rbt_id rebuild

*

ERROR at line 1:

ORA-00054: resource busy and acquire with NOWAIT specified

我們這時候rebuild 的時候報錯。

session 1：提交修改

SYS@anqing2(rac2)> commit;

Commit complete.

session 2：

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild;

Index altered.

3.2.3 場景3

在索引rebuild 期間，我們去update table，並檢視v$lock

先檢視一個正常update的時間：

SYS@anqing2(rac2)> set timing on

SYS@anqing2(rac2)> update rb_test set con='hello Dave!' where id=168;

1 row updated.

Elapsed: 00:00:00.00

SYS@anqing2(rac2)> commit;

session 1:

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild;

Index altered.

Elapsed: 00:00:11.76

session 2:

SYS@anqing2(rac2)> update rb_test set con='hello Dave!' where id=168;

1 row updated.

Elapsed: 00:00:11.00 -- 這個時間是在等rebuild 操作的結束

session 3: 檢視鎖資訊

SYS@anqing2(rac2)> select sid,type,id1,id2,lmode,request,block from v$lock where type in('DL','TM','TX');

SID TY ID1 ID2 LMODE REQUEST BLOCK

---------- -- ---------- ---------- ---------- ---------- ----------

147 DL 53366 0 3 0 2

147 TM 53366 0 4 0 1

153 TM 53366 0 0 3 0

SID 153 是我們的update ，它在request mode 為3的lock。

等我們的rebuild 結束，我們在檢視一下鎖資訊：

SYS@anqing2(rac2)> select sid,type,id1,id2,lmode,request,block from v$lock where type in('DL','TM','TX');

SID TY ID1 ID2 LMODE REQUEST BLOCK

---------- -- ---------- ---------- ---------- ---------- ----------

153 TM 53366 0 3 0 2

153 TX 1114130 289 6 0 2

此時我們的update 的已經修改，但還沒有提交。這個就和我們之前說的一致： DML操作一般要加兩個鎖，一個是對錶加模式為3的TM鎖，一個是對資料行的模式為6的TX鎖。

我們在session 2上commit 一下，在檢視一下：

SYS@anqing2(rac2)> select sid,type,id1,id2,lmode,request,block from v$lock where type in('DL','TM','TX');

no rows selected

測試已經沒有相關的lock了。

3.3 測試 rebuild online

3.3.1 場景1

測試步驟：

（1）sesssion 1：先update 一條記錄，但不commit，這時session 持有TM 3 和 TX 6的lock。

（2）session 2： rebuild online

（3）session 3：在進行update 操作

session 1:

SYS@anqing2(rac2)> update rb_test set con='hello Dave!' where id=168;

1 row updated.

SYS@anqing2(rac2)> select sid,type,id1,id2,lmode,request,block from v$lock where type in('DL','TM','TX');

SID TY ID1 ID2 LMODE REQUEST BLOCK

---------- -- ---------- ---------- ---------- ---------- ----------

153 TM 53366 0 3 0 2

153 TX 1179653 288 6 0 2

session 2:

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild online;

-- 測試session 掛住在這

檢視一下lock

SYS@anqing2(rac2)> select sid,type,id1,id2,lmode,request,block from v$lock where type in('DL','TM','TX');

SID TY ID1 ID2 LMODE REQUEST BLOCK

---------- -- ---------- ---------- ---------- ---------- ----------

147 DL 53366 0 3 0 2

153 TM 53366 0 3 0 1

147 TM 53366 0 2 4 2

147 TM 53386 0 4 0 2

153 TX 1179653 288 6 0 2

6 rows selected.

SID 153 是我們之前的update 操作，該DML 只有3和6鎖。 147 是我們的rebuild online。該操作在申請lock mode 為4的鎖。在沒有申請到鎖之前所有的操作都會被掛住，我們到session 3上在開一個update 驗證一下：

session 3：

SYS@anqing2(rac2)> update rb_test set con='hello Dave!' where id=188;

-- 也是掛住的，我們看一下lock

SYS@anqing2(rac2)> select sid,type,id1,id2,lmode,request,block from v$lock where type in('DL','TM','TX');

SID TY ID1 ID2 LMODE REQUEST BLOCK

---------- -- ---------- ---------- ---------- ---------- ----------

147 DL 53366 0 3 0 2

153 TM 53366 0 3 0 1

147 TM 53366 0 2 4 2

147 TM 53386 0 4 0 2

143 TM 53366 0 0 3 0

153 TX 1179653 288 6 0 2

7 rows selected.

這裡多一個SID 143，它在申請 TM 3的鎖，申請成功之後在去申請TX 6. 但沒有拿到TM 3，所以也掛住了。

這個就是我們前面說的一種情況，當前有大量的事務在操作，我們去rebuild online。我們的rebuild online 需要拿到TM 4的lock。如果拿不到就掛在那，並且之後的事務也沒辦法進行。如果這個表的事務操作很頻繁，資料庫就會產生大量鎖等待，系統負載飆升，甚至當機。

這個問題的解決方法就是找到阻止rebuild inline 拿到 TM 4的session。把它kill 掉，這樣rebuild 拿到TM 4之後就會降到TM 2. 這樣其他的DML 就可以操作了。

這點和rebuild offline的區別，如果是當前有事務操作，rebuild offline 就不會執行，直接報ORA-00054: resource busy and acquire with NOWAIT specified的錯誤。

後續操作：

session 1： commit

SYS@anqing2(rac2)> commit;

Commit complete.

檢視lock 資訊：

SYS@anqing2(rac2)> select sid,type,id1,id2,lmode,request,block from v$lock where type in('DL','TM','TX');

SID TY ID1 ID2 LMODE REQUEST BLOCK

---------- -- ---------- ---------- ---------- ---------- ----------

147 DL 53366 0 3 0 2

147 TM 53366 0 2 4 2

147 TM 53386 0 4 0 2

143 TM 53366 0 3 0 1

143 TX 1048602 287 6 0 2

147 TX 983080 291 6 0 2

這裡還有兩個鎖，147 是我們的rebuild online。 143 是我們之前session 3的update DML。

這裡要注意的是，如果我們的session 3 不commit，那麼我們的rebuild online 還是不會結束。因為rebuild online 在開始和結束的時候需要申請去拿TM 4的lock。現在rebuild online 的被我們的session 3阻止了。在開始是被session 1 阻止了。

我們在session 3 提交一下，在檢視lock：

SYS@anqing2(rac2)> commit;

Commit complete.

SYS@anqing2(rac2)> select sid,type,id1,id2,lmode,request,block from v$lock where type in('DL','TM','TX');

no rows selected

rebuild 結束

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild online;

Index altered.

Elapsed: 00:16:44.86

由此可見rebuild online 是個需要謹慎使用的命令。

3.3.2 場景2

在前面說過， kill 掉正在rebuild online 的session。可能會導致在下次rebuild index或者drop，analyze 的時候報ORA-08104的錯誤。因為在異常終止online rebuild操作的時候，oracle沒來得及清理相應的臨時段和標誌位，系統認為online rebuild操作還在進行造成的。

在這裡我們就模擬一下這個操作。

在rebuild online 的時候，按下ctrl + c 結束：

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild online;

alter index idx_rbt_id rebuild online

*

ERROR at line 1:

ORA-01013: user requested cancel of current operation

Elapsed: 00:00:01.78

SYS@anqing2(rac2)> exec dbms_stats.gather_table_stats('SYS','RB_TEST');

PL/SQL procedure successfully completed.

Elapsed: 00:00:07.02

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild online;

Index altered.

Elapsed: 00:00:09.38

沒有報錯。把資料量弄大一點。在試試

SYS@anqing2(rac2)> insert into rb_test select * from rb_test;

1000000 rows created.

Elapsed: 00:00:23.68

SYS@anqing2(rac2)> commit;

Commit complete.

Elapsed: 00:00:00.08

SYS@anqing2(rac2)> insert into rb_test select * from rb_test;

2000000 rows created.

Elapsed: 00:01:54.36

SYS@anqing2(rac2)> commit;

Commit complete.

Elapsed: 00:00:00.00

用kill session 的方法試試：

SYS@anqing2(rac2)> alter system kill session '147,31436';

System altered.

Elapsed: 00:00:01.01

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild online;

alter index idx_rbt_id rebuild online

*

ERROR at line 1:

ORA-00028: your session has been killed

Elapsed: 00:00:10.89

還是沒有報錯：

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild online;

Index altered.

Elapsed: 00:00:51.65

比較頑強啊。有點小崩潰。再來：

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild online;

啟動rebuild online 之後，直接把ssh 強行關閉。

在次ssh 過去，嘗試rebuild online：

SYS@anqing2(rac2)> alter index idx_rbt_id rebuild online;

alter index idx_rbt_id rebuild online

*

ERROR at line 1:

ORA-08104: this index object 53367 is being online built or rebuilt

終於報錯了。不容易啊。不過這個測試也說明了，在rebuild online 期間不能強行關閉ssh。如果在期間斷網的話，估計效果相當。

檢視Flag：

SYS@anqing2(rac2)> select obj#,flags from ind$ where obj#=53367;

OBJ# FLAGS

---------- ----------

53367 514

根據NiGoo blog 上的說明，可以通過ind$的flags檢視是什麼型別的標誌。

sql.bsq 是個總的說明，在dcore.bsq 裡找到了ind$的建立SQL：

/* mutable flags: anything permanent should go into property */
/* unusable (dls) : 0×01 */
/* analyzed : 0×02 */
/* no logging : 0×04 */
/* index is currently being built : 0×08 */
/* index creation was incomplete : 0×10 */
/* key compression enabled : 0×20 */
/* user-specified stats : 0×40 */
/* secondary index on IOT : 0×80 */
/* index is being online built : 0×100 */
/* index is being online rebuilt : 0×200 */
/* index is disabled : 0×400 */
/* global stats : 0×800 */
/* fake index(internal) : 0×1000 */
/* index on UROWID column(s) : 0×2000 */
/* index with large key : 0×4000 */
/* move partitioned rows in base table : 0×8000 */
/* index usage monitoring enabled : 0×10000 */

這裡的0×200 等是十六進位制來表示的。 Flags 是514，其16進位制是是202，514=0×202，表示該索引狀態為index is being online rebuilt : 0×200 + analyzed : 0×02

在上面，我們說減去512. 512 的16進位制是200. 對應的是：/* index is being online rebuilt : 0×200 */。所以，我們在rebuild的時候，會對flags 加上512.

MOS 803008.1 上的說明：

SMON should cleanup the failed online index rebuild operation and so correct this. However, if the table is highly active with transactions, SMON may not be able to get the required lock and so the index will not get cleaned up. In such situations, you can manually cleanup the failed index rebuild using the DBMS_REPAIR.ONLINE_INDEX_CLEAN procedure.

To do this, if activity on the problem table can be stopped, then simply execute:

connect / as sysdba
select dbms_repar.online_index_clean() from dual;
exit

不過這個命令執行沒有成功：

SYS@anqing2(rac2)> select dbms_repair.online_index_clean(53367) from dual;

select dbms_repair.online_index_clean(53367) from dual

*

ERROR at line 1:

ORA-06552: PL/SQL: Statement ignored

ORA-06553: PLS-382: expression is of wrong type

糾結中...

If activity on the table cannot be stopped, then it may be possible to resolve the problem using the following PL/SQL block:

declare
isClean boolean;

begin
isClean := FALSE;
while isClean=FALSE loop
isClean := dbms_repair.online_index_clean(dbms_repair.all_index_id,
dbms_repair.lock_wait);
dbms_lock.sleep(2);
end loop;

exception
when others then
RAISE;
end;
/

或者：

DECLARE

RetVal BOOLEAN;

OBJECT_ID BINARY_INTEGER;

WAIT_FOR_LOCK BINARY_INTEGER;

BEGIN

OBJECT_ID := 53367;

WAIT_FOR_LOCK := NULL;

RetVal := SYS.DBMS_REPAIR.ONLINE_INDEX_CLEAN ();

COMMIT;

END;

/

SYS@anqing2(rac2)> select obj#,flags from ind$ where obj#=53367;

OBJ# FLAGS

---------- ----------

53367 2

在這個測試上也是相當的糾結。之前產生了ORA-08104的錯誤，但是如果之後沒有其他的DML 來操作這張表，那麼online rebuild 產生的lock 一段時間之後就會釋放掉，然後ind$的flag 也會變成2. 即正常狀態。

如果事務A阻止online rebuild申請TM 4鎖，那麼之前所有的事務都會掛住，當事務A commit之後，相關的鎖會釋放，索引也會變成正常狀態。

在執行clean命令的時候，可能會遇到：

ORA-00054: resource busy and acquire with NOWAIT specified

多執行幾次就ok了。應該也是和這個鎖有關。

可能還是環境模擬的有問題。這個測試總感覺怪怪的。糾結啊。

來自 “ ITPUB部落格 ” ，連結：http://blog.itpub.net/235507/viewspace-703335/，如需轉載，請註明出處，否則將追究法律責任。

Oracle alter index rebuild 說明

相關文章