alter index rebuild online引發的血案
alter index rebuild online引發的血案
‘早上起來沒有一個人……‘,伸手抓起手機,‘喂,……應用hang住了……rac的一節點在手工shutdown……需要到現場……’。趕緊穿上衣服,拿起電腦往客戶辦公室趕。路上還接到客戶電話具體描述了下故障現象。
原來是客戶在rac的其中一個節點進行index rebuild online操作,結果命令發出去不久應用就hang住了,沒有辦法的客戶就直接用了大絕招:把操作的節點資料庫重啟了,現在的問題是shutdown immediate也hang住了。聽到這裡,馬上喊他看日誌,回覆日誌沒報錯,鬱悶的shutdown,如果有大事務在執行,豈不是要rollback死,由於此應用級別很高,等不起,只能死馬當活馬醫了。不管了,先讓它關了在說,先直接作業系統殺客戶端程式,使資源能更快的釋放。於是,指導輸入命令ps –ef|grep oracle|grep LOCAL=NO|grep –v grep|awk ‘{awk print $2}’|xargs kill -9。系統終於關閉了,直接startup……
等我到現場時,系統居然還沒啟動起來,難道真的有大事務需要恢復,難道就是重建索引?
登入伺服器檢視,日誌顯示資料庫正在recover,想起經常的recover幾個小時的情況,我再次對貿然的shutdown很是鬱悶。不過好在是rac,可客戶反應即使連線到好的節點還是不能使用,登入檢視,檢查鎖,檢查等待。發現存在大量的鎖,而且有一個鎖已經2個月了,恰恰就是重建索引的表,明顯這個鎖有問題,直接kill。等一會,另一個節點啟動好了,問應用的情況,也恢復了,就這麼簡單?就是那個鎖的問題?我暈,守一會業務後我就撤退了。
在路上我就在想,rebuild index online怎麼會把整個應用hang死,難道它鎖了整個表,怎麼會呢,在印象中online的同時是可以進行dml操作的啊,奇了怪了,先睡一覺,空了再來重現故障分析……
N天過後……
先上一篇文章,關於rebuild和rebuild online的區別metalink Note:272762.1
大家耐心看完喲,原理是很重要的……
========
- Online Index rebuild takes a long time.
- ONLINE INDEX REBUILD SCANS THE BASE TABLE AND NOT THE INDEX
Symptoms:
=========
Performance issues while rebuilding very large indexes.
- The offline rebuilds of their index is relatively quick -finishes in 15 minutes.
- Issuing index rebuild ONLINE statement => finishes in about an hour.
- This behavior. of ONLINE index rebuilds makes it a non-option for large tables
as it just takes too long to scan the table to rebuild the index. The offline may not be feasible due to due to the 24/7 nature of the database.
- This may be a loss of functionality for such situations.
- If we attempt to simultaneously ONLINE rebuild the same indexes we may encounter hanging behavior. indefinitely (or more than 6 hours).
DIAGNOSTIC ANALYSIS:
--------------------
We can trace the sessions rebuilding the indexes with 10046 level 12.
Comparing the IO reads for the index-rebuild and the index-rebuild-online reveals the following:
-ONLINE index rebuilds
It scans the base table and it doesn't scan the blocks of the index.
-OFFLINE index rebuilds
It scans the index for the build operation.
- This behaviour is across all versions.
Cause
Cause/Explanation
=============
When you rebuild index online,
- it will do a full tablescan on the base table.
- At the same time it will maintain a journal table for DML data, which has
changed during this index rebuilding operation.
So it should take longer time, specially if you do lots of DML on the same table,while rebuilding index online.
On the other hand, while rebuilding the index without online option, Oracle will grab the index in X-mode and rebuild a new index segment by selecting the data from the old index. So here we are
- not allowing any DML on the table hence there is no journal table involved
- and it is doing an index scan
Hence it will be pretty fast.
Fix
Solution/Conclusion:
===========
- The ONLINE index rebuild reads the base table, and this is by design.
- Rebuilding index ONLINE is pretty slow.
- Rebuilding index offline is very fast, but it prevents any DML on the base table.
好,我們現在明白了,rebuild online是可以同時進行dml操作的,但是online會維護一個操作日誌,會使rebuild時間大幅加長。巴拉巴拉……這一切的一切,跟我遇到的應用掛起沒有任何關係啊,難道是應用騙了我,其實應用可以用?還是模擬一下當時的場景再說。
Action:
Conn oracle/oracle
--建表
create table test(a int,b varchar2(64));
--插入資料
begin
for i in 1..1000000 loop
insert into test values(i,'ok');
if i=100 then
commit;
end if;
end loop;
end;
/
--建立索引
create index idx_a1 on test(a);
準備工作做好了……
--更新其中一條記錄,製造一個排他鎖,不要提交
SQL> update test set a=1000000 where a=103;
--開啟另一個視窗,線上重建索引
SQL> alter index idx_a1 rebuild online;
……
發現此命令長時間不能執行完,沒關係,大表的索引重建是需要大量時間的,查下等待和鎖吧。
sys@ORCL_SQL> select sid,event,P1TEXT,state from v$session_wait where event not in ('SQL*Net message from client');
SID EVENT P1TEXT STATE
------- ------------------------------ ------------------------------ --------------------------------------
139 enq: TM - contention name|mode WAITING
149 Streams AQ: waiting for time m WAITING
anagement or cleanup tasks
151 Streams AQ: qmn coordinator id WAITING
le wait
155 rdbms ipc message timeout WAITING
sys@ORCL_SQL> select * from v$lock where block>0;
ADDR KADDR SID TYPE ID1 ID2 Lock Held Lock Req. CTIME BLOCK
-------- -------- ------- ---- -------- -------- ---------- ---------- ---------- ----------
315C4134 315C414C 154 TM 52543 0 ########## ########## 24 1
看到木有,重建根本就沒有進行,而是在等待sid154,這個sid就是我們開始的update的命令,看來rebuild online是不會阻礙dml操作,但是在它之前的dml操作它會去等待,知道資源釋放,如果這時有個大事務一直不十分資源,那就恭喜了,你的重建就會hang在這裡。慢著,這裡hang住了對應用不會有影響吧,想當然是,馬上測試
又開啟個新連線
SQL> update test set a=1000000 where a=102;
……
1分鐘過去了,此命令沒有成功,真的hang住了,整個表被鎖住了?
再次檢查鎖
WAITING_SESSION LOCK_TYPE MODE_REQUESTED MODE_HELD LOCK_ID1 LOCK_ID2
----------------- ----------------- ---------------------------- ---------------------------- ----------------- -----------------
154 None
139 DML Share Row-X (SX) 52543 0
142 DML Row-X (SX) Row-X (SX) 52543 0
這裡解釋一下,154是我們最開始update的sid,139是rebuild online的sid,而142是我們剛才update的sid,看來它們都在等sid154,也就是說,這個時候我不能對此表做dml操作了,也就是對於此操作非常頻繁的表來說,應用已經hang住了!!
看來rebuild online之前一定要檢查下系統,其實是做所有操作之前都要這樣做,確認系統沒問題後再做操作。
在貼下鎖的具體情況,大家看rebuild操作的139有幾個行排他鎖,我懷疑是它需要先把所有的資料都用排他鎖鎖住,然後其它dml的操作它會記入輔助日誌,而我這裡的情況是有個行鎖一直存在,就造成了它把其他行都鎖住了,但一直等待最開始的那個行鎖釋放,這個時候如果我開始那個鎖不釋放,系統也就掛在這裡了!!
SID SERIAL# USERNAME Term Resource Type TAB OWNER ID1 ID2 Lock Held Lock Req. CTIME
------- ------- ---------- ------ -------------------- ---------- -------- -------- -------- -------------- -------------- ----------
139 11 ORACLE pts/1 DLSystem TEST ORACLE 52543 0 Row Exclusive 2958
139 11 ORACLE pts/1 DLSystem TEST ORACLE 52543 0 Row Exclusive 2958
154 9 ORACLE pts/2 TM - DML Enqueue TEST ORACLE 52543 0 Row Exclusive 2967
139 11 ORACLE pts/1 TM - DML Enqueue TEST ORACLE 52543 0 Row Share Share 2958
139 11 ORACLE pts/1 TM - DML Enqueue SYS_JOURNA ORACLE 52558 0 Share 2957
L_52544
142 33 ORACLE pts/4 TM - DML Enqueue TEST ORACLE 52543 0 Row Excl 2844
sys@ORCL_SQL> select sql_text from v$sqltext where hash_value=&hash_value order by piece;
Enter value for hash_value: 1016442092
SQL_TEXT
alter index idx_a1 rebuild online
原來是客戶在rac的其中一個節點進行index rebuild online操作,結果命令發出去不久應用就hang住了,沒有辦法的客戶就直接用了大絕招:把操作的節點資料庫重啟了,現在的問題是shutdown immediate也hang住了。聽到這裡,馬上喊他看日誌,回覆日誌沒報錯,鬱悶的shutdown,如果有大事務在執行,豈不是要rollback死,由於此應用級別很高,等不起,只能死馬當活馬醫了。不管了,先讓它關了在說,先直接作業系統殺客戶端程式,使資源能更快的釋放。於是,指導輸入命令ps –ef|grep oracle|grep LOCAL=NO|grep –v grep|awk ‘{awk print $2}’|xargs kill -9。系統終於關閉了,直接startup……
等我到現場時,系統居然還沒啟動起來,難道真的有大事務需要恢復,難道就是重建索引?
登入伺服器檢視,日誌顯示資料庫正在recover,想起經常的recover幾個小時的情況,我再次對貿然的shutdown很是鬱悶。不過好在是rac,可客戶反應即使連線到好的節點還是不能使用,登入檢視,檢查鎖,檢查等待。發現存在大量的鎖,而且有一個鎖已經2個月了,恰恰就是重建索引的表,明顯這個鎖有問題,直接kill。等一會,另一個節點啟動好了,問應用的情況,也恢復了,就這麼簡單?就是那個鎖的問題?我暈,守一會業務後我就撤退了。
在路上我就在想,rebuild index online怎麼會把整個應用hang死,難道它鎖了整個表,怎麼會呢,在印象中online的同時是可以進行dml操作的啊,奇了怪了,先睡一覺,空了再來重現故障分析……
N天過後……
先上一篇文章,關於rebuild和rebuild online的區別metalink Note:272762.1
大家耐心看完喲,原理是很重要的……
========
- Online Index rebuild takes a long time.
- ONLINE INDEX REBUILD SCANS THE BASE TABLE AND NOT THE INDEX
Symptoms:
=========
Performance issues while rebuilding very large indexes.
- The offline rebuilds of their index is relatively quick -finishes in 15 minutes.
- Issuing index rebuild ONLINE statement => finishes in about an hour.
- This behavior. of ONLINE index rebuilds makes it a non-option for large tables
as it just takes too long to scan the table to rebuild the index. The offline may not be feasible due to due to the 24/7 nature of the database.
- This may be a loss of functionality for such situations.
- If we attempt to simultaneously ONLINE rebuild the same indexes we may encounter hanging behavior. indefinitely (or more than 6 hours).
DIAGNOSTIC ANALYSIS:
--------------------
We can trace the sessions rebuilding the indexes with 10046 level 12.
Comparing the IO reads for the index-rebuild and the index-rebuild-online reveals the following:
-ONLINE index rebuilds
It scans the base table and it doesn't scan the blocks of the index.
-OFFLINE index rebuilds
It scans the index for the build operation.
- This behaviour is across all versions.
Cause
Cause/Explanation
=============
When you rebuild index online,
- it will do a full tablescan on the base table.
- At the same time it will maintain a journal table for DML data, which has
changed during this index rebuilding operation.
So it should take longer time, specially if you do lots of DML on the same table,while rebuilding index online.
On the other hand, while rebuilding the index without online option, Oracle will grab the index in X-mode and rebuild a new index segment by selecting the data from the old index. So here we are
- not allowing any DML on the table hence there is no journal table involved
- and it is doing an index scan
Hence it will be pretty fast.
Fix
Solution/Conclusion:
===========
- The ONLINE index rebuild reads the base table, and this is by design.
- Rebuilding index ONLINE is pretty slow.
- Rebuilding index offline is very fast, but it prevents any DML on the base table.
好,我們現在明白了,rebuild online是可以同時進行dml操作的,但是online會維護一個操作日誌,會使rebuild時間大幅加長。巴拉巴拉……這一切的一切,跟我遇到的應用掛起沒有任何關係啊,難道是應用騙了我,其實應用可以用?還是模擬一下當時的場景再說。
Action:
Conn oracle/oracle
--建表
create table test(a int,b varchar2(64));
--插入資料
begin
for i in 1..1000000 loop
insert into test values(i,'ok');
if i=100 then
commit;
end if;
end loop;
end;
/
--建立索引
create index idx_a1 on test(a);
準備工作做好了……
--更新其中一條記錄,製造一個排他鎖,不要提交
SQL> update test set a=1000000 where a=103;
--開啟另一個視窗,線上重建索引
SQL> alter index idx_a1 rebuild online;
……
發現此命令長時間不能執行完,沒關係,大表的索引重建是需要大量時間的,查下等待和鎖吧。
sys@ORCL_SQL> select sid,event,P1TEXT,state from v$session_wait where event not in ('SQL*Net message from client');
SID EVENT P1TEXT STATE
------- ------------------------------ ------------------------------ --------------------------------------
139 enq: TM - contention name|mode WAITING
149 Streams AQ: waiting for time m WAITING
anagement or cleanup tasks
151 Streams AQ: qmn coordinator id WAITING
le wait
155 rdbms ipc message timeout WAITING
sys@ORCL_SQL> select * from v$lock where block>0;
ADDR KADDR SID TYPE ID1 ID2 Lock Held Lock Req. CTIME BLOCK
-------- -------- ------- ---- -------- -------- ---------- ---------- ---------- ----------
315C4134 315C414C 154 TM 52543 0 ########## ########## 24 1
看到木有,重建根本就沒有進行,而是在等待sid154,這個sid就是我們開始的update的命令,看來rebuild online是不會阻礙dml操作,但是在它之前的dml操作它會去等待,知道資源釋放,如果這時有個大事務一直不十分資源,那就恭喜了,你的重建就會hang在這裡。慢著,這裡hang住了對應用不會有影響吧,想當然是,馬上測試
又開啟個新連線
SQL> update test set a=1000000 where a=102;
……
1分鐘過去了,此命令沒有成功,真的hang住了,整個表被鎖住了?
再次檢查鎖
WAITING_SESSION LOCK_TYPE MODE_REQUESTED MODE_HELD LOCK_ID1 LOCK_ID2
----------------- ----------------- ---------------------------- ---------------------------- ----------------- -----------------
154 None
139 DML Share Row-X (SX) 52543 0
142 DML Row-X (SX) Row-X (SX) 52543 0
這裡解釋一下,154是我們最開始update的sid,139是rebuild online的sid,而142是我們剛才update的sid,看來它們都在等sid154,也就是說,這個時候我不能對此表做dml操作了,也就是對於此操作非常頻繁的表來說,應用已經hang住了!!
看來rebuild online之前一定要檢查下系統,其實是做所有操作之前都要這樣做,確認系統沒問題後再做操作。
在貼下鎖的具體情況,大家看rebuild操作的139有幾個行排他鎖,我懷疑是它需要先把所有的資料都用排他鎖鎖住,然後其它dml的操作它會記入輔助日誌,而我這裡的情況是有個行鎖一直存在,就造成了它把其他行都鎖住了,但一直等待最開始的那個行鎖釋放,這個時候如果我開始那個鎖不釋放,系統也就掛在這裡了!!
SID SERIAL# USERNAME Term Resource Type TAB OWNER ID1 ID2 Lock Held Lock Req. CTIME
------- ------- ---------- ------ -------------------- ---------- -------- -------- -------- -------------- -------------- ----------
139 11 ORACLE pts/1 DLSystem TEST ORACLE 52543 0 Row Exclusive 2958
139 11 ORACLE pts/1 DLSystem TEST ORACLE 52543 0 Row Exclusive 2958
154 9 ORACLE pts/2 TM - DML Enqueue TEST ORACLE 52543 0 Row Exclusive 2967
139 11 ORACLE pts/1 TM - DML Enqueue TEST ORACLE 52543 0 Row Share Share 2958
139 11 ORACLE pts/1 TM - DML Enqueue SYS_JOURNA ORACLE 52558 0 Share 2957
L_52544
142 33 ORACLE pts/4 TM - DML Enqueue TEST ORACLE 52543 0 Row Excl 2844
sys@ORCL_SQL> select sql_text from v$sqltext where hash_value=&hash_value order by piece;
Enter value for hash_value: 1016442092
SQL_TEXT
alter index idx_a1 rebuild online
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29254281/viewspace-1075734/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- alter index rebuild 與 rebuild onlineIndexRebuild
- alter index rebuild和rebuild online的區別IndexRebuild
- alter index ind1 rebuild 和alter index ind1 rebuild onlineIndexRebuild
- alter index ... rebuild online的機制(zt)IndexRebuild
- alter index rebuild與index_statsIndexRebuild
- Index Online RebuildIndexRebuild
- "Alter index rebuild online parallel n"會修改索引的degree屬性IndexRebuildParallel索引
- oracle10g_alter index rebuild_online_重構索引OracleIndexRebuild索引
- Oracle alter index rebuild 說明OracleIndexRebuild
- create index online 與rebuild index onlineIndexRebuild
- create index online 與rebuild index onlineIndexRebuild
- index rebuild online的問題IndexRebuild
- 聊聊索引Index Rebuild和Rebuild Online(下)索引IndexRebuild
- 聊聊索引Index Rebuild和Rebuild Online(上)索引IndexRebuild
- 【羅玄】從鎖的角度看rebuild index online和rebuild indexRebuildIndex
- rebuild index online和create index online及沒有online的區別RebuildIndex
- rebuild index online的鎖機制淺析RebuildIndex
- 測試index online rebuild故障記錄IndexRebuild
- “rebuild index online hang住" 問題解析RebuildIndex
- 關於rebuild index online 及drop index後重建問題RebuildIndex
- Online rebuild index遭遇ORA-08104RebuildIndex
- 重建索引index rebuild online vs offline vs index coalesce vs index shrik space索引IndexRebuild
- rebuild indexRebuildIndex
- index rebuildIndexRebuild
- .Net版本引發的血案
- (轉)Index Rebuild Online 過程(9i)完整版IndexRebuild
- rebuild index 排序RebuildIndex排序
- sybase rebuild indexRebuildIndex
- ORACLE中index的rebuildOracleIndexRebuild
- rebuild與rebuild online效率比對Rebuild
- RestTemplate超時引發的血案REST
- JDBC亂碼引發的"血案"JDBC
- SwipeRefreshLayout 引發的一場血案
- best practice of rebuild your partition table local index online by using: "update indexes"RebuildIndex
- Index rebuild --case 1IndexRebuild
- 索引rebuild和rebuild online時要慎重索引Rebuild
- vue watch陣列引發的血案Vue陣列
- _nop_()函式引發的血案函式