Changing the system time on AIX 6 triggers an ASM error and crashes the instance

Posted by fudaliang1999 on 2014-02-18

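Environment: 22.188.20.196. At around 8:30 a.m., SIT testing switched the system time, moving the machine clock from 2014-9-21 to 2014-9-30.
AIX version: 6100-07-05-1228
Oracle version: 11.2.0.3.0
Symptom: at around 8:30 a.m. the project team reported that the database instance had restarted, which caused errors in the application.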

Log information:
Full log:

Excerpts from the logs
ASM alert log
Sat Sep 20 23:59:00 2014
Time drift detected. Please check VKTM trace file for more details.
Sun Sep 21 01:10:00 2014
Time drift detected. Please check VKTM trace file for more details.
Sun Sep 21 04:15:00 2014
Time drift detected. Please check VKTM trace file for more details.
Sun Sep 21 05:30:00 2014
Time drift detected. Please check VKTM trace file for more details.
Sun Sep 21 07:25:00 2014
Time drift detected. Please check VKTM trace file for more details.
Tue Sep 30 08:29:45 2014
WARNING: client [orau11g:orau11g] not responsive for 777586s; state=0x1. killing pid 38076510
Tue Sep 30 08:29:57 2014
Starting background process ASMB
Tue Sep 30 08:29:57 2014
ASMB started with pid=19, OS id=31129644
Tue Sep 30 08:29:57 2014
NOTE: client +ASM:+ASM registered, osid 41549842, mbr 0x0
Tue Sep 30 08:30:01 2014
NOTE: client orau11g:orau11g registered, osid 40828940, mbr 0x1
Tue Sep 30 08:35:02 2014
NOTE: ASMB process exiting due to lack of ASM file activity for 305 seconds
Tue Sep 30 08:50:00 2014
Time drift detected. Please check VKTM trace file for more details.
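The clock jump is visible in the alert log itself as one very wide gap between consecutive timestamp lines. A minimal sketch of how such a gap could be located, using timestamps copied from the ASM excerpt above (the function name `largest_gap` is hypothetical, and the parsing assumes the default C/English locale for day and month names):

```python
from datetime import datetime

# Timestamp lines copied from the ASM alert log excerpt in this post.
ALERT_TIMESTAMPS = [
    "Sat Sep 20 23:59:00 2014",
    "Sun Sep 21 01:10:00 2014",
    "Sun Sep 21 04:15:00 2014",
    "Sun Sep 21 05:30:00 2014",
    "Sun Sep 21 07:25:00 2014",
    "Tue Sep 30 08:29:45 2014",
    "Tue Sep 30 08:29:57 2014",
]

# Alert log timestamp layout, e.g. "Tue Sep 30 08:29:45 2014".
FMT = "%a %b %d %H:%M:%S %Y"


def largest_gap(lines):
    """Return (gap, before, after) for the widest gap between consecutive timestamps."""
    stamps = [datetime.strptime(line, FMT) for line in lines]
    before, after = max(zip(stamps, stamps[1:]), key=lambda p: p[1] - p[0])
    return after - before, before, after


gap, before, after = largest_gap(ALERT_TIMESTAMPS)
print(f"{before} -> {after}: {gap}")
```

On this excerpt the widest gap lies between the last entry on Sep 21 and the first entry on Sep 30, which is where the WARNING about the unresponsive client appears.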


orau11g alert log
Sun Sep 21 07:11:47 2014
Thread 1 advanced to log sequence 1367 (LGWR switch)
  Current log# 2 seq# 1367 mem# 0: +DATA1/orau11g/onlinelog/group_2.262.818505661
Sun Sep 21 07:45:00 2014
Time drift detected. Please check VKTM trace file for more details.
Tue Sep 30 08:29:45 2014
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Tue Sep 30 08:29:46 2014
NOTE: ASMB terminating
Errors in file /oracle/diag/rdbms/orau11g/orau11g/trace/orau11g_asmb_40960218.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 1153 Serial number: 51
Errors in file /oracle/diag/rdbms/orau11g/orau11g/trace/orau11g_asmb_40960218.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 1153 Serial number: 51
ASMB (ospid: 40960218): terminating the instance due to error 15064
Tue Sep 30 08:29:46 2014
opiodr aborting process unknown ospid (32309392) as a result of ORA-1092
Tue Sep 30 08:29:46 2014
opiodr aborting process unknown ospid (38535384) as a result of ORA-1092
Tue Sep 30 08:29:47 2014
System state dump requested by (instance=1, osid=40960218 (ASMB)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/diag/rdbms/orau11g/orau11g/trace/orau11g_diag_41549836.trc
Tue Sep 30 08:29:47 2014
ORA-1092 : opitsk aborting process
Tue Sep 30 08:29:47 2014
ORA-1092 : opitsk aborting process
Tue Sep 30 08:29:47 2014
License high water mark = 143
Instance terminated by ASMB, pid = 40960218
USER (ospid: 36372532): terminating the instance
Instance terminated by USER, pid = 36372532
Tue Sep 30 08:29:57 2014
Adjusting the default value of parameter parallel_max_servers
from 960 to 585 due to the value of parameter processes (600)
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /oracle/product/11g/rdbms/dbs/arch
Autotune of undo retention is turned on.
IMODE=BR
ILAT =102
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
ORACLE_HOME = /oracle/product/11g/rdbms
System name: AIX
Node name: ZHS3BB306
Release: 1
Version: 6


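Problem analysis
The time change is a plausible cause. In theory, VKTM serves as the time source for all Oracle processes and should insulate them from changes to the system date, so the problem was probably triggered only at a particular moment. We therefore opened an SR with Oracle.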
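One detail in the ASM alert log supports this reading: the client was reported as "not responsive for 777586s" just before ASM killed it. A quick check of that figure against the size of the clock change is sketched below. It assumes the jump was exactly nine days, because the post gives only the two dates; the link between the two numbers is an inference from the logs, not something confirmed in Oracle's reply.

```python
from datetime import timedelta

# Value taken from the WARNING line in the ASM alert log.
unresponsive = timedelta(seconds=777586)

# Assumption: the clock moved forward by exactly nine days
# (2014-9-21 -> 2014-9-30; the post gives dates only, not times).
clock_jump = timedelta(days=9)

print(unresponsive)               # 8 days, 23:59:46
print(clock_jump - unresponsive)  # 0:00:14
```

The reported unresponsive time falls only 14 seconds short of nine days, so ASM appears to have measured the client's idle interval across the clock change rather than a real nine-day hang.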

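Oracle replied as follows:
This problem occurs only when Oracle needs to free some idle chunks on the shared pool LRU list. Because two chunks share the same duration time, the problem appears only when the time between freeing the two chunks exceeds the timeout.
The probability of hitting it is indeed not 100%.
Oracle has identified this problem as a bug: Bug 13914613 - DATABASE CRASHED DUE TO ORA-240 AND ORA-15064
If your time-adjustment mechanism cannot be changed, you can avoid the problem in one of the following ways:
1. Upgrade the database to 12.1 or 11.2.0.3.6; the problem is fixed in both 12.1 and 11.2.0.3.6.
2. Apply patch 13914613 on top of 11.2.0.3.5. The patch can be downloaded from the following link: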

3. The problem can also be suppressed with a hidden parameter:
_enable_shared_pool_durations=false


From the ITPUB blog, link: http://blog.itpub.net/15711267/viewspace-1082896/. If you repost this article, please credit the source; otherwise legal action will be taken.
