MMMNL absent錯誤 ,資料庫無法連線

kewin發表於2011-08-24
MMMNL absent錯誤 ,資料庫無法連線
2011-8-24
Kevin Zou
大運結束後回公司上班,美國同事告訴我,一個DB 異常無法連線上,只好shutdown abort,重啟資料庫。 需要去找下root cause.
我登入到系統,檢查日誌檢視原因。

這個系統一般每個小時都會有歸檔日誌的產生,但是在當地時間14-16之間是沒有日誌的。
SQL> select to_char(FIRST_TIME,'YYYY-MM-DD HH24') , count(*)
 from v$archived_log
 group by  to_char(FIRST_TIME,'YYYY-MM-DD HH24')
我擷取了部分的記錄
2011-08-23 00          4
2011-08-23 01          2
2011-08-23 02          3
2011-08-23 03          2
2011-08-23 04          2
2011-08-23 05          2
2011-08-23 06          2
2011-08-23 07          2
2011-08-23 08          2
2011-08-23 09          3
2011-08-23 10          2
2011-08-23 11          1
2011-08-23 12          3
2011-08-23 13          2
2011-08-23 14          4
2011-08-23 16          2
2011-08-23 17          5
2011-08-23 18          2
2011-08-23 19          2
2011-08-23 20          2
2011-08-23 21          2
2011-08-23 22          1


2011-08-23 14          4
2011-08-23 16          2
之間是沒有2011-08-23 15 的記錄。說明在這段時間內資料庫處於HUNG 狀態,使用者無法登陸,系統也不能做任何事情。
Tue Aug 23 14:12:25 2011
Thread 1 advanced to log sequence 26547 (LGWR switch)
  Current log# 1 seq# 26547 mem# 0: /diskfdcprodb03/oradata1/oracle/timsdb/redo01/redo1a.log
  Current log# 1 seq# 26547 mem# 1: /diskfdcprodb04/oradata2/oracle/timsdb/redo02/redo1b.log
Tue Aug 23 16:29:31 2011
Shutting down instance: further logons disabled
Tue Aug 23 16:29:31 2011
MMNL absent for 7795 secs; Foregrounds taking over
Tue Aug 23 16:29:31 2011
WARNING: inbound connection timed out (ORA-3136)
Tue Aug 23 16:29:33 2011
ksvcreate: Process(q002) creation failed
Tue Aug 23 16:29:33 2011
ksvcreate: Process(m000) creation failed
Tue Aug 23 16:29:34 2011
Thread 1 advanced to log sequence 26548 (LGWR switch)
  Current log# 2 seq# 26548 mem# 0: /diskfdcprodb03/oradata1/oracle/timsdb/redo01/redo2a.log
  Current log# 2 seq# 26548 mem# 1: /diskfdcprodb04/oradata2/oracle/timsdb/redo02/redo2b.log
Tue Aug 23 16:30:33 2011
ksvcreate: Process(m000) creation failed
ksvcreate: Process(m000) creation failed
Tue Aug 23 16:29:34 2011
Thread 1 advanced to log sequence 26548 (LGWR switch)
  Current log# 2 seq# 26548 mem# 0: /diskfdcprodb03/oradata1/oracle/timsdb/redo01/redo2a.log
  Current log# 2 seq# 26548 mem# 1: /diskfdcprodb04/oradata2/oracle/timsdb/redo02/redo2b.log
Tue Aug 23 16:30:33 2011
ksvcreate: Process(m000) creation failed
Tue Aug 23 16:31:33 2011
ksvcreate: Process(m000) creation failed
Tue Aug 23 16:31:36 2011
Stopping background process QMNC
Tue Aug 23 16:31:38 2011
Stopping background process MMNL
Tue Aug 23 16:31:39 2011
Stopping background process MMON
Tue Aug 23 16:31:40 2011
Shutting down instance (immediate)
License high water mark = 164
Tue Aug 23 16:36:43 2011
SHUTDOWN: Active processes prevent shutdown operation
Tue Aug 23 16:38:37 2011
ALTER DATABASE CLOSE NORMAL
Tue Aug 23 16:38:37 2011
SMON: disabling tx recovery
SMON: disabling cache recovery
Tue Aug 23 16:38:43 2011
Shutting down archive processes
Archiving is disabled
Tue Aug 23 16:38:48 2011
ARCH shutting down
ARC3: Archival stopped


可能的原因:
1) The Memory Monitor Light (MMNL) process is a new process in 10g which works with the Automatic Workload Repository new features (AWR) to write out full statistics buffers to disk as needed. These messages can be generated if you have the database in restricted mode. 

2) These messages are for informational purposes only to indicate a hang situation which lasted a long period of time and MMNL is unable to run its job at that moment. When the hang lasts for several hours before the instance is completely stopped/killed, the tail of the alert.log may flood with these messages. Although the message text contains MMNL process, in this case MMNL is actually the victim of the hang issue rather than the cause. 


我覺的這個資料庫應該是第二種原因,資料庫由於HANG住了,MMNL程式無法在後臺執行;當這種HANG的狀態一直到系統完全被停止前有好幾個小時,Oracle會在alert.log的尾端加上MMNL ABSENT 的一句話。 雖然這句話包括了MMNL程式,實際上是ORACLE程式HANG住的證明,而不僅僅是MMNL程式。
MMNL的這段話只證明了ORACLE 程式確實是被HANG啦,但到底是啥原因導致的呢?
我找到了一個BUG 的描述:
Bug 5470031: INCORRECT MAXSIZE ON UNDO DATAFILE LEADS TO "MMNL ABSENT FOR XX SECS; FOREGROUND。 這個BUG還是處於開發/解決狀態。
就是當設定了AUTOEXTEND 為 ON,MAXSIZE小於真實的SIZE大小時,系統就會報這個錯誤。

An incorrect MAXSIZE for the undo-datafile(s) leads to MMNL messages in the 
alert.log

I think there are 2 issues :
1. Why are you able to set a ALTER DATABASE DATAFILE ...AUTOEXTEND MAXSIZE
   smaller than the actual filesize.

2. MMNL is reporting messages in the alert.log which do not have a clear
   root-cause. At least there is no direct link to the message and the 
   strange MAXSIZE (MAXBLOCKS) value.

解決方案為關閉AUOEXNTEND,或者MAXSIZE值大於SIZE的值。

我的環境:
SQL> select bytes   , maxbytes  ,file_name , AUTOEXTENSIBLE
  2  from dba_data_files
  3  where file_name like '%undo%';

          BYTES        MAXBYTES FILE_NAME                                          AUT
--------------- --------------- -------------------------------------------------- ---
     1073741824      1073741824 /diskfdcprodb03/oradata1/oracle/timsdb/data00/dbf/ YES
                                undotbs_01.dbf

     1073741824               0 /diskfdcprodb06/oradata1/oracle/timsdb/idx01/dbf/u NO
                                ndotbs_02.dbf


一個檔案沒有開啟AUTOEXEND,另外一個檔案是AUTOEXEND 為ON,難道 MAXSIZE等於SIZE的大小也會出錯?
取消自動增長選項,觀察是否再次出現類似情況
-END-

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/40239/viewspace-705720/,如需轉載,請註明出處,否則將追究法律責任。

相關文章