ORA-4031 errors in an 11.2.0.3 ASM instance cause database archiving to fail

Posted by wuweilong on 2014-03-02
Environment:
Platform: RedHat Enterprise 5.8 x86_64
Database: Oracle Enterprise 11.2.0.3
Cluster software: Oracle Grid 11.2.0.3

Symptoms:
Archiving failed on the database, and the instance on one node hung.

The database alert log shows:

  Fri Feb 28 19:49:04 2014
  ARC1: Error 19504 Creating archive log file to '+DATA02'
  ARCH: Archival stopped, error occurred. Will continue retrying
  ORACLE Instance orcl1 - Archival Error
  ORA-16038: log 14 sequence# 68244 cannot be archived
  ORA-19504: failed to create file ""
  ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883'
  ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889'
  Archiver process freed from errors. No longer stopped
  Fri Feb 28 19:50:22 2014
  ARC0: LGWR is actively archiving destination LOG_ARCHIVE_DEST_3
  ARCH: Archival stopped, error occurred. Will continue retrying
  ORACLE Instance orcl1 - Archival Error
  ORA-16014: log 14 sequence# 68244 not archived, no available destinations
  ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883'
  ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889'
  ARC0: Archive log rejected (thread 1 sequence 68240) at host 'orclsh'
  FAL[server, ARC0]: FAL archive failed, see trace file.
  ARCH: FAL archive failed. Archiver continuing
  ORACLE Instance orcl1 - Archival Error. Archiver continuing.
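Analysis:
       The archiving failure occurred on ASM storage, so the first things to check are the free space in the ASM disk groups and DB_RECOVERY_FILE_DEST_SIZE. The disk groups have enough free space, and because only one node is unable to archive, a space shortage can be ruled out. DB_RECOVERY_FILE_DEST_SIZE is set to 0 on both nodes, so a recovery-area quota is not the limit either. The problem is therefore most likely related to an unhealthy ASM instance on the affected node.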
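The checks described above can be sketched with a few dictionary-view queries. This is a minimal sketch rather than the author's exact commands, which the post does not show. It relies only on the standard views V$ASM_DISKGROUP, GV$PARAMETER and GV$ARCHIVE_DEST.

```sql
-- Free space per ASM disk group (run in the ASM or the database instance).
SELECT name, state, type, total_mb, free_mb, usable_file_mb
FROM   v$asm_diskgroup
ORDER  BY name;

-- Recovery-area settings on every database instance of the cluster.
SELECT inst_id, name, value
FROM   gv$parameter
WHERE  name IN ('db_recovery_file_dest', 'db_recovery_file_dest_size')
ORDER  BY inst_id, name;

-- Status and last error of each configured archive destination, per instance.
SELECT inst_id, dest_id, status, destination, error
FROM   gv$archive_dest
WHERE  status <> 'INACTIVE'
ORDER  BY inst_id, dest_id;
```

If free_mb is comfortable and only one inst_id reports an error for the ASM destination, the comparison supports the conclusion that the fault is local to that node.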

Check the error messages of the ASM instance:
  Fri Feb 28 19:41:23 2014
  Dumping diagnostic data in directory=[cdmp_20130702164115], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165521].
  Fri Feb 28 19:49:19 2014
  Dumping diagnostic data in directory=[cdmp_20130702164845], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165522].
  Fri Feb 28 19:55:56 2014
  Dumping diagnostic data in directory=[cdmp_20130702165517], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165523].
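These messages in the current node's ASM alert log are dumps requested by instance=2, which indicates that the errors originate on instance 2. The ASM alert log of instance 2 shows: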
  Fri Feb 28 18:34:25 2014
  Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186256):
  ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
  Use ADRCI or Support Workbench to package the incident.
  See Note 411.1 at My Oracle Support for error and packaging details.
  Insufficient shared pool to allocate a GES object (ospid 2032294)
  Fri Feb 28 18:29:53 2014
  Sweep [inc][186256]: completed
  Fri Feb 28 18:36:49 2014
  Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186257):
  ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
  Use ADRCI or Support Workbench to package the incident.
  See Note 411.1 at My Oracle Support for error and packaging details.
  Insufficient shared pool to allocate a GES object (ospid 2032294)
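Sure enough, the ASM instance on node 2 has logged a large number of ORA-4031 errors. Next, check the parameters the ASM instance started with: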

  Fri Feb 28 20:06:55 2012
  NOTE: No asm libraries found in the system
  ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK1)
  ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK2)
  MEMORY_TARGET defaulting to 411041792.
  * instance_number obtained from CSS = 2, checking for the existence of node 0...
  * node 0 does not exist. instance_number = 2
  Starting ORACLE instance (normal)
  LICENSE_MAX_SESSION = 0
  LICENSE_SESSIONS_WARNING = 0
  Private Interface 'en1' configured from GPnP for use as a private interconnect.
  [name='en1', type=1, ip=169.254.78.6, mac=00-1a-64-bb-50-7d, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
  Public Interface 'en0' configured from GPnP for use as a public interface.
  [name='en0', type=1, ip=10.1.16.35, mac=00-1a-64-bb-50-7c, net=10.1.16.32/27, mask=255.255.255.224, use=public/1]
  Picked latch-free SCN scheme 3
  Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0.3/grid/dbs/arch
  Autotune of undo retention is turned on.
  LICENSE_MAX_USERS = 0
  SYS auditing is disabled
  NOTE: Volume support enabled
  Starting up:
  Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
  With the Real Application Clusters and Automatic Storage Management options.
  ORACLE_HOME = /u01/app/11.2.0.3/grid
  System name: AIX
  Node name: orcldb2
  Release: 1
  Version: 6
  Machine: 00C94E064C00
  Using parameter settings in server-side pfile /u01/app/11.2.0.3/grid/dbs/init+ASM2.ora
  System parameters with non-default values:
  large_pool_size = 12M
  instance_type = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring = "/dev/ocr_*"
  asm_diskstring = "/dev/voting_*"
  asm_diskstring = "/dev/asm_*"
  asm_diskgroups = "DATA"
  asm_diskgroups = "DATA_DG01"
  asm_diskgroups = "SPFILE_DG"
  asm_power_limit = 1
  diagnostic_dest = "/u01/app/grid"
  Cluster communication is configured to use the following interface(s) for this instance
  169.254.78.6
  cluster interconnect IPC version:Oracle UDP/IP (generic)
  IPC Vendor 1 proto 2
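The defaulted MEMORY_TARGET seen in the startup log can be cross-checked on the running ASM instance. The following is a sketch only, assuming a SYSASM connection to the affected ASM instance. The post itself shows the alert log but not these queries.

```sql
-- Connect to the affected ASM instance first, e.g.: sqlplus / as sysasm

-- Memory and process limits currently in effect.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('memory_target', 'memory_max_target', 'processes');

-- Free memory left in the shared pool, plus the GES allocations
-- that the ORA-04031 messages refer to.
SELECT pool, name, ROUND(bytes / 1024 / 1024, 1) AS mb
FROM   v$sgastat
WHERE  pool = 'shared pool'
AND    (name = 'free memory' OR name LIKE 'ges%')
ORDER  BY bytes DESC;
```

A very small 'free memory' figure alongside large GES entries would be consistent with the "Insufficient shared pool to allocate a GES object" messages.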
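Adjustments and recommendations:
        The ASM instance is running with the default MEMORY_TARGET, which is roughly 400 MB (411041792 bytes according to the startup log). According to the MOS note ASM & Shared Pool (ORA-4031) [ID 437924.1], Oracle raised the default number of processes (PROCESSES) allowed for an ASM instance in 11.2.0.3, but did not raise the default MEMORY_TARGET accordingly.
         Following Oracle's recommendation, on 11.2.0.3 MEMORY_TARGET should be set to at least 1536M and MEMORY_MAX_TARGET to 4096M: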
  SQL> alter system set memory_max_target=4096m scope=spfile;
  SQL> alter system set memory_target=1536m scope=spfile;
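A sketch for confirming the change, assuming the two statements above were run on the ASM instance over a SYSASM connection and that the instance uses an spfile (otherwise SCOPE=SPFILE fails with ORA-32001). Because the scope is SPFILE, the new values take effect only after the ASM instance is restarted.

```sql
-- Values recorded in the spfile (pending until the next restart).
SELECT sid, name, value
FROM   v$spparameter
WHERE  name IN ('memory_target', 'memory_max_target');

-- After the restart: values actually in effect.
SELECT name, display_value
FROM   v$parameter
WHERE  name IN ('memory_target', 'memory_max_target');
```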
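      These settings take effect only after a restart. If the DB and ASM instances cannot be restarted in the short term, configure a local archive destination on the affected node that points to local disk, so that a failure to archive does not hang the instance.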
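The workaround could look roughly like the sketch below, run in the database instance on the affected node. The destination slot (2) and the path /u01/arch_local are hypothetical and do not come from the post. Pick a LOG_ARCHIVE_DEST_n slot that is not already in use and a local directory with enough space. The instance name orcl1 is taken from the alert log above.

```sql
-- Hypothetical slot and path: adjust both to the actual environment.
-- SCOPE=MEMORY keeps the change temporary; SID limits it to one instance.
ALTER SYSTEM SET log_archive_dest_2 = 'LOCATION=/u01/arch_local'
  SCOPE = MEMORY SID = 'orcl1';
ALTER SYSTEM SET log_archive_dest_state_2 = 'ENABLE'
  SCOPE = MEMORY SID = 'orcl1';

-- Confirm that the new destination is valid and reports no error.
SELECT dest_id, status, destination, error
FROM   v$archive_dest
WHERE  dest_id = 2;
```

Archived logs written to local disk are not visible to the other node, so they should be moved back to shared storage or backed up once the ASM instance is healthy again.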

Source: "ITPUB Blog", link: http://blog.itpub.net/20674423/viewspace-1098176/. Please credit the source when reposting; otherwise legal liability will be pursued.
