Oracle 11.2.0.3 RAC node crash caused by ORA-07445
Environment:
Operating system version: HP-UX B.11.31 U ia64 3938805652 unlimited-user license
Database version: Oracle 11.2.0.3 RAC
Alert log from the database crash:
Fri May 18 10:37:46 2018
Archived Log entry 511495 added for thread 1 sequence 260260 ID 0xf640cdf dest 1:
Fri May 18 10:52:29 2018
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF00000094] [PC:0x400000000D060550, kxfpnid()+1728] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/**/**1/trace/**1_lmon_8174.trc (incident=4200673):
ORA-07445: exception encountered: core dump [kxfpnid()+1728] [SIGSEGV] [ADDR:0xFFFFFFFF00000094] [PC:0x400000000D060550] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/**/**1/incident/incdir_4200673/**1_lmon_8174_i4200673.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri May 18 10:52:31 2018
Dumping diagnostic data in directory=[cdmp_20180518105231], requested by (instance=1, osid=8174 (LMON)), summary=[incident=4200673].
Fri May 18 10:52:34 2018
PMON (ospid: 8152): terminating the instance due to error 481
Fri May 18 10:52:34 2018
opiodr aborting process unknown ospid (428) as a result of ORA-1092
Fri May 18 10:52:34 2018
ORA-1092 : opitsk aborting process
Fri May 18 10:52:34 2018
ORA-1092 : opitsk aborting process
Contents of the trace file associated with the ORA-07445, /u01/app/oracle/diag/rdbms/cbsprd/cbsprd1/trace/**_lmon_8174.trc:
$[/home/grid] bsprd1/incident/incdir_4200673/**_lmon_8174_i4200673.trc <
Dump file /u01/app/oracle/diag/rdbms/**/**1/incident/incdir_4200673/**1_lmon_8174_i4200673.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name: HP-UX
Node name: **
Release: B.11.31
Version: U
Machine: ia64
Instance name: **
Redo thread mounted by this instance: 1
Oracle process number: 11
Unix process pid: 8174, image: oracle@**1 (LMON)
*** 2018-05-18 10:52:29.072
*** SESSION ID:(694.1) 2018-05-18 10:52:29.072
*** CLIENT ID:() 2018-05-18 10:52:29.072
*** SERVICE NAME:(SYS$BACKGROUND) 2018-05-18 10:52:29.072
*** MODULE NAME:() 2018-05-18 10:52:29.072
*** ACTION NAME:() 2018-05-18 10:52:29.072
Dump continued from file: /u01/app/oracle/diag/rdbms/**/**1/trace/**1_lmon_8174.trc
ORA-07445: exception encountered: core dump [kxfpnid()+1728] [SIGSEGV] [ADDR:0xFFFFFFFF00000094] [PC:0x400000000D060550] [Address not mapped to object] []
========= Dump for incident 4200673 (ORA 7445 [kxfpnid()+1728]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF00000094] [PC:0x400000000D060550, kxfpnid()+1728] [flags: 0x0, count: 1]
r1: 600000000014b480 r20: c0000001bfe1beec br5: 0
r2: c0000001bfde6000 r21: c000002ef7838670 br6: c000000000651f30
r3: 9fffffff5ffe7c00 r22: c000002f35c07b10 br7: c0000000004b53a0
r4: 0 r23: 0 ip: 400000000d060550
r5: c000000000000408 r24: 0 iipa: 0
r6: c00000000006fb60 r25: 1 cfm: ca1
r7: 9ffffffffd7f7350 r26: 5175657269657320 um: 1a
r8: ffffffff00000094 r27: 2900000000000000 rsc: 1f
$ more /u01/app/oracle/diag/rdbms/**/**1/trace/**1_lmon_8174.trc
Trace file /u01/app/oracle/diag/rdbms/**/**1/trace/**1_lmon_8174.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name: HP-UX
Node name: **
Release: B.11.31
Version: U
Machine: ia64
Instance name: **
Redo thread mounted by this instance: 1
Oracle process number: 11
Unix process pid: 8174, image: oracle@**1 (LMON)
*** 2018-05-14 04:12:05.408
*** SESSION ID:(694.1) 2018-05-14 04:12:05.408
*** CLIENT ID:() 2018-05-14 04:12:05.408
*** SERVICE NAME:(SYS$BACKGROUND) 2018-05-14 04:12:05.408
*** MODULE NAME:() 2018-05-14 04:12:05.408
*** ACTION NAME:() 2018-05-14 04:12:05.408
*** TRACE FILE RECREATED AFTER BEING REMOVED ***
kjfc_TaskScheduler_Execute_wTime: timer wraps at 0xffffff44 max 0xffffffd6
*** 2018-05-18 10:52:28.415
kjxggpoll: change db group poll time to 50 ms
*** 2018-05-18 10:52:28.495
kjxgmpoll reconfig instance map: 1
*** 2018-05-18 10:52:28.495
kjxgmrcfg: Reconfiguration started, type 1
CGS/IMR TIMEOUTS:
CSS recovery timeout = 31 sec (Total CSS waittime = 65)
IMR Reconfig timeout = 75 sec
CGS rcfg timeout = 85 sec
kjxgmcs: Setting state to 26 0.
*** 2018-05-18 10:52:28.514
Name Service frozen
kjxgmcs: Setting state to 26 1.
kjxgrdecidever: No old version members in the cluster
kjxgrssvote: reconfig bitmap chksum 0x1a8c9 cnt 1 master 1 ret 0
kjxgrpropmsg: SSMEMI: inst 1 - no disk vote
kjxgrpropmsg: SSVOTE: Master indicates no Disk Voting
2018-05-18 10:52:28.514499 : kjxgrDiskVote: nonblocking method is chosen
kjxgrDiskVote: Only one inst in the cluster - no disk vote
2018-05-18 10:52:28.686421 : kjxgrDiskVote: Obtained RR update lock for sequence 27, RR seq 26
2018-05-18 10:52:28.814554 : kjxgrDiskVote: derive membership from CSS (no disk votes)
2018-05-18 10:52:28.814589 : proposed membership: 1
2018-05-18 10:52:28.875275 : kjxgrDiskVote: new membership is updated by inst 1, seq 28
2018-05-18 10:52:28.875300 : kjxgrDiskVote: bitmap: 1
CGS/IMR TIMEOUTS:
CSS recovery timeout = 31 sec (Total CSS waittime = 65)
IMR Reconfig timeout = 75 sec
CGS rcfg timeout = 85 sec
kjxgmmeminfo: can not invalidate inst 2
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 28 2.
kjfmSendAbortInstMsg: send an abort message to instance 2
kjfmuin: inst bitmap 1
kjfmmhi: received msg from inst 1 (inc 22)
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 28 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 28 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 28 5.
Name Service normal
Name Service recovery done
*** 2018-05-18 10:52:29.026
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 28 6.
kjxgmcs: total reconfig time 0.508 seconds (from 58968544 to 58969052) (old dlminc 26, new dlminc 28)
kjxggpoll: change db group poll time to 600 ms
kjfmact: call ksimdic on instance (2)
*** 2018-05-18 10:52:29.026
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF00000094] [PC:0x400000000D060550, kxfpnid()+1728] [flags: 0x0, count: 1]
Incident 4200673 created, dump file: /u01/app/oracle/diag/rdbms/**/**1/incident/incdir_4200673/**1_lmon_8174_i4200673.trc
ORA-07445: exception encountered: core dump [kxfpnid()+1728] [SIGSEGV] [ADDR:0xFFFFFFFF00000094] [PC:0x400000000D060550] [Address not mapped to object] []
ssexhd: crashing the process...
Background_Core_Dump = partial
ksdbgcra: writing core file to directory '/u01/app/oracle/diag/rdbms/**/**1/cdump'
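The failing function, offset, and signal are the fields needed when searching My Oracle Support for a matching note. As a minimal sketch (the helper name `parse_ora7445` and the `Ora7445` tuple are hypothetical, not part of any Oracle tool), the ORA-07445 line from the alert log or trace file can be split into those fields like this:

```python
import re
from typing import NamedTuple, Optional


class Ora7445(NamedTuple):
    """Fields of one ORA-07445 line (hypothetical container for illustration)."""
    function: str
    offset: int
    signal: str
    addr: str
    pc: str
    reason: str


# Pattern modeled on the ORA-07445 lines shown in this post; other
# platforms or versions may format the line differently.
_PATTERN = re.compile(
    r"ORA-07445: exception encountered: core dump "
    r"\[(?P<function>\w+)\(\)\+(?P<offset>\d+)\] "
    r"\[(?P<signal>\w+)\] "
    r"\[ADDR:(?P<addr>0x[0-9A-Fa-f]+)\] "
    r"\[PC:(?P<pc>0x[0-9A-Fa-f]+)\] "
    r"\[(?P<reason>[^\]]*)\]"
)


def parse_ora7445(line: str) -> Optional[Ora7445]:
    """Return the parsed fields, or None if the line does not match."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return Ora7445(
        function=m["function"],
        offset=int(m["offset"]),
        signal=m["signal"],
        addr=m["addr"],
        pc=m["pc"],
        reason=m["reason"],
    )
```

The function name is usually the most useful search term, because the offset and signal can vary by platform and build.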
A search of My Oracle Support turned up a note (Doc ID 1505057.1) describing a failure similar to this one:
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.
SYMPTOMS
The following symptoms were seen:
- The following error was seen in the alert log, and then PMON terminated the instance:
ORA-07445: exception encountered: core dump [kxfpnid()+632] [SIGBUS] [ADDR:0x95] [PC:0x103AACCD8] [Invalid address alignment] []
- The value for parallel_max_servers was NOT explicitly set in the spfile or pfile.
- The LMON tracefile showed a value for parallel_max_servers > 3600 (3600 is the maximum allowed in 11.2).
- The call stack contained the following:
kxfpnid <- ksimdic <- kjfmact <- kjfcln <- ksbrdp <- opirip <- opidrv <- sou2o <- opimai_real <- ssthrdmain
CAUSE
- RAC NODE CRASHED AFTER ORA-7445 [KXFPNID()+632] IN LMON was filed for this issue and was closed as unpublished Bug 13743987 - ASM INSTANCE TERMINATES WITH DR CHANGE IN CPU COUNT PARALLE_MAX_SERVERS. More information about unpublished Bug 13743987 is contained in the note Document 13743987.8 - Bug 13743987 - A high CPU_COUNT can cause ORA-68 for parallel_max_servers.
SOLUTION
The following solutions are available:
- Apply the 11.2.0.4 Patch Set, when available.
- For an 11.2.0.3 Exadata database only, apply Bundle Patch 11.
- Workaround:
Explicitly set the value for parallel_max_servers to a value < 3601. A reasonable value at which to start setting this parameter is (number of physical CPUs) * (parallel_threads_per_cpu) * 4. So, for example, if you have 4 quad-core CPUs and parallel_threads_per_cpu is set to 2, you would set your starting value at 16 * 2 * 4 = 128 per instance.
Setting this parameter to 3600 or lower is sufficient.
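The starting-value arithmetic from the workaround can be sketched as follows. The function names are hypothetical, the 3600 ceiling is the 11.2 maximum quoted in the note, and the generated statement assumes the instance runs from an spfile (otherwise the parameter would be edited in the pfile instead):

```python
# Maximum parallel_max_servers allowed in 11.2, as quoted in the note.
PARALLEL_MAX_SERVERS_LIMIT = 3600


def suggest_parallel_max_servers(cpu_cores: int, threads_per_cpu: int) -> int:
    """Starting value: cores * parallel_threads_per_cpu * 4, capped at 3600."""
    if cpu_cores < 1 or threads_per_cpu < 1:
        raise ValueError("cpu_cores and threads_per_cpu must be positive")
    return min(cpu_cores * threads_per_cpu * 4, PARALLEL_MAX_SERVERS_LIMIT)


def alter_system_statement(value: int) -> str:
    """Build the statement that sets the parameter explicitly on all instances.

    Assumes an spfile is in use; SID='*' applies the value cluster-wide.
    """
    if not 0 <= value <= PARALLEL_MAX_SERVERS_LIMIT:
        raise ValueError("value must be between 0 and 3600")
    return f"ALTER SYSTEM SET parallel_max_servers={value} SCOPE=BOTH SID='*';"


# Example from the note: 4 quad-core CPUs (16 cores), parallel_threads_per_cpu = 2.
starting_value = suggest_parallel_max_servers(16, 2)  # 16 * 2 * 4 = 128
print(alter_system_statement(starting_value))
```

The cap matters on machines with many cores, which is where the computed default can exceed 3600 and trigger the crash described above.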
Source: ITPUB blog, link: http://blog.itpub.net/29357786/viewspace-2154858/. If you repost this article, please credit the source; otherwise legal action may be pursued.