【RAC】PMON: terminating the instance due to error 481
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.2.0 and later [Release: 11.2 and later]
Information in this document applies to any platform.
Symptoms
On an 11.2.0.2 or later cluster, with the instance already running on one node, starting the instance on the other node(s) fails with:
PMON (ospid: 487580): terminating the instance due to error 481
If ASM is used, the +ASMn alert log shows:
Sat Oct 01 19:19:38 2011
MMNL started with pid=21, OS id=6488362
lmon registered with NM - instance number 2 (internal mem no 1)
Sat Oct 01 19:21:37 2011
PMON (ospid: 4915562): terminating the instance due to error 481
Sat Oct 01 19:21:37 2011
System state dump requested by (instance=2, sid=4915562 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_diag_4915388.trc
Dumping diagnostic data in directory=[cdmp_20111001192138], requested by (instance=2, sid=4915562 (PMON)), summary=[abnormal instance termination].
Sat Oct 01 19:21:38 2011
License high water mark = 1
Instance terminated by PMON, pid = 4915562
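When several alert logs have to be checked for this symptom, the PMON line can be matched mechanically. A minimal Python sketch, assuming the message keeps exactly the wording shown above; the regex and function names are hypothetical, only the message text comes from the note:

```python
import re

# Matches alert log lines such as:
#   PMON (ospid: 4915562): terminating the instance due to error 481
PMON_TERMINATION = re.compile(
    r"PMON \(ospid: (?P<ospid>\d+)\): terminating the instance due to error (?P<error>\d+)"
)


def find_pmon_terminations(alert_text):
    """Return (ospid, error) pairs for every PMON termination line in alert_text."""
    hits = []
    for line in alert_text.splitlines():
        m = PMON_TERMINATION.search(line)
        if m:
            hits.append((int(m.group("ospid")), int(m.group("error"))))
    return hits


def is_error_481_symptom(alert_text):
    """True if the alert log shows PMON terminating the instance with error 481."""
    return any(error == 481 for _, error in find_pmon_terminations(alert_text))


if __name__ == "__main__":
    sample = """Sat Oct 01 19:21:37 2011
PMON (ospid: 4915562): terminating the instance due to error 481
Instance terminated by PMON, pid = 4915562"""
    print(find_pmon_terminations(sample))  # [(4915562, 481)]
    print(is_error_481_symptom(sample))    # True
```

The same check applies to the DB alert log excerpt further below, since the PMON line is identical apart from the ospid.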
The +ASMn_diag_xxx.trc trace shows:
*** 2011-10-01 19:19:37.526
Reconfiguration starts [incarn=0]
*** 2011-10-01 19:19:37.526
I'm the voting node
Group reconfiguration cleanup
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
...... << repeated messages
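Both this DIAG trace and the DB DIA0 trace repeat the same kjzdattdlm message with LMON up=[TRUE] and DB mounted=[FALSE]. A sketch for counting these lines and collecting the flag values, assuming the message format shown in the excerpts; the regex and function name are hypothetical:

```python
import re

# Matches trace lines such as:
#   kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
ATTACH_FAILURE = re.compile(
    r"kjzdattdlm: Can not attach to DLM \(LMON up=\[(?P<lmon>TRUE|FALSE)\], "
    r"DB mounted=\[(?P<mounted>TRUE|FALSE)\]\)"
)


def summarize_attach_failures(trace_text):
    """Count DLM attach failures and collect the distinct (lmon_up, db_mounted) flags."""
    count = 0
    states = set()
    for line in trace_text.splitlines():
        m = ATTACH_FAILURE.search(line)
        if m:
            count += 1
            states.add((m.group("lmon") == "TRUE", m.group("mounted") == "TRUE"))
    return count, states
```

For the excerpt above this yields a growing count with the single flag pair (True, False).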
If ASM is not used, the DB instance can fail with the same error:
Mon Jul 04 16:22:50 2011
Starting ORACLE instance (normal)
...
Mon Jul 04 16:22:54 2011
MMNL started with pid=24, OS id=667660
starting up 1 shared server(s) ...
lmon registered with NM - instance number 2 (internal mem no 1)
Mon Jul 04 16:26:15 2011
PMON (ospid: 487580): terminating the instance due to error 481
The LMON trace shows:
*** 2011-07-04 16:22:59.852
=====================================================
kjxgmpoll: CGS state (0 1) start 0x4e11785e cur 0x4e117863 rcfgtm 5 sec
...
*** 2011-07-04 16:26:14.248
=====================================================
kjxgmpoll: CGS state (0 1) start 0x4e11785e cur 0x4e117926 rcfgtm 200 sec
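In the two kjxgmpoll lines, cur - start equals rcfgtm: 0x4e117863 - 0x4e11785e = 5, and 0x4e117926 - 0x4e11785e = 0x926 - 0x85e = 200. By 16:26:14 LMON had therefore been waiting on the CGS reconfiguration for 200 seconds, one second before PMON terminated the instance. The sketch below reproduces that arithmetic. Reading start and cur as Unix epoch seconds is an assumption, not something the note states; it is consistent with the trace, because the cur values convert to 08:22:59 and 08:26:14 UTC, matching the 16:22:59 and 16:26:14 trace timestamps if the host runs at UTC+8 (the host time zone is not given). The regex and function name are hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches LMON trace lines such as:
#   kjxgmpoll: CGS state (0 1) start 0x4e11785e cur 0x4e117926 rcfgtm 200 sec
KJXGMPOLL = re.compile(
    r"kjxgmpoll: CGS state \([^)]*\) start 0x(?P<start>[0-9a-f]+) "
    r"cur 0x(?P<cur>[0-9a-f]+) rcfgtm (?P<rcfgtm>\d+) sec"
)


def parse_kjxgmpoll(line):
    """Return (start, cur, rcfgtm) as integers, or None if the line does not match."""
    m = KJXGMPOLL.search(line)
    if m is None:
        return None
    return int(m.group("start"), 16), int(m.group("cur"), 16), int(m.group("rcfgtm"))


if __name__ == "__main__":
    for line in (
        "kjxgmpoll: CGS state (0 1) start 0x4e11785e cur 0x4e117863 rcfgtm 5 sec",
        "kjxgmpoll: CGS state (0 1) start 0x4e11785e cur 0x4e117926 rcfgtm 200 sec",
    ):
        start, cur, rcfgtm = parse_kjxgmpoll(line)
        # Elapsed seconds (cur - start), the reported rcfgtm, and cur read
        # as epoch seconds (assumption, see above).
        print(cur - start, rcfgtm, datetime.fromtimestamp(cur, tz=timezone.utc).isoformat())
```

Run as a script, it prints `5 5 2011-07-04T08:22:59+00:00` and then `200 200 2011-07-04T08:26:14+00:00`.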
The DIA0 trace shows:
*** 2011-07-04 16:22:53.414
Reconfiguration starts [incarn=0]
*** 2011-07-04 16:22:53.414
I'm the voting node
Group reconfiguration cleanup
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
...<< repeated message
Changes
This can happen during patching or after a node reboot.
Cause
The problem occurs when HAIP (Highly Available IP) is not ONLINE on either the running node or the problem node(s).
An ASM or DB instance cannot start if it uses a different cluster_interconnect than the instance that is already running.
With HAIP ONLINE, all instances (DB and ASM) should use a HAIP IP address: 169.254.x.x.
If HAIP is OFFLINE on any node, the ASM and DB instances on that node use the native private network address instead, which causes communication problems with the instances that use HAIP.
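The rule above reduces to a simple check: an instance uses HAIP when its interconnect addresses fall in 169.254.0.0/16, and instances can only talk if they agree on it. A simplified Python sketch of that check; the helper names are hypothetical, and treating "all instances on HAIP" or "no instance on HAIP" as the only compatible cases is a simplification of the note's "same cluster_interconnect" rule:

```python
import ipaddress

# The note states that HAIP addresses take the form 169.254.x.x.
HAIP_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def uses_haip(interconnect_addr):
    """True if the interconnect address is a HAIP (169.254.x.x) address."""
    return ipaddress.ip_address(interconnect_addr) in HAIP_NETWORK


def haip_split(interconnects):
    """Split {instance: [addresses]} into (using_haip, not_using_haip) lists."""
    using, not_using = [], []
    for instance, addrs in sorted(interconnects.items()):
        # An instance with no known address is treated as not using HAIP.
        on_haip = bool(addrs) and all(uses_haip(a) for a in addrs)
        (using if on_haip else not_using).append(instance)
    return using, not_using


def can_communicate(interconnects):
    """Simplified rule: compatible only if all instances use HAIP or none do."""
    using, not_using = haip_split(interconnects)
    return not using or not not_using


if __name__ == "__main__":
    example = {"+ASM1": ["10.1.1.1"], "+ASM2": ["169.254.239.144"]}
    print(haip_split(example))       # (['+ASM2'], ['+ASM1'])
    print(can_communicate(example))  # False
```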
As the grid user, use the following command to verify the HAIP status:
$ crsctl stat res -t -init
Check the status of the ora.cluster_interconnect.haip resource.
In this example, HAIP is OFFLINE on the running node 1, so +ASM1 uses 10.1.1.1 as its cluster_interconnect. On node 2 HAIP is ONLINE, so +ASM2 uses the HAIP address 169.254.239.144. The two instances therefore cannot communicate, and +ASM2 cannot start.
alert_+ASM1.log shows:
Cluster communication is configured to use the following interface(s) for this instance
10.1.1.1
alert_+ASM2.log shows:
Cluster communication is configured to use the following interface(s) for this instance
169.254.239.144
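The interconnect each instance actually chose is easiest to confirm from these alert log lines. A minimal sketch that extracts the addresses listed after the "Cluster communication ..." header, assuming one address per line as in the excerpts above; the function name is hypothetical. Comparing the results for +ASM1 and +ASM2 shows the 10.1.1.1 versus 169.254.239.144 mismatch directly.

```python
import ipaddress

HEADER = ("Cluster communication is configured to use the following "
          "interface(s) for this instance")


def interconnect_addresses(alert_text):
    """Return the addresses listed after the last 'Cluster communication' header
    in an alert log, or an empty list if the header never appears."""
    lines = alert_text.splitlines()
    addresses = []
    for i, line in enumerate(lines):
        if line.strip() == HEADER:
            addresses = []  # a later instance startup replaces earlier results
            for follow in lines[i + 1:]:
                try:
                    addresses.append(str(ipaddress.ip_address(follow.strip())))
                except ValueError:
                    break  # the first line that is not an address ends the list
    return addresses
```

If an alert log contains several startups, the function returns the addresses from the most recent one.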
Solution
The solution is to start HAIP on all nodes before starting the ASM or DB instances, either by restarting the HAIP resource or by restarting the GI (Grid Infrastructure) stack.
In this example, +ASM1 was started first, with HAIP OFFLINE:
1. Try to start HAIP manually on node 1, as the grid user:
$ crsctl start res ora.cluster_interconnect.haip -init
To verify:
$ crsctl stat res -t -init
2. If this succeeds, restart the ora.asm resource (note: this brings down all dependent diskgroup and DB resources).
As the root user:
# crsctl stop res ora.crsd -init
# crsctl stop res ora.asm -init -f
# crsctl start res ora.asm -init
# crsctl start res ora.crsd -init
Then start any dependent resources as necessary.
3. If the above does not help, restart the GI stack on node 1 and check whether HAIP comes ONLINE afterwards.
As the root user:
# crsctl stop crs
# crsctl start crs
Check $GRID_HOME/log//agent/ohasd/orarootagent_root/orarootagent_root.log for any HAIP error.
4. Once HAIP is ONLINE on node 1, start ASM on the remaining cluster nodes, making sure HAIP is ONLINE on all nodes:
$ crsctl start res ora.asm -init
After these steps, the ASM and DB instances should be able to start on all nodes.
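The recovery order in steps 1 to 4 can be captured in a small dry-run helper, which helps when the same procedure must be repeated on several clusters. This is a hypothetical sketch: the function names are made up, it uses only the crsctl commands from the note, it does not switch between the grid and root users (each step must be run as the user the note specifies), and whether HAIP came ONLINE after step 1 must still be confirmed by reading the crsctl stat output.

```python
import subprocess

# Step 1 (grid user): start HAIP manually, then verify.
START_HAIP = [
    ["crsctl", "start", "res", "ora.cluster_interconnect.haip", "-init"],
    ["crsctl", "stat", "res", "-t", "-init"],
]
# Step 2 (root): restart ora.asm; dependent diskgroup and DB resources go down too.
RESTART_ASM = [
    ["crsctl", "stop", "res", "ora.crsd", "-init"],
    ["crsctl", "stop", "res", "ora.asm", "-init", "-f"],
    ["crsctl", "start", "res", "ora.asm", "-init"],
    ["crsctl", "start", "res", "ora.crsd", "-init"],
]
# Step 3 (root): fallback, restart the whole GI stack.
RESTART_GI = [["crsctl", "stop", "crs"], ["crsctl", "start", "crs"]]
# Step 4: start ASM on each remaining node once HAIP is ONLINE on node 1.
START_ASM = [["crsctl", "start", "res", "ora.asm", "-init"]]


def run(commands, dry_run=True):
    """Return the command lines in order; unless dry_run, also execute them.
    With dry_run=False, a failing command raises CalledProcessError and stops."""
    executed = []
    for cmd in commands:
        executed.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return executed


def recover_first_node(haip_started, dry_run=True):
    """Steps 1-3 on the node whose ASM started with HAIP OFFLINE.
    haip_started: whether step 1 brought HAIP ONLINE (checked by the operator)."""
    plan = run(START_HAIP, dry_run)
    plan += run(RESTART_ASM if haip_started else RESTART_GI, dry_run)
    return plan


def recover_other_node(dry_run=True):
    """Step 4 on each remaining node."""
    return run(START_ASM, dry_run)
```

Keeping `dry_run=True` as the default means the helper only prints the plan unless execution is requested explicitly.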
From the ITPUB blog, link: http://blog.itpub.net/22664653/viewspace-711805/. Please cite the source when reposting; otherwise legal liability will be pursued.