11.2.0.1 Grid Infrastructure Installation Failed at Second Nodes While Running root.sh Due To ASM Crash Caused by lmon Timeout [ID 1239123.1]
--------------------------------------------------------------------------------
Modified 06-JAN-2011 Type PROBLEM Status PUBLISHED
In this Document
Symptoms
Cause
Solution
--------------------------------------------------------------------------------
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later [Release: 11.2 and later ]
Information in this document applies to any platform.
Symptoms
While installing Oracle Grid Infrastructure 11.2.0.1, root.sh ran successfully on the first node but failed on the second node, reporting: The OCR location in an ASM disk group is inaccessible.
alert_nodename.log:
2010-08-26 19:16:15.416
[cssd(17484)]CRS-1605:CSSD voting file is online: /db/app/oracle/ocr_vote_n01; details in /db/app/crs
/11.2_Grid_Home/log/rmodbd03/cssd/ocssd.log.
2010-08-26 19:16:17.432
[cssd(17484)]CRS-1601:CSSD Reconfiguration complete. Active nodes are d02 d03 .
2010-08-26 19:16:19.057
[ctssd(17512)]CRS-2403:The Cluster Time Synchronization Service on host d03 is in observer mode.
2010-08-26 19:16:19.063
[ctssd(17512)]CRS-2407:The new Cluster Time Synchronization Service reference node is host d02.
2010-08-26 19:16:19.961
[ctssd(17512)]CRS-2401:The Cluster Time Synchronization Service started on host d03.
2010-08-26 19:21:22.696
[ohasd(15890)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.asm'. Details at
(:CRSPE00111:) in /db/app/crs/11.2_Grid_Home/log/rmodbd03/ohasd/ohasd.log.
2010-08-26 19:21:24.798
[crsd(19090)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /db/app/crs/11.2_Grid_Home/log/rmodbd03/crsd/crsd.log.
2010-08-26 19:21:25.427
[ohasd(15890)]CRS-2765:Resource 'ora.crsd' has failed on server 'd03'.
2010-08-26 19:21:26.523
[crsd(19119)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /db/app/crs/11.2_Grid_Home/log/rmodbd03/crsd/crsd.log.
alert_+ASM2.log shows:
Thu Aug 26 19:16:25 2010
Reconfiguration started (old inc 0, new inc 4)
ASM instance
List of instances:
1 2 (myinst: 2)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
Thu Aug 26 19:21:57 2010
IPC Send timeout detected. Sender: ospid 17593 [oracle@rmodbd03 (PING)]
Receiver: inst 1 binc 63701371 ospid 7549
Thu Aug 26 19:22:16 2010
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
LMS0 (ospid: 17603): terminating the instance due to error 481
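When checking other nodes for the same symptom, the messages quoted above can be searched for directly. The following is a minimal sketch; the function names and the choice of signatures are assumptions made for illustration, taken only from the log excerpts in this note.

```shell
#!/bin/sh
# Signatures taken from the log excerpts quoted in this note.
# The selection is illustrative, not an official list.
SIGNATURES='CRS-1013|CRS-2757|CRS-2765|IPC Send timeout detected|due to error 481'

# Hypothetical helper: print the number of lines in log file $1
# that contain at least one of the signatures.
count_signatures() {
    grep -c -E "$SIGNATURES" "$1"
}

# Hypothetical helper: print the matching lines of log file $1
# together with their line numbers.
show_signatures() {
    grep -n -E "$SIGNATURES" "$1"
}
```

For example, `show_signatures alert_+ASM2.log` would list the IPC send timeout and the error 481 termination if they are present in that file.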
The lmon trace shows:
SKGXP:[fffffd7ffcbecd28.6]:[ctx]: (ms) prev wait(ms) before
SKGXP:[fffffd7ffcbecd28.7]:[ctx]: --------- -------------- ----------- --------- -----------
SKGXP:[fffffd7ffcbecd28.8]:[ctx]: 88 0 0 NORMAL TIMEDOUT
SKGXP:[fffffd7ffcbecd28.9]:[ctx]: 80 0 0 NORMAL TIMEDOUT
SKGXP:[fffffd7ffcbecd28.10]:[ctx]: 88 0 0 NORMAL TIMEDOUT
SKGXP:[fffffd7ffcbecd28.35]:[ctx]: admno 0x3911544a admport:
SKGXP:[fffffd7ffcbecd28.36]:[ctx]: SSKGXPT 0xfcbee024 flags SSKGXPT_LOCAL_PORT sockno 10 IP 192.168.1.78 UDP 40467
SKGXP:[fffffd7ffcbecd28.70]:[ctx]: flags=8 nreqs=1100 free_rbufs=1100 msgsz=8192 min_frag_sz_ach=8192
SKGXP:[fffffd7ffcbecd28.71]:[ctx]: OS Level Port
SKGXP:[fffffd7ffcbecd28.72]:[ctx]: SSKGXPT 0xfca36a80 flags SSKGXPT_LOCAL_PORT sockno 25 IP 192.168.1.178 UDP 40469
SKGXP:[fffffd7ffcbecd28.73]:[ctx]: OS Level Port ID
SKGXP:[fffffd7ffcbecd28.74]:[ctx]: SKGXPGPID Internet address 192.168.1.78 UDP port number 40469
SKGXP:[fffffd7ffcbecd28.317]:[obj]: SSKGXPT 0xfca2352c flags SSKGXPT_WRITE sockno 10 IP 192.168.1.162 UDP 63320
SKGXP:[fffffd7ffcbecd28.318]:[obj]: Remote data port
SKGXP:[fffffd7ffcbecd28.319]:[obj]: SSKGXPT 0xfca23598 flags SSKGXPT_WRITE sockno 10 IP 192.168.1.162 UDP 63322
SKGXP:[fffffd7ffcbecd28.320]:[obj]: next seqno 32770 last ack 32765 credits 3 total credits 8 ertt 16 resends on con 116390
The message size is 8k (msgsz=8192), and the timeouts match the error message returned by ping:
ICMP Time exceeded during reassembly from bd02 (192.168.1.78)
So the problem is that 8k packets cannot get through the network. This can happen when the MTU size set on the NIC is appropriate for jumbo frames but the MTU size set on the switch is not.
Note that this issue also occurred in versions prior to 11gR2, where it showed up as a CRS hang on the second node. Since 11gR2 Grid Infrastructure includes ASM, the symptom now shows up as an ASM crash due to the lmon timeout.
Cause
In this case, the MTU size had been set to 9000, but the switch was not configured to be compatible with that MTU size. The database packets therefore could not be transferred to the remote node, which caused ASM to crash on the second node (due to the lmon timeout) and in turn prevented CRSD from reading the OCR (stored in ASM).
$ ping -s 192.168.1.78 8192
-- Use an 8k packet size to ping the remote node. This failed at the customer site.
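The sizes involved in this test can be worked out with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the header sizes assume IPv4 without options (20 bytes) and an ICMP echo header (8 bytes).

```shell
#!/bin/sh
# Assumed header sizes: IPv4 without options, and ICMP echo.
IP_HEADER=20
ICMP_HEADER=8

# Total IPv4 packet size for an ICMP echo carrying $1 data bytes.
icmp_packet_size() {
    echo $(( $1 + $IP_HEADER + $ICMP_HEADER ))
}

# Largest ICMP data size that fits in one unfragmented packet
# on a link whose MTU is $1.
max_payload_for_mtu() {
    echo $(( $1 - $IP_HEADER - $ICMP_HEADER ))
}

# Succeed (exit status 0) if an ICMP echo with $1 data bytes fits
# in a single packet on a link whose MTU is $2.
fits_in_mtu() {
    [ "$(icmp_packet_size "$1")" -le "$2" ]
}
```

With these assumptions, an 8192-byte ping produces an 8220-byte IP packet. That fits within an MTU of 9000 but not within the common default of 1500, so a device on the path that does not accept jumbo frames cannot pass it as a single frame.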
Solution
The switch settings need to be modified to support an MTU size of 9000, so that a ping with an 8k packet size gets through. After this change, the ping commands below should succeed, and the Grid Infrastructure installation should then complete successfully.
$ ping -s IP 8192 (Solaris)
$ ping IP -s 8192 (Linux)
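Because the argument order differs between the two platforms, a small wrapper can help when the same check is run on several nodes. This is a sketch under assumptions: the function name `build_ping_cmd` is hypothetical, only Solaris (`SunOS`) and Linux are handled, and the function prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print a ping command that sends $3 packets
# with $2 data bytes to host $1. The OS name defaults to the output
# of uname -s and can be overridden with $4.
build_ping_cmd() {
    host=$1
    size=$2
    count=$3
    os=${4:-$(uname -s)}
    case "$os" in
        # Solaris: ping -s host data_size npackets
        SunOS) echo "ping -s $host $size $count" ;;
        # Linux: -c sets the packet count, -s the data size
        Linux) echo "ping -c $count -s $size $host" ;;
        *)     echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}
```

For example, `build_ping_cmd 192.168.1.78 8192 3` prints the command for the current platform, which can then be reviewed and run. On Linux, the iputils version of ping also accepts `-M do` to prohibit fragmentation, which may make an MTU problem on the path more visible.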
Related
--------------------------------------------------------------------------------
Products
--------------------------------------------------------------------------------
•Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition
Keywords
--------------------------------------------------------------------------------
TIME OUT; INFRASTRUCTURE; ASM; ROOT.SH; 11GR2; GRID; LMON
Errors
--------------------------------------------------------------------------------
CRS-1013; CRS-2765; CRS-2401; CRS-2757; CRS-2407; CRS-1601; CRS-2403; CRS-1605; ERROR 481
Source: ITPUB blog, link: http://blog.itpub.net/20674423/viewspace-721545/. If you repost this article, please credit the source; otherwise legal action may be pursued.