Waiting for clusterware split-brain resolution
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.1 to 12.1.0.2 [Release 11.2 to 12.1]
Information in this document applies to any platform.
PURPOSE
The purpose of this document is to explain when a "Waiting for clusterware split-brain resolution" alert log message precedes an instance crash or eviction.
TROUBLESHOOTING STEPS
Background
Before one or more instances crash, the alert.log shows "Waiting for clusterware split-brain resolution". This is often followed by "Evicting instance n from cluster", where n is the number of the instance being evicted. The lmon process sends a network ping to the remote instances, and if the lmon processes on the remote instances do not respond, a split brain has occurred at the instance level. Finding out why the lmon processes cannot communicate with each other is therefore the key to resolving this issue.
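As a starting point, the two messages quoted above can be located in an alert log with a plain grep. This is a minimal sketch, not part of the original note: the function name scan_alert_log is hypothetical, and the alert log path must be supplied by the reader because its location varies by installation.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): print every split-brain and
# eviction message in an alert log, prefixed with its line number.
scan_alert_log() {
    # $1 = path to the instance alert log (assumed to be known to the reader)
    grep -n -E 'Waiting for clusterware split-brain resolution|Evicting instance [0-9]+ from cluster' "$1"
}
```

Calling scan_alert_log with the alert log of each instance shows which instance reported the wait first and which instance was evicted, so the line numbers can be used to read the entries just before the event.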
The common causes are:
1) An instance-level split brain is frequently caused by a network problem, so checking the network settings and connectivity is important. However, if CRS and the database use the same network, the network is unlikely to be completely down, because the clusterware (CRS) itself would have failed in that case.
2) The server is very busy and/or the amount of free memory is low: heavy swapping and scanning of memory will prevent the lmon processes from getting scheduled.
3) The database or instance is hanging and the lmon process is stuck.
4) An Oracle bug.
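For cause 2, a quick look at swap usage can show whether a server is under memory pressure. The sketch below is an illustration under stated assumptions: it applies to Linux only, the function name swap_used_pct is hypothetical, and it reads text in /proc/meminfo format from standard input. It reports how much swap is in use, which is not the same as active swapping; the swap-in and swap-out columns of vmstat show the activity itself.

```shell
#!/bin/sh
# Hypothetical helper: print the percentage of swap space in use,
# given /proc/meminfo-style text on standard input (Linux only).
swap_used_pct() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END {
             if (total > 0) printf "%d\n", (total - free) * 100 / total
             else print 0   # no swap configured
         }'
}

# Usage on a live system:
#   swap_used_pct < /proc/meminfo
```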
Troubleshooting Instructions
1) Check the network and make sure there are no network errors such as UDP errors, IP packet loss, or failure errors.
2) Check the network configuration to make sure that it is set up correctly on all nodes.
For example, the MTU size must be the same on all nodes, and if jumbo frames are used, the switch must support an MTU size of 9000.
3) Check if the server had a CPU load problem or a free memory shortage.
4) Check if the database was hanging or having a severe performance problem prior to the instance eviction.
5) Check the CHM (Cluster Health Monitor) output to see if the server had a CPU or memory load problem, a network problem, or spinning lmd or lms processes. CHM output is available only on certain platforms and versions, so please check the CHM FAQ, Document 1328466.1.
6) If OSWatcher is not already set up, set it up by following the instructions in Document 301137.1.
Having OSWatcher output is helpful when CHM output is not available.
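Steps 1 and 2 can be approached on Linux with standard tools, as in the following sketch. It is an illustration only: the function name mtu_of, the interface name eth1, and the host name node2-priv are hypothetical, and the commands in the comments are the usual Linux ones rather than anything prescribed by this note.

```shell
#!/bin/sh
# Hypothetical helper: extract the MTU value from "ip link show" output
# supplied on standard input.
mtu_of() {
    sed -n 's/.* mtu \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage on each node (eth1 stands in for the private interconnect interface):
#   ip link show eth1 | mtu_of
# The value printed should be identical on every node.
#
# UDP error counters (look for growing receive errors or buffer errors):
#   netstat -su
#
# End-to-end jumbo frame test with fragmentation prohibited
# (8972 = 9000 bytes MTU - 20 bytes IP header - 8 bytes ICMP header):
#   ping -M do -s 8972 -c 3 node2-priv
```

If the ping fails with the "do not fragment" option set but succeeds with a small payload, some device on the path is not passing frames of the configured MTU size.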
Diagnostic Collection
If TFA is installed, simply run the following command:
Format example: "Jul/1/2014 21:00:00"
Specify the "from time" to be 4 hours before and the "to time" to be 4 hours after the time of error.
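The "from" and "to" times can be computed rather than worked out by hand. The sketch below assumes GNU date (the -d option, the @epoch form, and %-d are GNU extensions) and uses a hypothetical function name, diag_window. It prints the two timestamps in the format shown in the example above, without zero-padding the day.

```shell
#!/bin/sh
# Hypothetical helper: print the "from" and "to" times of a window that
# starts 4 hours before and ends 4 hours after the time of the error.
# Requires GNU date.
diag_window() {
    # $1 = time of the error, e.g. "2014-07-01 21:00:00" (local time)
    epoch=$(date -d "$1" +%s) || return 1
    fmt='+%b/%-d/%Y %H:%M:%S'
    # LC_ALL=C keeps the month abbreviation in English.
    LC_ALL=C date -d "@$((epoch - 4 * 3600))" "$fmt"   # from time
    LC_ALL=C date -d "@$((epoch + 4 * 3600))" "$fmt"   # to time
}

# Usage:
#   diag_window "2014-07-01 21:00:00"
```

The arithmetic is done on epoch seconds to avoid relying on how date parses relative expressions mixed with an absolute timestamp.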
If TFA is not installed:
Database logs & trace files:
cd $(orabase)/diag/rdbms
tar cf - $(find . -name '*.trc' -exec egrep -l '<date_time_search_string>' {} \; | grep -v bucket) | gzip > /tmp/database_trace_files.tar.gz
ASM logs & trace files:
cd $(orabase)/diag/asm/+asm/
tar cf - $(find . -name '*.trc' -exec egrep -l '<date_time_search_string>' {} \; | grep -v bucket) | gzip > /tmp/asm_trace_files.tar.gz
Clusterware logs:
<GI home>/bin/diagcollection.sh --collect --crs --crshome <GI home>
OS logs:
/var/adm/messages* or /var/log/messages* or 'errpt -a' or Windows System Event Viewer log (saved as .TXT file)