Waiting for clusterware split-brain resolution
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.1 to 12.1.0.2 [Release 11.2 to 12.1]
Information in this document applies to any platform.
PURPOSE
The purpose of this document is to explain when a "Waiting for clusterware split-brain resolution" alert log message precedes an instance crash or eviction.
TROUBLESHOOTING STEPS
Background
Before one or more instances crash, the alert.log shows "Waiting for clusterware split-brain resolution". This is often followed by "Evicting instance n from cluster", where n is the number of the instance being evicted. The lmon process sends a network ping to the remote instances, and if the lmon processes on the remote instances do not respond, a split brain at the instance level has occurred. Finding out why the lmon processes cannot communicate with each other is therefore the key to resolving this issue.
The common causes are:
1) An instance-level split brain is frequently caused by a network problem, so checking the network settings and connectivity is important. However, because the clusterware (CRS) would also have failed if the network were down, the network is probably not completely down as long as CRS and the database use the same network.
2) The server is very busy and/or the amount of free memory is low -- heavy swapping and scanning of memory will prevent the lmon processes from getting scheduled.
3) The database or instance is hanging and the lmon process is stuck.
4) An Oracle bug.
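As a first step, it can help to confirm that the two messages described in the Background section actually appear in the alert log and in which order. The following is a minimal sketch, not part of the original note: the function name `scan_alert_log` is hypothetical, and the alert log path has to be supplied by the caller.

```shell
# Print alert log lines (prefixed with their line numbers) that report an
# instance-level split brain or the eviction that often follows it.
# Usage: scan_alert_log /path/to/alert_<SID>.log
scan_alert_log() {
    grep -n -E 'Waiting for clusterware split-brain resolution|Evicting instance [0-9]+ from cluster' "$1"
}
```

Running it against the alert log of each instance shows which instance reported the split brain first and which instance was evicted.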
Troubleshooting Instructions
1) Check the network and make sure there are no network errors such as UDP errors, IP packet loss, or failures.
2) Check that the network configuration is set up correctly on all nodes.
For example, the MTU size must be the same on all nodes, and the switch must support an MTU size of 9000 if jumbo frames are used.
3) Check if the server had a CPU load problem or a free memory shortage.
4) Check if the database was hanging or having a severe performance problem prior to the instance eviction.
5) Check the CHM (Cluster Health Monitor) output to see if the server had a CPU or memory load problem, a network problem, or spinning lmd or lms processes. CHM output is available only on certain platforms and versions, so please check the CHM FAQ, Document 1328466.1.
6) Set up OSWatcher by following the instructions in Document 301137.1 if OSWatcher is not set up already.
Having OSWatcher output is helpful when CHM output is not available.
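The MTU check in step 2 can be sketched as follows. This is an illustration only, assuming a Linux node where the iproute2 command `ip -o link show` is available; the helper name `list_mtu` is hypothetical and other platforms need their own equivalent.

```shell
# Read `ip -o link show` output on stdin and print "<interface> <mtu>"
# for every interface, one per line.
list_mtu() {
    awk '{
        name = $2
        sub(/:$/, "", name)              # strip the trailing colon from "eth1:"
        for (i = 3; i < NF; i++)
            if ($i == "mtu") print name, $(i + 1)
    }'
}
```

Run `ip -o link show | list_mtu` on every node and compare the output for the private interconnect interface. Any difference between nodes, or a value of 9000 that the switch does not support, is worth investigating.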
Diagnostic Collection
If TFA is installed, simply run the following command:
Format example: "Jul/1/2014 21:00:00"
Specify the "from time" to be 4 hours before and the "to time" to be 4 hours after the time of error.
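The 4-hour window can be computed rather than worked out by hand. The sketch below is not from the original note: it assumes GNU date (for the `-d` option and the `%-d` format), and the function name `tfa_window` is hypothetical. It prints the "from time" and the "to time" in the same style as the format example.

```shell
# Print the "from" and "to" times, 4 hours before and after the error time,
# formatted like "Jul/1/2014 21:00:00".
# Argument: the time of the error as "YYYY-MM-DD HH:MM:SS" (local time).
# Assumes GNU date.
tfa_window() {
    tfa_err_epoch=$(date -d "$1" +%s) || return 1
    tfa_span=$((4 * 3600))               # 4 hours in seconds
    LC_ALL=C date -d "@$((tfa_err_epoch - tfa_span))" '+%b/%-d/%Y %H:%M:%S'
    LC_ALL=C date -d "@$((tfa_err_epoch + tfa_span))" '+%b/%-d/%Y %H:%M:%S'
}
```

For an error at 21:00:00 on 1 July 2014, `tfa_window "2014-07-01 21:00:00"` prints the two values to pass as the "from time" and the "to time". In TFA these are normally supplied to `tfactl diagcollect` through its `-from` and `-to` options, but the exact command is missing from this copy of the note, so check it against the TFA documentation for your version.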
If TFA is not installed:
Database logs & trace files:
cd $(orabase)/diag/rdbms
tar cf - $(find . -name '*.trc' -exec egrep -l '<date_time_search_string>' {} \; | grep -v bucket) | gzip > /tmp/database_trace_files.tar.gz
ASM logs & trace files:
cd $(orabase)/diag/asm/+asm/
tar cf - $(find . -name '*.trc' -exec egrep -l '<date_time_search_string>' {} \; | grep -v bucket) | gzip > /tmp/asm_trace_files.tar.gz
Clusterware logs:
<GI home>/bin/diagcollection.sh --collect --crs --crshome <GI home>
OS logs:
/var/adm/messages* or /var/log/messages* or 'errpt -a' or Windows System Event Viewer log (saved as .TXT file)
From the ITPUB blog, link: http://blog.itpub.net/20747382/viewspace-2131514/. If you repost this article, please credit the source; otherwise legal liability will be pursued.