Oracle RAC 10.1.0 - 11.1.0.7: Causes of Node Eviction
Date | 2008-12-06 09:42:43 |
Component | CRS |
Title | What can cause a node eviction? |
Version | 10.1.0 - 11.1.0.7 |
Problem |
Node evictions can occur in a cluster environment; the main question is why the eviction occurred. This note tries to make that question easier to answer. |
Solution |
There are four possible causes of a node eviction.
The title speaks of a cause, but a node eviction is a symptom of another problem, not the cause itself. Always keep this in mind when investigating why a node eviction occurred.

Kernel hang: how a kernel hang is detected depends on the operating system used. On Windows or Linux this is based on the hangcheck timer; on other Unix environments OPROCD is started. From Oracle 10.2.0.4 onward, OPROCD is also active on Linux (still install the hangcheck timer). To determine whether the hangcheck timer or OPROCD caused the node eviction, check the OS log files for hangcheck timer messages; for OPROCD, check the OPROCD log file.
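As a rough illustration of the OS log check described above, the following Python sketch filters a log for lines that mention the hangcheck timer. The path `/var/log/messages` and the helper name `find_hangcheck_lines` are assumptions made for this example; the actual log location and message wording depend on the platform and kernel version.

```python
from typing import Iterable, List


def find_hangcheck_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention the hangcheck timer (case-insensitive)."""
    return [line.rstrip("\n") for line in lines if "hangcheck" in line.lower()]


if __name__ == "__main__":
    # Assumed log location; adjust for your platform.
    log_path = "/var/log/messages"
    try:
        with open(log_path, errors="replace") as handle:
            for hit in find_hangcheck_lines(handle):
                print(hit)
    except OSError as exc:
        print(f"cannot read {log_path}: {exc}")
```

A plain substring match is used on purpose: the exact message format is not given in this note, so the sketch only narrows the log down to candidate lines for manual review.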
Another possible node eviction can be triggered by OCLSMON, starting with the 10.2.0.3 patchset. This Clusterware process checks whether there is an issue with CSSD. If there is, it kills the CSSD daemon, which leads to the eviction. When this occurs, check the oclsmon log file and contact Oracle Support.

This note does not focus on those parts, but on lost heartbeats. Below are two examples of a lost-heartbeat symptom. The OCSSD background process takes care of the heartbeats. The cssd.log file contains detailed information about the node eviction. After an eviction, check the cssd.log files on all nodes in the cluster, starting with the evicted node. The information that is logged can change between patchsets and Oracle releases.

Node eviction due to a lost interconnect (symptom).
Oracle 11g
Oracle 10g
[ CSSD]2006-10-18 23:49:06.226 [1] >USER: NMEVENT_SUSPEND [00][00][00][06]
[ CSSD]2006-10-18 23:49:08.032 [1030] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(23) wrtcnt(634354) LATS(2345205587) Disk lastSeqNo(634354)
[ CSSD]2006-10-18 23:49:09.199 [3600] >TRACE: clssnmCheckDskInfo: node(2) timeout(1167) state_network(0) state_disk(3) missCount(33)
[ CSSD]2006-10-18 23:49:10.199 [3600] >TRACE: clssnmCheckDskInfo: node(2) timeout(2167) state_network(0) state_disk(3) missCount(33)
…….
Here we see that the disk kill check is reported by node 1 and that this node is evicted. Because the interconnect is lost, the disk kill check is performed by sending a poison packet through the voting disk. Possible action: check the availability of the network adapters, check for heavy network load or port scans, and check the OS log files for reported errors related to the interconnect.
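When many such lines have to be reviewed, it can help to pull the counters out of the clssnmCheckDskInfo trace lines. The Python sketch below is modelled only on the Oracle 10g sample lines quoted above; the function name `parse_check_dsk_info` and the regular expression are assumptions for illustration, and since the logged format can change between patchsets and releases, the pattern may need adjusting.

```python
import re
from typing import Dict, Optional

# Pattern modelled on the clssnmCheckDskInfo trace lines quoted in this note.
CHECK_DSK_INFO = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*?"
    r"clssnmCheckDskInfo: node\((?P<node>\d+)\) timeout\((?P<timeout>\d+)\) "
    r"state_network\((?P<state_network>\d+)\) state_disk\((?P<state_disk>\d+)\) "
    r"missCount\((?P<miss_count>\d+)\)"
)


def parse_check_dsk_info(line: str) -> Optional[Dict[str, object]]:
    """Extract the fields of one clssnmCheckDskInfo line, or return None."""
    match = CHECK_DSK_INFO.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Keep the timestamp as text; convert the numeric counters to int.
    result: Dict[str, object] = {"timestamp": fields.pop("timestamp")}
    result.update({name: int(value) for name, value in fields.items()})
    return result


if __name__ == "__main__":
    sample = (
        "[ CSSD]2006-10-18 23:49:09.199 [3600] >TRACE: clssnmCheckDskInfo: "
        "node(2) timeout(1167) state_network(0) state_disk(3) missCount(33)"
    )
    print(parse_check_dsk_info(sample))
```

Lines that do not match, such as the NMEVENT_SUSPEND entry, return `None`, so the function can be applied to every line of a cssd.log file and the results filtered.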
Node eviction due to a lost voting disk (symptom). Below is an example where the heartbeat to the voting disk is lost.
[ CSSD]2006-10-11 00:35:33.658 [1801] >TRACE: clssnmHandleSync: Acknowledging sync: src[1] srcName[alligator] seq[9] sync[15]
[ CSSD]2006-10-11 00:35:36.956 [1801] >TRACE: clssnmHandleSync: diskTimeout set to (27000)ms
[ CSSD]2006-10-11 00:35:37.960 [3343] >TRACE: clscsendx: (11145a3f0) Physical connection (111459b30) not active
Possible action: check the availability of the disk subsystem and check the OS log files for reported errors related to the voting disk.

Trace the heartbeat: if needed, you can enable a higher level of tracing to debug the heartbeat. This is done with the crsctl command below, using level 5 tracing. Level 0 disables the extra tracing again. Keep in mind that this can make cssd.log grow quickly (4 lines added every second).
crsctl debug log css CSSD:5
crsctl debug log css CSSD:0
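To get a feel for how quickly cssd.log grows with level 5 tracing, the figure of 4 lines per second can be turned into a rough estimate. In the Python sketch below, the rate comes from this note, while the average line length of 120 bytes and the function name `estimate_trace_growth` are assumptions for illustration, not measured values.

```python
from typing import Tuple

LINES_PER_SECOND = 4  # rate quoted in this note for level 5 tracing


def estimate_trace_growth(hours: float, avg_line_bytes: int = 120) -> Tuple[int, int]:
    """Return (extra_lines, extra_bytes) for the given tracing duration.

    avg_line_bytes is an assumed average line length, not a measured value.
    """
    extra_lines = int(LINES_PER_SECOND * hours * 3600)
    return extra_lines, extra_lines * avg_line_bytes


if __name__ == "__main__":
    lines, size = estimate_trace_growth(24)
    print(f"24 hours of level 5 tracing: about {lines} lines, {size} bytes")
```

Under these assumptions, one day of tracing adds 345,600 lines, which is a reason to switch back to level 0 as soon as the heartbeat problem has been captured.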
NOTICE: A node eviction is a symptom of another problem! |
Reposted from: ITPUB Blog, link: http://blog.itpub.net/23135684/viewspace-719226/. If you repost this article, please credit the source; otherwise legal liability will be pursued.