Linux: Hangcheck-Timer Module Requirements for Oracle 9i, 10g, 11gR1 RAC [ID 726833.1]
In this Document
Purpose
Scope
Details
References
Applies to:
Oracle Database - Enterprise Edition - Version 9.2.0.8 to 11.1.0.7 [Release 9.2 to 11.1]
Linux x86
Linux x86-64
Purpose
The hangcheck-timer module is required to run a supported configuration of Oracle Real Application Clusters (RAC) on Linux with Oracle releases 9i, 10g, or 11gR1. This note outlines the requirements for configuring hangcheck-timer in an Oracle Enterprise Linux, Red Hat Linux, or SUSE Linux environment.
Scope
This article is intended for product management, system architects, and system administrators involved in deploying and configuring Oracle RAC 9i, 10g, or 11gR1 in a Linux environment. It will also be useful to field engineers and consulting organizations that install and configure Oracle in a Linux RAC environment.
Details
Starting with release 9.2.0.2, Oracle RAC environments on Linux require an I/O fencing mechanism called the hangcheck-timer module. This module replaced the Watchdog module, which provided similar fencing functionality. Hangcheck-timer was subsequently delivered as part of the standard kernel distribution for Linux kernel releases 2.4 and above.
Hangcheck-timer should be loaded at boot time. It monitors the Linux kernel for long operating system hangs that could affect the reliability of a RAC node. It runs in kernel mode and uses the Time Stamp Counter (TSC) to catch scheduling delays or node hangs: it sets a timer and, when the timer fires, checks whether it was delayed by more than the allowed margin of error. If the duration exceeds (hangcheck_tick + hangcheck_margin) seconds, the machine is restarted, subject to the hangcheck_reboot setting described below. Hangcheck-timer will not cause reboots due to CPU starvation.
Hangcheck-timer requires three configuration parameters:
- hangcheck_tick - defines how often, in seconds, the hangcheck-timer checks the node for hangs. The default value is 60 seconds.
- hangcheck_margin - defines how much margin is allowed, in seconds, between expected scheduling and real scheduling time. The default value is 180 seconds.
- hangcheck_reboot - determines if the hangcheck-timer restarts the node if the kernel fails to respond within the sum of the hangcheck_tick and hangcheck_margin parameter values. If the value of hangcheck_reboot is equal to or greater than 1, then the hangcheck-timer module restarts the system. If the hangcheck_reboot parameter is set to zero, then the hangcheck-timer module will not reboot the node, even if a hang is detected. The default value varies by kernel version. In the 2.4 kernel, the default is 1. In 2.6 kernels, the default is 0.
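The interaction of the three parameters can be sketched as shell arithmetic. This is an illustrative model only, not the module's code: the function name hangcheck_action and its output strings are hypothetical, and the first argument stands for the number of seconds that passed before the timer was seen to fire.

```shell
#!/bin/sh
# Illustrative model of the decision described above; NOT the kernel module's code.
# hangcheck_action is a hypothetical helper name.
# Arguments: elapsed seconds, hangcheck_tick, hangcheck_margin, hangcheck_reboot.
hangcheck_action() {
    hc_elapsed=$1; hc_tick=$2; hc_margin=$3; hc_reboot=$4
    if [ "$hc_elapsed" -le $(( $hc_tick + $hc_margin )) ]; then
        echo "ok"          # timer fired within tick + margin
    elif [ "$hc_reboot" -ge 1 ]; then
        echo "restart"     # hang detected and reboot enabled
    else
        echo "log-only"    # hang detected, but hangcheck_reboot=0
    fi
}

hangcheck_action 5 1 10 1     # within the 11-second limit
hangcheck_action 15 1 10 1    # past the limit, reboot enabled
hangcheck_action 15 1 10 0    # past the limit, reboot disabled
```

The three calls cover the three outcomes described in the parameter list: no action, a restart, and a detected hang that is only logged.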
- 9i: Assuming the default setting of "oracm misscount" is set to 220 seconds:
  hangcheck_tick=30 hangcheck_margin=180 hangcheck_reboot=1
- 10g/11gR1: Assuming the default setting of "CSS misscount" is set to either 30 or 60 seconds:
  hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
You must always ensure that the Cluster misscount setting is greater than the sum of the setting for hangcheck_tick + hangcheck_margin.
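The rule above can be checked with simple arithmetic before the module is loaded. A minimal sketch: check_misscount is a hypothetical helper, and the misscount value is passed in by hand rather than read from the cluster.

```shell
#!/bin/sh
# Hypothetical helper: verifies misscount > hangcheck_tick + hangcheck_margin.
# Arguments: misscount, hangcheck_tick, hangcheck_margin (all in seconds).
check_misscount() {
    mc_misscount=$1; mc_tick=$2; mc_margin=$3
    mc_sum=$(( $mc_tick + $mc_margin ))
    if [ "$mc_misscount" -gt "$mc_sum" ]; then
        echo "OK: misscount $mc_misscount > $mc_sum"
    else
        echo "INVALID: misscount $mc_misscount <= $mc_sum"
        return 1
    fi
}

check_misscount 220 30 180   # 9i values from this note: 220 > 210
check_misscount 30 1 10      # 10g/11gR1 values from this note: 30 > 11
```

Both sets of values listed in this note satisfy the rule. A combination such as a misscount of 60 with the 9i tick and margin values would be reported as invalid.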
When running Oracle Clusterware on Linux, hangcheck-timer should always be configured on each RAC cluster node. The module provides the I/O fencing that ensures no stray writes occur from a node evicted from the RAC cluster. To verify whether the hangcheck-timer module is loaded on a node, check the list of loaded kernel modules as the root or oracle user. A loaded module appears as a line such as:
hangcheck-timer 2672 0
If the hangcheck-timer module is loaded (running), you will see output similar to the above. When hangcheck-timer is not loaded, no output is generated and the command prompt is returned to the user.
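The verification command itself is missing from this copy of the note. One common way to perform the check, offered here as an assumption rather than the note's exact command, is to filter lsmod output for the module name. The helper is_hangcheck_loaded is hypothetical. It reads lsmod-style text on standard input, so it can be tried against the sample line above. Some kernels list the module as hangcheck_timer, so the pattern accepts both spellings.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if lsmod-style text on stdin contains a
# hangcheck-timer (or hangcheck_timer) entry at the start of a line.
is_hangcheck_loaded() {
    grep -q -E '^hangcheck[-_]timer[[:space:]]'
}

# Demonstration against the sample output line shown in this note.
if printf 'hangcheck-timer 2672 0\n' | is_hangcheck_loaded; then
    echo "hangcheck-timer is loaded"
else
    echo "hangcheck-timer is not loaded"
fi
```

On a live node the same helper would be fed real data, for example with /sbin/lsmod | is_hangcheck_loaded.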
In an Oracle Enterprise Linux, Red Hat 4/5, or SUSE 9/10 environment, the hangcheck-timer module is loaded using the modprobe command, with the three parameters described above.
To ensure the module is loaded at boot time, you should also place the same command in the appropriate local startup script (e.g. /etc/rc.d/rc.local or /etc/init.d/boot.local). In earlier releases, hangcheck-timer was loaded using insmod in place of modprobe. Consult your release-specific documentation to determine which initialization method is required.
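The modprobe command line is not reproduced in this copy of the note. The following sketch shows how it might be assembled from the three parameters, assuming modprobe is installed in /sbin and using the 10g/11gR1 values listed earlier. The helper hangcheck_modprobe_cmd is hypothetical and only prints the command, so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a modprobe command line.
# Arguments: hangcheck_tick, hangcheck_margin, hangcheck_reboot.
# The /sbin/modprobe path is an assumption; adjust for your distribution.
hangcheck_modprobe_cmd() {
    echo "/sbin/modprobe hangcheck-timer hangcheck_tick=$1 hangcheck_margin=$2 hangcheck_reboot=$3"
}

# 10g/11gR1 values from this note.
hangcheck_modprobe_cmd 1 10 1
```

To load the module at boot, the printed line can be appended to the startup script named above, for example with hangcheck_modprobe_cmd 1 10 1 >> /etc/rc.d/rc.local run as root.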
Hangcheck-timer writes to the system messages log when a failure is detected and when the module initiates a node restart:
- When hangcheck-timer reboots the node, it may leave the message "Hangcheck: hangcheck is restarting the machine" in /var/log/messages.
- If you see the message "Hangcheck: hangcheck value past margin!" in /var/log/messages, a reboot was required but was not performed because hangcheck_reboot was not set to 1. In that case, you must reload the hangcheck-timer module as described earlier in this note, with hangcheck_reboot set to 1.
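The two messages above can be counted with a small filter. A minimal sketch: summarize_hangcheck_log is a hypothetical helper that reads log text on standard input, and the demonstration input contains only the two message strings quoted in this note, not real log lines.

```shell
#!/bin/sh
# Hypothetical helper: counts the two Hangcheck messages quoted in this note.
# Reads log text on stdin and prints one summary line.
summarize_hangcheck_log() {
    awk '
        /Hangcheck: hangcheck is restarting the machine/ { restarts++ }
        /Hangcheck: hangcheck value past margin!/        { missed++ }
        END { printf "restarts=%d missed_reboots=%d\n", restarts, missed }
    '
}

# Demonstration using only the message texts from this note.
printf '%s\n' \
    'Hangcheck: hangcheck value past margin!' \
    'Hangcheck: hangcheck is restarting the machine' \
    | summarize_hangcheck_log
```

Against a real log, the helper would be run as summarize_hangcheck_log < /var/log/messages. A non-zero missed_reboots count points to the hangcheck_reboot=0 case described in the second bullet.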
Known Issues
- Bug 6125546 can prevent hangcheck-timer from rebooting the node on RHEL4 (fixed in 2.6.9.56 or RHEL4.6).
References
NOTE:232355.1 - Hangcheck Timer FAQ
NOTE:559365.1 - Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions
NOTE:567730.1 - Changes in Oracle Clusterware on Linux with the 10.2.0.4 Patchset
Source: ITPUB Blog, link: http://blog.itpub.net/17252115/viewspace-1126109/. If you repost this article, please credit the source; otherwise legal liability will be pursued.