RAC and Oracle Clusterware Best Practices and Starter Kit (Linux) (Doc ID 811306.1)


In this Document

Purpose
Scope
Details
  RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent)
  RAC Platform Specific Best Practices and Starter Kits
  RAC on Linux Step by Step Installation Cookbook Video
  RAC on Linux Step by Step Installation Instructions
  RAC on Linux Best Practices
  OS Configuration Considerations
  Virtualization Considerations
  Storage Considerations
  General
  ASM Specific
  OCFS2 Specific
  Network Considerations
  Hardware/Vendor Specific Considerations
  Oracle Software Considerations
  Install
  General
  Community Discussions
References

Applies to:

Oracle Database - Enterprise Edition
Linux x86
Generic Linux
Linux x86-64

Purpose


The goal of the Oracle Real Application Clusters (RAC) series of Best Practice and Starter Kit notes is to provide customers with a quick knowledge transfer of generic and platform-specific best practices for implementing, upgrading and maintaining an Oracle RAC system. This document is compiled and maintained based on Oracle's experience with its global RAC customer base.

This Starter Kit is not meant to replace the Oracle Documentation set; rather, it is meant as a supplement to it. It is imperative that the Oracle Documentation be read, understood, and referenced to answer any questions that are not clearly addressed by this Starter Kit.

All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation. Risk assessments can only be made with a detailed knowledge of the system, application, and business environment.

As every customer environment is unique, the success of any Oracle Database implementation, including implementations of Oracle RAC, is predicated on a successful test environment. It is thus imperative that any recommendations from this Starter Kit be thoroughly tested and validated in a testing environment that replicates the target production environment before being implemented in production, to ensure that there is no negative impact from the recommendations made.

 

Scope

This article applies to all new and existing RAC implementations as well as RAC upgrades.

Note:  For similar information specific to Exadata environments, please refer to:  Document 1306791.2 - Information Center: Oracle Exadata Database Machine

 

Details

RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent)

The following document focuses on RAC and Oracle Clusterware Best Practices that are applicable to all platforms, including a white paper on available RAC System Load Testing Tools and RAC System Test Plan outlines for 10gR2, 11gR1 and 11gR2:

Document 810394.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent)

 

RAC Platform Specific Best Practices and Starter Kits

The following notes contain detailed platform specific best practices including Step-By-Step installation cookbooks (downloadable in PDF format):

Document 811306.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Linux)
Document 811280.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Solaris)
Document 811271.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Windows)
Document 811293.1 RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)
Document 811303.1 RAC and Oracle Clusterware Best Practices and Starter Kit (HP-UX)

 

RAC on Linux Step by Step Installation Cookbook Video

Document 1442768.1 - RACGuide: 11.2.0.3 RAC Installation on Linux [Video]
Document 1600316.1 - RACGuide: 12.1.0.1 RAC Installation on Linux [Video] *NEW*

 

RAC on Linux Step by Step Installation Instructions

Click here for a Step By Step guide for installing Oracle RAC 10gR2 on Linux.
Click here for a Step By Step guide for installing Oracle RAC 11gR1 on Linux.
Click here for a Step By Step guide for installing Oracle RAC 11gR2 (11.2.0.2) on Linux.

 

RAC on Linux Best Practices

The Best Practices in this section are specific to the Linux Platform. In addition, it is essential that the Platform Independent Best Practices found in Document 810394.1 be reviewed.

OS Configuration Considerations

  • If running EL4, it is highly recommended that the system be patched to a minimum version of EL 4.7 to avoid the following issues:
    • Bug 6116137 - avoid global OS hang
    • Bug 6125546 - Node not evicted with sysreq call
    • Bug 5041764 - CFQ IO Scheduler allows one process to temporarily starve other processes
    • Document 400959.1 - e1000 flow control defaults to off in versions RHEL/OEL 4 prior to 4.4
    • Bug 5136660 - Bonding devices do not come up with primary slave
  • Validate your hardware/software configuration against the RAC Technologies Matrix for Linux.
  • In pre-11gR2 clusters, system times must be synchronized across cluster nodes using NTPD, and NTPD should be configured to slew time to prevent false reboots. On Linux, slewing is enabled by starting ntpd with the '-x' option (see the ntpd example after this list). Refer to Document 551704.1 for Linux NTPD configuration details.
  • For 11gR2, the Cluster Time Synchronization Service Daemon (CTSSD) can be used in place of NTPD.  CTSSD will synchronize time with a reference node in the cluster when NTPD is not found to be configured.  Should you require synchronization with an external time source, you must use NTPD, which will cause CTSSD to run in "observer" mode.  However, if NTP is running, it must be configured with the slewing option as documented in Document 551704.1.
  • The hangcheck-timer kernel module is required for both Oracle 10gR2 and 11gR1 RAC on Linux. Assuming the default "CSS misscount" setting of either 30 or 60 seconds, the recommended hangcheck-timer settings are hangcheck_tick=1, hangcheck_margin=10 and hangcheck_reboot=1 (see the modprobe example after this list and Document 726833.1).  For 11gR2, hangcheck-timer is no longer required, as this functionality has been built into the Grid Infrastructure stack.
  • Implement HugePages for the database instances (a sizing sketch follows this list).  HugePages provides larger page sizes and locks the SGA into physical memory, eliminating the need for system page table lookups for the SGA.  This is especially important on systems with large memory allocations, because it eliminates the page-table management overhead associated with such configurations.  HugePages is also recommended in environments where server stability issues are evident (various spins/hangs not attributed to Oracle Clusterware or other known OS issues).  To compute the required size, see Document 401749.1; for more information, see Document 361323.1. In 11gR1 and above, you must disable AMM to use HugePages, as directed in Document 749851.1.
  • If you are running RedHat 6, OEL 6, SLES 11 or UEK2 kernels, be sure to disable Transparent HugePages (THP) to prevent performance problems and node/instance evictions (see the THP example after this list).  Our long-standing recommendation to use standard HugePages still stands; THP is a different implementation than standard HugePages.  See Document 1557478.1 for additional details.
  • Set the vm.min_free_kbytes kernel parameter to reserve 512MB so that the OS can reclaim memory faster and avoid LowMem pressure (see the sysctl example after this list).  See Document 452326.1, Document 452000.1 and Document 1367153.1 for additional information.
  • On kernel revision 2.6.18 and below, set the kernel parameter vm.swappiness=100 (included in the sysctl example after this list).  Stress testing has shown that vm.swappiness=100 (default: 60) on kernel version 2.6.18 or lower can reduce or delay node evictions under heavy memory pressure caused by large numbers of client connections or login storms.
  • Ensure ALL required 32-bit and 64-bit OS packages are installed and system prerequisites for your particular release of Oracle have been properly implemented.  This information is documented in Document 169706.1 as well as the install guides for your particular release.
  • Random node evictions induced by OCSSD are possible in RAC on Red Hat, SuSE and Oracle Enterprise Linux implementations following application of the 10.2.0.4 patchset, and in 11g environments. These random reboots are the result of a bug within GLIBC which is fixed in EL4.7 (and above) and EL5.2 (and above).  Details on this issue and the corrective action to be taken are found in Document 731599.1.
  • For pre-11.2.0.2 installations, SELinux must be disabled. For 11.2.0.2, SELinux is supported but the recommendation (if possible) is to run with SELinux disabled. See Bug 9746474.
  • Disable the AVAHI daemon to prevent startup issues with the CSSD daemon, see Document 1501093.1 for additional details.
  • For Linux 6.x, you may find that limits.conf appears to be ignored when establishing nproc - see  Document 1487773.1 for additional details.
  • Configure and test Linux debugging options (Netdump, 'Magic' SysRq keys and serial console) as documented in Document 443578.1.
    Note:  Netdump does not support bonded interfaces prior to RHEL/OEL 4.4.
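
The following is a minimal sketch of the NTPD slewing configuration referenced above, assuming the stock RHEL/OEL init script, which reads its flags from /etc/sysconfig/ntpd:

    # /etc/sysconfig/ntpd -- add -x so ntpd slews rather than steps the clock
    OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"

    # restart the service to pick up the change
    service ntpd restart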
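
A sketch of the hangcheck-timer settings for EL4/EL5, where module parameters are placed in /etc/modprobe.conf:

    # /etc/modprobe.conf
    options hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1

    # load the module and confirm it is present
    modprobe hangcheck-timer
    lsmod | grep hangcheck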
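
An illustrative HugePages sizing sketch. The 16 GB SGA is an assumption for the arithmetic only; Document 401749.1 supplies a script that computes the page count from the running instances. With 2 MB huge pages, 16 GB requires 16 * 1024 / 2 = 8192 pages, and the memlock limit (expressed in KB) must cover the SGA:

    # /etc/sysctl.conf
    vm.nr_hugepages = 8192

    # /etc/security/limits.conf -- 16 GB expressed in KB
    oracle soft memlock 16777216
    oracle hard memlock 16777216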
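
A sketch for disabling Transparent HugePages; note that on RHEL/OEL 6 the sysfs path may be /sys/kernel/mm/redhat_transparent_hugepage instead:

    # runtime (does not survive a reboot)
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag

    # persistent: append to the kernel line in /boot/grub/grub.conf
    transparent_hugepage=never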
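
A combined sysctl sketch for the two virtual memory settings above (apply vm.swappiness=100 only on kernels 2.6.18 and below):

    # /etc/sysctl.conf
    vm.min_free_kbytes = 524288    # 512 MB reserved to relieve LowMem pressure
    vm.swappiness = 100            # kernels 2.6.18 and below only

    # apply without a reboot
    sysctl -p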

Virtualization Considerations

  • RAC on Linux is supported on Oracle VM 2.1.2 and above when using para-virtualized guests.  For more information on virtualization support on Oracle VM, see Document 464754.1.
  • When deploying RAC in an Oracle VM environment, be sure to review and adhere to the best practices documented in the Oracle RAC in Oracle VM Environments white paper found on OTN.

Storage Considerations

General
  • See the following notes on how to configure multipath shared storage:
    • Document 564580.1 - Configuring raw devices (multipath) for Oracle Clusterware 10g Release 2 on RHEL5/OEL5
    • Document 605828.1 - Configuring non-raw multipath devices for Oracle Clusterware 11g on RHEL5/OEL5
      Note:  Document 605828.1 can easily be adapted to configure multipath for an 11gR2 installation without ASMLib.  For ASMLib specific instructions see the ASMLib section below.
  • On RedHat EL4 or later it is recommended to set fs.aio-max-nr=1048576 (or 4194304 for 11g); see the sysctl example after this list.
  • Starting with the Linux 2.6 kernel (RHEL 5, OEL 5) raw devices are being phased out in favor of O_DIRECT access to block devices. See Document 401132.1 for details regarding this issue. 
    • Starting with 11gR2, RAW and/or Block devices are supported for backward compatibility only (on ALL platforms); see Document 754305.1 for details.
  • On OCFS2, configure the timeouts correctly for the OCR/voting disks. The setting should satisfy: O2CB_HEARTBEAT_THRESHOLD >= (max(HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT) / 2) + 1 (a worked example follows this list). Reference Document 395878.1.
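
A minimal sysctl sketch for the asynchronous I/O limit above (the 11g value is shown; use 1048576 for 10g):

    # /etc/sysctl.conf
    fs.aio-max-nr = 4194304

    # apply and verify
    sysctl -p
    cat /proc/sys/fs/aio-max-nr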
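
A worked example of the O2CB_HEARTBEAT_THRESHOLD formula, assuming a hypothetical storage timeout of 120 seconds: (120 / 2) + 1 = 61. The value is set in /etc/sysconfig/o2cb:

    # /etc/sysconfig/o2cb -- assuming max(HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT) = 120s
    O2CB_HEARTBEAT_THRESHOLD=61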

 

ASM Specific
  • Use ASMLib for ease of management and for increased performance provided by the ASMLib Async I/O driver.
    Note:  For RedHat Enterprise Linux 6 (beginning with 6.4) the kernel driver package 'kmod-oracleasm' is available directly from RedHat, and can be installed from the "RHEL Server Supplementary (v. 6 64-bit x86_64)" channel on RedHat Network (RHN).  Updates to this module will be provided by RedHat.  Additional information may be found in Document 1089399.1.
  • When using ASMLib in a multipath environment, ensure that ASMLib has been configured properly such that the multipath device is used for access to the devices.  This is achieved by editing the /etc/sysconfig/oracleasm configuration file and modifying ORACLEASM_SCANORDER to include a valid prefix entry from /proc/partitions (e.g. "dm", "emcpower"); see the example after this list.  A common misconfiguration is to set ORACLEASM_SCANORDER to "/dev/mapper/"; "/dev/mapper" entries are aliases which will NOT exist in /proc/partitions.  See Document 602952.1 for details.
  • If you are NOT using, or are unable to use, ASMLib, please refer to Document 357472.1 for guidance on configuring UDEV and/or Device Mapper to manage Clusterware and ASM devices; a minimal udev sketch follows this list.
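
A sketch of the relevant /etc/sysconfig/oracleasm entries for a device-mapper multipath environment (the prefixes shown are examples; use whatever actually appears in your /proc/partitions):

    # /etc/sysconfig/oracleasm
    ORACLEASM_SCANORDER="dm"        # scan multipath (dm-*) devices first
    ORACLEASM_SCANEXCLUDE="sd"      # skip the underlying single-path sd* devices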
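
For illustration only, a minimal udev permissions rule in the style discussed in Document 357472.1. The device name, ownership and rule filename are assumptions; a real configuration should match on a stable identifier such as the device WWID:

    # /etc/udev/rules.d/99-oracle.rules (hypothetical)
    KERNEL=="dm-1", OWNER="oracle", GROUP="dba", MODE="0660"

    # re-trigger the rules (command names vary by release)
    udevadm trigger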
OCFS2 Specific
  • Generally speaking, the use of OCFS2 on a RAC system is discouraged, since it adds additional complexity to the cluster.  Use of ASM is the current and future direction for storage on a RAC system and is a highly recommended best practice.
  • OCFS2 1.2.7-1 can make the OCFS2 filesystem(s) become unavailable for access after a node panic or a node eviction occurs. Users are advised to upgrade to OCFS2 1.2.8-2 as soon as possible. See Document 553600.1 - "OCFS2 1.2.7-1 Filesystem May Become Unavailable after Node Panic or Eviction".
  • OCFS2 volumes containing the voting disk files (Clusterware), Cluster Registry (OCR), data files, redo logs, archive logs and control files must be mounted with the "datavolume" and "nointr" mount options (an /etc/fstab example follows this list).
  • OCFS2 does not support shared writable mmap in releases below OCFS2 1.4.x. The health check (GIMH) file $ORACLE_HOME/dbs/hc_<ORACLE_SID>.dat and the ASM file $ASM_HOME/dbs/ab_<ORACLE_SID>.dat should be symlinked to a local filesystem when using any OCFS2 release prior to 1.4.x.
  • Use a private interconnect for OCFS2 communication to avoid a network delay being interpreted as a node disappearing from the network, which could lead to node self-fencing. The same interconnect may be used for Oracle RAC and OCFS2. See Document 603080.1 and Document 391771.1. When making this change, be sure to leave the OCFS2 node name as the public node name; the only change to be made is the IP address, which should reflect the private interconnect IP.
  • Ensure that /usr/bin/updatedb (aka /usr/bin/locate, slocate) does not run against OCFS2 partitions. updatedb, a file indexer, will reduce OCFS2 file I/O performance. To prevent updatedb from indexing OCFS2 partitions, add 'ocfs' and 'ocfs2' to the PRUNEFS= list and/or list the OCFS2 volumes specifically in the PRUNEPATHS= list in /etc/updatedb.conf (see the example after this list). See Document 789946.1.
  • Storing the Voting Disk and OCR on OCFS2 filesystems is highly discouraged and should be avoided whenever possible.  Should there be a requirement to use OCFS2 for OCR files and/or voting disks, review Document 395878.1 - "Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to Avoid Unnecessary Node Fencing, Panic and Reboot".
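
A sketch of an /etc/fstab entry carrying the required OCFS2 mount options (the device and mount point are assumptions):

    # /etc/fstab
    /dev/sdb1    /u02/oradata    ocfs2    _netdev,datavolume,nointr    0 0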
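
A sketch of the /etc/updatedb.conf change described above (append to whatever values are already present; the mount point is an assumption):

    # /etc/updatedb.conf -- keep the file indexer away from OCFS2 volumes
    PRUNEFS="ocfs ocfs2"
    PRUNEPATHS="/u02/oradata"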

Network Considerations

  • As workload dictates, rmem_max and wmem_max kernel parameters should be increased beyond the default 256kb.  These values determine how much kernel buffer memory is allocated per socket opened for network reads and writes:
    net.core.rmem_default=262144
    net.core.rmem_max=4194304 (for 11g and all RDS implementations)
    net.core.rmem_max=2097152 (for 10g)
    net.core.wmem_default=262144
    net.core.wmem_max=1048576 (with RDS use at least 2097152)
The _default values determine how much memory is consumed per socket immediately at socket creation.  The _max values determine how much memory each socket is allowed to consume dynamically if its memory requirements grow beyond the default.  The best practice recommendation is therefore to define lower default values (i.e. 256KB) to conserve memory where possible, but larger max values (1MB or greater) to allow network performance gains when a given socket needs more memory.  These values are included in the sysctl sketch after this list.
  • When implementing the Linux Bonding driver, mode 3 (for the interconnect) and mode 6 (for the interconnect and public networks) must be avoided:
    • Testing of mode 3 for the private interconnect has proven that it duplicates all UDP packets and transmits them on every path. This increases the CPU overhead for processing data from the interconnect, making the interconnect less efficient.  The duplicate UDP packets caused by mode 3 bonding have exposed unpublished Bug 7238620 (ORA-600 [2032]) and unpublished Bug 9081436 (GC CR REQUEST WAIT CAUSING SESSIONS TO WAIT).  Though the known issues with mode 3 are isolated to the interconnect, using mode 3 for the public network is also discouraged due to the same inefficiencies.
    • Mode 6 bonding has an inherent race condition with floating IP addresses causing failover issues with VIPs, SCAN VIPs and HAIP.
  • Use NSCD to cache NIS lookups on Linux. Check to ensure the NSCD service is installed and running (this information can be found in the RDA).
  • Ensure that the Linux Firewall (iptables) is disabled (see the firewall commands after this list).  This is especially important for the private interconnect, as a firewall there will cause instability within the cluster.  See Document 554781.1 for details.
  • For Linux kernels 2.6.31 and above, a bug has been fixed in Reverse Path Filtering. As a consequence of this bug fix, interconnect packets may be blocked/discarded on systems with multiple interconnect interfaces.  To avoid this, set the rp_filter kernel parameter to a value of 0 (disabled) or 2 (loose) for the private interconnect NICs (included in the sysctl sketch after this list).  For more information see Document 1286796.1.
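
A combined sysctl sketch of the network settings in this section (the 11g/RDS buffer values are shown, and eth1/eth2 as the private interconnect NIC names are assumptions):

    # /etc/sysctl.conf
    net.core.rmem_default = 262144
    net.core.rmem_max = 4194304
    net.core.wmem_default = 262144
    net.core.wmem_max = 1048576

    # relax reverse path filtering on the private interconnect NICs only
    net.ipv4.conf.eth1.rp_filter = 2
    net.ipv4.conf.eth2.rp_filter = 2

    # apply without a reboot
    sysctl -p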
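
To stop the firewall and keep it disabled across reboots on EL4/5/6-style init systems, a minimal sketch:

    service iptables stop
    chkconfig iptables off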

Hardware/Vendor Specific Considerations

  • On EL5 (RHEL and OEL) systems running the bnx2 network driver ensure that a minimum kernel version of 2.6.18-194.8.1 (EL5.5 or higher) has been implemented to aovoid network instability issues. Refer to https://rhn.redhat.com/errata/RHSA-2010-0398.html for details of BZ#587799, BZ#581148
  • Ensure minimum BIOS version 2.35.3.3 is used for SUN V40Z DUAL CORE machines, for ECC memory checking.
  • Ensure SUN V40Z 2.6V memory management voltage regulator issues. A SUN CE can identify if the voltage regulator is begging to fail. The new VRM (Voltage Regulator Module) revision board from rev 1.0 to rev 2.0.

Oracle Software Considerations

The Software Considerations in this section are specific to the Linux Platform.  In addition, it is highly recommended that the Platform Independent Best Practices found in Document 810394.1 be reviewed.

Install

  • Prevent root.sh failures by ensuring that the Linux Firewall (iptables) has been disabled.  See Document 554781.1 for details.
  • For pre-11.2.0.2 installations, prevent root.sh failures by disabling SELinux.  For 11.2.0.2, SELinux is supported but the recommendation (if possible) is to run with SELinux disabled.  See Bug 9746474.
  • On SLES, installations of 11.2.0.2 will fail on the copy of ohasd.sles during the execution of root.sh (Bug 10428946):

    Performing root user operation for Oracle 11g
    ...
    The init script file "/u01/app/11202/grid/crs/init/ohasd.sles" does not exist
    ...
    Configure Oracle Grid Infrastructure for a Cluster ... succeeded
The workaround for Bug 10428946 is to manually copy ohasd.sles from the $GI_HOME/crs/utl directory to the $GI_HOME/crs/init directory:
# cp $GI_HOME/crs/utl/ohasd.sles $GI_HOME/crs/init
Note:  This issue will be fixed in the 11.2.0.2.4 GI PSU and is fixed in 11.2.0.3.
 

General

  • For 10gR2 and 11gR1 installations, verify that the oradism executable matches the following ownership and permissions, "-rwsr-sr-x 1 root dba oradism", and make sure the LMS processes are running in real-time scheduling mode (see the checks below).
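
A quick verification sketch (the path assumes a standard Oracle Home, and the expected ls output is abbreviated):

    # ownership/permissions -- expect: -rwsr-sr-x 1 root dba ... oradism
    ls -l $ORACLE_HOME/bin/oradism

    # scheduling class of the LMS processes -- expect class RR (real time)
    ps -eo pid,class,cmd | grep '[l]ms'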
