RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)_811293.1

Posted by rongshiyuan on 2013-12-25


RAC and Oracle Clusterware Best Practices and Starter Kit (AIX) (Doc ID 811293.1)
In this Document

Purpose
Scope
Details
  RAC Assurance Support Team: RAC and Oracle Clusterware Starter Kit and Best Practices (Generic)
  RAC Platform Specific Starter Kits and Best Practices
  RAC on AIX Step by Step Installation Instructions
  RAC on AIX Best Practices
  OS Configuration Considerations
  Storage Considerations
  Network Considerations
  Oracle Software Considerations
  Community Discussions
References

Applies to: 

Oracle Database - Enterprise Edition - Version 10.2.0.1 to 11.2.0.3 [Release 10.2 to 11.2]
IBM AIX on POWER Systems (64-bit)
IBM AIX Based Systems (64-bit)


Purpose

The goal of the Oracle Real Application Clusters (RAC) series of Best Practice and Starter Kit notes is to provide customers with quick knowledge transfer of generic and platform specific best practices for implementing, upgrading and maintaining an Oracle RAC system. This document is compiled and maintained based on Oracle's experience with its global RAC customer base.

This Starter Kit is not meant to replace or supplant the Oracle Documentation set, but rather, it is meant as a supplement to the same. It is imperative that the Oracle Documentation be read, understood, and referenced to provide answers to any questions that may not be clearly addressed by this Starter Kit.

All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation. Risk assessments can only be made with a detailed knowledge of the system, application, and business environment.

As every customer environment is unique, the success of any Oracle Database implementation, including implementations of Oracle RAC, is predicated on a successful test environment. It is thus imperative that any recommendations from this Starter Kit are thoroughly tested and validated using a testing environment that is a replica of the target production environment before being implemented in the production environment to ensure that there is no negative impact associated with the recommendations that are made.

Scope

This article applies to all new and existing RAC implementations as well as RAC upgrades.

Details

RAC Assurance Support Team: RAC and Oracle Clusterware Starter Kit and Best Practices (Generic)

The following document focuses on RAC and Oracle Clusterware Best Practices that are applicable to all platforms. It includes a white paper on available RAC System Load Testing Tools and RAC System Test Plan outlines for 10gR2, 11gR1, and 11gR2:

Document 810394.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent)

 

RAC Platform Specific Starter Kits and Best Practices

The following notes contain detailed platform specific best practices including Step-By-Step installation cookbooks (downloadable in PDF format):

Document 811306.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Linux)
Document 811280.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Solaris)
Document 811271.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Windows)
Document 811293.1 RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)
Document 811303.1 RAC and Oracle Clusterware Best Practices and Starter Kit (HP-UX)

 

RAC on AIX Step by Step Installation Instructions

Step By Step guides for installing Oracle RAC on AIX are provided for the following releases (linked from the original note):
  • Oracle RAC 10gR2
  • Oracle RAC 11gR1
  • Oracle RAC 11gR2

 

RAC on AIX Best Practices

The Best Practices in this section are specific to the AIX Platform. That said, it is essential that the Platform Independent Best Practices found in Document 810394.1 also be reviewed.

OS Configuration Considerations

  • It is essential that ALL customers running RAC on AIX review the joint IBM/Oracle White Paper "Oracle Real Application Clusters on IBM AIX: Best practices in memory tuning and configuring for system stability".
  • For 11gR2, start with Document 1427855.1 - AIX: Top Things to DO NOW to Stabilize 11gR2 GI/RAC Cluster
  • Validate your hardware/software configuration against the RAC Technologies Matrix for Unix.
  • Ensure all required OS packages are installed and system prerequisites have been properly implemented for your particular release of Oracle. This information is documented in Document 169706.1 as well as the install guides for your particular release.
  • If deploying on an AIX virtualized system, review Document 1470654.1 to gain an understanding of the resource utilization in such configurations.
  • If running AIX 6.1, ensure that the fix for APAR IV04047 has been installed to avoid potential instance hangs and node evictions. Additional details can be found in Document 1393041.1.
  • To ensure system stability, verify that all mandatory patches for AIX (5L and 6) documented in Document 282036.1 have been applied.
  • Tune Virtual memory parameters. IBM recommended numbers are:
    minperm%=3
    maxperm%=90
    maxclient%=90
    lru_file_repage=0
    strict_maxperm=0
    strict_maxclient=1
    page_steal_method=1

    An example script for setting these parameters is as follows:
    #!/usr/bin/ksh
    vmo -p -o maxperm%=90
    vmo -p -o minperm%=3
    vmo -p -o maxclient%=90
    vmo -p -o strict_maxperm=0
    vmo -p -o strict_maxclient=1
    vmo -p -o lru_file_repage=0
    # page_steal_method requires a reboot to take effect
    vmo -r -o page_steal_method=1
  • On AIX 5.3, apply APAR IY84780 to fix a known kernel issue with per-cpu freelists. For details on this APAR, refer to IY84780: KERNEL MEMORY GARBAGE COLLECTOR FAILS TO FREE LISTS.
    Note:  This fix is also included in Technology Level 4 (TL4) and higher. If necessary, check with IBM for any superseding fixes.
  • Set AIXTHREAD_SCOPE=S in the environment (export AIXTHREAD_SCOPE=S) for improved performance. S is already the default on AIX 6.1 and above.  Refer to Document 458403.1 (Why AIXTHREAD_SCOPE should be set to 'S' on AIX) for additional details.
  • When using the Processor Folding feature (default), it is essential that the Fix Packs for AIX 5.3 and 6.1 are applied to prevent system hangs.
  • If not using HACMP then HACMP filesets must not be installed.
     
  • Do not use filesystems mounted with the "cio" option for Oracle Homes, software staging or temp.  The "cio" mount option is not supported and will cause installation, relinking, and other unexpected failures.  See Document 869644.1 for details.
  • Ensure that the GI and ORACLE owner accounts have the CAP_NUMA_ATTACH, CAP_BYPASS_RAC_VMM, and CAP_PROPAGATE capabilities.  This is required per the 11gR2 installation guide and it is also required for all pre-11gR2 installations. An example of checking and setting these capabilities for the grid user is as follows:
    # /usr/bin/lsuser -a capabilities grid
    # /usr/bin/chuser capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE grid
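
The capability check in the last item above can be scripted. The following is a minimal sketch, assuming that `lsuser -a capabilities <user>` prints one line of the form `grid capabilities=CAP_A,CAP_B`. The function name `missing_caps` is hypothetical and not part of any AIX or Oracle tool. It only parses text, so it can be tried on sample input first.

```shell
#!/bin/sh
# Hypothetical helper: report which of the three required capabilities are
# absent from one line of "lsuser -a capabilities <user>" output.
# Assumed input format: "<user> capabilities=CAP_A,CAP_B,..."
REQUIRED_CAPS="CAP_NUMA_ATTACH CAP_BYPASS_RAC_VMM CAP_PROPAGATE"

missing_caps() {
    line=$1
    # Keep only the comma-separated list after "capabilities=" and wrap it
    # in commas so that every entry can be matched as ",NAME,".
    case $line in
        *capabilities=*) have=",${line#*capabilities=}," ;;
        *)               have="," ;;
    esac
    missing=""
    for cap in $REQUIRED_CAPS; do
        case $have in
            *",$cap,"*) ;;    # capability is present
            *) missing="${missing:+$missing }$cap" ;;
        esac
    done
    # Prints an empty line when nothing is missing.
    printf '%s\n' "$missing"
}

# Demo on sample text (prints CAP_BYPASS_RAC_VMM).
missing_caps "grid capabilities=CAP_NUMA_ATTACH,CAP_PROPAGATE"
```

On a real node the same function could be fed live output, for example `missing_caps "$(/usr/bin/lsuser -a capabilities grid)"`, and repeated for the ORACLE owner account.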

Storage Considerations

  • Ensure that SAN Storage is capable of read/write concurrency (writing at the same time from any member of the RAC cluster) through its drivers. This means that the "reserve_policy" attribute of the discovered disks (hdisk, hdiskpower, dlmfdrv, etc.) must accept a setting of "no_reserve" or "no_lock".  See Document 422075.1 for details.
  • Do not assign PVIDs (Physical Volume IDs) to disks or volumes that are being used for ASM Diskgroups. PVIDs should be cleared on all nodes from any candidate disks or volumes prior to being added to an ASM Diskgroup. Once a disk or volume is added to an ASM Diskgroup, PVIDs should never be assigned after the fact, from any node in the cluster, including nodes that are being added to an existing cluster. Reference Document 353761.1 for more details on this issue.
    CAUTION:  Assigning PVIDs to ASM disks will corrupt the disk header resulting in catastrophic data loss!!
  • Set FSCSI Device Attribute FC_ERR_RECOV to FAST_FAIL for Voting Disk and ASM storage. This setting has been shown to avoid reboots in situations where a SAN storage outage of the volumes hosting one of 3 voting disks caused reboots to occur.  See Document 560077.1 for details.
  • When implementing GPFS, be sure to review Document 302806.1 for recommendations on LUN configuration, filesystem blocksize, AIO configuration, inodes, and implementation examples.
  • Users of AIX may encounter long interactive-application response times when other applications on the system are performing large writes to disk. Configuring I/O pacing limits the number of outstanding I/O requests against a file. AIX 6.1 enables I/O pacing by default, and its default values (minpout=4096 and maxpout=8193) are suitable for AIX 6.1.  On AIX 5.3, however, you need to enable this feature explicitly.
    Oracle's testing has shown that starting values of 8 for minpout and 12 for maxpout are a good baseline for most Oracle customers. However, every environment is different, and other values may well be acceptable if the system has been properly tuned and shown to perform with them. To configure I/O pacing with Oracle's recommended baseline values, either use SMIT or run chdev directly at the command line as root:
    # smitty chgsys
    # chdev -l sys0 -a minpout=8 -a maxpout=12
  • On AIX, ASM can use concurrent RAW logical volumes or RAW partitions.  When using multipath technologies with ASM, ASM must access the devices via the appropriate multipath device. The device paths for major multipath technologies are documented in Document 294869.1.
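
The PVID rule in the list above can be checked from `lspv` output before candidate disks are added to an ASM Diskgroup. The following is a minimal sketch, assuming the usual `lspv` column layout (disk name, then PVID or `none`, then volume group and state). The function name `disks_with_pvid` and the sample disk names are hypothetical. The function only reads text and changes nothing on the system.

```shell
#!/bin/sh
# Hypothetical helper: read lspv-style lines on stdin and print the names of
# the given candidate disks that still carry a PVID.
# Assumed lspv columns: <disk> <pvid|none> <vg|None> [state]
disks_with_pvid() {
    # "$@" holds the candidate disk names intended for ASM.
    candidates=" $* "
    while read -r disk pvid rest; do
        [ -n "$disk" ] || continue          # skip blank lines
        case $candidates in
            *" $disk "*)
                if [ "$pvid" != "none" ]; then
                    printf '%s\n' "$disk"
                fi
                ;;
        esac
    done
    return 0
}

# Demo on sample text: hdisk3 is a candidate that still has a PVID.
disks_with_pvid hdisk2 hdisk3 <<'EOF'
hdisk0          00c8d2f0b1a2c3d4    rootvg          active
hdisk2          none                None
hdisk3          00c8d2f0deadbeef    None
EOF
```

On a real node this would be run as `lspv | disks_with_pvid hdisk2 hdisk3`, and on every node of the cluster, since the PVIDs must be clear on all of them. Any disk it prints should be investigated before the disk is given to ASM.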

Network Considerations

  • Ensure that the network tuning parameters are set in accordance with the following to ensure optimal interconnect performance:
    tcp_recvspace = 65536
    tcp_sendspace = 65536
    udp_sendspace = ((DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT) + 4 KB) but no lower than 65536
    udp_recvspace = 655360 (Minimum recommended value is 10x udp_sendspace, parameter value must be less than sb_max)
    rfc1323 = 1
    sb_max = 4194304
    ipqmaxlen = 512

    NOTE: Failure to set the udp_sendspace will result in failure of root.sh for 11.2.0.2 GI installations, see Document 1280234.1.
  • Oracle Clusterware VIP addresses and their corresponding node names must not be in use on the network prior to Oracle Clusterware installation. Do not create any AIX alias on the public network interface; the Clusterware installation will do it.  Reserve one VIP and its hostname per RAC node. The Oracle Clusterware VIP addresses and corresponding node names are to be defined in DNS.
  • Installations using AIX VIO must review Document 1305174.1 - AIX VIO: Block Lost or IPC Send Timeout Possible Without Fix of APAR IZ97457.
  • HAIP may randomly fail to start on cluster startup on 11.2.0.3 with: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0.  This is the result of Bug 13989181 which is fixed in Patch 13989181.
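
The udp_sendspace and udp_recvspace rules in the tuning list above are simple arithmetic. The following is a minimal sketch, with hypothetical function names, that derives both values in bytes from DB_BLOCK_SIZE, DB_FILE_MULTIBLOCK_READ_COUNT and sb_max. The 8 KB block size and read count of 16 in the demo are illustrative inputs, not recommendations.

```shell
#!/bin/sh
# Sketch of the sizing rules from the list above. Function names are
# hypothetical. All values are in bytes.

# udp_sendspace = (DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT) + 4 KB,
# but no lower than 65536.
calc_udp_sendspace() {
    block_size=$1
    mbrc=$2
    value=$(( block_size * mbrc + 4096 ))
    if [ "$value" -lt 65536 ]; then
        value=65536
    fi
    printf '%s\n' "$value"
}

# udp_recvspace = 10 x udp_sendspace. The result must be less than sb_max,
# so the function fails instead of printing a value that breaks that rule.
calc_udp_recvspace() {
    sendspace=$1
    sb_max=$2
    value=$(( sendspace * 10 ))
    if [ "$value" -ge "$sb_max" ]; then
        echo "udp_recvspace $value is not below sb_max $sb_max" >&2
        return 1
    fi
    printf '%s\n' "$value"
}

# Demo: 8 KB blocks, DB_FILE_MULTIBLOCK_READ_COUNT=16, sb_max=4194304
send=$(calc_udp_sendspace 8192 16)
recv=$(calc_udp_recvspace "$send" 4194304)
echo "udp_sendspace=$send udp_recvspace=$recv"
```

The demo prints `udp_sendspace=135168 udp_recvspace=1351680`. With a small block size and read count the floor applies, which gives the 65536 and 655360 values shown in the list.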

Oracle Software Considerations

The Software Considerations in this section are specific to the AIX Platform. That said, it is essential that the Platform Independent Best Practices found in Document 810394.1 also be reviewed.

  • For environments running 11gR2 on AIX 6.1 TL8 or 7.1 TL2, ensure the appropriate APAR documented in Document 1528452.1 has been applied. This proactively avoids an issue in which GI on the second node fails to join the cluster because CRSD and EVMD remain in INTERMEDIATE state.
  • For 10.2.0.4 and 11.1.0.7 installations on AIX systems using the IBM Logical Host Ethernet Adapter (LHEA) interfaces, it's required to apply the fix for Bug 8725020 to ensure VIP functionality.  This fix is included in 10.2.0.5 and 11.1.0.7 CRS bundle#1 (and above).  See Document 959746.1 for additional details around this issue.
  • To ensure that critical process threads are running with the proper priority (to prevent node evictions), apply the fix for Bug 13940331 (AIX Specific).  Bug 13940331 is fixed in 11.2.0.4; at present, one-off patches are available for 10.2.0.5 and 11.2.0.3 under Patch 13940331.
  • For 11.2.0.2 installations and/or upgrades apply the 11.2.0.2.4 GI PSU  Patch 12827731 (or later) prior to running root.sh or rootupgrade.sh to prevent failure of these scripts (due to Bug 10370797, fixed in 11.2.0.2.4).  Instructions on how to apply the 11.2.0.2.4 GI PSU Patch 12827731 prior to running root.sh or rootupgrade.sh are as follows:
    Note:  These instructions were written for the 11.2.0.2.4 GI PSU.  Though the patch numbers will differ, the same instructions are applicable for later GI PSUs.

    1. Perform an Oracle Grid Infrastructure 11.2.0.2 installation or upgrade
    2. Right before the first root.sh (or rootupgrade.sh) is supposed to be run, leave the current installation behind:
    • Do NOT run root.sh or rootupgrade.sh
    • Do NOT close the installer or abort an operation in progress.
    • DO leave the current installation as-is and open a new terminal.
    3. Download and prepare the application of Patch 12827731 by unzipping the patch into an empty directory on EVERY node in the cluster.
    4. Download and install the latest version of OPatch to apply the patch.  The latest version of OPatch can be found under Patch 6880880.  Install OPatch into the GI Home on ALL nodes as follows:
    $ unzip -d

    5. Contrary to what is described in the patch readme:
    • Do NOT use "opatch auto"
    • Since this is a fresh install that has not been configured, do NOT execute "rootcrs.pl -unlock" or "rootcrs.pl -patch"
    • Do use: "opatch napply -local" as the software install owner e.g. grid
      $GI_HOME/OPatch/opatch napply -local /12827731
      $GI_HOME/OPatch/opatch napply -local /12827726
    Note: Because OPatch is used with the "-local" flag here, you need to perform this operation on every node.

    6. After you have patched every node in the cluster, return to the original installation
    7. Proceed to run the root.sh (rootupgrade.sh) on all nodes and follow the instructions on the OUI screen.

 

  • On pre-11.2 AIX systems (without vendor clusterware) OPROCD by default is not running in the AIX global run queue (Bug 13623902) which can cause false reboots by OPROCD.  Corrective action on this issue is to modify the /etc/init.cssd file as follows:
    Note:  The instructions below are performed in a rolling method to avoid a complete outage of the database.

    1.  Stop the Clusterware stack on the local node.
    2.  Modify the /etc/init.cssd as follows:
    From:

       # Run oprocd synchronously and look for its status code
       cd $OPROCDIR

       # startup the some diagnostic collection scripts if any
       StartDiagCollect;

       $OPROCD run -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \
          $OPROCD_DEFAULT_HISTOGRAM $FATALARG
       RC=$?

    To:

       # Run oprocd synchronously and look for its status code
       cd $OPROCDIR

       # startup the some diagnostic collection scripts if any
       StartDiagCollect;

       RT_GRQ=ON
       export RT_GRQ

       $OPROCD run -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \
          $OPROCD_DEFAULT_HISTOGRAM $FATALARG
       RC=$?


    3.  Restart the Clusterware stack on the local node.
    4.  Repeat Steps 1-3 on all remaining cluster nodes.
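
Because the edit above is repeated by hand on every node, it can help to confirm that each node's file really carries the change before the Clusterware stack is restarted. The following is a minimal sketch, assuming the two-line form shown above (`RT_GRQ=ON` followed by `export RT_GRQ`) and assuming that the `$OPROCD run` line appears as in the excerpt. The function name `rt_grq_set_before_oprocd` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given init.cssd-style file sets
# RT_GRQ=ON and exports it before the line that launches "$OPROCD run".
rt_grq_set_before_oprocd() {
    awk '
        # Assignment exactly as shown in the "To:" excerpt.
        /^[ \t]*RT_GRQ=ON[ \t]*$/   { assigned = 1 }
        # The export only counts once the assignment has been seen.
        /^[ \t]*export[ \t]+RT_GRQ/ { if (assigned) exported = 1 }
        # Stop at the oprocd launch line; later lines are irrelevant.
        /\$OPROCD[ \t]+run/         { found = 1; exit }
        END { if (found && exported) exit 0; exit 1 }
    ' "$1"
}

# Usage on a node (nothing is modified, the file is only read):
#   rt_grq_set_before_oprocd /etc/init.cssd && echo "RT_GRQ export is in place"
```

The check is deliberately strict about ordering: an export placed after the `$OPROCD run` line would have no effect on oprocd, so it is reported as a failure.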

 

Community Discussions


Still have questions? Search for similar discussions or start a new discussion on this subject in the Database - RAC/Scalability Community:

https://communities.oracle.com/portal/server.pt/community/scalability_rac/253

References

NOTE:294869.1 - Oracle ASM and Multi-Pathing Technologies
NOTE:353761.1 - Assigning a Physical Volume ID (PVID) To An Existing ASM Disk Corrupts the ASM Disk Header
NOTE:422075.1 - Error ORA-27091, ORA-27072 When Mounting Diskgroup
NOTE:560077.1 - Asm Hangs After Loss Of Failgroup on AIX
NOTE:869644.1 - Having an ORACLE_HOME on a Filesystem Mounted With "cio" Option is Not Supported and Will Have Issues
BUG:8725020 - VIP WONT RUN (LHEA) ADAPTER 5.3 TL9
NOTE:1305174.1 - AIX VIO: Block Lost or IPC Send Timeout Possible Without Fix of APAR IZ97457
NOTE:959746.1 - AIX: 10.2/11.1 VIP Fails to Come Up with "Invalid Parameters, Or Failed To Bring Up VIP"
NOTE:811303.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (HP-UX)
NOTE:811306.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Linux)
NOTE:1393041.1 - AIX 6.1 Instance Hang Then Node Reboot due to High Load IV04047
NOTE:1427855.1 - AIX: Top Things to DO NOW to Stabilize 11gR2 GI/RAC Cluster
NOTE:169706.1 - Oracle Database (RDBMS) on Unix AIX,HP-UX,Linux,Mac OS X,Solaris,Tru64 Unix Operating Systems Installation and Configuration Requirements Quick Reference (8.0.5 to 11.2)
NOTE:282036.1 - Minimum Software Versions and Patches Required to Support Oracle Products on IBM Power Systems
NOTE:811293.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)
NOTE:810394.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent)
NOTE:811271.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Windows)
NOTE:811280.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Solaris)

Source: ITPUB blog, link: http://blog.itpub.net/17252115/viewspace-1064259/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
