AIX: Top Things to DO NOW to Stabilize 11gR2 GI/RAC Cluster (Doc ID 1427855.1)
A. Required OS Technology Level and Service Pack, and Recommended VM Setting
- The AIX level must be equal to or higher than the following (run "/bin/oslevel -s" to confirm):
AIX 7.1 TL 00 SP1 ("7100-00-01"), 64-bit kernel
AIX 6.1 TL 02 SP1 ("6100-02-01"), 64-bit kernel
AIX 5.3 TL 09 SP1 ("5300-09-01"), 64-bit kernel
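The level check above can be scripted. The following is a minimal sketch and not part of the original note: the function names oslevel_at_least and min_level_for are hypothetical, and it assumes "oslevel -s" prints a string of the form release-TL-SP-build (for example 7100-00-01-1037), of which only the first three fields are compared.

```sh
# Return 0 if level $1 is at or above minimum $2.
# Only the release, TL and SP fields (the first three) are compared.
oslevel_at_least() {
    printf '%s\n%s\n' "$1" "$2" | awk -F- '
        NR == 1 { for (i = 1; i <= 3; i++) cur[i] = $i + 0 }
        NR == 2 { for (i = 1; i <= 3; i++) req[i] = $i + 0 }
        END {
            for (i = 1; i <= 3; i++) {
                if (cur[i] > req[i]) exit 0
                if (cur[i] < req[i]) exit 1
            }
            exit 0
        }'
}

# Print the minimum level this note lists for the release of level $1.
# Releases not listed in the note return a non-zero status.
min_level_for() {
    case $1 in
        7100-*) echo 7100-00-01 ;;
        6100-*) echo 6100-02-01 ;;
        5300-*) echo 5300-09-01 ;;
        *) return 1 ;;
    esac
}
```

On a node this could be used as: lvl=$(/bin/oslevel -s); oslevel_at_least "$lvl" "$(min_level_for "$lvl")" && echo "level OK". The sketch does not verify that the kernel is 64-bit.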
- Recommended Virtual Memory setting:
maxperm%=90
minperm%=3
maxclient%=90
strict_maxperm=0
strict_maxclient=1
lru_file_repage=0
page_steal_method=1 ### (change requires a reboot to take effect)
- vpm_xvcpus should never be set to 0: set it to 2 or higher so that at least 2 processors remain unfolded.
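The recommended values above can be compared against a running system. The sketch below is an illustration only: the function name check_vm_tunables is hypothetical, and it assumes the input consists of "name = value" lines, which is the form in which the AIX vmo and schedo commands are expected to print tunables. A tunable that does not appear in the input is skipped rather than reported.

```sh
# Read "name = value" lines on stdin and report values that differ from
# the recommendations in this note. Exit status is 1 if any differ.
check_vm_tunables() {
    awk -F= '
        BEGIN {
            want["maxperm%"]          = "90"
            want["minperm%"]          = "3"
            want["maxclient%"]        = "90"
            want["strict_maxperm"]    = "0"
            want["strict_maxclient"]  = "1"
            want["lru_file_repage"]   = "0"
            want["page_steal_method"] = "1"
        }
        NF >= 2 {
            name = $1; val = $2
            gsub(/[ \t]/, "", name)   # strip padding around the name
            gsub(/[ \t]/, "", val)    # strip padding around the value
            if ((name in want) && val != want[name]) {
                printf "%s is %s, recommended %s\n", name, val, want[name]
                bad = 1
            }
            # vpm_xvcpus has a lower bound rather than a single value
            if (name == "vpm_xvcpus" && val + 0 < 2) {
                printf "%s is %s, recommended 2 or higher\n", name, val
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

A possible invocation as root, assuming the -F flag is needed to display restricted tunables on the AIX level in use: { vmo -F -a; schedo -o vpm_xvcpus; } | check_vm_tunables. Values are normally changed with "vmo -p -o name=value", and with "vmo -r -o page_steal_method=1" for the setting that takes effect at the next reboot; confirm the exact flags against the vmo documentation for the installed level.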
B. USLA heap fix to reduce memory footprint for Oracle Server processes
- For AIX 6.1 TL07 SP02/AIX 7.1 TL01 SP02 or later, apply
- For AIX 6.1 TL07 or AIX 7.1 TL01, install AIX 6.1 TL-07 APAR IV09580, AIX 7.1 TL-01 APAR IV09541, and apply
- For other AIX levels, apply , which disables Oracle's online patching mechanism
- Note: as of 06/21/2012, the fixes for or are not included in any PSU, so the interim patch is needed. Interim patches exist on top of most PSUs; the one on top of 11.2.0.3 does not conflict with the 11.2.0.3.1 PSU and can be applied on top of both the 11.2.0.3 base and the 11.2.0.3.1 PSU.
- New connections can be slow to establish without the fix for which is fixed in 11.2.0.4
@ if online patch exists, process startup/new connection will be slower: , duplicate
C. Other recommended OS fixes
- Note 1528452.1 - AIX 6.1 TL8 or 7.1 TL2: 11gR2 GI Second Node Fails to Join the Cluster as CRSD and EVMD are in INTERMEDIATE State
- Paging space growth leads to node failure/eviction:
64K paging takes place even though system RAM is available; the fix avoids unexpected paging space growth and node failure. The matrix below lists the APAR for each TL/SP level:
TL/SP            Level              APAR
6100 TL5         6100-05            IZ71603
6100 TL4 SP4     6100-04-04-1014    IZ71191
6100 TL3 SP4     6100-03-04-1014    IZ72031
6100 TL2 SP7     6100-02-07-1014    IZ71850
6100 TL1 SP8     6100-01-08-1014    IZ71987
5300 TL12        5300-12            IZ71460
5300 TL11 SP4    5300-11-04-1015    IZ73687
5300 TL10 SP4    5300-10-04-1015    IZ73754
5300 TL9 SP7     5300-09-07-1015    IZ73864
5300 TL8 SP10    5300-08-10-1015    IZ67445
For more info, refer to
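As an illustration of how the paging-space APAR matrix could be applied on a node, the sketch below looks up the APAR for a given level string. The function name paging_apar_for is hypothetical, and matching on the release and TL only (ignoring the SP) is an assumption made here, not something the note states.

```sh
# Print the paging-space APAR from the matrix for the TL of level $1.
# Levels outside the matrix return a non-zero status.
paging_apar_for() {
    case $1 in
        6100-05*) echo IZ71603 ;;
        6100-04*) echo IZ71191 ;;
        6100-03*) echo IZ72031 ;;
        6100-02*) echo IZ71850 ;;
        6100-01*) echo IZ71987 ;;
        5300-12*) echo IZ71460 ;;
        5300-11*) echo IZ73687 ;;
        5300-10*) echo IZ73754 ;;
        5300-09*) echo IZ73864 ;;
        5300-08*) echo IZ67445 ;;
        *) return 1 ;;
    esac
}

# Possible use on an AIX node (instfix -ik reports whether an APAR is installed):
#   apar=$(paging_apar_for "$(/bin/oslevel -s)") && instfix -ik "$apar"
```

A level that is not in the matrix produces no APAR, which should be read as "not covered by this table" rather than "no fix needed".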
- gc block lost, IPC send timeout, or instance eviction
The VIOS server does not forward traffic from its VIO clients to the external network, and interrupts do not reach the trunk adapter; the fix avoids the SEA/VIO client hang. The matrix below lists the APAR for each TL/SP level:
TL/SP            Level              APAR
7100 TL0 SP3     7100-00-03-1115    IZ97035
6100 TL6 SP5     6100-06-05-1115    IZ96155
6100 TL5 SP6     6100-05-06-1119    IZ97457
6100 TL4 SP10    6100-04-10-1119    IZ97605
5300 TL12 SP4    5300-12-04-1119    IZ98126
5300 TL11 SP7    5300-11-07-1119    IZ98424
For more info, refer to
- Other kernel hang fixes
* IZ91983 lockl performance issue, hang
For more info, refer to
* IV04047: shlap64 unable to process Oracle request leading to kernel hang
For more info, refer to
- Excessive CPU usage in LPAR in shared processor mode
If the LPAR is in shared processor mode, it may see excessive CPU usage without the following fix. APARs for WAITPROC IDLE LOOPING CONSUMES CPU:
IV01111 AIX 6.1 TL05 if before SP08 (fixed in SP08)
IV06197 AIX 6.1 TL06 if before SP07 (fixed in SP07)
IV10172 AIX 6.1 TL07 if before SP02 (fixed in SP02)
IV09133 AIX 7.1 TL00 if before SP05 (fixed in SP05)
IV10484 AIX 7.1 TL01 if before SP02 (fixed in SP02)
This problem can affect POWER7 systems running any level of Ax720 firmware prior to Ax720_101; updating to the latest available firmware is recommended. If required, AIX and firmware fixes can be obtained from IBM Support Fix Central:
- Crash in netinfo_unixdomnlist while running netstat
TL/SP            Level              APAR
6100 TL6 SP6     6100-06-06-1140    IZ97166
6100 TL5 SP7     6100-05-07-1140    IZ97353
6100 TL4 SP11    6100-04-11-1140    IV00634
For more info, refer to
- Note 2237498.1 - ALERT: Database Corruption ORA-600 ORA-7445 errors after applying AIX SP patches - AIX 6.1.9.8 or AIX 7.1.3.8 or AIX 7.1.4.3 or AIX 7.2.0.3 or AIX 7.2.1.0, 01
D. Apply the latest GI PSU to avoid known high resource consumption bugs
If you are running 11.2.0.3, apply 11.2.0.3 GI PSU8 ()
For 11.2.0.3, applying the above PSU fixes the following known bugs (Note: it does not fix the bugs in Section D1)
- Note 1062676.1 - ORAAGENT or ORAROOTAGENT High Resource (CPU, Memory etc) Usage
Except , all others have been fixed in 11.2.0.2
is fixed in 11.2.0.2 GI PSU6, 11.2.0.3 GI PSU2, 11.2.0.4 and 12.1, interim exists on top of 11.2.0.3.1 GI PSU
- Note 1287709.1 - ocssd.bin High CPU Usage, Instance Crashes With ORA-29702 or ORA-29770 or ORA-29701 With "gipcWait failed with 16"
This note talks about which is fixed in 11.2.0.2 GI PSU3, 11.2.0.3
- Note 1338981.1 - High Memory Usage in GIPC Code
This note talks about the following bugs:
, fixed in 11.2.0.2 GI PSU2, 11.2.0.3 and above
, fixed in 11.2.0.2 GI PSU4, 11.2.0.3 and above
- Note 1348202.1 - 11gR2 Grid Infrastructure CRSD High CPU Usage or Slow Command Response
This note talks about the following bugs:
is fixed in 11.2.0.2 GI PSU3, 11.2.0.3 and above
is fixed in 11.2.0.2 GI PSU4, 11.2.0.3 and above
is fixed in 11.2.0.2 GI PSU4, 11.2.0.3 and above
- Note 1455973.1 - 11gR2 Grid Infrastructure High CPU Usage by crsd.bin, ocssd.bin, evmd.bin, gipcd.bin etc. due to GIPC
is fixed in 11.2.0.4; request an interim patch if one is not available
E. ASM/Database fixes
- - diag0 high memory usage, fixed in 11.2.0.4, interim exists on top of certain patchset/PSU
Refer to Note 1376981.1 for more information
- - high "log file sync" or "asynch descriptor resize" waits, fixed in 11.2.0.4; interim patches exist on top of most patchsets/PSUs
- - higher CPU usage in 11.2 on AIX, fixed in 11.2.0.4. The interim patch on top of 11.2.0.3 does not conflict with the 11.2.0.3.1 PSU and can be applied on top of both the 11.2.0.3 base and the 11.2.0.3.1 PSU
- - instance hangs; fixed in 11.2.0.2 DB PSU4, 11.2.0.3
Refer to Note 1348264.1 for more information.
Check the availability of the interim patch for your release.
F. CSSD fix to avoid node eviction/reboot related issues
- - Threads do not always inherit the parent process's real-time priority
- - 11.2.0.3 GI node reboots if only one voting file exists
Refer to Note 1466639.1 for more information
- Unpublished Bug 17733927 - CSS CLIENTS TIMEOUT UNDER HEAVY CONNECTIVITY LOADS ON AIX
Refer to Note 1953101.1 - AIX: High CPU utilization for CSSD
G. EM agent high memory consumption on AIX (likely node will be rebooted)
- Note 1530102.1 - EM 12c: Agent emdprocstats.pl Consuming High Memory
- Note 1332522.1 - EM Agent On AIX Causes Page Space Issues
Appendix A: Data gathering
If the issue persists after the above recommendations are in place, collect the output of the following commands on all nodes as the root user:
# svmon -P -O unit=MB -O segment=category
# svmon -U -O unit=MB -O segment=category
# ps -elf
# vmstat 5 3
Also collect all other files requested in the following note:
Note 289690.1 - Data Gathering for Troubleshooting Oracle Clusterware (CRS or GI) and RAC Issues
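The four commands in Appendix A could be captured to files for upload with a small wrapper. This is a sketch under stated assumptions: the function names collect and gather_all and the output file names are hypothetical, and the commands are passed through exactly as listed in this note.

```sh
# Run a command and save its combined stdout/stderr to <dir>/<name>.out.
# Usage: collect <dir> <name> <command> [args...]
collect() {
    c_dir=$1
    c_name=$2
    shift 2
    mkdir -p "$c_dir" || return 1
    "$@" > "$c_dir/$c_name.out" 2>&1
}

# Run the commands listed in Appendix A and store their output under <dir>.
# Intended to be run as root on each AIX node.
gather_all() {
    g_dir=$1
    collect "$g_dir" svmon_P svmon -P -O unit=MB -O segment=category
    collect "$g_dir" svmon_U svmon -U -O unit=MB -O segment=category
    collect "$g_dir" ps_elf  ps -elf
    collect "$g_dir" vmstat  vmstat 5 3
}
```

For example, gather_all "/tmp/aix_diag_$(hostname)" on each node leaves one .out file per command, which can then be archived together with the files requested in Note 289690.1.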
Appendix B: Reference
Note 282036.1 - Minimum Software Versions and Patches Required to Support Oracle Products on IBM Power Systems
Note 811293.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)
Note 169706.1 - Oracle Database on AIX,HP-UX,Linux,Solaris.. Installation and Configuration Requirements Quick Reference
Note 1379753.1 - AIX: Link/Relink/Make Fails With: ld: 0711-780 SEVERE ERROR: Symbol .ksmpfpva (entry 58964) in object libserver11.a[ksmp.o]
Note 1470654.1 - Understanding Processor Utilization with IBM PowerVM
Note 1530943.1 - AIX: VIP and SCAN VIP fails to failover to other node after pulled cable on public network if LHEA is being used
Source: ITPUB blog, link: http://blog.itpub.net/20747382/viewspace-2146977/. Please credit the source when reposting; otherwise legal liability will be pursued.