AIX: Top Things to DO NOW to Stabilize 11gR2 GI/RAC Cluster
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
IBM AIX on POWER Systems (64-bit)
PURPOSE
This note lists the top things to do to stabilize 11gR2 Grid Infrastructure and Real Application Clusters on IBM AIX. It focuses on issues that cause high memory usage, high CPU usage, and hangs.
SCOPE
This document is intended for Oracle Clusterware/RAC Database Administrators and Oracle support engineers.
DETAILS
A. Required OS Technology Level and Service Pack, and Recommended VM Setting
- The AIX level should be equal to or higher than the following (execute "/bin/oslevel -s" to confirm):
AIX 7.1 TL 00 SP1 ("7100-00-01"), 64-bit kernel
AIX 6.1 TL 02 SP1 ("6100-02-01"), 64-bit kernel
AIX 5.3 TL 09 SP1 ("5300-09-01"), 64-bit kernel
- Recommended Virtual Memory settings:
maxperm%=90
minperm%=3
maxclient%=90
strict_maxperm=0
strict_maxclient=1
lru_file_repage=0
page_steal_method=1 ### (changing this value requires a reboot to take effect)
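One rough way to compare a node against the recommended values above is to filter the tunable listing through a small check. The sketch below assumes the input arrives as one `name = value` pair per line, which is the format `vmo -a` is expected to print. The function name `check_vmo` is hypothetical and is not part of any Oracle or IBM tool.

```shell
#!/bin/sh
# Hypothetical helper: compare "name = value" lines on stdin against the
# recommended VM settings from section A. Prints one line per mismatch or
# missing tunable and returns non-zero if anything differs.
check_vmo() {
    awk '
    BEGIN {
        want["maxperm%"] = 90
        want["minperm%"] = 3
        want["maxclient%"] = 90
        want["strict_maxperm"] = 0
        want["strict_maxclient"] = 1
        want["lru_file_repage"] = 0
        want["page_steal_method"] = 1
    }
    # Assumed input format: optional leading blanks, then "name = value".
    $2 == "=" && ($1 in want) {
        seen[$1] = 1
        if ($3 != want[$1]) {
            printf "MISMATCH %s: current=%s recommended=%s\n", $1, $3, want[$1]
            bad = 1
        }
    }
    END {
        for (name in want)
            if (!(name in seen)) {
                printf "MISSING %s\n", name
                bad = 1
            }
        exit bad
    }'
}

# Example use on an AIX node. Assumption: "-F" makes vmo list restricted
# tunables as well; verify against the vmo documentation for your level.
#   vmo -F -a | check_vmo
```

A tunable that a given AIX level does not expose is reported as MISSING, so a MISSING line needs to be checked against the OS documentation before it is treated as a problem.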
B. USLA heap fix to reduce memory footprint for Oracle Server processes
- For AIX 6.1 TL07 SP02/AIX 7.1 TL01 SP02 or later, apply
- For AIX 6.1 TL07 or AIX 7.1 TL01, install AIX 6.1 TL-07 APAR IV09580, AIX 7.1 TL-01 APAR IV09541, and apply
- For other AIX level, apply , this will disable Oracle's online patching mechanism
- Note: as of 06/21/2012, fix for or are not included in any PSU and the interim patch is needed. Interim exists on top of most PSU, and on top of 11.2.0.3 does not conflict with 11.2.0.3.1 PSU and can be applied on top of both 11.2.0.3 base and 11.2.0.3.1 PSU.
- New connection can be slow to establish without fix for which is fixed in 11.2.0.4
@ if online patch exists, process startup/new connection will be slower: , duplicate
C. Other recommended OS fixes
- Note 1528452.1 - AIX 6.1 TL8 or 7.1 TL2: 11gR2 GI Second Node Fails to Join the Cluster as CRSD and EVMD are in INTERMEDIATE State
- Paging space growth leads to node failure/eviction:
64K paging takes place even when system RAM is available. The fix avoids unexpected paging space growth and node failure. The APARs for the various TL/SP levels are listed below:
TL/SP            Level string       APAR
6100 TL5         6100-05            IZ71603
6100 TL4 SP4     6100-04-04-1014    IZ71191
6100 TL3 SP4     6100-03-04-1014    IZ72031
6100 TL2 SP7     6100-02-07-1014    IZ71850
6100 TL1 SP8     6100-01-08-1014    IZ71987
5300 TL12        5300-12            IZ71460
5300 TL11 SP4    5300-11-04-1015    IZ73687
5300 TL10 SP4    5300-10-04-1015    IZ73754
5300 TL9 SP7     5300-09-07-1015    IZ73864
5300 TL8 SP10    5300-08-10-1015    IZ67445
For more info, refer to
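To find the paging-space APAR that corresponds to a node's technology level, the matrix above can be turned into a simple lookup. This is a minimal sketch: the function name `paging_apar_for_level` is hypothetical, and the mapping only covers the TLs listed in the matrix.

```shell
#!/bin/sh
# Hypothetical helper: map an "oslevel -s" string (e.g. 6100-04-04-1014)
# to the paging-space APAR listed in the matrix for that TL.
# Prints nothing and returns 1 when the TL is not in the matrix.
paging_apar_for_level() {
    case "$1" in
        6100-05*) echo IZ71603 ;;
        6100-04*) echo IZ71191 ;;
        6100-03*) echo IZ72031 ;;
        6100-02*) echo IZ71850 ;;
        6100-01*) echo IZ71987 ;;
        5300-12*) echo IZ71460 ;;
        5300-11*) echo IZ73687 ;;
        5300-10*) echo IZ73754 ;;
        5300-09*) echo IZ73864 ;;
        5300-08*) echo IZ67445 ;;
        *) return 1 ;;
    esac
}

# Example use on an AIX node. Assumption: "instfix -ik <APAR>" reports
# whether all filesets for that APAR are installed.
#   apar=$(paging_apar_for_level "$(oslevel -s)") && instfix -ik "$apar"
```

A level that is not in the matrix produces no output, which only means the matrix does not list it. It does not by itself show that the fix is present or absent.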
- gc block lost, IPC send timeout, or instance eviction
The VIOS server does not forward traffic from its VIO clients to the external network; interrupts do not reach the trunk adapter. The fix avoids SEA/VIO client hangs. The APARs for the various TL/SP levels are listed below:
TL/SP            Level string       APAR
7100 TL0 SP3     7100-00-03-1115    IZ97035
6100 TL6 SP5     6100-06-05-1115    IZ96155
6100 TL5 SP6     6100-05-06-1119    IZ97457
6100 TL4 SP10    6100-04-10-1119    IZ97605
5300 TL12 SP4    5300-12-04-1119    IZ98126
5300 TL11 SP7    5300-11-07-1119    IZ98424
For more info, refer to
- Other kernel hang fixes
* IZ91983 lockl performance issue, hang
For more info, refer to
* IV04047: shlap64 unable to process Oracle request leading to kernel hang
For more info, refer to
- Excessive CPU usage in LPAR in shared processor mode
If the LPAR is in shared processor mode, it may see excessive CPU usage without the following fix.
APARs for WAITPROC IDLE LOOPING CONSUMES CPU:
IV01111 AIX 6.1 TL05 if before SP08 (fixed in SP08)
IV06197 AIX 6.1 TL06 if before SP07 (fixed in SP07)
IV10172 AIX 6.1 TL07 if before SP02 (fixed in SP02)
IV09133 AIX 7.1 TL00 if before SP05 (fixed in SP05)
IV10484 AIX 7.1 TL01 if before SP02 (fixed in SP02)
This problem can affect POWER7 systems running any level of Ax720 firmware prior to Ax720_101, but updating to the latest available firmware is recommended. If required, AIX and firmware fixes can be obtained from IBM Support Fix Central.
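The "if before SPxx" rule in the WAITPROC APAR list can be checked mechanically from the `oslevel -s` string. The following is a minimal sketch that covers the AIX level only, not the firmware level. The function name `waitproc_apar_needed` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given an "oslevel -s" string, print the WAITPROC APAR
# from the list if the node is on a listed TL below the fixing SP.
# Prints nothing and returns 1 when no listed APAR applies.
# Note: plain POSIX sh has no local variables, so the names below are global.
waitproc_apar_needed() {
    # Require at least "RRRR-TL-SP".
    case "$1" in
        *-*-*) ;;
        *) return 1 ;;
    esac
    # Split "RRRR-TL-SP[-build]" into release, TL and SP.
    rel=${1%%-*}; rest=${1#*-}
    tl=${rest%%-*}; rest=${rest#*-}
    sp=${rest%%-*}
    case "$rel-$tl" in
        6100-05) apar=IV01111; fixed_sp=8 ;;
        6100-06) apar=IV06197; fixed_sp=7 ;;
        6100-07) apar=IV10172; fixed_sp=2 ;;
        7100-00) apar=IV09133; fixed_sp=5 ;;
        7100-01) apar=IV10484; fixed_sp=2 ;;
        *) return 1 ;;
    esac
    # Strip one leading zero from the two-digit SP so that "08" is never
    # interpreted as an octal number by the shell.
    sp=${sp#0}; sp=${sp:-0}
    if [ "$sp" -lt "$fixed_sp" ]; then
        echo "$apar"
    else
        return 1
    fi
}

# Example use on an AIX node:
#   waitproc_apar_needed "$(oslevel -s)" || echo "no listed WAITPROC APAR applies"
```

For example, `waitproc_apar_needed 6100-07-01-1141` prints `IV10172`, while `6100-07-02-1150` prints nothing because SP02 already contains the fix.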
- Crash in netinfo_unixdomnlist while running netstat
6100 TL6 SP6     6100-06-06-1140    IZ97166
6100 TL5 SP7     6100-05-07-1140    IZ97353
6100 TL4 SP11    6100-04-11-1140    IV00634
For more info, refer to
D. Apply the latest GI PSU to avoid known high resource consumption bugs
If you are running 11.2.0.3, apply 11.2.0.3 GI PSU8 ()
For 11.2.0.3, applying the above PSU fixes the following known bugs (note: it does not fix the bugs in Section D1)
- Note 1062676.1 - ORAAGENT or ORAROOTAGENT High Resource (CPU, Memory etc) Usage
Except , all others have been fixed in 11.2.0.2
is fixed in 11.2.0.2 GI PSU6, 11.2.0.3 GI PSU2, 11.2.0.4 and 12.1, interim exists on top of 11.2.0.3.1 GI PSU
- Note 1287709.1 - ocssd.bin High CPU Usage, Instance Crashes With ORA-29702 or ORA-29770 or ORA-29701 With "gipcWait failed with 16"
This note talks about which is fixed in 11.2.0.2 GI PSU3, 11.2.0.3
- Note 1338981.1 - High Memory Usage in GIPC Code
This note talks about the following bugs:
, fixed in 11.2.0.2 GI PSU2, 11.2.0.3 and above
, fixed in 11.2.0.2 GI PSU4, 11.2.0.3 and above
- Note 1348202.1 - 11gR2 Grid Infrastructure CRSD High CPU Usage or Slow Command Response
This note talks about the following bugs:
is fixed in 11.2.0.2 GI PSU3, 11.2.0.3 and above
is fixed in 11.2.0.2 GI PSU4, 11.2.0.3 and above
is fixed in 11.2.0.2 GI PSU4, 11.2.0.3 and above
- Note 1455973.1 - 11gR2 Grid Infrastructure High CPU Usage by crsd.bin, ocssd.bin, evmd.bin gipcd.bin etc due to GIPC
is fixed in 11.2.0.4, please request interim if it's not available
E. ASM/Database fixes
- - diag0 high memory usage, fixed in 11.2.0.4, interim exists on top of certain patchset/PSU
Refer to Note 1376981.1 for more information
- - high "log file sync" or "asynch descriptor resize" wait, fixed in 11.2.0.4, interim exists on top of most patchset/PSU
- - higher CPU usage in 11.2 on AIX, fixed in 11.2.0.4. The interim on top of 11.2.0.3 does not conflict with 11.2.0.3.1 PSU and can be applied on top of both 11.2.0.3 base and 11.2.0.3.1 PSU
- - instance hangs; fixed in 11.2.0.2 DB PSU4, 11.2.0.3
Refer to Note 1348264.1 for more information.
Please check the interim patch available status for your release.
F. CSSD fix to avoid node eviction/reboot related issues
- - Threads do not always inherit the parent process's real-time priority
- - 11.2.0.3 GI node reboot if only one voting file exists
Refer to Note 1466639.1 for more information
G. EM agent high memory consumption on AIX (the node is likely to be rebooted)
- Note 1530102.1 - EM 12c: Agent emdprocstats.pl Consuming High Memory
- Note 1332522.1 - EM Agent On AIX Causes Page Space Issues
Appendix A: Data gathering
If the issue still happens after the above recommendations are in place, collect the output of the following commands on all nodes as the root user:
# svmon -P -O unit=MB -O segment=category
# svmon -U -O unit=MB -O segment=category
# ps -elf
# vmstat 5 3
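To keep the output of these commands together for each node, they can be wrapped in a small collection script. This is only a sketch: the function names `collect` and `gather_all`, the output file names, and the default output directory are all hypothetical choices, and the commands themselves are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: run a command and save its stdout and stderr to
# <outdir>/<name>.out, followed by its exit status, so that one failing
# command does not stop the rest of the collection.
collect() {
    outdir=$1; name=$2; shift 2
    {
        echo "### command: $*"
        rc=0
        "$@" 2>&1 || rc=$?
        echo "### exit status: $rc"
    } > "$outdir/$name.out"
}

# Hypothetical wrapper: run the four commands from this appendix and print
# the directory that holds the results. The directory name is an assumption.
gather_all() {
    outdir=${1:-/tmp/aix_diag.$(hostname).$$}
    mkdir -p "$outdir" || return 1
    collect "$outdir" svmon_P svmon -P -O unit=MB -O segment=category
    collect "$outdir" svmon_U svmon -U -O unit=MB -O segment=category
    collect "$outdir" ps_elf ps -elf
    collect "$outdir" vmstat vmstat 5 3
    echo "$outdir"
}

# Run as root on every node, for example:
#   gather_all /tmp/aix_diag
```

Recording the exit status in each file makes it easier to tell an empty result from a command that failed, for instance because it was not run as root.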
Source: ITPUB blog, link: http://blog.itpub.net/628922/viewspace-1274402/. If you repost this article, please credit the source; otherwise legal liability will be pursued.