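Installing a RAC 10202 Environment on Solaris 8 (Part 6)

Posted by yangtingkun on 2007-03-20

For a while I have been testing the installation of an Oracle 10g R2 RAC environment on Solaris. I ran into a lot of problems, but it finally worked, so here is a brief summary of the installation steps, the problems I hit, and how I solved them.

This part covers applying patch 5117016 to the Oracle RAC environment, and the bug that showed up after applying it.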

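For the operating system preparation, see Installing a RAC 10202 Environment on Solaris 8 (Part 1): http://yangtingkun.itpub.net/post/468/271797

For the Oracle Clusterware installation, see Installing a RAC 10202 Environment on Solaris 8 (Part 2): http://yangtingkun.itpub.net/post/468/271812

For the Oracle software installation and ASM configuration, see Installing a RAC 10202 Environment on Solaris 8 (Part 3): http://yangtingkun.itpub.net/post/468/272088

For creating the RAC database, see Installing a RAC 10202 Environment on Solaris 8 (Part 4): http://yangtingkun.itpub.net/post/468/272138

For installing the Oracle 10.2.0.2 patch set, see Installing a RAC 10202 Environment on Solaris 8 (Part 5): http://yangtingkun.itpub.net/post/468/272201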


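Anyone who has used 10.2.0.2 probably knows that it has a serious problem: LIBSERVER10.A INCORRECTLY LOCATED IN $ORACLE_HOME/RDBMS/LIB/

Because of this, patch 5117016 has to be applied before any other patch or upgrade can be installed on a 10.2.0.2 database.

So I patched this bug here as well.

Download p5117016_10202_SOLARIS64.zip, copy the file to both nodes, and unpack it: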

$ cd /data/patch/5117016
$ ls
p5117016_10202_SOLARIS64.zip
$ unzip p5117016_10202_SOLARIS64.zip
Archive: p5117016_10202_SOLARIS64.zip
creating: 5117016/
creating: 5117016/files/
creating: 5117016/etc/
creating: 5117016/etc/config/
inflating: 5117016/etc/config/inventory
inflating: 5117016/etc/config/actions
creating: 5117016/etc/xml/
inflating: 5117016/etc/xml/GenericActions.xml
inflating: 5117016/etc/xml/ShiphomeDirectoryStructure.xml
creating: 5117016/custom/
creating: 5117016/custom/scripts/
inflating: 5117016/custom/scripts/pre
inflating: 5117016/README.txt

Following the shutdown procedure given earlier in this series, stop all Oracle processes:

$ srvctl stop db -d testrac
$ srvctl stop asm -n racnode1
$ srvctl stop asm -n racnode2
$ srvctl stop listener -n racnode1
$ srvctl stop listener -n racnode2
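
For a cluster with more nodes, the same stop sequence can be written out mechanically. The following is only a minimal sketch: `print_stop_commands` is a hypothetical helper name, and it prints the commands instead of running them, so the list can be reviewed first.

```shell
# Hypothetical helper: print (without running) the srvctl stop commands
# for one database and a list of nodes, in the order used above:
# database first, then ASM on each node, then the listener on each node.
print_stop_commands() {
    db=$1
    shift
    echo "srvctl stop db -d $db"
    for node in "$@"; do
        echo "srvctl stop asm -n $node"
    done
    for node in "$@"; do
        echo "srvctl stop listener -n $node"
    done
}

print_stop_commands testrac racnode1 racnode2
```

With the database and node names from this article, the printed list is the same five commands shown above.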

If the agent or Enterprise Manager processes are running, stop them on both nodes with emctl stop agent and emctl stop dbconsole.

Then, as root, run the following on both nodes:

# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Mar 15 17:47:59.301 | INF | daemon shutting down
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

Use ps -ef to check that all Oracle-related processes have stopped.
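
Scanning the full ps -ef listing by eye is error-prone, so a filter can help. This is a rough sketch: `oracle_procs_remaining` is a hypothetical name, and the process-name patterns (ora_*, asm_*, tnslsnr, crsd.bin, ocssd.bin, evmd.bin) are my assumption about what a 10.2 RAC node typically runs. It also assumes a grep that accepts -E (on Solaris that would be /usr/xpg4/bin/grep rather than /usr/bin/grep).

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print only the
# lines that look like database, ASM, listener, or Clusterware processes.
# Exit status is that of grep: 0 if something matched, 1 if nothing did.
oracle_procs_remaining() {
    grep -E 'ora_[a-z0-9]+_|asm_[a-z0-9]+_|tnslsnr|crsd\.bin|ocssd\.bin|evmd\.bin'
}

# Usage on each node:
#   ps -ef | oracle_procs_remaining || echo "no Oracle processes found"
```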

Next, perform the following steps on each of the two nodes:

$ cd /data/patch/5117016/5117016
$ $ORACLE_HOME/OPatch/opatch apply -local
Invoking OPatch 10.2.0.2.0

Oracle interim Patch Installer version 10.2.0.2.0
Copyright (c) 2005, Oracle Corporation. All rights reserved..


Oracle Home : /data/oracle/product/10.2/database
Central Inventory : /data/oracle/oraInventory
from : /data/oracle/product/10.2/database/oraInst.loc
OPatch version : 10.2.0.2.0
OUI version : 10.2.0.2.0
OUI location : /data/oracle/product/10.2/database/oui
Log file location : /data/oracle/product/10.2/database/cfgtoollogs/opatch/opatch-2007_Mar_15_18-03-44-CST_Thu.log

ApplySession applying interim patch '5117016' to OH '/data/oracle/product/10.2/database'

You selected -local option, hence OPatch will patch the local system only.


Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/data/oracle/product/10.2/database')

Is the local system ready for patching?

Do you want to proceed? [y|n]
y
User Responded with: Y
Backing up files and inventory (not for auto-rollback) for the Oracle Home
Backing up files affected by the patch '5117016' for restore. This might take a while...
Backing up files affected by the patch '5117016' for rollback. This might take a while...
Execution of 'sh /data/patch/5117016/5117016/custom/scripts/pre -apply 5117016 ':

Return Code = 0

Patching component oracle.rdbms, 10.2.0.2.0...
Running make for target ioracle
ApplySession adding interim patch '5117016' to inventory

Verifying the update...
Inventory check OK: Patch ID 5117016 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 5117016 are present in Oracle Home.

The local system has been patched and can be restarted.


OPatch succeeded.
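
Besides the verification that OPatch prints itself, the inventory can be queried again later with opatch lsinventory. The sketch below assumes only that the patch ID appears as a separate word somewhere in that output; the exact layout of the listing is not relied on, and `patch_listed` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given patch ID occurs as a whole
# word in the `opatch lsinventory` output supplied on stdin.
patch_listed() {
    grep -w -- "$1" >/dev/null
}

# Usage on each node:
#   $ORACLE_HOME/OPatch/opatch lsinventory | patch_listed 5117016 \
#       && echo "5117016 is registered"
```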

Once both nodes are patched, restart the RAC environment as root:

# /etc/init.d/init.crs start
Startup will be queued to init within 30 seconds.

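After running this, wait a while and check that ASM, the database, and the listeners have all started normally. If they have, the installation is complete.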
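
One way to do this check is with the Clusterware status command crs_stat -t, which lists each resource with its target and state. The filter below is a sketch under two assumptions: crs_stat from the Clusterware home is on the PATH, and the State value is the fourth whitespace-separated field of each line. `offline_resources` is a hypothetical helper name.

```shell
# Hypothetical helper: read `crs_stat -t` output on stdin and print the
# name of every resource whose State column (assumed to be field 4)
# is OFFLINE.
offline_resources() {
    awk '$4 == "OFFLINE" { print $1 }'
}

# Usage:
#   crs_stat -t | offline_resources
# No output means no resource was reported OFFLINE.
```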

Frustratingly, though, a serious problem turned up after installing this patch:

The Oracle instance on racnode2 could no longer start:

$ sqlplus "/ as sysdba"

SQL*Plus: Release 10.2.0.2.0 - Production on Thu Mar 15 18:16:04 2007

Copyright (c) 1982, 2005, Oracle. All Rights Reserved.

Connected to an idle instance.

SQL> startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DISK/testrac/spfiletestrac.ora'
ORA-17503: ksfdopn:2 Failed to open file +DISK/testrac/spfiletestrac.ora
ORA-03113: end-of-file on communication channel

At the same time, the following error appears in the alert log:

Errors in file /data/oracle/admin/testrac/udump/testrac2_ora_4598.trc:
ORA-07445: exception encountered: core dump [kkxcms()+1160] [SIGSEGV] [Address not mapped to object] [0x000000168] [] []

A search on Metalink showed that this is an Oracle bug. Note: 390591.1, Subject: RAC instances cannot be started after applying 10.2.0.2 patchset

Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.2 to 10.2.0.2
This problem can occur on any platform.

Symptoms

After applying the 10.2.0.2 patchset the following problem occurs :
The instance can be started only on one node. This is the node where the Oracle Universal Installer was started.

The following messages occur on the offending nodes while trying to startup the DB using an spfile within sqlplus:

SQL> startup nomount
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+SISTEMA/pge/spfilepgec.ora'
ORA-17503: ksfdopn:2 Failed to open file +SISTEMA/pge/spfilepgec.ora
ORA-03113: end-of-file on communication channel
SQL>

and the following error can be seen in the alert log of the ASM instance:

Errors in file /opt/sw/app/oracle/admin/+ASM/udump/+asm1_ora_9677.trc:
ORA-07445: exception encountered: core dump [kkxsyn()+740] [SIGSEGV] [Address not mapped to object] [0x000000168] [] []
Wed Sep 6 18:56:08 2006
Trace dumping is performing id=[cdmp_20060906185608]

No error is dumped in the alert.log of the instance.

If a pfile is used an error message occurs in the alert log of the instance :


replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=23, OS id=928
Wed Sep 6 18:51:20 2006
Errors in file /opt/sw/app/oracle/admin/pge/udump/pgec2_ora_29898.trc:
ORA-07445: exception encountered: core dump [kkxcms()+1160] [SIGSEGV] [Address not mapped to object] [0x000000168] [] []
Wed Sep 6 18:51:22 2006
Trace dumping is performing id=[cdmp_20060906185122]

Cause

Installing the 10.2.0.2 patchset in a RAC installation on any Unix platform does not correctly update the libknlopt.a file on all nodes. The local node where the installer is run does update libknlopt.a but remote nodes do not get the updated file. This can lead to dumps or internal errors on the remote nodes if Oracle is subsequently relinked.

Solution

There are two solutions for this problem:

1) Manual copy of the "libknlopt.a" library to the offending nodes :

- ensure all instances are shut down
- manually copy $ORACLE_HOME/rdbms/lib/libknlopt.a from the local node to all remote nodes
- relink Oracle on all nodes :
make -f ins_rdbms.mk ioracle

2) Install the patchset on every node using the "-local" option:

On Unix:
runInstaller -updateNodeList -local ORACLE_HOME=$ORACLE_HOME CLUSTER_NODES=node1,node2,...

On Windows:
setup.exe -updateNodeList -local ORACLE_HOME=%ORACLE_HOME% CLUSTER_NODES=node1,node2,...

References

@ Bug 5128575 - Libknlopt.A In 10.2.0.2 Not Relinked In Rac Installation With Patched .O Modules

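I tested solution 1 from Oracle's note by copying the library to racnode2 and relinking manually there. Before doing this, make sure the Oracle database is shut down: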

$ rcp racnode1:$ORACLE_HOME/rdbms/lib/libknlopt.a $ORACLE_HOME/rdbms/lib/libknlopt.a
$ cd $ORACLE_HOME/rdbms/lib
$ /usr/ccs/bin/make -f ins_rdbms.mk ioracle
chmod 755 /data/oracle/product/10.2/database/bin

- Linking Oracle
rm -f /data/oracle/product/10.2/database/rdbms/lib/oracle
/usr/ccs/bin/ld -o /data/oracle/product/10.2/database/rdbms/lib/oracle -L/data/oracle/product/10.2/database/rdbms/lib/ -L/data/oracle/product/10.2/database/lib/ -dy /data/oracle/product/10.2/database/lib/prod/lib/v9/crti.o /data/oracle/product/10.2/database/lib/prod/lib/v9/crt1.o /data/oracle/product/10.2/database/rdbms/lib/opimai.o /data/oracle/product/10.2/database/rdbms/lib/ssoraed.o /data/oracle/product/10.2/database/rdbms/lib/ttcsoi.o /data/oracle/product/10.2/database/rdbms/lib/defopt.o -z allextract -lperfsrv10 -z defaultextract /data/oracle/product/10.2/database/lib/nautab.o /data/oracle/product/10.2/database/lib/naeet.o /data/oracle/product/10.2/database/lib/naect.o /data/oracle/product/10.2/database/lib/naedhs.o /data/oracle/product/10.2/database/rdbms/lib/config.o -lserver10 -lodm10 -lnnet10 -lskgxp10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lhasgen10 -lcore10 -lskgxn2 -locr10 -locrb10 -locrutl10 -lhasgen10 -lcore10 -lskgxn2 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lknlopt `if /usr/ccs/bin/ar tv /data/oracle/product/10.2/database/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap10" ; fi` -lslax10 -lpls10 -lplp10 -lserver10 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lknlopt -lslax10 -lpls10 -lplp10 -ljox10 -lserver10 -lclsra10 -ldbcfg10 -locijdbcst10 -lwwg `cat /data/oracle/product/10.2/database/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /data/oracle/product/10.2/database/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lmm -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /data/oracle/product/10.2/database/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /data/oracle/product/10.2/database/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 
-lnnz10 -lnl10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `if /usr/ccs/bin/ar tv /data/oracle/product/10.2/database/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo10"; fi` -lctxc10 -lctx10 -lzx10 -lgx10 -lctx10 -lzx10 -lgx10 -lordimt10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lsnls10 -lunls10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /data/oracle/product/10.2/database/lib/sysliblist` -R /opt/SUNWcluster/lib/sparcv9:/data/oracle/product/10.2/database/lib:/opt/ORCLcluster/lib/ -Y P,:/opt/SUNWcluster/lib/sparcv9:/opt/ORCLcluster/lib/:/usr/ccs/lib/sparcv9:/usr/lib/sparcv9 -Qy -lc -laio -lposix4 -lkstat -lm /data/oracle/product/10.2/database/lib/prod/lib/v9/crtn.o
mv -f /data/oracle/product/10.2/database/bin/oracle /data/oracle/product/10.2/database/bin/oracleO
mv /data/oracle/product/10.2/database/rdbms/lib/oracle /data/oracle/product/10.2/database/bin/oracle
chmod 6751 /data/oracle/product/10.2/database/bin/oracle
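
To confirm whether libknlopt.a really differs between the nodes, before copying or after, the checksums can be compared. This is a minimal sketch using the standard cksum utility. `same_cksum` is a hypothetical helper, and it assumes both copies are reachable as local paths, for example after fetching the racnode1 copy into a scratch directory with rcp.

```shell
# Hypothetical helper: succeed if the two files have the same cksum
# checksum and size. Reading via stdin keeps the file name out of the
# cksum output, so only checksum and byte count are compared.
same_cksum() {
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Usage on racnode2 (the /tmp path is only an example):
#   rcp racnode1:$ORACLE_HOME/rdbms/lib/libknlopt.a /tmp/libknlopt.a.racnode1
#   same_cksum /tmp/libknlopt.a.racnode1 $ORACLE_HOME/rdbms/lib/libknlopt.a \
#       && echo "libknlopt.a is identical on both nodes"
```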

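The problem persisted after relinking, possibly because the ASM instance had not been shut down. Rebooting the entire system resolved it.

My advice: do not apply patch 5117016 unless you really need to. Alternatively, consider upgrading the database directly to 10.2.0.3.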

Source: ITPUB Blog, link: http://blog.itpub.net/4227/viewspace-69208/. If you reproduce this article, please credit the source; otherwise legal action will be pursued.
