Analyzing 11.2.0.3 RAC CRS-1714: Unable to discover any voting files

Published by wisdomone1 on 2015-09-19

Conclusions:

   1. The RAC process dependency mechanism in 11.2.0.3 (and across Oracle versions generally) keeps evolving; understanding the dependencies among the RAC processes is critically important.
   2. CRS-1714: Unable to discover any voting files is only a surface symptom and does not necessarily mean the voting disk is really damaged; you have to analyze the corresponding logs to find the real cause.
   3. If the PROFILE.XML configuration file (held in the OLR) used by a node's GPNPD process is damaged, you may have to rebuild that node.
   4. When deleting or adding RAC nodes, read the official manual carefully, because it distinguishes many different scenarios.
  5. Most importantly, if you get stuck while analyzing the logs or have never hit a similar problem, search MOS for the relevant keywords, e.g. "GPnP profile" in this case.

Analysis process:

1. The CRSD process of a 2-node 11.2.0.4 RAC on Red Hat 6.4 would not start. The cluster alert log showed that no voting files could be discovered:
2015-09-16 16:53:36.138
[cssd(25059)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/grid/11.2.0.4/log/jingfa1/cssd/ocssd.log
2015-09-16 16:53:51.176
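Before digging further into the logs, a quick way (not shown in the original post, but using the same Grid home) to see how far the stack got on each node:

--overall clusterware status
/u01/grid/11.2.0.4/bin/crsctl check crs
--lower-stack (OHASD-managed) resources such as ora.cssd, ora.asm and ora.crsd
/u01/grid/11.2.0.4/bin/crsctl stat res -t -init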




2. Run the following command on both nodes to shut down all Oracle-related cluster processes:
/u01/grid/11.2.0.4/bin/crsctl stop crs






3. Confirm that all Oracle processes are down on both nodes:


ps -ef|grep d.bin
root      1077 24425  0 09:00 pts/1    00:00:00 grep d.bin


4. Start CRS on node 1 in exclusive mode, without starting CRSD:
/u01/grid/11.2.0.4/bin/crsctl start crs -excl -nocrs




5. On node 1, check whether the ASM instance has started (see the sketch below):
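The original post does not show the commands for this step; a minimal sketch of how it can be checked, assuming the grid user's environment points at the local ASM instance:

--look for the ASM background process
ps -ef | grep asm_pmon | grep -v grep
--or ask the instance directly
sqlplus / as sysasm
SQL> select instance_name, status from v$instance;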




6. On node 1, check whether the clusterware processes came up in exclusive mode (see the sketch below):
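Again no command is shown in the original; a minimal sketch:

--in exclusive mode ora.cssd and ora.asm should be ONLINE while ora.crsd stays OFFLINE (crs was started with -nocrs)
/u01/grid/11.2.0.4/bin/crsctl stat res -t -init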




7. On node 1, check whether the OCR is healthy:
/u01/grid/11.2.0.4/bin/ocrcheck






8. If the OCR is not healthy and a backup exists, the OCR can be restored from that backup:
/u01/grid/11.2.0.4/bin/ocrconfig -showbackup




/u01/grid/11.2.0.4/bin/ocrconfig -restore <ocr_backup_file>
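After a restore it is worth re-running the integrity check (a hedged follow-up, not shown in the original):

/u01/grid/11.2.0.4/bin/ocrcheck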




9. On node 1, as the grid user, check whether the OCR/voting-disk disk group still exists; it does:
  1* select disk_number,path from v$asm_disk
SQL> /


DISK_NUMBER PATH
----------- --------------------------------------------------
          0 /dev/ocr_vote
          0 /dev/data


SQL> 
SQL> 
SQL> show parameter disk_


NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      DATA
asm_diskstring                       string      /dev/*
SQL> select name,sector_size,block_size,allocation_unit_size/1024/1024 as au_mb from v$asm_diskgroup;


NAME                           SECTOR_SIZE BLOCK_SIZE      AU_MB
------------------------------ ----------- ---------- ----------
DATA                                   512       4096          2
OCRVOTE                                512       4096          2
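As an extra hedged check (not in the original), it also helps to see which of these disk groups are actually mounted in the exclusive-mode ASM instance:

SQL> select name, state from v$asm_diskgroup;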




10. On node 1, confirm whether the voting disk is really unusable; indeed, no voting files are discovered:
/u01/grid/11.2.0.4/bin/crsctl query css votedisk


11. Step 9 shows that asm_diskgroups lists only the DATA disk group and not OCRVOTE, so I adjusted the parameter so that the ASM instance mounts both OCRVOTE and DATA at startup; I expected this would also make the voting-disk disk group come up automatically when ASM starts:




alter system set asm_diskgroups=data,ocrvote sid='*';






show parameter disk_


12. Shut down the clusterware processes on node 1:
/u01/grid/11.2.0.4/bin/crsctl stop crs


13. Restart the clusterware on both nodes and check whether CRSD comes up; the problem persists, the voting files still cannot be discovered:
/u01/grid/11.2.0.4/bin/crsctl start crs


14. Shut down the clusterware on both nodes, then start it on node 1 in exclusive mode again:


/u01/grid/11.2.0.4/bin/crsctl stop crs


/u01/grid/11.2.0.4/bin/crsctl start crs -excl -nocrs


15. On node 1, point the voting disk back at the OCRVOTE disk group to repair it:
/u01/grid/11.2.0.4/bin/crsctl replace votedisk +ocrvote
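A hedged note: crsctl replace votedisk generally requires the target disk group to be mounted in the ASM instance, so if OCRVOTE did not mount automatically after the asm_diskgroups change in step 11, mount it first; a minimal sketch as the grid user:

sqlplus / as sysasm
SQL> alter diskgroup ocrvote mount;
SQL> exit
/u01/grid/11.2.0.4/bin/crsctl replace votedisk +OCRVOTE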


16. On node 1, check that the voting disk is now visible:
/u01/grid/11.2.0.4/bin/crsctl query css votedisk


17. Shut down the clusterware, then restart it on both nodes:
/u01/grid/11.2.0.4/bin/crsctl stop crs


/u01/grid/11.2.0.4/bin/crsctl start crs






 18. On both nodes, confirm whether the voting disk is working (the command below only returns output once CRSD is up, otherwise it is empty, and CRSD is the last of the cluster processes to start). Node 1 is now fine, but CRSD still will not start on node 2:
 /u01/grid/11.2.0.4/bin/crsctl query css votedisk




19. Looking at node 2's trace files (grid user), the cluster GUID stored in the voting disk does not match the one in node 2's GPnP profile, which is why node 2 ultimately cannot discover the voting disk:
2015-09-16 17:58:51.847: [    CSSD][1851041536]clssnmvDiskVerify: discovered a potential voting file
2015-09-16 17:58:51.847: [   SKGFD][1851041536]Handle 0x7fd95808f980 from lib :UFS:: for disk :/dev/ocr_vote:




 ---here CSSD finds that the cluster GUID in the voting disk does not match the GUID from the GPnP profile
2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmvDiskCreate: Cluster guid 0acef774f25dcfb0bf3d0c7b3db02abe found in voting disk /dev/ocr_vote does not match with the 
cluster guid 7d8026436ade6fe0ff597a0f6df497e1 obtained from the GPnP profile
--the voting disk is dropped from the discovery list
2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmvDiskDestroy: removing the voting disk /dev/ocr_vote
2015-09-16 17:58:51.965: [   SKGFD][1851041536]Lib :UFS:: closing handle 0x7fd95808f980 for disk :/dev/ocr_vote:
--no voting files are found
2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmvDiskVerify: Successful discovery of 0 disks
2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmvFindInitialConfigs: No voting files found
2015-09-16 17:58:51.965: [    CSSD][1851041536](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
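A quick, hedged way to pull the cluster GUID straight out of each node's GPnP profile and compare them (the profile paths are the ones located in step 22 below):

--on node 2, as the grid user
grep -o 'ClusterUId="[^"]*"' /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml
--run the same command on node 1 against .../gpnp/jingfa1/profiles/peer/profile.xml and compare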


21. Let's look at what the GPnP daemon is on node 2:
[grid@jingfa2 jingfa2]$ ps -ef|grep -i gpnp
grid      5238 32255  0 10:02 pts/1    00:00:00 grep -i gpnp
grid     18060     1  0 09:45 ?        00:00:01 /u01/grid/11.2.0.4/bin/gpnpd.bin


22. On node 2, find where the GPnP profile file lives:
[grid@jingfa2 gpnpd]$ locate gpnp|grep -i --color profile
/u01/grid/11.2.0.4/gpnp/profiles
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/pending.xml
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.old
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml  --this is most likely the file in question
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile_orig.xml
/u01/grid/11.2.0.4/gpnp/profiles/peer
/u01/grid/11.2.0.4/gpnp/profiles/peer/profile.xml
/u01/grid/11.2.0.4/gpnp/profiles/peer/profile_orig.xml




23. Inspect node 2's GPnP profile. In /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml the GUID 7d8026436ade6fe0ff597a0f6df497e1 appears, so this is indeed the file.
    Comparing it with the same file on node 1, which contains 0acef774f25dcfb0bf3d0c7b3db02abe, I decided to try updating the GUID by hand, replacing 7d8026436ade6fe0ff597a0f6df497e1 with 0acef774f25dcfb0bf3d0c7b3db02abe.


0acef774f25dcfb0bf3d0c7b3db02abe  --the cluster GUID found in node 1's profile.xml


[grid@jingfa2 gpnpd]$ more /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml|grep -i --color 7d8026436ade6fe0ff597a0f6df497e1
<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0" xmlns=" xmlns:gpnp=" 
xmlns:orcl=" xmlns:xsi=" xsi:schemaLocation=" /> gpnp-profile.xsd" ProfileSequence="7" ClusterUId="7d8026436ade6fe0ff597a0f6df497e1" ClusterName="jingfa-scan" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork id="gen" 
HostName="*"><gpnp:Network id="net1" IP="192.168.0.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="10.0.0.0" Adapter="eth1" 
Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" 
LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="/dev/ocr*" 
SPFile="+OCRVOTE/jingfa-scan/asmparameterfile/registry.253.849167179"/><ds:Signature 
xmlns:ds=" Algorithm=" /> <ds:SignatureMethod Algorithm=" URI=""><ds:Transforms>
<ds:Transform Algorithm=" Algorithm=" /> <InclusiveNamespaces xmlns=" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms>
<ds:DigestMethod Algorithm=" /> </ds:SignedInfo><ds:SignatureValue>Ca56sx6DgsCSxrRqPz2ReOzhkf9eYiqVYuj2XLadwuBURX2PL+nYD7LhLFFj27EpuSIx0SfGVhOPm/i016ws7tWATeSKBJDVyTAELgBEYPsMumW4vKm7rVXs
SbVJolycA3pFHtGqZ7FZjzSXxdj5Xq4LlBLGVWR3gYKnqxuRGv0=</ds:SignatureValue>
</ds:Signature></gpnp:GPnP-Profile>
[grid@jingfa2 gpnpd]$ 


24. Before editing, back up this file on node 2:
cp /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml  /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml.20150917bak


vi /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml


:s/7d8026436ade6fe0ff597a0f6df497e1/0acef774f25dcfb0bf3d0c7b3db02abe/g


Then save the file.
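A hedged non-interactive alternative that keeps its own backup via the -i suffix:

sed -i.bak 's/7d8026436ade6fe0ff597a0f6df497e1/0acef774f25dcfb0bf3d0c7b3db02abe/g' /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml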




25. Restart the clusterware on node 2. The clusterware on node 1 restarted as well, and oddly the change I made in step 24 had been reverted to its original content. I forced the edit again and restarted node 2's clusterware once more.
     After several attempts it became clear that the GPnP daemon restores this file, so editing it by hand gets you nowhere.
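To confirm that the daemon really did rewrite the file, a simple hedged check against the backup taken in step 24:

diff /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml.20150917bak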


26. Since the approach above does not work, try another angle: compare the agent processes running on the two nodes.


[root@jingfa1 ~]# ps -ef|grep agent|grep grid|grep -v grep
grid      3647     1  0 09:44 ?        00:00:10 /u01/grid/11.2.0.4/bin/oraagent.bin
root      3660     1  0 09:44 ?        00:00:36 /u01/grid/11.2.0.4/bin/orarootagent.bin
grid      5793     1  0 09:45 ?        00:00:01 /u01/grid/11.2.0.4/bin/scriptagent.bin
oracle    5938     1  0 09:45 ?        00:00:20 /u01/grid/11.2.0.4/bin/oraagent.bin
grid     23427     1  0 09:43 ?        00:00:16 /u01/grid/11.2.0.4/bin/oraagent.bin
root     23656     1  0 09:43 ?        00:00:39 /u01/grid/11.2.0.4/bin/orarootagent.bin
root     23818     1  0 09:43 ?        00:00:19 /u01/grid/11.2.0.4/bin/cssdagent     


[grid@jingfa2 ctssd]$  ps -ef|grep agent|grep grid|grep -v grep
root     17274     1  0 11:31 ?        00:00:01 /u01/grid/11.2.0.4/bin/cssdagent
grid     31975     1  0 11:21 ?        00:00:01 /u01/grid/11.2.0.4/bin/oraagent.bin
root     32064     1  0 11:21 ?        00:00:00 /u01/grid/11.2.0.4/bin/orarootagent.bin
[grid@jingfa2 ctssd]$ 


27. I found an article on Baidu and decided to simply copy profile.xml from node 1 to node 2 to replace it.


--shut down the clusterware on both nodes
/u01/grid/11.2.0.4/bin/crsctl stop crs


--back up node 2's profile.xml
cd /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/
cp profile.xml profile.xml.20150918bak




--copy profile.xml from node 1 to node 2 to replace it
rm profile.xml


cd /u01/grid/11.2.0.4/gpnp/jingfa1/profiles/peer
scp profile.xml grid@192.168.0.31:/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer


28. Start the clusterware on node 2; the same error persists:
/u01/grid/11.2.0.4/bin/crsctl start crs


29. Node 2's gpnpd.log shows that profile.xml is populated from the local OLR, so I tried deleting node 2's local OLR, hoping the GPnP daemon would then fetch the profile from node 1.
/u01/grid/11.2.0.4/bin/ocrcheck -local
Status of Oracle Local Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2508
         Available space (kbytes) :     259612
         ID                       : 1774858304
         Device/File Name         : /u01/grid/11.2.0.4/cdata/jingfa2.olr
                                    Device/File integrity check succeeded


         Local registry integrity check succeeded


         Logical corruption check succeeded


--back up node 2's OLR file
cp  /u01/grid/11.2.0.4/cdata/jingfa2.olr /u01/grid/11.2.0.4/cdata/jingfa2.olr.20150918bak




--delete node 2's OLR file
rm -rf  /u01/grid/11.2.0.4/cdata/jingfa2.olr


/u01/grid/11.2.0.4/bin/crsctl stop crs


--start the clusterware on node 2
/u01/grid/11.2.0.4/bin/crsctl start crs


---after removing node 2's OLR file outright, the whole cluster stack on node 2 refuses to start
[ohasd(6836)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-26: Error while accessing the
 physical storage Operating System error [No such file or directory] [2]]. Details at (:OHAS00106:) in /u01/grid/11.2.0.4/log/jingfa2/ohasd/ohasd.log.
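If you end up here, the copy made earlier in this step can simply be put back (a hedged recovery note; ocrconfig -local -restore with an installation-time OLR backup is the supported alternative, if such a backup exists):

cp /u01/grid/11.2.0.4/cdata/jingfa2.olr.20150918bak /u01/grid/11.2.0.4/cdata/jingfa2.olr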


30. The official documentation explains the OLR concept: each RAC node keeps a local registry holding only that node's node-specific resource information, so each node's OLR file is different.
Clusterware Administration and Deployment Guide
3 Managing Oracle Cluster Registry and Voting Disks
In Oracle Clusterware 11g release 2 (11.2), each node in a cluster has a local registry for node-specific resources, called an Oracle Local Registry (OLR), 
that is installed and configured when Oracle Clusterware installs OCR


31. After checking MOS, it looks like the only option left is to rebuild node 2:
Cluster guid found in voting disk does not match with the cluster guid obtained from the GPnP profile (Doc ID 1281791.1)


32. On node 1, try to delete node 2 from the cluster, but it reports that node 2 cannot be found:
[root@jingfa1 ~]# /u01/grid/11.2.0.4/bin/crsctl delete node -n jingfa2
CRS-4660: Could not find node jingfa2 to delete.
CRS-4000: Command Delete failed, or completed with errors.


33. On node 2, update the node list to reflect that node 2 is being removed:
[grid@jingfa2 ~]$ /u01/grid/11.2.0.4/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=jingfa2" crs=true -silent -local
Starting Oracle Universal Installer...


Checking swap space: must be greater than 500 MB.   Actual 4094 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /home/grid/oraInventory
'UpdateNodeList' was successful.


34. On node 2, deinstall the local Grid Infrastructure configuration:
[grid@jingfa2 ~]$ /u01/grid/11.2.0.4/deinstall/deinstall -local
Checking for required files and bootstrapping ...
Please wait ...
Location of logs /home/grid/oraInventory/logs/


############ ORACLE DEINSTALL & DECONFIG TOOL START ############




######################### CHECK OPERATION START #########################
## [START] Install check configuration ##




Checking for existence of the Oracle home location /u01/grid/11.2.0.4
Oracle Home type selected for deinstall is: Oracle Grid Infrastructure for a Cluster
Oracle Base selected for deinstall is: /u01/app/grid
Checking for existence of central inventory location /home/grid/oraInventory
Checking for existence of the Oracle Grid Infrastructure home /u01/grid/11.2.0.4
The following nodes are part of this cluster: jingfa2
Checking for sufficient temp space availability on node(s) : 'jingfa2'


## [END] Install check configuration ##


Traces log file: /home/grid/oraInventory/logs//crsdc.log
Enter an address or the name of the virtual IP used on node "jingfa2"[jingfa2-vip]
 > 


The following information can be collected by running "/sbin/ifconfig -a" on node "jingfa2"
Enter the IP netmask of Virtual IP "192.168.0.23" on node "jingfa2"[255.255.255.0]
 > 


Enter the network interface name on which the virtual IP address "192.168.0.23" is active
 > 


Enter an address or the name of the virtual IP[]
 > 




Network Configuration check config START


Network de-configuration trace file location: /home/grid/oraInventory/logs/netdc_check2015-09-19_08-30-44-AM.log


Specify all RAC listeners (do not include SCAN listener) that are to be de-configured [LISTENER,LISTENER_SCAN1]:


Network Configuration check config END


Asm Check Configuration START


ASM de-configuration trace file location: /home/grid/oraInventory/logs/asmcadc_check2015-09-19_08-31-13-AM.log




######################### CHECK OPERATION END #########################




####################### CHECK OPERATION SUMMARY #######################
Oracle Grid Infrastructure Home is: /u01/grid/11.2.0.4
The cluster node(s) on which the Oracle home deinstallation will be performed are:jingfa2
Since -local option has been specified, the Oracle home will be deinstalled only on the local node, 'jingfa2', and the global configuration will be removed.
Oracle Home selected for deinstall is: /u01/grid/11.2.0.4
Inventory Location where the Oracle home registered is: /home/grid/oraInventory
Following RAC listener(s) will be de-configured: LISTENER,LISTENER_SCAN1
Option -local will not modify any ASM configuration.
Do you want to continue (y - yes, n - no)? [n]: y
A log of this session will be written to: '/home/grid/oraInventory/logs/deinstall_deconfig2015-09-19_08-26-31-AM.out'
Any error messages from this session will be written to: '/home/grid/oraInventory/logs/deinstall_deconfig2015-09-19_08-26-31-AM.err'


######################## CLEAN OPERATION START ########################
ASM de-configuration trace file location: /home/grid/oraInventory/logs/asmcadc_clean2015-09-19_08-31-36-AM.log
ASM Clean Configuration END


Network Configuration clean config START


Network de-configuration trace file location: /home/grid/oraInventory/logs/netdc_clean2015-09-19_08-31-36-AM.log


De-configuring RAC listener(s): LISTENER,LISTENER_SCAN1


De-configuring listener: LISTENER
    Stopping listener on node "jingfa2": LISTENER
    Warning: Failed to stop listener. Listener may not be running.
Listener de-configured successfully.


De-configuring listener: LISTENER_SCAN1
    Stopping listener on node "jingfa2": LISTENER_SCAN1
    Warning: Failed to stop listener. Listener may not be running.
Listener de-configured successfully.


De-configuring backup files...
Backup files de-configured successfully.


The network configuration has been cleaned up successfully.


Network Configuration clean config END




---------------------------------------->
At this point the tool asks you to run the following script as root on node 2:
The deconfig command below can be executed in parallel on all the remote nodes. Execute the command on  the local node after the execution completes on all the remote nodes.


Run the following command as the root user or the administrator on node "jingfa2".


/tmp/deinstall2015-09-19_08-24-22AM/perl/bin/perl -I/tmp/deinstall2015-09-19_08-24-22AM/perl/lib -I/tmp/deinstall2015-09-19_08-24-22AM/crs/install /tmp/deinstall2015-09-19_08-24-22AM/crs/install/rootcrs.pl -force  -deconfig -paramfile "/tmp/deinstall2015-09-19_08-24-22AM/response/deinstall_Ora11g_gridinfrahome1.rsp"


Press Enter after you finish running the above commands


<----------------------------------------


35. On node 2, run the script from the prompt above as the root user:
[root@jingfa2 ~]# /tmp/deinstall2015-09-19_08-24-22AM/perl/bin/perl -I/tmp/deinstall2015-09-19_08-24-22AM/perl/lib -I/tmp/deinstall2015-09-19_08-24-22AM/crs/install /tmp/deinstall2015-09-19_08-24-22AM/crs/install/rootcrs.pl -force  -deconfig -paramfile "/tmp/deinstall2015-09-19_08-24-22AM/response/deinstall_Ora11g_gridinfrahome1.rsp"
Using configuration parameter file: /tmp/deinstall2015-09-19_08-24-22AM/response/deinstall_Ora11g_gridinfrahome1.rsp


PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd


CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'jingfa2'
CRS-2673: Attempting to stop 'ora.crf' on 'jingfa2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'jingfa2'
CRS-2677: Stop of 'ora.crf' on 'jingfa2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'jingfa2'
CRS-2677: Stop of 'ora.mdnsd' on 'jingfa2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'jingfa2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'jingfa2'
CRS-2677: Stop of 'ora.gpnpd' on 'jingfa2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'jingfa2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node




36. Back on node 2, let the deinstall session from step 34 finish (press Enter):
Remove the directory: /tmp/deinstall2015-09-19_08-24-22AM on node: 
Setting the force flag to false
Setting the force flag to cleanup the Oracle Base
Oracle Universal Installer clean START


Detach Oracle home '/u01/grid/11.2.0.4' from the central inventory on the local node : Done


Delete directory '/u01/grid/11.2.0.4' on the local node : Done


Failed to delete the directory '/u01/app/grid'. The directory is in use.
Delete directory '/u01/app/grid' on the local node : Failed <<<<


Oracle Universal Installer cleanup completed with errors.


Oracle Universal Installer clean END




## [START] Oracle install clean ##


Clean install operation removing temporary directory '/tmp/deinstall2015-09-19_08-24-22AM' on node 'jingfa2'


## [END] Oracle install clean ##




######################### CLEAN OPERATION END #########################




####################### CLEAN OPERATION SUMMARY #######################
Following RAC listener(s) were de-configured successfully: LISTENER,LISTENER_SCAN1
Oracle Clusterware is stopped and successfully de-configured on node "jingfa2"
Oracle Clusterware is stopped and de-configured successfully.
Successfully detached Oracle home '/u01/grid/11.2.0.4' from the central inventory on the local node.
Successfully deleted directory '/u01/grid/11.2.0.4' on the local node.
Failed to delete directory '/u01/app/grid' on the local node.
Oracle Universal Installer cleanup completed with errors.


Oracle deinstall tool successfully cleaned up temporary directories.
#######################################################################




############# ORACLE DEINSTALL & DECONFIG TOOL END #############


[grid@jingfa2 ~]$ 




37. On node 1, update the cluster node list in the Grid home inventory:
[grid@jingfa1 ~]$ /u01/grid/11.2.0.4/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=jingfa1" crs=true -silent
Starting Oracle Universal Installer...


Checking swap space: must be greater than 500 MB.   Actual 4094 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /home/grid/oraInventory
'UpdateNodeList' was successful.


38. On node 1, as the oracle user, update the node list for the database home as well:
[oracle@jingfa1 ~]$ /u01/app/oracle/product/11.2.0.4/db_1/oui/bin/runInstaller  -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=jingfa1"
Starting Oracle Universal Installer...


Checking swap space: must be greater than 500 MB.   Actual 4094 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /home/grid/oraInventory
'UpdateNodeList' was successful.






39. Confirm that node 2 has been removed from the cluster:
[grid@jingfa1 ~]$ /u01/grid/11.2.0.4/bin/cluvfy stage -post nodedel -n jingfa2


Performing post-checks for node removal 


Checking CRS integrity...


Clusterware version consistency passed


CRS integrity check passed


Node removal check passed


Post-check for node removal was successful. 




40. Prepare to add node 2 back to the cluster from node 1; first verify whether node 2 can be added by running the following command:
/u01/grid/11.2.0.4/bin/cluvfy stage -pre nodeadd -n jingfa2


 If the error below appears, simply re-establish SSH user equivalence between the nodes (a sketch follows the error):
ERROR: 
PRVF-7610 : Cannot verify user equivalence/reachability on existing cluster nodes
Verification cannot proceed
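A minimal, hedged sketch of re-creating passwordless SSH for the grid user by hand (run on jingfa1, then repeat in the other direction on jingfa2):

ssh-keygen -t rsa
ssh-copy-id grid@jingfa2
--verify that no password prompt appears
ssh jingfa2 date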




ERROR: 
PRVF-7617 : Node connectivity between "jingfa1 : 192.168.0.21" and "jingfa1 : 192.168.0.22" failed
TCP connectivity check failed for subnet "192.168.0.0"




Cause of the connectivity error: the Linux firewall was not disabled.


Just stop the firewall:
service iptables save
service iptables stop
chkconfig iptables off


41. On node 1, re-run the verification that node 2 can be added to the cluster; in essence this checks that node 2's hardware and software environment meets the clusterware requirements:
user equivalence, required packages, NTP, DNS, multicast, and so on.
[grid@jingfa1 ~]$ /u01/grid/11.2.0.4/bin/cluvfy stage -pre nodeadd -n jingfa2


Performing pre-checks for node addition 


Checking node reachability...
Node reachability check passed from node "jingfa1"




Checking user equivalence...
User equivalence check passed for user "grid"


Checking node connectivity...


Checking hosts config file...


Verification of the hosts config file successful


Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
TCP connectivity check passed for subnet "192.168.0.0"


Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.0.0".
Subnet mask consistency check passed.


Node connectivity check passed


Checking multicast communication...


Checking subnet "192.168.0.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.0.0" for multicast communication with multicast group "230.0.1.0" passed.


Check of multicast communication passed.


Checking CRS integrity...


Clusterware version consistency passed


CRS integrity check passed


Checking shared resources...


Checking CRS home location...
"/u01/grid/11.2.0.4" is shared
Shared resources check for node addition passed




Checking node connectivity...


Checking hosts config file...


Verification of the hosts config file successful


Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
TCP connectivity check passed for subnet "192.168.0.0"




Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"
TCP connectivity check passed for subnet "10.0.0.0"


Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.0.0".
Subnet mask consistency check passed for subnet "10.0.0.0".
Subnet mask consistency check passed.


Node connectivity check passed


Checking multicast communication...


Checking subnet "192.168.0.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.0.0" for multicast communication with multicast group "230.0.1.0" passed.


Checking subnet "10.0.0.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "10.0.0.0" for multicast communication with multicast group "230.0.1.0" passed.


Check of multicast communication passed.
Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "jingfa2:/u01/grid/11.2.0.4,jingfa2:/tmp"
Free disk space check failed for "jingfa1:/u01/grid/11.2.0.4,jingfa1:/tmp"
Check failed on nodes: 
        jingfa1
Check for multiple users with UID value 1101 passed 
User existence check passed for "grid"
Run level check passed
Hard limits check passed for "maximum open file descriptors"
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make"
Package existence check passed for "binutils"
Package existence check passed for "gcc(x86_64)"
Package existence check passed for "libaio(x86_64)"
Package existence check passed for "glibc(x86_64)"
Package existence check passed for "compat-libstdc++-33(x86_64)"
Package existence check passed for "elfutils-libelf(x86_64)"
Package existence check passed for "elfutils-libelf-devel"
Package existence check passed for "glibc-common"
Package existence check passed for "glibc-devel(x86_64)"
Package existence check passed for "glibc-headers"
Package existence check passed for "gcc-c++(x86_64)"
Package existence check passed for "libaio-devel(x86_64)"
Package existence check passed for "libgcc(x86_64)"
Package existence check passed for "libstdc++(x86_64)"
Package existence check passed for "libstdc++-devel(x86_64)"
Package existence check passed for "sysstat"
Package existence check failed for "pdksh"
Check failed on nodes: 
        jingfa2,jingfa1
Package existence check passed for "expat(x86_64)"
Check for multiple users with UID value 0 passed 
Current group ID check passed


Starting check for consistency of primary group of root user


Check for consistency of root user's primary group passed


Checking OCR integrity...


OCR integrity check passed


Checking Oracle Cluster Voting Disk configuration...


Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed


Starting Clock synchronization checks using Network Time Protocol(NTP)...


NTP Configuration file check started...
NTP Configuration file check passed
No NTP Daemons or Services were found to be running
PRVF-5507 : NTP daemon or service is not running on any node but NTP configuration file exists on the following node(s):
jingfa2,jingfa1
Clock synchronization check using Network Time Protocol(NTP) failed




User "grid" is not part of "root" group. Check passed
Checking consistency of file "/etc/resolv.conf" across nodes


File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: jingfa1,jingfa2


File "/etc/resolv.conf" is not consistent across nodes




Pre-check for node addition was unsuccessful on all the nodes. 
[grid@jingfa1 ~]$ 




42. Add node 2 from node 1:


[grid@jingfa1 ~]$ /u01/grid/11.2.0.4/oui/bin/addNode.sh "CLUSTER_NEW_NODES={jingfa2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={jingfa2-vip}"


similar output omitted
ERROR: 
PRVF-7617 : Node connectivity between "jingfa1 : 192.168.0.21" and "jingfa1 : 192.168.0.24" failed


ERROR: 
PRVF-7617 : Node connectivity between "jingfa1 : 192.168.0.21" and "jingfa1 : 192.168.0.23" failed
TCP connectivity check failed for subnet "192.168.0.0"


Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
PRVF-10209 : VIPs "jingfa2-vip" are active before Clusterware installation




43. Following the hint from step 42, stop and remove node 2's VIP resource (ora.jingfa2.vip):
[root@jingfa1 ~]# /u01/grid/11.2.0.4/bin/crsctl stop res  ora.jingfa2.vip
CRS-2673: Attempting to stop 'ora.jingfa2.vip' on 'jingfa1'
CRS-2677: Stop of 'ora.jingfa2.vip' on 'jingfa1' succeeded


[root@jingfa1 ~]# /u01/grid/11.2.0.4/bin/crsctl delete res  ora.jingfa2.vip


Confirm that node 2's VIP resource has been cleaned up:
[root@jingfa1 ~]# /u01/grid/11.2.0.4/bin/crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       jingfa1                                      
unrelated output omitted
ora.jingfa1.vip
      1        ONLINE  ONLINE       jingfa1                                      
ora.oc4j
      1        ONLINE  ONLINE       jingfa1                                      
ora.scan1.vip
      1        ONLINE  ONLINE       jingfa1




44. Ignore the pre-add-node checks and add node 2 from node 1:
[grid@jingfa1 ~]$ export IGNORE_PREADDNODE_CHECKS=Y
[grid@jingfa1 ~]$ /u01/grid/11.2.0.4/oui/bin/addNode.sh "CLUSTER_NEW_NODES={jingfa2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={jingfa2-vip}"
Starting Oracle Universal Installer...


Checking swap space: must be greater than 500 MB.   Actual 4067 MB    Passed
Oracle Universal Installer, Version 11.2.0.3.0 Production
Copyright (C) 1999, 2011, Oracle. All rights reserved.




Performing tests to see whether nodes jingfa2 are available
............................................................... 100% Done.


.
-----------------------------------------------------------------------------
Cluster Node Addition Summary
Global Settings
   Source: /u01/grid/11.2.0.4
   New Nodes
Space Requirements
   New Nodes
      jingfa2
         /: Required 5.01GB : Available 8.51GB
Installed Products
   Product Names
      Oracle Grid Infrastructure 11.2.0.3.0 
      Sun JDK 1.5.0.30.03 
      Installer SDK Component 11.2.0.3.0 
      similar output omitted


      SQL*Plus 11.2.0.3.0 
      Oracle Netca Client 11.2.0.3.0 
      Oracle Net 11.2.0.3.0 
      Oracle JVM 11.2.0.3.0 
      Oracle Internet Directory Client 11.2.0.3.0 
      Oracle Net Listener 11.2.0.3.0 
      Cluster Ready Services Files 11.2.0.3.0 
      Oracle Database 11g 11.2.0.3.0 
-----------------------------------------------------------------------------




Instantiating scripts for add node (Saturday, September 19, 2015 10:53:28 AM GMT+08:00)
.                                                                 1% Done.
Instantiation of add node scripts complete


Copying to remote nodes (Saturday, September 19, 2015 10:53:33 AM GMT+08:00)
...............................................................................................                                 96% Done.
Home copied to new nodes


Saving inventory on nodes (Saturday, September 19, 2015 11:05:15 AM GMT+08:00)
.                                                               100% Done.
Save inventory complete
WARNING:A new inventory has been created on one or more nodes in this session. However, it has not yet been registered as the central inventory of this system. 
To register the new inventory please run the script at '/home/grid/oraInventory/orainstRoot.sh' with root privileges on nodes 'jingfa2'.
If you do not register the inventory, you may not be able to update or patch the products you installed.
The following configuration scripts need to be executed as the "root" user in each new cluster node. Each script in the list below is followed by a list of nodes.
/home/grid/oraInventory/orainstRoot.sh #On nodes jingfa2
/u01/grid/11.2.0.4/root.sh #On nodes jingfa2
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts in each cluster node
    
The Cluster Node Addition of /u01/grid/11.2.0.4 was successful.
Please check '/tmp/silentInstall.log' for more details.
[grid@jingfa1 ~]$ 


45. On node 2, as root, run the scripts listed in step 44:
[root@jingfa2 ~]# /home/grid/oraInventory/orainstRoot.sh
Changing permissions of /home/grid/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.


Changing groupname of /home/grid/oraInventory to oinstall.
The execution of the script is complete.
[root@jingfa2 ~]# /u01/grid/11.2.0.4/root.sh
Performing root user operation for Oracle 11g 


The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/grid/11.2.0.4


Enter the full pathname of the local bin directory: [/usr/local/bin]: 
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.




Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/11.2.0.4/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
OLR initialization - successful
Adding Clusterware entries to upstart
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node jingfa1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
[root@jingfa2 ~]# 
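With root.sh finished on node 2, a hedged final round of checks to confirm the node is back and the voting files are visible from both nodes:

--cluster-wide stack status
/u01/grid/11.2.0.4/bin/crsctl check cluster -all
--both nodes should be listed as active members
/u01/grid/11.2.0.4/bin/olsnodes -n -s
--voting files should now be discovered on either node
/u01/grid/11.2.0.4/bin/crsctl query css votedisk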




From the ITPUB blog: http://blog.itpub.net/9240380/viewspace-1804149/. Please credit the source when reposting.
