Oracle Exadata Machine X4-2 Implementation Notes

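Posted by dawn009 on 2015-04-05
    This article summarizes my first successful deployment of an Oracle Exadata Machine X4-2.
    It records the problems I ran into during that first implementation and how I solved them.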

1. Error when validating the configuration file in onecommand step 1.

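   The cause of this problem was that I had used vi to manually change the operating system hostname from the original management-network name (dm01dbadm01) to the client-network name (dm01db01). The error output was as follows: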
[root@dm01db01 linux-x64]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/linux-x64/tequ-dm01.xml -s 1

 Executing Validate Configuration File..............java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
        ... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db01.tequ.com--etc-hosts
        at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
        at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
        at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
        at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
        ... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

......

Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
        at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
        at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
        at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
        at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
        ... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
        ... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
        at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
        at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
        at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
        at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
        ... 11 more
.....
 Validating cluster: cluster-clu1
  Locating machines...
  Verifying operating systems...
  Validating cluster networks......
  Validating network connectivity............
  Validating NTP setup..........
  Validating physical disks on storage cells........................................................
 Completed validation...
 
 SUCCESS: Validated NTP server 10.0.8.114
 SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db01.tequ.com, machine type: compute
 SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db02.tequ.com, machine type: compute
 SUCCESS: 
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p18371656_112040_Linux-x86-64.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p6880880_112000_Linux-x86-64.zip exists...
 ......

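    At first I kept assuming this error was a problem with the hosts configuration. After thinking it over more calmly, I noticed that the error said two files could not be found: dm01db01.tequ.com--etc-hosts and dm01db02.tequ.com--etc-hosts. Could these be two separate files? I searched the onecommand directory:
[root@dm01db01 linux-x64]# find . -name '*host*'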
./WorkDir/dm01dbadm02.tequ.com--etc-hosts
./WorkDir/dm01dbadm01.tequ.com--etc-hosts

   Sure enough, the WorkDir directory contained two files of this kind, named after the original hostnames.

[root@dm01db01 linux-x64]# cd WorkDir
[root@dm01db01 WorkDir]# ls
dm01celadm01.tequ.com-cpuInfo.txt  dm01db02.tequ.com-cpuInfo.txt     dm01dbadm02.tequ.com-memInfo.txt
dm01celadm01.tequ.com-memInfo.txt  dm01db02.tequ.com-memInfo.txt     dm01dbadm02.tequ.com-ntpConf.txt
dm01celadm02.tequ.com-cpuInfo.txt  dm01db02.tequ.com-ntpConf.txt     p13390677_112040_Linux-x86-64_1of7.zip
dm01celadm02.tequ.com-memInfo.txt  dm01dbadm01.tequ.com-cpuInfo.txt  p13390677_112040_Linux-x86-64_2of7.zip
dm01celadm03.tequ.com-cpuInfo.txt  dm01dbadm01.tequ.com--etc-hosts   p13390677_112040_Linux-x86-64_3of7.zip
dm01celadm03.tequ.com-memInfo.txt  dm01dbadm01.tequ.com-memInfo.txt  p18371656_112040_Linux-x86-64.zip
dm01db01.tequ.com-cpuInfo.txt      dm01dbadm01.tequ.com-ntpConf.txt  p6880880_112000_Linux-x86-64.zip
dm01db01.tequ.com-memInfo.txt      dm01dbadm02.tequ.com-cpuInfo.txt
dm01db01.tequ.com-ntpConf.txt      dm01dbadm02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# ls *host*
dm01dbadm01.tequ.com--etc-hosts  dm01dbadm02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# cat dm01dbadm01.tequ.com--etc-hosts 
#### BEGIN Generated by Exadata. DO NOT MODIFY ####
127.0.0.1       localhost.localdomain   localhost


192.168.10.1    dm01db01-priv1.tequ.com dm01db01-priv1
192.168.10.2    dm01db01-priv2.tequ.com dm01db01-priv2
10.0.3.10       dm01db01.tequ.com       dm01db01
10.255.255.10   dm01dbadm01.tequ.com    dm01dbadm01


192.168.10.3    dm01db02-priv1.tequ.com dm01db02-priv1
192.168.10.4    dm01db02-priv2.tequ.com dm01db02-priv2
10.0.3.11       dm01db02.tequ.com       dm01db02
10.255.255.11   dm01dbadm02.tequ.com    dm01dbadm02


10.0.3.13       dm01db02-vip.tequ.com   dm01db02-vip
10.0.3.12       dm01db01-vip.tequ.com   dm01db01-vip
#### END Generated by Exadata ####

   The file contents are identical to /etc/hosts, which indicates that onecommand reads the *--etc-hosts files under WorkDir rather than reading /etc/hosts directly.

I simply copied the two files to the new hostnames:
[root@dm01db01 WorkDir]# cp dm01dbadm01.tequ.com--etc-hosts dm01db01.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# cp dm01dbadm02.tequ.com--etc-hosts dm01db02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# 
[root@dm01db01 WorkDir]# cd ..
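When several nodes have been renamed, the copy above can be scripted. The following is a minimal POSIX shell sketch, assuming (as the error message suggests) that onecommand looks for files named WorkDir/<fqdn>--etc-hosts. The function name copy_hosts_cache and its "old:new" argument format are hypothetical and not part of onecommand.

```shell
# Hypothetical helper (not part of onecommand): for each "old_fqdn:new_fqdn"
# pair, copy WORKDIR/<old>--etc-hosts to WORKDIR/<new>--etc-hosts.
# An existing target file is left untouched.
copy_hosts_cache() (
    workdir=$1
    shift
    for pair in "$@"; do
        old=${pair%%:*}
        new=${pair#*:}
        src="$workdir/${old}--etc-hosts"
        dst="$workdir/${new}--etc-hosts"
        if [ ! -f "$src" ]; then
            echo "missing source: $src" >&2
            exit 1          # exits the subshell, so the function returns 1
        fi
        if [ -e "$dst" ]; then
            echo "already present, skipped: $dst"
        else
            cp -p "$src" "$dst" || exit 1
            echo "copied: $dst"
        fi
    done
)
```

For the two database servers in this article, the call would look like:
copy_hosts_cache /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir dm01dbadm01.tequ.com:dm01db01.tequ.com dm01dbadm02.tequ.com:dm01db02.tequ.com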

Then I ran the validation command again:
[root@dm01db01 linux-x64]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/linux-x64/tequ-dm01.xml -s 1

 Executing Validate Configuration File...................
 Validating cluster: cluster-clu1
  Locating machines...
  Verifying operating systems...
  Validating cluster networks......
  Validating network connectivity............
  Validating NTP setup..........
  Validating physical disks on storage cells........................................................
 Completed validation...
 
 SUCCESS: 10.255.255.10 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.0.3.10 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.255.255.11 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.0.3.11 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.0.3.13 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.0.3.12 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.255.255.10 configured correctly on machine dm01db02.tequ.com
 SUCCESS: 10.0.3.10 configured correctly on machine dm01db02.tequ.com
 SUCCESS: 10.255.255.11 configured correctly on machine dm01db02.tequ.com
 SUCCESS: 10.0.3.11 configured correctly on machine dm01db02.tequ.com
 SUCCESS: 10.0.3.13 configured correctly on machine dm01db02.tequ.com
 SUCCESS: 10.0.3.12 configured correctly on machine dm01db02.tequ.com
 SUCCESS: Validated NTP server 10.0.8.114
 SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db02.tequ.com, machine type: compute
 SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db01.tequ.com, machine type: compute
 SUCCESS: 
 SUCCESS: NTP servers on machine dm01db02.tequ.com verified successfully
 SUCCESS: NTP servers on machine dm01db01.tequ.com verified successfully
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p18371656_112040_Linux-x86-64.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p6880880_112000_Linux-x86-64.zip exists...
 
 Following errors were found...
 ERROR: 10.0.3.16 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
 ERROR: 10.0.3.14 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
 ERROR: 10.0.3.15 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
 ERROR: 10.0.3.16 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
 ERROR: 10.0.3.14 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
 ERROR: 10.0.3.15 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
 ERROR: Encountered error while checking NTP server. Error getting time from NTP server: 10.0.5.114
 
 Errors occured...

    The earlier Java errors were gone, but now there were errors about resolving dm01-scan. I checked name resolution at the operating-system level:
[root@dm01db01 bin]# nslookup dm01-scan
Server:         10.0.8.114
Address:        10.0.8.114#53

Name:   dm01-scan.tequ.com
Address: 10.0.3.15
Name:   dm01-scan.tequ.com
Address: 10.0.3.16
Name:   dm01-scan.tequ.com
Address: 10.0.3.14
   Both dm01-scan and the fully qualified dm01-scan.tequ.com resolved without problems, so I ignored this error.
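Instead of eyeballing the nslookup output, the resolved SCAN addresses can be compared with the ones expected by the configuration file. This is a minimal sketch, assuming the nslookup output format shown above, where the DNS server's own address line ends in "#53"; the function names nslookup_answers and scan_matches are hypothetical.

```shell
# Print the answer addresses from nslookup output read on stdin, sorted.
# Lines such as "Address: 10.0.8.114#53" describe the DNS server itself,
# so addresses containing '#' are skipped.
nslookup_answers() {
    awk '$1 == "Address:" && $2 !~ /#/ { print $2 }' | sort
}

# Succeed only if the resolved set of addresses equals the expected set.
scan_matches() (
    got=$(nslookup_answers)
    want=$(printf '%s\n' "$@" | sort)
    [ "$got" = "$want" ]
)
```

Usage: nslookup dm01-scan | scan_matches 10.0.3.14 10.0.3.15 10.0.3.16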

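   Multiple NTP servers can be configured; as long as at least one of them is reachable, the NTP error above can be ignored. Here 10.0.8.114 validated successfully, while getting the time from 10.0.5.114 failed.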
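The "at least one NTP server must answer" rule can be checked by hand before rerunning onecommand. This is a sketch under stated assumptions: it assumes ntpdate is installed on the node ("ntpdate -q" queries a server without setting the clock), and the function name any_ntp_ok and the NTP_PROBE override variable are hypothetical, added only so the logic can be tested without a network.

```shell
# Succeed if at least one of the given NTP servers answers a query.
# NTP_PROBE is a hypothetical override hook; by default "ntpdate -q" is used.
# $probe is left unquoted on purpose so "ntpdate -q" splits into two words.
any_ntp_ok() (
    probe=${NTP_PROBE:-ntpdate -q}
    for server in "$@"; do
        if $probe "$server" >/dev/null 2>&1; then
            echo "reachable: $server"
            exit 0
        fi
        echo "unreachable: $server" >&2
    done
    exit 1
)
```

Usage: any_ntp_ok 10.0.5.114 10.0.8.114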

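   One more point worth stressing: the log directory under onecommand records every onecommand run in detail, so other errors can be tracked down by checking the corresponding log.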
[root@dm01db01 oracle.SupportTools]# cd /opt/oracle.SupportTools/onecommand/log
[root@dm01db01 log]# ll
total 33328
-rw-r--r-- 1 root root   61426 Jun 11 23:41 log.out
-rw-r--r-- 1 root root   57376 Jun 11 14:15 Step10_Initialize_Cluster_Software_140611_140220.out
-rw-r--r-- 1 root root 4369159 Jun 11 14:33 Step11_Install_Database_Software_140611_142500.out
-rw-r--r-- 1 root root    2136 Jun 11 14:38 Step12_Relink_Database_with_RDS_140611_143823.out
-rw-r--r-- 1 root root   84212 Jun 11 14:39 Step13_Create_ASM_Diskgroups_140611_143851.out
-rw-r--r-- 1 root root  217597 Jun 11 14:41 Step14_Create_Databases_140611_144043.out
-rw-r--r-- 1 root root  235300 Jun 11 19:52 Step14_Create_Databases_140611_195204.out
-rw-r--r-- 1 root root  236827 Jun 11 20:35 Step14_Create_Databases_140611_203506.out
-rw-r--r-- 1 root root  235302 Jun 11 20:43 Step14_Create_Databases_140611_204302.out
-rw-r--r-- 1 root root  225397 Jun 11 21:07 Step14_Create_Databases_140611_204615.out
-rw-r--r-- 1 root root  402325 Jun 11 21:14 Step15_Apply_Security_Fixes_140611_210805.out
-rw-r--r-- 1 root root  655789 Jun 11 21:23 Step16_Create_Installation_Summary_140611_212222.out
-rw-r--r-- 1 root root  741416 Jun 11 21:35 Step17_Resecure_Machine_140611_212329.out
-rw-r--r-- 1 root root   28409 Jun 11 23:32 Step17_Resecure_Machine_140611_233217.out
-rw-r--r-- 1 root root  256551 Jun 11 01:39 Step1_Validate_Configuration_File_140609_141934.out
-rw-r--r-- 1 root root  492178 Jun 11 01:39 Step1_Validate_Configuration_File_140609_142913.out
-rw-r--r-- 1 root root  492398 Jun 11 01:39 Step1_Validate_Configuration_File_140609_165947.out
-rw-r--r-- 1 root root  494475 Jun 11 01:39 Step1_Validate_Configuration_File_140609_171926.out
-rw-r--r-- 1 root root  495927 Jun 11 01:39 Step1_Validate_Configuration_File_140610_095109.out
-rw-r--r-- 1 root root  491049 Jun 11 01:39 Step1_Validate_Configuration_File_140610_104933.out
-rw-r--r-- 1 root root  227377 Jun 11 01:39 Step1_Validate_Configuration_File_140610_111043.out
-rw-r--r-- 1 root root  581537 Jun 11 01:39 Step1_Validate_Configuration_File_140610_111825.out
-rw-r--r-- 1 root root  580946 Jun 11 01:39 Step1_Validate_Configuration_File_140610_112825.out
-rw-r--r-- 1 root root  580689 Jun 11 01:39 Step1_Validate_Configuration_File_140610_114206.out
-rw-r--r-- 1 root root  583397 Jun 11 01:39 Step1_Validate_Configuration_File_140610_141909.out
-rw-r--r-- 1 root root  487083 Jun 11 01:39 Step1_Validate_Configuration_File_140610_143318.out
-rw-r--r-- 1 root root  534536 Jun 11 01:39 Step1_Validate_Configuration_File_140610_143842.out
-rw-r--r-- 1 root root  531754 Jun 11 01:39 Step1_Validate_Configuration_File_140610_145105.out
-rw-r--r-- 1 root root  585919 Jun 11 01:39 Step1_Validate_Configuration_File_140610_235714.out
-rw-r--r-- 1 root root  490349 Jun 11 01:39 Step1_Validate_Configuration_File_140611_005853.out
-rw-r--r-- 1 root root  489191 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010318.out
-rw-r--r-- 1 root root  489267 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010452.out
-rw-r--r-- 1 root root  489311 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010935.out
-rw-r--r-- 1 root root  490987 Jun 11 01:39 Step1_Validate_Configuration_File_140611_011338.out
-rw-r--r-- 1 root root  491003 Jun 11 01:42 Step1_Validate_Configuration_File_140611_014106.out
-rw-r--r-- 1 root root 2468856 Jun 11 01:39 Step2_Setup_Required_Files_140610_113549.out
-rw-r--r-- 1 root root 2458010 Jun 11 01:39 Step2_Setup_Required_Files_140611_011643.out
-rw-r--r-- 1 root root   48406 Jun 11 01:46 Step2_Setup_Required_Files_140611_014537.out
-rw-r--r-- 1 root root 2468732 Jun 11 01:50 Step2_Setup_Required_Files_140611_014853.out
-rw-r--r-- 1 root root 2861511 Jun 11 10:10 Step2_Setup_Required_Files_140611_100656.out
-rw-r--r-- 1 root root  663710 Jun 11 01:39 Step3_Update_Nodes_for_Eighth_Rack_140610_113946.out
-rw-r--r-- 1 root root  697616 Jun 11 10:13 Step3_Update_Nodes_for_Eighth_Rack_140611_101020.out
-rw-r--r-- 1 root root  666778 Jun 11 10:27 Step3_Update_Nodes_for_Eighth_Rack_140611_102543.out
-rw-r--r-- 1 root root  386914 Jun 11 10:40 Step4_Create_Users_140611_103752.out
-rw-r--r-- 1 root root   17602 Jun 11 10:40 Step5_Setup_Cell_Connectivity_140611_104041.out
-rw-r--r-- 1 root root  701735 Jun 11 10:45 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_104104.out
-rw-r--r-- 1 root root  697829 Jun 11 11:06 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_110244.out
-rw-r--r-- 1 root root  771081 Jun 11 11:28 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_112140.out
-rw-r--r-- 1 root root   23701 Jun 11 11:33 Step7_Create_Cell_Disks_140611_113025.out
-rw-r--r-- 1 root root  590542 Jun 11 11:34 Step8_Create_Grid_Disks_140611_113429.out
-rw-r--r-- 1 root root   70332 Jun 11 11:38 Step9_Install_Cluster_Software_140611_113621.out
-rw-r--r-- 1 root root   70570 Jun 11 12:00 Step9_Install_Cluster_Software_140611_115755.out
-rw-r--r-- 1 root root  190715 Jun 11 14:01 Step9_Install_Cluster_Software_140611_135339.out
-rw-r--r-- 1 root root   13982 Jun 11 19:48 UndoStep14_Create_Databases_140611_194842.out
-rw-r--r-- 1 root root   14250 Jun 11 19:51 UndoStep14_Create_Databases_140611_195103.out
-rw-r--r-- 1 root root   14004 Jun 11 19:51 UndoStep14_Create_Databases_140611_195135.out
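With this many logs, it helps to pick the newest log for a step and jump to its first suspicious line. In the listing above, many Step1 files share the same modification time (01:39), so this sketch orders them by the _YYMMDD_HHMMSS suffix in the file name instead. The function names latest_step_log and first_error are hypothetical, and the error patterns are a heuristic based only on the log excerpts in this article.

```shell
# Print the newest log for onecommand step N, ordered by the
# _YYMMDD_HHMMSS suffix of the file name rather than by mtime.
latest_step_log() (
    dir=$1
    step=$2
    for f in "$dir"/Step"$step"_*.out; do
        [ -e "$f" ] || continue            # glob matched nothing
        base=${f%.out}
        hms=${base##*_}                    # HHMMSS
        rest=${base%_*}
        ymd=${rest##*_}                    # YYMMDD
        printf '%s_%s %s\n' "$ymd" "$hms" "$f"
    done | sort | tail -n 1 | cut -d' ' -f2-
)

# Print the first line (with its line number) that looks like an error.
first_error() {
    grep -n -E 'OCMD-[0-9]+|Exception|Ret code = <[1-9]' "$1" | head -n 1
}
```

Usage: first_error "$(latest_step_log /opt/oracle.SupportTools/onecommand/log 2)"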

2. Error when setting up the required files in onecommand step 2.

The following error appeared when running step 2:

[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 2

 Executing Setup Required Files..
 Copying and extracting required files...
 Required files are:
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip
 Copying required files...
 Checking status of remote files..........
 Getting status of local files............
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_1of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_2of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_3of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/p18371656_112040_Linux-x86-64.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/Software/patches/p6880880_112000_Linux-x86-64.zip.
 Copying file: p18371656_112040_Linux-x86-64.zip to node dm01db02.tequ.com.
 Copying file: p6880880_112000_Linux-x86-64.zip to node dm01db02.tequ.com............
 Completed copying files.....
 Extracting required files............................
 Copying resourcecontrol and other required files........................
 Execution Exception in future get
 OCMD-02624: Error while executing command {0}.java.lang.reflect.InvocationTargetException
 Error running Setup Required Files error message Error running oracle.onecommand.deploy.software.SoftwareUtils method setupRequiredFiles

    This error message gives no clue about the cause.

I opened /opt/oracle.SupportTools/onecommand/log/Step2_Setup_Required_Files_140611_014853.out and located the first error. The relevant log output is:
......
2014-06-11 01:50:26,449 [FINE  ][MDThread][        KommandOutput:95] ======
2014-06-11 01:50:26,449 [FINE  ][MDThread][        KommandOutput:64] # of kommand outputs 1
2014-06-11 01:50:26,449 [FINE  ][MDThread][          RunCommand:170] Ran commands, elapsed time = 17006 mS
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:79] ======
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:80] Output
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:81] ======
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:84] Ret code = <52>
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:86] From node dm01celadm01.tequ.com
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:89] ## Output Start
2014-06-11 01:50:26,455 [FINE  ][MDThread][       EsCommonUtils:596] OCMD-00052: Node dm01celadm01.tequ.com appears to be down.
2014-06-11 01:50:26,456 [FINE  ][MDThread][        KommandOutput:91] ## Output End
2014-06-11 01:50:26,456 [FINE  ][MDThread][        KommandOutput:95] ======
2014-06-11 01:50:26,456 [FINE  ][MDThread][        OcmdException:62] Throwing OcmdException... message:Command [mkdir -p /opt/oracle.SupportTools] run on node 10.255.255.12 as user root did not execute successfully...
2014-06-11 01:50:26,456 [FINE  ][MDThread][        OcmdException:98] Stack trace...
2014-06-11 01:50:26,457 [FINE  ][MDThread][       OcmdException:135] OcmdException from node dm01db01.tequ.com return code = 2 output string: Command [mkdir -p /opt/oracle.SupportTools] run on node 10.255.255.12 as user root did not execute successfully... stack trace = java.lang.Throwable
        at oracle.onecommand.escommon.common.OcmdException.ocmdException(OcmdException.java:95)
        at oracle.onecommand.escommon.common.OcmdException.<init>(OcmdException.java:64)
        at oracle.onecommand.commandexec.utils.CommonUtils.checkKommandOutput(CommonUtils.java:1369)
        at oracle.onecommand.commandexec.utils.RemoteFileUtils.sftpPutFile(RemoteFileUtils.java:1491)
        at oracle.onecommand.commandexec.utils.RemoteFileUtils.sftpPutFile(RemoteFileUtils.java:1378)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.CallableReflectionMethod.runMethodInParallel(CallableReflectionMethod.java:121)
        at oracle.onecommand.commandexec.utils.CallableReflectionMethod.call(CallableReflectionMethod.java:70)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

2014-06-11 01:50:26,457 [INFO  ][    main][          RunCommand:714] Execution Exception in future get
2014-06-11 01:50:26,458 [INFO  ][    main][          RunCommand:721] OCMD-02624: Error while executing command {0}.java.lang.reflect.InvocationTargetException

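   On the surface, the log says that the mkdir command failed on dm01celadm01 (a storage server), but running the same command manually on the storage server worked fine. The likely cause was that the name dm01celadm01.tequ.com could not be resolved from the database server nodes (the log also reports "OCMD-00052: Node dm01celadm01.tequ.com appears to be down."). Checking /etc/hosts on the database servers confirmed that none of the three storage server names had entries, so I added the following to the hosts file on both database servers: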
#### BEGIN Generated by Exadata. DO NOT MODIFY ####
127.0.0.1       localhost.localdomain   localhost

192.168.10.1    dm01db01-priv1.tequ.com dm01db01-priv1
192.168.10.2    dm01db01-priv2.tequ.com dm01db01-priv2
10.0.3.10       dm01db01.tequ.com       dm01db01
10.255.255.10   dm01dbadm01.tequ.com    dm01dbadm01

192.168.10.3    dm01db02-priv1.tequ.com dm01db02-priv1
192.168.10.4    dm01db02-priv2.tequ.com dm01db02-priv2
10.0.3.11       dm01db02.tequ.com       dm01db02
10.255.255.11   dm01dbadm02.tequ.com    dm01dbadm02

10.0.3.13       dm01db02-vip.tequ.com   dm01db02-vip
10.0.3.12       dm01db01-vip.tequ.com   dm01db01-vip

10.255.255.12   dm01celadm01.tequ.com   dm01celadm01
10.255.255.13   dm01celadm02.tequ.com   dm01celadm02
10.255.255.14   dm01celadm03.tequ.com   dm01celadm03
#### END Generated by Exadata ####
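Before rerunning step 2, a quick check that every storage server name actually appears in a hosts file can catch this problem earlier. A minimal sketch follows: it only inspects the given file (not DNS), and the function name missing_hosts is hypothetical.

```shell
# Print each requested hostname that does not appear as a name field on any
# non-comment line of the given hosts file; return 1 if any are missing.
missing_hosts() (
    file=$1
    shift
    status=0
    for name in "$@"; do
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }        # skip comment lines
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }' "$file"
        then
            echo "$name"
            status=1
        fi
    done
    exit "$status"
)
```

Usage: missing_hosts /etc/hosts dm01celadm01.tequ.com dm01celadm02.tequ.com dm01celadm03.tequ.com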

Then I ran onecommand again:
[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 2

 Executing Setup Required Files.
 Copying and extracting required files...
 Required files are:
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip
 Copying required files...
 Checking status of remote files..........
 Checking status of existing files on remote nodes....
 Getting status of local files.................
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_1of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_2of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_3of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/p18371656_112040_Linux-x86-64.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/Software/patches/p6880880_112000_Linux-x86-64.zip..
 Extracting required files........................
 Copying resourcecontrol and other required files.............................................................................................................................
 Creating databasemachine.xml for EM discovery
 Done Creating databasemachine.xml for EM discovery.
 Successfully completed execution of step Setup Required Files [elapsed Time [Elapsed = 185647 mS [3.0 minutes] Wed Jun 11 10:10:02 CST 2014]]

Step 2 completed successfully.

3. Error when verifying the InfiniBand configuration in onecommand step 6.

The following error appeared when running step 6:

[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 6

 Executing Verify Infiniband and Calibrate Cells
 Running rds ping tests on cluster nodes...........................................................................................................
 Validating infiniband network with rds-ping.....
 No ping errors while pinging infiniband fabric.......................................................................................................................................
 dm01celadm02.tequ.com
 ssh: dm01celadm03: Temporary failure in name resolution
 ssh: dm01celadm01: Temporary failure in name resolution
 ssh: dm01db02: Temporary failure in name resolution
 ssh: dm01db01: Temporary failure in name resolution
 Error running Verify Infiniband and Calibrate Cells error message Error running oracle.onecommand.deploy.validation.ValidationUtils method validateInfiniband
 Error running oracle.onecommand.deploy.validation.ValidationUtils method validateInfiniband

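At the operating-system level, rds-ping followed by a target IP address can be used to verify RDS packet transmission.
After the manual test succeeded, I ran step 6 again and it completed successfully.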
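The manual rds-ping test can be looped over every InfiniBand address. This sketch assumes the rds-ping shipped with rds-tools, which accepts "-c count". The addresses 192.168.10.1 to 192.168.10.4 are the database servers' private addresses from the hosts file earlier in this article; the storage cells' InfiniBand addresses are not listed here and would need to be added. The function name rds_check and the RDS_PROBE override variable are hypothetical.

```shell
# Ping each InfiniBand address with rds-ping and report failures.
# RDS_PROBE is a hypothetical override hook for testing; by default
# each address gets "rds-ping -c 3", i.e. three pings.
rds_check() (
    probe=${RDS_PROBE:-rds-ping -c 3}
    failed=0
    for ip in "$@"; do
        if $probe "$ip" >/dev/null 2>&1; then
            echo "ok: $ip"
        else
            echo "FAILED: $ip" >&2
            failed=1
        fi
    done
    exit "$failed"
)
```

Usage: rds_check 192.168.10.1 192.168.10.2 192.168.10.3 192.168.10.4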

4. Error when installing the cluster software in onecommand step 9.

The following error was reported when running step 9:

[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 9

 Executing Install Cluster Software
 Installing cluster cluster-clu1.
 Getting grid disks using utility in /opt/oracle.SupportTools/onecommand/Software/11.2.0.4/grid...................
 Running Oracle installer.................................................................................................................java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.software.SoftwareUtils.getKommandOutputsFromParallelizer(SoftwareUtils.java:1304)
        at oracle.onecommand.deploy.software.SoftwareUtils.doInstallClusterware(SoftwareUtils.java:1327)
        at oracle.onecommand.deploy.software.SoftwareUtils.installClusterWare(SoftwareUtils.java:1292)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.executeStep(InstalSoftware.java:526)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.executeForwardAction(InstalSoftware.java:462)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.parseCmdLine(InstalSoftware.java:368)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.main(InstalSoftware.java:265)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: oracle.onecommand.escommon.common.OcmdException: Installation did not complete successfully, please check logs in /u01/app/oraInventory
        at oracle.onecommand.deploy.software.ClusterZipInstall112040.install(ClusterZipInstall112040.java:358)
        at oracle.onecommand.deploy.software.SoftwareUtils.installClusterwareByCluster(SoftwareUtils.java:1342)
        ... 11 more

 java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.software.SoftwareUtils.getKommandOutputsFromParallelizer(SoftwareUtils.java:1304)
        at oracle.onecommand.deploy.software.SoftwareUtils.doInstallClusterware(SoftwareUtils.java:1327)
        at oracle.onecommand.deploy.software.SoftwareUtils.installClusterWare(SoftwareUtils.java:1292)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.executeStep(InstalSoftware.java:526)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.executeForwardAction(InstalSoftware.java:462)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.parseCmdLine(InstalSoftware.java:368)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.main(InstalSoftware.java:265)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: oracle.onecommand.escommon.common.OcmdException: Installation did not complete successfully, please check logs in /u01/app/oraInventory
        at oracle.onecommand.deploy.software.ClusterZipInstall112040.install(ClusterZipInstall112040.java:358)
        at oracle.onecommand.deploy.software.SoftwareUtils.installClusterwareByCluster(SoftwareUtils.java:1342)
        ... 11 more


 Errors occured...

The log file log/Step9_Install_Cluster_Software_140611_115755.out shows the following errors:
2014-06-11 12:00:07,602 [FINE  ][thread-1][       EsCommonUtils:596] Preparing to launch Oracle Universal Installer from /tmp/OraInstall2014-06-11_11-58-17AM. Please wait ...[FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.

2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596]    CAUSE: Installer has detected that network interface ib0 does not maintain connectivity on all cluster nodes.

2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596]    ACTION: Ensure that the chosen interface has been configured across all cluster nodes.

2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596] [FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.

2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596]    CAUSE: Installer has detected that network interface ib1 does not maintain connectivity on all cluster nodes.

2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596]    ACTION: Ensure that the chosen interface has been configured across all cluster nodes.
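When the .out file is long, the interfaces named in INS-41112 can be extracted automatically. The sketch below assumes the CAUSE wording is exactly as shown in the log above. The script name in the comment is hypothetical.

```python
import re
import sys

# Matches the CAUSE text that OUI writes for [INS-41112] in the onecommand step log.
CAUSE_RE = re.compile(r"network interface (\S+) does not maintain connectivity")


def failing_interfaces(log_text):
    """Return the interfaces reported by INS-41112, in first-seen order, without duplicates."""
    seen = []
    for match in CAUSE_RE.finditer(log_text):
        iface = match.group(1)
        if iface not in seen:
            seen.append(iface)
    return seen


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 ins41112.py log/Step9_Install_Cluster_Software_140611_115755.out
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(failing_interfaces(fh.read()))
```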

   We also run into this error in routine installations, and a MOS search turned up the following note:

[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1)
Last modified: 2013-7-8    Type: REFERENCE

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

PURPOSE

The note lists problems, solutions or workarounds that are related to the following 11gR2 GI OUI error:

[FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.
CAUSE: Installer has detected that network interface eth1 does not maintain connectivity on all cluster nodes.
ACTION: Ensure that the chosen interface has been configured across all cluster nodes.


DETAILS


[INS-41112] is a high-level error number; the workaround or solution depends on the error code from the lower layer. However, [INS-41112] does tell which interface is having the issue:

CAUSE: Installer has detected that network interface eth1 does not maintain connectivity on all cluster nodes.

## >> in this case, it's eth1 that's having the connectivity issue



To find out lower layer error code, execute the following as grid user:

runcluvfy.sh comp nodecon -i -n ,, -verbose



Refer to the following once CVU reports real error code:

  • PRVF-7617
    Refer to note 1335136.1 for details.

  • PRVF-6020 : Different MTU values used across network interfaces in subnet "10.10.10.0"
    Refer to note 1429104.1 for details.
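For the PRVF-6020 case, you can compare the MTU of a private interface across nodes directly. The sketch below assumes Linux sysfs (`/sys/class/net/<iface>/mtu`) and passwordless ssh from the current user to each node. The node and interface names in the usage line are examples only.

```python
import subprocess


def remote_mtu(node, iface):
    """Read an interface's MTU on another node over ssh (assumes passwordless ssh is set up)."""
    proc = subprocess.run(["ssh", node, "cat", f"/sys/class/net/{iface}/mtu"],
                          capture_output=True, text=True, check=True)
    return int(proc.stdout.strip())


def mtu_mismatch(mtu_by_node):
    """True if the nodes report more than one distinct MTU value for the interface."""
    return len(set(mtu_by_node.values())) > 1
```

For example, `mtu_mismatch({n: remote_mtu(n, "ib0") for n in ["dm01db01", "dm01db02"]})` returns True when the two database servers disagree.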

Run runcluvfy.sh manually to validate the installation environment; once validation passes, retry this step.
Re-running the step succeeded.
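The following sketch scripts that manual check and fills in the interface and node lists that are blank in the note's command line. The interface names ib0/ib1 come from the step 9 log, and the node names dm01db01/dm01db02 come from the ssh errors earlier in this section. Both are assumptions about your rack. The sketch also assumes it runs as the grid user from the directory that holds runcluvfy.sh. The code-to-note table is taken from Doc ID 1427202.1.

```python
import os
import re
import subprocess

# Assumed values for this rack (see lead-in); adjust before use.
INTERFACES = ["ib0", "ib1"]
NODES = ["dm01db01", "dm01db02"]

# Error codes and MOS notes as listed in Doc ID 1427202.1.
KNOWN_NOTES = {"PRVF-7617": "1335136.1", "PRVF-6020": "1429104.1"}
CODE_RE = re.compile(r"\bPRVF-\d+\b")


def nodecon_cmd(interfaces, nodes):
    """Build 'runcluvfy.sh comp nodecon -i <ifaces> -n <nodes> -verbose'."""
    return ["./runcluvfy.sh", "comp", "nodecon",
            "-i", ",".join(interfaces), "-n", ",".join(nodes), "-verbose"]


def notes_for(cvu_output):
    """Map each PRVF code found in CVU output to a MOS note ID (None if the note does not list it)."""
    found = {}
    for code in CODE_RE.findall(cvu_output):
        found.setdefault(code, KNOWN_NOTES.get(code))
    return found


if __name__ == "__main__" and os.path.exists("./runcluvfy.sh"):
    proc = subprocess.run(nodecon_cmd(INTERFACES, NODES), capture_output=True, text=True)
    print(proc.stdout)
    for code, note in notes_for(proc.stdout).items():
        print(code, "->", f"MOS note {note}" if note else "not covered by Doc ID 1427202.1")
```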

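5.Running onecommand step 14 (creating the database with DBCA).
   Before running step 14, make sure the environment variables of the grid and oracle OS users are correctly configured on every database server.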
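One way to spot-check this is to run the sketch below as root on each database server. The variable list in `REQUIRED` is an assumption, so adjust it to your own profile standard. The parser only handles single-line `NAME=value` entries as printed by `env`.

```python
import subprocess

# Variables checked per OS user; this list is an assumption, extend it as needed.
REQUIRED = {
    "grid": ["ORACLE_BASE", "ORACLE_HOME", "ORACLE_SID"],
    "oracle": ["ORACLE_BASE", "ORACLE_HOME", "ORACLE_SID"],
}


def parse_env(env_output):
    """Parse `env` output (one NAME=value per line) into a dict; multi-line values are not handled."""
    env = {}
    for line in env_output.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def missing_vars(env_output, names):
    """Return the names that are unset or empty in the captured environment."""
    env = parse_env(env_output)
    return [name for name in names if not env.get(name)]


def login_env(user):
    """Capture the login environment of `user` via `su - user -c env` (run as root); None on failure."""
    try:
        proc = subprocess.run(["su", "-", user, "-c", "env"], capture_output=True,
                              text=True, stdin=subprocess.DEVNULL)
    except OSError:
        return None
    return proc.stdout if proc.returncode == 0 else None
```

Call `login_env("grid")` and, if the result is not `None`, pass it to `missing_vars` together with `REQUIRED["grid"]`. Then repeat for `oracle`.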

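    This was my first Exadata deployment and I learned a lot from it. Four points deserve attention throughout the implementation:
1).An Exadata deployment has no graphical or text-menu tools; almost everything is done through scripts and commands.
2).Plan thoroughly before the implementation, especially the IP plan, so that earlier decisions do not have to be overturned midway.
3).Avoid manually editing configuration files, including those holding IP addresses and hostnames.
4).When an error appears, stay calm: read the logs and analyze the error message carefully. That is the fastest way to find the root cause.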

While manually changing the host IPs I also ran into the following problem:
   See the article "Getting 'The remote system refused the connection' when connecting to Linux over ssh": http://blog.itpub.net/23135684/viewspace-1181160/

--end--

From the ITPUB blog, link: http://blog.itpub.net/29119536/viewspace-1502231/. Please credit the source when reposting; otherwise legal liability will be pursued.
