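Oracle Exadata Machine X4-2 Implementation Notes
I wrote this article as a summary after my first successful deployment of an Oracle Exadata Machine X4-2.
It records the problems I ran into during that first implementation and how I resolved them.
1. Error when validating the configuration file in onecommand step 1.
The cause of this problem was that I had manually changed the operating system hostname with the vi editor, from the original management-network hostname (dm01dbadm01) to the client-network hostname (dm01db01). The error output was as follows: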
[root@dm01db01 linux-x64]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/linux-x64/tequ-dm01.xml -s 1
Executing Validate Configuration File..............java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db01.tequ.com--etc-hosts
at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
......
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
... 11 more
.....
Validating cluster: cluster-clu1
Locating machines...
Verifying operating systems...
Validating cluster networks......
Validating network connectivity............
Validating NTP setup..........
Validating physical disks on storage cells........................................................
Completed validation...
SUCCESS: Validated NTP server 10.0.8.114
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db01.tequ.com, machine type: compute
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db02.tequ.com, machine type: compute
SUCCESS:
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p18371656_112040_Linux-x86-64.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p6880880_112000_Linux-x86-64.zip exists...
......
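The stack traces above repeat the same frames many times; the useful part is the innermost OcmdException message. A minimal sketch, assuming the console output has been saved to a file, that lists each distinct message once (the function name root_causes is illustrative and not part of onecommand):

```shell
# root_causes FILE
# Print every distinct "OcmdException: <message>" found in FILE, one per line.
# Assumption: FILE holds saved console output like the trace shown above.
root_causes() {
    grep -o 'OcmdException: .*' "$1" |   # keep only the exception text
        sed 's/^OcmdException: //' |     # drop the class-name prefix
        sort -u                          # collapse the repeated traces
}
```

Run against the output above, this would reduce the trace to the "Unable to locate file ...--etc-hosts" messages, which is the detail that matters for the diagnosis that follows.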
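At first I assumed this error was a hosts configuration problem. After calming down and thinking it over, I noticed that the error complains about two missing files, dm01db01.tequ.com--etc-hosts and dm01db02.tequ.com--etc-hosts. Could these be two separate files on disk? To find out, I searched the onecommand directory: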
[root@dm01db01 linux-x64]# find . -name *host*
./WorkDir/dm01dbadm02.tequ.com--etc-hosts
./WorkDir/dm01dbadm01.tequ.com--etc-hosts
Sure enough, the WorkDir directory contains two similar files named after the original hostnames.
[root@dm01db01 linux-x64]# cd WorkDir
[root@dm01db01 WorkDir]# ls
dm01celadm01.tequ.com-cpuInfo.txt dm01db02.tequ.com-cpuInfo.txt dm01dbadm02.tequ.com-memInfo.txt
dm01celadm01.tequ.com-memInfo.txt dm01db02.tequ.com-memInfo.txt dm01dbadm02.tequ.com-ntpConf.txt
dm01celadm02.tequ.com-cpuInfo.txt dm01db02.tequ.com-ntpConf.txt p13390677_112040_Linux-x86-64_1of7.zip
dm01celadm02.tequ.com-memInfo.txt dm01dbadm01.tequ.com-cpuInfo.txt p13390677_112040_Linux-x86-64_2of7.zip
dm01celadm03.tequ.com-cpuInfo.txt dm01dbadm01.tequ.com--etc-hosts p13390677_112040_Linux-x86-64_3of7.zip
dm01celadm03.tequ.com-memInfo.txt dm01dbadm01.tequ.com-memInfo.txt p18371656_112040_Linux-x86-64.zip
dm01db01.tequ.com-cpuInfo.txt dm01dbadm01.tequ.com-ntpConf.txt p6880880_112000_Linux-x86-64.zip
dm01db01.tequ.com-memInfo.txt dm01dbadm02.tequ.com-cpuInfo.txt
dm01db01.tequ.com-ntpConf.txt dm01dbadm02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# ls *host*
dm01dbadm01.tequ.com--etc-hosts dm01dbadm02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# cat dm01dbadm01.tequ.com--etc-hosts
#### BEGIN Generated by Exadata. DO NOT MODIFY ####
127.0.0.1 localhost.localdomain localhost
192.168.10.1 dm01db01-priv1.tequ.com dm01db01-priv1
192.168.10.2 dm01db01-priv2.tequ.com dm01db01-priv2
10.0.3.10 dm01db01.tequ.com dm01db01
10.255.255.10 dm01dbadm01.tequ.com dm01dbadm01
192.168.10.3 dm01db02-priv1.tequ.com dm01db02-priv1
192.168.10.4 dm01db02-priv2.tequ.com dm01db02-priv2
10.0.3.11 dm01db02.tequ.com dm01db02
10.255.255.11 dm01dbadm02.tequ.com dm01dbadm02
10.0.3.13 dm01db02-vip.tequ.com dm01db02-vip
10.0.3.12 dm01db01-vip.tequ.com dm01db01-vip
#### END Generated by Exadata ####
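The file content matches /etc/hosts, which suggests that onecommand reads the *--etc-hosts files under WorkDir during validation rather than reading /etc/hosts directly.
So I simply copied the two files under the new hostnames: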
[root@dm01db01 WorkDir]# cp dm01dbadm01.tequ.com--etc-hosts dm01db01.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# cp dm01dbadm02.tequ.com--etc-hosts dm01db02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]#
[root@dm01db01 WorkDir]# cd ..
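The two cp commands above can be generalized for racks with more database nodes. A sketch, assuming the renamed hosts differ from the originals only by the "adm" infix (dm01dbadm01 became dm01db01 here); clone_hosts_cache is an illustrative name, not an onecommand utility:

```shell
# clone_hosts_cache DIR
# For every <name>dbadm<NN>.<domain>--etc-hosts file in DIR, create a copy
# named after the client hostname (same name with "dbadm" replaced by "db").
# Existing files are never overwritten. The body runs in a subshell so the
# helper variables do not leak into the calling shell.
clone_hosts_cache() (
    dir=$1
    for src in "$dir"/*dbadm*--etc-hosts; do
        [ -e "$src" ] || continue                 # glob matched nothing
        base=${src##*/}                           # strip the directory part
        new=$(printf '%s\n' "$base" | sed 's/dbadm/db/')
        if [ -e "$dir/$new" ]; then
            echo "skip: $new already exists"
        else
            cp -p "$src" "$dir/$new" && echo "created: $new"
        fi
    done
)
```

Called as clone_hosts_cache WorkDir from the linux-x64 directory, it would create the same two files as the manual copies. Storage cell files (dm01celadm*) are left alone because the cells were not renamed.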
Then I ran the validation command again:
[root@dm01db01 linux-x64]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/linux-x64/tequ-dm01.xml -s 1
Executing Validate Configuration File...................
Validating cluster: cluster-clu1
Locating machines...
Verifying operating systems...
Validating cluster networks......
Validating network connectivity............
Validating NTP setup..........
Validating physical disks on storage cells........................................................
Completed validation...
SUCCESS: 10.255.255.10 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.0.3.10 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.255.255.11 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.0.3.11 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.0.3.13 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.0.3.12 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.255.255.10 configured correctly on machine dm01db02.tequ.com
SUCCESS: 10.0.3.10 configured correctly on machine dm01db02.tequ.com
SUCCESS: 10.255.255.11 configured correctly on machine dm01db02.tequ.com
SUCCESS: 10.0.3.11 configured correctly on machine dm01db02.tequ.com
SUCCESS: 10.0.3.13 configured correctly on machine dm01db02.tequ.com
SUCCESS: 10.0.3.12 configured correctly on machine dm01db02.tequ.com
SUCCESS: Validated NTP server 10.0.8.114
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db02.tequ.com, machine type: compute
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db01.tequ.com, machine type: compute
SUCCESS:
SUCCESS: NTP servers on machine dm01db02.tequ.com verified successfully
SUCCESS: NTP servers on machine dm01db01.tequ.com verified successfully
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p18371656_112040_Linux-x86-64.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p6880880_112000_Linux-x86-64.zip exists...
Following errors were found...
ERROR: 10.0.3.16 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
ERROR: 10.0.3.14 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
ERROR: 10.0.3.15 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
ERROR: 10.0.3.16 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
ERROR: 10.0.3.14 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
ERROR: 10.0.3.15 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
ERROR: Encountered error while checking NTP server. Error getting time from NTP server: 10.0.5.114
Errors occured...
The earlier Java errors are gone. This run also reports that dm01-scan cannot be resolved, so I checked at the operating system level by running:
[root@dm01db01 bin]# nslookup dm01-scan
Server: 10.0.8.114
Address: 10.0.8.114#53
Name: dm01-scan.tequ.com
Address: 10.0.3.15
Name: dm01-scan.tequ.com
Address: 10.0.3.16
Name: dm01-scan.tequ.com
Address: 10.0.3.14
Both the short name dm01-scan and the fully qualified name (dm01-scan.domain) resolve correctly, so I ignored this error.
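The manual check can be made a little more explicit by extracting the answer addresses from the nslookup output and comparing them with the three SCAN addresses named in the validation errors. A sketch that assumes the nslookup output format shown above (scan_addresses is an illustrative helper name):

```shell
# scan_addresses
# Read nslookup output on stdin and print the distinct answer addresses,
# i.e. the "Address:" line that follows each "Name:" line. The DNS server's
# own address, printed before the first "Name:" line, is skipped.
scan_addresses() {
    awk '/^Name:/            { want = 1; next }
         want && /^Address:/ { print $2; want = 0 }' | sort -u
}

# Example use (dm01-scan is the SCAN name from this deployment):
#   nslookup dm01-scan | scan_addresses
```

For the output above this should list 10.0.3.14, 10.0.3.15 and 10.0.3.16, the same three addresses the validation step complained about.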
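Multiple NTP servers can be configured; as long as at least one of them is currently usable, the NTP error above can be ignored.
One more point worth stressing: the log directory under the onecommand directory records every onecommand run in detail, so other errors can be diagnosed by checking the corresponding log.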
[root@dm01db01 oracle.SupportTools]# cd /opt/oracle.SupportTools/onecommand/log
[root@dm01db01 log]# ll
total 33328
-rw-r--r-- 1 root root 61426 Jun 11 23:41 log.out
-rw-r--r-- 1 root root 57376 Jun 11 14:15 Step10_Initialize_Cluster_Software_140611_140220.out
-rw-r--r-- 1 root root 4369159 Jun 11 14:33 Step11_Install_Database_Software_140611_142500.out
-rw-r--r-- 1 root root 2136 Jun 11 14:38 Step12_Relink_Database_with_RDS_140611_143823.out
-rw-r--r-- 1 root root 84212 Jun 11 14:39 Step13_Create_ASM_Diskgroups_140611_143851.out
-rw-r--r-- 1 root root 217597 Jun 11 14:41 Step14_Create_Databases_140611_144043.out
-rw-r--r-- 1 root root 235300 Jun 11 19:52 Step14_Create_Databases_140611_195204.out
-rw-r--r-- 1 root root 236827 Jun 11 20:35 Step14_Create_Databases_140611_203506.out
-rw-r--r-- 1 root root 235302 Jun 11 20:43 Step14_Create_Databases_140611_204302.out
-rw-r--r-- 1 root root 225397 Jun 11 21:07 Step14_Create_Databases_140611_204615.out
-rw-r--r-- 1 root root 402325 Jun 11 21:14 Step15_Apply_Security_Fixes_140611_210805.out
-rw-r--r-- 1 root root 655789 Jun 11 21:23 Step16_Create_Installation_Summary_140611_212222.out
-rw-r--r-- 1 root root 741416 Jun 11 21:35 Step17_Resecure_Machine_140611_212329.out
-rw-r--r-- 1 root root 28409 Jun 11 23:32 Step17_Resecure_Machine_140611_233217.out
-rw-r--r-- 1 root root 256551 Jun 11 01:39 Step1_Validate_Configuration_File_140609_141934.out
-rw-r--r-- 1 root root 492178 Jun 11 01:39 Step1_Validate_Configuration_File_140609_142913.out
-rw-r--r-- 1 root root 492398 Jun 11 01:39 Step1_Validate_Configuration_File_140609_165947.out
-rw-r--r-- 1 root root 494475 Jun 11 01:39 Step1_Validate_Configuration_File_140609_171926.out
-rw-r--r-- 1 root root 495927 Jun 11 01:39 Step1_Validate_Configuration_File_140610_095109.out
-rw-r--r-- 1 root root 491049 Jun 11 01:39 Step1_Validate_Configuration_File_140610_104933.out
-rw-r--r-- 1 root root 227377 Jun 11 01:39 Step1_Validate_Configuration_File_140610_111043.out
-rw-r--r-- 1 root root 581537 Jun 11 01:39 Step1_Validate_Configuration_File_140610_111825.out
-rw-r--r-- 1 root root 580946 Jun 11 01:39 Step1_Validate_Configuration_File_140610_112825.out
-rw-r--r-- 1 root root 580689 Jun 11 01:39 Step1_Validate_Configuration_File_140610_114206.out
-rw-r--r-- 1 root root 583397 Jun 11 01:39 Step1_Validate_Configuration_File_140610_141909.out
-rw-r--r-- 1 root root 487083 Jun 11 01:39 Step1_Validate_Configuration_File_140610_143318.out
-rw-r--r-- 1 root root 534536 Jun 11 01:39 Step1_Validate_Configuration_File_140610_143842.out
-rw-r--r-- 1 root root 531754 Jun 11 01:39 Step1_Validate_Configuration_File_140610_145105.out
-rw-r--r-- 1 root root 585919 Jun 11 01:39 Step1_Validate_Configuration_File_140610_235714.out
-rw-r--r-- 1 root root 490349 Jun 11 01:39 Step1_Validate_Configuration_File_140611_005853.out
-rw-r--r-- 1 root root 489191 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010318.out
-rw-r--r-- 1 root root 489267 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010452.out
-rw-r--r-- 1 root root 489311 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010935.out
-rw-r--r-- 1 root root 490987 Jun 11 01:39 Step1_Validate_Configuration_File_140611_011338.out
-rw-r--r-- 1 root root 491003 Jun 11 01:42 Step1_Validate_Configuration_File_140611_014106.out
-rw-r--r-- 1 root root 2468856 Jun 11 01:39 Step2_Setup_Required_Files_140610_113549.out
-rw-r--r-- 1 root root 2458010 Jun 11 01:39 Step2_Setup_Required_Files_140611_011643.out
-rw-r--r-- 1 root root 48406 Jun 11 01:46 Step2_Setup_Required_Files_140611_014537.out
-rw-r--r-- 1 root root 2468732 Jun 11 01:50 Step2_Setup_Required_Files_140611_014853.out
-rw-r--r-- 1 root root 2861511 Jun 11 10:10 Step2_Setup_Required_Files_140611_100656.out
-rw-r--r-- 1 root root 663710 Jun 11 01:39 Step3_Update_Nodes_for_Eighth_Rack_140610_113946.out
-rw-r--r-- 1 root root 697616 Jun 11 10:13 Step3_Update_Nodes_for_Eighth_Rack_140611_101020.out
-rw-r--r-- 1 root root 666778 Jun 11 10:27 Step3_Update_Nodes_for_Eighth_Rack_140611_102543.out
-rw-r--r-- 1 root root 386914 Jun 11 10:40 Step4_Create_Users_140611_103752.out
-rw-r--r-- 1 root root 17602 Jun 11 10:40 Step5_Setup_Cell_Connectivity_140611_104041.out
-rw-r--r-- 1 root root 701735 Jun 11 10:45 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_104104.out
-rw-r--r-- 1 root root 697829 Jun 11 11:06 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_110244.out
-rw-r--r-- 1 root root 771081 Jun 11 11:28 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_112140.out
-rw-r--r-- 1 root root 23701 Jun 11 11:33 Step7_Create_Cell_Disks_140611_113025.out
-rw-r--r-- 1 root root 590542 Jun 11 11:34 Step8_Create_Grid_Disks_140611_113429.out
-rw-r--r-- 1 root root 70332 Jun 11 11:38 Step9_Install_Cluster_Software_140611_113621.out
-rw-r--r-- 1 root root 70570 Jun 11 12:00 Step9_Install_Cluster_Software_140611_115755.out
-rw-r--r-- 1 root root 190715 Jun 11 14:01 Step9_Install_Cluster_Software_140611_135339.out
-rw-r--r-- 1 root root 13982 Jun 11 19:48 UndoStep14_Create_Databases_140611_194842.out
-rw-r--r-- 1 root root 14250 Jun 11 19:51 UndoStep14_Create_Databases_140611_195103.out
-rw-r--r-- 1 root root 14004 Jun 11 19:51 UndoStep14_Create_Databases_140611_195135.out
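With this many files it helps to pick the most recent log for a given step. Many files in the listing share the same modification time (Jun 11 01:39), so the sketch below relies on the YYMMDD_HHMMSS timestamp embedded in the file name rather than on mtime. The function name latest_step_log is illustrative:

```shell
# latest_step_log LOGDIR STEP
# Print the newest Step<STEP>_*.out file in LOGDIR, judged by the timestamp
# in the file name. Returns non-zero if the step has no log.
# The underscore after the number keeps "Step1_" from matching "Step10_".
latest_step_log() (
    latest=
    for f in "$1"/Step"$2"_*.out; do
        # Globs expand in sorted order, so the last existing match is newest.
        [ -e "$f" ] && latest=$f
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
)

# Example use: show the first OCMD error code in the latest step 2 log.
#   grep -n -m 1 'OCMD-' "$(latest_step_log /opt/oracle.SupportTools/onecommand/log 2)"
```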
2. Error when setting up the required files in onecommand step 2.
The following error occurred while running onecommand step 2:
[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 2
Executing Setup Required Files..
Copying and extracting required files...
Required files are:
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip
Copying required files...
Checking status of remote files..........
Getting status of local files............
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_1of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_2of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_3of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/p18371656_112040_Linux-x86-64.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/Software/patches/p6880880_112000_Linux-x86-64.zip.
Copying file: p18371656_112040_Linux-x86-64.zip to node dm01db02.tequ.com.
Copying file: p6880880_112000_Linux-x86-64.zip to node dm01db02.tequ.com............
Completed copying files.....
Extracting required files............................
Copying resourcecontrol and other required files........................
Execution Exception in future get
OCMD-02624: Error while executing command {0}.java.lang.reflect.InvocationTargetException
Error running Setup Required Files error message Error running oracle.onecommand.deploy.software.SoftwareUtils method setupRequiredFiles
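This error message alone gives no indication of the cause.
I opened /opt/oracle.SupportTools/onecommand/log/Step2_Setup_Required_Files_140611_014853.out and located the first place an error was reported. The relevant log output is: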
......
2014-06-11 01:50:26,449 [FINE ][MDThread][ KommandOutput:95] ======
2014-06-11 01:50:26,449 [FINE ][MDThread][ KommandOutput:64] # of kommand outputs 1
2014-06-11 01:50:26,449 [FINE ][MDThread][ RunCommand:170] Ran commands, elapsed time = 17006 mS
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:79] ======
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:80] Output
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:81] ======
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:84] Ret code = <52>
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:86] From node dm01celadm01.tequ.com
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:89] ## Output Start
2014-06-11 01:50:26,455 [FINE ][MDThread][ EsCommonUtils:596] OCMD-00052: Node dm01celadm01.tequ.com appears to be down.
2014-06-11 01:50:26,456 [FINE ][MDThread][ KommandOutput:91] ## Output End
2014-06-11 01:50:26,456 [FINE ][MDThread][ KommandOutput:95] ======
2014-06-11 01:50:26,456 [FINE ][MDThread][ OcmdException:62] Throwing OcmdException... message:Command [mkdir -p /opt/oracle.SupportTools] run on node 10.255.255.12 as user root did not execute successfully...
2014-06-11 01:50:26,456 [FINE ][MDThread][ OcmdException:98] Stack trace...
2014-06-11 01:50:26,457 [FINE ][MDThread][ OcmdException:135] OcmdException from node dm01db01.tequ.com return code = 2 output string: Command [mkdir -p /opt/oracle.SupportTools] run on node 10.255.255.12 as user root did not execute successfully... stack trace = java.lang.Throwable
at oracle.onecommand.escommon.common.OcmdException.ocmdException(OcmdException.java:95)
at oracle.onecommand.escommon.common.OcmdException.<init>(OcmdException.java:64)
at oracle.onecommand.commandexec.utils.CommonUtils.checkKommandOutput(CommonUtils.java:1369)
at oracle.onecommand.commandexec.utils.RemoteFileUtils.sftpPutFile(RemoteFileUtils.java:1491)
at oracle.onecommand.commandexec.utils.RemoteFileUtils.sftpPutFile(RemoteFileUtils.java:1378)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.CallableReflectionMethod.runMethodInParallel(CallableReflectionMethod.java:121)
at oracle.onecommand.commandexec.utils.CallableReflectionMethod.call(CallableReflectionMethod.java:70)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
2014-06-11 01:50:26,457 [INFO ][ main][ RunCommand:714] Execution Exception in future get
2014-06-11 01:50:26,458 [INFO ][ main][ RunCommand:721] OCMD-02624: Error while executing command {0}.java.lang.reflect.InvocationTargetException
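On the surface, this says that the mkdir command failed on dm01celadm01 (a storage server), yet running the same command manually on the storage server works fine. The likely cause is that the name dm01celadm01.tequ.com cannot be resolved on the database server nodes. Checking /etc/hosts on the database servers confirmed that it had no entries for the three storage servers, so I added them on both database servers. The resulting hosts file is shown below; the last three entries are the new ones: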
#### BEGIN Generated by Exadata. DO NOT MODIFY ####
127.0.0.1 localhost.localdomain localhost
192.168.10.1 dm01db01-priv1.tequ.com dm01db01-priv1
192.168.10.2 dm01db01-priv2.tequ.com dm01db01-priv2
10.0.3.10 dm01db01.tequ.com dm01db01
10.255.255.10 dm01dbadm01.tequ.com dm01dbadm01
192.168.10.3 dm01db02-priv1.tequ.com dm01db02-priv1
192.168.10.4 dm01db02-priv2.tequ.com dm01db02-priv2
10.0.3.11 dm01db02.tequ.com dm01db02
10.255.255.11 dm01dbadm02.tequ.com dm01dbadm02
10.0.3.13 dm01db02-vip.tequ.com dm01db02-vip
10.0.3.12 dm01db01-vip.tequ.com dm01db01-vip
10.255.255.12 dm01celadm01.tequ.com dm01celadm01
10.255.255.13 dm01celadm02.tequ.com dm01celadm02
10.255.255.14 dm01celadm03.tequ.com dm01celadm03
#### END Generated by Exadata ####
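Before rerunning the step, the hosts file on each database server can be checked for the names onecommand will need. A minimal sketch, assuming plain "address name [alias...]" lines as in the file above (missing_hosts is an illustrative helper, not an Exadata tool):

```shell
# missing_hosts HOSTS_FILE NAME...
# Print each NAME that does not appear as a hostname or alias in HOSTS_FILE.
# Lines starting with "#" are ignored; field 1 (the address) is never matched.
missing_hosts() (
    hosts_file=$1; shift
    for name in "$@"; do
        awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file" || printf '%s\n' "$name"
    done
)

# Example use on a database server:
#   missing_hosts /etc/hosts dm01celadm01 dm01celadm02 dm01celadm03
```

Empty output means every name has an entry. In the situation described above, the three storage server names would have been printed before the hosts file was updated.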
Then I ran the onecommand step again:
[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 2
Executing Setup Required Files.
Copying and extracting required files...
Required files are:
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip
Copying required files...
Checking status of remote files..........
Checking status of existing files on remote nodes....
Getting status of local files.................
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_1of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_2of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_3of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/p18371656_112040_Linux-x86-64.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/Software/patches/p6880880_112000_Linux-x86-64.zip..
Extracting required files........................
Copying resourcecontrol and other required files.............................................................................................................................
Creating databasemachine.xml for EM discovery
Done Creating databasemachine.xml for EM discovery.
Successfully completed execution of step Setup Required Files [elapsed Time [Elapsed = 185647 mS [3.0 minutes] Wed Jun 11 10:10:02 CST 2014]]
Step 2 completed successfully.
3. Verifying the InfiniBand configuration in onecommand step 6.
The following error occurred while running step 6:
[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 6
Executing Verify Infiniband and Calibrate Cells
Running rds ping tests on cluster nodes...........................................................................................................
Validating infiniband network with rds-ping.....
No ping errors while pinging infiniband fabric.......................................................................................................................................
dm01celadm02.tequ.com
ssh: dm01celadm03: Temporary failure in name resolution
ssh: dm01celadm01: Temporary failure in name resolution
ssh: dm01db02: Temporary failure in name resolution
ssh: dm01db01: Temporary failure in name resolution
Error running Verify Infiniband and Calibrate Cells error message Error running oracle.onecommand.deploy.validation.ValidationUtils method validateInfiniband
Error running oracle.onecommand.deploy.validation.ValidationUtils method validateInfiniband
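At the operating system level, rds-ping can be used to verify RDS packet transmission.
Once the manual test passed, I ran step 6 again and it succeeded.
4. Error when installing the Cluster software in onecommand step 9.
Running step 9 produced the following error: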
[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 9
Executing Install Cluster Software
Installing cluster cluster-clu1.
Getting grid disks using utility in /opt/oracle.SupportTools/onecommand/Software/11.2.0.4/grid...................
Running Oracle installer.................................................................................................................java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.software.SoftwareUtils.getKommandOutputsFromParallelizer(SoftwareUtils.java:1304)
at oracle.onecommand.deploy.software.SoftwareUtils.doInstallClusterware(SoftwareUtils.java:1327)
at oracle.onecommand.deploy.software.SoftwareUtils.installClusterWare(SoftwareUtils.java:1292)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.deploy.cliXml.InstalSoftware.executeStep(InstalSoftware.java:526)
at oracle.onecommand.deploy.cliXml.InstalSoftware.executeForwardAction(InstalSoftware.java:462)
at oracle.onecommand.deploy.cliXml.InstalSoftware.parseCmdLine(InstalSoftware.java:368)
at oracle.onecommand.deploy.cliXml.InstalSoftware.main(InstalSoftware.java:265)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: oracle.onecommand.escommon.common.OcmdException: Installation did not complete successfully, please check logs in /u01/app/oraInventory
at oracle.onecommand.deploy.software.ClusterZipInstall112040.install(ClusterZipInstall112040.java:358)
at oracle.onecommand.deploy.software.SoftwareUtils.installClusterwareByCluster(SoftwareUtils.java:1342)
... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.software.SoftwareUtils.getKommandOutputsFromParallelizer(SoftwareUtils.java:1304)
at oracle.onecommand.deploy.software.SoftwareUtils.doInstallClusterware(SoftwareUtils.java:1327)
at oracle.onecommand.deploy.software.SoftwareUtils.installClusterWare(SoftwareUtils.java:1292)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.deploy.cliXml.InstalSoftware.executeStep(InstalSoftware.java:526)
at oracle.onecommand.deploy.cliXml.InstalSoftware.executeForwardAction(InstalSoftware.java:462)
at oracle.onecommand.deploy.cliXml.InstalSoftware.parseCmdLine(InstalSoftware.java:368)
at oracle.onecommand.deploy.cliXml.InstalSoftware.main(InstalSoftware.java:265)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: oracle.onecommand.escommon.common.OcmdException: Installation did not complete successfully, please check logs in /u01/app/oraInventory
at oracle.onecommand.deploy.software.ClusterZipInstall112040.install(ClusterZipInstall112040.java:358)
at oracle.onecommand.deploy.software.SoftwareUtils.installClusterwareByCluster(SoftwareUtils.java:1342)
... 11 more
Errors occured...
The file log/Step9_Install_Cluster_Software_140611_115755.out contains the following errors:
2014-06-11 12:00:07,602 [FINE ][thread-1][ EsCommonUtils:596] Preparing to launch Oracle Universal Installer from /tmp/OraInstall2014-06-11_11-58-17AM. Please wait ...[FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.
2014-06-11 12:00:07,603 [FINE ][thread-1][ EsCommonUtils:596] CAUSE: Installer has detected that network interface ib0 does not maintain connectivity on all cluster nodes.
2014-06-11 12:00:07,603 [FINE ][thread-1][ EsCommonUtils:596] ACTION: Ensure that the chosen interface has been configured across all cluster nodes.
2014-06-11 12:00:07,603 [FINE ][thread-1][ EsCommonUtils:596] [FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.
2014-06-11 12:00:07,603 [FINE ][thread-1][ EsCommonUtils:596] CAUSE: Installer has detected that network interface ib1 does not maintain connectivity on all cluster nodes.
2014-06-11 12:00:07,603 [FINE ][thread-1][ EsCommonUtils:596] ACTION: Ensure that the chosen interface has been configured across all cluster nodes.
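Step logs of this kind run to tens of thousands of characters, so it can help to pull out only the fatal installer messages. The following is a minimal sketch, not part of onecommand: the function name `summarize_fatal` and both regular expressions are my own assumptions, derived only from the shape of the log lines shown above.

```python
import re

# Matches installer messages such as "[FATAL] [INS-41112] Specified network ..."
FATAL_RE = re.compile(r"\[FATAL\]\s+\[(INS-\d+)\]\s*(.*)")
# Matches a CAUSE line that names the affected interface, e.g. "network interface ib0"
IFACE_RE = re.compile(r"CAUSE:.*network interface (\S+)")


def summarize_fatal(lines):
    """Return a list of (code, message, interface) tuples found in log lines.

    The interface comes from the first CAUSE line that follows a FATAL entry;
    it is None when no such line appears before the next FATAL entry.
    """
    results = []
    for line in lines:
        fatal = FATAL_RE.search(line)
        if fatal:
            results.append([fatal.group(1), fatal.group(2).strip(), None])
            continue
        iface = IFACE_RE.search(line)
        if iface and results and results[-1][2] is None:
            results[-1][2] = iface.group(1)
    return [tuple(entry) for entry in results]
```

To use it, open the step log (for example the Step9 file named above) and pass the file object to `summarize_fatal`; for the excerpt shown here it reports INS-41112 once for ib0 and once for ib1.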
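This error also turns up in everyday work. A search on MOS (My Oracle Support) found the following note:
[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1)
Last modified: 2013-7-8  Type: REFERENCE
Information in this document applies to any platform.
Run the runcluvfy.sh command manually to verify the installation environment; if the verification passes, try running this step again.
Running the step again succeeded.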
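The article does not record which arguments were passed to runcluvfy.sh. As a hedged illustration only, the sketch below assembles the commonly used pre-installation check (`stage -pre crsinst`) for the two database nodes named in this article. The helper name `cluvfy_precheck_command` and the script path are assumptions; runcluvfy.sh sits in the root of the unpacked Grid Infrastructure software, and that location differs between deployments.

```python
import shlex


def cluvfy_precheck_command(nodes, script="./runcluvfy.sh"):
    """Build the argument list for a pre-installation Clusterware check.

    `script` is an assumed relative path; adjust it to wherever the Grid
    Infrastructure software was unpacked.
    """
    if not nodes:
        raise ValueError("at least one node name is required")
    return [script, "stage", "-pre", "crsinst", "-n", ",".join(nodes), "-verbose"]


# Node names as used in this article.
command = cluvfy_precheck_command(["dm01db01", "dm01db02"])
print(shlex.join(command))
```

The printed line can be pasted into a shell on the first database server. Keeping the command as a list rather than a string also makes it safe to hand to `subprocess.run` without shell quoting issues.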
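5. Running onecommand step 14: creating the database with DBCA.
Before running step 14, make sure the environment variables of the grid and oracle operating-system users are configured correctly on every database server.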
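The article does not list which variables it has in mind. A quick pre-check along the following lines may catch the most common omissions; the choice of `ORACLE_HOME` and `ORACLE_SID`, the PATH test and the function name `check_oracle_env` are all assumptions made for this sketch, not requirements stated by onecommand.

```python
import os

# Assumed minimum set; the article does not name the variables it relies on.
REQUIRED_VARS = ("ORACLE_HOME", "ORACLE_SID")


def check_oracle_env(env):
    """Return a list of human-readable problems found in an environment mapping."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    home = env.get("ORACLE_HOME")
    if home:
        # The Oracle binaries live under $ORACLE_HOME/bin, which should be on PATH.
        bin_dir = os.path.join(home, "bin")
        path_entries = env.get("PATH", "").split(os.pathsep)
        if bin_dir not in path_entries:
            problems.append(f"{bin_dir} is not on PATH")
    return problems


# Run once as the grid user and once as the oracle user on each database server.
for problem in check_oracle_env(os.environ):
    print("WARNING:", problem)
```

Because the function takes a plain mapping, it can be checked against a hand-written dictionary before being run against the real environment of each user.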
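This was my first Exadata deployment, and I learned a great deal from it. Four points deserve attention throughout a deployment:
1). An Exadata deployment offers no graphical tools and no text-mode UI tools; almost everything is done with scripts and commands.
2). Plan thoroughly before the deployment so that the plan does not have to be overturned halfway through, especially the IP plan.
3). As far as possible, avoid manually editing configuration files, including those that hold IP addresses and host names.
4). When an error appears, stay calm, read the logs carefully and analyze the error message carefully; that is the fastest way to find the cause.
While manually changing the host IP addresses I also ran into the following problem:
See the article "ssh連線Linux收到The remote system refused the connection報錯" (the "The remote system refused the connection" error when connecting to Linux over ssh): http://blog.itpub.net/23135684/viewspace-1181160/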
--end--
這篇文章記錄在第一次實施過程中遇到的問題,以及解決問題的過程。
1.執行onecommand第1步驗證配置檔案時的報錯。
出現這個問題的原因是我使用vi編輯器手動修改了作業系統的機器名,將原有的管理網段機器名(dm01dbadm01)修改為了Client網段機器名(dm01db01),下面是報錯的內容:
[root@dm01db01 linux-x64]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/linux-x64/tequ-dm01.xml -s 1
Executing Validate Configuration File..............java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db01.tequ.com--etc-hosts
at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
......
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
... 11 more
.....
Validating cluster: cluster-clu1
Locating machines...
Verifying operating systems...
Validating cluster networks......
Validating network connectivity............
Validating NTP setup..........
Validating physical disks on storage cells........................................................
Completed validation...
SUCCESS: Validated NTP server 10.0.8.114
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db01.tequ.com, machine type: compute
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db02.tequ.com, machine type: compute
SUCCESS:
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p18371656_112040_Linux-x86-64.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p6880880_112000_Linux-x86-64.zip exists...
......
最開始我一直以為這個報錯是hosts的配置問題,後來靜下心來想了一下,報錯是找不到dm01db02.tequ.com--etc-hosts和dm01db02.tequ.com--etc-hosts兩個檔案,難道這是兩個單獨的檔案,於是對onecommand目錄進行了搜尋:
[root@dm01db01 linux-x64]# find . -name *host*
./WorkDir/dm01dbadm02.tequ.com--etc-hosts
./WorkDir/dm01dbadm01.tequ.com--etc-hosts
果然在WorkDir目錄下有兩個原有機器名的類似檔案。
[root@dm01db01 linux-x64]# cd WorkDir
[root@dm01db01 WorkDir]# ls
dm01celadm01.tequ.com-cpuInfo.txt dm01db02.tequ.com-cpuInfo.txt dm01dbadm02.tequ.com-memInfo.txt
dm01celadm01.tequ.com-memInfo.txt dm01db02.tequ.com-memInfo.txt dm01dbadm02.tequ.com-ntpConf.txt
dm01celadm02.tequ.com-cpuInfo.txt dm01db02.tequ.com-ntpConf.txt p13390677_112040_Linux-x86-64_1of7.zip
dm01celadm02.tequ.com-memInfo.txt dm01dbadm01.tequ.com-cpuInfo.txt p13390677_112040_Linux-x86-64_2of7.zip
dm01celadm03.tequ.com-cpuInfo.txt dm01dbadm01.tequ.com--etc-hosts p13390677_112040_Linux-x86-64_3of7.zip
dm01celadm03.tequ.com-memInfo.txt dm01dbadm01.tequ.com-memInfo.txt p18371656_112040_Linux-x86-64.zip
dm01db01.tequ.com-cpuInfo.txt dm01dbadm01.tequ.com-ntpConf.txt p6880880_112000_Linux-x86-64.zip
dm01db01.tequ.com-memInfo.txt dm01dbadm02.tequ.com-cpuInfo.txt
dm01db01.tequ.com-ntpConf.txt dm01dbadm02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# ls *host*
dm01dbadm01.tequ.com--etc-hosts dm01dbadm02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# cat dm01dbadm01.tequ.com--etc-hosts
#### BEGIN Generated by Exadata. DO NOT MODIFY ####
127.0.0.1 localhost.localdomain localhost
192.168.10.1 dm01db01-priv1.tequ.com dm01db01-priv1
192.168.10.2 dm01db01-priv2.tequ.com dm01db01-priv2
10.0.3.10 dm01db01.tequ.com dm01db01
10.255.255.10 dm01dbadm01.tequ.com dm01dbadm01
192.168.10.3 dm01db02-priv1.tequ.com dm01db02-priv1
192.168.10.4 dm01db02-priv2.tequ.com dm01db02-priv2
10.0.3.11 dm01db02.tequ.com dm01db02
10.255.255.11 dm01dbadm02.tequ.com dm01dbadm02
10.0.3.13 dm01db02-vip.tequ.com dm01db02-vip
10.0.3.12 dm01db01-vip.tequ.com dm01db01-vip
#### END Generated by Exadata ####
檔案的內容和/etc/hosts檔案是一致的,說明在執行onecommand的時候讀取的是WorkDir下的*--etc-hosts檔案,而不是直接讀取/etc/hosts檔案。
直接複製這兩份檔案:
[root@dm01db01 WorkDir]# cp dm01dbadm01.tequ.com--etc-hosts dm01db01.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# cp dm01dbadm02.tequ.com--etc-hosts dm01db02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]#
[root@dm01db01 WorkDir]# cd ..
之後再次執行驗證命令:
[root@dm01db01 linux-x64]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/linux-x64/tequ-dm01.xml -s 1
Executing Validate Configuration File...................
Validating cluster: cluster-clu1
Locating machines...
Verifying operating systems...
Validating cluster networks......
Validating network connectivity............
Validating NTP setup..........
Validating physical disks on storage cells........................................................
Completed validation...
SUCCESS: 10.255.255.10 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.0.3.10 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.255.255.11 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.0.3.11 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.0.3.13 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.0.3.12 configured correctly on machine dm01db01.tequ.com
SUCCESS: 10.255.255.10 configured correctly on machine dm01db02.tequ.com
SUCCESS: 10.0.3.10 configured correctly on machine dm01db02.tequ.com
SUCCESS: 10.255.255.11 configured correctly on machine dm01db02.tequ.com
SUCCESS: 10.0.3.11 configured correctly on machine dm01db02.tequ.com
SUCCESS: 10.0.3.13 configured correctly on machine dm01db02.tequ.com
SUCCESS: 10.0.3.12 configured correctly on machine dm01db02.tequ.com
SUCCESS: Validated NTP server 10.0.8.114
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db02.tequ.com, machine type: compute
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db01.tequ.com, machine type: compute
SUCCESS:
SUCCESS: NTP servers on machine dm01db02.tequ.com verified successfully
SUCCESS: NTP servers on machine dm01db01.tequ.com verified successfully
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p18371656_112040_Linux-x86-64.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip exists...
SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p6880880_112000_Linux-x86-64.zip exists...
Following errors were found...
ERROR: 10.0.3.16 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
ERROR: 10.0.3.14 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
ERROR: 10.0.3.15 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
ERROR: 10.0.3.16 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
ERROR: 10.0.3.14 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
ERROR: 10.0.3.15 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
ERROR: Encountered error while checking NTP server. Error getting time from NTP server: 10.0.5.114
Errors occured...
沒有再報之間的Java錯誤,這裡還報了dm01-scan解析有問題,透過在作業系統層面執行:
[root@dm01db01 bin]# nslookup dm01-scan
Server: 10.0.8.114
Address: 10.0.8.114#53
Name: dm01-scan.tequ.com
Address: 10.0.3.15
Name: dm01-scan.tequ.com
Address: 10.0.3.16
Name: dm01-scan.tequ.com
Address: 10.0.3.14
解析dm01-scan和dm01-scan.domain解析都沒問題,於是將該問題忽略。
NTP可以配置多個,只要保障有一個暫時可用,以上的NTP錯誤即可忽略。
另外想強調一點,onecommand目錄下的log目錄會詳細記錄每一次的onecommand操作,其他錯誤可以透過檢視對應的日誌來找問題。
[root@dm01db01 oracle.SupportTools]# cd /opt/oracle.SupportTools/onecommand/log
[root@dm01db01 log]# ll
total 33328
-rw-r--r-- 1 root root 61426 Jun 11 23:41 log.out
-rw-r--r-- 1 root root 57376 Jun 11 14:15 Step10_Initialize_Cluster_Software_140611_140220.out
-rw-r--r-- 1 root root 4369159 Jun 11 14:33 Step11_Install_Database_Software_140611_142500.out
-rw-r--r-- 1 root root 2136 Jun 11 14:38 Step12_Relink_Database_with_RDS_140611_143823.out
-rw-r--r-- 1 root root 84212 Jun 11 14:39 Step13_Create_ASM_Diskgroups_140611_143851.out
-rw-r--r-- 1 root root 217597 Jun 11 14:41 Step14_Create_Databases_140611_144043.out
-rw-r--r-- 1 root root 235300 Jun 11 19:52 Step14_Create_Databases_140611_195204.out
-rw-r--r-- 1 root root 236827 Jun 11 20:35 Step14_Create_Databases_140611_203506.out
-rw-r--r-- 1 root root 235302 Jun 11 20:43 Step14_Create_Databases_140611_204302.out
-rw-r--r-- 1 root root 225397 Jun 11 21:07 Step14_Create_Databases_140611_204615.out
-rw-r--r-- 1 root root 402325 Jun 11 21:14 Step15_Apply_Security_Fixes_140611_210805.out
-rw-r--r-- 1 root root 655789 Jun 11 21:23 Step16_Create_Installation_Summary_140611_212222.out
-rw-r--r-- 1 root root 741416 Jun 11 21:35 Step17_Resecure_Machine_140611_212329.out
-rw-r--r-- 1 root root 28409 Jun 11 23:32 Step17_Resecure_Machine_140611_233217.out
-rw-r--r-- 1 root root 256551 Jun 11 01:39 Step1_Validate_Configuration_File_140609_141934.out
-rw-r--r-- 1 root root 492178 Jun 11 01:39 Step1_Validate_Configuration_File_140609_142913.out
-rw-r--r-- 1 root root 492398 Jun 11 01:39 Step1_Validate_Configuration_File_140609_165947.out
-rw-r--r-- 1 root root 494475 Jun 11 01:39 Step1_Validate_Configuration_File_140609_171926.out
-rw-r--r-- 1 root root 495927 Jun 11 01:39 Step1_Validate_Configuration_File_140610_095109.out
-rw-r--r-- 1 root root 491049 Jun 11 01:39 Step1_Validate_Configuration_File_140610_104933.out
-rw-r--r-- 1 root root 227377 Jun 11 01:39 Step1_Validate_Configuration_File_140610_111043.out
-rw-r--r-- 1 root root 581537 Jun 11 01:39 Step1_Validate_Configuration_File_140610_111825.out
-rw-r--r-- 1 root root 580946 Jun 11 01:39 Step1_Validate_Configuration_File_140610_112825.out
-rw-r--r-- 1 root root 580689 Jun 11 01:39 Step1_Validate_Configuration_File_140610_114206.out
-rw-r--r-- 1 root root 583397 Jun 11 01:39 Step1_Validate_Configuration_File_140610_141909.out
-rw-r--r-- 1 root root 487083 Jun 11 01:39 Step1_Validate_Configuration_File_140610_143318.out
-rw-r--r-- 1 root root 534536 Jun 11 01:39 Step1_Validate_Configuration_File_140610_143842.out
-rw-r--r-- 1 root root 531754 Jun 11 01:39 Step1_Validate_Configuration_File_140610_145105.out
-rw-r--r-- 1 root root 585919 Jun 11 01:39 Step1_Validate_Configuration_File_140610_235714.out
-rw-r--r-- 1 root root 490349 Jun 11 01:39 Step1_Validate_Configuration_File_140611_005853.out
-rw-r--r-- 1 root root 489191 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010318.out
-rw-r--r-- 1 root root 489267 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010452.out
-rw-r--r-- 1 root root 489311 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010935.out
-rw-r--r-- 1 root root 490987 Jun 11 01:39 Step1_Validate_Configuration_File_140611_011338.out
-rw-r--r-- 1 root root 491003 Jun 11 01:42 Step1_Validate_Configuration_File_140611_014106.out
-rw-r--r-- 1 root root 2468856 Jun 11 01:39 Step2_Setup_Required_Files_140610_113549.out
-rw-r--r-- 1 root root 2458010 Jun 11 01:39 Step2_Setup_Required_Files_140611_011643.out
-rw-r--r-- 1 root root 48406 Jun 11 01:46 Step2_Setup_Required_Files_140611_014537.out
-rw-r--r-- 1 root root 2468732 Jun 11 01:50 Step2_Setup_Required_Files_140611_014853.out
-rw-r--r-- 1 root root 2861511 Jun 11 10:10 Step2_Setup_Required_Files_140611_100656.out
-rw-r--r-- 1 root root 663710 Jun 11 01:39 Step3_Update_Nodes_for_Eighth_Rack_140610_113946.out
-rw-r--r-- 1 root root 697616 Jun 11 10:13 Step3_Update_Nodes_for_Eighth_Rack_140611_101020.out
-rw-r--r-- 1 root root 666778 Jun 11 10:27 Step3_Update_Nodes_for_Eighth_Rack_140611_102543.out
-rw-r--r-- 1 root root 386914 Jun 11 10:40 Step4_Create_Users_140611_103752.out
-rw-r--r-- 1 root root 17602 Jun 11 10:40 Step5_Setup_Cell_Connectivity_140611_104041.out
-rw-r--r-- 1 root root 701735 Jun 11 10:45 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_104104.out
-rw-r--r-- 1 root root 697829 Jun 11 11:06 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_110244.out
-rw-r--r-- 1 root root 771081 Jun 11 11:28 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_112140.out
-rw-r--r-- 1 root root 23701 Jun 11 11:33 Step7_Create_Cell_Disks_140611_113025.out
-rw-r--r-- 1 root root 590542 Jun 11 11:34 Step8_Create_Grid_Disks_140611_113429.out
-rw-r--r-- 1 root root 70332 Jun 11 11:38 Step9_Install_Cluster_Software_140611_113621.out
-rw-r--r-- 1 root root 70570 Jun 11 12:00 Step9_Install_Cluster_Software_140611_115755.out
-rw-r--r-- 1 root root 190715 Jun 11 14:01 Step9_Install_Cluster_Software_140611_135339.out
-rw-r--r-- 1 root root 13982 Jun 11 19:48 UndoStep14_Create_Databases_140611_194842.out
-rw-r--r-- 1 root root 14250 Jun 11 19:51 UndoStep14_Create_Databases_140611_195103.out
-rw-r--r-- 1 root root 14004 Jun 11 19:51 UndoStep14_Create_Databases_140611_195135.out
2.執行onecommand第2步建立必要檔案時的報錯。
下面是在執行onecommand第二步時候的報錯:
[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 2
Executing Setup Required Files..
Copying and extracting required files...
Required files are:
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip
Copying required files...
Checking status of remote files..........
Getting status of local files............
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_1of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_2of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_3of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/p18371656_112040_Linux-x86-64.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/Software/patches/p6880880_112000_Linux-x86-64.zip.
Copying file: p18371656_112040_Linux-x86-64.zip to node dm01db02.tequ.com.
Copying file: p6880880_112000_Linux-x86-64.zip to node dm01db02.tequ.com............
Completed copying files.....
Extracting required files............................
Copying resourcecontrol and other required files........................
Execution Exception in future get
OCMD-02624: Error while executing command {0}.java.lang.reflect.InvocationTargetException
Error running Setup Required Files error message Error running oracle.onecommand.deploy.software.SoftwareUtils method setupRequiredFiles
檢視/opt/oracle.SupportTools/onecommand/log/Step2_Setup_Required_Files_140611_014853.out,找到第一個報錯的地方,下面是相關的日誌輸出:
......
2014-06-11 01:50:26,449 [FINE ][MDThread][ KommandOutput:95] ======
2014-06-11 01:50:26,449 [FINE ][MDThread][ KommandOutput:64] # of kommand outputs 1
2014-06-11 01:50:26,449 [FINE ][MDThread][ RunCommand:170] Ran commands, elapsed time = 17006 mS
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:79] ======
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:80] Output
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:81] ======
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:84] Ret code = <52>
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:86] From node dm01celadm01.tequ.com
2014-06-11 01:50:26,455 [FINE ][MDThread][ KommandOutput:89] ## Output Start
2014-06-11 01:50:26,455 [FINE ][MDThread][ EsCommonUtils:596] OCMD-00052: Node dm01celadm01.tequ.com appears to be down.
2014-06-11 01:50:26,456 [FINE ][MDThread][ KommandOutput:91] ## Output End
2014-06-11 01:50:26,456 [FINE ][MDThread][ KommandOutput:95] ======
2014-06-11 01:50:26,456 [FINE ][MDThread][ OcmdException:62] Throwing OcmdException... message:Command [mkdir -p /opt/oracle.SupportTools] run on node 10.255.255.12 as user root did not execute successfully...
2014-06-11 01:50:26,456 [FINE ][MDThread][ OcmdException:98] Stack trace...
2014-06-11 01:50:26,457 [FINE ][MDThread][ OcmdException:135] OcmdException from node dm01db01.tequ.com return code = 2 output string: Command [mkdir -p /opt/oracle.SupportTools] run on node 10.255.255.12 as user root did not execute successfully... stack trace = java.lang.Throwable
at oracle.onecommand.escommon.common.OcmdException.ocmdException(OcmdException.java:95)
at oracle.onecommand.escommon.common.OcmdException.
at oracle.onecommand.commandexec.utils.CommonUtils.checkKommandOutput(CommonUtils.java:1369)
at oracle.onecommand.commandexec.utils.RemoteFileUtils.sftpPutFile(RemoteFileUtils.java:1491)
at oracle.onecommand.commandexec.utils.RemoteFileUtils.sftpPutFile(RemoteFileUtils.java:1378)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.CallableReflectionMethod.runMethodInParallel(CallableReflectionMethod.java:121)
at oracle.onecommand.commandexec.utils.CallableReflectionMethod.call(CallableReflectionMethod.java:70)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
2014-06-11 01:50:26,457 [INFO ][ main][ RunCommand:714] Execution Exception in future get
2014-06-11 01:50:26,458 [INFO ][ main][ RunCommand:721] OCMD-02624: Error while executing command {0}.java.lang.reflect.InvocationTargetException
從表面上看是說在dm01celadm01(儲存伺服器)上執行mkdir命令不成功,但手動在儲存伺服器執行該命令是沒問題的。原因可能是在資料庫伺服器節點dm01celadm01.tequ.com名稱不能被解析,檢查資料庫伺服器的/etc/hosts檔案,確實沒有配置對3臺儲存伺服器名稱的解析,將以下內容加入到兩臺資料庫伺服器hosts檔案中:
#### BEGIN Generated by Exadata. DO NOT MODIFY ####
127.0.0.1 localhost.localdomain localhost
192.168.10.1 dm01db01-priv1.tequ.com dm01db01-priv1
192.168.10.2 dm01db01-priv2.tequ.com dm01db01-priv2
10.0.3.10 dm01db01.tequ.com dm01db01
10.255.255.10 dm01dbadm01.tequ.com dm01dbadm01
192.168.10.3 dm01db02-priv1.tequ.com dm01db02-priv1
192.168.10.4 dm01db02-priv2.tequ.com dm01db02-priv2
10.0.3.11 dm01db02.tequ.com dm01db02
10.255.255.11 dm01dbadm02.tequ.com dm01dbadm02
10.0.3.13 dm01db02-vip.tequ.com dm01db02-vip
10.0.3.12 dm01db01-vip.tequ.com dm01db01-vip
10.255.255.12 dm01celadm01.tequ.com dm01celadm01
10.255.255.13 dm01celadm02.tequ.com dm01celadm02
10.255.255.14 dm01celadm03.tequ.com dm01celadm03
#### END Generated by Exadata ####
之後再次執行onecommand操作:
[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 2
Executing Setup Required Files.
Copying and extracting required files...
Required files are:
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip
/opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip
Copying required files...
Checking status of remote files..........
Checking status of existing files on remote nodes....
Getting status of local files.................
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_1of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_2of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_3of7.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/p18371656_112040_Linux-x86-64.zip.
Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/Software/patches/p6880880_112000_Linux-x86-64.zip..
Extracting required files........................
Copying resourcecontrol and other required files.............................................................................................................................
Creating databasemachine.xml for EM discovery
Done Creating databasemachine.xml for EM discovery.
Successfully completed execution of step Setup Required Files [elapsed Time [Elapsed = 185647 mS [3.0 minutes] Wed Jun 11 10:10:02 CST 2014]]
Step 2 completed successfully.
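Step 2 expects the patch archives to be present in WorkDir before it creates the symbolic links. As a minimal sketch, the check below reports which archives are missing. The file names are the ones visible in the output above, so the list may be incomplete. The function name `missing_files` is hypothetical and is not part of onecommand.

```python
from pathlib import Path

# File names copied from the step 2 output above; treat the list as an assumption.
REQUIRED_FILES = [
    "p13390677_112040_Linux-x86-64_1of7.zip",
    "p13390677_112040_Linux-x86-64_2of7.zip",
    "p13390677_112040_Linux-x86-64_3of7.zip",
    "p18371656_112040_Linux-x86-64.zip",
    "p6880880_112000_Linux-x86-64.zip",
]


def missing_files(workdir, required=REQUIRED_FILES):
    """Return the required file names that are not regular files in workdir."""
    base = Path(workdir)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    # WorkDir path as shown in the log output.
    for name in missing_files("/opt/oracle.SupportTools/onecommand/WorkDir"):
        print("missing:", name)
```

Running it before step 2 on the node where install.sh is executed gives a quick list of anything that still needs to be copied.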
3. Running onecommand step 6 to verify the InfiniBand configuration.
The following error appeared while running step 6:
[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 6
Executing Verify Infiniband and Calibrate Cells
Running rds ping tests on cluster nodes...........................................................................................................
Validating infiniband network with rds-ping.....
No ping errors while pinging infiniband fabric.......................................................................................................................................
dm01celadm02.tequ.com
ssh: dm01celadm03: Temporary failure in name resolution
ssh: dm01celadm01: Temporary failure in name resolution
ssh: dm01db02: Temporary failure in name resolution
ssh: dm01db01: Temporary failure in name resolution
Error running Verify Infiniband and Calibrate Cells error message Error running oracle.onecommand.deploy.validation.ValidationUtils method validateInfiniband
Error running oracle.onecommand.deploy.validation.ValidationUtils method validateInfiniband
After testing manually with rds-ping at the operating-system level and finding no problems, I ran step 6 again and it succeeded.
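The step 6 output reports ssh name-resolution failures rather than rds-ping errors, so it can help to list exactly which host names failed before retesting. The following is a minimal sketch that extracts them from the captured output. The function name `unresolved_hosts` is hypothetical, and the sample text is copied from the output above.

```python
import re

# Matches lines such as "ssh: dm01db02: Temporary failure in name resolution".
_PATTERN = re.compile(r"^ssh: (\S+): Temporary failure in name resolution$")


def unresolved_hosts(log_text):
    """Return host names, in order of first appearance, that ssh could not resolve."""
    hosts = []
    for line in log_text.splitlines():
        match = _PATTERN.match(line.strip())
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts


sample = """dm01celadm02.tequ.com
ssh: dm01celadm03: Temporary failure in name resolution
ssh: dm01celadm01: Temporary failure in name resolution
ssh: dm01db02: Temporary failure in name resolution
ssh: dm01db01: Temporary failure in name resolution"""

print(unresolved_hosts(sample))
# prints ['dm01celadm03', 'dm01celadm01', 'dm01db02', 'dm01db01']
```

Each name in the result can then be checked against /etc/hosts or DNS on the node that ran the step.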
4. Error when running onecommand step 9 to install the Cluster software.
Running step 9 produced the following error:
[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 9
Executing Install Cluster Software
Installing cluster cluster-clu1.
Getting grid disks using utility in /opt/oracle.SupportTools/onecommand/Software/11.2.0.4/grid...................
Running Oracle installer.................................................................................................................java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.software.SoftwareUtils.getKommandOutputsFromParallelizer(SoftwareUtils.java:1304)
at oracle.onecommand.deploy.software.SoftwareUtils.doInstallClusterware(SoftwareUtils.java:1327)
at oracle.onecommand.deploy.software.SoftwareUtils.installClusterWare(SoftwareUtils.java:1292)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.deploy.cliXml.InstalSoftware.executeStep(InstalSoftware.java:526)
at oracle.onecommand.deploy.cliXml.InstalSoftware.executeForwardAction(InstalSoftware.java:462)
at oracle.onecommand.deploy.cliXml.InstalSoftware.parseCmdLine(InstalSoftware.java:368)
at oracle.onecommand.deploy.cliXml.InstalSoftware.main(InstalSoftware.java:265)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: oracle.onecommand.escommon.common.OcmdException: Installation did not complete successfully, please check logs in /u01/app/oraInventory
at oracle.onecommand.deploy.software.ClusterZipInstall112040.install(ClusterZipInstall112040.java:358)
at oracle.onecommand.deploy.software.SoftwareUtils.installClusterwareByCluster(SoftwareUtils.java:1342)
... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
at oracle.onecommand.deploy.software.SoftwareUtils.getKommandOutputsFromParallelizer(SoftwareUtils.java:1304)
at oracle.onecommand.deploy.software.SoftwareUtils.doInstallClusterware(SoftwareUtils.java:1327)
at oracle.onecommand.deploy.software.SoftwareUtils.installClusterWare(SoftwareUtils.java:1292)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.deploy.cliXml.InstalSoftware.executeStep(InstalSoftware.java:526)
at oracle.onecommand.deploy.cliXml.InstalSoftware.executeForwardAction(InstalSoftware.java:462)
at oracle.onecommand.deploy.cliXml.InstalSoftware.parseCmdLine(InstalSoftware.java:368)
at oracle.onecommand.deploy.cliXml.InstalSoftware.main(InstalSoftware.java:265)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: oracle.onecommand.escommon.common.OcmdException: Installation did not complete successfully, please check logs in /u01/app/oraInventory
at oracle.onecommand.deploy.software.ClusterZipInstall112040.install(ClusterZipInstall112040.java:358)
at oracle.onecommand.deploy.software.SoftwareUtils.installClusterwareByCluster(SoftwareUtils.java:1342)
... 11 more
Errors occured...
The file log/Step9_Install_Cluster_Software_140611_115755.out shows the following errors:
2014-06-11 12:00:07,602 [FINE ][thread-1][ EsCommonUtils:596] Preparing to launch Oracle Universal Installer from /tmp/OraInstall2014-06-11_11-58-17AM. Please wait ...[FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.
2014-06-11 12:00:07,603 [FINE ][thread-1][ EsCommonUtils:596] CAUSE: Installer has detected that network interface ib0 does not maintain connectivity on all cluster nodes.
2014-06-11 12:00:07,603 [FINE ][thread-1][ EsCommonUtils:596] ACTION: Ensure that the chosen interface has been configured across all cluster nodes.
2014-06-11 12:00:07,603 [FINE ][thread-1][ EsCommonUtils:596] [FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.
2014-06-11 12:00:07,603 [FINE ][thread-1][ EsCommonUtils:596] CAUSE: Installer has detected that network interface ib1 does not maintain connectivity on all cluster nodes.
2014-06-11 12:00:07,603 [FINE ][thread-1][ EsCommonUtils:596] ACTION: Ensure that the chosen interface has been configured across all cluster nodes.
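The CAUSE lines in this log name the interfaces the installer complained about. A minimal sketch for pulling those names out of the step log is shown below. The function name `failing_interfaces` is hypothetical, and the sample lines are shortened copies of the log entries above.

```python
import re

# Captures the interface name from an INS-41112 CAUSE line.
_CAUSE = re.compile(r"network interface (\S+) does not maintain connectivity")


def failing_interfaces(log_text):
    """Return interface names reported by INS-41112, without duplicates, in log order."""
    found = []
    for name in _CAUSE.findall(log_text):
        if name not in found:
            found.append(name)
    return found


sample = """CAUSE: Installer has detected that network interface ib0 does not maintain connectivity on all cluster nodes.
ACTION: Ensure that the chosen interface has been configured across all cluster nodes.
CAUSE: Installer has detected that network interface ib1 does not maintain connectivity on all cluster nodes."""

print(failing_interfaces(sample))
# prints ['ib0', 'ib1']
```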
My Oracle Support describes this error in the following note:
[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1)
Modified: 2013-7-8  Type: REFERENCE
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.
PURPOSE
The note lists problems, solutions or workarounds that's related to the following 11gR2 GI OUI error:
[FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.
CAUSE: Installer has detected that network interface eth1 does not maintain connectivity on all cluster nodes.
ACTION: Ensure that the chosen interface has been configured across all cluster nodes.
DETAILS
[INS-41112] is a high level error number, the workarounds/solutions depend on the error code from lower layer, however, [INS-41112] does tell which interface is having the issue:
CAUSE: Installer has detected that network interface eth1 does not maintain connectivity on all cluster nodes.
## >> in this case, it's eth1 that's having connectivity issue
To find out lower layer error code, execute the following as grid user:
runcluvfy.sh comp nodecon -i -n ,, -verbose
Refer to the following once CVU reports real error code:
- PRVF-7617
Refer to note 1335136.1 for details.
- PRVF-6020 : Different MTU values used across network interfaces in subnet "10.10.10.0"
Refer to note 1429104.1 for details.
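Run the runcluvfy.sh command manually to verify the installation environment; if the verification passes, try running this step again.
Rerunning the step succeeded.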
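As an illustration of the manual check, the sketch below assembles a `comp nodecon` invocation from an interface list and a node list. The interface names (ib0, ib1) and node names (dm01db01, dm01db02) are taken from the logs in this article. The script path `./runcluvfy.sh` and the helper name `nodecon_command` are assumptions, so adjust them to wherever the grid software is staged.

```python
def nodecon_command(interfaces, nodes, script="./runcluvfy.sh"):
    """Build the argument list for a CVU node connectivity check."""
    if not interfaces or not nodes:
        raise ValueError("at least one interface and one node are required")
    return [
        script, "comp", "nodecon",
        "-i", ",".join(interfaces),  # comma-separated interface list
        "-n", ",".join(nodes),       # comma-separated node list
        "-verbose",
    ]


cmd = nodecon_command(["ib0", "ib1"], ["dm01db01", "dm01db02"])
print(" ".join(cmd))
# prints ./runcluvfy.sh comp nodecon -i ib0,ib1 -n dm01db01,dm01db02 -verbose
```

The list form can be passed to `subprocess.run` when the check is executed as the grid user. Printing it first makes it easy to confirm the host names are the ones that actually resolve on the node.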
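5. Running onecommand step 14 to create the database with DBCA.
Before running step 14, make sure the environment variables of the grid and oracle operating-system users are configured correctly on all database servers.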
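A quick way to confirm the environment is to check for unset variables in each user's session. The sketch below is only an illustration: the article does not say which variables were checked, so `ORACLE_HOME` and `ORACLE_SID` are assumed here, and the helper name `missing_vars` is hypothetical.

```python
import os

# Assumed variable names; the article does not list which variables it checked.
REQUIRED_VARS = ("ORACLE_HOME", "ORACLE_SID")


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Run once as grid and once as oracle on each database server.
    print(missing_vars(os.environ))
```

An empty list means every assumed variable is set for the current user. It does not verify that the values point to the correct homes.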
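This was my first Exadata implementation, and I learned a great deal from it. Four points deserve attention throughout an implementation:
1). The Exadata deployment process has no graphical tools and no character-interface tools; almost everything is done with scripts and commands.
2). Plan thoroughly before the implementation so that the plan does not have to be overturned midway, especially the IP plan.
3). As far as possible, avoid manually editing configuration files, including those for IP addresses and host names.
4). Stay calm when an error occurs. Reading the logs and analyzing the error message carefully is the quickest way to find the cause.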
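I also ran into the following problem while manually changing a host IP address:
See the article 《ssh連線Linux收到The remote system refused the connection報錯》 ("The remote system refused the connection" error when connecting to Linux over ssh): http://blog.itpub.net/23135684/viewspace-1181160/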
--end--
Source: ITPUB blog, link: http://blog.itpub.net/29119536/viewspace-1502231/. If you reproduce this article, please credit the source; otherwise legal action may be taken.