Resizing /dev/shm causes an Oracle 12c cluster startup failure
Maintenance staff resized /dev/shm on Oracle Linux 7 to below the MEMORY_TARGET (or SGA_TARGET) of the Oracle instances, which left the cluster unable to start (CRS-4535, CRS-4000):
[grid@jtp1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
First check whether the ASM disk permissions are the problem; they turn out to be normal:
[root@jtp3 ~]# ls -lrt /dev/asm*
brw-rw----. 1 grid oinstall 8, 128 Apr 3 2018 /dev/asmdisk07
brw-rw----. 1 grid oinstall 8,  48 Apr 3 2018 /dev/asmdisk02
brw-rw----. 1 grid oinstall 8,  96 Apr 3 2018 /dev/asmdisk05
brw-rw----. 1 grid oinstall 8, 112 Apr 3 2018 /dev/asmdisk06
brw-rw----. 1 grid oinstall 8,  64 Apr 3 2018 /dev/asmdisk03
brw-rw----. 1 grid oinstall 8,  80 Apr 3 2018 /dev/asmdisk04
brw-rw----. 1 grid oinstall 8,  32 Apr 3 2018 /dev/asmdisk01
Restart CRS:
[root@jtp1 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'jtp1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'jtp1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'jtp1'
CRS-2677: Stop of 'ora.mdnsd' on 'jtp1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'jtp1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'jtp1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'jtp1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'jtp1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'jtp1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'jtp1'
CRS-2673: Attempting to stop 'ora.evmd' on 'jtp1'
CRS-2677: Stop of 'ora.ctssd' on 'jtp1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'jtp1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'jtp1'
CRS-2677: Stop of 'ora.cssd' on 'jtp1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'jtp1'
CRS-2673: Attempting to stop 'ora.driver.afd' on 'jtp1'
CRS-2677: Stop of 'ora.driver.afd' on 'jtp1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'jtp1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'jtp1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@jtp1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
The CRS alert.log shows that the disk groups cannot be mounted:
[root@jtp1 ~]# tail -f /u01/app/grid/diag/crs/jtp1/crs/trace/alert.log
2018-04-02 18:30:21.227 [OHASD(8143)]CRS-8500: Oracle Clusterware OHASD process is starting with operating system process ID 8143
2018-04-02 18:30:21.230 [OHASD(8143)]CRS-0714: Oracle Clusterware Release 12.2.0.1.0.
2018-04-02 18:30:21.245 [OHASD(8143)]CRS-2112: The OLR service started on node jtp1.
2018-04-02 18:30:21.262 [OHASD(8143)]CRS-8017: location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2018-04-02 18:30:21.262 [OHASD(8143)]CRS-1301: Oracle High Availability Service started on node jtp1.
2018-04-02 18:30:21.567 [ORAROOTAGENT(8214)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 8214
2018-04-02 18:30:21.600 [CSSDAGENT(8231)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 8231
2018-04-02 18:30:21.607 [CSSDMONITOR(8241)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 8241
2018-04-02 18:30:21.620 [ORAAGENT(8225)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 8225
2018-04-02 18:30:22.146 [ORAAGENT(8316)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 8316
2018-04-02 18:30:22.211 [MDNSD(8335)]CRS-8500: Oracle Clusterware MDNSD process is starting with operating system process ID 8335
2018-04-02 18:30:22.215 [EVMD(8337)]CRS-8500: Oracle Clusterware EVMD process is starting with operating system process ID 8337
2018-04-02 18:30:23.259 [GPNPD(8369)]CRS-8500: Oracle Clusterware GPNPD process is starting with operating system process ID 8369
2018-04-02 18:30:24.275 [GPNPD(8369)]CRS-2328: GPNPD started on node jtp1.
2018-04-02 18:30:24.283 [GIPCD(8433)]CRS-8500: Oracle Clusterware GIPCD process is starting with operating system process ID 8433
2018-04-02 18:30:26.296 [CSSDMONITOR(8464)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 8464
2018-04-02 18:30:28.299 [CSSDAGENT(8482)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 8482
2018-04-02 18:30:28.496 [OCSSD(8497)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 8497
2018-04-02 18:30:29.538 [OCSSD(8497)]CRS-1713: CSSD daemon is started in hub mode
2018-04-02 18:30:36.015 [OCSSD(8497)]CRS-1707: Lease acquisition for node jtp1 number 1 completed
2018-04-02 18:30:37.087 [OCSSD(8497)]CRS-1605: CSSD voting file is online: AFD:CRS1; details in /u01/app/grid/diag/crs/jtp1/crs/trace/ocssd.trc.
2018-04-02 18:30:37.103 [OCSSD(8497)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2018-04-02 18:30:46.237 [OCSSD(8497)]CRS-1601: CSSD Reconfiguration complete. Active nodes are jtp1 .
2018-04-02 18:30:48.514 [OCTSSD(9302)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 9302
2018-04-02 18:30:48.535 [OCSSD(8497)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2018-04-02 18:30:50.626 [OCTSSD(9302)]CRS-2407: The new Cluster Time Synchronization Service reference node is host jtp1.
2018-04-02 18:30:50.627 [OCTSSD(9302)]CRS-2401: The Cluster Time Synchronization Service started on host jtp1.
2018-04-02 18:31:04.202 [ORAROOTAGENT(8214)]CRS-5019: All OCR locations are on ASM disk groups [CRS], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc".
2018-04-02 18:41:00.225 [ORAROOTAGENT(8214)]CRS-5818: Aborted command 'start' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:9:3} in /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc.
2018-04-02 18:41:03.757 [ORAROOTAGENT(8214)]CRS-5017: The resource action "ora.storage start" encountered the following error:
2018-04-02 18:41:03.757+Storage agent start action aborted. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc".
2018-04-02 18:41:03.760 [OHASD(8143)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.storage'. Details at (:CRSPE00221:) {0:9:3} in /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd.trc.
2018-04-02 18:42:09.921 [ORAROOTAGENT(8214)]CRS-5019: All OCR locations are on ASM disk groups [CRS], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc".
Checking the trace file shows errors when querying the ASM_DISCOVERY_ADDRESS attribute:
[root@jtp1 ~]# more /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc
Trace file /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc
Oracle Database 12c Clusterware Release 12.2.0.1.0 - Production
Copyright 1996, 2016 Oracle. All rights reserved.
*** TRACE CONTINUED FROM FILE /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root_93.trc ***
2018-04-02 18:42:09.165 : CSSCLNT:3554666240: clsssterm: terminating context (0x7f03c0229390)
2018-04-02 18:42:09.165 : default:3554666240: clsCredDomClose: Credctx deleted 0x7f03c0459470
2018-04-02 18:42:09.166 : GPNP:3554666240: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:399] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2018-04-02 18:42:09.253 : CSSCLNT:3554666240: clsssinit: initialized context: (0x7f03c045c2c0) flags 0x115
2018-04-02 18:42:09.253 : CSSCLNT:3554666240: clsssterm: terminating context (0x7f03c045c2c0)
2018-04-02 18:42:09.254 : CLSNS:3554666240: clsns_SetTraceLevel:trace level set to 1.
2018-04-02 18:42:09.254 : GPNP:3554666240: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:399] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2018-04-02 18:42:09.257 : default:3554666240: Inited LSF context: 0x7f03c04f0420
2018-04-02 18:42:09.260 : CLSCRED:3554666240: clsCredCommonInit: Inited singleton credctx.
2018-04-02 18:42:09.260 : CLSCRED:3554666240: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
2018-04-02 18:42:09.294 : USRTHRD:3554666240: {0:9:3} 8033 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS
2018-04-02 18:42:09.300 : USRTHRD:3554666240: {0:9:3} 8033 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS
2018-04-02 18:42:09.356 : CLSCRED:3554666240: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.5c82286a084bcf37ffa014144074e5dd.root not found
2018-04-02 18:42:09.356 : USRTHRD:3554666240: {0:9:3} 7755 Error 4 opening dom root in 0x7f03c064c980
Checking the ASM alert.log shows that /dev/shm is smaller than MEMORY_TARGET, and the log even states the minimum size /dev/shm must be mounted with:
[root@jtp1 ~]# tail -f /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
WARNING: ASM does not support ipclw. Switching to skgxp
WARNING: ASM does not support ipclw. Switching to skgxp
WARNING: ASM does not support ipclw. Switching to skgxp
* instance_number obtained from CSS = 1, checking for the existence of node 0...
* node 0 does not exist. instance_number = 1
Starting ORACLE instance (normal) (OS id: 9343)
2018-04-02T18:31:00.187055+08:00
CLI notifier numLatches:7 maxDescs:2301
2018-04-02T18:31:00.193961+08:00
WARNING: You are trying to use the MEMORY_TARGET feature. This feature requires the /dev/shm file system to be mounted for at least 1140850688 bytes. /dev/shm is either not mounted or is mounted with available space less than this size. Please fix this so that MEMORY_TARGET can work as expected. Current available is 1073573888 and used is 167936 bytes. Ensure that the mount point is /dev/shm for this directory.
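The warning spells out the check ASM performs at startup: the space available in /dev/shm must cover the bytes MEMORY_TARGET needs. A minimal shell sketch of that comparison, using the figures from the alert log above (the `check_shm` helper is hypothetical, and the live `df` invocation in the comment assumes GNU coreutils):

```shell
# check_shm mirrors the startup check: report when the bytes available in
# /dev/shm fall below the size MEMORY_TARGET requires.
check_shm() {
  local avail=$1 required=$2
  if [ "$avail" -lt "$required" ]; then
    echo "too small"
  else
    echo "ok"
  fi
}

# Figures from the alert log: 1073573888 bytes available, 1140850688 required.
check_shm 1073573888 1140850688   # prints "too small" -- the failing state
# On a live node (assumption: GNU df):
#   check_shm "$(df -B1 --output=avail /dev/shm | tail -1 | tr -d ' ')" 1140850688
```

Running such a check before changing tmpfs sizes on a database host would have caught this misconfiguration before the cluster restart.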
The size of /dev/shm can be changed by editing /etc/fstab; here it is raised to 12G:
[root@jtp1 bin]# df -h
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/ol-root   49G   42G  7.9G  85% /
devtmpfs              12G   28K   12G   1% /dev
tmpfs                1.0G  164K  1.0G   1% /dev/shm
tmpfs                1.0G  9.3M 1015M   1% /run
tmpfs                1.0G     0  1.0G   0% /sys/fs/cgroup
/dev/sda1           1014M  141M  874M  14% /boot
[root@jtp1 bin]# vi /etc/fstab
#
# /etc/fstab
# Created by anaconda on Sat Mar 18 15:27:13 2017
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/ol-root     /               xfs     defaults          0 0
UUID=ca5854cd-0125-4954-a5c4-1ac42c9a0f70 /boot xfs defaults      0 0
/dev/mapper/ol-swap     swap            swap    defaults          0 0
tmpfs                   /dev/shm        tmpfs   defaults,size=12G 0 0
tmpfs                   /run            tmpfs   defaults,size=12G 0 0
tmpfs                   /sys/fs/cgroup  tmpfs   defaults,size=12G 0 0
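The fstab entry takes effect at the next boot; on a running node a tmpfs can usually be grown in place with a remount. The sketch below shows that step as comments (standard mount(8) usage), together with a hypothetical `to_bytes` helper for sanity-checking the `size=` value against the minimum from the alert log (assumption: binary units, as tmpfs interprets them):

```shell
# Apply the new size immediately, without a reboot (run as root):
#   mount -o remount,size=12G /dev/shm
#   df -h /dev/shm    # should now report 12G

# to_bytes: hypothetical helper converting a tmpfs size= value to bytes.
to_bytes() {
  case "$1" in
    *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
    *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
    *)  echo "$1" ;;
  esac
}

to_bytes 12G    # 12884901888 bytes, well above the 1140850688-byte minimum
```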
After restarting the cluster, the resource status is back to normal:
[grid@jtp1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       jtp1                     STABLE
               ONLINE  ONLINE       jtp2                     STABLE
ora.CRS.dg
               ONLINE  ONLINE       jtp1                     STABLE
               ONLINE  ONLINE       jtp2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       jtp1                     STABLE
               ONLINE  ONLINE       jtp2                     STABLE
ora.FRA.dg
               ONLINE  ONLINE       jtp1                     STABLE
               ONLINE  ONLINE       jtp2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       jtp1                     STABLE
               ONLINE  ONLINE       jtp2                     STABLE
ora.TEST.dg
               ONLINE  ONLINE       jtp1                     STABLE
               ONLINE  ONLINE       jtp2                     STABLE
ora.chad
               ONLINE  ONLINE       jtp1                     STABLE
               ONLINE  ONLINE       jtp2                     STABLE
ora.net1.network
               ONLINE  ONLINE       jtp1                     STABLE
               ONLINE  ONLINE       jtp2                     STABLE
ora.ons
               ONLINE  ONLINE       jtp1                     STABLE
               ONLINE  ONLINE       jtp2                     STABLE
ora.proxy_advm
               OFFLINE OFFLINE      jtp1                     STABLE
               OFFLINE OFFLINE      jtp2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       jtp1                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       jtp2                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       jtp2                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       jtp2                     169.254.237.250 88.8
                                                             8.88.2,STABLE
ora.asm
      1        ONLINE  ONLINE       jtp1                     Started,STABLE
      2        ONLINE  ONLINE       jtp2                     Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       jtp2                     STABLE
ora.jy.db
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  OFFLINE                               STABLE
ora.jtp1.vip
      1        ONLINE  ONLINE       jtp1                     STABLE
ora.jtp2.vip
      1        ONLINE  ONLINE       jtp2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       jtp2                     Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       jtp2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       jtp1                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       jtp2                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       jtp2                     STABLE
--------------------------------------------------------------------------------
At this point the cluster is back to normal.
From the "ITPUB blog", link: http://blog.itpub.net/25462274/viewspace-2156391/ — if reposting, please credit the source; otherwise legal action may be pursued.