Cluster fails to start (CRS-4535, CRS-4000) after maintenance staff resized /dev/shm on Oracle Linux 7 to less than the Oracle instance's MEMORY_TARGET or SGA_TARGET

[grid@jtp1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

First, check whether the ASM disk permissions are the problem. The permissions turn out to be normal:

[root@jtp3 ~]# ls -lrt /dev/asm*
brw-rw----. 1 grid oinstall 8, 128 Apr  3  2018 /dev/asmdisk07
brw-rw----. 1 grid oinstall 8,  48 Apr  3  2018 /dev/asmdisk02
brw-rw----. 1 grid oinstall 8,  96 Apr  3  2018 /dev/asmdisk05
brw-rw----. 1 grid oinstall 8, 112 Apr  3  2018 /dev/asmdisk06
brw-rw----. 1 grid oinstall 8,  64 Apr  3  2018 /dev/asmdisk03
brw-rw----. 1 grid oinstall 8,  80 Apr  3  2018 /dev/asmdisk04
brw-rw----. 1 grid oinstall 8,  32 Apr  3  2018 /dev/asmdisk01
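The same check can be scripted rather than read by eye. The following is a minimal sketch, assuming GNU `stat` is available; the function name `check_asm_perms` is hypothetical, and the expected values (grid, oinstall, mode 660) are simply those shown in the listing above.

```shell
# check_asm_perms OWNER GROUP MODE PATH...
# Prints one line per path and returns non-zero if any path deviates
# from the expected owner, group, or octal mode.
check_asm_perms() {
  want_owner=$1; want_group=$2; want_mode=$3
  shift 3
  rc=0
  for path in "$@"; do
    # GNU stat format: %U owner name, %G group name, %a octal mode
    actual=$(stat -c '%U %G %a' "$path") || { rc=1; continue; }
    if [ "$actual" = "$want_owner $want_group $want_mode" ]; then
      echo "OK       $path ($actual)"
    else
      echo "MISMATCH $path ($actual)"
      rc=1
    fi
  done
  return $rc
}

# Usage on a cluster node (values taken from the listing above):
#   check_asm_perms grid oinstall 660 /dev/asmdisk*
```

A non-zero return code makes the check easy to use from a monitoring script.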

Restart CRS:

[root@jtp1 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'jtp1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'jtp1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'jtp1'
CRS-2677: Stop of 'ora.mdnsd' on 'jtp1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'jtp1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'jtp1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'jtp1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'jtp1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'jtp1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'jtp1'
CRS-2673: Attempting to stop 'ora.evmd' on 'jtp1'
CRS-2677: Stop of 'ora.ctssd' on 'jtp1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'jtp1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'jtp1'
CRS-2677: Stop of 'ora.cssd' on 'jtp1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'jtp1'
CRS-2673: Attempting to stop 'ora.driver.afd' on 'jtp1'
CRS-2677: Stop of 'ora.driver.afd' on 'jtp1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'jtp1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'jtp1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@jtp1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

The CRS alert.log shows that the disk group cannot be mounted:

[root@jtp1 ~]# tail -f /u01/app/grid/diag/crs/jtp1/crs/trace/alert.log
2018-04-02 18:30:21.227 [OHASD(8143)]CRS-8500: Oracle Clusterware OHASD process is starting with operating system process ID 8143
2018-04-02 18:30:21.230 [OHASD(8143)]CRS-0714: Oracle Clusterware Release 12.2.0.1.0.
2018-04-02 18:30:21.245 [OHASD(8143)]CRS-2112: The OLR service started on node jtp1.
2018-04-02 18:30:21.262 [OHASD(8143)]CRS-8017: location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2018-04-02 18:30:21.262 [OHASD(8143)]CRS-1301: Oracle High Availability Service started on node jtp1.
2018-04-02 18:30:21.567 [ORAROOTAGENT(8214)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 8214
2018-04-02 18:30:21.600 [CSSDAGENT(8231)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 8231
2018-04-02 18:30:21.607 [CSSDMONITOR(8241)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 8241
2018-04-02 18:30:21.620 [ORAAGENT(8225)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 8225
2018-04-02 18:30:22.146 [ORAAGENT(8316)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 8316
2018-04-02 18:30:22.211 [MDNSD(8335)]CRS-8500: Oracle Clusterware MDNSD process is starting with operating system process ID 8335
2018-04-02 18:30:22.215 [EVMD(8337)]CRS-8500: Oracle Clusterware EVMD process is starting with operating system process ID 8337
2018-04-02 18:30:23.259 [GPNPD(8369)]CRS-8500: Oracle Clusterware GPNPD process is starting with operating system process ID 8369
2018-04-02 18:30:24.275 [GPNPD(8369)]CRS-2328: GPNPD started on node jtp1.
2018-04-02 18:30:24.283 [GIPCD(8433)]CRS-8500: Oracle Clusterware GIPCD process is starting with operating system process ID 8433
2018-04-02 18:30:26.296 [CSSDMONITOR(8464)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 8464
2018-04-02 18:30:28.299 [CSSDAGENT(8482)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 8482
2018-04-02 18:30:28.496 [OCSSD(8497)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 8497
2018-04-02 18:30:29.538 [OCSSD(8497)]CRS-1713: CSSD daemon is started in hub mode
2018-04-02 18:30:36.015 [OCSSD(8497)]CRS-1707: Lease acquisition for node jtp1 number 1 completed
2018-04-02 18:30:37.087 [OCSSD(8497)]CRS-1605: CSSD voting file is online: AFD:CRS1; details in /u01/app/grid/diag/crs/jtp1/crs/trace/ocssd.trc.
2018-04-02 18:30:37.103 [OCSSD(8497)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2018-04-02 18:30:46.237 [OCSSD(8497)]CRS-1601: CSSD Reconfiguration complete. Active nodes are jtp1 .
2018-04-02 18:30:48.514 [OCTSSD(9302)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 9302
2018-04-02 18:30:48.535 [OCSSD(8497)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2018-04-02 18:30:50.626 [OCTSSD(9302)]CRS-2407: The new Cluster Time Synchronization Service reference node is host jtp1.
2018-04-02 18:30:50.627 [OCTSSD(9302)]CRS-2401: The Cluster Time Synchronization Service started on host jtp1.
2018-04-02 18:31:04.202 [ORAROOTAGENT(8214)]CRS-5019: All OCR locations are on ASM disk groups [CRS], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc".
2018-04-02 18:41:00.225 [ORAROOTAGENT(8214)]CRS-5818: Aborted command 'start' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:9:3} in /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc.
2018-04-02 18:41:03.757 [ORAROOTAGENT(8214)]CRS-5017: The resource action "ora.storage start" encountered the following error:
2018-04-02 18:41:03.757+Storage agent start action aborted. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc".
2018-04-02 18:41:03.760 [OHASD(8143)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.storage'. Details at (:CRSPE00221:) {0:9:3} in /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd.trc.
2018-04-02 18:42:09.921 [ORAROOTAGENT(8214)]CRS-5019: All OCR locations are on ASM disk groups [CRS], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc".
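Most of the log above is routine startup noise, so it can help to filter for the messages that matter. This is a minimal sketch; the function name `scan_crs_alert` is hypothetical, and the list of codes is only the set seen in this incident, not a complete catalogue of storage-related errors.

```shell
# scan_crs_alert FILE
# Prints the alert-log lines that carry the ora.storage-related message
# codes seen in this incident: CRS-5019, CRS-5818, CRS-5017, CRS-2757.
scan_crs_alert() {
  grep -E 'CRS-(5019|5818|5017|2757):' "$1"
}

# Usage on the node from this write-up:
#   scan_crs_alert /u01/app/grid/diag/crs/jtp1/crs/trace/alert.log
```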

The trace file shows errors when querying the ASM_DISCOVERY_ADDRESS attribute:

[root@jtp1 ~]# more /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc
Trace file /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc
Oracle Database 12c Clusterware Release 12.2.0.1.0 - Production Copyright 1996, 2016 Oracle. All rights reserved.

*** TRACE CONTINUED FROM FILE /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root_93.trc ***

2018-04-02 18:42:09.165 : CSSCLNT:3554666240: clsssterm: terminating context (0x7f03c0229390)
2018-04-02 18:42:09.165 : default:3554666240: clsCredDomClose: Credctx deleted 0x7f03c0459470
2018-04-02 18:42:09.166 :    GPNP:3554666240: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:399] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2018-04-02 18:42:09.253 : CSSCLNT:3554666240: clsssinit: initialized context: (0x7f03c045c2c0) flags 0x115
2018-04-02 18:42:09.253 : CSSCLNT:3554666240: clsssterm: terminating context (0x7f03c045c2c0)
2018-04-02 18:42:09.254 :   CLSNS:3554666240: clsns_SetTraceLevel:trace level set to 1.
2018-04-02 18:42:09.254 :    GPNP:3554666240: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:399] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2018-04-02 18:42:09.257 : default:3554666240: Inited LSF context: 0x7f03c04f0420
2018-04-02 18:42:09.260 : CLSCRED:3554666240: clsCredCommonInit: Inited singleton credctx.
2018-04-02 18:42:09.260 : CLSCRED:3554666240: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
2018-04-02 18:42:09.294 : USRTHRD:3554666240: {0:9:3} 8033 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2018-04-02 18:42:09.300 : USRTHRD:3554666240: {0:9:3} 8033 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2018-04-02 18:42:09.356 : CLSCRED:3554666240: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.5c82286a084bcf37ffa014144074e5dd.root not found
2018-04-02 18:42:09.356 : USRTHRD:3554666240: {0:9:3} 7755 Error 4 opening dom root in 0x7f03c064c980

The ASM alert.log shows that /dev/shm is smaller than MEMORY_TARGET, and it states the minimum size that /dev/shm must have:

[root@jtp1 ~]# tail -f /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
WARNING: ASM does not support ipclw. Switching to skgxp
WARNING: ASM does not support ipclw. Switching to skgxp
WARNING: ASM does not support ipclw. Switching to skgxp
* instance_number obtained from CSS = 1, checking for the existence of node 0...
* node 0 does not exist. instance_number = 1
Starting ORACLE instance (normal) (OS id: 9343)
2018-04-02T18:31:00.187055+08:00
CLI notifier numLatches:7 maxDescs:2301
2018-04-02T18:31:00.193961+08:00
WARNING: You are trying to use the MEMORY_TARGET feature. This feature requires the /dev/shm file system to be mounted for at least 1140850688 bytes. /dev/shm is either not mounted or is mounted with available space less than this size. Please fix this so that MEMORY_TARGET can work as expected. Current available is 1073573888 and used is 167936 bytes. Ensure that the mount point is /dev/shm for this directory.
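To make the gap concrete, the two byte counts from the warning can be compared directly. This is a small sketch using the numbers quoted in the log; the function name `shm_shortfall` is hypothetical.

```shell
# shm_shortfall REQUIRED_BYTES AVAILABLE_BYTES
# Prints how many bytes /dev/shm is short (0 if it is already large enough).
shm_shortfall() {
  required=$1; available=$2
  if [ "$available" -ge "$required" ]; then
    echo 0
  else
    echo $((required - available))
  fi
}

# Values quoted in the ASM alert log above:
shm_shortfall 1140850688 1073573888   # prints 67276800

# On a live node the available space could be read with GNU df, e.g.:
#   avail=$(df -B1 --output=avail /dev/shm | tail -n 1 | tr -d ' ')
```

The required 1140850688 bytes is 1088 MiB, so a 1 GiB /dev/shm falls about 64 MiB short.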

The size of /dev/shm can be changed by editing /etc/fstab. Here it is set to 12G:

[root@jtp1 bin]# df -h
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/ol-root   49G   42G  7.9G  85% /
devtmpfs              12G   28K   12G   1% /dev
tmpfs                1.0G  164K  1.0G   1% /dev/shm
tmpfs                1.0G  9.3M 1015M   1% /run
tmpfs                1.0G     0  1.0G   0% /sys/fs/cgroup
/dev/sda1           1014M  141M  874M  14% /boot
[root@jtp1 bin]# vi /etc/fstab

#
# /etc/fstab
# Created by anaconda on Sat Mar 18 15:27:13 2017
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/ol-root     /                       xfs     defaults        0 0
UUID=ca5854cd-0125-4954-a5c4-1ac42c9a0f70 /boot                   xfs     defaults        0 0
/dev/mapper/ol-swap     swap                    swap    defaults        0 0


tmpfs                   /dev/shm                tmpfs   defaults,size=12G        0 0
tmpfs                   /run                    tmpfs   defaults,size=12G        0 0
tmpfs                  /sys/fs/cgroup           tmpfs   defaults,size=12G        0 0
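Before restarting anything, it is worth confirming what size the fstab entry actually requests. The sketch below reads the `size=` option of the /dev/shm entry with awk; the function name `shm_fstab_size` is hypothetical. The remount command in the comments is a common way to apply a new tmpfs size without a reboot, but it is not the step taken in this write-up.

```shell
# shm_fstab_size FSTAB
# Prints the value of the size= mount option for the /dev/shm entry,
# skipping comment lines. Prints nothing if no size= option is set.
shm_fstab_size() {
  awk '$1 !~ /^#/ && $2 == "/dev/shm" {
         n = split($4, opts, ",")
         for (i = 1; i <= n; i++)
           if (opts[i] ~ /^size=/) {
             sub(/^size=/, "", opts[i])
             print opts[i]
           }
       }' "$1"
}

# Usage:
#   shm_fstab_size /etc/fstab
# Applying the new size to the running system as root, then verifying:
#   mount -o remount,size=12G /dev/shm
#   df -h /dev/shm
```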

After restarting the cluster, the cluster resource status is back to normal:

[grid@jtp1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       jtp1                  STABLE
               ONLINE  ONLINE       jtp2                  STABLE
ora.CRS.dg
               ONLINE  ONLINE       jtp1                  STABLE
               ONLINE  ONLINE       jtp2                  STABLE
ora.DATA.dg
               ONLINE  ONLINE       jtp1                  STABLE
               ONLINE  ONLINE       jtp2                  STABLE
ora.FRA.dg
               ONLINE  ONLINE       jtp1                  STABLE
               ONLINE  ONLINE       jtp2                  STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       jtp1                  STABLE
               ONLINE  ONLINE       jtp2                  STABLE
ora.TEST.dg
               ONLINE  ONLINE       jtp1                  STABLE
               ONLINE  ONLINE       jtp2                  STABLE
ora.chad
               ONLINE  ONLINE       jtp1                  STABLE
               ONLINE  ONLINE       jtp2                  STABLE
ora.net1.network
               ONLINE  ONLINE       jtp1                  STABLE
               ONLINE  ONLINE       jtp2                  STABLE
ora.ons
               ONLINE  ONLINE       jtp1                  STABLE
               ONLINE  ONLINE       jtp2                  STABLE
ora.proxy_advm
               OFFLINE OFFLINE      jtp1                  STABLE
               OFFLINE OFFLINE      jtp2                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       jtp1                  STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       jtp2                  STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       jtp2                  STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       jtp2                  169.254.237.250 88.8
                                                             8.88.2,STABLE
ora.asm
      1        ONLINE  ONLINE       jtp1                  Started,STABLE
      2        ONLINE  ONLINE       jtp2                  Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       jtp2                  STABLE
ora.jy.db
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  OFFLINE                               STABLE
ora.jtp1.vip
      1        ONLINE  ONLINE       jtp1                  STABLE
ora.jtp2.vip
      1        ONLINE  ONLINE       jtp2                  STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       jtp2                  Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       jtp2                  STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       jtp1                  STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       jtp2                  STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       jtp2                  STABLE
--------------------------------------------------------------------------------

At this point the cluster is back to normal.
