Notes on an ASMLib Failure

Posted by 531968912 on 2015-04-18

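I had just finished restoring a customer's 20 TB database. It had been running for only a few days when the customer called to say one node had gone down. I connected over VPN and checked the CRS log: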

[oracle@rac2 crsd]$ tail -100f crsd.log 
2014-07-19 21:37:02.223: [ CSSCLNT][1073453488]clsssInitNative: connect failed, rc 9
 
2014-07-19 21:37:02.225: [  CRSRTI][1073453488]0CSS is not ready. Received status 3 from CSS. Waiting for good status .. 
 
2014-07-19 21:37:03.599: [ COMMCRS][1110501696]clsc_connect: (0xb438700) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))
 
2014-07-19 21:37:03.599: [ CSSCLNT][1073453488]clsssInitNative: connect failed, rc 9
 
2014-07-19 21:37:03.600: [  CRSRTI][1073453488]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

The private interconnect (heartbeat) network turned out to be unreachable. Restarting the network interface fixed that:

[root@rac2 client]# ping rac1-priv
PING rac1-priv (192.168.2.81) 56(84) bytes of data.
From rac2-priv (192.168.2.83) icmp_seq=10 Destination Host Unreachable
From rac2-priv (192.168.2.83) icmp_seq=11 Destination Host Unreachable
From rac2-priv (192.168.2.83) icmp_seq=12 Destination Host Unreachable
From rac2-priv (192.168.2.83) icmp_seq=14 Destination Host Unreachable
From rac2-priv (192.168.2.83) icmp_seq=15 Destination Host Unreachable
From rac2-priv (192.168.2.83) icmp_seq=16 Destination Host Unreachable
 
--- rac1-priv ping statistics ---
19 packets transmitted, 0 received, +6 errors, 100% packet loss, time 18000ms
, pipe 3
[root@rac2 client]# ifconfig bond1
bond1     Link encap:Ethernet  HWaddr 78:2B:CB:0D:32:49
          inet addr:192.168.2.83  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::7a2b:cbff:fe0d:3249/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:105 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21457 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6720 (6.5 KiB)  TX bytes:1373758 (1.3 MiB)
 
[root@rac2 client]# ifdown bond1
[root@rac2 client]# ifup bond1
[root@rac2 client]# ping rac1-priv
PING rac1-priv (192.168.2.81) 56(84) bytes of data.
64 bytes from rac1-priv (192.168.2.81): icmp_seq=1 ttl=64 time=0.146 ms
64 bytes from rac1-priv (192.168.2.81): icmp_seq=2 ttl=64 time=0.102 ms
64 bytes from rac1-priv (192.168.2.81): icmp_seq=3 ttl=64 time=0.085 ms
64 bytes from rac1-priv (192.168.2.81): icmp_seq=4 ttl=64 time=0.095 ms
64 bytes from rac1-priv (192.168.2.81): icmp_seq=5 ttl=64 time=0.146 ms
64 bytes from rac1-priv (192.168.2.81): icmp_seq=6 ttl=64 time=0.099 ms
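
The before-and-after ping check above could be scripted roughly as follows. This is only a sketch: `packet_loss` and `interconnect_ok` are hypothetical helper names, and the parsing assumes the Linux iputils summary line shown in the session above.

```shell
#!/bin/sh
# Print the packet-loss percentage found in ping output read from stdin.
# Assumes the summary line contains "N% packet loss", as in the session above.
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical helper: succeed only if the peer answers all five probes.
# $1 is the private hostname of the other node, e.g. rac1-priv.
interconnect_ok() {
    loss=$(ping -c 5 "$1" 2>/dev/null | packet_loss)
    [ "$loss" = "0" ]
}
```

For example, `interconnect_ok rac1-priv || echo "interconnect down"` could be run on rac2 before starting CRS.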

After CRS was started, the listener and the database instance on rac2 still would not come up:

[oracle@rac2 ~]$ crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host        
----------------------------------------------------------------------
ora.master.db  application    0/0    0/1    ONLINE    ONLINE    rac1 
ora....rtdb.cs application    0/0    0/1    ONLINE    ONLINE    rac1 
ora....er1.srv application    0/0    0/0    ONLINE    ONLINE    rac1 
ora....r1.inst application    0/5    0/0    ONLINE    ONLINE    rac1 
ora....r2.inst application    0/5    0/0    ONLINE    OFFLINE               
ora....pcdb.cs application    0/0    0/1    ONLINE    ONLINE    rac1 
ora....er1.srv application    0/0    0/0    ONLINE    ONLINE    rac1 
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1 
ora....N1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1 
ora....bn1.gsd application    0/5    0/0    ONLINE    ONLINE    rac1 
ora....bn1.ons application    0/3    0/0    ONLINE    ONLINE    rac1 
ora....bn1.vip application    0/0    0/0    ONLINE    ONLINE    rac1 
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2 
ora....N2.lsnr application    0/5    0/0    ONLINE    OFFLINE               
ora....bn2.gsd application    0/5    0/0    ONLINE    ONLINE    rac2 
ora....bn2.ons application    0/3    0/0    ONLINE    ONLINE    rac2 
ora....bn2.vip application    0/0    0/0    ONLINE    ONLINE    rac2

Starting the listener manually also failed:

[oracle@rac2 ~]$ srvctl start listener -n rac2
rac2:ora.rac2.LISTENER_rac2.lsnr:
rac2:ora.rac2.LISTENER_rac2.lsnr:LSNRCTL for Linux: Version 10.2.0.3.0 - Production on 19-JUL-2014 22:05:06
rac2:ora.rac2.LISTENER_rac2.lsnr:
rac2:ora.rac2.LISTENER_rac2.lsnr:Copyright (c) 1991, 2006, Oracle.  All rights reserved.
rac2:ora.rac2.LISTENER_rac2.lsnr:
rac2:ora.rac2.LISTENER_rac2.lsnr:Starting /opt/oracle/app/database/bin/tnslsnr: please wait...
rac2:ora.rac2.LISTENER_rac2.lsnr:
rac2:ora.rac2.LISTENER_rac2.lsnr:TNSLSNR for Linux: Version 10.2.0.3.0 - Production
rac2:ora.rac2.LISTENER_rac2.lsnr:System parameter file is /opt/oracle/app/database/network/admin/listener.ora
rac2:ora.rac2.LISTENER_rac2.lsnr:Log messages written to /opt/oracle/app/database/network/log/listener_rac2.log
rac2:ora.rac2.LISTENER_rac2.lsnr:TNS-01151: Missing listener name, LISTENER_rac2, in LISTENER.ORA
rac2:ora.rac2.LISTENER_rac2.lsnr:
rac2:ora.rac2.LISTENER_rac2.lsnr:Listener failed to start. See the error message(s) above...
rac2:ora.rac2.LISTENER_rac2.lsnr:
rac2:ora.rac2.LISTENER_rac2.lsnr:
rac2:ora.rac2.LISTENER_rac2.lsnr:LSNRCTL for Linux: Version 10.2.0.3.0 - Production on 19-JUL-2014 22:05:06
rac2:ora.rac2.LISTENER_rac2.lsnr:
rac2:ora.rac2.LISTENER_rac2.lsnr:Copyright (c) 1991, 2006, Oracle.  All rights reserved.
rac2:ora.rac2.LISTENER_rac2.lsnr:
rac2:ora.rac2.LISTENER_rac2.lsnr:Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)(IP=FIRST)))
rac2:ora.rac2.LISTENER_rac2.lsnr:TNS-12541: TNS:no listener
rac2:ora.rac2.LISTENER_rac2.lsnr: TNS-12560: TNS:protocol adapter error
rac2:ora.rac2.LISTENER_rac2.lsnr:  TNS-00511: No listener
rac2:ora.rac2.LISTENER_rac2.lsnr:   Linux Error: 111: Connection refused
rac2:ora.rac2.LISTENER_rac2.lsnr:Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.83)(PORT=1521)(IP=FIRST)))
rac2:ora.rac2.LISTENER_rac2.lsnr:TNS-12541: TNS:no listener
rac2:ora.rac2.LISTENER_rac2.lsnr: TNS-12560: TNS:protocol adapter error
rac2:ora.rac2.LISTENER_rac2.lsnr:  TNS-00511: No listener
rac2:ora.rac2.LISTENER_rac2.lsnr:   Linux Error: 111: Connection refused
CRS-0215: Could not start resource 'ora.rac2.LISTENER_rac2.lsnr'.

I suspected a problem with the listener configuration file, listener.ora, so I took a look:

[oracle@standbydbn2 ~]$ cd /opt/oracle/app/database/network/admin/
[oracle@standbydbn2 admin]$ ls -l
total 88
-rw-r--r-- 1 oracle oinstall  240 Jul 27  2011 1
-rw-r--r-- 1 oracle oinstall  378 Jul 27  2011 listener1107271PM5834.bak
-rw-r--r-- 1 oracle oinstall  448 Jul 17 16:09 listener1407174PM0951.bak
-rw-r--r-- 1 oracle oinstall  448 Jul 17 19:00 listener1407177PM0037.bak
-rw-r--r-- 1 oracle oinstall  448 Jul 17 19:01 listener1407177PM0113.bak
-rw-r--r-- 1 oracle oinstall    0 Jul 17 22:10 listener.ora
-rw-r--r-- 1 oracle oinstall  553 Jul 27  2011 listener.ora.20110727bak
-rw-r--r-- 1 oracle oinstall  402 Oct  8  2011 listener.ora.bak
drwxr-x--- 2 oracle oinstall 4096 Jul 26  2011 samples
-rw-r----- 1 oracle oinstall  172 Dec 26  2003 shrept.lst
-rw-r--r-- 1 oracle oinstall   35 Jul 17 16:09 sqlnet1407174PM0951.bak
-rw-r--r-- 1 oracle oinstall   35 Jul 17 19:00 sqlnet1407177PM0037.bak
-rw-r--r-- 1 oracle oinstall   35 Jul 17 19:01 sqlnet1407177PM0113.bak
-rw-r--r-- 1 oracle oinstall 4130 May  7 15:17 sqlnet.log
-rw-r--r-- 1 oracle oinstall   35 Jul 17 22:10 sqlnet.ora
-rw-r--r-- 1 oracle oinstall  416 Jul 27  2011 tnsnames1107271PM5834.bak
-rw-r--r-- 1 oracle oinstall 2393 Jul 17 16:09 tnsnames1407174PM0951.bak
-rw-r--r-- 1 oracle oinstall 2393 Jul 17 19:00 tnsnames1407177PM0037.bak
-rw-r--r-- 1 oracle oinstall 2393 Jul 17 19:01 tnsnames1407177PM0113.bak
-rw-r--r-- 1 oracle oinstall 2393 Jul 17 22:10 tnsnames.ora
-rw-r--r-- 1 oracle oinstall 1736 Jul 27  2011 tnsnames.ora.20110727bak
-rw-r--r-- 1 oracle oinstall 1977 Aug  2  2011 tnsnames.ora.20110802bak
[oracle@standbydbn2 admin]$ cat listener.ora
[oracle@standbydbn2 admin]$ cat listener.ora
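
The `cat` prints nothing because listener.ora is zero bytes long (the `ls -l` listing shows size 0). A check like this could be automated with the sketch below. The function name `empty_configs` is made up for illustration, and the list of file names is simply the three Oracle Net files seen in this directory.

```shell
#!/bin/sh
# List Oracle Net configuration files in directory $1 that exist
# but are zero bytes long. "-s" is true only for non-empty files.
empty_configs() {
    dir=$1
    for f in listener.ora sqlnet.ora tnsnames.ora; do
        if [ -e "$dir/$f" ] && [ ! -s "$dir/$f" ]; then
            echo "$f"
        fi
    done
}
```

Run against the directory above (`empty_configs /opt/oracle/app/database/network/admin`), it would be expected to report only listener.ora.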

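The file really was empty. I recreated it using the file on the healthy node as a template, and the listener started normally. Starting the database manually then produced errors in the alert log: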

Errors in file /opt/oracle/app/admin/master/bdump/master2_dbw0_25946.trc:
ORA-01157: cannot identify/lock data file 772 - see DBWR trace file
ORA-01110: data file 772: '+DATA5/xxx/datafile/xxx.dbf'
ORA-17503: ksfdopn:2 Failed to open file +DATA5/standby/datafile/xx.dbf
ORA-15001: diskgroup "DATA5" does not exist or is not mounted
ORA-15001: diskgroup "DATA5" does not exist or is not mounted

In asmcmd, the DATA5 disk group was missing:
[oracle@rac2 admin]$ export ORACLE_SID=+ASM2

[oracle@rac2 admin]$ asmcmd
ASMCMD> lsdg
State    Type    Rebal  Unbal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Name
MOUNTED  EXTERN  N      N         512   4096  1048576   6277049   494723                0          494723              0  DATA2/
MOUNTED  EXTERN  N      N         512   4096  1048576   6655909   388388                0          388388              0  DATA3/
MOUNTED  EXTERN  N      N         512   4096  1048576   8011682   546051                0          546051              0  DATA4/
MOUNTED  EXTERN  N      N         512   4096  1048576   6578238   339569                0          339569              0  NEW_DG/
MOUNTED  EXTERN  N      N         512   4096  1048576    307196   262681                0          262681              0  REDO01/
MOUNTED  EXTERN  N      N         512   4096  1048576    307196   262681                0          262681              0  REDO02/
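
Whether a given disk group appears as mounted in `lsdg` output can also be tested mechanically. The following is a minimal sketch under two assumptions: `dg_mounted` is a hypothetical name, and the output has one disk group per line with the state in the first column and the name (with a trailing slash) in the last.

```shell
#!/bin/sh
# Succeed if disk group $1 is listed as MOUNTED in lsdg output on stdin.
dg_mounted() {
    awk -v dg="$1/" '$1 == "MOUNTED" && $NF == dg { found = 1 }
                     END { exit !found }'
}
```

With lsdg output saved to a file, `dg_mounted DATA5 < lsdg.txt || echo "DATA5 not mounted"` would flag the situation seen here.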

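On the healthy node I looked up which disks belong to the DATA5 disk group. Because this system uses ASMLib, I checked them with oracleasm and found that none of the DATA5 disks were present as valid ASMLib disks: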

[root@rac1 admin]# oracleasm listdisks
NEW1
NEW2
NEW3
NEW4
NEW5
NEW6
VOL1
VOL10
VOL11
VOL12
VOL13
VOL14
VOL15
VOL16
VOL17
VOL18
VOL2
VOL3
VOL4
VOL5
VOL6
VOL7
VOL8
VOL9
VOLDATA1
VOLDATA10
VOLDATA11
VOLDATA12
VOLDATA13
VOLDATA14
VOLDATA15
VOLDATA16
VOLDATA17
VOLDATA18
VOLDATA19
VOLDATA2
VOLDATA20
VOLDATA21
VOLDATA22
VOLDATA23
VOLDATA24
VOLDATA25
VOLDATA26
VOLDATA27
VOLDATA28
VOLDATA29
VOLDATA3
VOLDATA30
VOLDATA31
VOLDATA32
VOLDATA33
VOLDATA34
VOLDATA4
VOLDATA5
VOLDATA6
VOLDATA7
VOLDATA8
VOLDATA9
VOLREDO1
VOLREDO2
[root@rac1 admin]# oracleasm querydisk -p VOL1
Disk "VOL1" defines a device with no label

The ASMLib label on disk VOL1 had been lost.

[root@rac1 admin]# oracleasm  scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "VOL1"
Cleaning disk "VOL16"
Cleaning disk "VOL17"
Cleaning disk "VOL2"
Cleaning disk "VOL3"
Cleaning disk "VOL4"
Cleaning disk "VOL5"
Cleaning disk "VOL7"
Cleaning disk "VOLDATA1"
Cleaning disk "VOLDATA30"
Cleaning disk "VOLDATA33"
Cleaning disk "VOLDATA4"
Cleaning disk "VOLDATA7"
Scanning system for ASM disks...
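
To collect the names of the disks that `oracleasm scandisks` dropped as stale, the "Cleaning disk" lines can be filtered out of its output. This is a small sketch; `cleaned_disks` is a hypothetical name and it assumes the message format shown above.

```shell
#!/bin/sh
# Print the names from lines of the form: Cleaning disk "NAME"
# (scandisks output is read from stdin).
cleaned_disks() {
    sed -n 's/^Cleaning disk "\(.*\)"$/\1/p'
}
```

For example, `oracleasm scandisks | cleaned_disks > /tmp/lost_labels.txt` would save the list of disks whose labels need to be restored.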

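A rescan confirmed it: the labels had all been lost.
From earlier records I worked out which physical device backed each disk, then checked the device state with powermt: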

[root@rac2 ~]# powermt display dev=emcpowerc 
Pseudo name=emcpowerc
CLARiiON ID=CKM00111201809 [R910]
Logical device ID=6006016025B12C00069BFB6DF996E011 [LUN 6]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0; 
Owner: default=SP B, current=SP B       Array failover mode: 4
==============================================================================
--------------- Host ---------------   - Stor -   -- I/O Path --  -- Stats ---
###  HW Path               I/O Paths    Interf.   Mode    State   Q-IOs Errors
==============================================================================
   3 qla2xxx                  sde       SP A1     active  alive       0      0
   4 qla2xxx                  sdm       SP B1     active  alive       0      0

Use kfed to confirm the disk name:

[root@rac1 admin]# /opt/oracle/app/database/bin/kfed read /dev/emcpowerc1 
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  1205065909 ; 0x00c: 0x47d3d8b5
kfbh.fcn.base:                      173 ; 0x010: 0x000000ad
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:                    VOL1 ; 0x028: length=4
kfdhdb.grpname:                   DATA5 ; 0x048: length=5
kfdhdb.fgname:                     VOL1 ; 0x068: length=4
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33005137 ; 0x0a8: HOUR=0x11 DAYS=0x12 MNTH=0x7 YEAR=0x7de
kfdhdb.crestmp.lo:           3051740160 ; 0x0ac: USEC=0x0 MSEC=0x177 SECS=0x1e MINS=0x2d
kfdhdb.mntstmp.hi:             33005137 ; 0x0b0: HOUR=0x11 DAYS=0x12 MNTH=0x7 YEAR=0x7de
kfdhdb.mntstmp.lo:           3060555776 ; 0x0b4: USEC=0x0 MSEC=0x318 SECS=0x26 MINS=0x2d
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                  511993 ; 0x0c4: 0x0007cff9
kfdhdb.pmcnt:                         6 ; 0x0c8: 0x00000006
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:            65535 ; 0x0da: 0xffff
kfdhdb.redomirrors[2]:            65535 ; 0x0dc: 0xffff
kfdhdb.redomirrors[3]:            65535 ; 0x0de: 0xffff
kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi:             33005137 ; 0x0e4: HOUR=0x11 DAYS=0x12 MNTH=0x7 YEAR=0x7de
kfdhdb.grpstmp.lo:           3051595776 ; 0x0e8: USEC=0x0 MSEC=0xea SECS=0x1e MINS=0x2d
kfdhdb.ub4spare[0]:                   0 ; 0x0ec: 0x00000000
kfdhdb.ub4spare[1]:                   0 ; 0x0f0: 0x00000000
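
The fields that matter here are kfdhdb.dskname (VOL1) and kfdhdb.grpname (DATA5): the ASM disk header is intact even though the ASMLib label is gone. The sketch below pulls individual fields out of kfed output and decodes a `*stmp.hi` value. The helper names are hypothetical, and the bit layout (hour in the low 5 bits, then 5 bits of day, 4 bits of month, and the year above that) is inferred from the HOUR/DAYS/MNTH/YEAR annotations in the dump above, not taken from Oracle documentation.

```shell
#!/bin/sh
# Print the value of one kfed field (e.g. kfdhdb.dskname) from
# "kfed read" output on stdin.
kfed_field() {
    awk -v name="$1:" '$1 == name { print $2; exit }'
}

# Decode a kfdhdb.*stmp.hi value into "YYYY-MM-DD HH".
# Assumed layout: year << 14 | month << 10 | day << 5 | hour.
decode_stmp_hi() {
    hi=$1
    printf '%04d-%02d-%02d %02d\n' \
        $(( hi >> 14 )) $(( (hi >> 10) & 15 )) \
        $(( (hi >> 5) & 31 )) $(( hi & 31 ))
}
```

Applied to the value 33005137 from the dump, `decode_stmp_hi` gives 2014-07-18 17, which agrees with YEAR=0x7de, MNTH=0x7, DAYS=0x12, HOUR=0x11.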

Back up the disk header:

[root@rac2 admin]# dd if=/dev/emcpowerc1 of=/tmp/VOL1.50m.dd bs=1M count=50
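
Since the next step rewrites the header, it may be worth confirming that the backup is readable and matches the device first. This is a minimal sketch, assuming GNU dd (for `bs=1M`) and a hypothetical helper name `backup_header`.

```shell
#!/bin/sh
# Copy the first $3 MiB (default 50) of device $1 to file $2, then
# verify the copy by comparing checksums of the same byte range.
backup_header() {
    src=$1
    dst=$2
    mib=${3:-50}
    dd if="$src" of="$dst" bs=1M count="$mib" 2>/dev/null || return 1
    want=$(dd if="$src" bs=1M count="$mib" 2>/dev/null | cksum)
    have=$(cksum < "$dst")
    [ "$want" = "$have" ]
}
```

For example: `backup_header /dev/emcpowerc1 /tmp/VOL1.50m.dd 50 && echo "backup verified"`.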

Relabel the disk with oracleasm renamedisk; the -f option forces the change:

[root@rac2 disks]# oracleasm renamedisk  -f /dev/emcpowerc1 VOL1
Writing disk header: done
Instantiating disk "VOL1": done

Rescan and list the disks on both nodes:

[root@rac2 disks]#  oracleasm listdisks
VOL1
... (remaining disks omitted)
 
[root@rac1 admin]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "VOL1"

I repaired every affected disk with the same steps, mounted the disk group manually, and the database opened normally.
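
With more than a dozen disks to fix, the backup and relabel commands can be generated from a list of device and label pairs and reviewed before anything is run. The sketch below only prints commands and executes nothing. `plan_relabel` is a hypothetical name, the /tmp backup path follows the pattern used above, and the device-to-label mapping has to come from your own records and from kfed.

```shell
#!/bin/sh
# Read "device label" pairs from stdin and print, for each pair,
# the header backup and relabel commands used in this post.
# Nothing is executed; review the output before running it.
plan_relabel() {
    while read -r dev label; do
        [ -n "$dev" ] && [ -n "$label" ] || continue
        echo "dd if=$dev of=/tmp/$label.50m.dd bs=1M count=50"
        echo "oracleasm renamedisk -f $dev $label"
    done
}
```

For example, `echo "/dev/emcpowerc1 VOL1" | plan_relabel` prints the two commands shown earlier for VOL1.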

It remains a strange problem: the database had been restored only a few days earlier, and the ASMLib labels disappeared while it was running.

Source: ITPUB Blog, http://blog.itpub.net/25462274/viewspace-1584989/. If you repost this article, please credit the source; otherwise legal action may be taken.
