ORA-15055: unable to connect to ASM instance prevents the RAC database from starting

Posted by xz43 on 2013-10-08
$ crsctl status res ora.racdb.db
NAME=ora.racdb.db
TYPE=ora.database.type
TARGET=ONLINE , ONLINE
STATE=OFFLINE, OFFLINE
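
The status output above shows a resource whose STATE does not match its TARGET. As a rough sketch (the function name `mismatched_resources` is made up for illustration, and the parsing assumes the NAME=/TARGET=/STATE= layout shown above), such resources can be picked out of `crsctl status res` output like this:

```shell
# Read `crsctl status res` output on stdin and print each resource whose
# STATE differs from its TARGET. Any " on <host>" suffix and all spaces
# are dropped before comparing, so "ONLINE on rac01" equals "ONLINE".
mismatched_resources() {
  awk -F= '
    { v = $2; gsub(/ on [^,]*/, "", v); gsub(/[ \t]/, "", v) }
    $1 == "NAME"   { name = v }
    $1 == "TARGET" { target = v }
    $1 == "STATE"  { if (v != target) print name ": target=" target " state=" v }
  '
}
```

For example, `crsctl status res | mismatched_resources` would list only the resources that need attention.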

After starting CRS, I found that the database resource had not come up, so I tried starting it on its own:

$ crs_start ora.racdb.db
Attempting to start `ora.racdb.db` on member `rac01`
Attempting to start `ora.racdb.db` on member `rac02`
ORA-00205: error in identifying control file, check alert log for more info
Start of `ora.racdb.db` on member `rac01` failed.
Attempting to stop `ora.racdb.db` on member `rac01`
ORA-00205: error in identifying control file, check alert log for more info
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0

Start of `ora.racdb.db` on member `rac02` failed.
Attempting to stop `ora.racdb.db` on member `rac02`
Stop of `ora.racdb.db` on member `rac01` succeeded.
CRS-2632: There are no more servers to try to place resource 'ora.racdb.db' on that would satisfy its placement policy
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0

Stop of `ora.racdb.db` on member `rac02` succeeded.
CRS-0223: Resource 'ora.racdb.db 1 1' has placement error.

CRS-0215: Could not start resource 'ora.racdb.db 2 1'.

It still would not start.

Check the Oracle alert log:
# tail -200 /app/oracle/product/oracle/diag/rdbms/racdb/racdb1/trace/alert_racdb1.log

lmon registered with NM - instance number 1 (internal mem no 0)


***********************************************************************

Fatal NI connect error 12547, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/app/grid/11.2.0/bin/oracle)(ARGV0=oracle+ASM1_asmb_racdb1)(ENVS='ORACLE_HOME=/app/grid/11.2.0,ORACLE_SID=+ASM1')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser)(CONNECT_DATA=(CID=(PROGRAM=oracle@rac01)(HOST=rac01)(USER=oracle))))

  VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.1.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production
  Time: 08-OCT-2013 14:38:07
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12547
    
TNS-12547: TNS:lost contact
    ns secondary err code: 12560
    nt main err code: 517
    
TNS-00517: Lost contact
    nt secondary err code: 32
    nt OS err code: 0
ERROR: Failed to connect with connect string: (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/app/grid/11.2.0/bin/oracle)(ARGV0=oracle+ASM1_asmb_racdb1)(ENVS='ORACLE_HOME=/app/grid/11.2.0,ORACLE_SID=+ASM1')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser))
Errors in file /app/oracle/product/oracle/diag/rdbms/racdb/racdb1/trace/racdb1_asmb_7238.trc:
ORA-15055: unable to connect to ASM instance
ORA-12547: TNS:lost contact
Reconfiguration started (old inc 0, new inc 4)
List of instances:
 1 2 (myinst: 1) 
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE 
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Tue Oct 08 14:38:08 2013
LCK0 started with pid=23, OS id=7254 
Starting background process RSMN
Tue Oct 08 14:38:08 2013
RSMN started with pid=28, OS id=7257 
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /app/grid/crs
Tue Oct 08 14:38:09 2013
ALTER DATABASE MOUNT
This instance was first to mount
Starting background process ASMB
Tue Oct 08 14:38:09 2013
ASMB started with pid=30, OS id=7267 


***********************************************************************

Fatal NI connect error 12547, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/app/grid/11.2.0/bin/oracle)(ARGV0=oracle+ASM1_asmb_racdb1)(ENVS='ORACLE_HOME=/app/grid/11.2.0,ORACLE_SID=+ASM1')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser)(CONNECT_DATA=(CID=(PROGRAM=oracle@rac01)(HOST=rac01)(USER=oracle))))

  VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.1.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production
  Time: 08-OCT-2013 14:38:09
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12547
    
TNS-12547: TNS:lost contact
    ns secondary err code: 12560
    nt main err code: 517
    
TNS-00517: Lost contact
    nt secondary err code: 32
    nt OS err code: 0
ERROR: Failed to connect with connect string: (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/app/grid/11.2.0/bin/oracle)(ARGV0=oracle+ASM1_asmb_racdb1)(ENVS='ORACLE_HOME=/app/grid/11.2.0,ORACLE_SID=+ASM1')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser))
Starting background process ASMB
Tue Oct 08 14:38:09 2013
ASMB started with pid=31, OS id=7280 


***********************************************************************

Fatal NI connect error 12547, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/app/grid/11.2.0/bin/oracle)(ARGV0=oracle+ASM1_asmb_racdb1)(ENVS='ORACLE_HOME=/app/grid/11.2.0,ORACLE_SID=+ASM1')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser)(CONNECT_DATA=(CID=(PROGRAM=oracle@rac01)(HOST=rac01)(USER=oracle))))

  VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.1.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production
  Time: 08-OCT-2013 14:38:09
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12547
    
TNS-12547: TNS:lost contact
    ns secondary err code: 12560
    nt main err code: 517
    
TNS-00517: Lost contact
    nt secondary err code: 32
    nt OS err code: 0
ERROR: Failed to connect with connect string: (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/app/grid/11.2.0/bin/oracle)(ARGV0=oracle+ASM1_asmb_racdb1)(ENVS='ORACLE_HOME=/app/grid/11.2.0,ORACLE_SID=+ASM1')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser))
ORA-00210: cannot open the specified control file
ORA-00202: control file: '+DATA/racdb/control02.ctl'
ORA-17503: ksfdopn:2 Failed to open file +DATA/racdb/control02.ctl
ORA-15001: diskgroup "DATA" does not exist or is not mounted
ORA-15055: unable to connect to ASM instance
ORA-12547: TNS:lost contact
ORA-00210: cannot open the specified control file
ORA-00202: control file: '+DATA/racdb/control01.ctl'
ORA-17503: ksfdopn:2 Failed to open file +DATA/racdb/control01.ctl
ORA-15001: diskgroup "DATA" does not exist or is not mounted
ORA-15055: unable to connect to ASM instance
ORA-12547: TNS:lost contact
Starting background process ASMB
ORA-205 signalled during: ALTER DATABASE MOUNT...
Tue Oct 08 14:38:09 2013
ASMB started with pid=30, OS id=7290 
Tue Oct 08 14:38:09 2013
Shutting down instance (abort)
License high water mark = 1
USER (ospid: 7294): terminating the instance
Instance terminated by USER, pid = 7294
Tue Oct 08 14:38:11 2013
Instance shutdown complete
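
The alert log above repeats the same failure several times. A small sketch (the function name `error_codes` is hypothetical) that lists each distinct ORA-/TNS- code once, in order of first appearance, can make the error chain easier to read:

```shell
# Print each distinct ORA-/TNS- error code from stdin, keeping the order
# in which the codes first appear.
error_codes() {
  grep -oE '(ORA|TNS)-[0-9]+' | awk '!seen[$0]++'
}
```

It could be run as `tail -200 alert_racdb1.log | error_codes`, using whatever alert log path applies on your system.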

The database instance reports that it cannot connect to ASM, so I looked at the ASM alert log:

# tail -f /app/grid/crs/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log 
...
Tue Oct 08 14:57:17 2013
Errors in file /app/grid/crs/diag/asm/+asm/+ASM1/trace/+ASM1_ora_8661.trc:
ORA-27070: async read/write failed
WARNING: IO Failed. group:0 disk(number.incarnation):0.0xe96ba85e disk_path:/dev/random
         AU:0 disk_offset(bytes):0 io_size:4096 operation:Read type:asynchronous
         result:I/O error process_id:8661
Errors in file /app/grid/crs/diag/asm/+asm/+ASM1/trace/+ASM1_ora_8661.trc:
ORA-27070: async read/write failed
WARNING: IO Failed. group:0 disk(number.incarnation):0.0xe96ba85e disk_path:/dev/random
         AU:0 disk_offset(bytes):0 io_size:4096 operation:Read type:asynchronous
         result:I/O error process_id:8661

Following the log, open the trace file it names:

# tail -100 /app/grid/crs/diag/asm/+asm/+ASM1/trace/+ASM1_ora_8661.trc
...
*** 2013-10-08 14:57:17.778
ORA-27070: async read/write failed
WARNING: IO Failed. group:0 disk(number.incarnation):0.0xe96ba85e disk_path:/dev/random
         AU:0 disk_offset(bytes):0 io_size:4096 operation:Read type:asynchronous
         result:I/O error process_id:8661
         subsys:System iop:0x7fd245f3f108 bufp:0x7fd246078600 osderr:0x0 osderr1:0x0

*** 2013-10-08 14:57:27.144
ORA-27070: async read/write failed
WARNING: IO Failed. group:0 disk(number.incarnation):0.0xe96ba85e disk_path:/dev/random
         AU:0 disk_offset(bytes):0 io_size:4096 operation:Read type:asynchronous
         result:I/O error process_id:8661
         subsys:System iop:0x7fd245f3efe0 bufp:0x7fd246078600 osderr:0x0 osderr1:0x0
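
To see at a glance which device the failed I/O refers to, the `disk_path:` field can be pulled out of the warnings and counted. This is only a sketch; the function name `failed_disk_paths` is made up, and it assumes the "WARNING: IO Failed ... disk_path:<path>" format shown above:

```shell
# Read ASM log or trace text on stdin and print each disk_path value
# followed by the number of times it appears.
failed_disk_paths() {
  grep -o 'disk_path:[^ ]*' | sed 's/^disk_path://' | sort | uniq -c |
    awk '{ print $2, $1 }'
}
```

For example: `failed_disk_paths < +ASM1_ora_8661.trc`.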

This looks like an I/O error on disk_path:/dev/random. Odd: why is that listed as a disk? I switched to the grid user and queried ASM:

SQL> select name,path from v$asm_disk;

NAME                           PATH
------------------------------ --------------------------------------------------
                               /dev/random
FRA_0000                       /dev/asm-diskd
CRS_0000                       /dev/asm-diskb
DATA_0000                      /dev/asm-diskc

The ASM disk list contains an extra entry, "/dev/random".
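
On Linux, /dev/random is the kernel's random-number source, a character device rather than storage, which is why it does not belong in this list. A quick, hedged way to check what kind of file a path is (the helper name `dev_type` is hypothetical):

```shell
# Report whether a path is a block device, a character device,
# or anything else (including a path that does not exist).
dev_type() {
  if [ -b "$1" ]; then
    echo block
  elif [ -c "$1" ]; then
    echo char
  else
    echo other
  fi
}
```

Calling `dev_type /dev/random` and `dev_type /dev/asm-diskb` on the affected node shows how the two entries differ. Note that raw devices are also character devices, so "char" alone does not rule a path out as storage.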

SQL> select name,state,total_mb from v$asm_diskgroup;

NAME                           STATE         TOTAL_MB
------------------------------ ----------- ----------
FRA                            MOUNTED          51199
CRS                            MOUNTED          51199
DATA                           MOUNTED          51199

SQL> select group_number, name, path, total_mb, free_mb, total_mb-free_mb used_mb from v$asm_disk_stat;

GROUP_NUMBER NAME                           PATH                                                 TOTAL_MB    FREE_MB    USED_MB
------------ ------------------------------ -------------------------------------------------- ---------- ---------- ----------
           3 FRA_0000                       /dev/asm-diskd                                          51199      51097        102
           1 CRS_0000                       /dev/asm-diskb                                          51199      50803        396
           2 DATA_0000                      /dev/asm-diskc                                          51199      45917       5282


On a friend's advice, I checked the disk parameters on the ASM instance:
$ sqlplus / as sysasm
SQL> show parameter disk

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      DATA, FRA
asm_diskstring                       string      /dev

Change the second parameter, asm_diskstring, to narrow the discovery scope so that only the ASM disks match:
SQL> alter system set asm_diskstring='/dev/asm-disk*' scope=both sid='+ASM1';
SQL> alter system set asm_diskstring='/dev/asm-disk*' scope=both sid='+ASM2';
After the change, restart CRS. Run the following on each of the two nodes:
# /app/grid/11.2.0/bin/crsctl stop crs
# /app/grid/11.2.0/bin/crsctl start crs

After that, the I/O errors about the "/dev/random" disk no longer appeared.
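
Before changing a discovery string, it can help to check which known paths the new pattern would leave out. The sketch below uses shell `case` matching as an approximation of ASM's wildcard matching (an assumption, not ASM's actual implementation); the function name `paths_outside_diskstring` is made up:

```shell
# Print every path read from stdin that does NOT match the glob in $1.
# The unquoted $pattern inside `case` is deliberately treated as a pattern.
paths_outside_diskstring() {
  pattern=$1
  while IFS= read -r path; do
    case "$path" in
      $pattern) ;;                      # covered by the discovery string
      *) printf '%s\n' "$path" ;;       # would no longer be discovered
    esac
  done
}
```

Fed the four paths from the v$asm_disk query with the pattern '/dev/asm-disk*', it prints only /dev/random, which is the one entry meant to be excluded.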

Source: ITPUB Blog, link: http://blog.itpub.net/9399028/viewspace-773919/. If you repost this article, please credit the source; otherwise legal action may be taken.
