Another DB stuck idle because semaphores and shared memory were not released

Posted by dotaddjj on 2012-04-15

When I connected to a production database at a company I used to work for, the following error appeared.

SQL> conn / as sysdba

Connected to an idle instance.

SQL> select * from dual;

select * from dual

*

ERROR at line 1:

ORA-01012: not logged on

SQL> conn / as sysdba

Connected to an idle instance.

I assumed the database had hung internally, so I ran hanganalyze to see whether there was a process that could be killed directly at the OS level.

SQL> oradebug setmypid

Statement processed.

SQL> oradebug hanganalyze 3

Hang Analysis in /usr/oracle/admin/jhql/udump/jhql_ora_24770.trc

SQL>

SQL> quit

Disconnected

The alert log showed no obvious errors:

Sun Apr 15 12:12:44 2012

Job queue slave processes stopped

Waiting for shared server 'S000' to die

All dispatchers and shared servers shutdown

Sun Apr 15 12:14:45 2012

Starting ORACLE instance (normal)

[oracle@localhost bdump]$ less /usr/oracle/admin/jhql/udump/jhql_ora_24770.trc

==============

Open chains found:

Other chains found:

Chain 1 : :

<0/1649/1/0xbd1b5bc0/28452/No Wait>

Extra information that will be dumped at higher levels:

[level 5] : 1 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]

[level 10] : 7 node dumps -- [IGN]

State of nodes

([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor):

[1647]/0/1648/1/0xbdd41f98/28454/IGN/1/2//none

[1648]/0/1649/1/0xbdd43500/28452/SINGLE_NODE_NW/3/4//none

[1649]/0/1650/1/0xbdd44a68/28450/IGN/5/6//none

[1650]/0/1651/1/0xbdd45fd0/28448/IGN/7/8//none

[1651]/0/1652/1/0xbdd47538/28446/IGN/9/10//none

[1652]/0/1653/1/0xbdd48aa0/28444/IGN/11/12//none

[1653]/0/1654/1/0xbdd4a008/28442/IGN/13/14//none

[1654]/0/1655/1/0xbdd4b570/28440/IGN/15/16//none

====================

END OF HANG ANALYSIS

====================
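The "State of nodes" lines in the trace above can also be read mechanically. The following is a minimal Python sketch, assuming the slash-separated layout is exactly the one shown in this trace (other Oracle versions may differ). The names `parse_nodes` and `Node` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple

# Field names copied from the "State of nodes" header in the trace.
FIELDS = ("nodenum", "cnode", "sid", "sess_srno", "session", "ospid",
          "state", "start", "finish", "adjlist", "predecessor")


class Node(NamedTuple):
    sid: int
    ospid: int
    state: str


def parse_nodes(trace_text):
    """Return one Node per state line; anything else is skipped."""
    nodes = []
    for line in trace_text.splitlines():
        parts = line.strip().split("/")
        # State lines start with "[nodenum]" and have one part per field.
        if not parts[0].startswith("[") or len(parts) != len(FIELDS):
            continue
        rec = dict(zip(FIELDS, parts))
        try:
            nodes.append(Node(int(rec["sid"]), int(rec["ospid"]), rec["state"]))
        except ValueError:
            continue  # not a numeric sid/ospid, so not a state line
    return nodes


# Two state lines taken from the trace above.
SAMPLE = """\
[1647]/0/1648/1/0xbdd41f98/28454/IGN/1/2//none
[1648]/0/1649/1/0xbdd43500/28452/SINGLE_NODE_NW/3/4//none
"""

nodes = parse_nodes(SAMPLE)
# Keep only the nodes that hanganalyze did not mark as ignorable.
interesting = [n for n in nodes if n.state != "IGN"]
print(interesting)
```

Filtering out the IGN entries leaves only the session with OS pid 28452, which the trace reports as SINGLE_NODE_NW, matching the "No Wait" chain listed earlier in the trace.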

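The hanganalyze trace showed no obvious wait events. I then remembered an earlier unexpected database outage that was caused by semaphores and shared memory not being released properly, so I checked the IPC resources at the OS level: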

[oracle@localhost ~]$ ipcs -m

------ Shared Memory Segments --------

key shmid owner perms bytes nattch status

0x01b9ced8 2162689 oracle 640 1612709888 8

0x00000000 1802251 root 644 790528 2 dest

0x00000000 1835020 root 644 790528 2 dest

0x00000000 1867789 root 644 790528 2 dest

0x00000000 1900558 root 644 790528 2 dest

0x00000000 1933327 root 644 790528 2 dest

0x00000000 1966096 root 644 790528 2 dest

0x00000000 2064403 root 644 790528 2 dest

0x00000000 2097172 root 644 790528 2 dest

0x00000000 2129941 root 644 790528 2 dest

[oracle@localhost ~]$ ipcrm -m 2162689

[oracle@localhost ~]$ ipcs

------ Shared Memory Segments --------

key shmid owner perms bytes nattch status

0x00000000 2162689 oracle 640 1612709888 8 dest

0x00000000 1802251 root 644 790528 2 dest

0x00000000 1835020 root 644 790528 2 dest

0x00000000 1867789 root 644 790528 2 dest

0x00000000 1900558 root 644 790528 2 dest

0x00000000 1933327 root 644 790528 2 dest

0x00000000 1966096 root 644 790528 2 dest

0x00000000 2064403 root 644 790528 2 dest

0x00000000 2097172 root 644 790528 2 dest

0x00000000 2129941 root 644 790528 2 dest

------ Semaphore Arrays --------

key semid owner perms nsems

0x00000000 3080223 root 666 1

0x7697d424 3801120 oracle 640 1504

------ Message Queues --------

key msqid owner perms used-bytes messages

[oracle@localhost ~]$ ipcrm -s 3801120

[oracle@localhost ~]$ ipcs -a

------ Shared Memory Segments --------

key shmid owner perms bytes nattch status

0x00000000 1802251 root 644 790528 2 dest

0x00000000 1835020 root 644 790528 2 dest

0x00000000 1867789 root 644 790528 2 dest

0x00000000 1900558 root 644 790528 2 dest

0x00000000 1933327 root 644 790528 2 dest

0x00000000 1966096 root 644 790528 2 dest

0x00000000 2064403 root 644 790528 2 dest

0x00000000 2097172 root 644 790528 2 dest

0x00000000 2129941 root 644 790528 2 dest

------ Semaphore Arrays --------

key semid owner perms nsems

0x00000000 3080223 root 666 1

------ Message Queues --------

key msqid owner perms used-bytes messages

The shared memory segment and semaphore array owned by oracle have now been released manually with ipcrm.
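The manual lookup above (find the IPC resources owned by oracle, then pass their ids to ipcrm) can be sketched in Python. This is only an illustration under stated assumptions: it parses `ipcs` text in the layout shown in this post, it only prints the candidate commands and never runs them, and `plan_ipcrm` is a hypothetical helper name.

```python
def plan_ipcrm(ipcs_text, owner="oracle"):
    """Build (but do not run) ipcrm commands for resources owned by `owner`."""
    flag = None
    commands = []
    for line in ipcs_text.splitlines():
        line = line.strip()
        if line.startswith("------"):
            # Section banner: pick the matching ipcrm option.
            if "Shared Memory" in line:
                flag = "-m"
            elif "Semaphore" in line:
                flag = "-s"
            elif "Message" in line:
                flag = "-q"
            else:
                flag = None
            continue
        cols = line.split()
        # Data rows start with a hex key; columns are key, id, owner, ...
        if flag and len(cols) >= 3 and cols[0].startswith("0x") and cols[2] == owner:
            commands.append(f"ipcrm {flag} {cols[1]}")
    return commands


# Abbreviated ipcs output taken from this post.
SAMPLE = """\
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x01b9ced8 2162689 oracle 640 1612709888 8
0x00000000 1802251 root 644 790528 2 dest
------ Semaphore Arrays --------
key semid owner perms nsems
0x00000000 3080223 root 666 1
0x7697d424 3801120 oracle 640 1504
------ Message Queues --------
key msqid owner perms used-bytes messages
"""

commands = plan_ipcrm(SAMPLE)
for cmd in commands:
    print(cmd)
```

Printing rather than executing is deliberate. A nonzero nattch, such as the 8 attachments on the oracle segment here, means processes are still attached, so it is worth checking them with ps before removing anything.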

[oracle@localhost ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Sun Apr 15 12:33:18 2012

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to an idle instance.

SQL> select * from dual;

select * from dual

*

ERROR at line 1:

ORA-01034: ORACLE not available

SQL> startup;

ORACLE instance started.

Total System Global Area 1610612736 bytes

Fixed Size 2084296 bytes

Variable Size 1056965176 bytes

Database Buffers 536870912 bytes

Redo Buffers 14692352 bytes

Database mounted.

Database opened.

After startup, I checked the alert log:

Sun Apr 15 12:32:31 2012

MMNL absent for 1201 secs; Foregrounds taking over

Sun Apr 15 12:32:32 2012

Errors in file /usr/oracle/admin/jhql/bdump/jhql_mman_28444.trc:

ORA-27157: Message 27157 not found; No message file for product=RDBMS, facility=ORA

ORA-27300: Message 27300 not found; No message file for product=RDBMS, facility=ORA; arguments: [semop] [43]

ORA-27301: Message 27301 not found; No message file for product=RDBMS, facility=ORA; arguments: [Identifier removed]

ORA-27302: Message 27302 not found; No message file for product=RDBMS, facility=ORA; arguments: [sskgpwwait1]

Sun Apr 15 12:32:32 2012

MMAN: terminating instance due to error 27157

Instance terminated by MMAN, pid = 28444

Sun Apr 15 12:32:33 2012

Errors in file /usr/oracle/admin/jhql/bdump/jhql_mman_28444.trc:

ORA-27300: Message 27300 not found; No message file for product=RDBMS, facility=ORA; arguments: [semctl] [22]

ORA-27301: Message 27301 not found; No message file for product=RDBMS, facility=ORA; arguments: [Invalid argument]

ORA-27302: Message 27302 not found; No message file for product=RDBMS, facility=ORA; arguments: [sskgpwrm1]

ORA-27157: Message 27157 not found; No message file for product=RDBMS, facility=ORA

ORA-27300: Message 27300 not found; No message file for product=RDBMS, facility=ORA; arguments: [semop] [43]

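It appears that the MMAN and MMNL background processes failed to release their resources when the database was shut down. Running ipcs -m, ipcs -s, or ipcs -q shows directly whether the OS shared memory, semaphores, and message queues have been released. Once again, this shows how important the operating system's diagnostic tools are.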
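Because the message file was missing, the ORA-27300 lines in the alert log carry only raw arguments: the failing OS call and its errno. A small Python sketch, assuming the line layout shown in this alert log, can pull those out. `failed_os_calls` is a hypothetical helper name, and the symbolic errno name comes from Python's `errno` module, so it depends on the platform the script runs on.

```python
import errno
import re

# Matches e.g. "ORA-27300: ... arguments: [semop] [43]" on a single line.
ORA_27300 = re.compile(r"ORA-27300:.*arguments: \[(\w+)\] \[(\d+)\]")


def failed_os_calls(alert_text):
    """Return (os_call, errno_number, errno_name) for each ORA-27300 line."""
    calls = []
    for match in ORA_27300.finditer(alert_text):
        code = int(match.group(2))
        # errno.errorcode maps numbers to names for the local platform only.
        calls.append((match.group(1), code, errno.errorcode.get(code, "unknown")))
    return calls


# Two ORA-27300 lines copied from the alert log in this post.
SAMPLE = """\
ORA-27300: Message 27300 not found; No message file for product=RDBMS, facility=ORA; arguments: [semop] [43]
ORA-27300: Message 27300 not found; No message file for product=RDBMS, facility=ORA; arguments: [semctl] [22]
"""

calls = failed_os_calls(SAMPLE)
for os_call, code, name in calls:
    print(os_call, code, name)
```

In the log, the neighbouring ORA-27301 lines give the text for these two codes as "Identifier removed" and "Invalid argument". A semop call failing with "Identifier removed" is consistent with the semaphore set having been removed while MMAN was still waiting on it.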


From "ITPUB Blog". Link: http://blog.itpub.net/25362835/viewspace-1057907/. If you repost this, please credit the source; otherwise legal action may be taken.
