Freshly installed database fails with ORA-01102: ORA-1102 signalled during...

Posted by quanshengaa on 2015-01-18
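   Yesterday a database I had just installed failed with ORA-01102 at startup. The installation itself went smoothly and showed no errors anywhere, and this was the first time I had run into this problem. I started by checking the alert log and looked up the related error messages on Metalink. After analysis I concluded that the cause was contention between processes for the same resource. Below is how I worked through the problem; at the end I have included the original Metalink note, plus a friend's understanding of the problem and how to fix it.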
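Why does the following error occur? When the instance mounts the database, it tries to take an exclusive lock on the lock file lkWWL in $ORACLE_HOME/dbs. If another process still holds that file, as a process left over from an earlier start of the same instance does here, the lock attempt fails and ALTER DATABASE MOUNT aborts with ORA-01102.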
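sculkget: failed to lock /orasoft/product/10.2.0/db_1/dbs/lkWWL exclusive
sculkget: lock held by PID: 26312
ORA-09968: Message 9968 not found; No message file for product=RDBMS, facility=ORA
Linux Error: 11: Resource temporarily unavailable
Additional information: 26312
Thu Nov 17 15:51:16 2011
ORA-1102 signalled during: ALTER DATABASE   MOUNT...
What these lines say:
  - sculkget could not take an exclusive lock on lkWWL;
  - the lock is held by the process with PID 26312;
  - the message file for ORA-09968 was not found, so its text ("unable to lock file") is missing;
  - Linux error 11 (Resource temporarily unavailable) means the lock is already taken by another process;
  - as a result, ALTER DATABASE MOUNT fails with ORA-1102.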
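When the alert log is long, the lock file and the PID holding it can be pulled out of the two sculkget lines automatically. The following is a minimal Python sketch, assuming the exact message format shown above (Oracle 10.2 on Linux); the function name find_lock_conflict is hypothetical.

```python
import re

# The two sculkget lines written to the alert log when the lock attempt fails.
LOCK_FAILED = re.compile(r"sculkget: failed to lock (\S+) exclusive")
LOCK_HELD_BY = re.compile(r"sculkget: lock held by PID: (\d+)")


def find_lock_conflict(alert_text):
    """Return (lock_file, holder_pid) from an alert log excerpt, or None."""
    path = LOCK_FAILED.search(alert_text)
    holder = LOCK_HELD_BY.search(alert_text)
    if path is None or holder is None:
        return None
    return path.group(1), int(holder.group(1))


excerpt = """sculkget: failed to lock /orasoft/product/10.2.0/db_1/dbs/lkWWL exclusive
sculkget: lock held by PID: 26312
Linux Error: 11: Resource temporarily unavailable"""
print(find_lock_conflict(excerpt))
# ('/orasoft/product/10.2.0/db_1/dbs/lkWWL', 26312)
```

The PID it returns is the one to look up with ps in the next step.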
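The steps I took to resolve the error:
1. The following command shows that the process holding the lock is ora_dbw0_wwl (the DBW0 database writer of instance wwl):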
[oracle@ora10g dbs]$ ps -ef|grep 26312
oracle   26312     1  0 15:43 ?        00:00:02 ora_dbw0_wwl
oracle   26663 26574  0 17:39 pts/1    00:00:00 grep 26312
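Note that `grep 26312` matches any line containing that string, which is why the grep command itself (PID 26663) shows up too; it could also match a row where 26312 is only the parent PID. Matching the PID column exactly avoids both. This is a minimal sketch assuming Linux `ps -ef` output in which STIME contains no spaces, as above; command_for_pid is a hypothetical helper.

```python
def command_for_pid(ps_output, pid):
    """Return the CMD column of the `ps -ef` row whose PID column equals pid."""
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD; CMD may contain spaces,
        # so split at most 7 times and keep the remainder as the command.
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[1] == str(pid):
            return fields[7]
    return None


ps_output = """oracle   26312     1  0 15:43 ?        00:00:02 ora_dbw0_wwl
oracle   26663 26574  0 17:39 pts/1    00:00:00 grep 26312"""
print(command_for_pid(ps_output, 26312))  # ora_dbw0_wwl
```

From the shell, `ps -p 26312 -o comm=` asks ps for that single PID directly and sidesteps the grep self-match.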
2. Connect to the database and shut the instance down first:
[oracle@ora10g ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.1.0 - Production on Thu Nov 17 17:45:56 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> shutdown immediate
ORA-01507: database not mounted

ORACLE instance shut down.
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
3. Go to $ORACLE_HOME/dbs. There is a file named lkWWL, which normally should not be there:
[oracle@ora10g ~]$ cd $ORACLE_HOME/dbs
[oracle@ora10g dbs]$ ls
hc_wwl.dat  initdw.ora  init.ora  lkWWL  orapwwwl  spfilewwl.ora
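In every example in this post the lock file is named lk followed by the instance SID in upper case (lkWWL for SID wwl, lkORA92 for ora92, also lkFDS and lkNDMSQA). The sketch below builds the expected path from ORACLE_HOME and ORACLE_SID under that naming assumption, which is inferred from these examples rather than taken from Oracle documentation; lock_file_path is a hypothetical helper and the fallback values are the ones from this post.

```python
import os


def lock_file_path(oracle_home, oracle_sid):
    """Build $ORACLE_HOME/dbs/lk<SID in upper case> (naming inferred from this post)."""
    return os.path.join(oracle_home, "dbs", "lk" + oracle_sid.upper())


if __name__ == "__main__":
    home = os.environ.get("ORACLE_HOME", "/orasoft/product/10.2.0/db_1")
    sid = os.environ.get("ORACLE_SID", "wwl")
    path = lock_file_path(home, sid)
    print(path, "exists" if os.path.exists(path) else "not found")
```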
[oracle@ora10g dbs]$ su - root
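Password:
Running fuser -u lkWWL as root confirms it: the processes holding the file were never released. Kill them with fuser -k, then run fuser -u again; this time it prints nothing: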
[root@ora10g ~]# cd /orasoft/product/10.2.0/db_1/dbs
[root@ora10g dbs]# fuser -u lkWWL
lkWWL:               26306 26308 26310 26312 26314 26316 26318 26320 26322 26324 26326 26334 26336 26340 26354 26356
[root@ora10g dbs]# fuser -k lkWWL
lkWWL:               26306 26308 26310 26312 26314 26316 26318 26320 26322 26324 26326 26334 26336 26340 26354 26356
[root@ora10g dbs]# fuser -u lkWWL
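fuser -k sends SIGKILL to every process listed, so it is worth checking the PIDs before running it; the empty output of the final fuser -u is what confirms that nothing holds the file any more. Below is a minimal sketch of parsing one line of fuser output into (PID, user) pairs, assuming the Linux psmisc format seen in this post (plain PIDs, or PIDs followed by "(user)" with -u, possibly with access letters such as c or e); parse_fuser is a hypothetical helper.

```python
import re

# One fuser entry: a PID, optional access letters (e.g. c, e, m) and,
# with -u, the owning user in parentheses, e.g. "6666(oracle)".
ENTRY = re.compile(r"(\d+)([a-zA-Z]*)(?:\(([^)]*)\))?")


def parse_fuser(line):
    """Split one fuser output line into (file, [(pid, user_or_None), ...])."""
    name, _, rest = line.rpartition(":")
    entries = []
    for token in rest.split():
        match = ENTRY.fullmatch(token)
        if match:
            entries.append((int(match.group(1)), match.group(3)))
    return name.strip(), entries


print(parse_fuser("lkNDMSQA:             6666(oracle)  6668(oracle)"))
# ('lkNDMSQA', [(6666, 'oracle'), (6668, 'oracle')])
print(parse_fuser(""))  # ('', []): nothing holds the file any more
```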
4. Start the database again: this time there is no error and it comes up normally.
[root@ora10g dbs]# su - oracle
[oracle@ora10g ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.1.0 - Production on Thu Nov 17 17:47:50 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area  285212672 bytes
Fixed Size                  1218992 bytes
Variable Size              92276304 bytes
Database Buffers          188743680 bytes
Redo Buffers                2973696 bytes
Database mounted.
Database opened.
SQL> col host_name format a20
SQL> select host_name,instance_name,status from v$instance;
HOST_NAME            INSTANCE_NAME    STATUS
-------------------- ---------------- ------------
ora10g.localdomain   wwl              OPEN
SQL>
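As a quick sanity check on the STARTUP output above, the four SGA components add up exactly to the Total System Global Area: 1218992 + 92276304 + 188743680 + 2973696 = 285212672 bytes, which is 272 MB. The snippet below only re-does that arithmetic with the figures from this transcript.

```python
# Component sizes reported by STARTUP above, in bytes.
sga = {
    "Fixed Size": 1218992,
    "Variable Size": 92276304,
    "Database Buffers": 188743680,
    "Redo Buffers": 2973696,
}
total_sga = 285212672  # "Total System Global Area"

component_sum = sum(sga.values())
print(component_sum, component_sum == total_sga)  # 285212672 True
print(f"{total_sga / 1024 ** 2:.0f} MB")  # 272 MB
```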

The original Metalink note:
analysis:
Problem Description:
====================  
You are trying to startup the database and you receive the following error: 
     ORA-01102:  cannot mount database in EXCLUSIVE mode
       Cause:  Some other instance has the database mounted exclusive 
               or shared.
      Action: Shutdown other instance or mount in a compatible mode.
Problem Explanation:
====================  
A database is started in EXCLUSIVE mode by default.  Therefore, the 
ORA-01102 error is misleading and may have occurred due to one of the 
following reasons:  
  - there is still an "sgadef.dbf" file in the "ORACLE_HOME/dbs"
    directory 
  - the processes for Oracle (pmon, smon, lgwr and dbwr) still exist
  - shared memory segments and semaphores still exist even though the 
    database has been shutdown
  - there is an "ORACLE_HOME/dbs/lk" file
Search Words:
=============  
ORA-1102, crash, immediate, abort, fail, fails, migration
Solution Description:
=====================  
Verify that the database was shutdown cleanly by doing the following:
  1. Verify that there is not a "sgadef.dbf" file in the directory
   "ORACLE_HOME/dbs".    
        % ls $ORACLE_HOME/dbs/sgadef.dbf
     If this file does exist, remove it.  
        % rm $ORACLE_HOME/dbs/sgadef.dbf  
  2. Verify that there are no background processes owned by "oracle" 
          % ps -ef | grep ora_ | grep $ORACLE_SID
     If background processes exist, remove them by using the Unix 
   command "kill".  For example:
          % kill -9
  3. Verify that no shared memory segments and semaphores that are owned 
   by "oracle" still exist
          % ipcs -b
     If there are shared memory segments and semaphores owned by "oracle",
   remove the shared memory segments 
          % ipcrm -m
     and remove the semaphores 
          % ipcrm -s
     NOTE:  The example shown above assumes that you only have one 
          database on this machine.  If you have more than one
          database, you will need to shutdown all other databases
          before proceeding with Step 4.
  4. Verify that the "$ORACLE_HOME/dbs/lk" file does not exist
  5. Startup the instance
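Steps 1 and 4 of the note amount to looking for leftover lock files in $ORACLE_HOME/dbs. The sketch below only reports them and deletes nothing, so removal stays a deliberate decision made after steps 2 and 3. The glob patterns sgadef*.dbf and lk* are assumptions chosen to cover names such as sgadefwwl.dbf and lkWWL; leftover_files is a hypothetical helper.

```python
import glob
import os


def leftover_files(oracle_home):
    """List files in $ORACLE_HOME/dbs matching the names checked in steps 1 and 4."""
    dbs = os.path.join(oracle_home, "dbs")
    found = []
    for pattern in ("sgadef*.dbf", "lk*"):
        found.extend(glob.glob(os.path.join(dbs, pattern)))
    return sorted(os.path.basename(path) for path in found)


if __name__ == "__main__":
    print(leftover_files(os.environ.get("ORACLE_HOME", ".")))
```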
Solution Explanation:
=====================  
The "lk" and "sgadef.dbf" files are used for locking shared memory.  It seems that even though no memory is allocated, Oracle thinks memory is  still locked.  By removing the "sgadef" and "lk" files you remove any knowledge oracle has of shared memory that is in use. Now the database can start.
 
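My friend's understanding of the problem and how to fix it:
An ORA-01102 error usually has one of the following causes:
I. In an HA system, another node has already started an instance and is holding the resources the two nodes share (for example, raw devices on the disk array);
 
II. Oracle was shut down abnormally and some resources were not released. Typically one of these:
1. Oracle's shared memory segments or semaphores were not released;
2. Oracle's background processes (such as SMON, PMON, DBWn) were not shut down;
3. The files used to lock memory, lk and sgadef.dbf, were not deleted.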
 
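Solution:
Method 1:
First, although ours is an HA system, the instance on the standby node is always kept shut down; we confirmed this by checking the database status on the standby node.
Second, the database went down because of a power failure, and the server was rebooted once power came back, so we could rule out items 1 and 2 of the second cause. The most likely suspect was item 3.
Check the $ORACLE_HOME/dbs directory: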
$ cd $ORACLE_HOME/dbs
$ ls sgadef*
sgadef* not found
$ ls lk*
lkORA92

Sure enough, the lk file had not been deleted. Delete it:
$ rm lk*

Start the database again: it succeeds.
 
If you suspect that shared memory was not released, you can check it with:
$ipcs -mop
IPC status from /dev/kmem as of Thu Jul  6 14:41:43 2006
T      ID     KEY        MODE        OWNER     GROUP NATTCH  CPID  LPID
Shared Memory:
m       0 0x411c29d6 --rw-rw-rw-      root      root      0   899   899
m       1 0x4e0c0002 --rw-rw-rw-      root      root      2   899   901
m       2 0x4120007a --rw-rw-rw-      root      root      2   899   901
m  458755 0x0c6629c9 --rw-r-----      root       sys      2  9113 17065
m       4 0x06347849 --rw-rw-rw-      root      root      1  1661  9150
m   65541 0xffffffff --rw-r--r--      root      root      0  1659  1659
m  524294 0x5e100011 --rw-------      root      root      1  1811  1811
m  851975 0x5fe48aa4 --rw-r-----    oracle  oinstall     66  2017 25076

Then remove the shared memory segment by its ID:
$ipcrm -m 851975

For semaphores, check with:
$ ipcs -sop
IPC status from /dev/kmem as of Thu Jul  6 14:44:16 2006
T      ID     KEY        MODE        OWNER     GROUP
Semaphores:
s       0 0x4f1c0139 --ra-------      root      root
... ...
s      14 0x6c200ad8 --ra-ra-ra-      root      root
s      15 0x6d200ad8 --ra-ra-ra-      root      root
s      16 0x6f200ad8 --ra-ra-ra-      root      root
s      17 0xffffffff --ra-r--r--      root      root
s      18 0x410c05c7 --ra-ra-ra-      root      root
s      19 0x00446f6e --ra-r--r--      root      root
s      20 0x00446f6d --ra-r--r--      root      root
s      21 0x00000001 --ra-ra-ra-      root      root
s   45078 0x67e72b58 --ra-r-----    oracle  oinstall

Remove the semaphore by its ID:
$ipcrm -s 45078
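Picking the oracle-owned rows out of the ipcs listings by eye is error-prone when the system has many segments. The sketch below extracts the shared memory and semaphore IDs owned by a given user and builds, but does not run, the matching ipcrm commands. It assumes the column layout shown above (T ID KEY MODE OWNER GROUP ..., "IPC status from /dev/kmem", which looks like HP-UX); Linux ipcs prints a different layout, so the field positions would need adjusting there. As the Metalink note warns, with more than one database on the host you must also work out which segments belong to which instance. Both function names are hypothetical.

```python
def oracle_ipc_ids(ipcs_output, owner="oracle"):
    """Return {"m": [...], "s": [...]}: shared memory and semaphore IDs owned by owner.

    Assumes the column layout shown above (T ID KEY MODE OWNER GROUP ...).
    """
    ids = {"m": [], "s": []}
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with the type letter; headers and "..." rows are skipped.
        if len(fields) >= 6 and fields[0] in ids and fields[4] == owner:
            ids[fields[0]].append(int(fields[1]))
    return ids


def ipcrm_commands(ids):
    """Build, but do not run, the ipcrm commands used above."""
    return [f"ipcrm -{kind} {ipc_id}" for kind in ("m", "s") for ipc_id in ids[kind]]


sample = """T      ID     KEY        MODE        OWNER     GROUP NATTCH  CPID  LPID
Shared Memory:
m       0 0x411c29d6 --rw-rw-rw-      root      root      0   899   899
m  851975 0x5fe48aa4 --rw-r-----    oracle  oinstall     66  2017 25076
Semaphores:
s      21 0x00000001 --ra-ra-ra-      root      root
s   45078 0x67e72b58 --ra-r-----    oracle  oinstall"""
print(ipcrm_commands(oracle_ipc_ids(sample)))
# ['ipcrm -m 851975', 'ipcrm -s 45078']
```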

If Oracle processes were not shut down, list the remaining Oracle processes with:
$ ps -ef|grep ora
  oracle 29976     1  0  Jun 22  ?         0:52 ora_dbw0_ora92
  oracle 29978     1  0  Jun 22  ?         0:51 ora_dbw1_ora92
  oracle  5128     1  0  Jul  5  ?         0:00 oracleora92 (LOCAL=NO)
... ...
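`grep ora` also matches the grep command itself and server processes such as `oracleora92 (LOCAL=NO)`. The Metalink note narrows this with `grep ora_ | grep $ORACLE_SID`; the sketch below applies the same idea more strictly by keeping only rows whose command is exactly ora_<name>_<SID>. It assumes `ps -ef` output like the listing above, where STIME may contain a space ("Jun 22"), and SID ora92 from this example; background_pids is a hypothetical helper. Leftover server processes are deliberately not matched and may need separate attention.

```python
import re


def background_pids(ps_output, sid):
    """Return PIDs of background processes named ora_<name>_<sid> in `ps -ef` output."""
    name = re.compile(r"ora_[a-z0-9]+_" + re.escape(sid))
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # PID is always the second column; STIME may contain a space ("Jun 22"),
        # so the command is taken from the last field instead of a fixed column.
        if len(fields) >= 3 and name.fullmatch(fields[-1]):
            pids.append(int(fields[1]))
    return pids


ps_output = """  oracle 29976     1  0  Jun 22  ?         0:52 ora_dbw0_ora92
  oracle 29978     1  0  Jun 22  ?         0:51 ora_dbw1_ora92
  oracle  5128     1  0  Jul  5  ?         0:00 oracleora92 (LOCAL=NO)"""
print(background_pids(ps_output, "ora92"))  # [29976, 29978]
```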

Then kill the processes with kill -9:
$kill -9

Method 2: find the processes that still hold the lk file with fuser, and kill them.
[root@qa-oracle dbs]# fuser -u lkNDMSQA
lkNDMSQA:             6666(oracle)  6668(oracle)  6670(oracle)  6672(oracle)  6674(oracle)  6676(oracle)  6678(oracle)  6680(oracle)  6690(oracle)  6692(oracle)  6694(oracle)  6696(oracle)  6737(oracle)  6830(oracle)
Sure enough, the file had not been released. Kill the processes holding it with fuser:
[root@qa-oracle dbs]# fuser -k lkNDMSQA
lkNDMSQA:             6666  6668  6670  6672  6674  6676  6678  6680  6690  6692  6694  6696  6737  6830
[root@qa-oracle dbs]# fuser -u lkNDMSQA

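Summary:
When ORA-01102 occurs, check and troubleshoot in this order:
1. If this is an HA system, check whether another node has already started the instance;
2. Check whether Oracle processes still exist; if so, kill them;
3. Check whether semaphores still exist; if so, remove them;
4. Check whether shared memory segments still exist; if so, remove them;
5. Check whether the memory-locking files lk and sgadef.dbf exist; if so, delete them.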
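The checklist above can be read as a small decision procedure: if another HA node owns the resources, nothing is stale and the fix is on that node; otherwise each leftover found turns into one cleanup action, in the order listed. The sketch below encodes that reading. The Findings fields and ora1102_actions are hypothetical names, and the function only returns command strings; it runs nothing.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Results of the checks in the summary (field names are hypothetical)."""
    other_node_running: bool = False  # HA: instance already up on another node
    leftover_pids: tuple = ()         # Oracle processes still alive
    semaphore_ids: tuple = ()         # from ipcs -s
    shm_ids: tuple = ()               # from ipcs -m
    lock_files: tuple = ()            # lk and sgadef files still present


def ora1102_actions(findings):
    """Map findings to the summary's actions, in the summary's order."""
    if findings.other_node_running:
        # The resources belong to a live instance: nothing here is stale.
        return ["shut down the instance on the other node (or mount in a compatible mode)"]
    actions = [f"kill -9 {pid}" for pid in findings.leftover_pids]
    actions += [f"ipcrm -s {sem_id}" for sem_id in findings.semaphore_ids]
    actions += [f"ipcrm -m {shm_id}" for shm_id in findings.shm_ids]
    actions += [f"rm {path}" for path in findings.lock_files]
    return actions


print(ora1102_actions(Findings(leftover_pids=(26312,), lock_files=("lkWWL",))))
# ['kill -9 26312', 'rm lkWWL']
```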
ORA-09968: unable to lock file lk$ORACLE_SID (2010-03-04 14:53)
Category: DBA
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
Thu Mar  4 11:48:07 2010
ALTER DATABASE   MOUNT
Thu Mar  4 11:48:07 2010
sculkget: failed to lock /u01/app/oracle/product/10.2.0/db_1/dbs/lkFDS exclusive
sculkget: lock held by PID: 3443
Thu Mar  4 11:48:07 2010
ORA-09968: unable to lock file
Linux Error: 11: Resource temporarily unavailable
Additional information: 3443
Thu Mar  4 11:48:07 2010
ORA-1102 signalled during: ALTER DATABASE   MOUNT...
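The message says that process 3443 holds the lock. The log from the previous startup shows that this process is the Oracle background process DBWR (DBW0). According to note 236794.1, the process may have hung, which prevents the database from running normally.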
fuser -u /u01/app/oracle/product/10.2.0/db_1/dbs/lkFDS
 

PMON started with pid=2, OS id=3437
MMAN started with pid=4, OS id=3441
PSP0 started with pid=3, OS id=3439
DBW0 started with pid=5, OS id=3443
LGWR started with pid=6, OS id=3445
CKPT started with pid=7, OS id=3447
SMON started with pid=8, OS id=3449
RECO started with pid=9, OS id=3451
CJQ0 started with pid=10, OS id=3453
MMON started with pid=11, OS id=3455
Tue Feb 16 11:08:17 2010
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=12, OS id=3457
Tue Feb 16 11:08:17 2010
starting up 1 shared server(s) ...
Tue Feb 16 11:08:18 2010
ALTER DATABASE   MOUNT
Tue Feb 16 11:08:22 2010
Setting recovery target incarnation to 2
Tue Feb 16 11:08:22 2010
Successful mount of redo thread 1, with mount id 1844152034
Tue Feb 16 11:08:22 2010
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Tue Feb 16 11:08:22 2010
ALTER DATABASE OPEN
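Matching the PID from "sculkget: lock held by PID: 3443" to a process name, as done above, can be automated by scanning the "started with pid=..., OS id=..." lines of the previous startup. This is a minimal sketch assuming the 10g alert log format shown above; process_by_os_id is a hypothetical helper. Operating systems reuse PIDs, so a match only suggests which process holds the lock; confirm it with ps -ef before killing anything.

```python
import re

# 10g alert log startup lines, e.g. "DBW0 started with pid=5, OS id=3443".
STARTED = re.compile(r"^(\w+) started with pid=(\d+), OS id=(\d+)", re.MULTILINE)


def process_by_os_id(alert_text):
    """Map OS process id -> background process name from an alert log startup section."""
    return {int(os_id): name for name, _oracle_pid, os_id in STARTED.findall(alert_text)}


startup = """PMON started with pid=2, OS id=3437
MMAN started with pid=4, OS id=3441
DBW0 started with pid=5, OS id=3443
LGWR started with pid=6, OS id=3445"""
print(process_by_os_id(startup)[3443])  # DBW0
```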
Check the processes holding the lock with lsof:
# lsof |grep lkFDS                                       
oracle     4476 oracle   17uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4478 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4480 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4482 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4484 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4486 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4488 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4490 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4492 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4494 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4496 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4513 oracle   15u      REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4531 oracle   15u      REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4534 oracle   15u      REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4812 oracle   15u      REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
Check the processes holding the lock with fuser:
# fuser -u /u01/app/oracle/product/10.2.0/db_1/dbs/lkFDS 
/u01/app/oracle/product/10.2.0/db_1/dbs/lkFDS:  4476(oracle)  4478(oracle)  4480(oracle)  4482(oracle)  4484(oracle)  4486(oracle)  4488(oracle)  4490(oracle)  4492(oracle)  4494(oracle)  4496(oracle)  4513(oracle)  4531(oracle)  4534(oracle)  4812(oracle)
[root@CHN-DG-3-5CE ~]#
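lsof and fuser report the same 15 PIDs here, but lsof's FD column says more. In lsof's notation, "17uR" means descriptor 17 opened for reading and writing (u) with a read lock on the entire file (R), while "15u" means the file is open with no lock. So the first eleven processes (4476 to 4496) actually hold locks on lkFDS, which is consistent with sculkget failing to get an exclusive lock. The sketch below extracts that per-PID lock character. It matches rows by file name only, because lsof shows the path under /var/oracle while fuser was given /u01/app/oracle; lsof_holders is a hypothetical helper.

```python
import re

# lsof FD column: descriptor number, access mode (r, w, u, or - if unknown),
# then an optional lock character (R = read lock on the entire file, W = write
# lock on the entire file, r/w = lock on part of the file, and so on).
FD = re.compile(r"(\d+)([rwu-])([NrRwWuUxX]?)")


def lsof_holders(lsof_output, filename):
    """Return {pid: lock_char} for lsof rows whose NAME ends with /filename."""
    holders = {}
    for line in lsof_output.splitlines():
        fields = line.split()
        if len(fields) < 9 or not fields[-1].endswith("/" + filename):
            continue
        match = FD.fullmatch(fields[3])
        if match:
            holders[int(fields[1])] = match.group(3)  # "" means no lock
    return holders


rows = """oracle     4476 oracle   17uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4478 oracle   15uR     REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS
oracle     4513 oracle   15u      REG        8,7         24    2911344 /var/oracle/product/10.2.0/db_1/dbs/lkFDS"""
holders = lsof_holders(rows, "lkFDS")
print(sorted(pid for pid, lock in holders.items() if lock))  # [4476, 4478]
```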
Question: what does fuser do, and how exactly is it used?
fuser Command
Purpose
Identifies processes using a file or file structure.
Syntax
fuser [ -c | -d | -f ] [ -k ] [ -u ] [ -x ] [ -V ]File ...
Description
The fuser command lists the process numbers of local processes that use the
local or remote files specified by the File parameter. For block special
devices, the command lists the processes that use any file on that device.

c Uses the file as the current directory.
e Uses the file as a program's executable object.
r Uses the file as the root directory.
s Uses the file as a shared library (or other loadable object).
The process numbers are written to standard output in a line with spaces between
process numbers. A new line character is written to standard error after the
last output for each file operand. All other output is written to standard
error.
The fuser command will not detect processes that have mmap regions where that
associated file descriptor has since been closed.
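At its core, the description above means: for each process, check whether any of its open files is the given file. On Linux that information is exposed under /proc/<pid>/fd, so a rough analogue of what fuser reports for a single file can be sketched in Python. This illustrates the idea only; it is not how AIX or psmisc fuser is implemented, and it ignores the block-device case, mount points (-c) and memory mappings (/proc/<pid>/maps). pids_using is a hypothetical name.

```python
import os
import sys


def pids_using(path):
    """Rough Linux-only analogue of fuser for one file: PIDs with an open descriptor on path.

    Walks /proc/<pid>/fd. Processes we may not inspect (other users' processes
    when not running as root) are skipped, and memory mappings are not examined.
    """
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # permission denied, or the process has already exited
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:  # descriptor closed between listdir and readlink
                continue
    return sorted(pids)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name + ":", " ".join(str(pid) for pid in pids_using(name)))
```

Like fuser without root, it only sees the caller's own processes, which is one reason the examples in this post run fuser as root.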
Flags
-c Reports on any open files in the file system containing File.
-d Implies the use of the -c and -x flags. Reports on any open files which have
been unlinked from the file system (deleted from the parent directory). When
of the deleted file.
-f Reports on open instances of File only.
-k Sends the SIGKILL signal to each local process. Only the root user can kill a
process of another user.
-u Provides the login name for local processes in parentheses after the process
number.
-V Provides verbose output.
-x Used in conjunction with -c or -f, reports on executable and loadable objects
in addition to the standard fuser output.
Examples
  1. To list the process numbers of local processes using the /etc/passwd file,
     enter:
     fuser /etc/passwd
  2. To list the process numbers and user login names of processes using the
     /etc/filesystems file, enter:
     fuser -u /etc/filesystems
  3. To terminate all of the processes using a given file system, enter:
     fuser -k -x -u /dev/hd1 -OR-
     fuser -kxuc /home
     Either command lists the process number and user name, and then terminates
     each process that is using the /dev/hd1 (/home) file system. Only the root
     user can terminate processes that belong to another user. You might want to
     use this command 
 
 

 

From the "ITPUB blog", link: http://blog.itpub.net/15797451/viewspace-1405814/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
