[DB] Getting to the bottom of library cache lock: causes and solutions (Part 1)

Posted by xysoul_雲龍 on 2016-11-08

Problem description:
The application team reported that every operation on table CSNOZ629926699966 hangs, even desc CSNOZ629926699966.
For example:

 > sqlplus 

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 10:11:06 2005

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production

SQL> conn pubuser/pubuser
Connected.
SQL> desc CSNOZ629926699966

...

(the process hangs here)

...

 

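We asked whether anything unusual had been done beforehand. The business user said that a long time ago he had run a script, but it ran for a long time without returning any result, so he exited the session. After that, the situation above began. The script is as follows: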
$ cat CSNOZ629926699966.sh
#!/bin/sh
sqlplus  << EOF  #use your username/password

create table CSNOZ629926699966 as select * from CSNOZ62992266cs
where mid not in ( select mid from  where servid='020999011964' and status in ('A','B','S'));

exit;
EOF



$

Resolution:
 > sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 10:19:13 2005

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production

SQL> select * from v$lock where block=1;

no rows selected

SQL> /

no rows selected

SQL> /

no rows selected

SQL> 
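We see that no blocking lock is currently reported. (V$LOCK shows enqueue locks only; library cache locks do not appear in it at all.)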

SQL> select xidusn, object_id, session_id, locked_mode from v$locked_object;

...

    XIDUSN  OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
        14         18         37           3

...

SQL> /

...

    XIDUSN  OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
        14         18         37           3

...

SQL> /

...

    XIDUSN  OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
        14         18         37           3

...

SQL> 
Querying v$locked_object, we find a suspicious session, SID 37, which holds a lock in mode 3 (row exclusive) on object 18:

SQL> select object_name,owner,object_type from dba_objects where object_id=18;

...

OBJECT_NAME                    OWNER                          OBJECT_TYPE
------------------------------ ------------------------------ ------------------
OBJ$                           SYS                            TABLE

...

SQL>

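Strange: why is this lock always there?
Our initial guess: the session with SID 37 ran the DDL statement above and exited abnormally before the statement finished,
leaving every process that accesses the object involved in that DDL statement hung.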


Next, let's look at the wait events:
SQL> select event,sid,p1,p2,p3 from v$session_wait where event not like 'SQL*%' and event not like 'rdbms%';

EVENT                                                                    P1         P2        SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer                                                              300          0          1
ges remote message                                                       32          0          4
gcs remote message                                                       64          0          5
gcs remote message                                                       64          0          7
smon timer                                                              300          0         19
library cache lock                                               1.3835E+19 1.3835E+19         30
wakeup time manager                                                       0          0         22

7 rows selected.

SQL> /

EVENT                                                                    P1         P2        SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer                                                              300          0          1
ges remote message                                                       32          0          4
gcs remote message                                                       64          0          5
gcs remote message                                                       64          0          7
smon timer                                                              300          0         19
library cache lock                                               1.3835E+19 1.3835E+19         30
wakeup time manager                                                       0          0         22

7 rows selected.

SQL> /

EVENT                                                                    P1         P2        SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer                                                              300          0          1
ges remote message                                                       32          0          4
gcs remote message                                                       64          0          5
gcs remote message                                                       64          0          7
smon timer                                                              300          0         19
library cache lock                                               1.3835E+19 1.3835E+19         30
wakeup time manager                                                       0          0         22

7 rows selected.

SQL> /

EVENT                                                                    P1         P2        SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer                                                              300          0          1
ges remote message                                                       32          0          4
gcs remote message                                                       64          0          5
gcs remote message                                                       64          0          7
smon timer                                                              300          0         19
library cache lock                                               1.3835E+19 1.3835E+19         30
wakeup time manager                                                       0          0         22

7 rows selected.

SQL>


We notice the following event:
EVENT                                                                    P1         P2        SID
---------------------------------------------------------------- ---------- ---------- ----------
...

library cache lock                                               1.3835E+19 1.3835E+19         30

...

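P1 is the handle address, i.e. the address of the library cache object on which the 'library cache lock' is being requested.
P2 is a state object; here it is the address of the lock being requested on that object (the lock address).
P1 and P2 are both decimal numbers, displayed here in scientific notation.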
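The scientific notation hides almost all of the address. The minimal Python sketch below (the helper name `to_handle_hex` is hypothetical, not an Oracle API) formats a full decimal P1/P2 value as the 16-digit hex form used in dumps and in X$KGLLK, and shows that two different addresses both round to 1.3835E+19. The two decimal values are simply this case's handle address c000000122e2a6d8 and lock address c00000011a449e20 converted to decimal.

```python
def to_handle_hex(value: int) -> str:
    """Format a 64-bit address held as a decimal integer as 16 hex digits."""
    return format(value, "016X")


handle = 13835058060162410200  # decimal form of handle address C000000122E2A6D8
lock = 13835058060017835552    # decimal form of lock address C00000011A449E20

print(to_handle_hex(handle))  # C000000122E2A6D8
print(to_handle_hex(lock))    # C00000011A449E20
# With only four significant digits the two addresses look identical:
print(format(handle, ".4E"), format(lock, ".4E"))  # 1.3835E+19 1.3835E+19
```

In SQL*Plus itself, raising SET NUMWIDTH (for example to 20) should be enough to print all digits instead of the rounded form.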

This is consistent with the guess above: SID 37 appears to be blocking SID 30, which is waiting on 'library cache lock'.

Find the SID and SERIAL# of these two suspicious sessions, then enable event 10046 on them:
SQL> select sid,serial# from v$session where sid in (30,37);

       SID    SERIAL#
---------- ----------
        30      24167
        37       2707

SQL> exec dbms_system.set_ev(30,24167,10046,12,'');

PL/SQL procedure successfully completed.

SQL> exec dbms_system.set_ev(37,2707,10046,12,'');

PL/SQL procedure successfully completed.

SQL>

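While tracing is on, let's test again and see whether there are other clues.

Open a new session, find its SID and SERIAL#, and look up the SPIDs of the two suspicious sessions: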
 > sqlplus pubuser/pubuser                                                                                 

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 11:36:25 2005

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production

SQL> select distinct sid from v$mystat;

       SID
----------
        33

SQL> select sid,serial# from v$session where sid=33;

       SID    SERIAL#
---------- ----------
        33       6639

SQL> SELECT SPID,PID FROM V$PROCESS WHERE ADDR=(SELECT PADDR FROM V$SESSION WHERE SID=37);

SPID                PID
------------ ----------
20552                26

SQL> SELECT SPID,PID FROM V$PROCESS WHERE ADDR=(SELECT PADDR FROM V$SESSION WHERE SID=30);

SPID                PID
------------ ----------
22580                28

SQL> show parameter dump

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_core_dump                 string      partial
background_dump_dest                 string      /ora9i/app/oracle/admin/csmisc
                                                 /bdump
core_dump_dest                       string      /ora9i/app/oracle/admin/csmisc
                                                 /cdump
max_dump_file_size                   string      UNLIMITED
shadow_core_dump                     string      partial
user_dump_dest                       string      /ora9i/app/oracle/admin/csmisc
                                                 /udump
SQL>       


Then try again to operate on table CSNOZ629926699966:
SQL> desc CSNOZ629926699966

...

It still hangs.

So we interrupt the operation (CTRL + C):

SQL> desc CSNOZ629926699966
ERROR:
ORA-01013: user requested cancel of current operation

 

SQL> select tname from tab where tname='CSNOZ629926699966';

no rows selected

SQL> 
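Looking for this table in the PUBUSER schema: it does not even exist!

This further supports the earlier guess: session 37 is blocking every other session that operates on table CSNOZ629926699966 and making those processes hang, including the statements issued from SID 30 and SID 33 (such as the desc above).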

Now stop the 10046 trace by setting the level of event 10046 back to 0:
SQL> exec dbms_system.set_ev(30,24167,10046,0,'');

PL/SQL procedure successfully completed.

SQL> exec dbms_system.set_ev(37,2707,10046,0,'');

PL/SQL procedure successfully completed.

SQL> 

From the SPIDs and user_dump_dest recorded above, we know the trace files produced by these two sessions are:
Session SID 30: /ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_22580.trc
Session SID 37: /ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_20552.trc
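These file names follow from three values gathered earlier: user_dump_dest, the instance name (csmisc2) and the SPID of each session's server process. A minimal Python sketch of that naming pattern, assuming the `<instance>_ora_<spid>.trc` convention seen in this case (dedicated server on Unix; other platforms, versions or shared server setups may name files differently). The function name `user_trace_path` is hypothetical.

```python
import posixpath


def user_trace_path(user_dump_dest: str, instance: str, spid: int) -> str:
    """Build the expected trace file path for a dedicated server process."""
    return posixpath.join(user_dump_dest, f"{instance}_ora_{spid}.trc")


udump = "/ora9i/app/oracle/admin/csmisc/udump"
for sid, spid in [(30, 22580), (37, 20552)]:
    print(sid, user_trace_path(udump, "csmisc2", spid))
```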

 

Look at the trace files:
 > cd /ora9i/app/oracle/admin/csmisc/udump
 > ll -tlc
total 4432
-rw-r-----   1 ora9i      dba         332995 Jan 10 12:00 csmisc2_ora_22580.trc
-rw-r-----   1 ora9i      dba           3168 Jan 10 11:59 csmisc2_ora_20552.trc
-rw-r-----   1 ora9i      dba         407133 Jan  7 15:10 csmisc2_ora_2708.trc
-rw-r-----   1 ora9i      dba            640 Jan  7 14:48 csmisc2_ora_835.trc
-rw-r-----   1 ora9i      dba           1590 Dec 30 22:50 csmisc2_ora_16244.trc
-rw-r-----   1 ora9i      dba        1308403 Dec 30 22:44 csmisc2_ora_16033.trc
-rw-r-----   1 ora9i      dba            616 Dec 28 14:16 csmisc2_ora_2176.trc
-rw-r-----   1 ora9i      dba            644 Dec  8 18:22 csmisc2_ora_21083.trc
 > mailx -s "csmisc2_ora_22580.trc"  < csmisc2_ora_22580.trc
 > mailx -s "csmisc2_ora_20552.trc"  < csmisc2_ora_20552.trc
 > exit

SQL>

The main content of the trace file produced by the session with SID 30 (csmisc2_ora_22580.trc) is:
/ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_22580.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /ora9i/app/oracle/product/920
System name: HP-UX
Node name: cs_dc02
Release: B.11.11
Version: U
Machine: 9000/800
Instance name: csmisc2
Redo thread mounted by this instance: 2
Oracle process number: 28
Unix process pid: 22580, image:  (TNS V1-V3)

*** 2005-01-10 11:31:49.416
*** SESSION ID:(30.24167) 2005-01-10 11:31:49.354
WAIT #0: nam='library cache lock' ela= 507258 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 505686 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507678 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507595 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507880 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507317 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507703 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507683 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 508265 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507100 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507684 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 505889 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507731 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507650 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507604 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507698 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

...


We see that the wait event in SID 30's trace file is exactly the 'library cache lock' we saw in V$SESSION_WAIT.
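Unlike V$SESSION_WAIT, this 10046 trace prints p1 and p2 as signed 64-bit integers, which is why they look negative. A minimal Python sketch (the helper name is hypothetical) that reinterprets them as unsigned 64-bit values shows that they are exactly the handle address and lock address that appear later in the systemstate dump:

```python
MASK_64 = (1 << 64) - 1


def signed_to_address(value: int) -> str:
    """Reinterpret a signed 64-bit trace value as an unsigned hex address."""
    return format(value & MASK_64, "016X")


p1 = -4611686013547141416  # p1 from SID 30's 10046 trace
p2 = -4611686013691716064  # p2 from the same WAIT lines
print(signed_to_address(p1))  # C000000122E2A6D8 (handle address)
print(signed_to_address(p2))  # C00000011A449E20 (lock address)
```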

 

Now look at the main content of the trace file produced by the session with SID 37 (csmisc2_ora_20552.trc):
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /ora9i/app/oracle/product/920
System name: HP-UX
Node name: cs_dc02
Release: B.11.11
Version: U
Machine: 9000/800
Instance name: csmisc2
Redo thread mounted by this instance: 2
Oracle process number: 26
Unix process pid: 20552, image:  (TNS V1-V3)

*** 2005-01-10 11:33:22.702
*** SESSION ID:(37.2707) 2005-01-10 11:33:22.690
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:35:07.452
WAIT #1: nam='SQL*Net message from dblink' ela= 102293555 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 3 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:36:55.980
WAIT #1: nam='SQL*Net message from dblink' ela= 105969709 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:39:05.416
WAIT #1: nam='SQL*Net message from dblink' ela= 126390826 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:41:12.878
WAIT #1: nam='SQL*Net message from dblink' ela= 124461520 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:43:01.285
WAIT #1: nam='SQL*Net message from dblink' ela= 105859385 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:44:48.200
WAIT #1: nam='SQL*Net message from dblink' ela= 104397696 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0


...
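Note that SID 37 is not waiting on a lock at all: it alternates between 'SQL*Net message to dblink' and 'SQL*Net message from dblink', so the session appears to still be working on its statement, waiting on a remote database over a database link. A small Python sketch for summarizing such WAIT lines per event; the regex assumes the 9i-style `WAIT #n: nam='...' ela= ...` layout shown above, and reading ela as microseconds is an assumption that holds for Oracle 9i and later.

```python
import re
from collections import defaultdict

# Matches 9i-style 10046 lines such as:
# WAIT #1: nam='SQL*Net message from dblink' ela= 102293555 p1=675562835 p2=1 p3=0
WAIT_RE = re.compile(r"WAIT #\d+: nam='(?P<event>[^']*)' ela=\s*(?P<ela>\d+)")


def summarize_waits(lines):
    """Return {event: (wait_count, total_ela)} for every WAIT line."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            entry = totals[match.group("event")]
            entry[0] += 1
            entry[1] += int(match.group("ela"))
    return {event: (count, ela) for event, (count, ela) in totals.items()}


sample = """\
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:35:07.452
WAIT #1: nam='SQL*Net message from dblink' ela= 102293555 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 3 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message from dblink' ela= 105969709 p1=675562835 p2=1 p3=0
""".splitlines()

for event, (count, ela) in summarize_waits(sample).items():
    print(f"{event}: {count} waits, {ela / 1e6:.1f} s")  # assumes ela in microseconds
```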

 

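Now let's dump the system state (systemstate) to get more detailed information.

First, a brief introduction to the systemstate event.
Many people think a systemstate dump records every process in the system at the single instant of the dump. That is a misconception: in fact,
the trace file produced by a system state dump contains information about all processes in the system over the interval from the moment the dump starts until it completes.

The trace file produced by dump systemstate contains state information for every process in the system. Each process has its own section in the trace file reflecting its state, including process information, session information, enqueue information (mainly lock information), buffer information, and the state of the objects the process holds in the SGA.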

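So when is it appropriate to use systemstate?
 Oracle recommends the systemstate event in the following situations:

  • The database hangs
  • The database is very slow
  • A process hangs
  • The database raises certain errors
  • Resource contention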

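The syntax for dumping systemstate is:
    ALTER SESSION SET EVENTS 'immediate trace name systemstate level 10';

The same dump can also be taken with ORADEBUG (after attaching ORADEBUG to your own process):
    ORADEBUG SETMYPID
    ORADEBUG DUMP SYSTEMSTATE 10

To trigger a systemstate dump whenever a particular error occurs, set the event parameter in the parameter file (spfile or pfile).
For example, to dump systemstate when a deadlock occurs (error ORA-00060):
    event = "60 trace name systemstate level 10"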
 

Back to the problem at hand: let's dump the system state:
SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 8';

Session altered.

SQL> host
 >cd /ora9i/app/oracle/admin/csmisc/udump
 > ll -ctl
-rw-r-----   1 ora9i      dba        1070863 Jan 10 13:02 csmisc2_ora_22580.trc
-rw-r-----   1 ora9i      dba        1345368 Jan 10 13:01 csmisc2_ora_22568.trc
-rwxrwxrwx   1 ora9i      dba          44114 Jan 10 12:52 ass1015.awk
-rw-r-----   1 ora9i      dba         407133 Jan  7 15:10 csmisc2_ora_2708.trc
-rw-r-----   1 ora9i      dba            640 Jan  7 14:48 csmisc2_ora_835.trc
-rw-r-----   1 ora9i      dba           1590 Dec 30 22:50 csmisc2_ora_16244.trc
-rw-r-----   1 ora9i      dba        1308403 Dec 30 22:44 csmisc2_ora_16033.trc
-rw-r-----   1 ora9i      dba            616 Dec 28 14:16 csmisc2_ora_2176.trc
-rw-r-----   1 ora9i      dba            644 Dec  8 18:22 csmisc2_ora_21083.trc
 >
 > mailx -s "22568"  < csmisc2_ora_22568.trc

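This trace file is very large (it contains information on every process), so where do we start?

First, searching the trace file for the string "waiting for 'library cache lock'" leads us to the blocked process:

PROCESS 28: ----------------the blocked Oracle process. PROCESS 28 corresponds to the PID column of V$PROCESS,
    so with this number we can look up the blocked session in V$PROCESS and V$SESSION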
  ----------------------------------------
  SO: c000000109c83bf0, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
  (process) Oracle pid=28, calls cur/top: c00000010b277890/c00000010b277890, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 17 24 6
              last post received-location: ksusig
              last process to post me: c000000109c840f8 25 0
              last post sent: 0 0 15
              last post sent-location: ksasnd
              last process posted by me: c000000109c7ff90 1 6
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: c000000109eefda0
    O/S info: user: ora9i, term: pts/th, ospid: 22580  ----------------the operating system process ID of this process, corresponding to SPID in V$PROCESS
    OSD pid info: Unix process pid: 22580, image:  (TNS V1-V3)
    ----------------------------------------
    SO: c000000109f02c68, type: 4, owner: c000000109c83bf0, flag: INIT/-/-/0x00
    (session) trans: 0000000000000000, creator: c000000109c83bf0, flag: (100041) USR/- BSY/-/-/-/-/-
              DID: 0002-001C-00000192, short-term DID: 0000-0000-00000000
              txn branch: 0000000000000000
              oct: 0, prv: 0, sql: c00000011f8ea068, psql: c00000011f8ea068, user: 50/PUBUSER
    O/S info: user: ora9i, term: , ospid: 22536, machine: cs_dc02
              program:  (TNS V1-V3)
    application name: SQL*Plus, hash value=3669949024
    waiting for 'library cache lock' blocking sess=0x0 seq=18589 wait_time=0
                handle address=c000000122e2a6d8, lock address=c00000011a449e20, 100*mode+namespace=515

...

    SO: c00000010b277890, type: 3, owner: c000000109c83bf0, flag: INIT/-/-/0x00
    (call) sess: cur c000000109f02c68, rec 0, usr c000000109f02c68; depth: 0
      ----------------------------------------
      SO: c00000011a449e20, type: 51, owner: c00000010b277890, flag: INIT/-/-/0x00
      LIBRARY OBJECT LOCK: lock=c00000011a449e20 handle=c000000122e2a6d8 request=S
      call pin=0000000000000000 session pin=0000000000000000
      htl=c00000011a449e90[c00000011a4bc350,c00000011a4bc350] htb=c00000011a4bc350
      user=c000000109f02c68 session=c000000109f02c68 count=0 flags=[00] savepoint=463
      the rest of the object was already dumped

...

Note the following information:
    waiting for 'library cache lock' blocking sess=0x0 seq=18589 wait_time=0
                handle address=c000000122e2a6d8, lock address=c00000011a449e20, 100*mode+namespace=515

This tells us that the process with Oracle PID 28 (PROCESS 28) is waiting for 'library cache lock'. Through 'handle address=c000000122e2a6d8' we can find the Oracle PID of the session that is blocking it.

Also note this section:
      LIBRARY OBJECT LOCK: lock=c00000011a449e20 handle=c000000122e2a6d8 request=S
      call pin=0000000000000000 session pin=0000000000000000
      htl=c00000011a449e90[c00000011a4bc350,c00000011a4bc350] htb=c00000011a4bc350
      user=c000000109f02c68 session=c000000109f02c68 count=0 flags=[00] savepoint=463

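This is the library object lock that PROCESS 28 itself is requesting (request=S): its lock value matches the 'lock address' in the wait line above, and user/session point to PROCESS 28's own session state object (c000000109f02c68). The blocker is a different process whose own LIBRARY OBJECT LOCK entry carries the same handle (c000000122e2a6d8) while holding an incompatible mode; it can be found by searching the dump for other entries with that handle.

The key point to remember is simple:

the 'handle address' of the waiting session equals the 'handle' value in the blocking session's LIBRARY OBJECT LOCK entry.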
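The rule above can be applied mechanically. Below is a minimal Python sketch, not an Oracle tool: for every process waiting on 'library cache lock' it records the handle address, then lists the other processes whose LIBRARY OBJECT LOCK entries carry the same handle. It assumes the 9i dump layout shown above (a `PROCESS n:` header, the wait line followed by a `handle address=` line, and `LIBRARY OBJECT LOCK: lock=... handle=...` entries); dump formats vary between versions, so treat the regular expressions as a starting point.

```python
import re
from collections import defaultdict

PROCESS_RE = re.compile(r"^PROCESS (\d+):")
HANDLE_ADDR_RE = re.compile(r"handle address=([0-9a-fA-F]+)")
OBJ_LOCK_RE = re.compile(
    r"LIBRARY OBJECT LOCK: lock=([0-9a-fA-F]+) handle=([0-9a-fA-F]+)(.*)")


def find_blocking_candidates(dump_text):
    """Map each waiting Oracle PID to [(other_pid, lock_details), ...]
    for LIBRARY OBJECT LOCK entries on the handle it is waiting for."""
    waiting_on = {}                      # waiter pid -> handle address
    locks_by_handle = defaultdict(list)  # handle -> [(pid, details)]
    pid = None
    in_wait = False
    for raw in dump_text.splitlines():
        line = raw.strip()
        header = PROCESS_RE.match(line)
        if header:
            pid, in_wait = int(header.group(1)), False
            continue
        if "waiting for 'library cache lock'" in line:
            in_wait = True
        if in_wait:
            addr = HANDLE_ADDR_RE.search(line)
            if addr:
                waiting_on[pid] = addr.group(1).lower()
                in_wait = False
                continue
        lock = OBJ_LOCK_RE.search(line)
        if lock and pid is not None:
            locks_by_handle[lock.group(2).lower()].append(
                (pid, lock.group(3).strip()))
    return {
        waiter: [(other, details)
                 for other, details in locks_by_handle[handle]
                 if other != waiter]
        for waiter, handle in waiting_on.items()
    }
```

It could be run against the systemstate trace, for example `find_blocking_candidates(open("csmisc2_ora_22568.trc").read())`. In this case one would expect PROCESS 26 (SID 37, per V$PROCESS above) to be listed for PROCESS 28, although the excerpt above does not show PROCESS 26's section.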


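Going back, this value corresponds to the P1 we saw earlier in V$SESSION_WAIT (P2 is the lock address c00000011a449e20, which rounds to the same 1.3835E+19 in scientific notation):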
SQL> select to_number('C000000122E2A6D8','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') from dual;

TO_NUMBER('C000000122E2A6D8','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
----------------------------------------------------------------
                                                      1.3835E+19

SQL>      

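The cause of the problem is now basically clear. Two ways of resolving it are recommended here.
Method 1: using the address c000000122e2a6d8, we can get the corresponding lock information currently in the library cache: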
SQL> l
  1  select INST_ID,USER_NAME,KGLNAOBJ,KGLLKSNM,KGLLKUSE,KGLLKSES,KGLLKMOD,KGLLKREQ,KGLLKPNS,KGLLKHDL
  2* from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ
SQL> /

   INST_ID USER_NAME     KGLNAOBJ                 KGLLKSNM KGLLKUSE         KGLLKSES           KGLLKMOD   KGLLKREQ KGLLKPNS         KGLLKHDL
---------- ------------- ---------------------- ---------- ---------------- ---------------- ---------- ---------- ---------------- ----------------
         2 PUBUSER       CSNOZ629926699966              30 C000000109F02C68 C000000109F02C68          0          2 00               C000000122E2A6D8
         2 PUBUSER       CSNOZ629926699966              37 C000000108C99E28 C000000108C99E28          3          0 00               C000000122E2A6D8

SQL> 
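Reading these two rows: SID 30 holds nothing (KGLLKMOD 0) and requests mode 2, while SID 37 holds mode 3. Assuming the usual library cache lock mode codes (0 none, 1 null, 2 share, 3 exclusive), a share request cannot be granted while another session holds the lock exclusively, which is why SID 30 waits. A minimal Python sketch of that check over (SID, KGLLKMOD, KGLLKREQ) rows for one handle; the mode table is an assumption, not taken from Oracle documentation.

```python
# Assumed KGLLKMOD/KGLLKREQ codes for library cache locks.
NONE, NULL, SHARE, EXCLUSIVE = 0, 1, 2, 3


def compatible(held: int, requested: int) -> bool:
    """Null (or no) lock is compatible with everything; share only with share."""
    if held in (NONE, NULL) or requested in (NONE, NULL):
        return True
    return held == SHARE and requested == SHARE


def find_blockers(rows):
    """rows: (sid, kgllkmod, kgllkreq) tuples for one KGLLKHDL.
    Returns {waiting_sid: [blocking_sid, ...]} for sessions with a request."""
    blockers = {}
    for sid, _held, requested in rows:
        if requested == NONE:
            continue
        blockers[sid] = [other for other, held, _req in rows
                         if other != sid and not compatible(held, requested)]
    return blockers


# The two X$KGLLK rows above: SID 30 requests 2, SID 37 holds 3.
print(find_blockers([(30, 0, 2), (37, 3, 0)]))  # {30: [37]}
```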

Following Oracle's recommended practice, we should now kill SID 37 with 'alter system kill session',
but the command returns error ORA-00031:
SQL> alter system kill session '37,2707';

alter system kill session '37,2707'
*
ERROR at line 1:
ORA-00031: session marked for kill

SQL>

Check the status of SID 37:
SQL> set linesize 150
SQL> col program for a50
SQL> select sid,serial#,status,username,program from v$session where sid=37;

       SID    SERIAL# STATUS   USERNAME                       PROGRAM
---------- ---------- -------- ------------------------------ --------------------------------------------------
        37       2707 KILLED   PUBUSER                         (TNS V1-V3)

SQL> 
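This confirms our original idea once again: someone ran a DDL statement that needed a very long time (usually because the statement is inefficient, though hitting a bug cannot be ruled out),
then exited the session abnormally before the statement finished.

In this example we had already found the operating system process (SPID) for this session in the trace file above. In other situations, how can we find the OS process ID (SPID) of a session whose status is 'KILLED'?
Here is one method you can borrow: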
SQL> l
  1  SELECT s.username,s.status,
  2  x.ADDR,x.KSLLAPSC,x.KSLLAPSN,x.KSLLASPO,x.KSLLID1R,x.KSLLRTYP,
  3  decode(bitand (x.ksuprflg,2),0,null,1)
  4  FROM x$ksupr x,v$session s
  5  WHERE s.paddr(+)=x.addr
  6  and bitand(ksspaflg,1)!=0
  7* and s.sid=37
SQL> /

USERNAME                       STATUS   ADDR               KSLLAPSC   KSLLAPSN KSLLASPO       KSLLID1R KS D
------------------------------ -------- ---------------- ---------- ---------- ------------ ---------- -- -
PUBUSER                        KILLED   C000000109C831E0         41         15 16243                17

SQL>


The x$ksupr.ADDR column corresponds to the ADDR column of V$PROCESS. Once we know this process address, finding the operating system process (SPID) is easy, for example:
SQL> select spid,pid from v$process where addr='C000000109C831E0';

SPID                PID
------------ ----------
20552                26

SQL>

Now all we need to do is kill OS process 20552 at the operating system level:
 > ps -ef | grep 20552
   ora9i 20552     1  0  Jan  8  ?         0:01 oraclecsmisc2 (LOCAL=NO)
   ora9i 14742 14740  0 17:19:02 pts/ti    0:00 grep 20552
 > kill -9 20552 
 > ps -ef | grep 20552
   ora9i 14966 14964  0 17:40:01 pts/ti    0:00 grep 20552
 >
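Note that `grep 20552` also matches the grep process itself, and a mistyped number passed to kill -9 could take down an unrelated process. A minimal Python sketch (the function name is hypothetical) that only accepts a `ps -ef` line whose PID column equals the SPID and whose command is this instance's server process, `oracle<ORACLE_SID>` (here oraclecsmisc2, as in the ps output above):

```python
def find_server_process(ps_lines, spid: int, oracle_sid: str):
    """Return the ps -ef line for SPID if it is an Oracle server process
    of the given instance, otherwise None."""
    for line in ps_lines:
        fields = line.split()
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD ...
        if len(fields) > 1 and fields[1] == str(spid):
            if any(field == "oracle" + oracle_sid for field in fields[2:]):
                return line.strip()
    return None


ps_output = [
    "   ora9i 20552     1  0  Jan  8  ?         0:01 oraclecsmisc2 (LOCAL=NO)",
    "   ora9i 14742 14740  0 17:19:02 pts/ti    0:00 grep 20552",
]
print(find_server_process(ps_output, 20552, "csmisc2"))
print(find_server_process(ps_output, 14742, "csmisc2"))  # None: that line is the grep
```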


Checking SID 37 again, we see that this session really has been killed:
 > exit

SQL> select sid,serial#,status,username,program from v$session where sid=37;

no rows selected

SQL> l
  1  SELECT s.username,s.status,
  2  x.ADDR,x.KSLLAPSC,x.KSLLAPSN,x.KSLLASPO,x.KSLLID1R,x.KSLLRTYP,
  3  decode(bitand (x.ksuprflg,2),0,null,1)
  4  FROM x$ksupr x,v$session s
  5  WHERE s.paddr(+)=x.addr
  6  and bitand(ksspaflg,1)!=0
  7* and s.sid=37
SQL> /

no rows selected

SQL>

Back in the session that had hung, normal operation has resumed,
and we get the expected message 'ORA-04043: object CSNOZ629926699966 does not exist':
SQL> desc CSNOZ629926699966

 


ERROR:
ORA-04043: object CSNOZ629926699966 does not exist


SQL>

Open another session and test it:
 > sqlplus pubuser/pubuser

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 17:42:16 2005

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production

SQL> set timing on
SQL> desc CSNOZ629926699966
ERROR:
ORA-04043: object CSNOZ629926699966 does not exist


SQL> 
When we issue 'desc CSNOZ629926699966', the system immediately returns 'ORA-04043: object CSNOZ629926699966 does not exist'. The problem is solved.


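Here is a brief introduction to X$KGLLK. This base table holds the locks on objects in the library cache and is especially useful for this kind of problem. Its name breaks down as follows: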
  [K]ernel Layer                                                                
    [G]eneric Layer                                                             
      [L]ibrary Cache Manager  ( defined and mapped from kqlf )                 
        Object Locks                                                            
          X$KGLLK - Object [L]oc[K]s                                           

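KGLNAOBJ holds the name of the library cache object (for a cursor, the beginning of the SQL text); here it is the table name itself, which already narrows the search considerably.
X$KGLLK.KGLLKUSE and X$KGLLK.KGLLKSES correspond to the user and session values in the trace file
X$KGLLK.KGLLKADR corresponds to the lock value in the trace file (lock=C00000011A449E20), i.e. the lock address
X$KGLLK.KGLLKHDL corresponds to the handle value in the trace file (handle=C000000122E2A6D8), i.e. the handle address reported as P1 of the 'library cache lock' wait
X$KGLLK.KGLLKPNS corresponds to the session pin value in the trace file
X$KGLLK.KGLLKSPN corresponds to the savepoint value in the trace file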
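These correspondences can be checked against this case's own output: the SID 30 row of X$KGLLK shown below has KGLLKADR=C00000011A449E20, which is the lock= value of PROCESS 28's LIBRARY OBJECT LOCK entry in the dump, and its KGLLKUSE/KGLLKSES equal the user=/session= values there. A minimal Python sketch of such a cross-check; the mapping table and function names are illustrative assumptions, not an Oracle interface.

```python
import re

# X$KGLLK column -> field name in a LIBRARY OBJECT LOCK entry of the dump
COLUMN_TO_FIELD = {
    "KGLLKADR": "lock",
    "KGLLKHDL": "handle",
    "KGLLKUSE": "user",
    "KGLLKSES": "session",
    "KGLLKSPN": "savepoint",
}

FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_lock_entry(text: str) -> dict:
    """Collect key=value pairs from a LIBRARY OBJECT LOCK entry."""
    return dict(FIELD_RE.findall(text))


def mismatched_columns(row: dict, entry: dict) -> list:
    """Return the X$KGLLK columns whose values differ from the dump entry."""
    return [column for column, field in COLUMN_TO_FIELD.items()
            if str(row[column]).lower() != entry.get(field, "").lower()]


entry = parse_lock_entry("""
LIBRARY OBJECT LOCK: lock=c00000011a449e20 handle=c000000122e2a6d8 request=S
call pin=0000000000000000 session pin=0000000000000000
user=c000000109f02c68 session=c000000109f02c68 count=0 flags=[00] savepoint=463
""")
row_sid_30 = {"KGLLKADR": "C00000011A449E20", "KGLLKHDL": "C000000122E2A6D8",
              "KGLLKUSE": "C000000109F02C68", "KGLLKSES": "C000000109F02C68",
              "KGLLKSPN": 463}
print(mismatched_columns(row_sid_30, entry))  # [] when everything lines up
```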

Let's look at the complete information:
SQL> set linesize 2000
SQL> select * from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ                           
  2  /

ADDR                   INDX    INST_ID KGLLKADR         KGLLKUSE         KGLLKSES           KGLLKSNM KGLLKHDL         KGLLKPNC         KGLLKPNS       KGLLKCNT   KGLLKMOD   KGLLKREQ   KGLLKFLG   KGLLKSPN KGLLKHTB           KGLNAHSH KGLHDPAR           KGLHDNSP USER_NAME                  KGLNAOBJ
---------------- ---------- ---------- ---------------- ---------------- ---------------- ---------- ---------------- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------------- ---------- ---------------- ---------- ------------------------------ ------------------------------------------------------------
800003FB0007E4D0         33          2 C00000011A449E20 C000000109F02C68 C000000109F02C68         30 C000000122E2A6D8 00               00                    0          0          2          0        463 C00000011A4BC350 3990848181 C000000122E2A6D8          1 PUBUSER                            CSNOZ629926699966
800003FB0007E5B0         34          2 C00000011A44A150 C000000108C99E28 C000000108C99E28         37 C000000122E2A6D8 00               00                    1          3          0          0        179 C00000011A4BB328 3990848181 C000000122E2A6D8          1 PUBUSER                            CSNOZ629926699966

SQL> set linesize 100
SQL> l
  1* select * from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ
SQL> /

ADDR                   INDX    INST_ID KGLLKADR         KGLLKUSE         KGLLKSES           KGLLKSNM
---------------- ---------- ---------- ---------------- ---------------- ---------------- ----------
KGLLKHDL         KGLLKPNC         KGLLKPNS           KGLLKCNT   KGLLKMOD   KGLLKREQ   KGLLKFLG
---------------- ---------------- ---------------- ---------- ---------- ---------- ----------
  KGLLKSPN KGLLKHTB           KGLNAHSH KGLHDPAR           KGLHDNSP USER_NAME
---------- ---------------- ---------- ---------------- ---------- ------------------------------
KGLNAOBJ
------------------------------------------------------------
800003FB0007E4D0         33          2 C00000011A449E20 C000000109F02C68 C000000109F02C68         30
C000000122E2A6D8 00               00                        0          0          2          0
       463 C00000011A4BC350 3990848181 C000000122E2A6D8          1 PUBUSER
CSNOZ629926699966

800003FB0007E5B0         34          2 C00000011A44A150 C000000108C99E28 C000000108C99E28         37
C000000122E2A6D8 00               00                        1          3          0          0
       179 C00000011A4BB328 3990848181 C000000122E2A6D8          1 PUBUSER
CSNOZ629926699966


SQL> 

Original article: http://blog.csdn.net/lunar2000/article/details/247578



From the ITPUB blog, link: http://blog.itpub.net/29487349/viewspace-2128013/. Please credit the source when reposting; otherwise legal liability will be pursued.
