[20180301]模擬cursor pin S wait on X.txt

lfree發表於2018-03-01

[20180301]模擬cursor pin S wait on X等待事件.txt

--//昨天看連結http://www.askmaclean.com/archives/cursor-pin-s-wait-on-x.html,自己也按照例子重複演示看看.
--//出現'cursor: pin * events'等待事件主要原因就是子游標太多,或者硬解析太多,還有就是一些sql語句太複雜,
--//導致分析時間太長,這些都是我的理解.

--//連結http://www.askmaclean.com/archives/cursor-pin-s-wait-on-x.html給出一些可能的情況.

該類等待事件一般是為了pin相關的子游標

'Cursor: pin S on X' 最常見的等待事件,  程式為了共享操作例如執行pin遊標而以SHRD S mode申請mutex, 但是未立即獲得。原因
是該遊標被其他程式以EXCL X mode 持有了。

實際該 cursor: pin S wait on X等待事件往往是由於其他因素誘發的。Mutex爭用僅僅是問題的症狀,但根本原因需要Database
Consultant 進一步挖掘。

下面我們列出一些已知的常見案例, 在這些例子中可以看到 我上面提到的 Mutex的爭用僅僅是偽爭用:

過多的子游標 High Version Counts

過多的子游標版本Version Count可能導致Mutex 爭用,一般一個SQL的Version Count不要高於500。

檢查High Version Count很簡單, 在AWR裡就有SQL ordered by High Version Count,也可以寫SQL查V$SQL、V$SQLAREA

昂貴的X$、V$檢視查詢

一些對於V$、X$檢視的查詢,需要訪問X$KGL*之類的fixed table,可能觸發Mutex爭用。
--//我測試過幾個會話訪問v$sql檢視,出現的是library cache: mutex X,連結:http://blog.itpub.net/267265/viewspace-2142625/
Mutex持有者得不到CPU

Mutex持有者若得不到足夠的CPU片可能一直阻塞他人,直到它拿到需要的CPU。

這種情況可能由於OS作業系統的實際情況或者使用Resource Manager而引起。需要配合AWR中的Host CPU、Instance CPu一起看。

已經被KILLED的SESSION仍持有Mutex

當session正持有Mutex,而其對應的Process被強制KILL掉, 則直到PMON徹底清理掉該Dead Process並釋放Mutex,其他session才能不再
等待。  診斷該類問題,最好能檢查PMON的TRACE。 當然也存在部分BUG會導致PMON清理過程非常慢。

舉例來說,bug 9312879描述了一種場景:PMON 需要獲得某個Mutex以便清理某個dead process,但是該Mutex又被其他程式持有,則PMON
甚至無法開始真正清理並釋放Mutex。

詳見 《深入理解Oracle中的Mutex》一文

如何模擬 cursor pin S wait on X 等待事件

--//我自己也遇到一例:http://blog.itpub.net/267265/viewspace-2141699/,當時網路出現環路,導致後續呼叫相同語句hang住.

1.環境:
SCOTT@book> @ &r/ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

2.測試:
--//session 1:
SCOTT@book> @ &r/s
SCOTT@book(274,7)> @ &r/spid
       SID    SERIAL# PROCESS                  SERVER    SPID       PID  P_SERIAL# C50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
       274          7 36826                    DEDICATED 36827       21          4 alter system kill session '274,7' immediate;
--//程式號=36826.

SCOTT@book(274,7)> var b1 number;
SCOTT@book(274,7)> exec :b1 :=10;
PL/SQL procedure successfully completed.

SCOTT@book(274,7)> select * from dept where deptno=:b1;
    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK

3.使用gdb跟蹤session 1
$ gdb $(which oracle) 36827
...
(gdb) break kxsPeekBinds
Breakpoint 1 at 0x2123fa0
(gdb) command
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>bt 4
>end
(gdb) cont
Continuing.

--//說明:break kxsPeekBinds 表示呼叫kxsPeekBinds中斷,呼叫bt 4命令.
(gdb) help bt
Print backtrace of all stack frames, or innermost COUNT frames.
With a negative argument, print outermost -COUNT frames.
Use of the 'full' qualifier also prints the values of the local variables.
--//gdb不是很熟悉,按照提示:Print backtrace of all stack frames, or innermost COUNT frames.

--//session 1:
SCOTT@book(274,7)> select * from dept where deptno=:b1;
    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK

--//可以發現正常,因為我跟蹤的呼叫繫結變數探查的函式,我以前的語句已經執行1次,這樣第2次執行不會呼叫繫結變數窺視的函式.
--//修改select為Select ,這樣語句會硬解析同時出現呼叫繫結變數窺視的函式.

SCOTT@book(274,7)> Select * from dept where deptno=:b1;
--//掛起.

--//gdb介面出現如下:
Breakpoint 1, 0x0000000002123fa0 in kxsPeekBinds ()
#0  0x0000000002123fa0 in kxsPeekBinds ()
#1  0x00000000025b821c in opitca ()
#2  0x0000000001ecf7a9 in kksSetBindType ()
#3  0x0000000009641a89 in kksfbc ()

--//session 3:
SYS@book> @ &r/wait
P1RAW            P2RAW            P3RAW                    P1         P2         P3        SID    SERIAL#       SEQ# EVENT                                    STATUS   STATE               WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00               1650815232          1          0        274          7         58 SQL*Net message from client              ACTIVE   WAITED KNOWN TIME         120254317             182 Idle
--//並沒有出現cursor pin S wait on X

--//session 2:
SCOTT@book> @ &r/s
SCOTT@book(28,25)> @ &r/spid
       SID    SERIAL# PROCESS                  SERVER    SPID       PID  P_SERIAL# C50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
        28         25 37014                    DEDICATED 37015       26         11 alter system kill session '28,25' immediate;

SCOTT@book> var b1 number;
SCOTT@book> exec :b1 :=20;
PL/SQL procedure successfully completed.

SCOTT@book(28,25)> Select * from dept where deptno=:b1;

--//掛起!!

SCOTT@book> select * from empy where t1 =:b1;
---//hang...在t1 = 之間加入空格.

--//session 3:
SYS@book> set numw 12
SYS@book> @ &r/wait
P1RAW            P2RAW            P3RAW                        P1             P2             P3            SID        SERIAL#           SEQ# EVENT                                    STATUS   STATE               WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- -------------- -------------- -------------- -------------- -------------- -------------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00                   1650815232              1              0            274              7             58 SQL*Net message from client              ACTIVE   WAITED KNOWN TIME         120254317             437 Idle
00000000AB221382 0000011200000000 0000000500000000     2871137154  1176821039104    21474836480             28             25             42 cursor: pin S wait on X                  ACTIVE   WAITING                    54427696              54 Concurrency

SYS@book> select * from v$mutex_sleep;
MUTEX_TYPE                       LOCATION                                         SLEEPS      WAIT_TIME
-------------------------------- ---------------------------------------- -------------- --------------
Library Cache                    kgllkdl1  85                                          1              0
Cursor Pin                       kkslce [KKSCHLPIN2]                              132160              0

SYS@book> select * from v$mutex_sleep;
MUTEX_TYPE                       LOCATION                                         SLEEPS      WAIT_TIME
-------------------------------- ---------------------------------------- -------------- --------------
Cursor Pin                       kkslce [KKSCHLPIN2]                              144350              0

SYS@book> @ &r/mutex 5
old  18: ORDER BY sleeps DESC ) where rownum<= &1
new  18: ORDER BY sleeps DESC ) where rownum<= 5
          HASH         SLEEPS LOCATION             MUTEX_TYPE           SQLID         KGLNAOWN             C100
-------------- -------------- -------------------- -------------------- ------------- -------------------- -----------------------------------
    2871137154         339228 kkslce [KKSCHLPIN2]  Cursor Pin           g0vgm2upk44w2                      Select * from dept where deptno=:b1
--//停在LOCATION=kkslce,與前面gdb看到的一致.

--//做hanganalyze分析:
SYS@book> oradebug setmypid
Statement processed.
SYS@book> oradebug hanganalyze 3;
Hang Analysis in /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_32088.trc

--//在執行中,看等待事件如下:
SYS@book> @ &r/wait
P1RAW            P2RAW            P3RAW                        P1             P2             P3            SID        SERIAL#           SEQ# EVENT                                    STATUS   STATE               WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- -------------- -------------- -------------- -------------- -------------- -------------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00                   1650815232              1              0            274              7             58 SQL*Net message from client              ACTIVE   WAITED KNOWN TIME         120254317             656 Idle
00000000AB221382 0000011200000000 0000000500000000     2871137154  1176821039104    21474836480             28             25             45 cursor: pin S wait on X                  ACTIVE   WAITING                    57412911              57 Concurrency
00               00               00                            0              0              0             94             25             21 ksdxexeotherwait                         INACTIVE WAITING                    10567976              11 Other

--//檢查跟蹤檔案:
*** 2018-03-01 10:22:19.845
===============================================================================
HANG ANALYSIS:
  instances (db_name.oracle_sid): book.book
  oradebug_node_dump_level: 3
  analysis initiated by oradebug
  os thread scheduling delay history: (sampling every 1.000000 secs)
    0.000000 secs at [ 10:22:19 ]
      NOTE: scheduling delay has not been sampled for 0.231213 secs    0.000000 secs from [ 10:22:15 - 10:22:20 ], 5 sec avg
    0.000000 secs from [ 10:21:20 - 10:22:20 ], 1 min avg
    0.000000 secs from [ 10:17:20 - 10:22:20 ], 5 min avg
  vktm time drift history
===============================================================================

Chains most likely to have caused the hang:
[a] Chain 1 Signature: <not in a wait><='cursor: pin S wait on X'
     Chain 1 Signature Hash: 0x3a7b30c
[b] Chain 2 Signature: 'LNS ASYNC end of log'
     Chain 2 Signature Hash: 0x8ceed34f

===============================================================================
Non-intersecting chains:

-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (book.book)
                   os id: 37015
              process id: 26, oracle@gxqyydg4 (TNS V1-V3)
              session id: 28
        session serial #: 25
    }
    is waiting for 'cursor: pin S wait on X' with wait info:
    {
                      p1: 'idn'=0xab221382
                      p2: 'value'=0x11200000000
                      p3: 'where'=0x500000000
            time in wait: 46.772297 sec
      heur. time in wait: 5 min 34 sec
           timeout after: never
                 wait id: 44
                blocking: 0 sessions
             current sql: Select * from dept where deptno=:b1
             short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-__select()+19<-skgpwwait()+332<-kgxWait()+774<-kgxSharedExamine()+568<-kxsGetRuntimeLock()+240<-kkscsCheckCursor()+556<-kkscsSearchChildList()+1171<-kksfbc()+12417<-kksp
            wait history:
              * time between current wait and wait #1: 0.000021 sec
              1.       event: 'cursor: pin S wait on X'
                 time waited: 1 min 12 sec
                     wait id: 43              p1: 'idn'=0xab221382
                                              p2: 'value'=0x11200000000
                                              p3: 'where'=0x500000000
              * time between wait #1 and #2: 0.000021 sec
              2.       event: 'cursor: pin S wait on X'
                 time waited: 1 min 11 sec
                     wait id: 42              p1: 'idn'=0xab221382
                                              p2: 'value'=0x11200000000
                                              p3: 'where'=0x500000000
              * time between wait #2 and #3: 0.000020 sec
              3.       event: 'cursor: pin S wait on X'
                 time waited: 1 min 11 sec
                     wait id: 41              p1: 'idn'=0xab221382
                                              p2: 'value'=0x11200000000
                                              p3: 'where'=0x500000000
    }
    and is blocked by
=> Oracle session identified by:
    {
                instance: 1 (book.book)
                   os id: 36827
              process id: 21, oracle@gxqyydg4 (TNS V1-V3)
              session id: 274
        session serial #: 7
    }
    which is not in a wait:
    {
               last wait: 8 min 44 sec ago
                blocking: 1 session
             current sql: Select * from dept where deptno=:b1
             short stack: <none: error encountered - ORA-32516: cannot wait for process 'Unix process pid: 36827, image: oracle@gxqyydg4 (TNS V1-V3)' to finish executing ORADEBUG command 'SHORT_STACK'; wait time exceeds 30000 ms>
            wait history:
              1.       event: 'SQL*Net message from client'
                 time waited: 2 min 0 sec
                     wait id: 57              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000101 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000004 sec
                     wait id: 56              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000384 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 2 min 1 sec
                     wait id: 55              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
    }

Chain 1 Signature: <not in a wait><='cursor: pin S wait on X'
Chain 1 Signature Hash: 0x3a7b30c
-------------------------------------------------------------------------------
...//省略
===============================================================================
Extra information that will be dumped at higher levels:
[level  4] :   1 node dumps -- [LEAF] [LEAF_NW]
[level  5] :   2 node dumps -- [NO_WAIT] [INVOL_WT] [SINGLE_NODE] [NLEAF] [SINGLE_NODE_NW]

State of ALL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[27]/1/28/25/0x85f9f0a0/37015/NLEAF/[273]
[273]/1/274/7/0x8595c500/36827/LEAF_NW/
[299]/1/300/15/0x85984060/36369/SINGLE_NODE/

*** 2018-03-01 10:22:19.848
===============================================================================
END OF HANG ANALYSIS
===============================================================================

*** 2018-03-01 10:22:19.849
===============================================================================
HANG ANALYSIS DUMPS:
  oradebug_node_dump_level: 3
===============================================================================

State of LOCAL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[27]/1/28/25/0x85f9f0a0/37015/NLEAF/[273]
[273]/1/274/7/0x8595c500/36827/LEAF_NW/
[299]/1/300/15/0x85984060/36369/SINGLE_NODE/


No processes qualify for dumping.

===============================================================================
HANG ANALYSIS DUMPS: END
===============================================================================

*** 2018-03-01 10:22:19.849
Oradebug command 'hanganalyze 3' console output:
Hang Analysis in /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_37100.trc

--//對比等待事件,可以對上:
SYS@book> @ &r/wait
P1RAW            P2RAW            P3RAW                        P1             P2             P3            SID        SERIAL#           SEQ# EVENT                                    STATUS   STATE               WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- -------------- -------------- -------------- -------------- -------------- -------------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00                   1650815232              1              0            274              7             58 SQL*Net message from client              ACTIVE   WAITED KNOWN TIME         120254317             949 Idle
00000000AB221382 0000011200000000 0000000500000000     2871137154  1176821039104    21474836480             28             25             49 cursor: pin S wait on X                  ACTIVE   WAITING                    63781034              64 Concurrency

---sid,serial#= 274,7 28,25,

SYS@book> select spid from v$process where addr in (select paddr from v$SESSION where sid in (274,28));
SPID
------
36827
37015
--//程式號也能對上.

--//轉載:http://www.oracleblog.org/working-case/deal-with-row-cache-lock/
--//我們來看看([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor)部分:
nodenum是hanganalyze自己為了記錄這些會話而定製的編號,從0開始排起。
cnode是Node Id
sid是 Session ID
sess_srno是serial#
proc_ptr是Process Pointer
ospid 是OS Process ID
state 是node的狀態
adjlist是臨近的node(通常代表一個blocker node)
predecessor是Predecessor node ,通常代表一個 waiter node

state部分有幾個比較重要的狀態:
1.IN_HANG:這表示該node處於死鎖狀態,通常還有其他node(blocker)也處於該狀態
2.LEAF/LEAF_NW:該node通常是blocker。透過條目的"predecessor"列可以判斷這個node是否是blocker。LEAF說明該NODE沒有等待其他
  資源,而LEAF_NW則可能是沒有等待其他資源或者是在使用CPU.
3.NLEAF:通常可以看作這些會話是被阻塞的資源。發生這種情況一般說明資料庫發生效能問題而不是資料庫hang
4.IGN/IGN_DMP:這類會話通常被認為是空閒會話,除非其adjlist列裡存在node。如果是非空閒會話則說明其adjlist裡的node正在等待其他node釋放資源。
5.SINGLE_NODE/SINGLE_NODE_NW:近似於空閒會話

--//也可以執行如下,連結:http://blog.itpub.net/267265/viewspace-2141699/
/* Formatted on 2018/3/1 10:28:02 (QP5 v5.252.13127.32867) */
SELECT s.inst_id AS inst
      ,s.sid AS blocked_sid
      ,s.username AS blocked_user
      ,sa.sql_id AS blocked_sql_id
      ,TRUNC (s.p2 / 4294967296) AS blocking_sid
      ,b.username AS blocking_user
      ,b.sql_id AS blocking_sql_id
  FROM gv$session s
       JOIN gv$sqlarea sa ON sa.hash_value = s.p1
       JOIN gv$session b
          ON TRUNC (s.p2 / 4294967296) = b.sid AND s.inst_id = b.inst_id
       JOIN gv$sqlarea sa2 ON b.sql_id = sa2.sql_id
WHERE s.event = 'cursor: pin S wait on X';

          INST    BLOCKED_SID BLOCKED_USER                   BLOCKED_SQL_I   BLOCKING_SID BLOCKING_USER                  BLOCKING_SQL_
-------------- -------------- ------------------------------ ------------- -------------- ------------------------------ -------------
             1             28 SCOTT                          g0vgm2upk44w2            274 SCOTT                          g0vgm2upk44w2
--//sid=274阻塞了sid=28.

--//在gdb介面選擇繼續:
(gdb) cont
Continuing.

Program received signal SIGUSR2, User defined signal 2.
0x0000000002123fa0 in kxsPeekBinds ()
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000009805bd5 in slaac_int ()
(gdb) cont
Continuing.

Breakpoint 1, 0x0000000002123fa0 in kxsPeekBinds ()
#0  0x0000000002123fa0 in kxsPeekBinds ()
#1  0x00000000025b821c in opitca ()
#2  0x0000000001ecf7a9 in kksSetBindType ()
#3  0x0000000009641a89 in kksfbc ()
(gdb) cont
Continuing.

--//知道session 1執行完成,在看session 2,執行也ok.
--//session 1:
SCOTT@book(274,7)> Select * from dept where deptno=:b1;
    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
--//session 2:
SCOTT@book(28,25)> Select * from dept where deptno=:b1;
  =  DEPTNO DNAME          LOC
---------- -------------- -------------
        20 RESEARCH       DALLAS
--//session 3:
SYS@book> @ &r/wait
no rows selected

4.補充說明:
cursor: pin S wait on X 的p1,p2,p3引數:
--//參考%e7%ad%89%e5%be%85%e4%ba%8b%e4%bb%b6/

P1=2871137154
P2=1176821039104
P3=21474836480

--//idn 透過以前的測試表示hash_value
--//PARAMETER2引數:
--//它的前一半(高位):就是要加Mutex的會話的SID,這裡也就是274號會話。274號會話要申請Mutex。後一半(低位)是"引用計數"或模式。
SYS@book> @ &r/10to16 1176821039104
10 to 16 HEX   REVERSE16
-------------- -----------------------------------
0011200000000 0x00000000-12010000

--//0x112 = 274,就是session 1.

SYS@book> select 1176821039104,32,trunc(1176821039104/power(2,32)) ,mod(1176821039104,power(2,32))   from dual;
1176821039104             32 TRUNC(1176821039104/POWER(2,32)) MOD(1176821039104,POWER(2,32))
-------------- -------------- -------------------------------- ------------------------------
1176821039104             32                              274                              0

SYS@book> select sql_id,sql_text from v$sql where hash_value=2871137154 ;
SQL_ID        SQL_TEXT
------------- ------------------------------------------------------------
g0vgm2upk44w2 Select * from dept where deptno=:b1

5.測試指令碼:
--//附上mutex.sql指令碼:

# cat mutex.sql
column kglnaown format a20
column MUTEX_TYPE format a20
column kglnaobj format a100
column LOCATION format a20
  select * from (
  SELECT kglnahsh hash
        ,SUM (sleeps) sleeps
        ,location
        ,mutex_type
        ,kglobt03 sqlid
        ,kglnaown
        ,replace(kglnaobj,chr(13)) c100
    --,SUBSTR (kglnaobj, 1, 140) object
    FROM x$kglob, v$mutex_sleep_history
   WHERE kglnahsh = mutex_identifier
GROUP BY kglnaobj
        ,kglobt03
        ,kglnaown
        ,kglnahsh
        ,location
        ,mutex_type
ORDER BY sleeps DESC ) where rownum<= &1;

# cat wait.sql
select p1raw,p2raw,p3raw,p1,p2,p3,sid,serial#,seq#,event,status,state,wait_time_micro,seconds_in_wait,wait_class
from v$session where ( wait_class<>'Idle' or (status='ACTIVE' and STATE='WAITED KNOWN TIME'))
and sid not in (select sid from v$mystat where rownum=1)
order by event ;

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2151395/,如需轉載,請註明出處,否則將追究法律責任。

相關文章