[20170707]cursor: pin S wait on X(10G)

lfree發表於2017-07-07

[20170707]cursor: pin S wait on X(10G).txt

--生產系統遇到1個bug,版本:
EXAM@xxx> @ &r/ver1
PORT_STRING         VERSION    BANNER
------------------- ---------- ----------------------------------------------------------------
x86_64/Linux 2.4.xx 10.2.0.4.0 Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

--//我今天才介入解決問題,就是呼叫儲存過程某個update語句時,應用hang住了.我看等待事件cursor: pin S wait on X很容易聯想到
--//bug.

http://www.dba-oracle.com/t_cursor_pin_wait_on_x.htm

Question:  My AWR report shows "cursor: pin S wait on X" responsible for over 90% of my database processing time. How do
I fix the "cursor: pin S wait on X" event?

Answer:  The "cursor: pin S wait on X" is related to the new "mutex" option.  Oracle 10g release 2 and beyond replaced
some latch mechanisms with the mutex approach, claiming that they are faster and more efficient than traditional locking
mechanisms.  To improve cursor execution speed and hard pare time within the library cache, mutexes replace library
cache latches and library cache pins. Oracle claims that mutexes are faster and use less CPU, which is important for
CPU-bound database where large data buffers remove I/O as a primary source of contention.

Beware, Oracle bug 5184776 can affect 11g databases that deploy the mutex latching, causing excessive 'cursor pin s wait
on x' wait events.

If you suspect that you have a bug, you can revert to the pre-mutex method with the consent of Oracle Support using
these commands that bounce your database into the older latching mode, which may result in less deadlocks:

alter system set "_kks_use_mutex_pin"=false scope=spfile;
shutdown immediate
startup

NOTE:  This hidden parameter should only be used with the consent of Oracle support, and this parameter should NOT ever
be set in Oracle 12c

The cursor pin S wait on X wait event is mostly related to mutex and hard parse and it happens when a session waits for
this event when it is requesting a shared mutex pin and another session is holding an exclusive mutex pin on the same
cursor object. 

These are the parameters associated with this mutex wait event:

P1 Hash value of cursor
P2 Mutex value (top 2 bytes contains SID holding mutex in exclusive mode, and bottom two bytes usually hold the value 0)
P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps 

--// 以下根據作者的執行的指令碼分析:
SELECT s.inst_id AS inst
      ,s.sid AS blocked_sid
      ,s.username AS blocked_user
      ,sa.sql_id AS blocked_sql_id
      ,TRUNC (s.p2 / 4294967296) AS blocking_sid
      ,b.username AS blocking_user
      ,b.sql_id AS blocking_sql_id
  FROM gv$session s
       JOIN gv$sqlarea sa ON sa.hash_value = s.p1
       JOIN gv$session b
          ON TRUNC (s.p2 / 4294967296) = b.sid AND s.inst_id = b.inst_id
       JOIN gv$sqlarea sa2 ON b.sql_id = sa2.sql_id
WHERE s.event = 'cursor: pin S wait on X';

Record View
As of: 2017/7/7 10:30:03

INST:             1
BLOCKED_SID:      1577
BLOCKED_USER:     EXAM
BLOCKED_SQL_ID:   guv6x80vk33wb
BLOCKING_SID:     1579
BLOCKING_USER:    EXAM
BLOCKING_SQL_ID:  guv6x80vk33wb

--//sid=1579 阻塞了 sid=1577.

select sql_id,loaded_versions,executions,loads,invalidations,parse_calls
from gv$sql
where inst_id=1 and sql_id='guv6x80vk33wb';

SQL_ID        LOADED_VERSIONS EXECUTIONS      LOADS INVALIDATIONS PARSE_CALLS
------------- --------------- ---------- ---------- ------------- -----------
guv6x80vk33wb               0          0          1             1           0

--//注:我前面做了alter system flush shared_pool;,這裡看到的EXECUTIONS次數是0.
--//另外我在重新整理前執行kill -9 os程式(當前正在執行這條sql語句的會話),我當時以為這樣問題就可以解決.指令碼如下:
select 'kill -9 '||spid,a.*  from V$PROCESS a where  addr in (
  SELECT paddr
    FROM v$session
   WHERE wait_class <> 'Idle' AND sql_id = 'guv6x80vk33wb'
);

--//檢視sid=1579的情況:
select * from v$session where sid=1579;
Record View
As of: 2017/7/7 10:32:19
SADDR:                      000000007C90B688
SID:                        1579
SERIAL#:                    34307
AUDSID:                     54134981
PADDR:                      000000007D545808
USER#:                      58
USERNAME:                   EXAM
COMMAND:                    6
OWNERID:                    2147483644
TADDR:                     
LOCKWAIT:                  
STATUS:                     KILLED
SERVER:                     DEDICATED
SCHEMA#:                    58
SCHEMANAME:                 EXAM
OSUSER:                     SYSTEM
PROCESS:                    3324:10116
MACHINE:                    WORKGROUP\BROKER
TERMINAL:                   BROKER
PROGRAM:                    Kodak.GCGateway.Adapter.Service.exe
TYPE:                       USER
SQL_ADDRESS:                00000000733A0DF8
SQL_HASH_VALUE:             924946315
SQL_ID:                     guv6x80vk33wb
SQL_CHILD_NUMBER:           3
PREV_SQL_ADDR:              0000000074233B68
PREV_HASH_VALUE:            152532753
PREV_SQL_ID:                7wp2a204jfxsj
PREV_CHILD_NUMBER:          5
PLSQL_ENTRY_OBJECT_ID:      116247
PLSQL_ENTRY_SUBPROGRAM_ID:  1
PLSQL_OBJECT_ID:           
PLSQL_SUBPROGRAM_ID:       
MODULE:                     Kodak.GCGateway.Adapter.Service.exe
MODULE_HASH:                163112698
ACTION:                    
ACTION_HASH:                0
CLIENT_INFO:                192.168.29.13
FIXED_TABLE_SEQUENCE:       183286255
ROW_WAIT_OBJ#:              -1
ROW_WAIT_FILE#:             0
ROW_WAIT_BLOCK#:            0
ROW_WAIT_ROW#:              0
LOGON_TIME:                 2017/6/28 13:03:40
LAST_CALL_ET:               768474
PDML_ENABLED:               NO
FAILOVER_TYPE:              NONE
FAILOVER_METHOD:            NONE
FAILED_OVER:                NO
RESOURCE_CONSUMER_GROUP:   
PDML_STATUS:                DISABLED
PDDL_STATUS:                ENABLED
PQ_STATUS:                  ENABLED
CURRENT_QUEUE_DURATION:     0
CLIENT_IDENTIFIER:         
BLOCKING_SESSION_STATUS:    UNKNOWN
BLOCKING_INSTANCE:         
BLOCKING_SESSION:          
SEQ#:                       31
EVENT#:                     254
EVENT:                      single-task message
P1TEXT:                    
P1:                         0
P1RAW:                      00
P2TEXT:                    
P2:                         0
P2RAW:                      00
P3TEXT:                    
P3:                         0
P3RAW:                      00
WAIT_CLASS_ID:              2723168908
WAIT_CLASS#:                6
WAIT_CLASS:                 Idle
WAIT_TIME:                  0
SECONDS_IN_WAIT:            768474
STATE:                      WAITING
SERVICE_NAME:               mid
SQL_TRACE:                  DISABLED
SQL_TRACE_WAITS:            FALSE
SQL_TRACE_BINDS:            FALSE

--//昏,這個漏網了,因為WAIT_CLASS=Idle,而且很奇怪STATUS=KILLED.如果前面執行條件沒有wait_class <> 'Idle',估計問題早解決了.
--//另外從時間看2017/6/28 13:03:40,正好也是網路出現環路的時間,正好也可以相互印證.
--//再次說明做事情要認真小心....

SELECT * FROM V$PROCESS WHERE addr IN (SELECT paddr FROM v$session WHERE sid = 1579);

Record View
As of: 2017/7/7 10:34:31

ADDR:              000000007D545808
PID:               20
SPID:              6086
USERNAME:          oracle
SERIAL#:           73
TERMINAL:          UNKNOWN
PROGRAM:           oracle@qyymid
TRACEID:          
BACKGROUND:       
LATCHWAIT:        
LATCHSPIN:        
PGA_USED_MEM:      556053
PGA_ALLOC_MEM:     779293
PGA_FREEABLE_MEM:  0
PGA_MAX_MEM:       779293

--//刪除程式號6086.
$ kill -9 6086

--//再測試ok,問題消失.順便貼分析時的awr報表的前面部分:

WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Release     RAC Host       
------------ ----------- ------------ -------- ----------- --- ------------
MID           2812601929 mid                 1 10.2.0.4.0  NO  qyymid     

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     44914 07-Jul-17 09:00:11        50       3.8
  End Snap:     44915 07-Jul-17 09:29:29        57       4.7
   Elapsed:               29.30 (mins)
   DB Time:                1.46 (mins)

--//很奇怪這個版本看到DB Time很小,僅僅1.46 (mins)

Cache Sizes
~~~~~~~~~~~                       Begin        End
                             ---------- ----------
               Buffer Cache:       136M       136M  Std Block Size:         8K
           Shared Pool Size:       320M       320M      Log Buffer:     6,152K

--//不是我配的(資料庫執行在虛擬機器上的),明顯Buffer Cache小了.不過這個伺服器做的事情很少.

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:              1,698.35             16,406.59
              Logical reads:                477.12              4,609.12
              Block changes:                 22.35                215.91
             Physical reads:                  0.30                  2.87
            Physical writes:                  0.66                  6.38
                 User calls:                 13.05                126.08
                     Parses:                102.20                987.24
                Hard parses:                  0.27                  2.58
                      Sorts:                178.27              1,722.16
                     Logons:                  2.18                 21.09
                   Executes:                103.94              1,004.12
               Transactions:                  0.10

  % Blocks changed per Read:    4.68    Recursive Call %:    98.56
Rollback per transaction %:    4.40       Rows per Sort:     1.86

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:  100.00
            Buffer  Hit   %:   99.96    In-memory Sort %:  100.00
            Library Hit   %:   99.61        Soft Parse %:   99.74
         Execute to Parse %:    1.68         Latch Hit %:  100.00
Parse CPU to Parse Elapsd %:   30.64     % Non-Parse CPU:   85.20

Shared Pool Statistics        Begin    End
                              ------  ------
             Memory Usage %:   93.17   91.29
    % SQL with executions>1:   82.12   79.33
  % Memory for SQL w/exec>1:   83.60   80.99

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
cursor: pin S wait on X           1,786,725      19,351     11 ###### Concurrenc
--//很奇怪,這個版本為什麼不把cursor: pin S wait on X計入db time時間.
CPU time                                             56          63.8          
log file sync                           356          24     66   26.9     Commit
control file parallel write             582          21     36   23.7 System I/O
log file parallel write                 749          20     27   22.6 System I/O
          -------------------------------------------------------------

--//查詢有問題sql_id=guv6x80vk33wb,也無法在awr報表中體現出來.

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2141699/,如需轉載,請註明出處,否則將追究法律責任。

相關文章