解決RAC節點因cursor: pin S wait on X無法登陸案例一則

myownstars發表於2012-05-03

OS: Sunos 5.9
DB:兩節點RAC 10.2.0.2
症狀:開發人員報告無法連線第二個節點
 

首先登陸第二個節點,嘗試以sysdba登陸,果然一直hang在那裡
justin19.prod2> sqlplus / as sysdba
 

SQL*Plus: Release 10.2.0.2.0 - Production on Thu May 3 11:18:32 2012
 

Copyright (c) 1982, 2005, Oracle.  All Rights Reserved.
 

登陸第一個節點,檢視gv$session
 
SQL> select  event,count(*) from gv$session where INST_ID=2 group by event order by 2;

EVENT                                                              COUNT(*)
---------------------------------------------------------------- ----------
DIAG idle wait                                                            1
PX Deq: reap credit                                                       1
Streams AQ: waiting for time management or cleanup tasks                  1
pmon timer                                                                1
ges remote message                                                        1
smon timer                                                                1
Streams AQ: qmn coordinator idle wait                                     1
Streams AQ: qmn slave idle wait                                           1
gcs remote message                                                        2
jobq slave wait                                                           2
rdbms ipc message                                                        13
SQL*Net message from client                                              24
cursor: pin S wait on X                                                  38

13 rows selected.

發現第二個節點正在經歷嚴重的cursor: pin S wait on X,
Dedicated模式下使用者連線資料庫的時候需要一系列遞迴呼叫,倘若系統此時正在經歷嚴重的latch contention,則會一直hang在那裡。
現在初步懷疑由於cursor: pin S wait on X導致新的資料庫連線無法完成authentication所致
重新回到第二個節點進行hanganglyze分析
justin19.prod2> sqlplus -prelim / as sysdba
 

SQL*Plus: Release 10.2.0.2.0 - Production on Thu May 3 11:03:32 2012
 

Copyright (c) 1982, 2005, Oracle.  All Rights Reserved.
 

SQL> oradebug setmypid
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /oralog/prod/dump/prod2_ora_2258.trc
 

==============
HANG ANALYSIS:
==============
Open chains found:
Other chains found:
Chain 1 : :
    <1/259/531/0xb14b86a0/3794/cursor: pin S wait on X>
Chain 2 : :
    <1/261/188/0xb14b7eb8/21153/cursor: pin S wait on X>
Chain 3 : :
    <1/263/17/0xb14b76d0/6030/cursor: pin S wait on X>
Chain 4 : :
    <1/265/9/0xb14b6700/2878/cursor: pin S wait on X>
Chain 5 : :
    <1/267/735/0xb14b6ee8/3981/cursor: pin S wait on X>
Chain 6 : :
    <1/269/25/0xb14b5f18/14917/cursor: pin S wait on X>
Chain 7 : :
    <1/271/447/0xb14b5730/14151/cursor: pin S wait on X>
Chain 8 : :
    <1/273/36/0xb14b4f48/7907/cursor: pin S wait on X>
Chain 9 : :
    <1/275/16/0xb14b4760/5699/cursor: pin S wait on X>
Chain 10 : :
    <1/277/1916/0xb14b3790/740/cursor: pin S wait on X>
Chain 11 : :
    <1/280/11476/0xb14b3f78/3879/cursor: pin S wait on X>
Chain 12 : :
    <1/282/1130/0xb14b27c0/17264/cursor: pin S wait on X>
Chain 13 : :
    <1/283/7025/0xb14b2fa8/28329/cursor: pin S wait on X>
Chain 14 : :
    <1/285/568/0xb14b1fd8/2203/cursor: pin S wait on X>
Chain 15 : :
    <1/287/42/0xb14b17f0/16978/cursor: pin S wait on X>
Chain 16 : :
    <1/289/47/0xb14b0820/14427/cursor: pin S wait on X>
Chain 17 :
 

跟蹤檔案顯示該instance正在經歷嚴重的cursor: pin S wait on X,和上一步的檢查結果吻合。
隨便挑兩個process驗證一下
SQL> oradebug setospid 3981
Oracle pid: 84, Unix process pid: 3981, image:
SQL> oradebug dump errorstack 3
Statement processed.
SQL> oradebug tracefile_name
/oralog/prod/dump/prod2_ora_3981.trc
SQL> oradebug setospid 14917
Oracle pid: 82, Unix process pid: 14917, image:
SQL> oradebug dump errorstack 3
Statement processed.
SQL> oradebug tracefile_name
/oralog/prod/dump/prod2_ora_14917.trc
 

 

/oralog/prod/dump/prod2_ora_3981.trc
=====================
PARSING IN CURSOR #1 len=564 dep=1 uid=0 ct=2 lid=0 tim=5636868599224 hv=1232072917 ad='ab10b7e8'
insert into sys.aud$( sessionid,entryid,statement,ntimestamp#, userid,userhost,terminal,action#,returncode, obj$creator,obj$name,auth$privileges,auth$grantee, new$owner,new$name
,ses$actions,ses$tid,logoff$pread, logoff$lwrite,logoff$dead,comment$text,spare1,spare2,  priv$used,clientid,sessioncpu,proxy$sid,user$guid, instance#,process#,xid,scn,auditid,s
qlbind,sqltext) values(:1,:2,:3,SYS_EXTRACT_UTC(SYSTIMESTAMP),     :4,:5,:6,:7,:8,     :9,:10,:11,:12,     :13,:14,:15,:16,:17,     :18,:19,:20,:21,:22,     :23,:24,:25,:26,:27,
     :28,:29,:30,:31,:32,:33,:34)
END OF STMT
PARSE #1:c=0,e=69,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=5636868599220
EXEC #1:c=10000,e=1843,p=0,cr=1,cu=4,mis=0,r=1,dep=1,og=4,tim=5636868604632
=====================
PARSING IN CURSOR #4 len=488 dep=1 uid=5 ct=42 lid=5 tim=5636868608097 hv=888628443 ad='a2b62b78'
ALTER SESSION SET NLS_LANGUAGE= 'AMERICAN' NLS_TERRITORY= 'AMERICA' NLS_CURRENCY= '$' NLS_ISO_CURRENCY= 'AMERICA' NLS_NUMERIC_CHARACTERS= '.,' NLS_CALENDAR= 'GREGORIAN' NLS_DATE_FORMAT= 'DD-MON-RR' NLS_DATE_LANGUAGE= 'AMERICAN' NLS_SORT= 'BINARY' TIME_ZONE= '+08:00' NLS_COMP= 'BINARY' NLS_DUAL_CURRENCY= '$' NLS_TIME_FORMAT= 'HH.MI.SSXFF AM' NLS_TIMESTAMP_FORMAT= 'DD-MON-RR HH.MI.SSXFF AM' NLS_TIME_TZ_FORMAT= 'HH.MI.SSXFF AM TZR' NLS_TIMESTAMP_TZ_FORMAT= 'DD-MON-RR HH.MI.SSXFF AM TZR'
END OF STMT
PARSE #4:c=0,e=3017,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=5636868608093
EXEC #4:c=0,e=329,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=5636868608512
*** 2012-05-03 11:04:21.573
Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 2258, image:
*** 2012-05-03 11:04:21.575
ksedmp: internal or fatal error
No current SQL statement being executed.
----- Call Stack Trace -----
calling              call     entry                argument values in hex    
location             type     point                (? means dubious value)   
-------------------- -------- -------------------- ----------------------------
ksedmp()+716         CALL     ksedst()             FFFFFFFF7FFEFCC0 ?
                                                   000000000 ? 0FFFFFFFF ?
                                                   FFFFFFFF7FFEEE68 ?
                                                   FFFFFFFF7FFF05C0 ?
                                                   000000008 ?
ksdxfdmp()+2224      PTR_CALL 0000000000000000     000000002 ? 10638A8CC ?
                                                   000000001 ? 000000000 ?
                                                   10638A000 ? 10638A8CC ?
ksdxcb()+2452        PTR_CALL 0000000000000000     000105400 ? 100E55E00 ?
                                                   106385298 ?
                                                   FFFFFFFF7FFF45A8 ?
                                                   1055F0750 ? 100E55E00 ?
 

/oralog/prod/dump/prod2_ora_14917.trc
PARSING IN CURSOR #1 len=564 dep=1 uid=0 ct=2 lid=0 tim=5635354056881 hv=1232072917 ad='ab10b7e8'
insert into sys.aud$( sessionid,entryid,statement,ntimestamp#, userid,userhost,terminal,action#,returncode, obj$creator,obj$name,auth$privileges,auth$grantee, new$owner,new$name
,ses$actions,ses$tid,logoff$pread, logoff$lwrite,logoff$dead,comment$text,spare1,spare2,  priv$used,clientid,sessioncpu,proxy$sid,user$guid, instance#,process#,xid,scn,auditid,s
qlbind,sqltext) values(:1,:2,:3,SYS_EXTRACT_UTC(SYSTIMESTAMP),     :4,:5,:6,:7,:8,     :9,:10,:11,:12,     :13,:14,:15,:16,:17,     :18,:19,:20,:21,:22,     :23,:24,:25,:26,:27,
     :28,:29,:30,:31,:32,:33,:34)
END OF STMT
PARSE #1:c=0,e=51,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=5635354056877
EXEC #1:c=0,e=17196,p=1,cr=2,cu=6,mis=0,r=1,dep=1,og=4,tim=5635354077836
=====================
PARSING IN CURSOR #4 len=83 dep=1 uid=37 ct=42 lid=37 tim=5635354078496 hv=3678277859 ad='aa71c630'
ALTER SESSION SET TIME_ZONE='+0:00' NLS_LANGUAGE='AMERICAN' NLS_TERRITORY='AMERICA'
END OF STMT
PARSE #4:c=0,e=142,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=5635354078488
EXEC #4:c=0,e=258,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=5635354078866
*** 2012-05-03 11:05:15.100
Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 2258, image:
*** 2012-05-03 11:05:15.102
ksedmp: internal or fatal error
No current SQL statement being executed.
----- Call Stack Trace -----
calling              call     entry                argument values in hex    
location             type     point                (? means dubious value)   
-------------------- -------- -------------------- ----------------------------
ksedmp()+716         CALL     ksedst()             FFFFFFFF7FFEFCC0 ?
                                                   000000000 ? 0FFFFFFFF ?
                                                   FFFFFFFF7FFEEE68 ?
                                                   FFFFFFFF7FFF05C0 ?
                                                   000000008 ?
ksdxfdmp()+2224      PTR_CALL 0000000000000000     000000002 ? 10638A8CC ?
                                                   000000001 ? 000000000 ?
                                                   10638A000 ? 10638A8CC ?
ksdxcb()+2452        PTR_CALL 0000000000000000     000105400 ? 100E55E00 ?
                                                   106385298 ?
                                                   FFFFFFFF7FFF45A8 ?
                                                   1055F0750 ? 100E55E00 ?
 進一步驗證了先前的推測
 

問題是現在該怎麼辦,總共有37個session在等待cursor: pin S wait on X,難道要全部kill?
應該沒這麼麻煩,只需找到最早登陸的session,它應該是一系列latch等待的blocker
--節點1
SQL> select sid,serial#,LOGON_TIME  from gv$session where INST_ID=2 and event='cursor: pin S wait on X' order by 3;
 

       SID    SERIAL# LOGON_TIME
---------- ---------- -------------------
       333      22715 2012-04-26 04:44:07
       327      24054 2012-04-26 04:49:34
       330      38061 2012-04-26 05:02:38
       324         53 2012-04-26 05:11:26
       323         94 2012-04-26 05:15:00
       321        338 2012-04-26 05:29:31
       318       1047 2012-04-26 06:43:49
       315       1160 2012-04-26 10:09:18
       316       2792 2012-04-26 10:41:45
       311       5712 2012-04-26 23:55:39
       309       1578 2012-04-27 03:06:40
       305      13682 2012-04-29 14:28:03
       313      34990 2012-04-30 09:23:34
       307      40237 2012-04-30 09:54:48
       301        924 2012-04-30 13:35:33
       295      13199 2012-05-02 05:11:36
       298      11061 2012-05-02 10:03:51
       294        596 2012-05-02 11:02:06
       303       5906 2012-05-02 11:09:54
       293        148 2012-05-02 11:16:02
       291        162 2012-05-02 11:23:27
       289         47 2012-05-02 11:25:20
       287         42 2012-05-02 11:26:44
       285        568 2012-05-02 11:50:23
       282       1130 2012-05-02 12:47:26
       283       7025 2012-05-03 04:43:21
       277       1916 2012-05-03 09:12:35
       280      11476 2012-05-03 09:14:18
       275         16 2012-05-03 09:15:32
       273         36 2012-05-03 09:16:28
       271        447 2012-05-03 09:35:40
       269         25 2012-05-03 09:36:00
       265          9 2012-05-03 10:01:15
       267        735 2012-05-03 10:01:51
       263         17 2012-05-03 10:02:55
       261        188 2012-05-03 10:10:48
       259        531 2012-05-03 10:33:02
       256        449 2012-05-03 11:18:32
--節點2
justin19.prod2> ps -ef | grep 1021
  ora102 17697 28975  0 11:28:13 pts/10   0:00 more /oralog/prod/dump/prod2_ora_10212.trc
  ora102 21136 11932  0 11:30:04 pts/12   0:00 grep 1021
  ora102 10212     1  0   Apr 26 ?        0:03 oracleprod2 (LOCAL=NO)
justin19.prod2> kill -9 10212
 

過了幾秒鐘,所有cursor: pin S wait on X消失,節點2可以正常登陸

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/15480802/viewspace-722658/,如需轉載,請註明出處,否則將追究法律責任。

相關文章