基於oracle 10.2.0.1 rac學習lms程式系列四

wisdomone1發表於2015-11-12

背景

   之前,我們瞭解了lmd程式一些概念,這兒我們學習下另一個重要RAC後臺程式,LMS程式。


結論

1,如果lms出現故障,會導致dml事務無法提交,即提交或回退hang住;並且遠端節點的DML操作會HANG住
2,lms程式的等待事件和lmd程式的相似,是gcs remote message
  引申,即和遠端節點的LMS程式進行通訊
3,lms程式出現故障,會引發ipc timeout,如果ipc timeout達到一定期限,會導致RAC節點重啟
4,lms程式恢復正常後,由於遠端的變更資料積壓,需要在對應節點進行例項恢復,以達到資料一致性的要求
5,效能指標gcs messages sent 也是反映資料庫DML或活動是否頻繁的一個維度,如果其值很高,說明資料庫非常繁忙
  當然我們說這只是一種維度,不是絕值,畢竟資料庫是非常複雜的,要由多個程式協作互動方可正常執行
  但它為我們分析問題提供一種思路


6,暫未發現LMS相關的隱含引數,當然也可能是我查詢方法不對或理解不深,還要繼續學習
7, 10706 event可以分析跟瞎RAC全域性佇列鎖的獲取過程
    其操作過程:


SQL> select program,spid,pid from v$process where addr='0000000083A62BD0';


PROGRAM                                          SPID                PID
------------------------------------------------ ------------ ----------
oracle@jingfa1 (TNS V1-V3)                       6496                 27




SQL> oradebug setospid 6496
Oracle pid: 27, Unix process pid: 6496, image: oracle@jingfa1 (TNS V1-V3)
SQL> oradebug event 10706 trace name context forever,level 12
Statement processed.




SQL> insert into t_lock values(18,18);


1 row created.


SQL> oradebug event 10706 trace name context off
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/admin/jingfa/udump/jingfa1_ora_6496.trc




測試                                                                                             



----oracle version
SQL> select * from v$version where rownum=1;


BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi


---lms含義
The LMS process maintains records of the data file statuses and each cached block by recording information in a Global Resource Directory (GRD). The LMS process also controls the flow of messages to remote instances and manages global data block access and transmits block images between the buffer caches of different instances. This processing is part of the Cache Fusion feature.


1,在RAC的GRD即全域性資源目錄中,維護資料檔案的狀況,並記錄全域性快取中資料塊的資訊
2,控制傳送到遠端節點的訊息
3,管理在節點間傳輸資料塊以及快取資料塊
4,它也是RAC機制CACHE FUSION功能一部分,也就是沒有LMS,CACHE FUSION無從談起,可見其重要性




----瞭解lms的作用
SQL> select addr,program,username,pid,spid from v$process where username='oracle' and lower(program) like '%lms%';


ADDR             PROGRAM                                          USERNAME               PID SPID
---------------- ------------------------------------------------ --------------- ---------- ------------
0000000083A58DB0 oracle@jingfa1 (LMS0)                            oracle                   7 30504


--hang lms
SQL> oradebug setospid 30504
Oracle pid: 7, Unix process pid: 30504, image: oracle@jingfa1 (LMS0)
SQL> oradebug suspend
Statement processed


--new session
SQL> select * from t_lock;


         A          B
---------- ----------
        11          1
         2          2


--重複多次測試
SQL> insert into t_lock values(3,3);


1 row created.


SQL> commit;
提交hang住


--resume lms
SQL> oradebug resume
Statement processed.


提交完成
SQL> commit;


Commit complete.




---再看看lms的等待事件
SQL> select sid,serial#,program,event,p1,p1text,p2,p2text,p3,p3text from v$session where paddr='0000000083A58DB0';


       SID    SERIAL# PROGRAM                   EVENT                                  P1 P1TEXT                  P2 P2TEXT                  P3 P3TEXT
---------- ---------- ------------------------- ------------------------------ ---------- --------------- ---------- --------------- ---------- ---------------
       165          1 oracle@jingfa1 (LMS0)     gcs remote message                     24 waittime                 0 poll                     0 event




--ipc timeout過久,會引發節點重啟
Wed Nov 11 04:38:23 2015
IPC Send timeout detected. Receiver ospid 30504
Wed Nov 11 04:38:24 2015
Errors in file /u01/app/oracle/admin/jingfa/bdump/jingfa1_lms0_30504.trc:
Wed Nov 11 04:38:26 2015
Trace dumping is performing id=[cdmp_20151111043845]
Wed Nov 11 04:39:27 2015
Unix process pid: 30504, image: oracle@jingfa1 (LMS0) resumed
Wed Nov 11 04:40:19 2015
Errors in file /u01/app/oracle/admin/jingfa/bdump/jingfa1_lmon_30500.trc:
ORA-29740: evicted by member 1, group incarnation 16
Wed Nov 11 04:40:19 2015
LMON: terminating instance due to error 29740
Wed Nov 11 04:40:19 2015
Errors in file /u01/app/oracle/admin/jingfa/bdump/jingfa1_lmd0_30502.trc:
ORA-29740: evicted by member , group incarnation 
Wed Nov 11 04:40:19 2015
Errors in file /u01/app/oracle/admin/jingfa/bdump/jingfa1_lms0_30504.trc:
ORA-29740: evicted by member , group incarnation 
Wed Nov 11 04:40:19 2015
System state dump is made for local instance
System State dumped to trace file /u01/app/oracle/admin/jingfa/bdump/jingfa1_diag_30496.trc
Wed Nov 11 04:40:19 2015
Trace dumping is performing id=[cdmp_20151111044019]
Wed Nov 11 04:40:23 2015
Instance terminated by LMON, pid = 30500




---檢視下lms程式的TRC檔案
[oracle@jingfa1 bdump]$ pwd
/u01/app/oracle/admin/jingfa/bdump
[oracle@jingfa1 bdump]$ ll -lht *lms*
-rw-rw---- 1 oracle oinstall 1.3K Nov 11 04:41 jingfa1_lms0_21223.trc
-rw-r----- 1 oracle oinstall  80K Nov 11 04:40 jingfa1_lms0_30504.trc
-rw-r----- 1 oracle oinstall 369K Nov 11 03:30 jingfa1_lms0_15273.trc
-rw-rw---- 1 oracle oinstall 356K Nov 10 07:28 jingfa1_lms0_21824.trc
-rw-rw---- 1 oracle oinstall 1.3K Nov 10 05:21 jingfa1_lms0_32705.trc
-rw-r----- 1 oracle oinstall 6.1K Nov 10 05:20 jingfa1_lms0_11222.trc
-rw-r----- 1 oracle oinstall 2.0K Nov 10 01:51 jingfa1_lms0_10351.trc


可見lms傳送gcs message以及資料塊(PI)
 0 GCS shadows cancelled, 0 closed
 0 GCS resources traversed, 0 cancelled
 0 GCS shadows traversed, 0 replayed, 0 duplicates
 lms 0 finished replaying gcs resources
 0 write requests issued in 1724 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
 0 write requests issued in 1719 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
 0 write requests issued in 1676 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
 0 write requests issued in 1689 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
 0 write requests issued in 143 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
 lms 0 finished fixing gcs write protocol


--再從效能指標方面看看
SQL> select statistic#,name,class from v$statname where lower(name) like '%gcs%' and STATISTIC#=44;


STATISTIC# NAME                                                                  CLASS
---------- ---------------------------------------------------------------- ----------
        44 gcs messages sent                                                        32




SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (44);


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
gcs messages sent                                                      2840




可見發生重級別DML時,效能指標暴增
SQL> select * from t_lock;


         A          B
---------- ----------
        11          1
         2          2
         3          3
         3          3
         4          4
         1          1
         2          2
         3          3
         4          4
         5          5
         6          6


         A          B
---------- ----------
         7          7
         8          8
         9          9
        10         10


15 rows selected.


SQL> insert into t_lock select level,level from dual connect by level<=1000000;
1000000 rows created.






SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (44);


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
gcs messages sent                                                      4039




--hang lms
SQL> oradebug setospid 21223
Oracle pid: 7, Unix process pid: 21223, image: oracle@jingfa1 (LMS0)
SQL> oradebug suspend
Statement processed.


SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (44);


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
gcs messages sent                                                      4041


可見發生重量級DML,效能指標只會微量增高
SQL> insert into t_lock select level,level from dual connect by level<=1000000;
--dml hang




SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (44);


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
gcs messages sent                                                      4060


恢復lms,效能指標才會恢復正常,暴增起來
SQL> oradebug resume
Statement processed.


SQL> insert into t_lock select level,level from dual connect by level<=1000000;


1000000 rows created.




SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (44);


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
gcs messages sent                                                      5166


--再透過10706 EVENT分析下全域性佇列鎖的操作


SQL> select program,spid,pid from v$process where addr='0000000083A62BD0';


PROGRAM                                          SPID                PID
------------------------------------------------ ------------ ----------
oracle@jingfa1 (TNS V1-V3)                       6496                 27




SQL> oradebug setospid 6496
Oracle pid: 27, Unix process pid: 6496, image: oracle@jingfa1 (TNS V1-V3)
SQL> oradebug event 10706 trace name context forever,level 12
Statement processed.




SQL> insert into t_lock values(18,18);


1 row created.


SQL> oradebug event 10706 trace name context off
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/admin/jingfa/udump/jingfa1_ora_6496.trc


ksipget: LB,cd381e0a,b3a45175 opt=0x10020 mode=3 timeout=0   ---LB,cd381e0a,b3a45175是什麼,opt=0x10020又是什麼,說白了表t_lock加3級鎖(分析見下)
ksipget: lid=0x82f51e90 res=0 time=62
*** 2015-11-11 05:21:19.228
ksiprls: lid=0x82f51e90 opt=0x0
*** 2015-11-11 05:21:30.797
ksipget: LB,cd381e0a,b3a45175 opt=0x10020 mode=3 timeout=0
ksipget: lid=0x82f51e90 res=0 time=80
*** 2015-11-11 05:21:30.799
ksiprls: lid=0x82f51e90 opt=0x0
*** 2015-11-11 05:21:30.799
ksipget: TM,2860,0 opt=0x10021 mode=2 timeout=0
ksipget: lid=0x82f51e90 res=0 time=29




SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump library_cache 12
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/admin/jingfa/udump/jingfa1_ora_26571.trc


[oracle@jingfa1 bdump]$ more /u01/app/oracle/admin/jingfa/udump/jingfa1_ora_26571.trc|grep -i --color cd381e0a
  hash=cd381e0ab3a451752d5c0ad44df74f5e timestamp=11-09-2015 21:49:32
    LOCK INSTANCE LOCK: id=LBcd381e0ab3a45175
    PIN INSTANCE LOCK: id=NBcd381e0ab3a45175 mode=S release=F flags=[00]
[oracle@jingfa1 bdump]$ more /u01/app/oracle/admin/jingfa/udump/jingfa1_ora_26571.trc|grep -i --color b3a45175
  hash=cd381e0ab3a451752d5c0ad44df74f5e timestamp=11-09-2015 21:49:32
    LOCK INSTANCE LOCK: id=LBcd381e0ab3a45175
    PIN INSTANCE LOCK: id=NBcd381e0ab3a45175 mode=S release=F flags=[00]
[oracle@jingfa1 bdump]$ 


由上述的內容再上向推導,可知是library object handle,而它又屬於bucket 85854,這個對應表t_lock
BUCKET 85854:
  LIBRARY OBJECT HANDLE: handle=6fe77340 mutex=0x6fe77470(0)
  name=TBS_ZXY.T_LOCK 
  hash=cd381e0ab3a451752d5c0ad44df74f5e timestamp=11-09-2015 21:49:32
  namespace=TABL flags=KGHP/TIM/XLR/[00000020]
  kkkk-dddd-llll=0000-0709-0709 lock=N pin=0 latch#=1 hpc=0002 hlc=0002
  lwt=0x6fe773e8[0x6fe773e8,0x6fe773e8] ltm=0x6fe773f8[0x6fe773f8,0x6fe773f8]
  pwt=0x6fe773b0[0x6fe773b0,0x6fe773b0] ptm=0x6fe773c0[0x6fe773c0,0x6fe773c0]
  ref=0x6fe77418[0x6fe77418,0x6fe77418] lnd=0x6fe77430[0x6fe736f8,0x6fe77c60]
    LOCK INSTANCE LOCK: id=LBcd381e0ab3a45175 ---1個鎖
    PIN INSTANCE LOCK: id=NBcd381e0ab3a45175 mode=S release=F flags=[00]  --可見加了2個鎖
    INVALIDATION INSTANCE LOCK: id=IV0000286009163221 mode=S
    LIBRARY OBJECT: object=7e36b610
    type=TABL flags=EXS/LOC[0005] pflags=[0000] status=VALD load=0
    DATA BLOCKS:
    data#     heap  pointer    status pins change whr
    ----- -------- -------- --------- ---- ------ ---
        0 6fe77280 7e36b728 I/-/A/-/-    0 NONE   00 
        3 7e36aed8 7e367940 I/-/A/-/-    0 NONE   00 
        8 7e36b910 7e36b3a0 I/-/A/-/-    0 NONE   00 
        9 7e36ad88 7e36abd0 I/-/A/-/-    0 NONE   00 
       10 7e36ae10 7e36a7e8 I/-/A/-/-    0 NONE   00 
  BUCKET 85854 total object count=1

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/9240380/viewspace-1830192/,如需轉載,請註明出處,否則將追究法律責任。

相關文章