11gR2 New Feature: the LMHB (Lock Manager Heart Beat) Background Process
LMHB is a background process introduced in 11gR2. The official documentation describes it as "Global Cache/Enqueue Service Heartbeat Monitor: Monitor the heartbeat of LMON, LMD, and LMSn processes. LMHB monitors LMON, LMD, and LMSn processes to ensure they are running normally without blocking or spinning." It runs in database and ASM instances in Oracle RAC environments.
The process monitors LMON, LMD, LMSn and the other critical RAC background processes to make sure they are not blocked or spinning. LMHB is presumably short for Lock Manager Heartbeat.
Let's look at the process's trace file to understand what it does.
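At alternating intervals of 100s and 80s (100s -> 80s -> 100s -> 80s), LMHB checks and dumps the status and wait chain of processes such as LMSn, LCKn, LMON and LMD. This check is driven by the kjfmGCR_HBCheckAll function: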
*** 2012-02-03 00:03:10.066
==============================
LMS0 (ospid: 17247) has not moved for 77 sec (1328245389.1328245312)
kjfmGCR_HBCheckAll: LMS0 (ospid: 17247) has status 2 : waiting for event 'gcs remote message' for 0 secs with wait_id 15327.
===[ Wait Chain ]===
Wait chain is empty.
kjgcr_Main: KJGCR_ACTION - id 5
*** 2012-02-03 00:04:50.091
==============================
LMS0 (ospid: 17247) has not moved for 88 sec (1328245489.1328245401)
kjfmGCR_HBCheckAll: LMS0 (ospid: 17247) has status 2 : waiting for event 'gcs remote message' for 0 secs with wait_id 24546.
===[ Wait Chain ]===
Wait chain is empty.
kjgcr_Main: KJGCR_ACTION - id 5
LCK0 (ospid: 2662) has not moved for 95 sec (1309746735.1309746640)
kjfmGCR_HBCheckAll: LCK0 (ospid: 2662) has status 6
==================================================
=== LCK0 (ospid: 2662) Heartbeat Report
==================================================
LCK0 (ospid: 2662) has no heartbeats for 95 sec. (threshold 70 sec)
: Not in wait; last wait ended 80 secs ago.
: last wait_id 2317342 at 'libcache interrupt action by LCK'.
..
.
Session Wait History:
elapsed time of 1 min 20 sec since last wait
0: waited for 'libcache interrupt action by LCK'
..
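The pair of numbers in parentheses after "has not moved for N sec" appears to be two Unix epoch timestamps whose difference equals N: 1328245389 - 1328245312 = 77, 1328245489 - 1328245401 = 88, 1309746735 - 1309746640 = 95. The sketch below checks that reading against the trace lines above. Treating the first value as the time of the check and the second as the time of the last heartbeat is an assumption inferred from the arithmetic, not something Oracle documents.

```python
import re
from datetime import datetime, timezone

# Matches heartbeat lines such as:
#   LMS0 (ospid: 17247) has not moved for 77 sec (1328245389.1328245312)
NOT_MOVED = re.compile(
    r"(?P<proc>\w+) \(ospid: (?P<ospid>\d+)\) has not moved for "
    r"(?P<secs>\d+) sec \((?P<now>\d+)\.(?P<last>\d+)\)"
)


def parse_not_moved(text):
    """Return one dict per 'has not moved' line in an LMHB trace excerpt."""
    entries = []
    for m in NOT_MOVED.finditer(text):
        # Assumption: the pair is (time of this check, time of the last heartbeat),
        # both in Unix epoch seconds.
        now, last = int(m["now"]), int(m["last"])
        entries.append({
            "proc": m["proc"],
            "ospid": int(m["ospid"]),
            "reported_secs": int(m["secs"]),
            "computed_secs": now - last,
            "check_time_utc": datetime.fromtimestamp(now, tz=timezone.utc),
        })
    return entries


sample = """
LMS0 (ospid: 17247) has not moved for 77 sec (1328245389.1328245312)
LMS0 (ospid: 17247) has not moved for 88 sec (1328245489.1328245401)
LCK0 (ospid: 2662) has not moved for 95 sec (1309746735.1309746640)
"""

for e in parse_not_moved(sample):
    print(e["proc"], e["reported_secs"], e["computed_secs"], e["check_time_utc"].isoformat())
```

For these samples the computed and reported values agree on every line. The first check time also decodes to 2012-02-03 05:03:09 UTC, five hours away from the 00:03:10 trace timestamp, which would be consistent with a server clock set to UTC-5.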
Roughly every 3 minutes it prints the top CPU users, i.e. the sessions with the highest CPU usage:
*** 2012-02-03 00:05:30.102
kjgcr_SlaveReqBegin: message queued to slave
kjgcr_Main: KJGCR_ACTION - id 3
CPU is high. Top oracle users listed below:
Session  Serial  CPU
     29       7    0
    156      23    0
      3       1    0
      4       1    0
      5       1    0
*** 2012-02-03 00:08:30.147
kjgcr_SlaveReqBegin: message queued to slave
kjgcr_Main: KJGCR_ACTION - id 3
CPU is high. Top oracle users listed below:
Session  Serial  CPU
     29       7    0
    156      23    0
      3       1    0
      4       1    0
      5       1    0
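When the trace has been copied or pasted as a single line, as often happens, the "Top oracle users" listing loses its row breaks. A minimal sketch that recovers the rows, assuming the listing is always the header "Session Serial CPU" followed by integers three per row, might look like this. The function name is hypothetical, and the unit of the CPU column is not documented (it is 0 for every session in the sample above).

```python
import re


def parse_top_cpu_users(report):
    """Split a flattened 'Top oracle users' listing into (session, serial, cpu) rows.

    Assumes the header 'Session Serial CPU' is followed by whitespace-separated
    integers, three per row, up to the next '***' timestamp marker.
    """
    _, found, body = report.partition("Session Serial CPU")
    if not found:
        return []
    body = body.split("***", 1)[0]  # ignore anything after the next trace timestamp
    numbers = [int(tok) for tok in re.findall(r"\d+", body)]
    if len(numbers) % 3:
        raise ValueError("listing does not split evenly into 3-column rows")
    return [tuple(numbers[i:i + 3]) for i in range(0, len(numbers), 3)]


report = ("CPU is high. Top oracle users listed below: "
          "Session Serial CPU 29 7 0 156 23 0 3 1 0 4 1 0 5 1 0 "
          "*** 2012-02-03 00:08:30.147 kjgcr_SlaveReqBegin: message queued to slave")
for session, serial, cpu in parse_top_cpu_users(report):
    print(f"session={session:<4} serial={serial:<4} cpu={cpu}")
```

Cutting the body at the next "***" keeps the timestamp digits of the following entry out of the table.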
If it finds sessions with extremely high CPU usage, then, depending on an internal algorithm, LMHB may activate a resource manager plan or even kill processes:
*** 2012-02-03 00:08:35.149
kjgcr_Main: Reset called for action high cpu, identify users, count 0
*** 2012-02-03 00:08:35.149
kjgcr_Main: Reset called for action high cpu, kill users, count 0
*** 2012-02-03 00:08:35.149
kjgcr_Main: Reset called for action high cpu, activate RM plan, count 0
*** 2012-02-03 00:08:35.149
kjgcr_Main: Reset called for action high cpu, set BG into RT, count 0
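The four "Reset called" lines name the steps LMHB can take for high CPU: identify users, kill users, activate RM plan, and set BG into RT (presumably moving background processes into a real-time scheduling class). A quick way to see whether LMHB ever went beyond identifying users is to collect the counts per action. The sketch below assumes a non-zero count means the action has been taken since the last reset; that interpretation is an inference from the line format, not documented behavior.

```python
import re

# Matches lines such as:
#   kjgcr_Main: Reset called for action high cpu, kill users, count 0
RESET = re.compile(
    r"kjgcr_Main: Reset called for action (?P<trigger>[^,]+), "
    r"(?P<action>[^,]+), count (?P<count>\d+)"
)


def reset_counts(trace_text):
    """Map each (trigger, action) pair seen in 'Reset called' lines to its count."""
    return {(m["trigger"], m["action"]): int(m["count"])
            for m in RESET.finditer(trace_text)}


trace = """
kjgcr_Main: Reset called for action high cpu, identify users, count 0
kjgcr_Main: Reset called for action high cpu, kill users, count 0
kjgcr_Main: Reset called for action high cpu, activate RM plan, count 0
kjgcr_Main: Reset called for action high cpu, set BG into RT, count 0
"""

counts = reset_counts(trace)
# Flag any step beyond merely identifying users that reports a non-zero count.
escalated = {k: v for k, v in counts.items() if k[1] != "identify users" and v > 0}
print(counts)
print("escalated actions:", escalated or "none")
```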
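Starting with 11.2.0.2, LMHB uses slave processes named GCRn to do the actual work. The documentation describes them as "GCRn: Global Conflict Resolution Slave Process. Performs synchronous tasks on behalf of LMHB. GCRn processes are transient slaves that are started and stopped as required by LMHB to perform synchronous or resource intensive tasks." LMHB starts and stops the GCRn processes as needed, so that multiple GCRn slaves can carry out synchronous tasks and tasks that relieve resource pressure (such as killing processes).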
As the traces show, the functions LMHB actually calls are mostly internal functions whose names begin with kjgcr or kjfmGCR; GCR stands for Global Conflict Resolution.
kjgcr_Main: KJGCR_ACTION - id 5
Trace of a GCR process:
*** 2011-11-28 02:42:44.466
kjgcr_SlaveActionCbk: Callback failed, check trace
Dumping GCR slave work message at 0x96b81fc0
GCR layer information: type = 1, index = 0
Unformatted dump of ksv layer header:
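LMHB was introduced to improve RAC availability. In particular, when resources are under pressure it actively tries to kill the server processes that consume the most resources, so that LMS and the other critical RAC background processes can keep working. Because it periodically records the wait events of LMS, LMON and other background processes, as well as session CPU usage, the LMHB trace file can also serve as one of the sources for diagnosing RAC problems. This is a little-noticed new feature and enhancement of RAC since 11.2.0.1.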
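As a diagnostic aid, a small scan of an LMHB trace for heartbeat reports that exceeded their threshold can point straight at the background process that stalled. The sketch below matches the "has no heartbeats for N sec. (threshold T sec)" line format from the LCK0 report shown earlier; the function name is hypothetical, and the exact wording may differ between patch levels. On a real system you would read the LMHB trace file (typically named like <instance>_lmhb_<ospid>.trc in the ADR trace directory) and pass its contents in.

```python
import re

# Matches heartbeat-report lines such as:
#   LCK0 (ospid: 2662) has no heartbeats for 95 sec. (threshold 70 sec)
NO_HEARTBEAT = re.compile(
    r"(?P<proc>\w+) \(ospid: (?P<ospid>\d+)\) has no heartbeats for "
    r"(?P<secs>\d+) sec\. \(threshold (?P<threshold>\d+) sec\)"
)


def overdue_processes(trace_text):
    """Return (process, ospid, seconds, threshold) for each report above its threshold."""
    hits = []
    for m in NO_HEARTBEAT.finditer(trace_text):
        secs, threshold = int(m["secs"]), int(m["threshold"])
        if secs > threshold:
            hits.append((m["proc"], int(m["ospid"]), secs, threshold))
    return hits


excerpt = """
=== LCK0 (ospid: 2662) Heartbeat Report
LCK0 (ospid: 2662) has no heartbeats for 95 sec. (threshold 70 sec)
: Not in wait; last wait ended 80 secs ago.
"""

for proc, ospid, secs, threshold in overdue_processes(excerpt):
    print(f"{proc} (ospid {ospid}): no heartbeat for {secs}s, threshold {threshold}s")
```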
Related hidden parameters:
_lm_hb_callstack_collect_time: hb diagnostic call stack collection time in seconds (default: 5s)
_lm_hb_disable_check_list: list of process names to be disabled in heartbeat check (default: none)
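Hidden parameters do not appear in V$PARAMETER; the usual way to see them is to join the fixed tables x$ksppi (names and descriptions) and x$ksppcv (current values), which requires a SYS connection. The helper below wraps that query for any DB-API cursor, for example one from python-oracledb connected as SYSDBA. The function name and bind name are hypothetical; x$ksppcv shows session-level values, and x$ksppsv can be substituted for instance-level values.

```python
# Requires a SYS (SYSDBA) connection: the x$ fixed tables are not visible to ordinary users.
HIDDEN_PARAM_SQL = """
SELECT i.ksppinm  AS name,
       v.ksppstvl AS value,
       i.ksppdesc AS description
  FROM x$ksppi i
  JOIN x$ksppcv v ON v.indx = i.indx
 WHERE i.ksppinm LIKE :pattern ESCAPE '\\'
 ORDER BY i.ksppinm
"""


def lmhb_hidden_params(cursor, pattern=r"\_lm\_hb\_%"):
    """Return {name: (value, description)} for hidden parameters matching pattern.

    The backslashes escape the underscores, which are otherwise LIKE wildcards.
    """
    cursor.execute(HIDDEN_PARAM_SQL, {"pattern": pattern})
    return {name: (value, desc) for name, value, desc in cursor.fetchall()}
```

With the default pattern this returns both _lm_hb_callstack_collect_time and _lm_hb_disable_check_list, plus any other _lm_hb_ parameters present in the release.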
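11.2 is the first release to include the LMHB process, so it is not yet very mature. In practice, on RAC systems with high resource utilization, LMHB can do more harm than good. If you have run into related problems, or have seen strange behavior on 11.2 RAC, the following MOS notes and bugs are worth checking: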
ORA-29770 LMHB Terminates Instance as LMON Waited for Control File IO or LIBRARY CACHE or ROW CACHE Event for too Long [ID 1197674.1]
Bug 8888434 – LMHB crashes the instance with LMON waiting on controlfile read [ID 8888434.8]
Bug 11890804 – LMHB crashes instance with ORA-29770 after long “control file sequential read” waits [ID 11890804.8]
Bug 11890804: LMHB TERMINATE INSTANCE WHEN LMON WAIT CHANGE FROM CF READ AFTER 60 SEC
Bug 13467673: CSS MISSCOUNT AND ALL ASM DOWN WITH ORA-29770 BY LMHB
Bug 13390052: KJFMGCR_HBCHECKALL MESSAGES ARE CONTINUOUSLY LOGGED IN LMHB TRACE FILE.
Bug 13322797: LMHB TERMINATES THE INSTANCE DUE TO ERROR 29770
Bug 11827088 – Latch ‘gc element’ contention, LMHB terminates the instance [ID 11827088.8]
Bug 13061883: LMHB IS TERMINATING THE INSTANCE DURING SHUTDOWN IMMEDIATE
Bug 12564133 – ORA-600[1433] in LMHB process during RAC reconfiguration [ID 12564133.8]
Bug 12886605: ESSC: LMHB TERMINATE INSTANCE DUE TO 29770 – LMON WAIT ENQ: AM – DISK OFFLINE
Bug 12757321: LMHB TERMINATING THE INSTANCE DUE TO ERROR 29770
Bug 10296263: LMHB (OSPID: 15872): TERMINATING THE INSTANCE DUE TO ERROR 29770
Bug 11899415: ORA-29771 AND LMHB (OSPID: XXXX) KILLS USER (OSPID: XXX
Bug 10431752: SINGLE NODE RAC: LMHB TERMINATES INSTANCE DUE TO 29770
Bug 11656856: LMHB (OSPID: 27701): TERMINATING THE INSTANCE DUE TO ERROR 29770
Bug 10411143: INSTANCE CRASHES WITH IPC SEND TIMEOUT AND LMHB TERMINATES WITH ORA-29770
Bug 11704041: DATABASE INSTANCE CRASH BY LMHB PROCESS
Bug 10412545: ORA-29770 LMHB TERMINATE INSTANCE DUE TO VARIOUS LONG CSS WAIT
Bug 10147827: INSTANCE TERMINATED BY LMHB WITH ERROR ORA-29770
Bug 10016974: ORA-29770 LMD IS HUNG FOR MORE THAN 70 SECONDS AND LMHB TERMINATE INSTANCE
Bug 9376100: LMHB TERMINATING INSTANCE DUE ERROR 29770
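Many of the bugs above share one symptom: LMHB terminating the instance with error 29770. A quick way to check whether an alert log shows that pattern is a case-insensitive scan for the message quoted in the bug titles. The exact alert-log wording is an assumption based on those titles (for example Bug 10296263), and the function name is hypothetical.

```python
import re

# Bug titles above quote the message in the form
#   LMHB (ospid: 15872): terminating the instance due to error 29770
# Matching is case-insensitive because the bug titles are upper-cased.
LMHB_TERMINATION = re.compile(
    r"LMHB \(ospid: (?P<ospid>\d+)\): terminating the instance due to error (?P<err>\d+)",
    re.IGNORECASE,
)


def lmhb_terminations(alert_log_text):
    """Return (ospid, error_number) for each LMHB instance-termination message."""
    return [(int(m["ospid"]), int(m["err"]))
            for m in LMHB_TERMINATION.finditer(alert_log_text)]


alert_excerpt = """
LMHB (OSPID: 15872): TERMINATING THE INSTANCE DUE TO ERROR 29770
LMHB (OSPID: 27701): TERMINATING THE INSTANCE DUE TO ERROR 29770
"""

for ospid, err in lmhb_terminations(alert_excerpt):
    print(f"LMHB ospid {ospid} terminated the instance with ORA-{err}")
```

A hit here is only a starting point: the bugs listed differ in what LMON, LMD or LMS was waiting on, so the matching LMHB trace still has to be read.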
From the ITPUB blog, link: http://blog.itpub.net/31397003/viewspace-2136576/. If you republish this article, please credit the source; otherwise legal action will be taken.