利用hanganalyz/systemstate dump診斷資料庫hang
診斷資料庫是否hang
一般採用兩個命令:hanganalyze/dump systemstate,其中systemstate級別大於256時會對每個程式進行dump,當資料庫程式太多時會比較消耗資源,尤其在非solaris/linux平臺
Hanganalyze
|
|
||||||||||||||||||
Systemstate
|
操作步驟
非RAC系統
The ideal manner to collect the hanganalyze and systemstate dumps is to collect 2 sets. However, it is best to have the 2 hanganalyzes in one trace file and the 2 systemstate dumps is 2 separate trace files. In order to do this, you will need 3 separate sqlplus sessions. We will identify these as SQL1, SQL2, and SQL3.
- export
ORACLE_SID=PROD
## Replace PROD with the SID you want to trace
- Using SQL*Plus connect as SYSDBA using the following command:
sqlplus -prelim / as sysdba
Do this 3 times in 3 separate windows, creating 3 sqlplus sessions (SQL1, SQL2, and SQL3)
- In SQL1 execute the following:
SQL1> oradebug setmypid
SQL1> oradebug unlimit
SQL1> oradebug hanganalyze 3
- In SQL2 execute the following:
SQL2> oradebug setmypid
SQL2> oradebug unlimit
SQL2> oradebug dump systemstate 266
- Wait at least 2
minutes to give time to identify process state changes.
- In SQL1 execute the following:
SQL1> oradebug hanganalyze 3
- In SQL3 execute the following:
SQL3> oradebug setmypid
SQL3> oradebug unlimit
SQL3> oradebug dump systemstate 266
NOTE:
- Systemstate level 258 and 266 are only available in patchsets 9.2.0.6, 10.1.0.4 and base release 10.2.0.1 and higher. If you are not on one of these releases, use level 10 in place of the level suggested above.
- If you are using systemstate level 266 and it is taking much longer than expected to generate the dump file, then end this systemstate dump and try level 258.
RAC系統
The trace files will be written to the "diag" background process trace file in the background_dump_destination on each node.
- Identify the SID you want to trace
export ORACLE_SID=PROD
Replace PROD with the SID you want to trace
- Using SQL*Plus connect as "/ AS SYSDBA" using the following command:
sqlplus -prelim / as sysdba
- Execute the following:
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug -g all hanganalyze 3
SQL> oradebug -g all dump systemstate 266
- Wait at least 2
minutes to give time to identify process state changes.
- Execute the following:
SQL> oradebug -g all hanganalyze 3
SQL> oradebug -g all dump systemstate 266
Verify that the the "diag" background process trace file has been updated in the background_dump_destination on each node. Examine each "diag" file to ensure that it contains 2 hanganalyze dumps and a systemstate dump.
檢測資料庫是否 hang
所謂的hang是指oracle內部死鎖。當DML發生內部死鎖時,oracle可以自動處理;當核心級資源(latch/pin)死鎖時,通常oracle就會掛在那裡
Hanganaylze報告
當CYCLE出現時,意味著資料庫真的遭遇死鎖了
Cycle
1 :
<17/472/0x80057dd8/6674/library cache lock>
-- <13/7/0x800580f4/6676/library cache lock>
-- <21/204/0x8005a644/5813/library cache lock>
Dumping System_State and Fixed_SGA in process with ospid 6676
Dumping Process information for process with ospid 6676
Dumping Process information for process with ospid 6674
Dumping Process information for process with ospid 5813
f your hanganalyze output has a "CYCLE" section, the database is in a hang state.
- Note down the SID and OSPID for the blocker and waiter sessions to use in the Determine a Cause>Data Collection and Determine a Cause>Analysis steps.
- Go to the next step to collect additional data.
Open Chain
the "OPEN CHAIN" section reports sessions involved in a wait chain. A wait chain means that one session is blocking one or more other sessions. Open chains represents "stuck" or "locked" sessions
Hanganalyze #1
Open chains found: Chain
1 : <16/44773/0x265f15c/1948/SQL*Net message from client> -- <12/5/0x265fad4/2112/enqueue> -- <13/14/0x265fdfc/2076/enqueue> Chain
2 : <19/3/0x2660124/2392/No Wait> |
Hanganalyze #2
Open chains found: Chain
1 : <16/44773/0x265f15c/1948/SQL*Net message from client> -- <12/5/0x265fad4/2112/enqueue> -- <13/14/0x265fdfc/2076/enqueue> Chain
2 : <19/3/0x2660124/2392/No Wait> |
Hanganalyze #3
Open chains found: Chain
1 : <16/44773/0x265f15c/1948/SQL*Net message from client> -- <12/5/0x265fad4/2112/enqueue> -- <13/14/0x265fdfc/2076/enqueue> -- <19/3/0x2660124/2392/enqueue> |
透過對比三次報告,判斷會話是否阻塞時間過長,在上面例子中,需要對blocker和第一個waiter會話做errorstack;
若一個程式阻塞了10個或更多sesson,通常Open Chain之前會有Found資訊,若一個blocker出現多次,則需要收集errorstack
Found
34 objects waiting for <131/754/0x9fc1e8/576293/No Wait> Found
17 objects waiting for <245/2343/0xa19f48/575938/latch free> Found
24 objects waiting for <144/2183/0xa0c9b8/575457/latch free> |
Dump errorstack
SQL>
oradebug setospid
SQL> oradebug unlimit
SQL> oradebug dump errorstack 3
若沒有滿足以上條件的輸出結果,那麼資料庫應該是效能慢而非hang
檢視errorstack報告
包含call stack和process state兩部分,其中process state又有一系列的SO(state object)組成,大致分為process/session/library object lock
1 process
|
|||||||
(process) |
shows that the state object is of type process |
|
|||||
ospid: 25765 |
shows that the ospid is equal to 25765 |
|
2 session
|
|||
(session) |
shows that the state object is of type process |
|
|
sid: 118 |
shows that the sid for this session is 118. This is used to ensure that you are looking at the correct session. |
|
|
sql: 0x3076f3f8 |
shows that the address of the SQL for this session is 3076f3f8. This will be used to find the Active SQL in the next step. |
|
|
waiting for 'enq: TM - contention' |
shows that the wait event for this session is enq: TM - contention. This will be used when determining the type of lock associated with the hang or lock situation that you are troubleshooting. |
|
|
name|mode=544d0006 |
shows the value for P1 if the wait event is an enqueue. P1 is a hexadecimal value that represents the lock type and lock mode (lmode). See MetaLink for details on obtaining the lock type and lock mode from P1. MetaLink provides details on the lock types and lock modes. |
|
3 library object lock
透過session資訊裡的sql address搜查,可以找到相應的library object lock(搜尋時去掉0x字首)
|
|||
LIBRARY OBJECT LOCK |
shows that the state object is of type Library Object Lock |
|
|
LIBRARY OBJECT HANDLE: handle=3076f3f8 |
shows that the Library Object Handle address is 3076f3f8. This address is the same as the SQL address found in the "session" state object. This is use to verify that this is the state object containing the active SQL. |
|
|
name=lock table emp in exclusive mode |
show that the active SQL is "lock table emp in exclusive mode". |
|
Callstack
可以將顯示的function在MOS作為關鍵字搜尋,一般來說需要忽略kse/ksd打頭的
As a general rule, ignore the functions beginning with 'kse' and 'ksd'. Start by using the first 5 functions following the 'kse' and 'ksd' functions. You may need to try several combinations of search criteria before finding any previously identified issues that match the issue you are attempting to troubleshoot.
對該案例可以嘗試ntprd nsprecv nsrdr nsdo nsbrecv enq: TM conection關鍵字搜尋
|
通常情況下,以上步驟可以大致診斷出資料庫是否hang以及造成hang的根本原因
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/10271187/viewspace-1624989/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 透過v$wait_chains檢視診斷資料庫hang和ContentionAI資料庫
- ODX 診斷資料庫轉換工具 — DDC資料庫
- 資料庫異常智慧分析與診斷資料庫
- Part II 診斷和優化資料庫效能優化資料庫
- 大語言模型與資料庫故障診斷模型資料庫
- 使用SQL_TRACE進行資料庫診斷(轉)SQL資料庫
- MySQL使用event等待事件進行資料庫效能診斷MySql事件資料庫
- 詳解JAVA執行緒問題診斷工具Thread DumpJava執行緒thread
- 資料庫簡化運維,智慧診斷助手幫你搞定!資料庫運維
- 如何利用 Webshell 診斷 EDAS Serverless 應用WebshellServer
- 【巨杉資料庫SequoiaDB】巨杉Tech | 四步走,快速診斷資料庫叢集狀態資料庫
- 從監控到診斷:資料的力量
- Oracle 19.3資料庫impdp匯入view時hang住Oracle資料庫View
- pg_dump 備份,恢復資料庫資料庫
- 19c資料庫impdp匯入view時hang住資料庫View
- Oracle 28.6資料庫impdp匯入view時hang@11Oracle資料庫View
- 【恩墨學院】基於裸資料的異地資料庫效能診斷與最佳化資料庫
- 【YashanDB知識庫】yac修改引數後關閉資料庫hang住資料庫
- 當資料庫遇上"自動駕駛",阿里雲 DAS 在自治診斷的突破資料庫自動駕駛阿里
- 利用errorstack事件進行錯誤跟蹤和診斷Error事件
- [20180606]如何dump資料庫裡面的漢字.txt資料庫
- PG-pg_dump備份/恢復資料庫資料庫
- Memory Dump利用例項
- postgresql資料庫利用方式SQL資料庫
- Oracle如何診斷遠端訪問資料庫慢/超時等問題小結Oracle資料庫
- 利用反射讀取資料庫資料反射資料庫
- sqlplus -prelim和oradebug捕獲資料庫hang住的詳細資訊SQL資料庫
- 一次k8s docker下.net程式的異常行為dump診斷K8SDocker
- file-max設定過小導致oracle資料庫hang住Oracle資料庫
- B站大資料系統診斷實踐-SQLSCAN篇大資料SQL
- OPPO大資料診斷平臺設計與實踐大資料
- [JVM] 應用診斷工具之Fastthread(線上診斷)JVMASTthread
- oracle資料庫瘋狂生成dump把目錄撐滿Oracle資料庫
- ORACLE診斷案例Oracle
- 打造屬於自己的underscore系列 ( 二 ) - 資料型別診斷資料型別
- 利用RMAN備份重建資料庫資料庫
- 如何在因果推斷中更好地利用資料?
- win10設定-隱私-診斷和反饋-“可選診斷資料”呈灰色無法選擇怎麼辦?Win10
- 11月11日線上研討會預熱 | ODX診斷資料庫轉換工具 — VDC(ODX)資料庫