記一次開啟資料庫慢原因分析過程
最近在測試生產資料庫時發現資料庫在open過程很長慢,大概持續20分鐘左右,有時持續1-2個小時。日誌如下:
Fri Nov 23 10:53:51 2018
SMON: enabling tx recovery
Database Characterset is AL32UTF8
Fri Nov 23 10:54:02 2018
Decreasing number of real time LMS from 2 to 0
Fri Nov 23 10:54:05 2018
No Resource Manager plan active
Fri Nov 23 10:54:47 2018
Starting background process GTX0
Fri Nov 23 10:54:47 2018
GTX0 started with pid=41, OS id=14549130
Starting background process RCBG
Fri Nov 23 10:54:48 2018<==================
RCBG started with pid=43, OS id=14811234
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Nov 23 14:07:01 2018<====================
從RCBG程式啟動到QMNC程式啟動之間間隔了180分鐘左右,而其他open過程佔用一分鐘左右。
Starting background process QMNC
Fri Nov 23 14:07:01 2018
QMNC started with pid=31, OS id=28246190
Fri Nov 23 14:07:57 2018
Completed: alter database open
。
為了進一步分析 RCBG 到 QMNC 啟動慢的原因,做如下日誌日誌跟蹤:
1. 做oracel hanganalyze
資料庫啟動到mount 狀態
在看到alert log列印如下字樣時,在另外的sqlplus視窗下收集3次,間隔2~3分鐘收集一次:
步驟如下:
sqlplus / as sysdba
oradebug setmypid
oradebug -G all dump systemstate 267
oradebug -G all hanganalyze 2
等2~3分鐘
oradebug -G all dump systemstate 267
oradebug -G all hanganalyze 2
等2~3分鐘
oradebug -G all dump systemstate 267
oradebug -G all hanganalyze 2
oradebug tracefile_name
alter 日誌產生如下檔案
No Resource Manager plan active
Fri Nov 23 15:09:25 2018
Starting background process GTX0
Fri Nov 23 15:09:25 2018
GTX0 started with pid=41, OS id=8585312
Starting background process RCBG
Fri Nov 23 15:09:26 2018
RCBG started with pid=43, OS id=18809034
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Nov 23 15:39:33 2018
System State dumped to trace file /u01/app/oracle/diag/rdbms/erpdb/ erpdb 1/trace/erpdb 1_diag_31850618.trc
從檔案上看oracle 在open過程中等待如下
1.
-- Trace File --
Starting Systemstate 1
.........................................
Ass.Awk Version 1.0.31
~~~~~~~~~~~~~~~~~~~~~~
Source file : symbols1_diag_31850618.trc
System State 1 (2018-11-23 15:39:32.932)
~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
1: [DEAD]
2: waiting for 'pmon timer'
3: waiting for 'rdbms ipc message'
4: waiting for 'VKTM Logical Idle Wait'
5: waiting for 'rdbms ipc message'
6:
7: waiting for 'rdbms ipc message'
8: waiting for 'PING'
9: waiting for 'rdbms ipc message'
10: waiting for 'DIAG idle wait'
11: waiting for 'rdbms ipc message'
12: waiting for 'ges remote message'<===lmd
13: waiting for 'gcs remote message'<===lms
14: waiting for 'gcs remote message'<===lms
15: waiting for 'rdbms ipc message'
16: waiting for 'GCR sleep'
17: waiting for 'rdbms ipc message'
18: waiting for 'rdbms ipc message'
19: waiting for 'rdbms ipc message'
20: waiting for 'rdbms ipc message'
21: waiting for 'rdbms ipc message'
22: waiting for 'smon timer'
23: waiting for 'rdbms ipc message'
24: waiting for 'rdbms ipc message'
25: waiting for 'ASM background timer'
26: waiting for 'rdbms ipc message'
27: waiting for 'rdbms ipc message'
28:
29:
30: waiting for 'wait for unread message on broadcast channel'
31: waiting for 'class slave wait'
32: waiting for 'rdbms ipc message'
33: waiting for 'rdbms ipc message'
34: waiting for 'db file scattered read' (0x4f,0xa2766,0xb)
35: waiting for 'rdbms ipc reply' (from_process=6)[IPC 6]
37: waiting for 'rdbms ipc message'
38: waiting for 'rdbms ipc message'
39: waiting for 'rdbms ipc message'
40: waiting for 'rdbms ipc message'
41: waiting for 'rdbms ipc message'
43: waiting for 'wait for unread message on broadcast channel'
2.
flags2: (0x409) -/-/INC
DID: , short-term DID:
txn branch: 0x0
edition#: 0 oct: 0, prv: 0, sql: 0x0, psql: 0x0, user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for 'gcs remote message'
waittime=0x1e, poll=0x0, event=0x0
wait_id=51479 seq_num=51480 snap_id=1
wait times: snap=0.000577 sec, exc=0.000577 sec, total=0.000577 sec
wait times: max=0.030000 sec, heur=31 min 42 sec<==============
應該是例項啟動到dump時總共等待時間,這兒說明私網gcs訊息傳輸可能有問題。
為了排除私有網路卡的問題部署oswatch,從oswatch收集的資訊看,網路和資源都沒有問題。
2. 為了進一步確定原因
truss -aefdo /tmp/truss.out sqlplus / as sysdba
alter database open;
從日誌上分析可以看到
14876838: 56950981: 2.0001: kread(9, "01 -\0\006\0\0\0\0\011 i".., 8208) (sleeping...)
21954596: 92471451: 2.0001: kread(0, "\n l t e r d a t a b a".., 4096) (sleeping...)
21954596: 92471451: kread(0, " e x i t\n d a t a b a".., 4096) = 5
21954596: 92471451: 1344.6921: kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914<===
從它前面的kread讀取的終端輸入是exit來看應該是已經完成了open執行推出之後的os call了,所以上面的1344.6921秒是不能計入open過程的,排除這個之後來看,主要問題就是aio了,
listio64
數目高達123644次,而單次等待從0.0003到0.5271s不等
14876838: 56950981: kpread(24, 0x07000107C5755000, 4096, 0x0000004FBAD2A000, 0x000000000000007D) = 4096
14876838: 56950981: kpread(24, 0x07000107C5E14000, 4096, 0x000000E41E85F000, 0x000000000000007D) = 4096
14876838: 56950981: 0.5271: listio64(0x0000000010000004, 0x000000000FFFFFFF, 0x00000000FFFCC1E0, 0x0000000000000001, 0x0000000000000000, 0x0000000000000000, 0x00000001116E89A0, 0x0000000000000040) = 0x0000000000000000
,
---Continue---
有一點還是很奇怪,按理open的時候已經完成了crash recovery了,此時對資料檔案只有讀沒有寫了,但為何您這裡出來這麼多 kpread,還是令人不解。 為什麼從RCBG程式啟動到QMNC程式啟動之間間隔很長時間那,而其他後臺程式open過程佔用一分鐘左右。
從上邊的分析,問題又回到了原點。難道是oracle bug,
為了進一步確認問題,在mos搜到如下一篇文章:
Procedure to Manually Coalesce All the IOTs / Indexes Associated with Advanced Queueing Tables to Maintain Enqueue / Dequeue Performance; Reduce QMON CPU Usage and Redo Generation ( 檔案 ID 271855.1)
為了驗證文章的可靠性,透過此文章提供的指令碼,進行修復,資料庫能正常OPEN,並且很快。日誌如下:
Database Characterset is AL32UTF8
No Resource Manager plan active
Starting background process GTX0
Fri Nov 23 16:39:33 2018
GTX0 started with pid=36, OS id=16056540
Starting background process RCBG
Fri Nov 23 16:41:23 2018
RCBG started with pid=37, OS id=26214482
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Fri Nov 23 16:42:23 2018
QMNC started with pid=39, OS id=14287094
Fri Nov 23 16:43:33 2018
Completed: ALTER DATABASE OPEN
這個案例的後續是,如果經常使用AQ,資料庫可以不定期做碎片整理。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/10201716/viewspace-2939378/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 一次資料庫響應慢分析資料庫
- 資料庫——慢sql的原因資料庫SQL
- 資料庫查詢慢的原因資料庫
- 記一次公司倉庫資料庫伺服器死鎖過程資料庫伺服器
- 記錄一次資料庫CPU被打滿的排查過程資料庫
- 好文分享 | 記一次Oracle12c資料庫SQL短暫緩慢問題分析Oracle資料庫SQL
- 記一次MySQL資料遷移到SQLServer全過程MySqlServer
- 資料需求分析過程
- 排查Mysql突然變慢的一次過程MySql
- 網站開啟慢什麼原因呢?網站
- openGauss核心分析(九):資料庫表的建立過程資料庫
- 【資料庫】資料庫儲存過程(一)資料庫儲存過程
- 記一次 500併發,平均響應時間慢-調優過程~~
- 一次sqlldr匯入慢的解決過程SQL
- 記一次nodejs開發CLI的過程NodeJS
- 資料庫恢復過程資料庫
- 資料庫儲存過程資料庫儲存過程
- Oracle資料庫啟動過程及狀態詳解Oracle資料庫
- 記一次VMware的崩潰除錯分析過程除錯
- 網站開啟緩慢或打不開的原因網站
- 解Bug之路-記一次中介軟體導致的慢SQL排查過程SQL
- 資料庫擴充套件表設計過程記錄資料庫套件
- Linux 啟動過程分析Linux
- Liferay 啟動過程分析
- 記一次透過Memory Analyzer分析記憶體洩漏的解決過程記憶體
- openGauss核心分析(十):資料庫搜尋引的建立過程資料庫
- 資料庫的連線過程資料庫
- MySql資料庫——儲存過程MySql資料庫儲存過程
- 一次HIS系統卡頓原因排查過程分享
- 慢查詢日誌開啟分析
- 記一次簡單的Oracle離線資料遷移至TiDB過程OracleTiDB
- 【資料庫資料恢復】透過資料頁恢復Sql Server資料庫資料的過程資料庫資料恢復SQLServer
- Spring Security 啟動過程分析Spring
- Eureka Server啟動過程分析Server
- 大資料分析過程是如何的大資料
- 記一次SQL調優過程SQL
- 記一次生產資料庫“意外”重啟的經歷資料庫
- 一次DG故障診斷過程分析