一次db2 crash的調查
昨晚收到來自production DB2 server的兩封郵件,一個是dprop停了,另外一個是放db2diag.log的檔案系統滿了,於是開始調查原因。
在db2diag.log裡面發現了一些error資訊,db2 instance 發生了crash,並且自動重啟了,這造成了dprop的停止,crash的時候在檔案db2diag.log的目錄下生成了大量的db2 FODC檔案,造成檔案系統滿。
那db2 instance crash根本的原因是什麼那?
從下面的error log裡面,不難猜測是來自作業系統檔案訪問造成的。
從DMS這個關鍵字眼可以看出,是向一個DMS的表空間寫資料
從LOB這個關鍵字眼可以看出,是向放LOB資料的表空間寫資料
這些資訊還不足以找到這個檔案,那麼看下面的這個關於檔案大小的資訊
Target File Information :
Size = 1073741824
對資料庫的儲存目錄進行查詢
find "db path for DMS tablespace" -type f -print | xargs ls -l | sort -k5,5rn | head -30
終於找到了這個檔案
接著進行分析,根據這個檔案所在的檔案系統,找pv的相關資訊,並且結合作業系統級別的errpt輸出,也發現了同樣的問題。
現在很明顯了,問題的根本原因是和儲存有關係,接著把支援SAN服務的人拉到chat裡面,確認他們那邊確實發現了問題,他們已經修復了這個問題。
一些想法: DB2要是能在db2diag.log裡面直接顯示發生錯誤的檔案就好了,就不用多方面查詢,猜測了,這個猜測還是需要DBA來做的的,AIX的人無法根據db2diag.log裡面的錯誤資訊判斷哪個檔案出現了問題。
另外一點就是,一般發生錯誤的時候,在errpt的輸出裡面都會有相關的資訊的,所以多個方面一起確認是穩妥的方法。
MESSAGE : ZRC=0x840F0001=-2079391743=SQLO_ACCD "Access Denied"
DIA8701C Access denied for resource "", operating system return code
was "".
CALLED : OS, -, pwrite
OSERR : EROFS (30) "Read-only file system"
DATA #1 : File handle, PD_TYPE_SQO_FILE_HDL, 8 bytes
File Handle = 96
File System Block Size = 4096 bytes
File System Type = jfs2
File Handle Flags :
Require Sector Align = No
DIO/CIO Mode = Yes
Raw Block Device = No
Reserved Handle = No
Flush On Close = No
Thread-Level Lock = No
Write-through Mode = Yes
File Not Tracked = Yes
DATA #2 : unsigned integer, 8 bytes
2048
DATA #3 : signed integer, 8 bytes
392339456
DATA #4 : signed integer, 8 bytes
-1
DATA #5 : String, 105 bytes
Search for ossError*Analysis probe point after this log entry for further
self-diagnosis of this problem.
FUNCTION: DB2 Common, OSSe, ossErrorIOAnalysis, probe:100
CALLED : OS, -, pwrite
OSERR : EROFS (30) "Read-only file system"
DATA #1 : String, 116 bytes
A total of 3 analysis will be performed :
- User info
- Target file info
- I/O attempt
Target file handle = 96
DATA #2 : String, 184 bytes
Real user ID of current process = 213
Effective user ID of current process = 213
Real group ID of current process = 214
Effective group ID of current process = 214
DATA #3 : String, 41 bytes
current sbrk(0) value: 0x0000000128d55b40
DATA #4 : String, 263 bytes
Target File Information :
Size = 1073741824
Link = No
Reference path = N/A
Type = 0x8000
Permissions = rw-------
UID = 213
GID = 214
Last modified time = 1506505822
DATA #5 : String, 33 bytes
I/O attempt not implemented yet.
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x0900000014669E4C pdOSSeLoggingCallback + 0x59C
[1] 0x090000000D82337C oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1BC
[2] 0x090000000D8237F0 ossLogSysRC + 0x70
[3] 0x090000000D84BCBC ossErrorIOAnalysis__FCPC21OSSErrorAnalysisParam + 0xD7C
[4] 0x09000000164C3A84 sqloSystemErrorHandler + 0x518
[5] 0x09000000144CDCE4 sqloseekwrite64 + 0x68C
[6] 0x09000000144E8128 sqloWriteBlocks + 0x364
[7] 0x09000000144E7D34 sqlbWriteBlocks__FP16SqlbOpenFileInfoPvlUlUiPUlP12SQLB_GLOBALS + 0x58
[8] 0x0900000016B9759C sqlbDMSMapAndWrite__FP16SQLB_OBJECT_DESC14SQLB_AS_SCHEMEUiUsP12SQLB_POOL_CBPllPUiT8T7PcT8P12SQLB_GLOBALS + 0x254
[9] 0x0900000016B97B6C sqlbDMSMapAndWrite__FP16SQLB_OBJECT_DESC14SQLB_AS_SCHEMEUiUsP12SQLB_POOL_CBPllPUiT8T7PcT8P12SQLB_GLOBALS + 0x824
[10] 0x0900000015F1646C sqlbDMSDirectWrite__FP20SQLB_DIRECT_WRITE_CB + 0x4574
[11] 0x0900000017A9D684 sqlbDirectWrite__FP20SQLB_DIRECT_WRITE_CB + 0x2DC
[12] 0x0900000016C1A0B8 sqldx_diskwrite__FP9SQLDX_LWAUiiPcT2 + 0x564
[13] 0x0900000016C1B0C8 sqldxWriteLobData__FP9SQLDX_LWAUcUsPC12SQLDX_IOPARMPc + 0xA74
[14] 0x0900000012F98C5C sqldxReplaceLob__FP9SQLDX_LWAUcUsP8SQLDX_LDUiT5PcT5 + 0x234
[15] 0x0900000018ACA938 sqldxCreateLob__FP9SQLDX_LWAUcUsiT4PcT4P8SQLDX_LD + 0x388
[16] 0x0900000017C36858 sqldxLobCreate__FP8sqeAgentP8SQLD_TCBUcUsiT5PcT5P8SQLDX_LD + 0x6DC
[17] 0x0900000012E8C25C sqldmadx__FP8sqeAgentP8SQLD_TCBUcUsP9sqllf_csoiT6P8SQLD_LFDPUl + 0x858
[18] 0x090000001398BE78 @114@sqldFormatLobLongCol__FP8sqeAgentP10sqlz_valueT2P8SQLD_TCBiPiT2 + 0x900
[19] 0x0900000014E98E9C @114@sqldFormatNonDMSCols__FP8sqeAgentP13SQLD_TDATARECT2PP10SQLD_VALUEP8SQLD_TCBPsPiT7P8SQLD_CCBP17SQLU_FORMATTER_CBP8SQLD_VAL + 0xA54
[20] 0x0900000014E997A4 sqldFullFormatFixedVar__FP8sqeAgentP13SQLD_TDATARECP15SQLD_TDATAREC32T2PP15SQLD_TDATAREC32PP10SQLD_VALUEiP8SQLD_TCBPiPUcP15SQLD + 0x52C
[21] 0x0900000014579830 sqldRowInsert__FP8sqeAgentUsT2UcUliPP10SQLD_VALUEP8SQLZ_RIDPPv + 0x3B00
[22] 0x09000000145915C0 sqldRowInsert__FP8sqeAgentUsT2UcUliPP10SQLD_VALUEP8SQLZ_RIDPPv + 0x1A0
[23] 0x0900000014590F60 sqlrinsr__FP8sqlrr_cbUsT2iT2PP10SQLD_VALUEQ3_10sqlri_iudo11t_iudoFlags17t_iudoFlagsKernelP8SQLZ_RIDPPv + 0xD0
[24] 0x09000000145911B8 sqlriisr__FP8sqlrr_cb + 0x17C
[25] 0x09000000147C089C sqlriSectInvoke__FP8sqlrr_cbP12sqlri_opparm + 0x74
[26] 0x0900000014A7E5CC sqlrr_execute__FP14db2UCinterfaceP9UCstpInfo + 0xF9C
[27] 0x0900000014A7DA24 sqlrr_execute__FP14db2UCinterfaceP9UCstpInfo + 0x3F4
[28] 0x0900000014B97730 sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x1834
[29] 0x0900000014B92D94 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x11B4
[30] 0x0900000014B92D94 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x11B4
[31] 0x0900000014B93270 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x1690
[32] 0x09000000130A2608 @72@sqljsDriveRequests__FP8sqeAgentP14db2UCconHandle + 0xA8
[33] 0x09000000130A3118 @72@sqljsDrdaAsInnerDriver__FP18SQLCC_INITSTRUCT_Tb + 0x5F8
[34] 0x09000000152CFB30 RunEDU__8sqeAgentFv + 0x4C784
[35] 0x09000000152400FC RunEDU__8sqeAgentFv + 0x124
[36] 0x0900000014153EA8 EDUDriver__9sqzEDUObjFv + 0x130
[37] 0x0900000015260034 sqloEDUEntry + 0x390
[38] 0x090000000051AE10 _pthread_body + 0xF0
[39] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
在db2diag.log裡面發現了一些error資訊,db2 instance 發生了crash,並且自動重啟了,這造成了dprop的停止,crash的時候在檔案db2diag.log的目錄下生成了大量的db2 FODC檔案,造成檔案系統滿。
那db2 instance crash根本的原因是什麼那?
從下面的error log裡面,不難猜測是來自作業系統檔案訪問造成的。
從DMS這個關鍵字眼可以看出,是向一個DMS的表空間寫資料
從LOB這個關鍵字眼可以看出,是向放LOB資料的表空間寫資料
這些資訊還不足以找到這個檔案,那麼看下面的這個關於檔案大小的資訊
Target File Information :
Size = 1073741824
對資料庫的儲存目錄進行查詢
find "db path for DMS tablespace" -type f -print | xargs ls -l | sort -k5,5rn | head -30
終於找到了這個檔案
接著進行分析,根據這個檔案所在的檔案系統,找pv的相關資訊,並且結合作業系統級別的errpt輸出,也發現了同樣的問題。
現在很明顯了,問題的根本原因是和儲存有關係,接著把支援SAN服務的人拉到chat裡面,確認他們那邊確實發現了問題,他們已經修復了這個問題。
一些想法: DB2要是能在db2diag.log裡面直接顯示發生錯誤的檔案就好了,就不用多方面查詢,猜測了,這個猜測還是需要DBA來做的的,AIX的人無法根據db2diag.log裡面的錯誤資訊判斷哪個檔案出現了問題。
另外一點就是,一般發生錯誤的時候,在errpt的輸出裡面都會有相關的資訊的,所以多個方面一起確認是穩妥的方法。
MESSAGE : ZRC=0x840F0001=-2079391743=SQLO_ACCD "Access Denied"
DIA8701C Access denied for resource "", operating system return code
was "".
CALLED : OS, -, pwrite
OSERR : EROFS (30) "Read-only file system"
DATA #1 : File handle, PD_TYPE_SQO_FILE_HDL, 8 bytes
File Handle = 96
File System Block Size = 4096 bytes
File System Type = jfs2
File Handle Flags :
Require Sector Align = No
DIO/CIO Mode = Yes
Raw Block Device = No
Reserved Handle = No
Flush On Close = No
Thread-Level Lock = No
Write-through Mode = Yes
File Not Tracked = Yes
DATA #2 : unsigned integer, 8 bytes
2048
DATA #3 : signed integer, 8 bytes
392339456
DATA #4 : signed integer, 8 bytes
-1
DATA #5 : String, 105 bytes
Search for ossError*Analysis probe point after this log entry for further
self-diagnosis of this problem.
FUNCTION: DB2 Common, OSSe, ossErrorIOAnalysis, probe:100
CALLED : OS, -, pwrite
OSERR : EROFS (30) "Read-only file system"
DATA #1 : String, 116 bytes
A total of 3 analysis will be performed :
- User info
- Target file info
- I/O attempt
Target file handle = 96
DATA #2 : String, 184 bytes
Real user ID of current process = 213
Effective user ID of current process = 213
Real group ID of current process = 214
Effective group ID of current process = 214
DATA #3 : String, 41 bytes
current sbrk(0) value: 0x0000000128d55b40
DATA #4 : String, 263 bytes
Target File Information :
Size = 1073741824
Link = No
Reference path = N/A
Type = 0x8000
Permissions = rw-------
UID = 213
GID = 214
Last modified time = 1506505822
DATA #5 : String, 33 bytes
I/O attempt not implemented yet.
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x0900000014669E4C pdOSSeLoggingCallback + 0x59C
[1] 0x090000000D82337C oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1BC
[2] 0x090000000D8237F0 ossLogSysRC + 0x70
[3] 0x090000000D84BCBC ossErrorIOAnalysis__FCPC21OSSErrorAnalysisParam + 0xD7C
[4] 0x09000000164C3A84 sqloSystemErrorHandler + 0x518
[5] 0x09000000144CDCE4 sqloseekwrite64 + 0x68C
[6] 0x09000000144E8128 sqloWriteBlocks + 0x364
[7] 0x09000000144E7D34 sqlbWriteBlocks__FP16SqlbOpenFileInfoPvlUlUiPUlP12SQLB_GLOBALS + 0x58
[8] 0x0900000016B9759C sqlbDMSMapAndWrite__FP16SQLB_OBJECT_DESC14SQLB_AS_SCHEMEUiUsP12SQLB_POOL_CBPllPUiT8T7PcT8P12SQLB_GLOBALS + 0x254
[9] 0x0900000016B97B6C sqlbDMSMapAndWrite__FP16SQLB_OBJECT_DESC14SQLB_AS_SCHEMEUiUsP12SQLB_POOL_CBPllPUiT8T7PcT8P12SQLB_GLOBALS + 0x824
[10] 0x0900000015F1646C sqlbDMSDirectWrite__FP20SQLB_DIRECT_WRITE_CB + 0x4574
[11] 0x0900000017A9D684 sqlbDirectWrite__FP20SQLB_DIRECT_WRITE_CB + 0x2DC
[12] 0x0900000016C1A0B8 sqldx_diskwrite__FP9SQLDX_LWAUiiPcT2 + 0x564
[13] 0x0900000016C1B0C8 sqldxWriteLobData__FP9SQLDX_LWAUcUsPC12SQLDX_IOPARMPc + 0xA74
[14] 0x0900000012F98C5C sqldxReplaceLob__FP9SQLDX_LWAUcUsP8SQLDX_LDUiT5PcT5 + 0x234
[15] 0x0900000018ACA938 sqldxCreateLob__FP9SQLDX_LWAUcUsiT4PcT4P8SQLDX_LD + 0x388
[16] 0x0900000017C36858 sqldxLobCreate__FP8sqeAgentP8SQLD_TCBUcUsiT5PcT5P8SQLDX_LD + 0x6DC
[17] 0x0900000012E8C25C sqldmadx__FP8sqeAgentP8SQLD_TCBUcUsP9sqllf_csoiT6P8SQLD_LFDPUl + 0x858
[18] 0x090000001398BE78 @114@sqldFormatLobLongCol__FP8sqeAgentP10sqlz_valueT2P8SQLD_TCBiPiT2 + 0x900
[19] 0x0900000014E98E9C @114@sqldFormatNonDMSCols__FP8sqeAgentP13SQLD_TDATARECT2PP10SQLD_VALUEP8SQLD_TCBPsPiT7P8SQLD_CCBP17SQLU_FORMATTER_CBP8SQLD_VAL + 0xA54
[20] 0x0900000014E997A4 sqldFullFormatFixedVar__FP8sqeAgentP13SQLD_TDATARECP15SQLD_TDATAREC32T2PP15SQLD_TDATAREC32PP10SQLD_VALUEiP8SQLD_TCBPiPUcP15SQLD + 0x52C
[21] 0x0900000014579830 sqldRowInsert__FP8sqeAgentUsT2UcUliPP10SQLD_VALUEP8SQLZ_RIDPPv + 0x3B00
[22] 0x09000000145915C0 sqldRowInsert__FP8sqeAgentUsT2UcUliPP10SQLD_VALUEP8SQLZ_RIDPPv + 0x1A0
[23] 0x0900000014590F60 sqlrinsr__FP8sqlrr_cbUsT2iT2PP10SQLD_VALUEQ3_10sqlri_iudo11t_iudoFlags17t_iudoFlagsKernelP8SQLZ_RIDPPv + 0xD0
[24] 0x09000000145911B8 sqlriisr__FP8sqlrr_cb + 0x17C
[25] 0x09000000147C089C sqlriSectInvoke__FP8sqlrr_cbP12sqlri_opparm + 0x74
[26] 0x0900000014A7E5CC sqlrr_execute__FP14db2UCinterfaceP9UCstpInfo + 0xF9C
[27] 0x0900000014A7DA24 sqlrr_execute__FP14db2UCinterfaceP9UCstpInfo + 0x3F4
[28] 0x0900000014B97730 sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x1834
[29] 0x0900000014B92D94 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x11B4
[30] 0x0900000014B92D94 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x11B4
[31] 0x0900000014B93270 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x1690
[32] 0x09000000130A2608 @72@sqljsDriveRequests__FP8sqeAgentP14db2UCconHandle + 0xA8
[33] 0x09000000130A3118 @72@sqljsDrdaAsInnerDriver__FP18SQLCC_INITSTRUCT_Tb + 0x5F8
[34] 0x09000000152CFB30 RunEDU__8sqeAgentFv + 0x4C784
[35] 0x09000000152400FC RunEDU__8sqeAgentFv + 0x124
[36] 0x0900000014153EA8 EDUDriver__9sqzEDUObjFv + 0x130
[37] 0x0900000015260034 sqloEDUEntry + 0x390
[38] 0x090000000051AE10 _pthread_body + 0xF0
[39] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/725820/viewspace-2145563/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- DB2 CRASHDB2
- 記一次全民K歌的crash定位過程
- mysql效能測試庫的一次CRASH恢復MySql
- DB2效能調優DB2
- 請問面試官?一次很有意思的調查活動面試
- 記一次記憶體佔用問題的調查過程記憶體
- 記一次Linux核心崩潰:kdump,crash,vmcoreLinux
- 一次DB2 DR演練DB2
- Swrve:調查顯示19%的遊戲玩家只會開啟一次遊戲遊戲
- db2查殺死鎖DB2
- 一次上線遇到的DB2 dprop問題DB2
- JVM調整:應對jvm crash 【轉自螞蟻小窩】JVM
- DB2的4種查詢許可權DB2
- 一次 JVM 調優的筆記JVM筆記
- 使用表查詢監控DB2的死鎖DB2
- DB2 Event Monitor使用與查詢DB2
- DB2查詢資料庫大小DB2資料庫
- 春節一次較波折的MySQL調優MySql
- 一次 rman 備份策略的調整
- Oracle鎖調查的指令碼Oracle指令碼
- IT職業生涯管理的調查
- 關於記一次 Go 服務記憶體洩漏問題調查Go記憶體
- 渠道查:解析問卷調查的秘密武器
- 檢查DB2 include目錄下的.h檔案DB2
- 記一次我的 MySQL 調優經歷MySql
- 一次快速排序引發的jvm調優排序JVM
- python效能調優的一次記錄Python
- postgres crash recovery
- SAP 錯誤日誌的調查
- 敏捷應用的現狀調查敏捷
- FlexJobs:遠端工作調查Flex
- qtick 題客調查QT
- 一次分頁查詢的優化優化
- iOS Crash的捕獲知識iOS
- 記一次SQL調優過程SQL
- SQL Server一次SQL調優案例SQLServer
- 一次rman備份 策略調整
- mysql效能的檢查和調優方法MySql