一次db2 crash的調查

zchbaby2000發表於2017-09-28
昨晚收到來自production DB2 server的兩封郵件,一個是dprop停了,另外一個是放db2diag.log的檔案系統滿了,於是開始調查原因。
在db2diag.log裡面發現了一些error資訊,db2 instance 發生了crash,並且自動重啟了,這造成了dprop的停止,crash的時候在檔案db2diag.log的目錄下生成了大量的db2 FODC檔案,造成檔案系統滿。
那db2 instance crash根本的原因是什麼那?
從下面的error log裡面,不難猜測是來自作業系統檔案訪問造成的。
從DMS這個關鍵字眼可以看出,是向一個DMS的表空間寫資料
從LOB這個關鍵字眼可以看出,是向放LOB資料的表空間寫資料
這些資訊還不足以找到這個檔案,那麼看下面的這個關於檔案大小的資訊
Target File Information :
  Size               = 1073741824
對資料庫的儲存目錄進行查詢
find "db path for DMS tablespace" -type f -print | xargs ls -l | sort -k5,5rn | head -30
終於找到了這個檔案
接著進行分析,根據這個檔案所在的檔案系統,找pv的相關資訊,並且結合作業系統級別的errpt輸出,也發現了同樣的問題。
現在很明顯了,問題的根本原因是和儲存有關係,接著把支援SAN服務的人拉到chat裡面,確認他們那邊確實發現了問題,他們已經修復了這個問題。

一些想法: DB2要是能在db2diag.log裡面直接顯示發生錯誤的檔案就好了,就不用多方面查詢,猜測了,這個猜測還是需要DBA來做的的,AIX的人無法根據db2diag.log裡面的錯誤資訊判斷哪個檔案出現了問題。
另外一點就是,一般發生錯誤的時候,在errpt的輸出裡面都會有相關的資訊的,所以多個方面一起確認是穩妥的方法。

MESSAGE : ZRC=0x840F0001=-2079391743=SQLO_ACCD "Access Denied"
          DIA8701C Access denied for resource "", operating system return code
          was "".
CALLED  : OS, -, pwrite
OSERR   : EROFS (30) "Read-only file system"
DATA #1 : File handle, PD_TYPE_SQO_FILE_HDL, 8 bytes
  File Handle              = 96
  File System Block Size   = 4096 bytes
  File System Type         = jfs2
  File Handle Flags :
    Require Sector Align   = No
    DIO/CIO Mode           = Yes
    Raw Block Device       = No
    Reserved Handle        = No
    Flush On Close         = No
    Thread-Level Lock      = No
    Write-through Mode     = Yes
    File Not Tracked       = Yes
DATA #2 : unsigned integer, 8 bytes
2048
DATA #3 : signed integer, 8 bytes
392339456
DATA #4 : signed integer, 8 bytes
-1
DATA #5 : String, 105 bytes
Search for ossError*Analysis probe point after this log entry for further
self-diagnosis of this problem.


FUNCTION: DB2 Common, OSSe, ossErrorIOAnalysis, probe:100
CALLED  : OS, -, pwrite
OSERR   : EROFS (30) "Read-only file system"
DATA #1 : String, 116 bytes
A total of 3 analysis will be performed :
 - User info
 - Target file info
 - I/O attempt
                                                       
 Target file handle = 96
DATA #2 : String, 184 bytes
  Real user ID of current process       = 213
  Effective user ID of current process  = 213
  Real group ID of current process      = 214
  Effective group ID of current process = 214
DATA #3 : String, 41 bytes
current sbrk(0) value: 0x0000000128d55b40
DATA #4 : String, 263 bytes
Target File Information :
  Size               = 1073741824
  Link               = No
  Reference path     = N/A
  Type               = 0x8000
  Permissions        = rw-------
  UID                = 213
  GID                = 214
  Last modified time = 1506505822               
DATA #5 : String, 33 bytes                             
I/O attempt not implemented yet.
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x0900000014669E4C pdOSSeLoggingCallback + 0x59C
  [1] 0x090000000D82337C oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1BC
  [2] 0x090000000D8237F0 ossLogSysRC + 0x70
  [3] 0x090000000D84BCBC ossErrorIOAnalysis__FCPC21OSSErrorAnalysisParam + 0xD7C
  [4] 0x09000000164C3A84 sqloSystemErrorHandler + 0x518
  [5] 0x09000000144CDCE4 sqloseekwrite64 + 0x68C
  [6] 0x09000000144E8128 sqloWriteBlocks + 0x364
  [7] 0x09000000144E7D34 sqlbWriteBlocks__FP16SqlbOpenFileInfoPvlUlUiPUlP12SQLB_GLOBALS + 0x58
  [8] 0x0900000016B9759C sqlbDMSMapAndWrite__FP16SQLB_OBJECT_DESC14SQLB_AS_SCHEMEUiUsP12SQLB_POOL_CBPllPUiT8T7PcT8P12SQLB_GLOBALS + 0x254
  [9] 0x0900000016B97B6C sqlbDMSMapAndWrite__FP16SQLB_OBJECT_DESC14SQLB_AS_SCHEMEUiUsP12SQLB_POOL_CBPllPUiT8T7PcT8P12SQLB_GLOBALS + 0x824
  [10] 0x0900000015F1646C sqlbDMSDirectWrite__FP20SQLB_DIRECT_WRITE_CB + 0x4574
  [11] 0x0900000017A9D684 sqlbDirectWrite__FP20SQLB_DIRECT_WRITE_CB + 0x2DC
  [12] 0x0900000016C1A0B8 sqldx_diskwrite__FP9SQLDX_LWAUiiPcT2 + 0x564
  [13] 0x0900000016C1B0C8 sqldxWriteLobData__FP9SQLDX_LWAUcUsPC12SQLDX_IOPARMPc + 0xA74
  [14] 0x0900000012F98C5C sqldxReplaceLob__FP9SQLDX_LWAUcUsP8SQLDX_LDUiT5PcT5 + 0x234
  [15] 0x0900000018ACA938 sqldxCreateLob__FP9SQLDX_LWAUcUsiT4PcT4P8SQLDX_LD + 0x388
  [16] 0x0900000017C36858 sqldxLobCreate__FP8sqeAgentP8SQLD_TCBUcUsiT5PcT5P8SQLDX_LD + 0x6DC
  [17] 0x0900000012E8C25C sqldmadx__FP8sqeAgentP8SQLD_TCBUcUsP9sqllf_csoiT6P8SQLD_LFDPUl + 0x858
  [18] 0x090000001398BE78 @114@sqldFormatLobLongCol__FP8sqeAgentP10sqlz_valueT2P8SQLD_TCBiPiT2 + 0x900
  [19] 0x0900000014E98E9C @114@sqldFormatNonDMSCols__FP8sqeAgentP13SQLD_TDATARECT2PP10SQLD_VALUEP8SQLD_TCBPsPiT7P8SQLD_CCBP17SQLU_FORMATTER_CBP8SQLD_VAL + 0xA54
  [20] 0x0900000014E997A4 sqldFullFormatFixedVar__FP8sqeAgentP13SQLD_TDATARECP15SQLD_TDATAREC32T2PP15SQLD_TDATAREC32PP10SQLD_VALUEiP8SQLD_TCBPiPUcP15SQLD + 0x52C
  [21] 0x0900000014579830 sqldRowInsert__FP8sqeAgentUsT2UcUliPP10SQLD_VALUEP8SQLZ_RIDPPv + 0x3B00
  [22] 0x09000000145915C0 sqldRowInsert__FP8sqeAgentUsT2UcUliPP10SQLD_VALUEP8SQLZ_RIDPPv + 0x1A0
  [23] 0x0900000014590F60 sqlrinsr__FP8sqlrr_cbUsT2iT2PP10SQLD_VALUEQ3_10sqlri_iudo11t_iudoFlags17t_iudoFlagsKernelP8SQLZ_RIDPPv + 0xD0
  [24] 0x09000000145911B8 sqlriisr__FP8sqlrr_cb + 0x17C
  [25] 0x09000000147C089C sqlriSectInvoke__FP8sqlrr_cbP12sqlri_opparm + 0x74
  [26] 0x0900000014A7E5CC sqlrr_execute__FP14db2UCinterfaceP9UCstpInfo + 0xF9C
  [27] 0x0900000014A7DA24 sqlrr_execute__FP14db2UCinterfaceP9UCstpInfo + 0x3F4
  [28] 0x0900000014B97730 sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x1834
  [29] 0x0900000014B92D94 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x11B4
  [30] 0x0900000014B92D94 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x11B4
  [31] 0x0900000014B93270 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x1690
  [32] 0x09000000130A2608 @72@sqljsDriveRequests__FP8sqeAgentP14db2UCconHandle + 0xA8
  [33] 0x09000000130A3118 @72@sqljsDrdaAsInnerDriver__FP18SQLCC_INITSTRUCT_Tb + 0x5F8
  [34] 0x09000000152CFB30 RunEDU__8sqeAgentFv + 0x4C784
  [35] 0x09000000152400FC RunEDU__8sqeAgentFv + 0x124
  [36] 0x0900000014153EA8 EDUDriver__9sqzEDUObjFv + 0x130
  [37] 0x0900000015260034 sqloEDUEntry + 0x390
  [38] 0x090000000051AE10 _pthread_body + 0xF0
  [39] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/725820/viewspace-2145563/,如需轉載,請註明出處,否則將追究法律責任。

相關文章