[RAC] An incorrectly set system time shut down the cluster resources and database on one RAC node

Posted by xysoul_雲龍 on 2016-02-19


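        In the afternoon a colleague phoned to say that the database on the second node of the database appliance (RAC) could not be reached, and asked me to take a look. I logged in and checked the cluster status from the first node: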


    [grid@pwjkdb01 ~]$ crs_stat -t
    Name           Type           Target    State     Host
    ------------------------------------------------------------
    ora....PWJK.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora.DBFS_DG.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora....ER.lsnr ora....er.type ONLINE    ONLINE    pwjkdb01
    ora....N1.lsnr ora....er.type ONLINE    OFFLINE
    ora....PWJK.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora.asm        ora.asm.type   ONLINE    ONLINE    pwjkdb01
    ora.cvu        ora.cvu.type   ONLINE    OFFLINE
    ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
    ora....network ora....rk.type ONLINE    ONLINE    pwjkdb01
    ora.oc4j       ora.oc4j.type  ONLINE    OFFLINE
    ora.ons        ora.ons.type   ONLINE    ONLINE    pwjkdb01
    ora.pwdata.db  ora....se.type ONLINE    ONLINE    pwjkdb01
    ora....rv1.svc ora....ce.type ONLINE    ONLINE    pwjkdb01
    ora....rv2.svc ora....ce.type ONLINE    ONLINE    pwjkdb01
    ora....SM1.asm application    ONLINE    ONLINE    pwjkdb01
    ora....01.lsnr application    ONLINE    ONLINE    pwjkdb01
    ora....b01.gsd application    OFFLINE   OFFLINE
    ora....b01.ons application    ONLINE    ONLINE    pwjkdb01
    ora....b01.vip ora....t1.type ONLINE    ONLINE    pwjkdb01
    ora....b02.vip ora....t1.type ONLINE    OFFLINE
    ora.scan1.vip  ora....ip.type ONLINE    OFFLINE
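
In the listing above, the resources worth attention are the ones whose Target is ONLINE while their State is OFFLINE (ora....N1.lsnr, ora.cvu, ora.oc4j, ora....b02.vip and ora.scan1.vip). A minimal sketch of how that filter could be automated is shown here; the function name `find_down_resources` is made up for this example, and it assumes the five-column `crs_stat -t` layout shown above. Note that `crs_stat` is deprecated in 11.2 in favour of `crsctl status resource -t`, whose output is laid out differently, so this filter would need adapting for it.

```shell
#!/bin/sh
# Hypothetical helper: reads `crs_stat -t` style output on stdin and prints
# the names of resources that should be running (Target ONLINE) but are
# not (State OFFLINE). Assumes columns: Name Type Target State [Host].
find_down_resources() {
  awk '$3 == "ONLINE" && $4 == "OFFLINE" { print $1 }'
}

# Example use on a live system (not executed here):
#   crs_stat -t | find_down_resources
```

Resources such as ora.gsd, whose Target is also OFFLINE, are intentionally skipped because they are not expected to run.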
I then logged in to the second node and checked a few things:



    [root@pwjkdb02 ~]# ps -ef |grep pmon
    root 6679 1 0 2015 ? 00:01:42 /usr/bin/perl -w /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -server
    root 63055 62491 0 16:45 pts/1 00:00:00 grep pmon
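
Neither line in that output is an Oracle instance: `grep pmon` matched the `compmon` path of the Exadata monitoring script and the grep command itself, so no ASM or database PMON process was running on node 2. A slightly stricter check could look like the sketch below. The function name `list_pmon` is made up for this example; it relies on the usual background-process naming (`asm_pmon_<instance>` for ASM, `ora_pmon_<instance>` for a database) and on the eight-column `ps -ef` layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef` output on stdin and prints only real
# PMON background processes, ignoring lines that merely contain "pmon"
# somewhere in a path or in the grep command itself.
list_pmon() {
  awk '$8 ~ /^(ora|asm)_pmon_/ { print $8 }'
}

# Example use on a live system (not executed here):
#   ps -ef | list_pmon
```

An empty result means no instance is up on the node, which is what the original `ps` output was showing less clearly.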

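Because the business system depended on this node, I briefly checked the system time and uptime and then tried to start the cluster resources on node 2: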



    [root@pwjkdb02 bin]# ./crsctl start crs
    CRS-4640: Oracle High Availability Services is already active
    CRS-4000: Command Start failed, or completed with errors.
    [root@pwjkdb02 bin]# ./crsctl start cluster
    CRS-2672: Attempting to start 'ora.asm' on 'pwjkdb02'
    CRS-2676: Start of 'ora.asm' on 'pwjkdb02' succeeded
    CRS-2672: Attempting to start 'ora.crsd' on 'pwjkdb02'
    CRS-2676: Start of 'ora.crsd' on 'pwjkdb02' succeeded
    [root@pwjkdb02 bin]#

Node 2 started normally, as shown below:


    [grid@pwjkdb02 ~]$ crs_stat -t
    Name           Type           Target    State     Host
    ------------------------------------------------------------
    ora....PWJK.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora.DBFS_DG.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora....ER.lsnr ora....er.type ONLINE    ONLINE    pwjkdb01
    ora....N1.lsnr ora....er.type ONLINE    ONLINE    pwjkdb02
    ora....PWJK.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora.asm        ora.asm.type   ONLINE    ONLINE    pwjkdb01
    ora.cvu        ora.cvu.type   ONLINE    ONLINE    pwjkdb02
    ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
    ora....network ora....rk.type ONLINE    ONLINE    pwjkdb01
    ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    pwjkdb02
    ora.ons        ora.ons.type   ONLINE    ONLINE    pwjkdb01
    ora.pwdata.db  ora....se.type ONLINE    ONLINE    pwjkdb01
    ora....rv1.svc ora....ce.type ONLINE    ONLINE    pwjkdb01
    ora....rv2.svc ora....ce.type ONLINE    ONLINE    pwjkdb01
    ora....SM1.asm application    ONLINE    ONLINE    pwjkdb01
    ora....01.lsnr application    ONLINE    ONLINE    pwjkdb01
    ora....b01.gsd application    OFFLINE   OFFLINE
    ora....b01.ons application    ONLINE    ONLINE    pwjkdb01
    ora....b01.vip ora....t1.type ONLINE    ONLINE    pwjkdb01
    ora....SM2.asm application    ONLINE    ONLINE    pwjkdb02
    ora....02.lsnr application    ONLINE    ONLINE    pwjkdb02
    ora....b02.gsd application    OFFLINE   OFFLINE
    ora....b02.ons application    ONLINE    ONLINE    pwjkdb02
    ora....b02.vip ora....t1.type ONLINE    ONLINE    pwjkdb02
    ora.scan1.vip  ora....ip.type ONLINE    ONLINE    pwjkdb02

After the restart, I went through some of the logs.

Clusterware alert log on node 2 (alertpwjkdb02.log):


    tail -100f alertpwjkdb02.log

    4016-01-02 16:26:00.736
    [/u01/app/11.2.0.3/grid/bin/oraagent.bin(48051)]CRS-5011:Check of resource "pwdata" failed: details at "(:CLSN00007:)" in "/u01/app/11.2.0.3/grid/log/pwjkdb02/agent/crsd/oraagent_oracle/oraagent_oracle.log"
    4016-01-02 16:26:01.262
    [crsd(9981)]CRS-2765:Resource 'ora.pwdata.db' has failed on server 'pwjkdb02'.
    4016-01-02 16:26:01.329
    [crsd(9981)]CRS-2765:Resource 'ora.pwdata.pwdatasrv2.svc' has failed on server 'pwjkdb02'.
    4016-01-02 16:26:01.329
    [crsd(9981)]CRS-2771:Maximum restart attempts reached for resource 'ora.pwdata.pwdatasrv2.svc'; will not restart.
    4016-01-02 16:26:01.510
    [/u01/app/11.2.0.3/grid/bin/oraagent.bin(8988)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0.3/grid/log/pwjkdb02/agent/ohasd/oraagent_grid/oraagent_grid.log"
    4016-01-02 16:26:01.614
    [ohasd(6722)]CRS-2765:Resource 'ora.asm' has failed on server 'pwjkdb02'.
    4016-01-02 16:26:01.663
    [/u01/app/11.2.0.3/grid/bin/oraagent.bin(8988)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0.3/grid/log/pwjkdb02/agent/ohasd/oraagent_grid/oraagent_grid.log"
    ……………………………………
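
The giveaway in this log is the year 4016 on every timestamp. In a long log that is easy to scroll past, so a small filter that flags timestamps outside a plausible range can help. The sketch below is only an illustration: the function name `flag_odd_years` and the default range 2000 to 2100 are assumptions, and it expects lines that start with a `YYYY-MM-DD` date as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: reads a log on stdin and prints "lineno: line" for
# every line that begins with a YYYY-MM-DD date whose year falls outside
# [lo, hi]. Defaults (2000..2100) are arbitrary and can be overridden.
flag_odd_years() {
  awk -v lo="${1:-2000}" -v hi="${2:-2100}" '
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      year = substr($0, 1, 4) + 0      # numeric year from the first 4 chars
      if (year < lo || year > hi) print NR ": " $0
    }'
}

# Example use (not executed here):
#   flag_odd_years < alertpwjkdb02.log
```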
The ASM instance alert log, shown below, reports that the error was caused by the current operating system time:



    tail -600f alert_+ASM2.log |more

    Warning: VKTM detected a time drift.
    Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
    Tue Jan 12 00:39:25 2016
    Warning: VKTM detected a time drift.
    Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
    Sat Jan 02 16:26:00 4016
    Warning: VKTM detected a time drift.
    Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
    Sat Jan 02 16:26:00 4016
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_pmon_9913.trc:
    ORA-01513: invalid current time returned by operating system
    PMON (ospid: 9913): terminating the instance due to error 1513
    Sat Jan 02 16:26:01 4016
    System state dump requested by (instance=2, osid=9913 (PMON)), summary=[abnormal instance termination].
    System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_9923.trc
    Dumping diagnostic data in directory=[cdmp_19740622152201], requested by (instance=2, osid=9913 (PMON)), summary=[abnormal instance termination].
    Sat Jan 02 16:26:01 4016
    ORA-1092 : opitsk aborting process
    Sat Jan 02 16:26:01 4016
    License high water mark = 24
    Instance terminated by PMON, pid = 9913
    USER (ospid: 49268): terminating the instance


Next I looked at the shell command history on that node:


    vi .bash_profile
    su - oracle
    #1454306838
    sar 1 5
    #1454315175
    date
    #1454315199
    date 010216264016.00
    #64565627161
    date
    #64565627177
    date 0102162716.00
    #1451723220
    date
    #1451723223
    date
    #1451723225
    date
    #1451723227
    date
    #1451723233
    date
    #1451723250
    date
    #1451723302
    date
    #1451723310
    date
    #1451723315
    date
    #1451723323
    su - oracle
    #1451723446
    date
    #1451723534
    date 0201163016.00
    #1454315401
    date
    #1454315505
    date 0201163416.00
    #1454315642
    date
    #1454315647
    date
    #1454315648
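
The `#NNNN` lines in this history are Unix epoch timestamps, each recording when the command on the following line was run. Decoding them makes the sequence of events much easier to follow. The sketch below is one way to do that; the function name `annotate_history` is made up for this example, it assumes GNU `date` (for `-d @epoch`), and it prints UTC to stay independent of the local time zone. The log timestamps above are consistent with the server running at UTC+8, which is an inference rather than something the history states.

```shell
#!/bin/sh
# Hypothetical helper: reads a timestamped shell history on stdin and
# replaces each "#<epoch>" line with a readable UTC time; command lines
# are passed through indented. Requires GNU date for "-d @<epoch>".
annotate_history() (
  while IFS= read -r line; do
    ts=${line#\#}                       # text after a leading '#', if any
    case $line in
      \#*)
        case $ts in
          ''|*[!0-9]*) printf '%s\n' "$line" ;;   # a comment, not an epoch
          *) date -u -d "@$ts" '+[%Y-%m-%d %H:%M:%S UTC]' ;;
        esac ;;
      *) printf '    %s\n' "$line" ;;
    esac
  done
)

# Example use (not executed here):
#   annotate_history < ~/.bash_history
```

Applied to the history above, `#1454315199` decodes to 2016-02-01 08:26:39 UTC, the moment the faulty `date` command was entered, and the very next entry, `#64565627161`, decodes to 4016-01-02 08:26:01 UTC, showing the clock had already jumped two thousand years ahead.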

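In the history above we find this entry: date 010216264016.00. Together with the alert logs and the other cluster logs, it confirms that changing the operating system time is what shut down the cluster stack on RAC node 2. When we talked it over by phone, the colleague explained that the system clock had been 5 minutes slow and that the fix was made directly at the operating system level (note: be careful when changing the OS time, especially while a database is running, so that business applications are not affected). Because the colleague was not familiar with the command, the time was set to the year 4016, which caused the problems above.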
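
To see why that command landed in the year 4016: when `date` is given a bare numeric argument it reads it as `MMDDhhmm[[CC]YY][.ss]`, so `010216264016.00` means month 01, day 02, 16:26, year 4016, second 00. The sketch below decodes such an argument without touching the clock. The function name `decode_date_arg` is made up for this example, and the two-digit-year rule (69 to 99 map to 19xx, 00 to 68 to 20xx) follows the POSIX convention.

```shell
#!/bin/sh
# Hypothetical helper: decodes a `date MMDDhhmm[[CC]YY][.ss]` argument and
# prints the timestamp it would set, as "YYYY-MM-DD hh:mm:ss".
# It only parses the string; it never changes the system clock.
decode_date_arg() (
  arg=$1
  main=${arg%%.*}                       # digits before the optional ".ss"
  sec=00
  case $arg in
    *.*) sec=${arg#*.} ;;
  esac
  case $main in
    ''|*[!0-9]*) echo "not a numeric date argument: $arg" >&2; exit 1 ;;
  esac
  [ "${#main}" -ge 8 ] || { echo "too short: $arg" >&2; exit 1; }
  mm=$(printf '%s' "$main" | cut -c1-2)
  dd=$(printf '%s' "$main" | cut -c3-4)
  hh=$(printf '%s' "$main" | cut -c5-6)
  mi=$(printf '%s' "$main" | cut -c7-8)
  rest=$(printf '%s' "$main" | cut -c9-)
  case ${#rest} in
    0) year=$(date +%Y) ;;              # year omitted: current year is kept
    2) if [ "${rest#0}" -ge 69 ]; then year=19$rest; else year=20$rest; fi ;;
    4) year=$rest ;;
    *) echo "unexpected year field: $rest" >&2; exit 1 ;;
  esac
  printf '%s-%s-%s %s:%s:%s\n' "$year" "$mm" "$dd" "$hh" "$mi" "$sec"
)

decode_date_arg 010216264016.00   # the faulty command from the history
decode_date_arg 0201163416.00     # the final correction from the history
```

The first call prints 4016-01-02 16:26:00 and the second prints 2016-02-01 16:34:00. Running an argument through a decoder like this before passing it to `date` would have exposed both the swapped month and day and the wrong year.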

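Every operation carries risk. Before doing anything, we should prepare a plan, an operating procedure, a contingency plan and a risk assessment, and never change a production system on the assumption that it will simply work.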
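
As one small, concrete form of that planning, a manual clock change can be wrapped in a guard that refuses any jump larger than expected. The sketch below only checks and reports; it never sets the clock. The function name `check_time_change` and the 600-second default limit are assumptions for this example, and it relies on GNU `date -d` to parse the target. On a cluster node, a small offset such as 5 minutes is usually better corrected gradually by the time-synchronisation service than by stepping the clock by hand.

```shell
#!/bin/sh
# Hypothetical guard: check_time_change TARGET [MAX_DELTA_SECONDS] [NOW_EPOCH]
# TARGET is anything GNU `date -d` understands, e.g. "2016-02-01 16:31:00"
# or "@1454315499". Exit status: 0 = within limit, 1 = refused, 2 = unparsable.
check_time_change() (
  target=$1
  max_delta=${2:-600}                   # assumed default: 10 minutes
  now=${3:-$(date +%s)}                 # overridable to make testing easy
  target_epoch=$(date -d "$target" +%s) || exit 2
  delta=$((target_epoch - now))
  if [ "$delta" -lt 0 ]; then delta=$((0 - delta)); fi
  if [ "$delta" -gt "$max_delta" ]; then
    echo "REFUSE: target is ${delta}s away from the current time (limit ${max_delta}s)"
    exit 1
  fi
  echo "OK: target is ${delta}s away from the current time"
)

# Example use before a manual change (not executed here):
#   check_time_change "2016-02-01 16:31:00" && echo "safe to proceed"
```

With a check like this in the procedure, a target in the year 4016 would be rejected long before it reached the `date` command.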

From "ITPUB Blog", link: http://blog.itpub.net/29487349/viewspace-1991263/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
