Oracle 11gR2: CRS crashes during fiber-channel link failover

Posted by wuweilong on 2014-11-12


 

Background:

We deployed Oracle 11gR2 (11.2.0.4) on Red Hat Enterprise Linux 5.8, with multipath link redundancy provided by RDAC. After deployment we had to verify that the redundancy actually works; the four fiber links involved are numbered ① through ④ in the tests below. We ran the following combinations of multipath tests:

Cable-pull test combination 1:

1. Unplug fiber links ② and ④ — everything remains normal. Plug them back in, wait five minutes, then perform step 2.

2. Unplug fiber links ① and ③ — the database service stays up, but the CRS processes crash and become inaccessible; manually restarting CRS brings them back.

Cable-pull test combination 2:

1. Unplug fiber links ① and ③ — everything remains normal. Plug them back in, wait five minutes, then perform step 2.

2. Unplug fiber links ② and ④ — the database service stays up, but the CRS processes crash and become inaccessible; manually restarting CRS brings them back.

Cable-pull test combination 3:

1. Unplug fiber links ① and ④ — everything remains normal. Plug them back in, wait five minutes, then perform step 2.

2. Unplug fiber links ② and ③ — everything remains normal.

Cable-pull test combination 4:

1. Unplug fiber links ② and ③ — everything remains normal. Plug them back in, wait five minutes, then perform step 2.

2. Unplug fiber links ① and ④ — everything remains normal.

Controller-switch test combination:

1. In the storage management console, confirm that the disks are currently owned by controller A, then manually switch them all to controller B — everything remains normal.

2. Five minutes later, go back into the storage management console and switch all disks from controller B back to controller A — everything remains normal.
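Between pulls it is worth confirming that every path has actually recovered before starting the next step; otherwise a crash in step 2 may just be leftover damage from step 1. A sketch of the checks, assuming the RDAC `mppUtil` utility is installed and the MPP driver exposes its state under `/proc/mpp` (flags and output vary by RDAC release — check your driver's README):

```
[root@db01 ~]# ls /proc/mpp/     # virtual arrays seen by the MPP driver
oracledb
[root@db01 ~]# mppUtil -a        # list arrays known to RDAC
[root@db01 ~]# mppUtil -g 0      # controller/path status for array index 0
```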


 

Symptom:

The problem occurred in step 2 of test combinations 1 and 2, and looked like this:

    [grid@db01 ~]$ crs_stat -t -v
    CRS-0184: Cannot communicate with the CRS daemon.

    [root@db01 ~]# ps -ef | grep ora
    oracle 2687 1 0 00:12 ? 00:00:00 ora_pmon_woo
    oracle 2689 1 0 00:12 ? 00:00:00 ora_psp0_woo
    oracle 2691 1 0 00:12 ? 00:00:00 ora_vktm_woo
    oracle 2695 1 0 00:12 ? 00:00:00 ora_gen0_woo
    oracle 2697 1 0 00:12 ? 00:00:00 ora_diag_woo
    oracle 2699 1 0 00:12 ? 00:00:00 ora_dbrm_woo
    oracle 2701 1 0 00:12 ? 00:00:00 ora_dia0_woo
    oracle 2703 1 0 00:12 ? 00:00:00 ora_mman_woo
    oracle 2705 1 0 00:12 ? 00:00:00 ora_dbw0_woo
    oracle 2707 1 0 00:12 ? 00:00:00 ora_lgwr_woo
    oracle 2709 1 0 00:12 ? 00:00:01 ora_ckpt_woo
    oracle 2711 1 0 00:12 ? 00:00:00 ora_smon_woo
    oracle 2713 1 0 00:12 ? 00:00:00 ora_reco_woo
    oracle 2715 1 0 00:12 ? 00:00:00 ora_mmon_woo
    oracle 2717 1 0 00:12 ? 00:00:00 ora_mmnl_woo
    oracle 2719 1 0 00:12 ? 00:00:00 ora_d000_woo
    oracle 2721 1 0 00:12 ? 00:00:00 ora_s000_woo
    oracle 2728 1 0 00:12 ? 00:00:00 ora_rvwr_woo

    SQL> select host_name,instance_name,status from gv$instance;

    HOST_NAME  INSTANCE_NAME    STATUS
    ---------- ---------------- ------------
    db01       woo              OPEN
    db02       woo              OPEN
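So the database instances stay OPEN while the CRS daemon itself is gone. A minimal sketch of the manual recovery on the affected node, using standard 11gR2 Grid Infrastructure commands (run as root from the grid home; this is the "manually restart CRS" step mentioned above):

```
[root@db01 ~]# crsctl check crs          # shows which daemons (CSS/CRS/EVM) are down
[root@db01 ~]# crsctl stop crs -f        # force-stop any leftover clusterware processes
[root@db01 ~]# crsctl start crs          # restart the full stack on this node
[root@db01 ~]# crsctl check cluster -all # verify both nodes afterwards
```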

Log analysis:

OS messages (/var/log/messages):

    Oct 30 13:48:23 db01 kernel: lpfc 0000:1b:00.0: 1:(0):0203 Devloss timeout on WWPN 20:34:00:80:e5:3f:7b:f0 NPort x0000e4 Data: x0 x7 x0
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:0 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 7 [RAIDarray.mpp]oracledb:0:0 Path Failed
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:0 No new path: fall to failover controller case. vcmnd SN 74635 pdev H8:C0:T0:L0 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:0 Failed controller to 1. retry. vcmnd SN 74635 pdev H8:C0:T0:L0 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:0 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:0 No new path: fall to failover controller case. vcmnd SN 74625 pdev H8:C0:T0:L0 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:0 Failed controller to 1. retry. vcmnd SN 74625 pdev H8:C0:T0:L0 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74645 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74645 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74644 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74644 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74643 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74643 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74642 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74642 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74634 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74634 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74632 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74632 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74631 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74631 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74624 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74624 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:23 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74622 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74622 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74621 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74621 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74620 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74620 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:1 No new path: fall to failover controller case. vcmnd SN 74619 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:1 Failed controller to 1. retry. vcmnd SN 74619 pdev H8:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:2 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:2 No new path: fall to failover controller case. vcmnd SN 74641 pdev H8:C0:T0:L2 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:2 Failed controller to 1. retry. vcmnd SN 74641 pdev H8:C0:T0:L2 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:2 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:2 No new path: fall to failover controller case. vcmnd SN 74633 pdev H8:C0:T0:L2 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:2 Failed controller to 1. retry. vcmnd SN 74633 pdev H8:C0:T0:L2 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:2 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:2 No new path: fall to failover controller case. vcmnd SN 74623 pdev H8:C0:T0:L2 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:2 Failed controller to 1. retry. vcmnd SN 74623 pdev H8:C0:T0:L2 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:3 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:3 No new path: fall to failover controller case. vcmnd SN 74636 pdev H8:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:3 Failed controller to 1. retry. vcmnd SN 74636 pdev H8:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:3 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:3 No new path: fall to failover controller case. vcmnd SN 74626 pdev H8:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:3 Failed controller to 1. retry. vcmnd SN 74626 pdev H8:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:3 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:3 No new path: fall to failover controller case. vcmnd SN 74615 pdev H8:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:3 Failed controller to 1. retry. vcmnd SN 74615 pdev H8:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:3 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:3 No new path: fall to failover controller case. vcmnd SN 74610 pdev H8:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:3 Failed controller to 1. retry. vcmnd SN 74610 pdev H8:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:4 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:4 No new path: fall to failover controller case. vcmnd SN 74637 pdev H8:C0:T0:L4 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:4 Failed controller to 1. retry. vcmnd SN 74637 pdev H8:C0:T0:L4 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:4 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:4 No new path: fall to failover controller case. vcmnd SN 74630 pdev H8:C0:T0:L4 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:4 Failed controller to 1. retry. vcmnd SN 74630 pdev H8:C0:T0:L4 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:4 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:4 No new path: fall to failover controller case. vcmnd SN 74614 pdev H8:C0:T0:L4 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:4 Failed controller to 1. retry. vcmnd SN 74614 pdev H8:C0:T0:L4 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:4 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:4 No new path: fall to failover controller case. vcmnd SN 74609 pdev H8:C0:T0:L4 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:4 Failed controller to 1. retry. vcmnd SN 74609 pdev H8:C0:T0:L4 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:5 Selection Retry count exhausted
    Oct 30 13:48:24 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:5 No new path: fall to failover controller case. vcmnd SN 74638 pdev H8:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:24 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:5 Failed controller to 1. retry. vcmnd SN 74638 pdev H8:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:5 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:5 No new path: fall to failover controller case. vcmnd SN 74629 pdev H8:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:5 Failed controller to 1. retry. vcmnd SN 74629 pdev H8:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:5 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:5 No new path: fall to failover controller case. vcmnd SN 74616 pdev H8:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:5 Failed controller to 1. retry. vcmnd SN 74616 pdev H8:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:5 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:5 No new path: fall to failover controller case. vcmnd SN 74613 pdev H8:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:5 Failed controller to 1. retry. vcmnd SN 74613 pdev H8:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:6 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:6 No new path: fall to failover controller case. vcmnd SN 74639 pdev H8:C0:T0:L6 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:6 Failed controller to 1. retry. vcmnd SN 74639 pdev H8:C0:T0:L6 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:6 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:6 No new path: fall to failover controller case. vcmnd SN 74628 pdev H8:C0:T0:L6 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:6 Failed controller to 1. retry. vcmnd SN 74628 pdev H8:C0:T0:L6 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:6 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:6 No new path: fall to failover controller case. vcmnd SN 74618 pdev H8:C0:T0:L6 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:6 Failed controller to 1. retry. vcmnd SN 74618 pdev H8:C0:T0:L6 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:6 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:6 No new path: fall to failover controller case. vcmnd SN 74611 pdev H8:C0:T0:L6 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:6 Failed controller to 1. retry. vcmnd SN 74611 pdev H8:C0:T0:L6 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:7 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:7 No new path: fall to failover controller case. vcmnd SN 74640 pdev H8:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:7 Failed controller to 1. retry. vcmnd SN 74640 pdev H8:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:7 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:7 No new path: fall to failover controller case. vcmnd SN 74627 pdev H8:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:7 Failed controller to 1. retry. vcmnd SN 74627 pdev H8:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:7 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:7 No new path: fall to failover controller case. vcmnd SN 74617 pdev H8:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:7 Failed controller to 1. retry. vcmnd SN 74617 pdev H8:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:7 Selection Retry count exhausted
    Oct 30 13:48:25 db01 kernel: 496 [RAIDarray.mpp]oracledb:0:0:7 No new path: fall to failover controller case. vcmnd SN 74612 pdev H8:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 497 [RAIDarray.mpp]oracledb:0:0:7 Failed controller to 1. retry. vcmnd SN 74612 pdev H8:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
    Oct 30 13:48:25 db01 kernel: 10 [RAIDarray.mpp]oracledb:1 Failover command issued
    Oct 30 13:48:25 db01 kernel: 801 [RAIDarray.mpp]Failover succeeded to oracledb:1
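The flood above boils down to a few repeated event types per LUN: retries exhausted, path failed, then a controller failover. A quick way to summarize such a flood is to count lines by event type; the sketch below embeds a short sample for illustration (on a live system you would point the greps at /var/log/messages instead):

```shell
# Summarize RDAC/MPP events by type, using an embedded sample of the log above.
cat > /tmp/mpp_sample.log <<'EOF'
Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:0 Selection Retry count exhausted
Oct 30 13:48:23 db01 kernel: 7 [RAIDarray.mpp]oracledb:0:0 Path Failed
Oct 30 13:48:23 db01 kernel: 94 [RAIDarray.mpp]oracledb:0:0:1 Selection Retry count exhausted
Oct 30 13:48:25 db01 kernel: 10 [RAIDarray.mpp]oracledb:1 Failover command issued
Oct 30 13:48:25 db01 kernel: 801 [RAIDarray.mpp]Failover succeeded to oracledb:1
EOF
echo "retries exhausted: $(grep -c 'Selection Retry count exhausted' /tmp/mpp_sample.log)"
echo "paths failed:      $(grep -c 'Path Failed' /tmp/mpp_sample.log)"
echo "failovers done:    $(grep -c 'Failover succeeded' /tmp/mpp_sample.log)"
```

The key observation is the timeline: the OS-level failover only *succeeds* at 13:48:25, which is after ASM has already force-dismounted the OCR/voting disk group (see the ASM alert log below at 13:47:40).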


ASM alert log:


    Thu Oct 30 13:47:40 2014
    kjbdomdet send to inst 2
    detach from dom 4, sending detach message to inst 2
    WARNING: Offline for disk OCR_VOTE1 in mode 0x7f failed.
    WARNING: Offline for disk OCR_VOTE2 in mode 0x7f failed.
    WARNING: Offline for disk OCR_VOTE3 in mode 0x7f failed.
    WARNING: Offline for disk OCR_VOTE4 in mode 0x7f failed.
    WARNING: Offline for disk OCR_VOTE5 in mode 0x7f failed.
    Thu Oct 30 13:47:40 2014
    List of instances:
     1 2
    Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)
     Global Resource Directory partially frozen for dirty detach
    * dirty detach - domain 4 invalid = TRUE 
     17 GCS resources traversed, 0 cancelled
    Dirty Detach Reconfiguration complete
    Thu Oct 30 13:47:40 2014
    WARNING: dirty detached from domain 4
    NOTE: cache dismounted group 4/0x4558828B (OCR_VOT001) 
    SQL> alter diskgroup OCR_VOT001 dismount force /* ASM SERVER:1163428491 */ 
    Thu Oct 30 13:47:40 2014
    NOTE: cache deleting context for group OCR_VOT001 4/0x4558828b
    GMON dismounting group 4 at 19 for pid 28, osid 17335
    NOTE: Disk OCR_VOTE1 in mode 0x7f marked for de-assignment
    NOTE: Disk OCR_VOTE2 in mode 0x7f marked for de-assignment
    NOTE: Disk OCR_VOTE3 in mode 0x7f marked for de-assignment
    NOTE: Disk OCR_VOTE4 in mode 0x7f marked for de-assignment
    NOTE: Disk OCR_VOTE5 in mode 0x7f marked for de-assignment
    NOTE:Waiting for all pending writes to complete before de-registering: grpnum 4
    Thu Oct 30 13:47:58 2014
     Received dirty detach msg from inst 2 for dom 4
    Thu Oct 30 13:47:58 2014
    List of instances:
     1 2
    Dirty detach reconfiguration started (new ddet inc 2, cluster inc 4)
     Global Resource Directory partially frozen for dirty detach
    * dirty detach - domain 4 invalid = TRUE 
     0 GCS resources traversed, 0 cancelled
    freeing rdom 4
    Dirty Detach Reconfiguration complete
    Thu Oct 30 13:48:10 2014
    NOTE:Waiting for all pending writes to complete before de-registering: grpnum 4
    Thu Oct 30 13:48:24 2014
    ASM Health Checker found 1 new failures
    Thu Oct 30 13:48:26 2014
    Errors in file /DBSoft/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_6826.trc:
    ORA-15079: ASM file is closed
    Errors in file /DBSoft/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_6826.trc:
    ORA-15079: ASM file is closed
    Errors in file /DBSoft/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_6826.trc:
    ORA-15079: ASM file is closed
    Errors in file /DBSoft/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_6826.trc:
    ORA-15079: ASM file is closed
    Errors in file /DBSoft/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_6826.trc:
    ORA-15079: ASM file is closed
    WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 6754304 is not allocated; I/O request failed
    WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 6754304 is not allocated; I/O request failed
    WARNING: requested mirror side 3 of virtual extent 0 logical extent 2 offset 6754304 is not allocated; I/O request failed
    Errors in file /DBSoft/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_6826.trc:
    ORA-15079: ASM file is closed
    ORA-15079: ASM file is closed
    ORA-15079: ASM file is closed
    Thu Oct 30 13:48:26 2014
    SQL> alter diskgroup OCR_VOT001 check /* proxy */ 
    Thu Oct 30 13:48:40 2014
    SUCCESS: diskgroup OCR_VOT001 was dismounted
    SUCCESS: alter diskgroup OCR_VOT001 dismount force /* ASM SERVER:1163428491 */
    Thu Oct 30 13:48:40 2014
    ORA-15032: not all alterations performed
    ORA-15001: diskgroup "OCR_VOT001" does not exist or is not mounted
    ERROR: alter diskgroup OCR_VOT001 check /* proxy */
    Thu Oct 30 13:48:40 2014
    NOTE: diskgroup resource ora.OCR_VOT001.dg is offline
    SUCCESS: ASM-initiated MANDATORY DISMOUNT of group OCR_VOT001
    Thu Oct 30 13:48:46 2014
    NOTE: client exited [6814]
    Thu Oct 30 13:48:47 2014
    NOTE: [crsd.bin@db01 (TNS V1-V3) 17643] opening OCR file
    Thu Oct 30 13:48:49 2014
    NOTE: [crsd.bin@db01 (TNS V1-V3) 17656] opening OCR file
    Thu Oct 30 13:48:51 2014
    NOTE: [crsd.bin@db01 (TNS V1-V3) 17673] opening OCR file
    Thu Oct 30 13:48:53 2014
    NOTE: [crsd.bin@db01 (TNS V1-V3) 17720] opening OCR file
    Thu Oct 30 13:48:55 2014
    NOTE: [crsd.bin@db01 (TNS V1-V3) 17740] opening OCR file
    Thu Oct 30 13:49:02 2014
    NOTE: [crsd.bin@db01 (TNS V1-V3) 17760] opening OCR file
    Thu Oct 30 13:49:04 2014
    NOTE: [crsd.bin@db01 (TNS V1-V3) 17773] opening OCR file
    Thu Oct 30 13:49:06 2014
    NOTE: [crsd.bin@db01 (TNS V1-V3) 17790] opening OCR file
    Thu Oct 30 13:49:08 2014
    NOTE: [crsd.bin@db01 (TNS V1-V3) 17812] opening OCR file
    Thu Oct 30 13:49:10 2014
    NOTE: [crsd.bin@db01 (TNS V1-V3) 17826] opening OCR file
    Thu Oct 30 13:49:42 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 18043] opening OCR file
    Thu Oct 30 13:49:43 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 18087] opening OCR file
    Thu Oct 30 13:49:43 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 18221] opening OCR file
    Thu Oct 30 13:49:44 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 18274] opening OCR file
    Thu Oct 30 13:49:44 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 18292] opening OCR file
    Thu Oct 30 13:49:45 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 18416] opening OCR file
    Thu Oct 30 13:49:45 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 18434] opening OCR file
    Thu Oct 30 13:49:46 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 18536] opening OCR file
    Thu Oct 30 13:49:47 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 18641] opening OCR file
    Thu Oct 30 13:49:47 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 18659] opening OCR file
    Thu Oct 30 13:53:36 2014
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    Thu Oct 30 13:54:43 2014
    NOTE: [crsctl.bin@db01 (TNS V1-V3) 19810] opening OCR file

CRS log:

    2014-10-30 13:48:26.715: [ CRSPE][1174640960]{2:1454:184} RI [ora.OCR_VOT001.dg db02 1] new target state: [OFFLINE] old value: [ONLINE]
    2014-10-30 13:48:26.716: [ CRSOCR][1166235968]{2:1454:184} Multi Write Batch processing...
    2014-10-30 13:48:26.716: [ CRSPE][1174640960]{2:1454:184} RI [ora.OCR_VOT001.dg db02 1] new internal state: [STOPPING] old value: [STABLE]
    2014-10-30 13:48:26.716: [ CRSPE][1174640960]{2:1454:184} Sending message to a**: id = 3284
    2014-10-30 13:48:26.716: [ CRSPE][1174640960]{2:1454:184} CRS-2673: Attempting to stop \

There are two ways to handle this fault.

At the multipath-failover layer, the relevant RDAC parameters are:

FailOverQuiescenceTime:

Quiescence Timeout before Failover (Mode Select Page 2C) command. The time, in seconds, the array will wait for a quiescence condition to clear for an explicit failover operation. A typical setting is 20 seconds.

 

FailedPathCheckingInterval:

This parameter defines how long (in seconds) the MPP driver should wait before initiating a path-validation action. The default value is 60 seconds.

 

E.g.:


    [root@db01 ~]# cat /etc/mpp.conf
    VirtualDiskProductId=VirtualDisk
    DebugLevel=0x0
    NotReadyWaitTime=270
    BusyWaitTime=270
    QuiescenceWaitTime=270
    InquiryWaitTime=60
    MaxLunsPerArray=256
    MaxPathsPerController=4
    ScanInterval=60
    InquiryInterval=1
    MaxArrayModules=30
    ErrorLevel=3
    SelectionTimeoutRetryCount=0
    UaRetryCount=10
    RetryCount=10
    SynchTimeout=170
    FailOverQuiescenceTime=20
    FailoverTimeout=120
    FailBackToCurrentAllowed=1
    ControllerIoWaitTime=300
    ArrayIoWaitTime=600
    DisableLUNRebalance=0
    SelectiveTransferMaxTransferAttempts=5
    SelectiveTransferMinIOWaitTime=3
    IdlePathCheckingInterval=60
    RecheckFailedPathWaitTime=30
    FailedPathCheckingInterval=60
    ArrayFailoverWaitTime=300
    PrintSenseBuffer=0
    ClassicModeFailover=0
    AVTModeFailover=0
    LunFailoverDelay=3
    LoadBalancePolicy=1
    ImmediateVirtLunCreate=0
    BusResetTimeout=150
    LunScanDelay=2
    AllowHBAsgDevs=0
    S2ToS3Key=471f51f35ec5426e
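Note that editing /etc/mpp.conf alone is not enough: RDAC reads these settings at driver load time, so (as I understand the RDAC tooling — verify against your driver's README) you must rebuild the MPP RAM disk and reboot for the new values to take effect:

```
[root@db01 ~]# vi /etc/mpp.conf   # e.g. tune FailOverQuiescenceTime / FailedPathCheckingInterval
[root@db01 ~]# mppUpdate          # rebuild the mpp initrd image with the new settings
[root@db01 ~]# reboot
```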

At the ASM detection-timeout layer:

Simply increase the ASM hidden parameter _asm_hbeatiowait; I raised it straight to 120 (seconds). After re-running all five test groups the problem no longer reproduced, so the fault was resolved.

(For how to view and set the hidden parameter's value, see the example below.)

 

E.g.:

    [root@db01 ~]# su - grid
    [grid@db01 ~]$ sqlplus sys/oracle as sysasm
    SQL*Plus: Release 11.2.0.4.0 Production on Wed Nov 12 22:15:18 2014
    Copyright (c) 1982, 2013, Oracle. All rights reserved.

    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options

    SQL> alter system set \
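To round this out: based on the value of 120 stated above, a typical form of the change, together with a query to read the hidden parameter back, would look like the sketch below (hidden parameters should only be changed under Oracle Support guidance, and an spfile change takes effect after the ASM instances restart):

```
SQL> alter system set "_asm_hbeatiowait"=120 scope=spfile sid='*';

-- read the current value back (x$ views require SYS/SYSASM):
SQL> select a.ksppinm name, b.ksppstvl value
       from x$ksppi a, x$ksppcv b
      where a.indx = b.indx
        and a.ksppinm = '_asm_hbeatiowait';
```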

From the ITPUB blog. Link: http://blog.itpub.net/20674423/viewspace-1329772/ — please credit the source when reposting; unauthorized reproduction may incur legal liability.
