Postmortem: an Oracle RAC failure caused by a disk array going down (original)
Environment: Oracle 10g RAC + ASM + AIX
Nodes: 192.168.5.15, 192.168.5.16
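The day before yesterday a colleague told me the database would not start and asked me to take a look. crs_stat showed that db02 (.16) was online while db01, the primary node, was offline. I stopped everything with crs_stop -all and restarted with crs_start -all. It reported that resources were missing or had failed, a message I had never seen before, and it also pointed to a VIP failure. Checking the IP addresses, I found that db01's VIP had disappeared, while db02 was still normal. So I configured a virtual IP with the AIX command smitty mkinetvi, then stopped and restarted CRS again, but the problem stayed the same. After more than an hour of searching, I finally ran lspv and found that the original PVs were gone: four PVs were missing. Wow!!!! The disks had vanished. How could the database possibly start like that?

I ran to the machine room to check whether a fibre channel adapter had failed or someone had knocked a fibre cable loose, but everything looked normal. I connected to the disk enclosure with the IBM400 enclosure management software and saw a warning light reporting some kind of "logical path" error, so it really was the storage. We called the storage vendor and their engineer came, checked it and fixed it. He never explained how: he just rebooted the fibre channel switch and reseated the fibre channel adapters, and it worked. I still don't know what actually happened.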
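Today my colleague told me the disks were back. I checked with lspv and, sure enough, they had all returned. hdisk2 through hdisk9 are raw devices with no PVID.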
# lspv
hdisk0          00cc85bf3d2db424    rootvg          active
hdisk1          00cc85bf404044eb    rootvg          active
hdisk2          none                None
hdisk3          none                None
hdisk4          none                None
hdisk5          none                None
hdisk6          none                None
hdisk7          none                None
hdisk8          none                None
hdisk9          none                None
hdisk10         none                None
hdisk11         none                None
hdisk12         00cc85bf8266c2a8    datavg          active
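It took over an hour to notice that disks were missing. Here is a minimal Python sketch that flags disks without a PVID and disks absent from a saved baseline. It assumes you keep a copy of a known-good `lspv` listing, and it assumes lspv's usual column order. The function names `parse_lspv`, `disks_without_pvid` and `missing_disks` are illustrative and not an existing tool.

```python
# Sketch: inspect `lspv` text captured on AIX (nothing here runs AIX commands).
# Assumption: columns are disk name, PVID, volume group, and optionally status.
LSPV_OUTPUT = """\
hdisk0          00cc85bf3d2db424    rootvg          active
hdisk1          00cc85bf404044eb    rootvg          active
hdisk2          none                None
hdisk3          none                None
hdisk4          none                None
hdisk5          none                None
hdisk6          none                None
hdisk7          none                None
hdisk8          none                None
hdisk9          none                None
hdisk10         none                None
hdisk11         none                None
hdisk12         00cc85bf8266c2a8    datavg          active
"""


def parse_lspv(text):
    """Return (disk, pvid, vg) tuples for every hdisk line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("hdisk"):
            rows.append((fields[0], fields[1], fields[2]))
    return rows


def disks_without_pvid(text):
    """Disks whose PVID column reads 'none' (raw disks, like hdisk2-hdisk9 here)."""
    return [disk for disk, pvid, _ in parse_lspv(text) if pvid == "none"]


def missing_disks(baseline_text, current_text):
    """Disks listed in the saved baseline but absent from the current listing."""
    current = {disk for disk, _, _ in parse_lspv(current_text)}
    return [disk for disk, _, _ in parse_lspv(baseline_text) if disk not in current]


if __name__ == "__main__":
    print("No PVID:", " ".join(disks_without_pvid(LSPV_OUTPUT)))
    print("Missing vs. baseline:", missing_disks(LSPV_OUTPUT, LSPV_OUTPUT))
```

In an incident like this one, running `missing_disks(saved_baseline, current_lspv)` would name the vanished PVs directly instead of relying on someone counting lines.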
Then I started the services with crs_start -all and got the following errors:
bash-3.00$ crs_start -all
Attempting to start `ora.db01.vip` on member `db01`
Attempting to start `ora.db02.vip` on member `db02`
Start of `ora.db02.vip` on member `db02` succeeded.
Attempting to start `ora.db02.ASM2.asm` on member `db02`
Start of `ora.db01.vip` on member `db01` failed.
Attempting to start `ora.db01.vip` on member `db02`
Start of `ora.db01.vip` on member `db02` succeeded.
db02 : CRS-1019: Resource ora.db01.ASM1.asm (application) cannot run on db02
db02 : CRS-1019: Resource ora.db01.ASM1.asm (application) cannot run on db02
db02 : CRS-1019: Resource ora.db01.LISTENER_DB01.lsnr (application) cannot run on db02
db02 : CRS-1019: Resource ora.db01.ASM1.asm (application) cannot run on db02
Start of `ora.db02.ASM2.asm` on member `db02` succeeded.
Attempting to start `ora.GASDB.GASDB2.inst` on member `db02`
Start of `ora.GASDB.GASDB2.inst` on member `db02` succeeded.
Attempting to start `ora.db02.LISTENER_DB02.lsnr` on member `db02`
Start of `ora.db02.LISTENER_DB02.lsnr` on member `db02` succeeded.
Attempting to start `ora.racdb.racdb2.inst` on member `db02`
Start of `ora.racdb.racdb2.inst` on member `db02` succeeded.
CRS-1002: Resource 'ora.db02.ons' is already running on member 'db02'
CRS-1002: Resource 'ora.GASDB.db' is already running on member 'db01'
Attempting to start `ora.db01.gsd` on member `db01`
Attempting to start `ora.db01.ons` on member `db01`
Attempting to start `ora.db02.gsd` on member `db02`
Attempting to start `ora.racdb.db` on member `db01`
Start of `ora.racdb.db` on member `db01` succeeded.
Start of `ora.db01.gsd` on member `db01` succeeded.
Start of `ora.db02.gsd` on member `db02` succeeded.
Start of `ora.db01.ons` on member `db01` succeeded.
CRS-0223: Resource 'ora.GASDB.GASDB1.inst' has placement error.
CRS-0223: Resource 'ora.GASDB.db' has placement error.
CRS-0223: Resource 'ora.db01.ASM1.asm' has placement error.
CRS-0223: Resource 'ora.db01.LISTENER_DB01.lsnr' has placement error.
CRS-0223: Resource 'ora.db02.ons' has placement error.
CRS-0223: Resource 'ora.racdb.racdb1.inst' has placement error.
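The useful signal is buried in that long crs_start output. Here is a small Python sketch that lists failed starts and groups the CRS-nnnn messages by code and resource. It works on an excerpt of the log above, and its regular expressions assume exactly the message formats shown there. Other CRS versions may word things differently.

```python
import re
from collections import defaultdict

# Excerpt of the crs_start -all output above (only the lines that matter here).
LOG = """\
Start of `ora.db01.vip` on member `db01` failed.
Attempting to start `ora.db01.vip` on member `db02`
Start of `ora.db01.vip` on member `db02` succeeded.
db02 : CRS-1019: Resource ora.db01.ASM1.asm (application) cannot run on db02
db02 : CRS-1019: Resource ora.db01.LISTENER_DB01.lsnr (application) cannot run on db02
CRS-1002: Resource 'ora.db02.ons' is already running on member 'db02'
CRS-1002: Resource 'ora.GASDB.db' is already running on member 'db01'
CRS-0223: Resource 'ora.GASDB.GASDB1.inst' has placement error.
CRS-0223: Resource 'ora.GASDB.db' has placement error.
CRS-0223: Resource 'ora.db01.ASM1.asm' has placement error.
CRS-0223: Resource 'ora.db01.LISTENER_DB01.lsnr' has placement error.
CRS-0223: Resource 'ora.db02.ons' has placement error.
CRS-0223: Resource 'ora.racdb.racdb1.inst' has placement error.
"""

# Assumed message formats, copied from the log above.
FAILED_START = re.compile(r"Start of `([\w.]+)` on member `(\w+)` failed")
CRS_ERROR = re.compile(r"(CRS-\d{4}): Resource '?([\w.]+)'?")


def summarize(log_text):
    """Return ([(resource, member), ...] that failed, {code: sorted resources})."""
    failed = FAILED_START.findall(log_text)
    errors = defaultdict(set)  # a set collapses the repeated CRS-1019 lines
    for code, resource in CRS_ERROR.findall(log_text):
        errors[code].add(resource)
    return failed, {code: sorted(res) for code, res in sorted(errors.items())}


if __name__ == "__main__":
    failed, errors = summarize(LOG)
    print("Failed starts:", failed)
    for code, resources in errors.items():
        print(code, resources)
```

Summarized this way, the only outright failure is `ora.db01.vip` on db01. Most of the remaining messages concern db01's ASM instance, listener and instances, which is consistent with the author's conclusion that the VIP was the root of the problem.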
bash-3.00$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    OFFLINE   OFFLINE
ora....B2.inst application    ONLINE    ONLINE    db02
ora.GASDB.db   application    ONLINE    ONLINE    db01
ora....SM1.asm application    OFFLINE   OFFLINE
ora....01.lsnr application    OFFLINE   OFFLINE
ora.db01.gsd   application    ONLINE    ONLINE    db01
ora.db01.ons   application    ONLINE    ONLINE    db01
ora.db01.vip   application    ONLINE    ONLINE    db02
ora....SM2.asm application    ONLINE    ONLINE    db02
ora....02.lsnr application    ONLINE    ONLINE    db02
ora.db02.gsd   application    ONLINE    ONLINE    db02
ora.db02.ons   application    ONLINE    ONLINE    db02
ora.db02.vip   application    ONLINE    ONLINE    db02
ora.racdb.db   application    ONLINE    ONLINE    db01
ora....b1.inst application    OFFLINE   OFFLINE
ora....b2.inst application    ONLINE    ONLINE    db02
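The telling row in this table is `ora.db01.vip` running on db02, which means db01's VIP has failed over. Here is a Python sketch that reads the `crs_stat -t` text and reports two kinds of problem: resources that are not ONLINE, and node-bound resources (named `ora.<node>.*`) running away from their own node. The node list `{"db01", "db02"}` is passed in by hand, as an assumption. Note that `crs_stat -t` truncates long names (`ora....B1.inst`), so those rows can only be checked for state. Plain `crs_stat` should show the full names if you need them.

```python
# Sketch: find offline and failed-over resources in `crs_stat -t` text.
CRS_STAT_T = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    OFFLINE   OFFLINE
ora....B2.inst application    ONLINE    ONLINE    db02
ora.GASDB.db   application    ONLINE    ONLINE    db01
ora....SM1.asm application    OFFLINE   OFFLINE
ora....01.lsnr application    OFFLINE   OFFLINE
ora.db01.gsd   application    ONLINE    ONLINE    db01
ora.db01.ons   application    ONLINE    ONLINE    db01
ora.db01.vip   application    ONLINE    ONLINE    db02
ora....SM2.asm application    ONLINE    ONLINE    db02
ora....02.lsnr application    ONLINE    ONLINE    db02
ora.db02.gsd   application    ONLINE    ONLINE    db02
ora.db02.ons   application    ONLINE    ONLINE    db02
ora.db02.vip   application    ONLINE    ONLINE    db02
ora.racdb.db   application    ONLINE    ONLINE    db01
ora....b1.inst application    OFFLINE   OFFLINE
ora....b2.inst application    ONLINE    ONLINE    db02
"""


def parse_crs_stat_t(text):
    """Parse resource rows; OFFLINE rows have no Host column, so host is None."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        if len(f) in (4, 5) and f[1] == "application":  # skips header and dashes
            rows.append({"name": f[0], "target": f[2], "state": f[3],
                         "host": f[4] if len(f) == 5 else None})
    return rows


def problems(rows, nodes):
    """List resources not ONLINE, or ora.<node>.* resources hosted elsewhere."""
    issues = []
    for r in rows:
        if r["state"] != "ONLINE":
            issues.append(f"{r['name']}: {r['state']} (target {r['target']})")
            continue
        parts = r["name"].split(".")
        if len(parts) > 2 and parts[1] in nodes and r["host"] != parts[1]:
            issues.append(f"{r['name']}: running on {r['host']}, home node is {parts[1]}")
    return issues


if __name__ == "__main__":
    for issue in problems(parse_crs_stat_t(CRS_STAT_T), {"db01", "db02"}):
        print(issue)
```

Running the same check against the final `crs_stat -t` output after the fix should report nothing.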
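So it still looked like a VIP problem. Had I configured the virtual IP wrongly? The VIP on db02 was working, so I compared db02's IP configuration with db01's. One look, and sure enough, I had configured it wrong. On db02 the extra addresses (192.168.5.18 and 192.168.5.17) are aliases on en0, whereas on db01 I had put 192.168.5.17 on a separate virtual interface, vi0.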
--15 (db01)
bash-3.00# ifconfig -a
en0: flags=5e080863,c0 ,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 192.168.5.15 netmask 0xffffff00 broadcast 192.168.5.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en1: flags=5e080863,c0 ,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 10.168.5.15 netmask 0xff000000 broadcast 10.255.255.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
vi0: flags=84000041
        inet 192.168.5.17 netmask 0xffffff00
lo0: flags=e08084b >
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
--16 (db02)
bash-3.00# ifconfig -a
en0: flags=5e080863,c0 ,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 192.168.5.16 netmask 0xffffff00 broadcast 192.168.5.255
        inet 192.168.5.18 netmask 0xffffff00 broadcast 192.168.5.255
        inet 192.168.5.17 netmask 0xffffff00 broadcast 192.168.5.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en1: flags=5e080863,c0 ,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 10.168.5.16 netmask 0xff000000 broadcast 10.255.255.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b >
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
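Comparing the two `ifconfig -a` listings by eye is error-prone. Here is a Python sketch that maps each IPv4 address to the interface holding it on each node, then checks one VIP. It uses these assumptions, all mine rather than Oracle's: en0 is the public interface (it carries the 192.168.5.x addresses; en1 with 10.x is presumably the interconnect), and a VIP should appear exactly once, as an alias on en0 of its home node. The inputs are trimmed excerpts of the outputs above with the flags shortened. `check_vip` and the other names are hypothetical helpers.

```python
import re

# Trimmed excerpts of the ifconfig -a outputs above (flags shortened).
NODE15 = """\
en0: flags=5e080863,c0
        inet 192.168.5.15 netmask 0xffffff00 broadcast 192.168.5.255
en1: flags=5e080863,c0
        inet 10.168.5.15 netmask 0xff000000 broadcast 10.255.255.255
vi0: flags=84000041
        inet 192.168.5.17 netmask 0xffffff00
lo0: flags=e08084b
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
"""
NODE16 = """\
en0: flags=5e080863,c0
        inet 192.168.5.16 netmask 0xffffff00 broadcast 192.168.5.255
        inet 192.168.5.18 netmask 0xffffff00 broadcast 192.168.5.255
        inet 192.168.5.17 netmask 0xffffff00 broadcast 192.168.5.255
en1: flags=5e080863,c0
        inet 10.168.5.16 netmask 0xff000000 broadcast 10.255.255.255
lo0: flags=e08084b
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
"""

IFACE = re.compile(r"^(\w+): flags=")  # interface header starts in column 0
INET = re.compile(r"\binet (\d{1,3}(?:\.\d{1,3}){3})")  # IPv4 only, not inet6


def parse_ifconfig(text):
    """Map interface name -> list of IPv4 addresses."""
    addrs, current = {}, None
    for line in text.splitlines():
        m = IFACE.match(line)
        if m:
            current = m.group(1)
            addrs[current] = []
        if current is not None:
            addrs[current].extend(INET.findall(line))
    return addrs


def locate(ip, outputs):
    """outputs: node -> ifconfig text. Return [(node, iface), ...] holding ip."""
    return [(node, iface)
            for node, text in outputs.items()
            for iface, ips in parse_ifconfig(text).items()
            if ip in ips]


def check_vip(ip, home, outputs, public_if="en0"):
    """List reasons why ip is not a single alias on public_if of its home node."""
    where = locate(ip, outputs)
    notes = []
    if len(where) > 1:
        notes.append(f"{ip} is plumbed {len(where)} times: {where}")
    for node, iface in where:
        if iface != public_if:
            notes.append(f"{ip} sits on {iface} of {node}, not on {public_if}")
    if (home, public_if) not in where:
        notes.append(f"{ip} is not an alias on {public_if} of home node {home}")
    return notes


if __name__ == "__main__":
    for note in check_vip("192.168.5.17", "db01", {"db01": NODE15, "db02": NODE16}):
        print(note)
```

For db01's VIP the sketch reports three findings. The address is on vi0 of db01 and also on en0 of db02, where CRS had failed it over, and it is not an alias on db01's en0. That last finding is the same condition the srvctl message "is not configured as alias (host=db01)" complains about in the steps below.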
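I then googled for solutions, but found no step-by-step fix anywhere. Still, since the problem was clearly the VIP, I reasoned that fixing the IP configuration should be enough.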
The resolution steps were:
1. --- ping db01, db02, db01_vip and db02_vip: all of them answered
2. --- Stop the racdb database resource:
bash-3.00$ crs_stop ora.racdb.db
Attempting to stop `ora.racdb.db` on member `db01`
Stop of `ora.racdb.db` on member `db01` succeeded.
3. --- Start the node applications on db01 with srvctl; it printed the following:
bash-3.00$ srvctl start nodeapps -n db01
db01:ora.db01.vip:IP:192.168.5.17 is not configured as alias (host=db01)
db01:ora.db01.vip:IP:192.168.5.17 is not configured as alias (host=db01)
CRS-0215: Could not start resource 'ora.db01.LISTENER_DB01.lsnr'.
4. --- Check CRS:
bash-3.00$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
5. --- Inspect the VIP resource profile: crs_stat -p ora.db01.vip
6. --- Stop all services: #crs_stop -all
7. --- On db01, remove the virtual interface address on vi0 and add the IP as an alias on en0: #ifconfig vi0 192.168.5.17 delete
8. --- On db02, remove the virtual IP 192.168.5.17 from en0: #ifconfig vi0 192.168.5.17 delete
9. --- Run ifconfig -a on both nodes to check the IPs:
--15 (db01)
# ifconfig -a
en0: flags=5e080863,c0 ,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 192.168.5.15 netmask 0xffffff00 broadcast 192.168.5.255
        inet 192.168.5.17 netmask 0xffffff00 broadcast 192.168.5.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en1: flags=5e080863,c0 ,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 10.168.5.15 netmask 0xff000000 broadcast 10.255.255.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
vi0: flags=84000041
lo0: flags=e08084b >
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

--16 (db02)
bash-3.00# ifconfig -a
en0: flags=5e080863,c0 ,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 192.168.5.16 netmask 0xffffff00 broadcast 192.168.5.255
        inet 192.168.5.18 netmask 0xffffff00 broadcast 192.168.5.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en1: flags=5e080863,c0 ,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 10.168.5.16 netmask 0xff000000 broadcast 10.255.255.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
vi0: flags=84000000<64BIT>
lo0: flags=e08084b >
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
10. --- Restart the services: #crs_start -all
bash-3.00$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    db01
ora....B2.inst application    ONLINE    ONLINE    db02
ora.GASDB.db   application    ONLINE    ONLINE    db01
ora....SM1.asm application    ONLINE    ONLINE    db01
ora....01.lsnr application    ONLINE    ONLINE    db01
ora.db01.gsd   application    ONLINE    ONLINE    db01
ora.db01.ons   application    ONLINE    ONLINE    db01
ora.db01.vip   application    ONLINE    ONLINE    db01
ora....SM2.asm application    ONLINE    ONLINE    db02
ora....02.lsnr application    ONLINE    ONLINE    db02
ora.db02.gsd   application    ONLINE    ONLINE    db02
ora.db02.ons   application    ONLINE    ONLINE    db02
ora.db02.vip   application    ONLINE    ONLINE    db02
ora.racdb.db   application    ONLINE    ONLINE    db01
ora....b1.inst application    ONLINE    ONLINE    db01
ora....b2.inst application    ONLINE    ONLINE    db02
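The logic of steps 7 and 8 can be sketched as a dry-run planner in Python. The goal is to leave the VIP only as an alias on the public interface of its home node. The snapshot below is typed in by hand from the pre-fix `ifconfig -a` outputs of db01 and db02. The planner only prints commands, following the `ifconfig <interface> <address> delete` form used in steps 7 and 8. Several choices here are my own assumptions:

- en0 is treated as the public interface.
- For db02 the plan targets en0, because that is where the earlier output shows 192.168.5.17.
- The add-alias step is emitted as a comment, since the original does not show the exact command.
- Stopping the VIP resource in step 6 may already have released the failed-over address on db02, so re-check with `ifconfig -a` before running anything.

```python
# Hypothetical snapshot (node -> interface -> IPv4 addresses), copied by hand
# from the ifconfig -a outputs taken before the fix.
SNAPSHOT = {
    "db01": {"en0": ["192.168.5.15"], "en1": ["10.168.5.15"],
             "vi0": ["192.168.5.17"]},
    "db02": {"en0": ["192.168.5.16", "192.168.5.18", "192.168.5.17"],
             "en1": ["10.168.5.16"]},
}


def cleanup_plan(snapshot, vip, home, public_if="en0"):
    """Dry run: (node, command) pairs that would leave vip only on home's public_if.

    Nothing is executed; the caller reviews the plan and runs it by hand.
    """
    plan = []
    for node, ifaces in snapshot.items():
        for iface, ips in ifaces.items():
            # Any copy of the VIP other than the desired one gets deleted.
            if vip in ips and (node, iface) != (home, public_if):
                plan.append((node, f"ifconfig {iface} {vip} delete"))
    if vip not in snapshot.get(home, {}).get(public_if, []):
        # Placeholder only: the original post does not show this command.
        plan.append((home, f"# add {vip} as an alias on {public_if}"))
    return plan


if __name__ == "__main__":
    for node, command in cleanup_plan(SNAPSHOT, "192.168.5.17", "db01"):
        print(f"{node}: {command}")
```

Keeping it a dry run is deliberate: an address deleted on the wrong node takes that node's client connections with it.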
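Finally solved!!!! The main lesson from this incident is that understanding RAC's architecture and concepts really matters. Once you understand them, you can see where a problem lies and fix it.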
From the "ITPUB blog", link: http://blog.itpub.net/3090/viewspace-672035/. Please credit the source when reposting; otherwise legal liability will be pursued.