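RAC node fails to start with ORA-29702: the problem and an analysis
Today I started a RAC cluster on virtual machines and found that one node would not come up no matter what I tried. The other node was fine.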
SQL> startup nomount
ORA-29702: error occurred in Cluster Group Service operation
I tried to check the state of the CRS components with crs_stat, but that failed as well.
-bash-4.1$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
The end of the alert log shows that the instance was terminated because of error 29702.
SMON started with pid=20, OS id=12344
Sun May 11 04:10:28 2014
RECO started with pid=21, OS id=12346
Sun May 11 04:10:28 2014
MMON started with pid=22, OS id=12348
Sun May 11 04:10:28 2014
MMNL started with pid=23, OS id=12350
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
USER (ospid: 12242): terminating the instance due to error 29702
Instance terminated by USER, pid = 12242
Oracle's own explanation of this error is as follows.
-bash-4.1$ oerr ora 29702
29702, 00000, "error occurred in Cluster Group Service operation"
// *Cause: An unexpected error occurred while performing a CGS operation.
// *Action: Verify that the LMON process is still active.
// Check the Oracle LMON trace files for errors.
// Also, check the related CSS trace file for errors.
The LMON trace file contains the following:
Trace file /u04/app/11.2.0/db/diag/rdbms/racdb/RACDB1/trace/RACDB1_lmon_12324.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u04/app/11.2.0/db/product/11.2.0/dbhome_1
System name: Linux
Node name: rac1
Release: 2.6.32-71.el6.x86_64
Version: #1 SMP Wed Sep 1 01:33:01 EDT 2010
Machine: x86_64
VM name: VMWare Version: 6
Instance name: RACDB1
Redo thread mounted by this instance: 0
Oracle process number: 11
Unix process pid: 12324, image: oracle@rac1 (LMON)
*** 2014-05-11 04:10:27.777
*** SESSION ID:(130.1) 2014-05-11 04:10:27.777
*** CLIENT ID:() 2014-05-11 04:10:27.777
*** SERVICE NAME:() 2014-05-11 04:10:27.777
*** MODULE NAME:() 2014-05-11 04:10:27.777
*** ACTION NAME:() 2014-05-11 04:10:27.777
GES resources 5720 pool 3
GES enqueues 8361
GES IPC: Receivers 2 Senders 2
GES IPC: Buffers Receive 1000 Send (i:1030 b:471) Reserve 301
GES IPC: Msg Size Regular 1176 Batch 8376
Batching factor: enqueue replay 206, ack 229
Batching factor: cache replay 128 size per lock 64
*** 2014-05-11 04:10:28.644
kjxggin: CGS tickets = 1000
kgxgncin: CLSS init failed with status 3
kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS
kjxgmin: kgxgncin fails - (2)
kjxggin: generic group layer init fails
*** 2014-05-11 04:10:28.655
Global Enqueue Service Shutdown
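The lines that matter in this trace are the ones reporting the CLSS init failure, just before the Global Enqueue Service shuts down. As a minimal sketch, a trace like this could be scanned for those messages as follows. The function name `find_cgs_failures` and the pattern list are illustrative assumptions; the patterns are copied from this one trace and are not a complete list of LMON failure messages.

```python
import re

# Patterns taken from the trace above (assumption: not an exhaustive list).
FAILURE_PATTERNS = [
    re.compile(r"CLSS init failed with status \d+"),
    re.compile(r"generic group layer init fails"),
]


def find_cgs_failures(trace_text):
    """Return the stripped trace lines that match any failure pattern."""
    hits = []
    for line in trace_text.splitlines():
        if any(p.search(line) for p in FAILURE_PATTERNS):
            hits.append(line.strip())
    return hits


sample_trace = """kjxggin: CGS tickets = 1000
kgxgncin: CLSS init failed with status 3
kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS
kjxgmin: kgxgncin fails - (2)
kjxggin: generic group layer init fails"""

failures = find_cgs_failures(sample_trace)
for line in failures:
    print(line)
```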
On this node, neither crs_stat nor crsctl was of any help.
-bash-4.1$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
-bash-4.1$ crs_start -all
CRS-0184: Cannot communicate with the CRS daemon.
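The `crsctl check crs` output above reports one line per component, so it is easy to pick out which ones are unreachable. The following is only a sketch: the helper names and the keyword list are assumptions based on the messages shown in this post, not on any documented output format.

```python
import re


def parse_crs_messages(output):
    """Map each message code (for example 'CRS-4638') to its message text."""
    messages = {}
    for line in output.splitlines():
        match = re.match(r"\s*(CRS-\d+):\s*(.*)", line)
        if match:
            messages[match.group(1)] = match.group(2).strip()
    return messages


def unreachable_components(output):
    """Return the codes whose text reports a communication problem."""
    keywords = ("cannot communicate", "communications failure")
    return [
        code
        for code, text in parse_crs_messages(output).items()
        if any(k in text.lower() for k in keywords)
    ]


sample_check = """CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager"""

problem_codes = unreachable_components(sample_check)
print(problem_codes)
```

Here only Oracle High Availability Services is reported as online; the other three components cannot be contacted.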
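The process list shows that ohasd and several of its agents and daemons are running, although crsd.bin and evmd.bin do not appear in it.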
-bash-4.1$ ps -ef|grep d.bin
root 2103 1 0 May10 ? 00:00:51 /u04/app/11.2.0/grid/bin/ohasd.bin reboot
grid 2297 1 0 May10 ? 00:00:32 /u04/app/11.2.0/grid/bin/oraagent.bin
grid 2309 1 0 May10 ? 00:00:01 /u04/app/11.2.0/grid/bin/mdnsd.bin
grid 2320 1 0 May10 ? 00:00:36 /u04/app/11.2.0/grid/bin/gpnpd.bin
root 2330 1 0 May10 ? 00:00:14 /u04/app/11.2.0/grid/bin/orarootagent.bin
grid 2333 1 0 May10 ? 00:02:39 /u04/app/11.2.0/grid/bin/gipcd.bin
root 2348 1 1 May10 ? 00:12:00 /u04/app/11.2.0/grid/bin/osysmond.bin
root 2569 1 0 May10 ? 00:03:55 /u04/app/11.2.0/grid/bin/ologgerd -M -d /u04/app/11.2.0/grid/crf/db/rac1
grid 12569 9580 0 04:25 pts/1 00:00:00 grep d.bin
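A listing like this can be compared against the daemons one expects to see. The sketch below assumes the `ps -ef` column layout shown above; the `EXPECTED` tuple is an assumption for illustration (crsd.bin and evmd.bin are the names grepped for later in this post, and ocssd.bin is assumed to be the CSS daemon binary).

```python
# Daemons to look for (assumption: adjust to your own installation).
EXPECTED = ("ocssd.bin", "crsd.bin", "evmd.bin")


def running_binaries(ps_output):
    """Return the set of executable base names found in `ps -ef` output."""
    names = set()
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]
        names.add(executable.rsplit("/", 1)[-1])
    return names


def missing_daemons(ps_output, expected=EXPECTED):
    """Return the expected daemons that are absent from the listing."""
    present = running_binaries(ps_output)
    return [name for name in expected if name not in present]


sample_ps = """root 2103 1 0 May10 ? 00:00:51 /u04/app/11.2.0/grid/bin/ohasd.bin reboot
grid 2309 1 0 May10 ? 00:00:01 /u04/app/11.2.0/grid/bin/mdnsd.bin
grid 2320 1 0 May10 ? 00:00:36 /u04/app/11.2.0/grid/bin/gpnpd.bin
grid 2333 1 0 May10 ? 00:02:39 /u04/app/11.2.0/grid/bin/gipcd.bin
grid 12569 9580 0 04:25 pts/1 00:00:00 grep d.bin"""

missing = missing_daemons(sample_ps)
print(missing)
```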
I then tried to stop CRS as the root user, but that reported errors too.
[root@rac1 bin]# ./crsctl disable crs
CRS-4621: Oracle High Availability Services autostart is disabled.
[root@rac1 bin]# ./crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
Trying to start it again also failed.
[root@rac1 bin]# ./crsctl enable crs
CRS-4622: Oracle High Availability Services autostart is enabled.
[root@rac1 bin]# ./crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
In the end I found a workaround on MOS: manually kill the leftover CRS processes. In a production environment, of course, the PSU should still be applied.
[root@rac1 bin]# ps -fea | grep ohasd.bin | grep -v grep
root 2103 1 0 May10 ? 00:00:52 /u04/app/11.2.0/grid/bin/ohasd.bin reboot
[root@rac1 bin]# ps -fea | grep gipcd.bin | grep -v grep
grid 2333 1 0 May10 ? 00:02:41 /u04/app/11.2.0/grid/bin/gipcd.bin
[root@rac1 bin]# ps -fea | grep mdnsd.bin | grep -v grep
grid 2309 1 0 May10 ? 00:00:01 /u04/app/11.2.0/grid/bin/mdnsd.bin
[root@rac1 bin]# ps -fea | grep gpnpd.bin | grep -v grep
grid 2320 1 0 May10 ? 00:00:37 /u04/app/11.2.0/grid/bin/gpnpd.bin
[root@rac1 bin]# ps -fea | grep evmd.bin | grep -v grep
[root@rac1 bin]# ps -fea | grep crsd.bin | grep -v grep
[root@rac1 bin]# kill -9 2103 2333 2309 2320
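The repeated `ps | grep` calls above can be condensed into one pass that collects the PIDs. This is a sketch only: it builds the `kill -9` command as a string and deliberately does not execute it. The function name `kill_command` is hypothetical, and the list of binaries is simply the four that were killed in this post.

```python
# The four daemons killed in the transcript above, in the same order.
TARGETS = ["ohasd.bin", "gipcd.bin", "mdnsd.bin", "gpnpd.bin"]


def kill_command(ps_output, binaries=TARGETS):
    """Build a `kill -9` command line for the listed binaries.

    Returns an empty string when none of them is running.
    """
    procs = []
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        exe = fields[7].split()[0].rsplit("/", 1)[-1]
        procs.append((exe, fields[1]))
    # Keep the PIDs in the order the binaries were requested.
    pids = [pid for name in binaries for exe, pid in procs if exe == name]
    return "kill -9 " + " ".join(pids) if pids else ""


ps_listing = """root 2103 1 0 May10 ? 00:00:52 /u04/app/11.2.0/grid/bin/ohasd.bin reboot
grid 2309 1 0 May10 ? 00:00:01 /u04/app/11.2.0/grid/bin/mdnsd.bin
grid 2320 1 0 May10 ? 00:00:37 /u04/app/11.2.0/grid/bin/gpnpd.bin
grid 2333 1 0 May10 ? 00:02:41 /u04/app/11.2.0/grid/bin/gipcd.bin"""

command = kill_command(ps_listing)
print(command)
```

Printing the command instead of running it leaves a chance to review the PIDs before anything is killed as root.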
Then I tried starting CRS again.
[root@rac1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac1 bin]# ./crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
The stack is slow to come up. After waiting a while I started the database manually, and this time it started without any problem.
-bash-4.1$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sun May 11 04:41:03 2014
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount
ORACLE instance started.
Total System Global Area 638853120 bytes
Fixed Size 2231072 bytes
Variable Size 482346208 bytes
Database Buffers 146800640 bytes
Redo Buffers 7475200 bytes
SQL> alter database mount;
Database altered.
SQL> alter database open;
Database altered.
SQL>
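The wait between `crsctl start crs` and a reachable CRS daemon could be scripted rather than done by hand. The sketch below polls `crs_stat -t` until CRS-0184 no longer appears. The function names, the 30 attempts, and the 10-second delay are assumptions chosen for illustration, and the demo at the bottom uses canned replies instead of calling the real command.

```python
import subprocess
import time


def run_crs_stat():
    """Run `crs_stat -t` and return stdout and stderr combined."""
    proc = subprocess.run(["crs_stat", "-t"], capture_output=True, text=True)
    return proc.stdout + proc.stderr


def wait_for_crs(run=run_crs_stat, attempts=30, delay=10.0, sleep=time.sleep):
    """Poll until the output no longer contains CRS-0184.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if "CRS-0184" not in run():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


# Demo with canned replies so the sketch runs without a cluster.
replies = iter([
    "CRS-0184: Cannot communicate with the CRS daemon.",
    "CRS-0184: Cannot communicate with the CRS daemon.",
    "Name Type Target State Host",
])
succeeded_on = wait_for_crs(run=lambda: next(replies), sleep=lambda seconds: None)
print(succeeded_on)
```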
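Checking the CRS status shows that everything that should be up is up. I created a small table and tested it from both nodes without any problem. The details of the workaround are in MOS Document ID 1233580.1.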
-bash-4.1$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE rac1
ora....N1.lsnr ora....er.type ONLINE ONLINE rac2
ora.asm ora.asm.type OFFLINE OFFLINE
ora.cvu ora.cvu.type OFFLINE OFFLINE
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE rac1
ora.oc4j ora.oc4j.type OFFLINE OFFLINE
ora.ons ora.ons.type ONLINE ONLINE rac1
ora....SM1.asm application OFFLINE OFFLINE
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application OFFLINE OFFLINE
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip ora....t1.type ONLINE ONLINE rac1
ora....SM2.asm application OFFLINE OFFLINE
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application OFFLINE OFFLINE
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip ora....t1.type ONLINE ONLINE rac2
ora.racdb.db ora....se.type ONLINE ONLINE rac2
ora.scan1.vip ora....ip.type ONLINE ONLINE rac2
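For a quick summary of a listing like this, the rows can be parsed into records. This is a sketch that assumes whitespace-separated columns exactly as printed above (Name, Type, Target, State, and an optional Host); the function name and the sample rows, which are copied from the listing, are for illustration only.

```python
def parse_crs_stat(table):
    """Parse `crs_stat -t` rows into dictionaries; the host may be None."""
    rows = []
    for line in table.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and anything too short.
        if len(fields) < 4 or fields[0] == "Name":
            continue
        rows.append({
            "name": fields[0],
            "type": fields[1],
            "target": fields[2],
            "state": fields[3],
            "host": fields[4] if len(fields) > 4 else None,
        })
    return rows


sample_table = """Name Type Target State Host
------------------------------------------------------------
ora.asm ora.asm.type OFFLINE OFFLINE
ora.ons ora.ons.type ONLINE ONLINE rac1
ora.racdb.db ora....se.type ONLINE ONLINE rac2
ora.cvu ora.cvu.type OFFLINE OFFLINE"""

resources = parse_crs_stat(sample_table)
offline = [r["name"] for r in resources if r["state"] != "ONLINE"]
hosts = {r["name"]: r["host"] for r in resources if r["host"]}
print(offline)
print(hosts)
```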
Source: ITPUB blog, link: http://blog.itpub.net/8494287/viewspace-1349336/. Please credit the source when reposting; otherwise legal action may be pursued.