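Oracle 12c RAC: CSSD process fails to start in real-time mode
I. Environment
Operating system: Red Hat Enterprise Linux Server release 7.6 (Maipo)
Database: Oracle 12.1.0.2 RAC
II. Problem description
On 18 November 2022, a host in a business system rebooted because of a hardware fault. After the reboot, database node 1 could not start normally, while node 2 continued to serve requests. The CSS process on node 1 could not escalate to real-time scheduling. After the EDR-related titanagent service was disabled and the operating system was rebooted, the cluster and database started normally.
III. Analysis
1. Check the cluster status after the host reboot
[grid@]# crsctl status res -t -init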
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       nadb01
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       nadb01
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       nadb01
ora.gpnpd
      1        ONLINE  ONLINE       nadb01
ora.mdnsd
      1        ONLINE  ONLINE       nadb01
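The cssd process does not start properly: ora.cssd stays in the STARTING state, and the resources that depend on it (ctssd, crsd, asm, evmd, haip) remain OFFLINE.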
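With this many resources, a small filter can help pick out the ones that should be up but are not. The following is a minimal sketch, assuming the two-line-per-resource layout shown above (a resource name line followed by an instance line). The helper name offline_init_resources is hypothetical and not part of Oracle Grid Infrastructure.

```shell
#!/bin/sh
# Hypothetical helper: reads `crsctl status res -t -init` output on stdin and
# prints the name of every resource whose TARGET is ONLINE but STATE is OFFLINE.
offline_init_resources() {
    awk '
        # A resource name line, e.g. "ora.cssd": remember it.
        $1 ~ /^ora\./ { name = $1; next }
        # An instance line, e.g. "1 ONLINE OFFLINE STARTING":
        # field 2 is TARGET, field 3 is STATE.
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 == "OFFLINE" { print name }
    '
}
```

It could be used as: crsctl status res -t -init | offline_init_resources. Resources whose target is OFFLINE, such as ora.diskmon here, are skipped on purpose.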
2. Check the cluster alert log
[gpnpd(231513)]CRS-2328:GPNPD started on node zadb03.
2022-11-18 10:56:09.210: [cssd(231620)]CRS-1713:CSSD daemon is started in clustered mode
2022-11-18 10:56:09.219: [cssd(231620)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00011:) in /u01/app/11.2.0.4/grid/log/newdb01/cssd/ocssd.log
2022-11-18 10:56:11.034: [ohasd(229354)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
The key entry in the log is: CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00011:) in /u01/app/12.1.0.2/grid/log/newdb01/cssd/ocssd.log
Check the ocssd log:
2022-11-18 10:56:09.210: [ CSSD][3219912512]clssscmain: Starting CSS daemon, version 11.2.0.4.0, in (clustered) mode with uniqueness value 1668740169
2022-11-18 10:56:09.210: [ CSSD][3219912512]clssscmain: Environment is production
2022-11-18 10:56:09.210: [ CSSD][3219912512]clssscmain: Core file size limit extended
2022-11-18 10:56:09.212: [ CSSD][3219912512]clssscmain: GIPCHA down 0
2022-11-18 10:56:09.213: [ CSSD][3219912512]clssscGetParameterOLR: OLR fetch for parameter logsize (8) failed with rc 21
2022-11-18 10:56:09.213: [ CSSD][3219912512]clssscExtendLimits: The current soft limit for file descriptors is 65536, hard limit is 65536
2022-11-18 10:56:09.213: [ CSSD][3219912512]clssscExtendLimits: The current soft limit for locked memory is 4294967295, hard limit is 4294967295
2022-11-18 10:56:09.213: [ CSSD][3219912512]clssscGetParameterOLR: OLR fetch for parameter priority (15) failed with rc 21
2022-11-18 10:56:09.213: [ CSSD][3219912512]clssscSetPrivEnv: Setting priority to 4
2022-11-18 10:56:09.219: [ CSSD][3219912512]clssscSetPrivEnv: unable to set priority to 4
2022-11-18 10:56:09.219: [ CSSD][3219912512]SLOS: cat=-2, opn=scls_set_priority_realtime, dep=1, loc=setsched unable to escalate to real time
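The ocssd log shows that, at startup, the ocssd process could not obtain the higher scheduling priority it asked for ("unable to set priority to 4") and therefore could not escalate to real time.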
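The same signature can be searched for directly instead of reading the whole log. This is a minimal sketch, assuming the two message fragments quoted in the excerpt above; the helper names rt_priority_failures and show_sched_class are hypothetical. The second helper uses the standard procps ps columns cls and rtprio to show the scheduling class of a running process, where TS means normal time-sharing and FF or RR mean a real-time policy.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of an ocssd.log file that record a
# failed attempt to raise the scheduling priority.
# $1 = path to ocssd.log
rt_priority_failures() {
    grep -E 'unable to set priority|scls_set_priority_realtime' "$1"
}

# Hypothetical helper: show the scheduling class (CLS) and real-time
# priority (RTPRIO) of a running process.
# $1 = process ID
show_sched_class() {
    ps -o pid,cls,rtprio,comm -p "$1"
}
```

For example, rt_priority_failures could be pointed at the ocssd.log path named in the CRS-1656 message, and show_sched_class at the PID of a process on the healthy node for comparison.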
The My Oracle Support note "Linux: GI OCSSD Fails to Start After cgroups Setting Change (Doc ID 1577784.1)" describes a very similar symptom:
Deployed Puppet which created a new cgroup-configuration by default.
ls /cgroups/cpu.rt_*
/cgroups/cpu.rt_period_us /cgroups/cpu.rt_runtime_us
cat /cgroups/cpu.rt_*
1000000
950000
cat /cgroups/sysdefault/cpu.rt_*
1000000
0 ====>> 0
SOLUTION
Option 1: Restore the default value and reboot the node:
cat /etc/cgconfig.conf
mount {
    memory = /cgroups;
    cpu = /cgroups;
}
group lu-adm {
    cpu {
        cpu.shares = 50;
    }
    memory {
        memory.memsw.limit_in_bytes = 500m;
        memory.limit_in_bytes = 200m;
    }
}
group sysdefault {
    cpu {
        cpu.shares = 1024;
        cpu.rt_period_us = 1000000;
        cpu.rt_runtime_us = 950000; ====>> changed from 0 back to default
    }
}
Workaround is to clear cgroup setting through 'cgclear' after consulting sysadmin.
cgroup-configuration file changed in RHEL 6 and later versions
RHEL 6
cd /sys/fs/cgroup/cpuacct/user.slice
cat cpu.rt_period_us
RHEL 7 path i.e File location :
ls /sys/fs/cgroup/cpu/cpu.rt_*
The file is not available in all OS -- check with the OS Vendor for details.
3. Check the relevant operating system configuration and services
[root@ ~]# cat /etc/cgconfig.conf
cat: /etc/cgconfig.conf: No such file or directory
There is no cgconfig.conf file on this host.
[root@ ~]# ls /sys/fs/cgroup/cpu/cpu.rt_*
/sys/fs/cgroup/cpu/cpu.rt_period_us /sys/fs/cgroup/cpu/cpu.rt_runtime_us
[root@ ~]#
[root@ ~]# cat /sys/fs/cgroup/cpu/cpu.rt_period_us
1000000
[root@ ~]# cat /sys/fs/cgroup/cpu/cpu.rt_runtime_us
950000
[root@~]#
cpu.rt_period_us (1000000) and cpu.rt_runtime_us (950000) are already at the values the note recommends.
The solution in "Linux: GI OCSSD Fails to Start After cgroups Setting Change (Doc ID 1577784.1)" therefore does not apply here.
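The commands above read only the top-level cpu cgroup. In the note's own example the top-level value was 950000 while the child group sysdefault had 0, so it may be worth checking child cgroups as well. The following is a minimal sketch of such a walk; the helper name zero_rt_cgroups is hypothetical, and this check was not part of the original investigation, so no output from the affected host is available.

```shell
#!/bin/sh
# Hypothetical helper: print every cpu.rt_runtime_us file under a cpu cgroup
# root whose value is 0, meaning that cgroup has no real-time runtime budget.
# $1 = cgroup root (defaults to the RHEL 7 path used above)
zero_rt_cgroups() {
    root=${1:-/sys/fs/cgroup/cpu}
    find "$root" -name cpu.rt_runtime_us 2>/dev/null |
    while IFS= read -r f; do
        # Skip files that vanish or cannot be read while walking the tree.
        value=$(cat "$f" 2>/dev/null) || continue
        if [ "$value" = "0" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}
```

Run as root with no argument, it walks /sys/fs/cgroup/cpu. An empty result means no cgroup in that hierarchy has a zero real-time budget.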
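4. Red Hat's official guidance on the relevant CPU settings
How to configure a RHEL 7 or RHEL 8 system to be able to run programs requiring Real-Time Scheduling
According to this article, real-time processes cannot be created when the CPUAccounting parameter is enabled. A check of the system.conf configuration file shows that CPUAccounting has not been turned on there.
5. Check the operating system's CPUAccounting, CPUQuota, and related settings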
[root@ ~]# grep DefaultCPUAccounting /etc/systemd/system.conf
#DefaultCPUAccounting=no
However, the titanagent.service unit file sets CPUQuota=50%:
[root@~]# cat /usr/lib/systemd/system/titanagent.service
[Unit]
Description=titanagent
After=network.target
[Service]
User=root
CPUQuota=50%
Type=forking
PIDFile=/var/run/titanagent.pid
ExecStartPre=/bin/bash -c "/titan/agent/titanagent -s"
ExecStart=/bin/bash -c "/titan/agent/titanagent -d -b /etc/titanagent"
ExecStop=/bin/bash -c "/titan/agent/titanagent -s"
ExecReload=/bin/bash -c "/titan/agent/titanagent -d -b /etc/titanagent"
PrivateTmp=no
Restart=always
RestartSec=60s
TimeoutSec=20s
TimeoutStopSec=30s
[Install]
WantedBy=multi-user.target
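The CPUQuota parameter implicitly enables CPUAccounting. So although DefaultCPUAccounting is not set in system.conf, CPU accounting is effectively turned on once titanagent.service is started, which is exactly the condition under which the Red Hat article says real-time processes cannot be created.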
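Because a single unit file is enough to trigger this, it can help to scan all unit files rather than only system.conf. The following is a minimal sketch; the helper name units_with_cpu_controls is hypothetical, and the default directories are the standard systemd unit locations. Including CPUShares in the pattern is an assumption based on my reading of the systemd documentation, since the source only confirms the effect for CPUQuota.

```shell
#!/bin/sh
# Hypothetical helper: list unit-file lines that set a CPU resource-control
# directive. Lines such as CPUAccounting=no also match and must be judged
# by eye.
# Arguments: directories to search (default: standard systemd unit directories)
units_with_cpu_controls() {
    [ "$#" -gt 0 ] || set -- /usr/lib/systemd/system /etc/systemd/system
    grep -rHE '^[[:space:]]*(CPUQuota|CPUShares|CPUAccounting)[[:space:]]*=' "$@" 2>/dev/null
    # No match is a valid result here, not an error.
    return 0
}
```

On the host described here, a scan like this would be expected to report the CPUQuota=50% line in /usr/lib/systemd/system/titanagent.service.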
6. After titanagent.service was disabled and the host rebooted, the cluster started normally
-the end-
Source: ITPUB blog, link: http://blog.itpub.net/28373936/viewspace-2925424/. If you repost this article, please credit the source; otherwise legal action may be taken.