【MOS】Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1, Doc ID 2062234.1)
11gR2 New Feature: Introduction to Oracle Cluster Health Monitor (CHM)
Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1)
In this Document
Purpose
Questions and Answers
What is the Cluster Health Monitor?
What is the purpose of the Cluster Health Monitor?
What platforms does the Cluster Health Monitor support, and where can I get it?
What is the resource name for Cluster Health Monitor in 11.2.0.2 or higher?
Does stopping/starting ora.crf affect clusterware or cluster database functions?
Can the Cluster Health Monitor be installed on a single node, non-RAC server?
Do Engineered Systems like Exadata have a default usage with CHM and if so, any specific version?
Where is oclumon?
How do I collect the Cluster Health Monitor data?
Why does “diagcollection.pl --collect --chmos” return “Cannot parse master from output: ERROR : in reading init file” error?
How do I get the syntax of the different options, and explanations of those options, for diagcollection.pl and oclumon?
What is IPD/OS?
How is the Cluster Health Monitor different from OSWatcher?
Is the Cluster Health Monitor replacing OSWatcher?
How much overhead does the Cluster Health Monitor cause?
Does CHM on Multiple Node configurations (e.g. 4 to 8 nodes) have scaling concerns?
Will CDB and PDB result in any new information or special conditions using CHM?
How much disk space is needed for the Cluster Health Monitor?
How do I find out the size of data collected and saved by the Cluster Health Monitor in my system?
How can I increase the size of the Cluster Health Monitor repository?
What platforms can I run the Cluster Health Monitor on?
What steps are needed to install 11.2.0.2 when the Cluster Health Monitor from OTN is already running?
Where is the Cluster Health Monitor from OTN installed on Linux?
What logs and data should I gather before logging a SR for the Cluster Health Monitor error?
How do I increase the trace level of the Cluster Health Monitor?
Can I use procwatcher to get the pstack of the Cluster Health Monitor regularly?
What are the processes and components for the Cluster Health Monitor?
What is oclumon?
What is the definition of files like *.bdb, _db.*, *.ldb, and log.* created by the tool in the BDB (Berkeley Database) location directory?
Where is the location for the log files for the Cluster Health Monitor from OTN (pre 11.2.0.2)?
How do I fix the problem that the time in the oclumon report is in UTC time zone instead of the time zone of my server?
Can I install CHM from OTN on 11.2.0.2? What if I stop and disable CHM resource (ora.crf) on 11.2.0.2?
Where is the trace file for client like oclumon? How do I increase the trace level for oclumon?
Can the directory path to the CHM repository be the same on all nodes if shared storage is used?
How much CHM data (in terms of time) does a node store locally when it cannot communicate with the master?
How often does CHM collect the system metric data? Can this be changed?
What is the CHM retention time?
How can you reduce the size of bdb file that became big for any reason?
Can you set up CHM to run locally on each node?
Can CHM be used on a single node non-RAC server?
How to start and stop CHM that is installed as a part of GI in 11.2 and higher?
Database - RAC/Scalability Community
References
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.1.0.2 to 12.1.0.2 [Release 10.1 to 12.1]. Information in this document applies to any platform.
PURPOSE
The Cluster Health Monitor FAQ is an evolving document that answers common questions about the Cluster Health Monitor.
QUESTIONS AND ANSWERS
What is the Cluster Health Monitor?
What is the purpose of the Cluster Health Monitor?
By monitoring the data constantly, users can use the Cluster Health Monitor to detect potential problem areas such as CPU load, memory constraints, and spinning processes before the problem causes an unwanted outage.
What platforms does the Cluster Health Monitor support, and where can I get it?
The Cluster Health Monitor is an integrated part of 11.2.0.2 Oracle Grid Infrastructure for Linux (not on Linux Itanium and IBM Linux Z) and Solaris (SPARC 64 and x86-64 only), so installing 11.2.0.2 Oracle Grid Infrastructure on those platforms automatically installs the Cluster Health Monitor. AIX has the Cluster Health Monitor starting from 11.2.0.3. The Cluster Health Monitor is also enabled for Windows (except Windows Itanium) in 11.2.0.3.
Prior to 11.2.0.2 on Linux (not on Linux Itanium and IBM Linux Z), the Cluster Health Monitor can be downloaded from OTN.
The OTN version for Windows is not available. Please upgrade to 11.2.0.3 if you need CHM for Windows.
What is the resource name for Cluster Health Monitor in 11.2.0.2 or higher?
Does stopping/starting ora.crf affect clusterware or cluster database functions?
Can the Cluster Health Monitor be installed on a single node, non-RAC server?
Do Engineered Systems like Exadata have a default usage with CHM and if so, any specific version?
Where is oclumon?
If CHM was installed manually from the OTN download, oclumon is located in:
Linux : /usr/lib/oracrf/bin
Windows : C:\Program Files\oracrf\bin
How do I collect the Cluster Health Monitor data?
For example, issue “/bin/diagcollection.pl --collect --crshome $ORA_CRS_HOME --chmos --incidenttime --incidentduration 05:00”
The command above produces a report covering 5 hours from the time specified by --incidenttime.
The incidenttime must be in the format MM/DD/YYYYHH:MN:SS, where MM is the month, DD is the day, YYYY is the year, HH is the hour in 24-hour format, MN is the minute, and SS is the second. For example, to start the incident time at 10:15 PM on June 01, 2011, use 06/01/201122:15:00. The incidenttime and incidentduration can be changed to capture more data.
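Because the date and the time are concatenated with no separator, the --incidenttime value is easy to mistype. The following is a minimal Python sketch that builds the value from a timestamp. It assumes the MM/DD/YYYYHH:MN:SS layout described above. It also assumes that --incidentduration is HH:MN, which is inferred from the "05:00" = 5 hours example. Both function names are hypothetical, and neither function calls diagcollection.pl.

```python
from datetime import datetime


def incident_time_arg(ts: datetime) -> str:
    """Format ts as MM/DD/YYYYHH:MN:SS (24-hour clock, no space between
    the year and the hour), the layout described for --incidenttime."""
    return ts.strftime("%m/%d/%Y%H:%M:%S")


def incident_duration_arg(hours: int, minutes: int = 0) -> str:
    """Format an --incidentduration value, assuming HH:MN (e.g. 05:00 = 5 hours)."""
    return f"{hours:02d}:{minutes:02d}"


if __name__ == "__main__":
    start = datetime(2011, 6, 1, 22, 15, 0)  # 10:15 PM on June 01, 2011
    print(incident_time_arg(start))          # 06/01/201122:15:00
    print(incident_duration_arg(5))          # 05:00
```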
Alternatively, if diagcollection.pl fails for any reason, use 'oclumon dumpnodeview -allnodes -v -last "11:59:59" > your-filename'. This generates a report from the repository covering the last 12 hours. The -last value can be changed to get more or less data.
Another example of using oclumon is 'oclumon dumpnodeview -allnodes -v -s "2012-06-01 22:15:00" -e "2012-06-02 03:15:00" > /tmp/chm.log '. The difference in this command is that it specifies the start (-s flag) and end time (-e flag).
In this case, the time format used is "YYYY-MM-DD HH24:MI:SS" like "2007-11-12 23:05:00".
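The following is a hedged Python sketch that builds the -s/-e form of the command for a start time plus a window length. It uses the "YYYY-MM-DD HH24:MI:SS" layout stated above and only assembles an argv list; it does not run oclumon. The helper name is hypothetical.

```python
import shlex
from datetime import datetime, timedelta
from typing import List

# Timestamp layout for -s/-e: "YYYY-MM-DD HH24:MI:SS" as described above.
OCLUMON_TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def dumpnodeview_window_args(start: datetime, duration: timedelta) -> List[str]:
    """Return an argv list for 'oclumon dumpnodeview' covering
    start .. start + duration on all nodes in verbose mode."""
    end = start + duration
    return [
        "oclumon", "dumpnodeview", "-allnodes", "-v",
        "-s", start.strftime(OCLUMON_TS_FORMAT),
        "-e", end.strftime(OCLUMON_TS_FORMAT),
    ]


if __name__ == "__main__":
    args = dumpnodeview_window_args(datetime(2012, 6, 1, 22, 15), timedelta(hours=5))
    # shlex.join quotes the timestamps because they contain a space.
    print(shlex.join(args))
    # oclumon dumpnodeview -allnodes -v -s '2012-06-01 22:15:00' -e '2012-06-02 03:15:00'
```

Computing the end time with a timedelta handles the rollover past midnight (June 01 22:15 + 5 hours = June 02 03:15) automatically. To reproduce the shell redirection in the example, the list could be passed to subprocess.run with stdout pointing at an open file.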
Why does “diagcollection.pl --collect --chmos” return “Cannot parse master from output: ERROR : in reading init file” error?
The workaround for this is to issue
oclumon dumpnodeview -allnodes -v -last "amount of data needed"
For example, oclumon dumpnodeview -allnodes -v -last "01:00:00"
provides the last hour of data from all nodes.
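The following is a minimal Python sketch that turns a look-back window into the "HH:MM:SS" value used with -last. It assumes the value is always hours:minutes:seconds, as in the "01:00:00" and "11:59:59" examples in this FAQ. The helper names are hypothetical, and the sketch only builds the argument list for the workaround command.

```python
from datetime import timedelta
from typing import List


def last_window_arg(window: timedelta) -> str:
    """Format a look-back window as the "HH:MM:SS" value passed to -last."""
    total = int(window.total_seconds())
    if total <= 0:
        raise ValueError("window must be a positive duration")
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def dumpnodeview_last_args(window: timedelta) -> List[str]:
    """argv for the workaround command: the last <window> of data from all nodes."""
    return ["oclumon", "dumpnodeview", "-allnodes", "-v", "-last", last_window_arg(window)]


if __name__ == "__main__":
    print(last_window_arg(timedelta(hours=1)))               # 01:00:00
    print(last_window_arg(timedelta(hours=12, seconds=-1)))  # 11:59:59
```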
How do I get the syntax of the different options, and explanations of those options, for diagcollection.pl and oclumon?
What is IPD/OS?
How is the Cluster Health Monitor different from OSWatcher?
Is the Cluster Health Monitor replacing OSWatcher?
On the other hand, if only one of the tools can be used, Oracle recommends using the Cluster Health Monitor.
How much overhead does the Cluster Health Monitor cause?
Does CHM on Multiple Node configurations (e.g. 4 to 8 nodes) have scaling concerns?
Will CDB and PDB result in any new information or special conditions using CHM?
How much disk space is needed for the Cluster Health Monitor?
How do I find out the size of data collected and saved by the Cluster Health Monitor in my system?
To estimate the space required, use the following formula:
# of nodes * 720 MB * 3 = size required for 3 days of retention
e.g., for a 4-node cluster: 4 * 720 MB * 3 = 8,640 MB (8.4 GB)
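The estimate above can be written as a small Python helper. In the formula, the factor 3 corresponds to 3 days of retention. This sketch makes that a `days` parameter, which assumes the space scales linearly with retention. The function names are hypothetical.

```python
def chm_space_mb(nodes: int, days: int = 3, mb_per_node_per_day: int = 720) -> int:
    """Space estimate from the formula above: nodes * 720 MB * days
    (assumes linear scaling when days differs from 3)."""
    if nodes < 1 or days < 1:
        raise ValueError("nodes and days must be positive")
    return nodes * mb_per_node_per_day * days


def mb_to_gb(mb: float) -> float:
    """Convert MB to GB with 1 GB = 1024 MB, matching the 8,640 MB -> 8.4 GB example."""
    return mb / 1024


if __name__ == "__main__":
    need = chm_space_mb(4)
    print(need, round(mb_to_gb(need), 1))  # 8640 8.4
```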
How can I increase the size of the Cluster Health Monitor repository?
What platforms can I run the Cluster Health Monitor on?
11.2.0.2: Solaris (SPARC 64 and x86-64 only) and Linux.
11.2.0.3: AIX, Solaris (SPARC 64 and x86-64 only), Linux, and Windows.
Cluster Health Monitor is NOT available for any Itanium platform such as Linux Itanium and Windows Itanium.
What steps are needed to install 11.2.0.2 when the Cluster Health Monitor from OTN is already running?
Where is the Cluster Health Monitor from OTN installed on Linux?
What logs and data should I gather before logging a SR for the Cluster Health Monitor error?
2) Output of strace -v for osysmond.bin for about 2 minutes.
3) Output of strace -cp for about 2 minutes.
4) oclumon dumpnodeview -v output for that node for 2 minutes.
5) Output of "uname -a"
6) Output of "ps -eLf | grep osysmond.bin"
7) The ologgerd and sysmond log files in the CRS_HOME/log/ directory from all nodes
How do I increase the trace level of the Cluster Health Monitor?
oclumon debug log all allcomp:
The higher the trace level, the more detailed the tracing, so do not forget to reset the trace level back to 1 (the trace level when CHM is first installed) by issuing "oclumon debug log all allcomp:1".
Can I use procwatcher to get the pstack of the Cluster Health Monitor regularly?
What are the processes and components for the Cluster Health Monitor?
System Monitor Service (sysmond): the sysmond process collects the system statistics of the local node and sends the data to the master ologgerd. A sysmond process runs on every node and collects system statistics including CPU, memory usage, platform info, disk info, NIC info, process info, and filesystem info.
To find the master ologgerd, use the following command:
oclumon manage -get master
What is oclumon?
You can also use oclumon to query and print the durations and the states for a resource on a node during a specified time period. These states are based on predefined thresholds for each resource metric and are denoted as red, orange, yellow, and green, indicating decreasing order of criticality.
What is the definition of files like *.bdb, _db.*, *.ldb, and log.* created by the tool in the BDB (Berkeley Database) location directory?
log.* - These are Berkeley DB log files that preserve changes before they are written to the db files. Checkpointing is set up, so the log files are reused.
*.ldb - This is the local logging file and MUST be present on all servers.
Do not delete the above files except when trying to reduce the size of a bdb file that has grown very large. To reduce the size of the bdb file, refer to the question "How can you reduce the size of bdb file that became big for any reason?" in this document.
Because resolving a problem such as a node reboot or performance degradation can take days or weeks, is there any way to keep the Cluster Health Monitor data that long so that it can be replayed later when needed?
Before 12.1.0.2, another way is to archive the whole BDB regularly (for example, every day) by making a copy of the BDB files in the BDB location directory.
To have CHMOS read an archived BDB, start it in debug mode by using
ologdbg -d
After it starts, issue the oclumon dumpnodeview to get the data from the archived BDB.
For example, issue
oclumon dumpnodeview -n -s -e -v
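The daily archiving step mentioned above ("making a copy of the BDB files") could be scripted roughly as follows. This is a sketch under several assumptions. The BDB location and the archive root are site-specific placeholders passed in by the caller. The whole directory is copied with Python's shutil.copytree. Scheduling, such as a daily cron job, is not shown. The function name is hypothetical. Copying files while ologgerd is writing to them may not produce a consistent database snapshot.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def archive_bdb(bdb_dir: Path, archive_root: Path, day: Optional[date] = None) -> Path:
    """Copy the whole BDB directory to archive_root/chm-YYYYMMDD and return it.

    bdb_dir and archive_root are placeholders for site-specific paths.
    copytree creates missing parent directories, but raises if the dated
    target already exists, so a second run on the same day is rejected.
    """
    day = day or date.today()
    target = archive_root / f"chm-{day:%Y%m%d}"
    shutil.copytree(bdb_dir, target)
    return target
```

An archived copy produced this way is what would then be read by starting CHMOS in debug mode and issuing oclumon dumpnodeview, as described above.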
Where is the location for the log files for the Cluster Health Monitor from OTN (pre 11.2.0.2)?
How do I fix the problem that the time in the oclumon report is in UTC time zone instead of the time zone of my server?
Can I install CHM from OTN on 11.2.0.2? What if I stop and disable CHM resource (ora.crf) on 11.2.0.2?
Where is the trace file for client like oclumon? How do I increase the trace level for oclumon?
Generally it is not generated, because at log level 0 there is no log data.
To see logs at a higher log level, do the following:
1. oclumon [Enter the interactive mode]
2. query> debug log all allcomp:3
After this, any command execution will produce finer logs in oclumon.log
Can the directory path to the CHM repository be the same on all nodes if shared storage is used?
How much CHM data (in terms of time) does a node store locally when it cannot communicate with the master?
With a sampling interval of 1 second, it is ideally around 1 hour of data. With 11.2.0.3, we moved to a sampling interval of 5 seconds, so in that case 4-5 hours of data can be retained.
How often does CHM collect the system metric data? Can this be changed?
Currently, the collection interval cannot be changed.
What is the CHM retention time?
In 11.2.0.2, the retention time is determined by the size. The size has changed to 1GB. The retention time depends on how large the cluster is. For example, it is usually 6.9 hours for a one-node cluster when the sampling interval is 1 second. Issue "oclumon manage -get repsize" to find out the retention time of your cluster. The output is in seconds.
With the sampling interval moving to 5 seconds in 11.2.0.3, the retention time becomes 5 times the retention time with a 1-second sampling interval.
A retention time of 72 hours is recommended.
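The following is a minimal Python sketch of the arithmetic above. The retention time scales with the sampling interval for a fixed repository size, and the result can be compared against the recommended 72 hours. It assumes the input is the value in seconds reported by "oclumon manage -get repsize". The function names are hypothetical.

```python
# 72 hours, the recommended retention, expressed in seconds (= 259200).
RECOMMENDED_RETENTION_S = 72 * 3600


def retention_after_interval_change(retention_s: float, old_interval_s: float = 1.0,
                                    new_interval_s: float = 5.0) -> float:
    """Scale a retention time (seconds) when the sampling interval changes,
    assuming a fixed repository size (the 1 s -> 5 s change in 11.2.0.3)."""
    if old_interval_s <= 0 or new_interval_s <= 0:
        raise ValueError("intervals must be positive")
    return retention_s * new_interval_s / old_interval_s


def meets_recommendation(retention_s: float) -> bool:
    """True if the retention in seconds is at least 72 hours."""
    return retention_s >= RECOMMENDED_RETENTION_S


if __name__ == "__main__":
    one_node_1s = 24840  # 6.9 hours, the one-node example above
    one_node_5s = retention_after_interval_change(one_node_1s)
    print(one_node_5s / 3600, meets_recommendation(one_node_5s))
```

With the one-node example, 6.9 hours becomes 34.5 hours at a 5-second interval. That is still short of 72 hours, which is why the repository would need resizing.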
How can you reduce the size of bdb file that became big for any reason?
oclumon manage -repos changesize .
As a temporary workaround, you can kill ologgerd and delete the contents of the BDB directory. osysmond should respawn ologgerd, and a new bdb file will be created. The past data is lost when this is done.
Please note the minimum size must be >= 1024 MB (1 GB), otherwise CRS-9100 "Error setting Cluster Health Monitor repository size" will be reported.
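Scripts that resize the repository could check the documented minimum before calling oclumon. The following is a small Python sketch of that check. It assumes the size is expressed in MB, as the 1024 MB minimum suggests. The function name is hypothetical, and it does not invoke oclumon.

```python
MIN_REPOS_SIZE_MB = 1024  # 1 GB; smaller sizes are rejected with CRS-9100


def checked_repos_size_mb(size_mb: int) -> int:
    """Return size_mb unchanged if it meets the documented minimum, else raise."""
    if size_mb < MIN_REPOS_SIZE_MB:
        raise ValueError(
            f"repository size {size_mb} MB is below the {MIN_REPOS_SIZE_MB} MB minimum; "
            "oclumon would report CRS-9100"
        )
    return size_mb
```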
Can you set up CHM to run locally on each node?
The Cluster Health Monitor that comes with the Grid Infrastructure install image must run with only one master ologgerd, so it cannot be set up to run locally on each node.
Can CHM be used on a single node non-RAC server?
How to start and stop CHM that is installed as a part of GI in 11.2 and higher?
To stop CHM (or ora.crf resource managed by ohasd)
$GRID_HOME/bin/crsctl stop res ora.crf -init
To start CHM (or ora.crf resource managed by ohasd)
$GRID_HOME/bin/crsctl start res ora.crf -init
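The following is a hedged Python wrapper around the two crsctl commands above. It builds exactly "$GRID_HOME/bin/crsctl <start|stop> res ora.crf -init" from a caller-supplied Grid home. The Grid home path and the function names are hypothetical. Actually running the command needs a real GI installation and whatever privileges crsctl requires on that system.

```python
import os
import subprocess
from typing import List


def crsctl_crf_args(grid_home: str, action: str) -> List[str]:
    """argv for '$GRID_HOME/bin/crsctl <start|stop> res ora.crf -init'."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return [os.path.join(grid_home, "bin", "crsctl"), action, "res", "ora.crf", "-init"]


def run_crsctl_crf(grid_home: str, action: str) -> int:
    """Run the command and return crsctl's exit status."""
    return subprocess.run(crsctl_crf_args(grid_home, action), check=False).returncode
```

Keeping the argv builder separate from the runner lets the command be inspected or logged before anything touches the cluster.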
Database - RAC/Scalability Community
To discuss this topic further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Database - RAC/Scalability Community.
How to relocate CHM repository and increase retention time (Doc ID 2062234.1)
In this Document
Goal |
Solution |
11.2 |
12.1 |
References |
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later. Information in this document applies to any platform.
GOAL
CHM data often ages out if it is not collected in time. This note provides steps to increase the retention time, which is strongly recommended.
SOLUTION
11.2
In 11.2, the CHM repository is in the Grid home. To change the retention time:
$ /bin/oclumon manage -repos resize 259200
racnode1 --> retention check successful
racnode2 --> retention check successful
New retention is 259200 and will use 4525424640 bytes of disk space
CRS-9115-Cluster Health Monitor repository size change completed on all nodes.
Done
Note: the command-line argument specifies how many seconds to retain the data; it is recommended to be at least 259200, which is 3 days.
If there is not enough space in the Grid home, relocate the CHM data with the following command:
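To confirm the result of a resize in a script, the summary line printed by oclumon (as in the sample output above) can be parsed. The following is a minimal Python sketch. It assumes the line keeps exactly the wording shown in this note. The function names are hypothetical.

```python
import re
from typing import Tuple

# Matches the summary line printed in the resize example above.
RESIZE_RE = re.compile(r"New retention is (\d+) and will use (\d+) bytes of disk space")


def parse_resize_summary(output: str) -> Tuple[int, int]:
    """Return (retention_seconds, bytes_used) from the resize output."""
    m = RESIZE_RE.search(output)
    if m is None:
        raise ValueError("retention summary line not found")
    return int(m.group(1)), int(m.group(2))


def retention_days(seconds: int) -> float:
    """Convert a retention in seconds to days (86400 seconds per day)."""
    return seconds / 86400


if __name__ == "__main__":
    line = "New retention is 259200 and will use 4525424640 bytes of disk space"
    secs, used = parse_resize_summary(line)
    print(retention_days(secs), round(used / 1024**3, 2))  # 3.0 4.21
```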
$ /bin/oclumon manage -repos reploc /home/grid/chm
racnode1 --> Ready to commit new location
racnode2 --> Ready to commit new location
New retention is 259200 and will use 4525424640 bytes of disk space
CRS-9113-Cluster Health Monitor repository location change completed on all nodes. Restarting Loggerd.
Done
12.1
In 12c, the CHM repository is the GIMR (Grid Infrastructure Management Repository), which is a database; only the retention time can be changed. To change the retention time:
1. Check how much space is needed for the expected retention time:
The Cluster Health Monitor repository is too small for the desired retention. Please first resize the repository to 3896 MB
Note: the command-line argument specifies how many seconds to retain the data; it is recommended to be at least 259200, which is 3 days. The output shows that the repository needs to be at least 3896 MB for 3 days.
2. Change the repository size:
The Cluster Health Monitor repository was successfully resized.The new retention is 259200 seconds.
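The 12.1 commands are not reproduced in this copy of the note, so the following Python sketch only handles the messages shown above. It extracts the minimum size that step 1 asks for, so that a script can feed that number into step 2. It assumes the message wording is exactly as reproduced here. The function name is hypothetical.

```python
import re
from typing import Optional


def required_repos_size_mb(output: str) -> Optional[int]:
    """Extract N from '... Please first resize the repository to N MB';
    return None when the message is absent (e.g. after a successful resize)."""
    m = re.search(r"resize the repository to (\d+) MB", output)
    return int(m.group(1)) if m else None
```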
REFERENCES
NOTE:1589394.1 - How to Move/Recreate GI Management Repository to Different Shared Storage (Diskgroup, CFS or NFS etc)
About Me
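● This article is compiled from the Internet
● This article is also published, and kept in sync, on itpub (http://blog.itpub.net/26736162), cnblogs (http://www.cnblogs.com/lhrbest), and the personal WeChat official account (xiaomaimiaolhr)
● itpub address of this article: http://blog.itpub.net/26736162/abstract/1/
● cnblogs address of this article: http://www.cnblogs.com/lhrbest
● PDF version of this article and Xiaomaimiao's cloud drive: http://blog.itpub.net/26736162/viewspace-1624453/
● Database written-test and interview question bank with answers: http://blog.itpub.net/26736162/viewspace-2134706/
● QQ group: 230161599; WeChat group: send a private message
● To contact me, add QQ 646634621 as a friend and state the reason
● Completed in Shanghai between 2017-06-02 09:00 and 2017-06-30 22:00
● The content comes from Xiaomaimiao's study notes and is partly compiled from the Internet; apologies for any infringement or inaccuracy
● All rights reserved. Sharing is welcome; please keep the source when reposting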
From "ITPUB Blog", link: http://blog.itpub.net/26736162/viewspace-2132364/. Please credit the source when reposting; otherwise legal liability will be pursued.