Troubleshooting Oracle Clusterware

Posted by maojinyu on 2011-03-01

F Troubleshooting Oracle Clusterware
This appendix introduces monitoring the Oracle Clusterware environment and explains how you can enable dynamic debugging to troubleshoot Oracle Clusterware processing, and enable debugging and tracing for specific components and specific Oracle Clusterware resources to focus your troubleshooting efforts.

This appendix contains the following topics:

•Monitoring Oracle Clusterware

•Dynamic Debugging

•Component Level Debugging

•Oracle Clusterware Shutdown and Startup

•Enabling and Disabling Oracle Clusterware Daemons

•Determining the Active Versions and Software Versions

•Diagnostics Collection Script

•Oracle Clusterware Alerts

•Resource Debugging

•Checking the Health of the Clusterware

•Clusterware Log Files and the Unified Log Directory Structure

•Troubleshooting the Oracle Cluster Registry

•Enabling Additional Tracing for Oracle Clusterware High Availability

Monitoring Oracle Clusterware
You can use Oracle Enterprise Manager to monitor the Oracle Clusterware environment. When you log in to Oracle Enterprise Manager using a client browser, the Cluster Database Home page appears, where you can monitor the status of the Oracle Clusterware environment. Monitoring can include such things as:

•Notification if there are any VIP relocations

•Status of the Oracle Clusterware on each node of the cluster using information obtained through the Cluster Verification Utility (cluvfy)

•Notification if node applications (nodeapps) start or stop

•Notification of issues in the Oracle Clusterware alert log for the OCR, voting disk issues (if any), and node evictions

The Cluster Database Home page is similar to a single-instance Database Home page. However, on the Cluster Database Home page, Oracle Enterprise Manager displays the system state and availability. This includes a summary about alert messages and job activity, as well as links to all the database and Automatic Storage Management (ASM) instances. For example, you can track problems with services on the cluster including when a service is not running on all of the preferred instances or when a service response time threshold is not being met.

You can use the Oracle Enterprise Manager Interconnects page to monitor the Oracle Clusterware environment. The Interconnects page shows the public and private interfaces on the cluster, the overall throughput on the private interconnect, individual throughput on each of the network interfaces, error rates (if any) and the load contributed by database instances on the interconnect, including:

•Overall throughput across the private interconnect

•Notification if a database instance is using public interface due to misconfiguration

•Throughput and errors (if any) on the interconnect

•Throughput contributed by individual instances on the interconnect

All of this information also is available as collections that have a historic view. This is useful in conjunction with cluster cache coherency, such as when diagnosing problems related to cluster wait events. You can access the Interconnects page by clicking the Interconnect tab on the Cluster Database home page.

Also, the Oracle Enterprise Manager Cluster Database Performance page provides a quick glimpse of the performance statistics for a database. Statistics are rolled up across all the instances in the cluster database in charts. Using the links next to the charts, you can get more specific information and perform any of the following tasks:

•Identify the causes of performance issues.

•Decide whether resources need to be added or redistributed.

•Tune your SQL plan and schema for better optimization.

•Resolve performance issues.

The charts on the Cluster Database Performance page include the following:

•Chart for Cluster Host Load Average—The Cluster Host Load Average chart in the Cluster Database Performance page shows potential problems that are outside the database. The chart shows maximum, average, and minimum load values for available nodes in the cluster for the previous hour.

•Chart for Global Cache Block Access Latency—Each cluster database instance has its own buffer cache in its System Global Area (SGA). Using Cache Fusion, Oracle RAC environments logically combine each instance's buffer cache to enable the database instances to process data as if the data resided on a logically combined, single cache.

•Chart for Average Active Sessions—The Average Active Sessions chart in the Cluster Database Performance page shows potential problems inside the database. Categories, called wait classes, show how much of the database is using a resource, such as CPU or disk I/O. Comparing CPU time to wait time helps to determine how much of the response time is consumed with useful work rather than waiting for resources that are potentially held by other processes.

•Chart for Database Throughput—The Database Throughput charts summarize any resource contention that appears in the Average Active Sessions chart, and also show how much work the database is performing on behalf of the users or applications. The Per Second view shows the number of transactions compared to the number of logons, and the amount of physical reads compared to the redo size for each second. The Per Transaction view shows the amount of physical reads compared to the redo size for each transaction. Logons is the number of users that are logged on to the database.

In addition, the Top Activity drilldown menu on the Cluster Database Performance page enables you to see the activity by wait events, services, and instances. You can also view details about SQL and sessions at an earlier point in time by moving the slider on the chart.

See Also:

Oracle Database 2 Day + Real Application Clusters Guide
Dynamic Debugging
You can use crsctl commands as the root user to enable dynamic debugging for Oracle Clusterware, the Event Manager (EVM), and the clusterware subcomponents. You can dynamically change debugging levels using crsctl commands. Debugging information remains in the Oracle Cluster Registry (OCR) for use during the next startup. You can also enable debugging for resources.

The crsctl syntax to enable debugging for Oracle Clusterware is:

crsctl debug log crs "CRSRTI:1,CRSCOMM:2"
The crsctl syntax to enable debugging for EVM is:

crsctl debug log evm "EVMCOMM:1"
The crsctl syntax to enable debugging for resources is:

crsctl debug log res "resname:1"
Component Level Debugging
You can use crsctl commands as the root user to enable dynamic debugging for the Oracle Clusterware Cluster Ready Services (CRS), Oracle Cluster Registry (OCR), Cluster Synchronization Services (CSS), and the Event Manager (EVM).

This section contains the following topics:

•Enabling Debugging for CRS, OCR, CSS, and EVM Modules

•Creating an Initialization File to Contain the Debugging Level

Enabling Debugging for CRS, OCR, CSS, and EVM Modules
You can enable debugging for the CRS, OCR, CSS, and EVM modules and their components by setting environment variables or by issuing crsctl debug commands using the following syntax:

crsctl debug log module_name component:debugging_level
You must issue the crsctl debug command as the root user, and supply the following information:

•module_name—The name of the module: CRS, EVM, or CSS.

•component—The name of a component for the CRS, OCR, EVM, or CSS module. See Table F-1 for a list of all of the components.

•debugging_level—A number from 1 to 5 to indicate the level of detail you want the debug command to return, where 1 is the least amount of debugging output and 5 provides the most detailed debugging output.

You can dynamically change the debugging level in the crsctl command, or you can configure an init file for changing the debugging level as described in "Creating an Initialization File to Contain the Debugging Level".

The following commands show examples of how to enable debugging for the various modules:

•To enable debugging for Oracle Clusterware:

crsctl debug log crs "CRSRTI:1,CRSCOMM:2"
•To enable debugging for OCR:

crsctl debug log crs "CRSRTI:1,CRSCOMM:2,OCRSRV:4"
•To enable debugging for EVM:

crsctl debug log evm "EVMCOMM:1"
•To enable debugging for resources:

crsctl debug log res "resname:1"
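As an illustration of the syntax above, the following Python sketch assembles a crsctl debug log command line and rejects levels outside the documented range of 1 to 5. The function name build_debug_command and the accepted target set (crs, evm, css, res) are assumptions made for this example and are not part of crsctl. The sketch only builds the argument list; it does not run the command.

```python
def build_debug_command(target, levels):
    """Return an argument list for `crsctl debug log`.

    target: "crs", "evm", "css", or "res" (assumed set, taken from the examples above).
    levels: mapping of component (or resource) name to a debugging level from 1 to 5.
    """
    allowed_targets = {"crs", "evm", "css", "res"}
    if target not in allowed_targets:
        raise ValueError(f"unknown target: {target!r}")
    if not levels:
        raise ValueError("at least one name:level pair is required")
    for name, level in levels.items():
        if not isinstance(level, int) or not 1 <= level <= 5:
            raise ValueError(f"level for {name} must be an integer from 1 to 5")
    # Join the pairs in the comma-separated form used by the examples above.
    spec = ",".join(f"{name}:{level}" for name, level in levels.items())
    return ["crsctl", "debug", "log", target, spec]


command = build_debug_command("crs", {"CRSRTI": 1, "CRSCOMM": 2})
print(" ".join(command))
```

Because the result is a plain list, it could be handed to subprocess.run by a session running as the root user, which the debug commands require.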
To list the components that can be used for debugging, issue the crsctl lsmodules command using the following syntax and supply crs, evm, or css for the module_name parameter:

crsctl lsmodules module_name
Note:

You do not have to be the root user to run the crsctl command with the lsmodules option.
Table F-1 shows the components for the CRS, OCR, EVM, and CSS modules, respectively. Note that some of the component names are common between the CRS, EVM, and CSS daemons and may be enabled on that specific daemon. For example, COMMNS is the NS layer, and because each daemon uses the NS layer, you can enable this specific module component on any of the daemons to get specific debugging information.

Table F-1 Components for the CRS, OCR, EVM, and CSS Modules

CRS Modules (see Footnote 1): CRSUI, CRSCOMM, CRSRTI, CRSMAIN, CRSPLACE, CRSAPP, CRSRES, CRSOCR, CRSTIMER, CRSEVT, CRSD, CLUCLS, CSSCLNT, COMMCRS, COMMNS

OCR Modules (see Footnote 2): OCRAPI, OCRCLI, OCRSRV, OCRMAS, OCRMSG, OCRCAC, OCRRAW, OCRUTL, OCROSD

OCR Tools Modules: OCRCONF, OCRDUMP, OCRCHECK

EVM Modules (see Footnote 3): EVMD, EVMDMAIN, EVMCOMM, EVMEVT, EVMAPP, EVMAGENT, CRSOCR, CLUCLS, CSSCLNT, COMMCRS, COMMNS

CSS Modules (see Footnote 4): CSSD, COMMCRS, COMMNS


Footnote 1 List the CRS component modules using the crsctl lsmodules crs command.

Footnote 2 You cannot list the OCR modules using the crsctl lsmodules command.

Footnote 3 List the EVM component modules using the crsctl lsmodules evm command.

Footnote 4 List the CSS component modules using the crsctl lsmodules css command.

Creating an Initialization File to Contain the Debugging Level
This section describes how to specify the debugging level in an initialization file. This debugging information is stored for use during the next startup.

For each process that you want to debug, you can create an initialization file that contains the debugging level.

The initialization file name includes the name of the process that you are debugging (process_name.ini). The file is located in the ORACLE_HOME/log/hostname/admin/ directory.

For example, ORACLE_HOME/log/hostA/admin/clscfg.ini is the name for the CLSCFG debugging initialization file on hostA.
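The naming rule above can be sketched in Python as follows. The helper name init_file_path is hypothetical, and the arguments are placeholders that you would replace with your own Oracle home, host name, and process name.

```python
from pathlib import PurePosixPath


def init_file_path(oracle_home, hostname, process_name):
    """Build ORACLE_HOME/log/hostname/admin/process_name.ini as described above."""
    return PurePosixPath(oracle_home) / "log" / hostname / "admin" / f"{process_name}.ini"


print(init_file_path("ORACLE_HOME", "hostA", "clscfg"))
```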

See Also:

"Enabling Debugging for CRS, OCR, CSS, and EVM Modules" for information about dynamically changing debugging levels by specifying the level number (from 1 to 5) on the crsctl command
Oracle Clusterware Shutdown and Startup
You can start or stop Oracle Clusterware by issuing crsctl start and stop commands.

Example 1 Stopping Oracle Clusterware
To stop Oracle Clusterware and its related resources on a specific node, issue the following command:

crsctl stop crs
Example 2 Starting Oracle Clusterware
To start Oracle Clusterware and its related resources on a specific node, issue the following command:

crsctl start crs
Note:

You must run these crsctl commands as the root user.
Enabling and Disabling Oracle Clusterware Daemons
When the Oracle Clusterware daemons are enabled, they start automatically at the time the node is started. To prevent the daemons from starting, you can disable them using crsctl commands. You can use crsctl commands as follows to enable and disable the startup of the Oracle Clusterware daemons.

Run the following command to enable startup for all of the Oracle Clusterware daemons:

crsctl enable crs
Run the following command to disable the startup of all of the Oracle Clusterware daemons:

crsctl disable crs
Notes:

•You must run these crsctl commands as the root user.

•Neither of these commands is supported on Windows systems.

Determining the Active Versions and Software Versions
You can determine the active version or the software version on the local node by issuing the crsctl query crs activeversion and crsctl query crs softwareversion commands.

•The software version is the binary version of the software on a particular cluster node.

•The active version is the lowest software version running in a cluster.

These versions are used while upgrading a cluster.
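To make the relationship between the two versions concrete, the following Python sketch derives the active version as the lowest software version among the nodes, as defined above. The node names and version strings are invented for illustration, and the helper names are hypothetical; on a real cluster you would obtain the values from the crsctl query commands.

```python
def parse_version(version):
    """Turn a dotted version string into a tuple of integers for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def active_version(software_versions):
    """Return the lowest software version among the nodes."""
    return min(software_versions.values(), key=parse_version)


# Hypothetical per-node software versions during a rolling upgrade.
versions = {"node1": "11.1.0.7.0", "node2": "11.1.0.6.0", "node3": "11.1.0.7.0"}
print(active_version(versions))
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank 10.2.0.4.0 below 9.2.0.8.0.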

Example 1 Determining the Active Version
To determine the active version on the local node, issue the following command:

crsctl query crs activeversion
Example 2 Determining the Software Version
To determine the software version on the local node, issue the following command:

crsctl query crs softwareversion
Diagnostics Collection Script
Every time an Oracle Clusterware error occurs, you should run the diagcollection.pl script to collect diagnostic information from Oracle Clusterware in trace files. The diagnostics provide additional information so Oracle Support can resolve problems. Run this script from the following location:

CRS_home/bin/diagcollection.pl
Note:

You must run this script as the root user.
Oracle Clusterware Alerts
Oracle Clusterware posts alert messages when important events occur. The following is an example of an alert from the CRSD process:

2007-09-03 10:05:35.463
[cssd(3073)]CRS-1605:CSSD voting file is online: /dev/sdm2. Details in
/scratch/crs/log/stnsp012/cssd/ocssd.log.
2007-09-03 10:05:35.484
[cssd(3073)]CRS-1605:CSSD voting file is online: /dev/sdl3. Details in
/scratch/crs/log/stnsp012/cssd/ocssd.log.
[cssd(3073)]CRS-1601:CSSD Reconfiguration complete. Active nodes are
stnsp011 stnsp012 stnsp013 stnsp014 .
2007-09-03 10:05:36.949
[evmd(2218)]CRS-1401:EVMD started on node stnsp012.
2007-09-03 10:05:36.999
[crsd(2232)]CRS-1012:The OCR service started on node stnsp012.
2007-09-03 10:05:38.770
[crsd(2232)]CRS-1201:CRSD started on node stnsp012.
The location of this alert log on Linux, UNIX, and Windows systems is in the following directory path, where CRS_home is the name of the location of Oracle Clusterware: CRS_home/log/hostname/alerthostname.log.
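When scanning an alert log for particular daemons or message codes, it can help to split it into structured entries. The following Python sketch parses text in the layout of the sample alert above. The regular expressions are inferred from that sample only and are an assumption, not a documented format, so they may need adjusting for other releases.

```python
import re

# Patterns inferred from the sample alert above (an assumption, not a documented format).
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")
ENTRY_RE = re.compile(r"^\[(?P<daemon>\w+)\((?P<pid>\d+)\)\](?P<code>CRS-\d+):(?P<text>.*)$")


def parse_alert_log(text):
    """Return a list of dicts with timestamp, daemon, pid, code, and message text."""
    entries = []
    timestamp = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TIMESTAMP_RE.match(line):
            timestamp = line
            continue
        match = ENTRY_RE.match(line)
        if match:
            entries.append({
                "timestamp": timestamp,
                "daemon": match["daemon"],
                "pid": int(match["pid"]),
                "code": match["code"],
                "text": match["text"].strip(),
            })
        elif entries:
            # A line that is neither a timestamp nor a new entry continues the previous message.
            entries[-1]["text"] += " " + line
    return entries


SAMPLE = """2007-09-03 10:05:35.463
[cssd(3073)]CRS-1605:CSSD voting file is online: /dev/sdm2. Details in
/scratch/crs/log/stnsp012/cssd/ocssd.log.
2007-09-03 10:05:36.949
[evmd(2218)]CRS-1401:EVMD started on node stnsp012.
"""

entries = parse_alert_log(SAMPLE)
for entry in entries:
    print(entry["timestamp"], entry["daemon"], entry["code"], entry["text"])
```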

The following example shows an EVMD alert:

[NORMAL] CLSD-1401: EVMD started on node %s
[ERROR] CLSD-1402: EVMD aborted on node %s. Error [%s]. Details in %s.
Resource Debugging
You can use the crsctl command to enable resource debugging using the following syntax:

crsctl debug log res "ora.node1.vip:1"
This has the effect of setting the environment variable USR_ORA_DEBUG to 1 before running the start, stop, or check action scripts for the ora.node1.vip resource.

Note:

You must run this crsctl command as the root user.
Checking the Health of the Clusterware
Use the crsctl check command to determine the health of your clusterware as in the following example:

crsctl check crs
Run the following command to determine the health of individual daemons where daemon is crsd, cssd or evmd:

crsctl check daemon
Note:

You do not have to be the root user to perform health checks.
Clusterware Log Files and the Unified Log Directory Structure
Oracle Database uses a unified log directory structure to consolidate the Oracle Clusterware component log files. This consolidated structure simplifies diagnostic information collection and assists during data retrieval and problem analysis.

Oracle Clusterware retains one current log file and five older log files that are 50 MB in size (300 MB of storage) for the cssd process, and one current log file and 10 older log files that are 10 MB in size (110 MB of storage) for the crsd process. In addition, Oracle Clusterware overwrites the oldest retained log file for any log file group when the current log file gets stored. Alert files are stored in the directory structures shown in Table F-2.
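The storage figures quoted above follow from counting the current log file together with the retained older files. A minimal Python check of that arithmetic, with a hypothetical helper name, is:

```python
def retained_storage_mb(older_files, file_size_mb):
    """Storage for one current log file plus the retained older log files."""
    return (1 + older_files) * file_size_mb


cssd_mb = retained_storage_mb(older_files=5, file_size_mb=50)
crsd_mb = retained_storage_mb(older_files=10, file_size_mb=10)
print(cssd_mb, crsd_mb)  # 300 110
```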

Table F-2 Locations of Oracle Clusterware Component Log Files
Component / Log File Location (see Footnote 1)

Cluster Ready Services Daemon (crsd) Log Files
CRS_home/log/hostname/crsd

Oracle Cluster Registry (OCR) records
The OCR tools (OCRDUMP, OCRCHECK, OCRCONFIG) record log information in the following location (see Footnote 2):

CRS_home/log/hostname/client

The OCR server records log information in the following location (see Footnote 3):

CRS_home/log/hostname/crsd

Oracle Process Monitor Daemon (OPROCD)
The following path is specific to Linux (see Footnote 4):

/etc/oracle/hostname.oprocd.log

Cluster Synchronization Services (CSS)
CRS_home/log/hostname/cssd

Event Manager (EVM) information generated by evmd
CRS_home/log/hostname/evmd

Oracle RAC RACG
The Oracle RAC high availability trace files are located in the following two locations:

CRS_home/log/hostname/racg

and

$ORACLE_HOME/log/hostname/racg

Core files are in subdirectories of the log directory. Each RACG executable has a subdirectory assigned exclusively for that executable. The name of the RACG executable subdirectory is the same as the name of the executable.


Footnote 1 The directory structure is the same for Linux, UNIX, and Windows systems.

Footnote 2 To change the amount of logging, edit the path in the CRS_home/srvm/admin/ocrlog.ini file.

Footnote 3 To change the amount of logging, edit the path in the CRS_home/log/hostname/crsd/crsd.ini file.

Footnote 4 This path is dependent upon the installed Linux or UNIX platform.

Troubleshooting the Oracle Cluster Registry
The following topics in this section explain how to troubleshoot the OCR:

•Using the OCRDUMP Utility to View Oracle Cluster Registry Content

•Using the OCRCHECK Utility

•Oracle Cluster Registry Troubleshooting

Using the OCRDUMP Utility to View Oracle Cluster Registry Content
This section explains how to use the OCRDUMP utility to view OCR content for troubleshooting. The OCRDUMP utility enables you to view the OCR contents by writing OCR content to a file or stdout in a readable format.

You can use a number of options for OCRDUMP. For example, you can limit the output to a key and its descendants. You can also write the contents to an XML file that you can view using a browser. OCRDUMP writes the OCR keys as ASCII strings and values in a datatype format. OCRDUMP retrieves header information on a best-effort basis.

OCRDUMP also creates a log file in CRS_home/log/hostname/client. To change the amount of logging, edit the file CRS_home/srvm/admin/ocrlog.ini.

To change the logging level of a component, edit the comploglvl= entry. For example, to change the logging of the OCRAPI component to 3 and to change the logging of the OCRRAW component to 5, make the following entry in the ocrlog.ini file:

comploglvl="OCRAPI:3;OCRRAW:5"
Note:

Make sure that you have file creation privileges in the CRS_home directory before using the OCRDUMP utility.
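The comploglvl entry described above is a semicolon-separated list of component:level pairs inside double quotation marks. The following Python sketch formats such an entry from a mapping. The helper name format_comploglvl is hypothetical, and the sketch only produces the text of the line; it does not edit ocrlog.ini.

```python
def format_comploglvl(levels):
    """Format a comploglvl entry in the form shown above."""
    pairs = ";".join(f"{component}:{level}" for component, level in levels.items())
    return f'comploglvl="{pairs}"'


print(format_comploglvl({"OCRAPI": 3, "OCRRAW": 5}))
```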
OCRDUMP Utility Syntax and Options
This section describes the OCRDUMP utility command syntax and usage. Run the ocrdump command with the following syntax, where file_name is the name of a target file to which you want Oracle Database to write the OCR output and where keyname is the name of a key from which you want Oracle Database to write OCR subtree content:

ocrdump [file_name|-stdout] [-backupfile backup_file_name] [-keyname keyname] [-xml] [-noheader]
Table F-3 describes the OCRDUMP utility options and option descriptions.

Table F-3 OCRDUMP Options and Option Descriptions

Options Description
file_name
The name of a file to which you want OCRDUMP to write output.

By default, output from the OCRDUMP utility is written to the predefined output file named OCRDUMPFILE. The file_name option redirects OCRDUMP output to the file that you specify.

-stdout
Use this option to redirect the OCRDUMP output to the text terminal that initiated the program.

If you do not redirect the output, output from the OCRDUMP utility is written to the predefined output file named OCRDUMPFILE by default.

-keyname
The name of an OCR key whose subtree is to be dumped.

-xml
Writes the output in XML format.

-noheader
Does not print the time at which you ran the command and when the OCR configuration occurred.

-backupfile
Option to identify a backup file.

backup_file_name
The name of the backup file with the content you want to view. You can query the backups using the ocrconfig -showbackup command.


OCRDUMP Utility Examples
The following ocrdump utility examples extract various types of OCR information and write it to various targets:

ocrdump
Writes the OCR content to a file called OCRDUMPFILE in the current directory.

ocrdump MYFILE
Writes the OCR content to a file called MYFILE in the current directory.

ocrdump -stdout -keyname SYSTEM
Writes the OCR content from the subtree of the key SYSTEM to stdout.

ocrdump -stdout -xml
Writes the OCR content to stdout in XML format.

Sample OCRDUMP Utility Output
The following OCRDUMP examples show the KEYNAME, VALUE TYPE, VALUE, permission set (user, group, world), and access rights for two sample runs of the ocrdump command. The following shows the output for the SYSTEM.language key, which has a text value of AMERICAN_AMERICA.WE8ASCII37:

[SYSTEM.language]
ORATEXT : AMERICAN_AMERICA.WE8ASCII37
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
OTHER_PERMISSION : PROCR_READ, USER_NAME : user, GROUP_NAME : group
}

The following shows the output for the SYSTEM.version key, which has an integer value of 3:

[SYSTEM.version]
UB4 (10) : 3
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
OTHER_PERMISSION : PROCR_READ, USER_NAME : user, GROUP_NAME : group
}
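If you need to compare key values across dumps, the text output can be reduced to a simple mapping. The following Python sketch parses output in the layout of the two samples above and records the value type and value for each key, ignoring the SECURITY block. The patterns are inferred from these samples alone and are an assumption about the general format.

```python
import re

# Patterns inferred from the two samples above (an assumption about the general format).
KEY_RE = re.compile(r"^\[(?P<key>[^\]]+)\]$")
VALUE_RE = re.compile(r"^(?P<type>[A-Z0-9]+(?: \(\d+\))?) : (?P<value>.*)$")


def parse_ocrdump(text):
    """Map each OCR key name to a (value_type, value) tuple."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = KEY_RE.match(line)
        if match:
            current = match["key"]
            continue
        if current is None or line.startswith("SECURITY"):
            continue
        match = VALUE_RE.match(line)
        # Only the first typed line after a key is treated as its value.
        if match and current not in records:
            records[current] = (match["type"], match["value"])
    return records


SAMPLE = """[SYSTEM.language]
ORATEXT : AMERICAN_AMERICA.WE8ASCII37
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
OTHER_PERMISSION : PROCR_READ, USER_NAME : user, GROUP_NAME : group
}

[SYSTEM.version]
UB4 (10) : 3
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
OTHER_PERMISSION : PROCR_READ, USER_NAME : user, GROUP_NAME : group
}
"""

records = parse_ocrdump(SAMPLE)
for key, (value_type, value) in records.items():
    print(key, value_type, value)
```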
Using the OCRCHECK Utility
The OCRCHECK utility displays the version of the OCR's block format, total space available and used space, OCRID, and the OCR locations that you have configured. OCRCHECK performs a block-by-block checksum operation for all of the blocks in all of the OCRs that you have configured. It also returns an individual status for each file as well as a result for the overall OCR integrity check.

The following example shows a sample of the OCRCHECK utility output:

Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262144
Used space (kbytes) : 16256
Available space (kbytes) : 245888
ID : 1918913332
Device/File Name : /dev/raw/raw1
Device/File integrity check succeeded
Device/File Name : /dev/raw/raw2
Device/File integrity check succeeded

Cluster registry integrity check succeeded
OCRCHECK creates a log file in the directory CRS_home/log/hostname/client. To change the amount of logging, edit the file CRS_home/srvm/admin/ocrlog.ini.
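For scripted health checks, the OCRCHECK report can be reduced to a few values. The following Python sketch parses text in the layout of the sample output above, collects the per-device integrity results, and confirms that used space plus available space equals total space. The field names are taken from that sample, and the parsing rules are an assumption that may not hold for other releases.

```python
def parse_ocrcheck(text):
    """Extract space figures, device results, and the overall result from OCRCHECK output."""
    report = {"devices": [], "overall_ok": False}
    for raw in text.splitlines():
        line = raw.strip()
        name, sep, value = (part.strip() for part in line.partition(":"))
        if sep and value:
            if name == "Device/File Name":
                report["devices"].append({"name": value, "ok": False})
            elif name.endswith("(kbytes)"):
                report[name] = int(value)
            else:
                report[name] = value
        elif line == "Device/File integrity check succeeded" and report["devices"]:
            # The result line applies to the most recently listed device.
            report["devices"][-1]["ok"] = True
        elif line == "Cluster registry integrity check succeeded":
            report["overall_ok"] = True
    return report


SAMPLE = """Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262144
Used space (kbytes) : 16256
Available space (kbytes) : 245888
ID : 1918913332
Device/File Name : /dev/raw/raw1
Device/File integrity check succeeded
Device/File Name : /dev/raw/raw2
Device/File integrity check succeeded

Cluster registry integrity check succeeded
"""

report = parse_ocrcheck(SAMPLE)
space_consistent = (
    report["Used space (kbytes)"] + report["Available space (kbytes)"]
    == report["Total space (kbytes)"]
)
print(space_consistent, report["overall_ok"], [d["name"] for d in report["devices"]])
```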

Oracle Cluster Registry Troubleshooting
Table F-4 describes common OCR problems with corresponding resolution suggestions.

Table F-4 Common OCR Problems and Solutions

Problem Solution
Not currently using OCR mirroring and would like to enable it.
Run the ocrconfig command with the -replace option as described.

An OCR failed and you need to replace it. Error messages in Oracle Enterprise Manager or OCR log file.
Run the ocrconfig command with the -replace option as described.

An OCR has a misconfiguration.
Run the ocrconfig command with the -repair option as described.

You are experiencing a severe performance effect from OCR processing or you want to remove an OCR for other reasons.
Run the ocrconfig command with the -replace option as described.

An OCR has failed and, before you can fix it, the node needs to be rebooted with only one OCR.
Run the ocrconfig -repair command to remove the bad OCR file. Oracle Clusterware will not start if it cannot find all of the defined OCRs.


Enabling Additional Tracing for Oracle Clusterware High Availability
Oracle Support may ask you to enable tracing to capture additional information. Because the procedures described in this section may affect performance, only perform these activities with the assistance of Oracle Support. This section includes the following topics:

•Generating Additional Trace Information for a Running Resource

•Verifying Event Manager Daemon Communications

Generating Additional Trace Information for a Running Resource
To generate additional trace information for a running resource, Oracle recommends that you use CRSCTL commands. For example, issue the following command to turn on debugging for resources:

$ crsctl debug log res "resource_name:level"
For example, to set the value of the USR_ORA_DEBUG initialization parameter to 1 for the VIP resource, issue the following command:

$ crsctl debug log res ora.cwclu011.vip:1
Verifying Event Manager Daemon Communications
The event manager daemons (evmd) running on separate nodes communicate through specific ports. To determine whether the evmd for a node can send and receive messages, perform the test described in this section while running session 1 in the background.

On node 1, session 1, enter:

$ evmwatch -A -t "@timestamp @@"
On node 2, session 2, enter:

$ evmpost -u "hello" [-h nodename]
Session 1 should show output similar to the following:

21-Jul-2007 08:04:26 hello
Ensure that each node can both send and receive messages by executing this test in several permutations.
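One way to enumerate the permutations mentioned above is to list every ordered pair of distinct nodes, so that each node takes both the watching role and the posting role against every other node. The following Python sketch does this. The node names are placeholders and the helper name is hypothetical; the sketch only prints a checklist and does not run evmwatch or evmpost.

```python
from itertools import permutations


def evm_test_pairs(nodes):
    """Return every ordered (watching_node, posting_node) pair of distinct nodes."""
    return list(permutations(nodes, 2))


for watcher, poster in evm_test_pairs(["node1", "node2", "node3"]):
    print(f"run evmwatch on {watcher}, then run evmpost on {poster}")
```

For a cluster of n nodes this yields n * (n - 1) test runs, for example 6 runs for 3 nodes.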


From the ITPUB blog. Link: http://blog.itpub.net/9907339/viewspace-1046674/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
