Troubleshooting Database Hang Issues (Doc ID 1378583.1)
In this Document
Purpose |
Troubleshooting Steps |
Priority: Immediate Resolution |
Diagnostic Data Collection |
Guidance for General Scenarios |
Guidance for Specific Scenarios |
Intermittent Hangs |
Troubleshooting Other Issues |
References |
Applies to:
Oracle Database - Personal Edition - Version 10.1.0.2 and laterOracle Database - Enterprise Edition - Version 10.1.0.2 and later
Oracle Database - Standard Edition - Version 10.1.0.2 and later
Information in this document applies to any platform.
***Checked for relevance on 14-Nov-2013***
Purpose
The diagnosis of the cause of database hangs is a complex task and can involve sifting through large volumes of trace which then needs correlating to determine a cause. Due to this, it may not be feasible to produce a resolution to these issues through a document such as this. However, this document can advise on the collection of first stage diagnostic information and other information.
This document supercedes :
Troubleshooting Steps
Priority: Immediate Resolution
A database 'hang' situation is a rare occurrence and can have many diverse causes, but a true hang can be devastating to the operation of a system. The priority is usually to get the system working as quickly as possible.
Often this means killing of processes or restating of the system which has the unfortunate side effect of removing all the evidence necessary to diagnose the cause of the issue.
Note that a hang and a spin situation can often be indistinguishable in as much as both can display the same symptoms (locking up the system making it inoperable). In terms of resolution, hangs and spins tend to have different directions, but in terms of initial diagnostics, these are similar. To determine between the two types see:
Diagnostic Data Collection
If possible, collect the following during the hang situation prior to any emergency resolution action:
The basic information to diagnose a hang is :
-
Starting from 11g release 1, the dia0 background processes starts collecting hanganalyze information and stores this in memory in the "hang analysis cache". It does this every 3 seconds for local hanganalyze information and every 10 seconds for global (RAC) hanganalyze information. This information can provide a quick view of hang chains occurring at the time of a hang being experienced.
Document 1428210.1 Troubleshooting Database Contention With V$Wait_Chains
-
Hanganalyze and Systemstate dumps:
Collected automatically, manually or possibly via HANGFG -
Document 362094.1 HANGFG User Guide
-
Snapshots of General database performance:
Typically with AWR reports -
Document 1363422.1 AWR Reports - Information Center
-
Some background system information:
Typically collected via RDA -
Document 314422.1 - Remote Diagnostic Agent (RDA) 4 - Getting Started
Guidance for General Scenarios
In general, diagnosing the cause of a hang is establishing which process or processes are holding a resource that is blocking the others. This can be established in many ways but as an example, might be a process that is completely stuck waiting for some other activity. When this activity is complete or is terminated, this frees up the waiters. In a simple example, a blocker waits for a holder and freeing the holder resolves the hang, but it is possible for a chain of holders and waiters to exist making identification of the real holder a more complex process.
In terms of interpreting the diagnostics:
-
Hanganalyze
Hanganalyze is probably the easiest hang diagnostic to interpret since it is designed to be a summary as opposed to exhaustive. Hanganalyze trace provides a quick pointer to the waiting and potential holder processes with 'cycles' (i.e. identification of processes that are blocking each other).
Document 215858.1 - Interpreting HANGANALYZE trace files to diagnose hanging and performance problems -
Systemstate
Systemstate trace gives an exhaustive trace of all processes that can be explored to determine some more detail about the holder itself. Its interpretation is complex and is beyond the scope of this article, although some of the specific articles below do touch on this. Generally these should be sent to support for interpretation.
-
General Systemwide Reports
AWR reports provides system wide information that can be useful in determining the general area that processes are waiting on and can also give useful comparison information from the build up, and the aftermath of an issue.
Document 1359094.1 FAQ: How to read an AWR report
-
RDA
RDA provides background structure information that can be useful for elimination of potential solutions once found.
Guidance for Specific Scenarios
The following articles cover diagnostics that can be used to approach specific situations that may lead to hangs:
Document 1342917.1 Troubleshooting 'latch: cache buffers chains' Wait Contention
Document 278316.1 Potential reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! "
Document 406570.1 How to Avoid Contention Based Hangs
Intermittent Hangs
Performance problems that are intermittent happen without warning, last for a short duration, and are extremely difficult to diagnose. In order to assist with these issues, Support have developed the following tools to assist collect information in such circumstances:
Document 352363.1 LTOM - The On-Board Monitor User Guide
Document 301137.1 OS Watcher User Guide
Troubleshooting Other Issues
For guidance troubleshooting other performance issues see:
References
NOTE:1342917.1 - Troubleshooting 'latch: cache buffers chains' Wait ContentionNOTE:1359094.1 - How to Use AWR Reports to Diagnose Database Performance Issues
NOTE:1363422.1 - Automatic Workload Repository (AWR) Reports - Start Point
NOTE:1377446.1 - * Troubleshooting Performance Issues
NOTE:1428210.1 - Troubleshooting Database Contention With V$Wait_Chains
NOTE:215858.1 - Interpreting HANGANALYZE trace files to diagnose hanging and performance problems for 9i and 10g.
NOTE:301137.1 - OSWatcher Black Box (Includes: [Video])
NOTE:314422.1 - Remote Diagnostic Agent (RDA) - Getting Started
NOTE:352363.1 - LTOM - The On-Board Monitor User Guide
NOTE:362094.1 - HANGFG User Guide
NOTE:370363.1 - CASE STUDY: Using Real-Time Diagnostic Tools to Diagnose Intermittent Database Hangs
NOTE:406570.1 - How to Avoid Contention Based Hangs
NOTE:452358.1 - How to Collect Diagnostics for Database Hanging Issues
NOTE:61552.1 - Troubleshooting Oracle Database Hanging Issues for versions from 7 to 9--Exhaustive.
NOTE:68738.1 - No Response from the Server, Does it Hang or Spin?
NOTE:976714.1 - 'PMON failed to acquire latch, see PMON dump' in Alert Log - How To Diagnose
|
|
- Oracle Database Products > Oracle Database Suite > Oracle Database > Oracle Database - Personal Edition > RDBMS > Database Level Performance Issues (not SQL Tuning)
- Oracle Database Products > Oracle Database Suite > Oracle Database > Oracle Database - Enterprise Edition > RDBMS > Database Level Performance Issues (not SQL Tuning)
- Oracle Database Products > Oracle Database Suite > Oracle Database > Oracle Database - Standard Edition > Generic RDBMS > Database Level Performance Issues (not SQL Tuning)
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/17252115/viewspace-1133584/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- Troubleshooting Database Control Startup IssuesDatabase
- How to Collect Diagnostics for Database Hanging Issues (Doc ID 452358.1)Database
- Master Note - Troubleshooting DBCA Issues (文件 ID 1269459.1)AST
- Troubleshoot Grid Infrastructure Startup Issues (Doc ID 1050908.1)ASTStruct
- No Response from the Server, Does it Hang or Spin? (Doc ID 68738.1)Server
- Troubleshooting "Global Enqueue Services Deadlock detected" (Doc ID 1443482.1)ENQ
- TROUBLESHOOTING - ASM disk not found/visible/discovered issuesASM
- * Master Note: Database Performance Overview (Doc ID 402983.1)ASTDatabaseORMView
- How to Duplicate a Database in NOARCHIVELOG mode (Doc ID 275480.1)DatabaseHive
- How to Collect Diagnostics for Database Hanging Issues (文件 ID 452358.1)Database
- Oracle10G RMAN Database Duplication (Doc ID 259694.1)OracleDatabase
- How To Size the Database Smart Flash Cache (Doc ID 1317950.1)Database
- Troubleshooting Database Creation (121)Database
- Step by Step Troubleshooting Guide for TDE Error ORA-28374 (Doc ID 1541818.1)GUIIDEError
- Performing duplicate database with ASM/OMF/RMAN (Doc ID 340848.1)ORMDatabaseASM
- How To List All The Named Events Set For A Database (Doc ID 436036.1)Database
- Information On Installed Database Components and Schemas (Doc ID 472937.1)ORMDatabase
- Can Tapes Be Used For RMAN Convert Database (Doc ID 563816.1)Database
- Recover database after disk loss (Doc ID 230829.1)Database
- RMAN10g: backup copy of database (Doc ID 266980.1)Database
- Top 5 Database and/or Instance Performance Issues in RAC EnvironmentDatabaseORM
- Master Note of Linux OS Requirements for Database Server (Doc ID 851598.1)ASTLinuxUIREMDatabaseServer
- JavaVM Component Database PSU and RU Patches(Doc ID 1929745.1)JavaDatabase
- [轉帖]Release Schedule of Current Database Releases (Doc ID 742060.1)Database
- Pre 11.2 Database Issues in 11gR2 Grid Infrastructure Environment_948456.1DatabaseASTStruct
- Master Note for Handling Oracle Database Corruption Issues [ID 1088018.1]ASTOracleDatabase
- Troubleshooting ServeRAIDServerAI
- ORA-1578 Corrupt Block Found in Encrypted Database (Doc ID 1329437.1)BloCDatabase
- Database and/or Instance Performance Issues in RAC Environment_1373500.1DatabaseORM
- How to Collect Diagnostics for Database Hanging Issues_452358.1Database
- Recover a Database Having Added a Datafile Since Last Backup (Doc ID 29430.1)DatabaseAST
- 【健康檢查】How to Perform a Health Check on the Database (Doc ID 122669.1)ORMDatabase
- Hide RMAN Passwords when Connecting to a Target Database (Doc ID 183377.1)IDEDatabase
- Stuck recovery of database ORA-00600[3020] (Doc ID 283269.1)Database
- How to Troubleshoot Grid Infrastructure Startup IssuesASTStruct
- TOP Note: Solutions for Typical GI /RAC Database runInstaller Issues_1056713.1Database
- Checkpoint Tuning and Troubleshooting GuideGUIIDE
- 【MOS】Top 5 Grid Infrastructure Startup Issues (文件 ID 1368382.1)ASTStruct