How to Collect Diagnostics for Database Hanging Issues (Doc ID 452358.1)
In this Document
|
Goal |
|
Performance Service Request Diagnostic Collection (SRDC) documents |
|
Solution |
|
What is needed to diagnose "Database Hang" issues? |
|
A. Dumps and Traces |
|
Hanganalyze and Systemstate Dumps |
|
Collecting Hanganalyze and Systemstate Dumps |
|
Logging in to the system |
|
Collection commands for Hanganalyze and Systemstate: Non-RAC: |
|
Collection commands for Hanganalyze and Systemstate: RAC |
|
For 10g (and above) |
|
Explanation of Hanganalyze and Systemstate Levels |
|
Other Methods |
|
Capturing systemstates based upon an error |
|
V$wait_chains |
|
B. Provide AWR/Statspack snapshots of General database performance |
|
C. Gather an up to date RDA |
|
Proactive Methods to gather information on a Hanging System |
|
Oracle Enterprise Manager 12c Real-Time ADDM |
|
Retroactive Information Collection |
|
Community Discussions |
|
References |
APPLIES TO:
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database - Standard Edition - Version 9.0.1.0 and later
Information in this document applies to any platform.
GOAL
When a database appears to be hung, it is useful to collect information from the database in order to determine the root cause of the hang. The root cause of the hang can often be isolated and solved using the diagnostic information gathered. Alternatively, if this is not possible, we can use the information obtained in order to help eliminate future occurences
Performance Service Request Diagnostic Collection (SRDC) documents
Service Request Data Collection (SRDC) documents have been specially designed to provide the reader with the necessary instructions to provide a step by step guide to collecting information for a various common Database Performance Issues.
SOLUTION
WHAT IS NEEDED TO DIAGNOSE "DATABASE HANG" ISSUES?
Database hangs are characterised by a number of processes waiting for some other activities to complete. Typically there is one or more blockers that are stuck or perhaps working hard and not freeing resources quickly enough. In order to diagnose this the following diagnostics are needed:
A. Hanganalyze and Systemstate Dumps
B. AWR/Statspack snapshots of General database performance
C. Up to date RDA
Please refer to the relevant sections below for more details on how to collect these.
A. Dumps and Traces
Hanganalyze and Systemstate Dumps
Hanganalyze and Systemstate dumps provide information on the processes in the database at a specific point in time. Hanganalyze provides information on all processes involved in the hang chain, whereas systemstate provides information on all processes in the database. When looking at a potential hang situation, you need to determine whether a process is stuck or moving slowly. By collecting these dumps at 2 consecutive intervals this can be established. If a process is stuck, these traces also provide the information to start further diagnosis and possibly help to provide the solution.
- Hanganalyze is a summary and will confirm if the db is really hung or just slow and provides a consistent snapshot.
- Systemstate dump shows what each process on the database is doing
Collecting Hanganalyze and Systemstate Dumps
Note: Hanganalyze is global dump. So for gather it globally while in root container.
Logging in to the system
Using SQL*Plus connect as SYSDBA using the following command:
If there are problems making this connection then in 10gR2 and above, the sqlplus "preliminary connection" can be used :
Statement processed.
the tracefile will contain the following output:
ERROR: Can not perform hang analysis dump without a process state object and a session state object.
( process=(nil), sess=(nil) )
As a workaround connect to an existing process id, for example PMON/SMON, before performing the dump :
oradebug setospid < use an existing process id, for example PMON/SMON>
oradebug hanganalyze 3
For more about connecting with a preliminary connection, see:
Collection commands for Hanganalyze and Systemstate: Non-RAC:
Sometimes, database may actually just be very slow and not actually hanging. It is therefore recommended, where possible to get 2 hanganalyze and 2 systemstate dumps in order to determine whether processes are moving at all or whether they are "frozen".
Hanganalyze
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait one minute before getting the second hanganalyze
oradebug hanganalyze 3
oradebug tracefile_name
exit
Systemstate
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 258
oradebug dump systemstate 258
oradebug tracefile_name
exit
Collection commands for Hanganalyze and Systemstate: RAC
There are 2 bugs affecting RAC that without the relevant patches being applied on your system, make using level 266 or 267 very costly. Therefore without these fixes in place it highly inadvisable to use these level
For information on these patches see:
Document 11827088.8 Bug 11827088 - Latch 'gc element' contention, LMHB terminates the instance
Note: both of these bugs are resolved in Oracle 11g Release 2 version 11.2.0.3.
For 10g (and above)
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit
In RAC environment, a dump will be created for all RAC instances in the DIAG trace file for each instance.
Explanation of Hanganalyze and Systemstate Levels
Hanganalyze levels:
- Level 3: In 11g onwards, level 3 also collects a short stack for relevant processes in hang chain
Systemstate levels:
- Level 258 is a fast alternative but we'd lose some lock element data
- Level 267 can be used if additional buffer cache / lock element data is needed with an understanding of the cost
Other Methods
If connection to the system is not possible in any form, then please refer to the following article which describes how to collect systemstates in that situation:
On RAC Systems, hanganalyze, systemstates and some other RAC information can be collected using the 'racdiag.sql' script, see:
Capturing systemstates based upon an error
Sometimes you may wish to collect systemstate information at the time that a particular error occurs. This can be done by setting an event in the session or system wide to trigger based upon the detection of an error. For example, if a hang was being encountered that was related to an ORA-00054 error, then you could capture a systemstate when the ORA-00054 occurs using the following command:
ALTER SYSTEM SET events '54 trace name systemstate level 258';
The next time an ORA-00054 is encountered, a systemstate will be dumped.
The tracing can be disabled with :
You can also set such events in the spfile. See:
V$wait_chains
Starting from 11g release 1, the dia0 background processes starts collecting hanganalyze information and stores this in memory in the "hang analysis cache". It does this every 3 seconds for local hanganalyze information and every 10 seconds for global (RAC) hanganalyze information. This information can provide a quick view of hang chains occurring at the time of a hang being experienced.
For more information see:
B. Provide AWR/Statspack snapshots of General database performance
Hangs are a visible effect of a number of potential causes, this can range from a single process issue to something brought on by a global problem.
Collecting information about the general performance of the database in the build up to, during and after the problem is of primary importance since these snapshots can help to determine the nature of the load on the database at these times and can provide vital diagnostic information. This may prove invaluable in identifying the area of the problem and ultimately resolving the issue.
To do this, please take and upload snapshot reports of database performance (AWR (or statspack) reports) immediately before, during and after the hang..
Please refer to the following article for details of what to collect:
C. Gather an up to date RDA
An up to date current RDA provides a lot of additional information about the configuration of the database and performance metrics and can be examined to spot background issues that may impact performance.
See the following note on My Oracle Support:
PROACTIVE METHODS TO GATHER INFORMATION ON A HANGING SYSTEM
On some systems a hang can occur when the DBA is not available to run diagnostics or at times it may be too late to collect the relevant diagnostics. In these cases, the following methods may be used to gather diagnostics:
- As an alternative to the manual collection method notes above, it is also possible to use the HANGFG script as described in the following note to collect the information:
Document 362094.1 HANGFG User GuideAdditionally, this script can collect information with lower impact on the target database.
- Procwatcher
Procwatcher is a tool that examines and monitors Oracle database and/or clusterware processes at a specific interval
The following notes explain how to use Procwatcher:
Document 459694.1 Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes
Document 1352623.1 How To Troubleshoot Database Contention With Procwatcher - OSWatcher contains a built in analyzer that allows the data that has been collected to be automatically analyzed, pro-actively looking for cpu, memory, io and network issues. It is recommended that all users install and run OSW since it is invaluable for looking at issues on the OS and has very little overhead. It can also be extremely useful for looking at OS performance degradation that may be seen when a hang situation occurs.
Refer to the following for download, user guide and usage videos on OSWatcher:Document 301137.1 OSWatcher User Guide (Includes: [Video])
ORACLE ENTERPRISE MANAGER 12C REAL-TIME ADDM
Real-Time ADDM is a feature of Oracle Enterprise Manager Cloud Control 12c that allows you to analyze database performance automatically when you cannot logon to the database because it is hung or performing very slowly due to a performance issue. It analyzes current performance when database is hanging or running slow and reports sources of severe contention.
For more information see the following video:
RETROACTIVE INFORMATION COLLECTION
Sometimes we may only notice a hang after it has occurred. In this case the following information may help with Root Cause Analysis:
- A series of AWR/Statspack reports leading up to and during the hang
- ASH reports - one can obtain more granular reports during the time of the hang - even up to
one minute in time. - Raw ASH information. This can be obtained by issuing an ashdump trac. See:
Document 243132.1 10g and above Active Session History (Ash) And Analysis Of Ash Online And Offline
Document 555303.1 ashdump* scripts and post-load processing of MMNL traces - Alert log and any traces created at time of hang
On a RAC specifically check the following traces files as well: dia0, lmhb, diag and lmd0 traces - RDA as above
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29785807/viewspace-2795191/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 【CURSOR】How to Monitor and tune Open and Cached Cursors (Doc ID 1430255.1)
- [轉帖]Release Schedule of Current Database Releases (Doc ID 742060.1)Database
- How To Replicate Data from Oracle to Postgres Using GoldenGate (Doc ID 1544137.1OracleGo
- Oracle 最新PSU (Doc ID 2118136.2)Oracle
- expdp/impdp變慢 (Doc ID 2469587.1)
- Reading and Understanding Systemstate Dumps (Doc ID 423153.1)
- How To Turn SNMP On/Off ? [ID 472530.1]
- RMAN -- Frequently Asked Question (FAQ) (Doc ID 469777.1)
- Oracle 下載最新psu (Doc ID 2118136.2)Oracle
- cadvisor報錯failed to collect filesystem stats- rootDiskErr: du command failed on /rootfs/var/lib/docAI
- PRCT-1011 : Failed to run "oifcfg" (Doc ID 1380183.1)AI
- What is OPAQUE_TRANSFORM Hint and how to Control it [ID 780503.1]OpaqueORM
- Migration Of An Oracle Database Across OS Platforms [ID 733205.1]OracleDatabaseROSPlatform
- rfs (PID:146054): Database mount ID mismatch案例Database
- [20221222]How to Enable and Disable Database Options in oracle 11g.txtDatabaseOracle
- Physical Standby Switchover_status Showing Not Allowed. (Doc ID 1392763.1)
- oracle11g: Scheduler Maintenance Tasks or Autotasks (Doc ID 756734.1)OracleAINaN
- 【DBA】Relinking Oracle Home 常見問題 (Doc ID 2048232.1)Oracle
- How to Increase CSS Misscount in single instance ASM installations [ID 729878.1]CSSASM
- 11g-Reduce Transportable Tablespace Downtime using XTTS (Doc ID 1389592.1)TTS
- 7.53 COLLECT
- How to redirect to a specific web page after sign out from Entra IDWeb
- Can GoldenGate Replicate An Oracle Table That Contains Only CLOB Column(s)? (Doc ID 971833.1)GoOracleAI
- 12c DG新特性 - Active Data Guard Far Sync (Doc ID 2179719.1)
- 【GRID】Grid Infrastructure 啟動的五大問題 (Doc ID 1526147.1)ASTStruct
- 【DATAGUARD】Data Guard 12C 新特性:Far Sync Standby (Doc ID 2179719.1)
- ORA-04063: package body SYS.DBMS_REGISTRY_SYS has errors (Doc ID 2165402.1)PackageError
- 【ASM】ORA-27504 ORA-27300 ORA-27303 while starting ASM (Doc ID 2281441.1)ASMWhile
- 【RAC】Oracle 12c以及以上版本的diagsnap是什麼? (Doc ID 2469643.1)Oracle
- Copan Diagnostics領導層換人
- 用 Laravel Collect 收藏文章Laravel
- Go 語言中的 collect 使用Go
- sqlserver docSQLServer
- 基於GitHub Issues的評論系統--gitmentGithub
- How to ssh
- 自動部署基於issues的靜態部落格
- .NET 6 全新指標 System.Diagnostics.Metrics 介紹指標
- The database owner SID recorded in the master database differs from the database owner SID recorded in database 'DB_NAME'DatabaseAST