'PMON failed to acquire latch, see PMON dump' in Alert Log - 976714.1

Posted by rongshiyuan on 2013-01-08
'PMON failed to acquire latch, see PMON dump' in Alert Log - How To Diagnose [ID 976714.1]



This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.

Applies to:

Oracle Server - Enterprise Edition - Version 10.2.0.1 and later
Oracle Server - Personal Edition - Version 10.2.0.1 and later
Oracle Server - Standard Edition - Version 10.2.0.1 and later
Information in this document applies to any platform.
***Checked for relevance on 30-July-2012***
***Checked for relevance on 03-August-2011***


Goal

Explain the steps necessary and the diagnostics to collect for a hang situation with messages such as:

PMON failed to acquire latch, see PMON dump

in the alert log.

Fix

Issue Description

This message is raised because the PMON process has been unable to acquire a resource within a fixed period of time. This can be a serious issue and hang the database, since PMON will be unable to perform its normal maintenance tasks; if the blocker is not freed, database activity will halt.

Diagnostics

In these cases it is necessary to collect the following diagnostics:

  • Current RDA information. This provides the Alert Log, which is useful for identifying the time of the messages and for determining whether any other errors at that time might explain why PMON is unable to continue. Additionally, an up-to-date RDA provides a lot of information about the configuration of the database and performance metrics that may give useful background to the problem.
  • Systemstate and hanganalyze traces from the time of the problem (a collection sketch follows the reference below)
  • AWR reports from immediately before, during and after the problem.
  • PMON trace file from the time of the problem

Apart from the PMON Trace, the diagnostics are very similar to the data required for a 'standard' hang situation. To collect this information see:

Document 452358.1 Database Hangs: What to collect for support.
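
As a minimal sketch of the collection itself (the full procedure and recommended levels are described in Document 452358.1), the hanganalyze and systemstate dumps are typically taken from a SYSDBA session with oradebug; from 10g onward a preliminary connection ("sqlplus -prelim / as sysdba") can be attempted if ordinary logons themselves hang. The levels shown here (hanganalyze level 3, systemstate level 266) are commonly used values, not the only possibilities:

    $ sqlplus / as sysdba
    SQL> oradebug setmypid
    SQL> oradebug unlimit
    SQL> oradebug hanganalyze 3
    SQL> -- wait a minute or two, then take a second dump to see whether anything is moving
    SQL> oradebug hanganalyze 3
    SQL> oradebug dump systemstate 266
    SQL> -- wait again and repeat
    SQL> oradebug dump systemstate 266
    SQL> oradebug tracefile_name

The last command reports where the dump was written; the PMON trace itself is written to the background dump destination (named like <SID>_pmon_<ospid>.trc in 10.2).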

Interpretation

  • PMON Trace
    This file is the key starting point for the analysis of the problem. It shows what resource PMON is unable to get and what is likely holding it. When used in conjunction with systemstates taken at the time of the problem, the cause of the issue can be determined and the reason the holder has kept the resource for so long can be established.
    Example trace:

    *** SESSION ID:(115.1) 2009-12-09 06:37:09.015
    PMON unable to acquire latch 38ef52390 Child library cache level=5 child#=29 
            Location from where latch is held: kgldtld: 2child: 
            Context saved from call: 3305698714
            state=busy, wlstate=free
        waiters [orapid (seconds since: put on list, posted, alive check)]:
         49 (174, 1260337028, 174)
         29 (174, 1260337028, 174)
         27 (174, 1260337028, 174)
         70 (171, 1260337028, 171)
         9 (134, 1260337028, 134)
         waiter count=5
        gotten 1905490 times wait, failed first 80 sleeps 0
        gotten 18325 times nowait, failed: 4
      possible holder pid = 26 spid=202

    In this example, the trace tells us that the PMON process is unable to acquire the "Child library cache" latch:

    PMON unable to acquire latch 38ef52390 Child library cache level=5 child#=29 

    The possible holder process is shown as:
      possible holder pid = 26 spid=202

    With an accompanying systemstate, this holding process can be found and its activity checked to see whether it is appropriate.
    The second section of the PMON trace dumps part of the process state for the HOLDING process with a short stack.

    ----------------------------------------
    SO: 443803fb8, type: 2, owner: 0, flag: INIT/-/-/0x00
      (process) Oracle pid=26, calls cur/top: 4439f7178/443845450, flag: (0) -
                int error: 0, call error: 0, sess error: 0, txn error 0
      (post info) last post received: 549 0 4
                  last post received-location: kslpsr
                  last process to post me: 403c01830 1 6
                  last post sent: 0 0 24
                  last post sent-location: ksasnd
                  last process posted by me: 403c01830 1 6
      (latch info) wait_event=0 bits=20
            Location from where call was made: kgldtld: 2child: 
            Context saved from call: 3305698714
        waiting for 38ef524d0 Child library cache level=5 child#=27 
            Location from where latch is held: kglpin: 
            Context saved from call: 0
            state=busy, wlstate=free
            waiters [orapid (seconds since: put on list, posted, alive check)]:
             58 (174, 1260337028, 174)
             26 (174, 1260337028, 174)
             46 (174, 1260337028, 174)
             waiter count=3
            gotten 1098798 times wait, failed first 71 sleeps 13
            gotten 19775 times nowait, failed: 918
            possible holder pid = 37 spid=16089
        on wait list for 38ef524d0
        holding (efd=16) 38ef52390 Child library cache level=5 child#=29 
            Location from where latch is held: kgldtld: 2child: 
            Context saved from call: 3305698714
            state=busy, wlstate=free
            waiters [orapid (seconds since: put on list, posted, alive check)]:
             49 (174, 1260337028, 174)
             29 (174, 1260337028, 174)
             27 (174, 1260337028, 174)
             70 (171, 1260337028, 171)
             9 (134, 1260337028, 134)
             waiter count=5
        Process Group: DEFAULT, pseudo proc: 390f99190
        O/S info: user: oracle, term: UNKNOWN, ospid: 2024
        OSD pid info: Unix process pid: 2024, image: oracle@ipaddress
        Short stack dump: 
    ksdxfstk()+36

    You can see that this is dumping information from Process 26 (the holder) here :

      (process) Oracle pid=26, calls cur/top: 4439f7178/443845450, flag: (0) -

    From lower down in this Process information, the holder is holding the 'Child library cache' latch:
     holding (efd=16) 38ef52390 Child library cache level=5 child#=29

    (this is the latch that is blocking PMON), confirming that this is the holder process we are looking for (although it is possible that, due to other activity, the holder has since changed).

    The next thing to determine is what the holder is actually doing - it will either be waiting for something (in which case it is not the base level holder) or on the CPU (in which case it is the blocker that is blocking the other process(es)).
    This holder session (pid 26) is waiting for:
     waiting for 38ef524d0 Child library cache level=5 child#=27
    
    which is a different library cache child latch (child#=27, address 38ef524d0).
    The possible holder of this child# 27 is:
     possible holder pid = 37 spid=16089

    This means that the holder that is blocking PMON is itself being blocked by another session (pid 37).
    The PMON trace does not recursively dump all holders down to the base holder, so systemstates and hanganalyze dumps are required to determine the final blocker in this case; the PMON trace alone does not contain enough information to drill down to the root cause.

    Note: If the holder blocking PMON (pid 26) had been active on CPU, then the PMON trace alone might have been sufficient to diagnose the issue, since that process would have been the base-level blocker. However, multiple systemstates are required to determine whether there is any process movement, so it is always prudent to collect these in order to have all the required information.
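
    If the instance is still responsive enough to run ordinary SQL, what the processes named in the trace are currently doing can also be cross-checked live from V$PROCESS and V$SESSION. A minimal sketch, using the Oracle pid values 26 and 37 from the trace above purely as illustrative input:

      SELECT p.pid, p.spid, s.sid, s.serial#, s.username, s.status,
             s.event, s.seconds_in_wait, s.sql_id
      FROM   v$process p, v$session s
      WHERE  s.paddr = p.addr
      AND    p.pid IN (26, 37);    -- pids taken from the PMON trace

    On a badly hung instance such a query may itself hang or the connection may not be possible, which is why the systemstate dumps remain the primary evidence.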

    Additionally, you may be able to determine some more information just from the PMON trace. The fact that PMON has detected a long wait and flagged it with a trace and an alert log entry is a good indication that something abnormal is occurring and that there is a problem of some sort with the blocker. In this case PMON is waiting for a 'Child library cache' latch, so it follows that it is trying to parse something, or to do something related to the storing of a cursor (or similar), in the library cache area of the shared pool. The holder is holding this resource, so was there anything going on at the time of the problem that might have caused a parse or similar activity to take a long time? This sort of analysis may assist with finding a potential cause.
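
    Similarly, and again only if the instance will still run SQL, the current holders of library cache child latches can be sampled from V$LATCHHOLDER as a live cross-check against the latch addresses in the PMON trace. A minimal sketch (latch holds are normally very brief, so repeated sampling is needed and an empty result proves little):

      SELECT pid, sid, laddr, name
      FROM   v$latchholder
      WHERE  name = 'library cache';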

Known Bugs:

Document 468740.1 "Pmon Failed To Acquire Latch" Messages in Alert Log - Database Hung
Document 4632780.8 Bug 4632780 - PMON "failed to acquire latch" during shutdown
Document 8502963.8 Bug 8502963 - PMON fails to acquire latch and publish connection load balancing




References

BUG:4632780 - PMON FAILED TO ACQUIRE LATCH DURING SHUTDOWN IMMEDIATE
BUG:8502963 - PMON FAILS TO ACQUIRE LATCH AND PUBLISH CONNECTION LOAD BALANCING
NOTE:278316.1 - Troubleshooting: "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"
NOTE:452358.1 - How to Collect Diagnostics for Database Hanging Issues
NOTE:4632780.8 - Bug 4632780 - PMON "failed to acquire latch" during shutdown
NOTE:468740.1 - "Pmon Failed To Acquire Latch" Messages in Alert Log - Database Hung
NOTE:8502963.8 - Bug 8502963 - PMON fails to acquire latch and publish connection load balancing

Source: ITPUB blog, http://blog.itpub.net/17252115/viewspace-752334/. Please cite the source when reprinting; otherwise legal liability may be pursued.
