ORA-32701錯誤原因分析及處理方法
1 在處理ORA-32701前,需要先了解以下資料庫的hang管理器,參考Oracle的官方文件(Oracle Hang Manager (文件 ID 1534591.1))。
內容如下,可以認為如果資料庫發生hang,一般情況資料庫會自動處理的,但有時由於情況複雜,不能處理:
DETAILS Overview
Hang Manager has existed since Oracle Database 10.2.0.1. The main goal of Hang Manager is to reliably detect and,
if hang resolution is enabled, resolve hangs in a timely manner. Over various releases, Hang Manager has been
enhanced along with the wait event infrastructure on which it relies. However, it is only in Oracle Database 11.2.0.2
that Hang Manager actually resolves any hangs by terminating sessions and/or processes. This remains the default
operation in Oracle Database 12.1.0.1.
Hang Manager is only active when Oracle RAC is enabled. You can determine whether an Oracle RAC database is
used / enabled by querying the CLUSTER_DATABASE. If this parameter is set to "TRUE", Oracle RAC is enabled on
this database.
Terminology
Cross Boundary Hang
This is a hang which spans the ASM/DB boundary. In 12.1.0.1, it is possible for Hang Manager to detect hangs
which span the boundary between a database and ASM or ASM and a database.
Deadlock or Closed Chain
A deadlock is a number of sessions making no progress which are in a closed chain or cycle.
All sessions have a waiter and a blocker. The only way to break the chain is to cause one of the sessions to
either progress or terminate.
Hang or Open Chain
From Hang Manager's perspective, a hang is a number of sessions making no progress which are in an open chain
configuration. This is not a cycle or deadlock. An open chain or hang has a root or final blocker which is blocking
all other sessions in the chain. It also contains a final waiter which is at the end of the chain and is blocked by all
other sessions in the chain.
Immediate Waiter
The session in an open chain that is being blocked by the root of the hang.
Quality of Service (QoS) Management
Oracle Database QoS Management is an automated, policy-based product that monitors the workload requests
for an entire system. Oracle Database QoS Management manages the resources that are shared across applications,
and adjusts the system configuration to keep the applications running at the performance levels needed by your
business.
Root (also known as Victim or Final Blocker)
This is the session in an open chain or hang that is blocking all other sessions in the chain.
Self-Resolved Hang
This is a hang or deadlock that was detected by Hang Manager but no longer exists. This could either be because
the root or immediate waiter have progressed or either or both of them have been terminated but not by Hang
Manager.
Hang Manager in 12.1.0.1
Since Oracle Database 11.2.0.2, Hang Manager may resolve hangs it detects by terminating the root session or
process. By default, Hang Manager will not terminate an instance or evict a node. Also, Hang Manager will not
resolve all hangs that it detects. For example, hangs which involve a possible application issue are left to the user
to determine the correct course of action. Also, if Hang Manager determines that the instance on which the root
of a hang resides is experiencing high CPU or IO load, hang resolution will be delayed. This gives some time for
the root to progress and the hang to resolve itself. Finally, Hang Manager currently does not resolve any hangs
on ASM.
Starting with Oracle Database 12.1.0.1 and if QoS is active on the cluster, Hang Manager will use additional
information provided by QoS to determine if a hang should be ignored or resolved. If QoS would like Hang
Manager to resolve a hang, hang resolution may occur more quickly than Hang Manager's normal detection
and resolution times or hang resolution may be delayed.
Previous to 12.1.0.1, hangs were detected only within the database or ASM cluster. However, if a session
on ASM or the database was being blocked by a session on the other side of the ASM/Database boundary
(a Cross Boundary Hang), there was no way to detect it. In Oracle Database 12.1.0.1, Hang Manager was
enhanced to detect hangs which cross the ASM/Database boundary and outputs information to its various
trace files so it is easier to troubleshoot these hangs.
同hang管理器相關的檢視:
V$ Views
Hang Manager exposes some information via various V$ views. These are now documented in the Oracle
Database Reference. A summary of the information in these views is below. For a more detailed description of these views, please go to the appropriate section in the Oracle Database Reference.
V$HANG_INFO
This view contains the active hangs and deadlocks detected by Hang Manager. It does not include any hangs
which have been resolved by Hang Manager or have self-resolved. Only the latest 32 active hangs are in this view.
The root or victim of the hang is included in the information for each active hang in the view.
V$HANG_SESSION_INFO
This view contains the sessions in the main chain for all of the hangs included in the V$HANG_INFO view.
Only the first 20 sessions of the main chain, including the root of the hang, for each hang are kept in this view.
(G)V$HANG_STATISTICS
This view contains various Hang Manager statistics regarding hangs or deadlocks which it has detected. These statistics include: the number of hangs detected, the number of deadlocks detected, the number of hangs resolved by Hang Manager, the number of hangs ignored broken down by the reason for ignoring the hang, the number of hangs which self-resolved, etc.
2 資料庫的告警,檢視發生hang的相關資訊
節點1:
Sun Sep 27 00:31:23 2020
Errors in file /u01/app/oracle/diag/rdbms/test/test1/trace/test1_dia0_25054.trc (incident=920985):
ORA-32701: Possible hangs up to hang ID=1 detected --hang ID=1 說明資料庫自動診斷出Hang
Incident details in: /u01/app/oracle/diag/rdbms/test/test1/incident/incdir_920985/test1_dia0_25054_i920985.trc
DIA0 requesting termination of session sid:1244 with serial # 75 (ospid:88984) on instance 3
--請求終止例項3的會話,作業系統PID:88984
due to a LOCAL, HIGH confidence hang with ID=1.
Hang Resolution Reason: Although the number of affected sessions did not
justify automatic hang resolution initially, this previously ignored
hang was automatically resolved.
DIA0: Examine the alert log on instance 3 for session termination status of hang with ID=1.
--發現例項3的會話阻塞住會話,檢查例項3的alert日誌
節點3:
Sun Sep 27 00:31:23 2020
Errors in file /u01/app/oracle/diag/rdbms/test/test3/trace/test3_dia0_22856.trc (incident=880256):
ORA-32701: Possible hangs up to hang ID=1 detected --hang ID=1 說明資料庫自動診斷出Hang
Incident details in: /u01/app/oracle/diag/rdbms/test/test3/incident/incdir_880256/test3_dia0_22856_i880256.trc
DIA0 terminating blocker (ospid: 88984 sid: 1244 ser#: 75) of hang with ID = 1
--本節點Hang住了,故終止了此會話
requested by master DIA0 process on instance 1 --請求的程式來自於例項1
Hang Resolution Reason: Although the number of affected sessions did not
justify automatic hang resolution initially, this previously ignored
hang was automatically resolved.
by terminating session sid:1244 with serial # 75 (ospid:88984) --終止了作業系統程式
3 首先根據hang ID=1 和 hang ID=2 檢視如下檢視,此檢視會解釋每個程式碼的含義:
SYS@test3>select * from V$HANG_STATISTICS;
STATISTIC# NAME VALUE
---------- --------------------------------------------- ----------
0 number of deadlocks detected and ignored 0
1 number of hangs detected 0
2 number of local hangs 0
3 number of global hangs 0
4 number of transient hangs 0
5 hangs ignored due to high CPU on root's node 0
6 hangs ignored due to high IO on root's node 0
7 hangs ignored due to application contention 0
8 hangs ignored due to long running operations 0
9 hangs monitored due to archiving issues 0
10 hangs ignored due to archiving issues 0
11 hangs ignored, blocked by remote database 0
12 hangs ignored due to SQL parsing 0
13 hangs ignored due to dumping system state 0
14 hangs ignored, instance termination required 0
15 hangs ignored, only one active instance 0
16 number of explicitly resolved hangs 0
17 number of self-resolved hangs 0
18 total self-resolved hang time in seconds 0
19 minimum self-resolved hang time in seconds 0
20 maximum self-resolved hang time in seconds 0
21 number of HSC matched hangs 0
22 hangs resolved due to instance termination 0
23 rows selected.
4 分析資料庫生成的trace檔案,找出報錯的根本原因
節點1 檢視test1_dia0_25054_i920985.trc
Dump file /u01/app/oracle/diag/rdbms/test/test1/incident/incdir_920985/test1_dia0_25054_i920985.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/db_1
System name: Linux
Node name: h0783
Release: 2.6.32-642.el6.x86_64
Version: #1 SMP Wed Apr 13 00:51:26 EDT 2016
Machine: x86_64
Instance name: test1
Redo thread mounted by this instance: 1
Oracle process number: 10
Unix process pid: 25054, image: oracle@h0783 (DIA0)
*** 2020-09-27 00:31:23.807
*** SESSION ID:(201.1) 2020-09-27 00:31:23.807
*** CLIENT ID:() 2020-09-27 00:31:23.807
*** SERVICE NAME:(SYS$BACKGROUND) 2020-09-27 00:31:23.807
*** MODULE NAME:() 2020-09-27 00:31:23.807
*** ACTION NAME:() 2020-09-27 00:31:23.807
Dump continued from file: /u01/app/oracle/diag/rdbms/test/test1/trace/test1_dia0_25054.trc
ORA-32701: Possible hangs up to hang ID=1 detected
========= Dump for incident 920985 (ORA 32701) ========
----- Beginning of Customized Incident Dump(s) -----
One or more resolvable hangs have been detected on your system.
An attempt will be made to resolve these hangs.
The stack included in this incident file is not the cause of the
problem. It is merely pointing to the module that triggered the
incident. The short stacks of the root or final blocked of the
hang and its waiter can be found in the DIA0 trace files on their
respective local instances.
'Resolvable Hangs' below indicate one or more hangs that were found
and identify the final blocking session and instance on which
they occurred. Since the current hang resolution state is 'PROCESS',
any hangs requiring session or process termination will be
automatically resolved.
The following information will assist Oracle Support Services
in further analysis of the root cause of the hang.
*** 2020-09-27 00:31:23.809
Resolvable Hangs in the System
Root Chain Total Hang
Hang Hang Inst Root #hung #hung Hang Hang Resolution
ID Type Status Num Sess Sess Sess Conf Span Action
----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
1 HANG RSLNPEND 3 1244 2 2 HIGH LOCAL Terminate Process
--此處說明Hang的會話在節點3上,會話的ID為1244
Hang Resolution Reason: Although the number of affected sessions did not
justify automatic hang resolution initially, this previously ignored
hang was automatically resolved.
inst# SessId Ser# OSPID PrcNm Event
----- ------ ----- --------- ----- -----
3 2562 353 28939 M001 enq: WF - contention
3 1244 75 88984 M000 not in wait --此處說明會話1244 等待是事件為 not in wait
The incident file on instance 3 may contain information
about session 1244 with serial number 75 and operating system
process ID 88984. The incident file may contain a short stack
and a process state dump.
Victim Information
Ignored
HangID Inst# Sessid Ser Num OSPID Fatal BG Previous Hang Count
------ ----- ------ ------- --------- -------- ------------- -------
1 3 1244 75 88984 N Existing Hang 1
*** 2020-09-27 00:31:23.809
Wait-For-Graphs collected at 09/27/20 00:25:31) --以下資訊說明蒐集的圖形化阻塞的相關資訊
===============================================================================
Non-intersecting chains:
-------------------------------------------------------------------------------
Chain 1: --Chain 1 說明會話2562等待的事件為:'enq: WF - contention' ,被1244會話阻塞住,
1244會話的等待事件為 not int wait,但等待:event: 'gc current grant 2-way'
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 3 (test.test3)
os id: 28939
process id: 512, oracle@h0785 (M001)
session id: 2562
session serial #: 353
}
is waiting for 'enq: WF - contention' with wait info:
{
p1: 'name|mode'=0x57460006
p2: '0'=0x38
p3: '0'=0x0
time in wait: 1 min 53 sec
timeout after: never
wait id: 73008
blocking: 0 sessions
wait history:
* time between current wait and wait #1: 0.003403 sec
1. event: 'db file sequential read'
time waited: 0.000226 sec
wait id: 73007 p1: 'file#'=0x2
p2: 'block#'=0x1eeff
p3: 'blocks'=0x1
* time between wait #1 and #2: 0.000390 sec
2. event: 'db file sequential read'
time waited: 0.000207 sec
wait id: 73006 p1: 'file#'=0x2
p2: 'block#'=0xe178
p3: 'blocks'=0x1
* time between wait #2 and #3: 0.002593 sec
3. event: 'db file scattered read'
time waited: 0.000276 sec
wait id: 73005 p1: 'file#'=0x2
p2: 'block#'=0x1eefb
p3: 'blocks'=0x2
}
and is blocked by
=> Oracle session identified by:
{
instance: 3 (test.test3)
os id: 88984
process id: 446, oracle@h0785 (M000)
session id: 1244
session serial #: 75
}
which is not in a wait:
{
last wait: 1165 min 7 sec ago
blocking: 1 session
wait history:
1. event: 'gc current grant 2-way' --此類等待事件為資料庫從記憶體中讀取資料塊,發現沒有,
然後從磁碟中讀到記憶體中。故認為此等待事件正常。結合實際查詢,可以認為2562的會話被1224會話阻塞住,
但由於1224會話正在從磁碟大量的讀取資料塊到記憶體中,導致的阻塞。經細查,2562會話當時操作的SQL為:
delete from WRH$_SYSMETRIC_HISTORY tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap
and dbid = :dbid) and not exists (select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid)
and (tab.snap_id >= b.start_snap_id) and (tab.snap_id <= b.end_snap_id)) ,
1224會話操作的SQL為: insert into wrh$_sql_bind_metadata ,兩個會話都是操作AWR相關的表,故有可能導致阻塞。
time waited: 0.000321 sec
wait id: 1878 p1: ''=0x2
p2: ''=0x2a1a6
p3: ''=0x2010001
* time between wait #1 and #2: 0.000204 sec
2. event: 'gc current grant 2-way'
time waited: 0.000384 sec
wait id: 1877 p1: ''=0x2
p2: ''=0x39862
p3: ''=0x2010001
* time between wait #2 and #3: 0.000523 sec
3. event: 'gc current block 3-way'
time waited: 0.000692 sec
wait id: 1876 p1: ''=0x2
p2: ''=0x29dea
p3: ''=0x1
}
Chain 1 Signature: <not in a wait><='enq: WF - contention'
Chain 1 Signature Hash: 0xee4c6e02
-------------------------------------------------------------------------------
===============================================================================
Sessions in an involuntary wait or not in a wait:
檢視例項3的test3_dia0_22856_i880256.trc
Dump file /u01/app/oracle/diag/rdbms/test/test3/incident/incdir_880256/test3_dia0_22856_i880256.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/db_1
System name: Linux
Node name: h0785
Release: 2.6.32-642.el6.x86_64
Version: #1 SMP Wed Apr 13 00:51:26 EDT 2016
Machine: x86_64
Instance name: test3
Redo thread mounted by this instance: 3
Oracle process number: 10
Unix process pid: 22856, image: oracle@h0785 (DIA0)
*** 2020-09-27 00:31:23.982
*** SESSION ID:(201.1) 2020-09-27 00:31:23.982
*** CLIENT ID:() 2020-09-27 00:31:23.982
*** SERVICE NAME:(SYS$BACKGROUND) 2020-09-27 00:31:23.982
*** MODULE NAME:() 2020-09-27 00:31:23.982
*** ACTION NAME:() 2020-09-27 00:31:23.982
Dump continued from file: /u01/app/oracle/diag/rdbms/test/test3/trace/test3_dia0_22856.trc
ORA-32701: Possible hangs up to hang ID=1 detected
========= Dump for incident 880256 (ORA 32701) ========
----- Beginning of Customized Incident Dump(s) -----
There are resolvable hangs on your system. Hang Manger will
attempt to resolve these hangs. Some information about these
is output below. Complete information is available in an
incident trace file on instance 1.
If the hang is to be resolved by terminating the session or
process that is the root or victim of the hang, additional
information will be output on the local instance of that
session or process. Below are the hangs for which resolution
will be attempted.
*** 2020-09-27 00:31:23.983
Resolvable Hangs in the System
Root Chain Total Hang
Hang Hang Inst Root #hung #hung Hang Hang Resolution
ID Type Status Num Sess Sess Sess Conf Span Action
----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
1 HANG RSLNPEND 3 1244 2 2 HIGH LOCAL Terminate Process
Hang Resolution Reason: Although the number of affected sessions did not
justify automatic hang resolution initially, this previously ignored
hang was automatically resolved.
inst# SessId Ser# OSPID PrcNm Event
----- ------ ----- --------- ----- -----
3 2562 353 28939 M001 enq: WF - contention
3 1244 75 88984 M000 not in wait
Dumping process info of pid[446.88984] (sid:1244, ser#:75)
requested by master DIA0 process on instance 1.
*** 2020-09-27 00:31:23.983
Process diagnostic dump for oracle@h0785 (M000), OS id=88984,
pid: 446, proc_ser: 32, sid: 1244, sess_ser: 75
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 00:31:23 ]
NOTE: scheduling delay has not been sampled for 0.647808 secs 0.000000 secs from [ 00:31:19 - 00:31:24 ], 5 sec avg
0.000000 secs from [ 00:30:24 - 00:31:24 ], 1 min avg
0.000000 secs from [ 00:26:24 - 00:31:24 ], 5 min avg
loadavg : 3.04 3.01 2.86
Memory (Avail / Total) = 363436.51M / 516750.80M
Swap (Avail / Total) = 65536.00M / 65536.00M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 R oracle 88984 1 99 80 0 - 315843 ? Sep26 ? 19:28:37 ora_m000_test3
Short stack dump: --此處為trace生成的二進位制錯誤的堆鍵資訊
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-kghalf()+247<-kqlfbctc()+645<-kgligi()+31<-kglic_cbk()+345<-kglic0()+751<-kglic()+74
<-kqlfbct()+376<-qerfxFetch()+3072<-qerjotRowProc()+359<-qerhjInnerProbeHashTable()+491<-qerfxFetch()+1106<-rwsfcd()+103<-qerhjFetch()+621<-qerjotFetch()+2025
<-rwsfcd()+103<-qerltcFetch()+1223<-insexe()+691<-opiexe()+5632<-kpoal8()+2380<-opiodr()+917<-kpoodrc()+36<-rpiswu2()+1776<-kpoodr()+636<-upirtrc()+2436<-kpurcsc()+98
<-kpuexec()+10790<-OCIStmtExecute()+39<-kewrose_oci_stmt_exec()+55<-kewrgwxf1_gwrsql_exft_1()+317<-kewrgwxf_gwrsql_exft()+496<-kewrews_execute_wr_sql()+52
<-kewrftbs_flush_table_by_sql()+180<-kewrfabt_flush_attribute_table()+997<-kewrfsb_flush_sqlbind()+34<-kewrft_flush_table()+397<-kewrftec_flush_table_ehdlcx()+766
<-kewrftsq_flush_topsql()+768<-kewrft_flush_table()+397<-kewrftec_flush_table_ehdlcx()+766<-kewrfat_flush_all_tables()+1406<-kewrfsr_flush_snapshot_r()+191
<-kewrrfs_remote_flush_slave()+1169<-kebm_slave_main()+586<-ksvrdp()+1766<-opirip()+674<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201
<-__libc_start_main()+253
-------------------------------------------------------------------------------
Process diagnostic dump actual duration=0.340000 sec
(max dump time=15.000000 sec)
*** 2020-09-27 00:31:24.319 --以下為報錯是執行想相關SQL,相關的表為 insert into wrh$_sql_bind_metadata
current sql: insert into wrh$_sql_bind_metadata (snap_id, dbid, sql_id, name, position, dup_position,
datatype, datatype_string, character_sid, precision, scale, max_length) SELECT /*+ ordered use_nl(bnd)
index(bnd sql_id) */ :lah_snap_id, :dbid, bnd.sql_id, name, position, dup_position, datatype, dataty
----------------------------------------
SO: 0x1ec14370a8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x1ec14370a8, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:446, ser:32, calls cur/top: 0x1d32cfb3c8/0x1d33002b20
flags : (0x2) SYSTEM
flags2: (0x30), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
5 根據第二步的會話ID,查詢在故障時間點執行的SQL,根據如下資訊,可以確定在故障點2562會話確實被1244會話阻塞住:
SYS@test3>select SESSION_ID,SESSION_SERIAL#,SAMPLE_TIME,WAIT_TIME,BLOCKING_SESSION,SQL_ID from
dba_hist_active_sess_history where INSTANCE_NUMBER=3 and SESSION_ID=1244 and
SAMPLE_TIME between to_timestamp('2020-09-27 00:20:00','yyyy-mm-dd HH24:MI:SS') and
to_timestamp('2020-09-27 00:40:00','yyyy-mm-dd HH24:MI:SS') order by 3;
SESSION_ID SESSION_SERIAL# SAMPLE_TIME WAIT_TIME BLOCKING_SESSION SQL_ID
---------- --------------- ------------------------------------ ---------- ---------------- -------------
1244 75 27-SEP-20 12.20.08.580 AM 321 82h7w3w07k63z
1244 75 27-SEP-20 12.20.18.620 AM 321 82h7w3w07k63z
1244 75 27-SEP-20 12.20.28.640 AM 321 82h7w3w07k63z
...........................
1244 75 27-SEP-20 12.27.40.744 AM 321 82h7w3w07k63z
1244 75 27-SEP-20 12.27.50.854 AM 321 82h7w3w07k63z
1244 75 27-SEP-20 12.28.00.894 AM 321 82h7w3w07k63z
1244 75 27-SEP-20 12.28.10.931 AM 321 82h7w3w07k63z
1244 75 27-SEP-20 12.28.20.971 AM 321 82h7w3w07k63z
1244 75 27-SEP-20 12.28.31.021 AM 321 82h7w3w07k63z
...........................
1244 75 27-SEP-20 12.32.42.270 AM 321 82h7w3w07k63z
1244 75 27-SEP-20 12.32.52.370 AM 321 82h7w3w07k63z
77 rows selected.
SYS@test3>select SESSION_ID,SESSION_SERIAL#,SAMPLE_TIME,WAIT_TIME,BLOCKING_SESSION,SQL_ID
from dba_hist_active_sess_history where
INSTANCE_NUMBER=3 and SESSION_ID=2562 and
SAMPLE_TIME between to_timestamp('2020-09-27 00:20:00','yyyy-mm-dd HH24:MI:SS') and
to_timestamp('2020-09-27 00:40:00','yyyy-mm-dd HH24:MI:SS')
order by 3;
SESSION_ID SESSION_SERIAL# SAMPLE_TIME WAIT_TIME BLOCKING_SESSION SQL_ID
---------- --------------- --------------------------------- ---------- ---------------- -------------
2562 347 27-SEP-20 12.20.28.640 AM 3 0v9v22h2nuc0q
2562 353 27-SEP-20 12.22.29.241 AM 0 7gfhbhxrxx8wf
2562 353 27-SEP-20 12.22.39.291 AM 408 9v9n97qj8z1dg
2562 353 27-SEP-20 12.22.49.421 AM 0 bzscyq07w79ab
2562 353 27-SEP-20 12.22.59.451 AM 0 bzscyq07w79ab
2562 353 27-SEP-20 12.23.09.511 AM 0 bzscyq07w79ab
2562 353 27-SEP-20 12.23.19.551 AM 0 2whm2vvjb98k7
2562 353 27-SEP-20 12.23.29.581 AM 186 acc988uzvjmmt
2562 353 27-SEP-20 12.23.39.621 AM 0 1244
2562 353 27-SEP-20 12.23.49.731 AM 0 1244
2562 353 27-SEP-20 12.23.59.751 AM 0 1244
2562 353 27-SEP-20 12.24.09.810 AM 0 1244
.....................
2562 353 27-SEP-20 12.32.32.240 AM 0 1244
2562 353 27-SEP-20 12.32.42.270 AM 0 1244
2562 353 27-SEP-20 12.32.52.370 AM 0 1244
2562 353 27-SEP-20 12.33.02.420 AM 149 18yuw9npdrxwc
2562 353 27-SEP-20 12.33.12.440 AM 0 9wygvu6cx2npy
2562 353 27-SEP-20 12.33.22.480 AM 0 9wygvu6cx2npy
2562 353 27-SEP-20 12.33.32.510 AM 0 gkbrnf9xhv80g
68 rows selected.
acc988uzvjmmt 執行的SQL:
delete from WRH$_SYSMETRIC_HISTORY tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap
and dbid = :dbid) and not exists
(select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid) and (tab.snap_id >= b.start_snap_id)
and (tab.snap_id <= b.end_snap_id))
6 查詢資料庫的AWR生成資訊,發現在故障點不能生成AWR:如下
Listing the last 3 days of Completed Snapshots
Snap
Instance DB Name Snap Id Snap Started Level
------------ ------------ --------- ------------------ -----
test3 test 46313 26 Sep 2020 00:00 1
46314 26 Sep 2020 01:00 1
46315 26 Sep 2020 02:00 1
46316 26 Sep 2020 03:00 1
46317 26 Sep 2020 04:00 1 --26日的4點 直接到27日的7:00
46325 27 Sep 2020 07:00 1
46326 27 Sep 2020 08:00 1
46327 27 Sep 2020 09:00 1
46328 27 Sep 2020 10:00 1
46329 27 Sep 2020 11:00 1
46330 27 Sep 2020 12:00 1
46331 27 Sep 2020 13:00 1
46332 27 Sep 2020 14:00 1
7 查詢x$KQLFBC, 發現等待2小時都無法出來結果,通過追蹤,找到執行計劃,發現此表由1.4億(138*1024*1024)條資料,
故導致執行2小時無法查詢出結果
SYS@test3> select count(*) from x$kqlfbc;
Plan hash value: 4012607404
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 18 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | FIXED TABLE FULL| X$KQLFBC | 138K| 18 (100)| 00:00:01 |
----------------------------------------------------------------------
14 rows selected.
SYS@test3>select count(*) from x$ksmsp;
COUNT(*)
----------
3102652
8 由於ORA-32701故障發生在wrh$_sql_bind_metadata及 x$kqlfbc表上,是由於系統有大量的繫結變數引起的,
故查詢如下引數,發現如下引數設定太小,需要調整。
SYS@test1>show parameter cursor
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cursor_bind_capture_destination string memory+disk
cursor_sharing string EXACT
cursor_space_for_time boolean FALSE
open_cursors integer 300
session_cached_cursors integer 500
PARAMETER VALUE USAGE
---------------------- ---------- -----
session_cached_cursors 500 258%
open_cursors 300 432%
HIGHEST_OPEN_CUR MAX_OPEN_CUR
---------------- --------------------
246 300
PARAMETER VALUE USAGE
-------------------------------------------------------- ---------------
session_cached_cursors 500 212%
open_cursors 300 356%
HIGHEST_OPEN_CUR MAX_OPEN_CUR
---------------- --------------------
236 300
PARAMETER VALUE USAGE
---------------------- ---------- -----
session_cached_cursors 500 249%
open_cursors 300 418%
9 ORA-32701問題的解決方法,以下的解決方法,是根據所有的Oracle官方文件總結出來的,
需要根據實際情況採用不同的方法,一般情況下一種方法不行,就使用第二種,直到解決問題。
一般情況下重啟資料庫例項就可以臨時的解決此問題,但生產環境一般是不允許重啟例項的,故需要
使用不重啟例項的方法解決。
9.1 蒐集如下表的統計資訊
exec dbms_stats.gather_table_stats('SYS', 'X$KEWRATTRNEW');
exec dbms_stats.gather_table_stats('SYS', 'X$KEWRTSQLPLAN');
exec dbms_stats.gather_table_stats('SYS', 'X$KEWRSQLIDTAB');
exec dbms_stats.gather_fixed_objects_stats;
exec dbms_stats.gather_schema_stats('SYS');
查詢相關表的統計資訊是否過期:
select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRATTRNEW';
select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRTSQLPLAN';
select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRSQLIDTAB';
9.2 有時此表x$KQLFBC的資料量會特別大,會導致查詢不出來或不能生成AWR,故需要通過設定隱含引數和刷共享池來解決 。
SYS@test1>select count(1) from x$KQLFBC;
COUNT(1)
----------
113714
select count(*) from X$KQLFBC;
刷共享池
alter system flush shared_pool;
設定隱含引數
alter system set "_awr_disabled_flush_tables"='wrh$_sql_bind_metadata';
部署自動重新整理shared的指令碼
10 8,18 * * * /home/oracle/database_check_sxwy/scripts/flush_shared.sh
#!/bin/bash
export ORACLE_SID=test1
sqlplus / as sysdba <<EOF
alter system flush shared_pool;
exit;
EOF
9.3 第三種禁用方法
先查詢AWR Table Info (KEW layer)使用下面的SQL:
select table_id_kewrtb, table_name_kewrtb from x$kewrtb order by 1;
TABLE_ID_KEWRTB TABLE_NAME_KEWRTB
--------------- ----------------------------------------------------------------
0 WRM$_DATABASE_INSTANCE
1 WRM$_SNAPSHOT
2 WRM$_BASELINE
3 WRM$_WR_CONTROL
4 WRH$_STAT_NAME
。。。。。。。。。。
53 WRH$_TABLESPACE_SPACE_USAGE
54 WRH$_METRIC_NAME
55 WRH$_SYSMETRIC_SUMMARY
56 WRH$_SQL_BIND_METADATA
57 WRH$_JAVA_POOL_ADVICE
58 WRH$_THREAD
59 WRH$_SESSMETRIC_HISTORY
60 WRH$_OSSTAT
61 WRH$_OSSTAT_NAME
62 WRH$_SYS_TIME_MODEL
63 wri$_optstat_opr
64 WRH$_OPTIMIZER_ENV
65 WRM$_SNAP_ERROR
66 WRH$_SERVICE_WAIT_CLASS
67 WRH$_CR_BLOCK_SERVER
。。。。。。。。。。。
115 WRM$_WR_USAGE
116 WRH$_MVPARAMETER
117 WRH$_PERSISTENT_QMN_CACHE
118 WRH$_TABLESPACE
119 WRM$_SNAPSHOT_DETAILS
120 WRI$_SQLTEXT_REFCOUNT
121 rows selected.
從上面的查詢中得到table_id用於下面的level 值,比如我們這裡表wrh$_sql_bind_metadata對應的是56
alter session set events ‘immediate trace name awr_flush_table_off level 56’;
後期如果想開啟flushing to awr 如果第二種方法禁用,開啟使用
alter session set events ‘immediate trace name awr_flush_table_on level 56’ off;
9.4 第四種解決方法為資料庫打相關的補丁。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/69996316/viewspace-2793845/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- npm 安裝錯誤及處理方法NPM
- hadoop常見錯誤及處理方法Hadoop
- ORA-600[kqlnrc_1]錯誤分析及處理
- thinkphp原始碼分析(四)—錯誤及異常處理篇PHP原始碼
- namespace mismatch require錯誤處理方法namespaceUI
- Laravel Excpetions(錯誤處理) 原始碼分析Laravel原始碼
- 錯誤處理
- php錯誤與異常處理方法PHP
- 【CONNECT】ORA-00020錯誤模擬及處理方法
- 【ERROR】ORA-8103錯誤分析處理Error
- Hystrix微服務容錯處理及回撥方法原始碼分析微服務原始碼
- TSM不能執行排程的原因分析及處理
- Oracle RAC 錯誤記錄以及處理方法Oracle
- PHP 錯誤處理PHP
- php錯誤處理PHP
- Go 錯誤處理Go
- Swift錯誤處理Swift
- Zabbix錯誤處理
- mysqldump錯誤處理MySql
- 資料庫連線錯誤的原因及解決方法資料庫
- 遠端連線錯誤程式碼及處理
- Yii2 之錯誤處理深入分析
- 錯誤處理:如何通過 error、deferred、panic 等處理錯誤?Error
- async/await 優雅的錯誤處理方法AI
- Windows ORA-12560錯誤處理方法Windows
- 【CONNECT】ORA-00020錯誤模擬及處理方法實驗
- PHP錯誤處理和異常處理PHP
- go的錯誤處理Go
- Python錯誤處理Python
- 常見的授權錯誤及原因
- Linux環境下段錯誤的產生原因及除錯方法小結Linux除錯
- ORA-00257 錯誤分析及解決方法
- mysql多源複製跳過錯誤處理方法MySql
- Host is not allowed to connect to this MySQL server 錯誤的處理方法MySqlServer
- SQL Server 連線常見錯誤及其處理方法SQLServer
- Android - Unparsed aapt error(s)錯誤的處理方法AndroidAPTError
- 收縮臨時表空間收縮方法及ORA-03297錯誤處理
- 膝上型電腦開機報錯故障原因及處理方式