1 在處理ORA-32701前，需要先了解以下資料庫的hang管理器，參考Oracle的官方文件（Oracle Hang Manager (文件 ID 1534591.1)）。

內容如下，可以認為如果資料庫發生hang，一般情況資料庫會自動處理的，但有時由於情況複雜，不能處理：

DETAILS Overview

Hang Manager has existed since Oracle Database 10.2.0.1. The main goal of Hang Manager is to reliably detect and,

if hang resolution is enabled, resolve hangs in a timely manner. Over various releases, Hang Manager has been

enhanced along with the wait event infrastructure on which it relies. However, it is only in Oracle Database 11.2.0.2

that Hang Manager actually resolves any hangs by terminating sessions and/or processes. This remains the default

operation in Oracle Database 12.1.0.1.

Hang Manager is only active when Oracle RAC is enabled. You can determine whether an Oracle RAC database is

used / enabled by querying the CLUSTER_DATABASE. If this parameter is set to "TRUE", Oracle RAC is enabled on

this database.

Terminology

Cross Boundary Hang

This is a hang which spans the ASM/DB boundary. In 12.1.0.1, it is possible for Hang Manager to detect hangs

which span the boundary between a database and ASM or ASM and a database.

Deadlock or Closed Chain

A deadlock is a number of sessions making no progress which are in a closed chain or cycle.

All sessions have a waiter and a blocker. The only way to break the chain is to cause one of the sessions to

either progress or terminate.

Hang or Open Chain

From Hang Manager's perspective, a hang is a number of sessions making no progress which are in an open chain

configuration. This is not a cycle or deadlock. An open chain or hang has a root or final blocker which is blocking

all other sessions in the chain. It also contains a final waiter which is at the end of the chain and is blocked by all

other sessions in the chain.

Immediate Waiter

The session in an open chain that is being blocked by the root of the hang.

Quality of Service (QoS) Management

Oracle Database QoS Management is an automated, policy-based product that monitors the workload requests

for an entire system. Oracle Database QoS Management manages the resources that are shared across applications,

and adjusts the system configuration to keep the applications running at the performance levels needed by your

business.

Root (also known as Victim or Final Blocker)

This is the session in an open chain or hang that is blocking all other sessions in the chain.

Self-Resolved Hang

This is a hang or deadlock that was detected by Hang Manager but no longer exists. This could either be because

the root or immediate waiter have progressed or either or both of them have been terminated but not by Hang

Manager.

Hang Manager in 12.1.0.1

Since Oracle Database 11.2.0.2, Hang Manager may resolve hangs it detects by terminating the root session or

process. By default, Hang Manager will not terminate an instance or evict a node. Also, Hang Manager will not

resolve all hangs that it detects. For example, hangs which involve a possible application issue are left to the user

to determine the correct course of action. Also, if Hang Manager determines that the instance on which the root

of a hang resides is experiencing high CPU or IO load, hang resolution will be delayed. This gives some time for

the root to progress and the hang to resolve itself. Finally, Hang Manager currently does not resolve any hangs

on ASM.

Starting with Oracle Database 12.1.0.1 and if QoS is active on the cluster, Hang Manager will use additional

information provided by QoS to determine if a hang should be ignored or resolved. If QoS would like Hang

Manager to resolve a hang, hang resolution may occur more quickly than Hang Manager's normal detection

and resolution times or hang resolution may be delayed.

Previous to 12.1.0.1, hangs were detected only within the database or ASM cluster. However, if a session

on ASM or the database was being blocked by a session on the other side of the ASM/Database boundary

(a Cross Boundary Hang), there was no way to detect it. In Oracle Database 12.1.0.1, Hang Manager was

enhanced to detect hangs which cross the ASM/Database boundary and outputs information to its various

trace files so it is easier to troubleshoot these hangs.

同hang管理器相關的檢視：

V$ Views

Hang Manager exposes some information via various V$ views. These are now documented in the Oracle

Database Reference. A summary of the information in these views is below. For a more detailed description of these views, please go to the appropriate section in the Oracle Database Reference.

V$HANG_INFO

This view contains the active hangs and deadlocks detected by Hang Manager. It does not include any hangs

which have been resolved by Hang Manager or have self-resolved. Only the latest 32 active hangs are in this view.

The root or victim of the hang is included in the information for each active hang in the view.

V$HANG_SESSION_INFO

This view contains the sessions in the main chain for all of the hangs included in the V$HANG_INFO view.

Only the first 20 sessions of the main chain, including the root of the hang, for each hang are kept in this view.

(G)V$HANG_STATISTICS

This view contains various Hang Manager statistics regarding hangs or deadlocks which it has detected. These statistics include: the number of hangs detected, the number of deadlocks detected, the number of hangs resolved by Hang Manager, the number of hangs ignored broken down by the reason for ignoring the hang, the number of hangs which self-resolved, etc.

2 資料庫的告警，檢視發生hang的相關資訊

節點1：

Sun Sep 27 00:31:23 2020

Errors in file /u01/app/oracle/diag/rdbms/test/test1/trace/test1_dia0_25054.trc (incident=920985):

ORA-32701: Possible hangs up to hang ID=1 detected --hang ID=1 說明資料庫自動診斷出Hang

Incident details in: /u01/app/oracle/diag/rdbms/test/test1/incident/incdir_920985/test1_dia0_25054_i920985.trc

DIA0 requesting termination of session sid:1244 with serial # 75 (ospid:88984) on instance 3

--請求終止例項3的會話，作業系統PID：88984

due to a LOCAL, HIGH confidence hang with ID=1.

Hang Resolution Reason: Although the number of affected sessions did not

justify automatic hang resolution initially, this previously ignored

hang was automatically resolved.

DIA0: Examine the alert log on instance 3 for session termination status of hang with ID=1.

--發現例項3的會話阻塞住會話，檢查例項3的alert日誌

節點3：

Sun Sep 27 00:31:23 2020

Errors in file /u01/app/oracle/diag/rdbms/test/test3/trace/test3_dia0_22856.trc (incident=880256):

ORA-32701: Possible hangs up to hang ID=1 detected --hang ID=1 說明資料庫自動診斷出Hang

Incident details in: /u01/app/oracle/diag/rdbms/test/test3/incident/incdir_880256/test3_dia0_22856_i880256.trc

DIA0 terminating blocker (ospid: 88984 sid: 1244 ser#: 75) of hang with ID = 1

--本節點Hang住了，故終止了此會話

requested by master DIA0 process on instance 1 --請求的程式來自於例項1

Hang Resolution Reason: Although the number of affected sessions did not

justify automatic hang resolution initially, this previously ignored

hang was automatically resolved.

by terminating session sid:1244 with serial # 75 (ospid:88984) --終止了作業系統程式

3 首先根據hang ID=1 和 hang ID=2 檢視如下檢視,此檢視會解釋每個程式碼的含義：

SYS@test3>select * from V$HANG_STATISTICS;

STATISTIC# NAME VALUE

---------- --------------------------------------------- ----------

0 number of deadlocks detected and ignored 0

1 number of hangs detected 0

2 number of local hangs 0

3 number of global hangs 0

4 number of transient hangs 0

5 hangs ignored due to high CPU on root's node 0

6 hangs ignored due to high IO on root's node 0

7 hangs ignored due to application contention 0

8 hangs ignored due to long running operations 0

9 hangs monitored due to archiving issues 0

10 hangs ignored due to archiving issues 0

11 hangs ignored, blocked by remote database 0

12 hangs ignored due to SQL parsing 0

13 hangs ignored due to dumping system state 0

14 hangs ignored, instance termination required 0

15 hangs ignored, only one active instance 0

16 number of explicitly resolved hangs 0

17 number of self-resolved hangs 0

18 total self-resolved hang time in seconds 0

19 minimum self-resolved hang time in seconds 0

20 maximum self-resolved hang time in seconds 0

21 number of HSC matched hangs 0

22 hangs resolved due to instance termination 0

23 rows selected.

4 分析資料庫生成的trace檔案，找出報錯的根本原因

節點1 檢視test1_dia0_25054_i920985.trc

Dump file /u01/app/oracle/diag/rdbms/test/test1/incident/incdir_920985/test1_dia0_25054_i920985.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,

Data Mining and Real Application Testing options

ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/db_1

System name: Linux

Node name: h0783

Release: 2.6.32-642.el6.x86_64

Version: #1 SMP Wed Apr 13 00:51:26 EDT 2016

Machine: x86_64

Instance name: test1

Redo thread mounted by this instance: 1

Oracle process number: 10

Unix process pid: 25054, image: oracle@h0783 (DIA0)

*** 2020-09-27 00:31:23.807

*** SESSION ID:(201.1) 2020-09-27 00:31:23.807

*** CLIENT ID:() 2020-09-27 00:31:23.807

*** SERVICE NAME:(SYS$BACKGROUND) 2020-09-27 00:31:23.807

*** MODULE NAME:() 2020-09-27 00:31:23.807

*** ACTION NAME:() 2020-09-27 00:31:23.807

Dump continued from file: /u01/app/oracle/diag/rdbms/test/test1/trace/test1_dia0_25054.trc

ORA-32701: Possible hangs up to hang ID=1 detected

========= Dump for incident 920985 (ORA 32701) ========

----- Beginning of Customized Incident Dump(s) -----

One or more resolvable hangs have been detected on your system.

An attempt will be made to resolve these hangs.

The stack included in this incident file is not the cause of the

problem. It is merely pointing to the module that triggered the

incident. The short stacks of the root or final blocked of the

hang and its waiter can be found in the DIA0 trace files on their

respective local instances.

'Resolvable Hangs' below indicate one or more hangs that were found

and identify the final blocking session and instance on which

they occurred. Since the current hang resolution state is 'PROCESS',

any hangs requiring session or process termination will be

automatically resolved.

The following information will assist Oracle Support Services

in further analysis of the root cause of the hang.

*** 2020-09-27 00:31:23.809

Resolvable Hangs in the System

Root Chain Total Hang

Hang Hang Inst Root #hung #hung Hang Hang Resolution

ID Type Status Num Sess Sess Sess Conf Span Action

----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------

1 HANG RSLNPEND 3 1244 2 2 HIGH LOCAL Terminate Process

--此處說明Hang的會話在節點3上，會話的ID為1244

Hang Resolution Reason: Although the number of affected sessions did not

justify automatic hang resolution initially, this previously ignored

hang was automatically resolved.

inst# SessId Ser# OSPID PrcNm Event

----- ------ ----- --------- ----- -----

3 2562 353 28939 M001 enq: WF - contention

3 1244 75 88984 M000 not in wait --此處說明會話1244 等待是事件為 not in wait

The incident file on instance 3 may contain information

about session 1244 with serial number 75 and operating system

process ID 88984. The incident file may contain a short stack

and a process state dump.

Victim Information

Ignored

HangID Inst# Sessid Ser Num OSPID Fatal BG Previous Hang Count

------ ----- ------ ------- --------- -------- ------------- -------

1 3 1244 75 88984 N Existing Hang 1

*** 2020-09-27 00:31:23.809

Wait-For-Graphs collected at 09/27/20 00:25:31) --以下資訊說明蒐集的圖形化阻塞的相關資訊

===============================================================================

Non-intersecting chains:

-------------------------------------------------------------------------------

Chain 1: --Chain 1 說明會話2562等待的事件為：'enq: WF - contention' ,被1244會話阻塞住，

1244會話的等待事件為 not int wait,但等待：event: 'gc current grant 2-way'

-------------------------------------------------------------------------------

Oracle session identified by:

{

instance: 3 (test.test3)

os id: 28939

process id: 512, oracle@h0785 (M001)

session id: 2562

session serial #: 353

}

is waiting for 'enq: WF - contention' with wait info:

{

p1: 'name|mode'=0x57460006

p2: '0'=0x38

p3: '0'=0x0

time in wait: 1 min 53 sec

timeout after: never

wait id: 73008

blocking: 0 sessions

wait history:

* time between current wait and wait #1: 0.003403 sec

1. event: 'db file sequential read'

time waited: 0.000226 sec

wait id: 73007 p1: 'file#'=0x2

p2: 'block#'=0x1eeff

p3: 'blocks'=0x1

* time between wait #1 and #2: 0.000390 sec

2. event: 'db file sequential read'

time waited: 0.000207 sec

wait id: 73006 p1: 'file#'=0x2

p2: 'block#'=0xe178

p3: 'blocks'=0x1

* time between wait #2 and #3: 0.002593 sec

3. event: 'db file scattered read'

time waited: 0.000276 sec

wait id: 73005 p1: 'file#'=0x2

p2: 'block#'=0x1eefb

p3: 'blocks'=0x2

}

and is blocked by

=> Oracle session identified by:

{

instance: 3 (test.test3)

os id: 88984

process id: 446, oracle@h0785 (M000)

session id: 1244

session serial #: 75

}

which is not in a wait:

{

last wait: 1165 min 7 sec ago

blocking: 1 session

wait history:

1. event: 'gc current grant 2-way' --此類等待事件為資料庫從記憶體中讀取資料塊，發現沒有，

然後從磁碟中讀到記憶體中。故認為此等待事件正常。結合實際查詢，可以認為2562的會話被1224會話阻塞住，

但由於1224會話正在從磁碟大量的讀取資料塊到記憶體中，導致的阻塞。經細查，2562會話當時操作的SQL為：

delete from WRH$_SYSMETRIC_HISTORY tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap

and dbid = :dbid) and not exists (select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid)

and (tab.snap_id >= b.start_snap_id) and (tab.snap_id <= b.end_snap_id)) ，

1224會話操作的SQL為： insert into wrh$_sql_bind_metadata ,兩個會話都是操作AWR相關的表，故有可能導致阻塞。

time waited: 0.000321 sec

wait id: 1878 p1: ''=0x2

p2: ''=0x2a1a6

p3: ''=0x2010001

* time between wait #1 and #2: 0.000204 sec

2. event: 'gc current grant 2-way'

time waited: 0.000384 sec

wait id: 1877 p1: ''=0x2

p2: ''=0x39862

p3: ''=0x2010001

* time between wait #2 and #3: 0.000523 sec

3. event: 'gc current block 3-way'

time waited: 0.000692 sec

wait id: 1876 p1: ''=0x2

p2: ''=0x29dea

p3: ''=0x1

}

Chain 1 Signature: <not in a wait><='enq: WF - contention'

Chain 1 Signature Hash: 0xee4c6e02

-------------------------------------------------------------------------------

===============================================================================

Sessions in an involuntary wait or not in a wait:

檢視例項3的test3_dia0_22856_i880256.trc

Dump file /u01/app/oracle/diag/rdbms/test/test3/incident/incdir_880256/test3_dia0_22856_i880256.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,

Data Mining and Real Application Testing options

ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/db_1

System name: Linux

Node name: h0785

Release: 2.6.32-642.el6.x86_64

Version: #1 SMP Wed Apr 13 00:51:26 EDT 2016

Machine: x86_64

Instance name: test3

Redo thread mounted by this instance: 3

Oracle process number: 10

Unix process pid: 22856, image: oracle@h0785 (DIA0)

*** 2020-09-27 00:31:23.982

*** SESSION ID:(201.1) 2020-09-27 00:31:23.982

*** CLIENT ID:() 2020-09-27 00:31:23.982

*** SERVICE NAME:(SYS$BACKGROUND) 2020-09-27 00:31:23.982

*** MODULE NAME:() 2020-09-27 00:31:23.982

*** ACTION NAME:() 2020-09-27 00:31:23.982

Dump continued from file: /u01/app/oracle/diag/rdbms/test/test3/trace/test3_dia0_22856.trc

ORA-32701: Possible hangs up to hang ID=1 detected

========= Dump for incident 880256 (ORA 32701) ========

----- Beginning of Customized Incident Dump(s) -----

There are resolvable hangs on your system. Hang Manger will

attempt to resolve these hangs. Some information about these

is output below. Complete information is available in an

incident trace file on instance 1.

If the hang is to be resolved by terminating the session or

process that is the root or victim of the hang, additional

information will be output on the local instance of that

session or process. Below are the hangs for which resolution

will be attempted.

*** 2020-09-27 00:31:23.983

Resolvable Hangs in the System

Root Chain Total Hang

Hang Hang Inst Root #hung #hung Hang Hang Resolution

ID Type Status Num Sess Sess Sess Conf Span Action

----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------

1 HANG RSLNPEND 3 1244 2 2 HIGH LOCAL Terminate Process

Hang Resolution Reason: Although the number of affected sessions did not

justify automatic hang resolution initially, this previously ignored

hang was automatically resolved.

inst# SessId Ser# OSPID PrcNm Event

----- ------ ----- --------- ----- -----

3 2562 353 28939 M001 enq: WF - contention

3 1244 75 88984 M000 not in wait

Dumping process info of pid[446.88984] (sid:1244, ser#:75)

requested by master DIA0 process on instance 1.

*** 2020-09-27 00:31:23.983

Process diagnostic dump for oracle@h0785 (M000), OS id=88984,

pid: 446, proc_ser: 32, sid: 1244, sess_ser: 75

-------------------------------------------------------------------------------

os thread scheduling delay history: (sampling every 1.000000 secs)

0.000000 secs at [ 00:31:23 ]

NOTE: scheduling delay has not been sampled for 0.647808 secs 0.000000 secs from [ 00:31:19 - 00:31:24 ], 5 sec avg

0.000000 secs from [ 00:30:24 - 00:31:24 ], 1 min avg

0.000000 secs from [ 00:26:24 - 00:31:24 ], 5 min avg

loadavg : 3.04 3.01 2.86

Memory (Avail / Total) = 363436.51M / 516750.80M

Swap (Avail / Total) = 65536.00M / 65536.00M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

0 R oracle 88984 1 99 80 0 - 315843 ? Sep26 ? 19:28:37 ora_m000_test3

Short stack dump: --此處為trace生成的二進位制錯誤的堆鍵資訊

ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-kghalf()+247<-kqlfbctc()+645<-kgligi()+31<-kglic_cbk()+345<-kglic0()+751<-kglic()+74

<-kqlfbct()+376<-qerfxFetch()+3072<-qerjotRowProc()+359<-qerhjInnerProbeHashTable()+491<-qerfxFetch()+1106<-rwsfcd()+103<-qerhjFetch()+621<-qerjotFetch()+2025

<-rwsfcd()+103<-qerltcFetch()+1223<-insexe()+691<-opiexe()+5632<-kpoal8()+2380<-opiodr()+917<-kpoodrc()+36<-rpiswu2()+1776<-kpoodr()+636<-upirtrc()+2436<-kpurcsc()+98

<-kpuexec()+10790<-OCIStmtExecute()+39<-kewrose_oci_stmt_exec()+55<-kewrgwxf1_gwrsql_exft_1()+317<-kewrgwxf_gwrsql_exft()+496<-kewrews_execute_wr_sql()+52

<-kewrftbs_flush_table_by_sql()+180<-kewrfabt_flush_attribute_table()+997<-kewrfsb_flush_sqlbind()+34<-kewrft_flush_table()+397<-kewrftec_flush_table_ehdlcx()+766

<-kewrftsq_flush_topsql()+768<-kewrft_flush_table()+397<-kewrftec_flush_table_ehdlcx()+766<-kewrfat_flush_all_tables()+1406<-kewrfsr_flush_snapshot_r()+191

<-kewrrfs_remote_flush_slave()+1169<-kebm_slave_main()+586<-ksvrdp()+1766<-opirip()+674<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201

<-__libc_start_main()+253

-------------------------------------------------------------------------------

Process diagnostic dump actual duration=0.340000 sec

(max dump time=15.000000 sec)

*** 2020-09-27 00:31:24.319 --以下為報錯是執行想相關SQL，相關的表為 insert into wrh$_sql_bind_metadata

current sql: insert into wrh$_sql_bind_metadata (snap_id, dbid, sql_id, name, position, dup_position,

datatype, datatype_string, character_sid, precision, scale, max_length) SELECT /*+ ordered use_nl(bnd)

index(bnd sql_id) */ :lah_snap_id, :dbid, bnd.sql_id, name, position, dup_position, datatype, dataty

----------------------------------------

SO: 0x1ec14370a8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3

proc=0x1ec14370a8, name=process, file=ksu.h LINE:12721, pg=0

(process) Oracle pid:446, ser:32, calls cur/top: 0x1d32cfb3c8/0x1d33002b20

flags : (0x2) SYSTEM

flags2: (0x30), flags3: (0x10)

intr error: 0, call error: 0, sess error: 0, txn error 0

intr queue: empty

ksudlp FALSE at location: 0

5 根據第二步的會話ID，查詢在故障時間點執行的SQL,根據如下資訊，可以確定在故障點2562會話確實被1244會話阻塞住：

SYS@test3>select SESSION_ID,SESSION_SERIAL#,SAMPLE_TIME,WAIT_TIME,BLOCKING_SESSION,SQL_ID from

dba_hist_active_sess_history where INSTANCE_NUMBER=3 and SESSION_ID=1244 and

SAMPLE_TIME between to_timestamp('2020-09-27 00:20:00','yyyy-mm-dd HH24:MI:SS') and

to_timestamp('2020-09-27 00:40:00','yyyy-mm-dd HH24:MI:SS') order by 3;

SESSION_ID SESSION_SERIAL# SAMPLE_TIME WAIT_TIME BLOCKING_SESSION SQL_ID

---------- --------------- ------------------------------------ ---------- ---------------- -------------

1244 75 27-SEP-20 12.20.08.580 AM 321 82h7w3w07k63z

1244 75 27-SEP-20 12.20.18.620 AM 321 82h7w3w07k63z

1244 75 27-SEP-20 12.20.28.640 AM 321 82h7w3w07k63z

...........................

1244 75 27-SEP-20 12.27.40.744 AM 321 82h7w3w07k63z

1244 75 27-SEP-20 12.27.50.854 AM 321 82h7w3w07k63z

1244 75 27-SEP-20 12.28.00.894 AM 321 82h7w3w07k63z

1244 75 27-SEP-20 12.28.10.931 AM 321 82h7w3w07k63z

1244 75 27-SEP-20 12.28.20.971 AM 321 82h7w3w07k63z

1244 75 27-SEP-20 12.28.31.021 AM 321 82h7w3w07k63z

...........................

1244 75 27-SEP-20 12.32.42.270 AM 321 82h7w3w07k63z

1244 75 27-SEP-20 12.32.52.370 AM 321 82h7w3w07k63z

77 rows selected.

SYS@test3>select SESSION_ID,SESSION_SERIAL#,SAMPLE_TIME,WAIT_TIME,BLOCKING_SESSION,SQL_ID

from dba_hist_active_sess_history where

INSTANCE_NUMBER=3 and SESSION_ID=2562 and

SAMPLE_TIME between to_timestamp('2020-09-27 00:20:00','yyyy-mm-dd HH24:MI:SS') and

to_timestamp('2020-09-27 00:40:00','yyyy-mm-dd HH24:MI:SS')

order by 3;

SESSION_ID SESSION_SERIAL# SAMPLE_TIME WAIT_TIME BLOCKING_SESSION SQL_ID

---------- --------------- --------------------------------- ---------- ---------------- -------------

2562 347 27-SEP-20 12.20.28.640 AM 3 0v9v22h2nuc0q

2562 353 27-SEP-20 12.22.29.241 AM 0 7gfhbhxrxx8wf

2562 353 27-SEP-20 12.22.39.291 AM 408 9v9n97qj8z1dg

2562 353 27-SEP-20 12.22.49.421 AM 0 bzscyq07w79ab

2562 353 27-SEP-20 12.22.59.451 AM 0 bzscyq07w79ab

2562 353 27-SEP-20 12.23.09.511 AM 0 bzscyq07w79ab

2562 353 27-SEP-20 12.23.19.551 AM 0 2whm2vvjb98k7

2562 353 27-SEP-20 12.23.29.581 AM 186 acc988uzvjmmt

2562 353 27-SEP-20 12.23.39.621 AM 0 1244

2562 353 27-SEP-20 12.23.49.731 AM 0 1244

2562 353 27-SEP-20 12.23.59.751 AM 0 1244

2562 353 27-SEP-20 12.24.09.810 AM 0 1244

.....................

2562 353 27-SEP-20 12.32.32.240 AM 0 1244

2562 353 27-SEP-20 12.32.42.270 AM 0 1244

2562 353 27-SEP-20 12.32.52.370 AM 0 1244

2562 353 27-SEP-20 12.33.02.420 AM 149 18yuw9npdrxwc

2562 353 27-SEP-20 12.33.12.440 AM 0 9wygvu6cx2npy

2562 353 27-SEP-20 12.33.22.480 AM 0 9wygvu6cx2npy

2562 353 27-SEP-20 12.33.32.510 AM 0 gkbrnf9xhv80g

68 rows selected.

acc988uzvjmmt 執行的SQL：

delete from WRH$_SYSMETRIC_HISTORY tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap

and dbid = :dbid) and not exists

(select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid) and (tab.snap_id >= b.start_snap_id)

and (tab.snap_id <= b.end_snap_id))

6 查詢資料庫的AWR生成資訊，發現在故障點不能生成AWR：如下

Listing the last 3 days of Completed Snapshots

Snap

Instance DB Name Snap Id Snap Started Level

------------ ------------ --------- ------------------ -----

test3 test 46313 26 Sep 2020 00:00 1

46314 26 Sep 2020 01:00 1

46315 26 Sep 2020 02:00 1

46316 26 Sep 2020 03:00 1

46317 26 Sep 2020 04:00 1 --26日的4點直接到27日的7:00

46325 27 Sep 2020 07:00 1

46326 27 Sep 2020 08:00 1

46327 27 Sep 2020 09:00 1

46328 27 Sep 2020 10:00 1

46329 27 Sep 2020 11:00 1

46330 27 Sep 2020 12:00 1

46331 27 Sep 2020 13:00 1

46332 27 Sep 2020 14:00 1

7 查詢x$KQLFBC, 發現等待2小時都無法出來結果，通過追蹤，找到執行計劃，發現此表由1.4億（138*1024*1024）條資料，

故導致執行2小時無法查詢出結果

SYS@test3> select count(*) from x$kqlfbc;

Plan hash value: 4012607404

----------------------------------------------------------------------

----------------------------------------------------------------------

| 0 | SELECT STATEMENT | | | 18 (100)| |

| 1 | SORT AGGREGATE | | 1 | | |

| 2 | FIXED TABLE FULL| X$KQLFBC | 138K| 18 (100)| 00:00:01 |

----------------------------------------------------------------------

14 rows selected.

SYS@test3>select count(*) from x$ksmsp;

COUNT(*)

----------

3102652

8 由於ORA-32701故障發生在wrh$_sql_bind_metadata及 x$kqlfbc表上，是由於系統有大量的繫結變數引起的，

故查詢如下引數，發現如下引數設定太小，需要調整。

SYS@test1>show parameter cursor

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

cursor_bind_capture_destination string memory+disk

cursor_sharing string EXACT

cursor_space_for_time boolean FALSE

open_cursors integer 300

session_cached_cursors integer 500

PARAMETER VALUE USAGE

---------------------- ---------- -----

session_cached_cursors 500 258%

open_cursors 300 432%

HIGHEST_OPEN_CUR MAX_OPEN_CUR

---------------- --------------------

246 300

PARAMETER VALUE USAGE

-------------------------------------------------------- ---------------

session_cached_cursors 500 212%

open_cursors 300 356%

HIGHEST_OPEN_CUR MAX_OPEN_CUR

---------------- --------------------

236 300

PARAMETER VALUE USAGE

---------------------- ---------- -----

session_cached_cursors 500 249%

open_cursors 300 418%

9 ORA-32701問題的解決方法,以下的解決方法，是根據所有的Oracle官方文件總結出來的，

需要根據實際情況採用不同的方法，一般情況下一種方法不行，就使用第二種，直到解決問題。

一般情況下重啟資料庫例項就可以臨時的解決此問題，但生產環境一般是不允許重啟例項的，故需要

使用不重啟例項的方法解決。

9.1 蒐集如下表的統計資訊

exec dbms_stats.gather_table_stats('SYS', 'X$KEWRATTRNEW');

exec dbms_stats.gather_table_stats('SYS', 'X$KEWRTSQLPLAN');

exec dbms_stats.gather_table_stats('SYS', 'X$KEWRSQLIDTAB');

exec dbms_stats.gather_fixed_objects_stats;

exec dbms_stats.gather_schema_stats('SYS');

查詢相關表的統計資訊是否過期：

select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRATTRNEW';

select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRTSQLPLAN';

select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRSQLIDTAB';

9.2 有時此表x$KQLFBC的資料量會特別大，會導致查詢不出來或不能生成AWR，故需要通過設定隱含引數和刷共享池來解決 。

SYS@test1>select count(1) from x$KQLFBC;

COUNT(1)

----------

113714

select count(*) from X$KQLFBC;

刷共享池

alter system flush shared_pool;

設定隱含引數

alter system set "_awr_disabled_flush_tables"='wrh$_sql_bind_metadata';

部署自動重新整理shared的指令碼

10 8,18 * * * /home/oracle/database_check_sxwy/scripts/flush_shared.sh

#!/bin/bash

export ORACLE_SID=test1

sqlplus / as sysdba <<EOF

alter system flush shared_pool;

exit;

EOF

9.3 第三種禁用方法

先查詢AWR Table Info (KEW layer)使用下面的SQL:

select table_id_kewrtb, table_name_kewrtb from x$kewrtb order by 1;

TABLE_ID_KEWRTB TABLE_NAME_KEWRTB

--------------- ----------------------------------------------------------------

0 WRM$_DATABASE_INSTANCE

1 WRM$_SNAPSHOT

2 WRM$_BASELINE

3 WRM$_WR_CONTROL

4 WRH$_STAT_NAME

。。。。。。。。。。

53 WRH$_TABLESPACE_SPACE_USAGE

54 WRH$_METRIC_NAME

55 WRH$_SYSMETRIC_SUMMARY

56 WRH$_SQL_BIND_METADATA

57 WRH$_JAVA_POOL_ADVICE

58 WRH$_THREAD

59 WRH$_SESSMETRIC_HISTORY

60 WRH$_OSSTAT

61 WRH$_OSSTAT_NAME

62 WRH$_SYS_TIME_MODEL

63 wri$_optstat_opr

64 WRH$_OPTIMIZER_ENV

65 WRM$_SNAP_ERROR

66 WRH$_SERVICE_WAIT_CLASS

67 WRH$_CR_BLOCK_SERVER

。。。。。。。。。。。

115 WRM$_WR_USAGE

116 WRH$_MVPARAMETER

117 WRH$_PERSISTENT_QMN_CACHE

118 WRH$_TABLESPACE

119 WRM$_SNAPSHOT_DETAILS

120 WRI$_SQLTEXT_REFCOUNT

121 rows selected.

從上面的查詢中得到table_id用於下面的level 值，比如我們這裡表wrh$_sql_bind_metadata對應的是56

alter session set events ‘immediate trace name awr_flush_table_off level 56’;

後期如果想開啟flushing to awr 如果第二種方法禁用，開啟使用

alter session set events ‘immediate trace name awr_flush_table_on level 56’ off;

9.4 第四種解決方法為資料庫打相關的補丁。

ORA-32701錯誤原因分析及處理方法

相關文章