ORA-32701錯誤原因分析及處理方法

xueshancheng發表於2021-09-26

1 在處理ORA-32701前,需要先了解以下資料庫的hang管理器,參考Oracle的官方文件(Oracle Hang Manager (文件 ID 1534591.1))。

內容如下,可以認為如果資料庫發生hang,一般情況資料庫會自動處理的,但有時由於情況複雜,不能處理:

DETAILS   Overview

Hang Manager has existed since Oracle Database 10.2.0.1. The main goal of Hang Manager is to reliably detect and,

 if hang resolution is enabled, resolve hangs in a timely manner. Over various releases, Hang Manager has been 

enhanced along with the wait event infrastructure on which it relies. However, it is only in Oracle Database 11.2.0.2 

that Hang Manager actually resolves any hangs by terminating sessions and/or processes. This remains the default 

operation in Oracle Database 12.1.0.1.

Hang Manager is only active when Oracle RAC is enabled. You can determine whether an Oracle RAC database is 

used / enabled by querying the CLUSTER_DATABASE. If this parameter is set to "TRUE", Oracle RAC is enabled on 

this database.


Terminology

Cross Boundary Hang

This is a hang which spans the ASM/DB boundary.  In 12.1.0.1, it is possible for Hang Manager to detect hangs 

which span the boundary between a database and ASM or ASM and a database.

Deadlock or Closed Chain

A deadlock is a number of sessions making no progress which are in a closed chain or cycle. 

 All sessions have a waiter and a blocker.  The only way to break the chain is to cause one of the sessions to 

either progress or terminate.

Hang or Open Chain

From Hang Manager's perspective, a hang is a number of sessions making no progress which are in an open chain 

configuration.  This is not a cycle or deadlock.  An open chain or hang has a root or final blocker which is blocking 

all other sessions in the chain.  It also contains a final waiter which is at the end of the chain and is blocked by all 

other sessions in the chain.

Immediate Waiter 

The session in an open chain that is being blocked by the root of the hang.

Quality of Service (QoS) Management

Oracle Database QoS Management is an automated, policy-based product that monitors the workload requests 

for an entire system. Oracle Database QoS Management manages the resources that are shared across applications, 

and adjusts the system configuration to keep the applications running at the performance levels needed by your 

business.

Root (also known as Victim or Final Blocker)

This is the session in an open chain or hang that is blocking all other sessions in the chain.

Self-Resolved Hang

This is a hang or deadlock that was detected by Hang Manager but no longer exists.  This could either be because 

the root or immediate waiter have progressed or either or both of them have been terminated but not by Hang 

Manager.

Hang Manager in 12.1.0.1

Since Oracle Database 11.2.0.2, Hang Manager may resolve hangs it detects by terminating the root session or 

process.  By default, Hang Manager will not terminate an instance or evict a node. Also, Hang Manager will not 

resolve all hangs that it detects. For example, hangs which involve a possible application issue are left to the user 

to determine the correct course of action.  Also, if Hang Manager determines that the instance on which the root 

of a hang resides is experiencing high CPU or IO load, hang resolution will be delayed.  This gives some time for 

the root to progress and the hang to resolve itself.  Finally, Hang Manager currently does not resolve any hangs 

on ASM.

Starting with Oracle Database 12.1.0.1 and if QoS is active on the cluster, Hang Manager will use additional

 information provided by QoS to determine if a hang should be ignored or resolved.  If QoS would like Hang 

Manager to resolve a hang, hang resolution may occur more quickly than Hang Manager's normal detection 

and resolution times or hang resolution may be delayed.

Previous to 12.1.0.1, hangs were detected only within the database or ASM cluster.  However, if a session 

on ASM or the database was being blocked by a session on the other side of the ASM/Database boundary

 (a Cross Boundary Hang), there was no way to detect it.  In Oracle Database 12.1.0.1, Hang Manager was

 enhanced to detect hangs which cross the ASM/Database boundary and outputs information to its various 

trace files so it is easier to troubleshoot these hangs.


同hang管理器相關的檢視:

V$ Views

Hang Manager exposes some information via various V$ views.  These are now documented in the Oracle 

Database Reference.  A summary of the information in these views is below. For a more detailed description of these views, please go to the appropriate section in the Oracle Database Reference.

V$HANG_INFO

This view contains the active hangs and deadlocks detected by Hang Manager. It does not include any hangs 

which have been resolved by Hang Manager or have self-resolved. Only the latest 32 active hangs are in this view.

The root or victim of the hang is included in the information for each active hang in the view.

V$HANG_SESSION_INFO

This view contains the sessions in the main chain for all of the hangs included in the V$HANG_INFO view. 

Only the first 20 sessions of the main chain, including the root of the hang, for each hang are kept in this view.

(G)V$HANG_STATISTICS

This view contains various Hang Manager statistics regarding hangs or deadlocks which it has detected. These statistics include: the number of hangs detected, the number of deadlocks detected, the number of hangs resolved by Hang Manager, the number of hangs ignored broken down by the reason for ignoring the hang, the number of hangs which self-resolved, etc.


2 資料庫的告警,檢視發生hang的相關資訊

節點1:

Sun Sep 27 00:31:23 2020

Errors in file /u01/app/oracle/diag/rdbms/test/test1/trace/test1_dia0_25054.trc  (incident=920985):

ORA-32701: Possible hangs up to hang ID=1 detected                    --hang ID=1 說明資料庫自動診斷出Hang

Incident details in: /u01/app/oracle/diag/rdbms/test/test1/incident/incdir_920985/test1_dia0_25054_i920985.trc

DIA0 requesting termination of session sid:1244 with serial # 75 (ospid:88984) on instance 3  

--請求終止例項3的會話,作業系統PID:88984

     due to a LOCAL, HIGH confidence hang with ID=1.

     Hang Resolution Reason: Although the number of affected sessions did not

    justify automatic hang resolution initially, this previously ignored

    hang was automatically resolved.

DIA0: Examine the alert log on instance 3 for session termination status of hang with ID=1.  

--發現例項3的會話阻塞住會話,檢查例項3的alert日誌



節點3:


Sun Sep 27 00:31:23 2020

Errors in file /u01/app/oracle/diag/rdbms/test/test3/trace/test3_dia0_22856.trc  (incident=880256):

ORA-32701: Possible hangs up to hang ID=1 detected                        --hang ID=1 說明資料庫自動診斷出Hang

Incident details in: /u01/app/oracle/diag/rdbms/test/test3/incident/incdir_880256/test3_dia0_22856_i880256.trc

DIA0 terminating blocker (ospid: 88984 sid: 1244 ser#: 75) of hang with ID = 1    

 --本節點Hang住了,故終止了此會話

    requested by master DIA0 process on instance 1                                    --請求的程式來自於例項1

    Hang Resolution Reason: Although the number of affected sessions did not

    justify automatic hang resolution initially, this previously ignored

    hang was automatically resolved.

    by terminating session sid:1244 with serial # 75 (ospid:88984)                      --終止了作業系統程式


首先根據hang ID=1 和 hang ID=2  檢視如下檢視,此檢視會解釋每個程式碼的含義:

SYS@test3>select * from V$HANG_STATISTICS;


STATISTIC# NAME                                               VALUE

---------- --------------------------------------------- ----------

         0 number of deadlocks detected and ignored               0

         1 number of hangs detected                               0

         2 number of local hangs                                  0

         3 number of global hangs                                 0

         4 number of transient hangs                              0

         5 hangs ignored due to high CPU on root's node           0

         6 hangs ignored due to high IO on root's node            0

         7 hangs ignored due to application contention            0

         8 hangs ignored due to long running operations           0

         9 hangs monitored due to archiving issues                0

        10 hangs ignored due to archiving issues                  0

        11 hangs ignored, blocked by remote database              0

        12 hangs ignored due to SQL parsing                       0

        13 hangs ignored due to dumping system state              0

        14 hangs ignored, instance termination required           0

        15 hangs ignored, only one active instance                0

        16 number of explicitly resolved hangs                    0

        17 number of self-resolved hangs                          0

        18 total self-resolved hang time in seconds               0

        19 minimum self-resolved hang time in seconds             0

        20 maximum self-resolved hang time in seconds             0

        21 number of HSC matched hangs                            0

        22 hangs resolved due to instance termination             0


23 rows selected.


4 分析資料庫生成的trace檔案,找出報錯的根本原因

節點1 檢視test1_dia0_25054_i920985.trc

Dump file /u01/app/oracle/diag/rdbms/test/test1/incident/incdir_920985/test1_dia0_25054_i920985.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,

Data Mining and Real Application Testing options

ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/db_1

System name: Linux

Node name: h0783

Release: 2.6.32-642.el6.x86_64

Version: #1 SMP Wed Apr 13 00:51:26 EDT 2016

Machine: x86_64

Instance name: test1

Redo thread mounted by this instance: 1

Oracle process number: 10

Unix process pid: 25054, image: oracle@h0783 (DIA0)



*** 2020-09-27 00:31:23.807

*** SESSION ID:(201.1) 2020-09-27 00:31:23.807

*** CLIENT ID:() 2020-09-27 00:31:23.807

*** SERVICE NAME:(SYS$BACKGROUND) 2020-09-27 00:31:23.807

*** MODULE NAME:() 2020-09-27 00:31:23.807

*** ACTION NAME:() 2020-09-27 00:31:23.807

 

Dump continued from file: /u01/app/oracle/diag/rdbms/test/test1/trace/test1_dia0_25054.trc

ORA-32701: Possible hangs up to hang ID=1 detected


========= Dump for incident 920985 (ORA 32701) ========

----- Beginning of Customized Incident Dump(s) -----

 

One or more resolvable hangs have been detected on your system.

An attempt will be made to resolve these hangs.

 

The stack included in this incident file is not the cause of the

problem.  It is merely pointing to the module that triggered the

incident.  The short stacks of the root or final blocked of the

hang and its waiter can be found in the DIA0 trace files on their

respective local instances.

 

'Resolvable Hangs' below indicate one or more hangs that were found

and identify the final blocking session and instance on which

they occurred. Since the current hang resolution state is 'PROCESS',

any hangs requiring session or process termination will be

automatically resolved.

 

The following information will assist Oracle Support Services

in further analysis of the root cause of the hang.

 


*** 2020-09-27 00:31:23.809

Resolvable Hangs in the System

                     Root       Chain Total               Hang               

  Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution         

    ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action             

 ----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------

      1 HANG RSLNPEND    3  1244     2     2   HIGH  LOCAL Terminate Process     

 --此處說明Hang的會話在節點3上,會話的ID為1244

  Hang Resolution Reason: Although the number of affected sessions did not

    justify automatic hang resolution initially, this previously ignored

    hang was automatically resolved.

 

  inst# SessId  Ser#     OSPID PrcNm Event

  ----- ------ ----- --------- ----- -----

      3   2562   353     28939  M001 enq: WF - contention

      3   1244    75     88984  M000 not in wait                                 --此處說明會話1244  等待是事件為 not in wait

 

  The incident file on instance 3 may contain information

  about session 1244 with serial number 75 and operating system

  process ID 88984.  The incident file may contain a short stack

  and a process state dump.

 

Victim Information

                                                                      Ignored

  HangID  Inst#  Sessid  Ser Num      OSPID  Fatal BG  Previous Hang    Count

  ------  -----  ------  -------  ---------  --------  -------------  -------

       1      3    1244       75      88984     N      Existing Hang        1     


*** 2020-09-27 00:31:23.809

 

Wait-For-Graphs collected at 09/27/20 00:25:31)                                 --以下資訊說明蒐集的圖形化阻塞的相關資訊

===============================================================================

Non-intersecting chains:

 

-------------------------------------------------------------------------------

Chain 1:         --Chain 1 說明會話2562等待的事件為:'enq: WF - contention' ,被1244會話阻塞住,

1244會話的等待事件為 not int wait,但等待:event: 'gc current grant 2-way'

-------------------------------------------------------------------------------

    Oracle session identified by:

    {

                instance: 3 (test.test3)

                   os id: 28939

              process id: 512, oracle@h0785 (M001)

              session id: 2562

        session serial #: 353

    }

    is waiting for 'enq: WF - contention' with wait info:

    {

                      p1: 'name|mode'=0x57460006

                      p2: '0'=0x38

                      p3: '0'=0x0

            time in wait: 1 min 53 sec

           timeout after: never

                 wait id: 73008

                blocking: 0 sessions

            wait history:

              * time between current wait and wait #1: 0.003403 sec

              1.       event: 'db file sequential read'

                 time waited: 0.000226 sec

                     wait id: 73007           p1: 'file#'=0x2

                                              p2: 'block#'=0x1eeff

                                              p3: 'blocks'=0x1

              * time between wait #1 and #2: 0.000390 sec

              2.       event: 'db file sequential read'

                 time waited: 0.000207 sec

                     wait id: 73006           p1: 'file#'=0x2

                                              p2: 'block#'=0xe178

                                              p3: 'blocks'=0x1

              * time between wait #2 and #3: 0.002593 sec

              3.       event: 'db file scattered read'

                 time waited: 0.000276 sec

                     wait id: 73005           p1: 'file#'=0x2

                                              p2: 'block#'=0x1eefb

                                              p3: 'blocks'=0x2

    }

    and is blocked by

 => Oracle session identified by:

    {

                instance: 3 (test.test3)

                   os id: 88984

              process id: 446, oracle@h0785 (M000)

              session id: 1244

        session serial #: 75

    }

    which is not in a wait:

    {

               last wait: 1165 min 7 sec ago

                blocking: 1 session

            wait history:

              1.       event: 'gc current grant 2-way'    --此類等待事件為資料庫從記憶體中讀取資料塊,發現沒有,

然後從磁碟中讀到記憶體中。故認為此等待事件正常。結合實際查詢,可以認為2562的會話被1224會話阻塞住,

但由於1224會話正在從磁碟大量的讀取資料塊到記憶體中,導致的阻塞。經細查,2562會話當時操作的SQL為:

delete from WRH$_SYSMETRIC_HISTORY tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap 

and dbid = :dbid) and not exists  (select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid) 

and (tab.snap_id >= b.start_snap_id) and (tab.snap_id <= b.end_snap_id)) ,

 1224會話操作的SQL為: insert into wrh$_sql_bind_metadata  ,兩個會話都是操作AWR相關的表,故有可能導致阻塞。

                 time waited: 0.000321 sec

                     wait id: 1878            p1: ''=0x2

                                              p2: ''=0x2a1a6

                                              p3: ''=0x2010001

              * time between wait #1 and #2: 0.000204 sec

              2.       event: 'gc current grant 2-way'

                 time waited: 0.000384 sec

                     wait id: 1877            p1: ''=0x2

                                              p2: ''=0x39862

                                              p3: ''=0x2010001

              * time between wait #2 and #3: 0.000523 sec

              3.       event: 'gc current block 3-way'

                 time waited: 0.000692 sec

                     wait id: 1876            p1: ''=0x2

                                              p2: ''=0x29dea

                                              p3: ''=0x1

    }

 

Chain 1 Signature: <not in a wait><='enq: WF - contention'

Chain 1 Signature Hash: 0xee4c6e02

-------------------------------------------------------------------------------

 

===============================================================================

Sessions in an involuntary wait or not in a wait:

 

檢視例項3的test3_dia0_22856_i880256.trc

Dump file /u01/app/oracle/diag/rdbms/test/test3/incident/incdir_880256/test3_dia0_22856_i880256.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,

Data Mining and Real Application Testing options

ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/db_1

System name: Linux

Node name: h0785

Release: 2.6.32-642.el6.x86_64

Version: #1 SMP Wed Apr 13 00:51:26 EDT 2016

Machine: x86_64

Instance name: test3

Redo thread mounted by this instance: 3

Oracle process number: 10

Unix process pid: 22856, image: oracle@h0785 (DIA0)



*** 2020-09-27 00:31:23.982

*** SESSION ID:(201.1) 2020-09-27 00:31:23.982

*** CLIENT ID:() 2020-09-27 00:31:23.982

*** SERVICE NAME:(SYS$BACKGROUND) 2020-09-27 00:31:23.982

*** MODULE NAME:() 2020-09-27 00:31:23.982

*** ACTION NAME:() 2020-09-27 00:31:23.982

 

Dump continued from file: /u01/app/oracle/diag/rdbms/test/test3/trace/test3_dia0_22856.trc

ORA-32701: Possible hangs up to hang ID=1 detected


========= Dump for incident 880256 (ORA 32701) ========

----- Beginning of Customized Incident Dump(s) -----

 

There are resolvable hangs on your system.  Hang Manger will

attempt to resolve these hangs.  Some information about these

is output below.  Complete information is available in an

incident trace file on instance 1.

 

If the hang is to be resolved by terminating the session or

process that is the root or victim of the hang, additional

information will be output on the local instance of that

session or process.  Below are the hangs for which resolution

will be attempted.

 


*** 2020-09-27 00:31:23.983

Resolvable Hangs in the System

                      Root       Chain Total               Hang               

   Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution         

     ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action             

  ----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------

      1 HANG RSLNPEND    3  1244     2     2   HIGH  LOCAL Terminate Process  

  Hang Resolution Reason: Although the number of affected sessions did not

    justify automatic hang resolution initially, this previously ignored

    hang was automatically resolved.

 

      inst# SessId  Ser#     OSPID PrcNm Event

      ----- ------ ----- --------- ----- -----

          3   2562   353     28939  M001 enq: WF - contention

          3   1244    75     88984  M000 not in wait

 

Dumping process info of pid[446.88984] (sid:1244, ser#:75)

    requested by master DIA0 process on instance 1.

 


*** 2020-09-27 00:31:23.983

Process diagnostic dump for oracle@h0785 (M000), OS id=88984,

pid: 446, proc_ser: 32, sid: 1244, sess_ser: 75 

-------------------------------------------------------------------------------

os thread scheduling delay history: (sampling every 1.000000 secs)

  0.000000 secs at [ 00:31:23 ]

    NOTE: scheduling delay has not been sampled for 0.647808 secs  0.000000 secs from [ 00:31:19 - 00:31:24 ], 5 sec avg

  0.000000 secs from [ 00:30:24 - 00:31:24 ], 1 min avg

  0.000000 secs from [ 00:26:24 - 00:31:24 ], 5 min avg

loadavg : 3.04 3.01 2.86

Memory (Avail / Total) = 363436.51M / 516750.80M

Swap (Avail / Total) = 65536.00M /  65536.00M

F S UID         PID   PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD

0 R oracle    88984      1 99  80   0 - 315843 ?     Sep26 ?        19:28:37 ora_m000_test3

Short stack dump:  --此處為trace生成的二進位制錯誤的堆鍵資訊

ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-kghalf()+247<-kqlfbctc()+645<-kgligi()+31<-kglic_cbk()+345<-kglic0()+751<-kglic()+74

<-kqlfbct()+376<-qerfxFetch()+3072<-qerjotRowProc()+359<-qerhjInnerProbeHashTable()+491<-qerfxFetch()+1106<-rwsfcd()+103<-qerhjFetch()+621<-qerjotFetch()+2025

<-rwsfcd()+103<-qerltcFetch()+1223<-insexe()+691<-opiexe()+5632<-kpoal8()+2380<-opiodr()+917<-kpoodrc()+36<-rpiswu2()+1776<-kpoodr()+636<-upirtrc()+2436<-kpurcsc()+98

<-kpuexec()+10790<-OCIStmtExecute()+39<-kewrose_oci_stmt_exec()+55<-kewrgwxf1_gwrsql_exft_1()+317<-kewrgwxf_gwrsql_exft()+496<-kewrews_execute_wr_sql()+52

<-kewrftbs_flush_table_by_sql()+180<-kewrfabt_flush_attribute_table()+997<-kewrfsb_flush_sqlbind()+34<-kewrft_flush_table()+397<-kewrftec_flush_table_ehdlcx()+766

<-kewrftsq_flush_topsql()+768<-kewrft_flush_table()+397<-kewrftec_flush_table_ehdlcx()+766<-kewrfat_flush_all_tables()+1406<-kewrfsr_flush_snapshot_r()+191

<-kewrrfs_remote_flush_slave()+1169<-kebm_slave_main()+586<-ksvrdp()+1766<-opirip()+674<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201

<-__libc_start_main()+253

 

-------------------------------------------------------------------------------

Process diagnostic dump actual duration=0.340000 sec

  (max dump time=15.000000 sec)


*** 2020-09-27 00:31:24.319   --以下為報錯是執行想相關SQL,相關的表為 insert into wrh$_sql_bind_metadata  

current sql: insert into wrh$_sql_bind_metadata   (snap_id, dbid,    sql_id, name, position, dup_position,    

datatype, datatype_string,    character_sid, precision, scale, max_length)  SELECT /*+ ordered use_nl(bnd) 

index(bnd sql_id) */      :lah_snap_id, :dbid,      bnd.sql_id, name, position, dup_position,      datatype, dataty

 

                    ----------------------------------------

                    SO: 0x1ec14370a8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3

                     proc=0x1ec14370a8, name=process, file=ksu.h LINE:12721, pg=0

                    (process) Oracle pid:446, ser:32, calls cur/top: 0x1d32cfb3c8/0x1d33002b20

                              flags : (0x2) SYSTEM

                              flags2: (0x30),  flags3: (0x10) 

                              intr error: 0, call error: 0, sess error: 0, txn error 0

                              intr queue: empty

                      ksudlp FALSE at location: 0


5  根據第二步的會話ID,查詢在故障時間點執行的SQL,根據如下資訊,可以確定在故障點2562會話確實被1244會話阻塞住:


SYS@test3>select SESSION_ID,SESSION_SERIAL#,SAMPLE_TIME,WAIT_TIME,BLOCKING_SESSION,SQL_ID from 

dba_hist_active_sess_history where     INSTANCE_NUMBER=3 and SESSION_ID=1244 and

     SAMPLE_TIME between to_timestamp('2020-09-27 00:20:00','yyyy-mm-dd HH24:MI:SS') and 

to_timestamp('2020-09-27 00:40:00','yyyy-mm-dd HH24:MI:SS')  order by 3;


SESSION_ID SESSION_SERIAL# SAMPLE_TIME                           WAIT_TIME BLOCKING_SESSION SQL_ID

---------- --------------- ------------------------------------ ---------- ---------------- -------------

      1244              75 27-SEP-20 12.20.08.580 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.20.18.620 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.20.28.640 AM                   321                  82h7w3w07k63z

...........................

      1244              75 27-SEP-20 12.27.40.744 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.27.50.854 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.28.00.894 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.28.10.931 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.28.20.971 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.28.31.021 AM                   321                  82h7w3w07k63z

...........................

      1244              75 27-SEP-20 12.32.42.270 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.32.52.370 AM                   321                  82h7w3w07k63z


77 rows selected. 


SYS@test3>select SESSION_ID,SESSION_SERIAL#,SAMPLE_TIME,WAIT_TIME,BLOCKING_SESSION,SQL_ID 

from dba_hist_active_sess_history where 

   INSTANCE_NUMBER=3 and SESSION_ID=2562 and

  SAMPLE_TIME between to_timestamp('2020-09-27 00:20:00','yyyy-mm-dd HH24:MI:SS') and 

to_timestamp('2020-09-27 00:40:00','yyyy-mm-dd HH24:MI:SS')

    order by 3; 


SESSION_ID SESSION_SERIAL# SAMPLE_TIME                        WAIT_TIME BLOCKING_SESSION SQL_ID

---------- --------------- --------------------------------- ---------- ---------------- -------------

      2562             347 27-SEP-20 12.20.28.640 AM                  3                  0v9v22h2nuc0q

      2562             353 27-SEP-20 12.22.29.241 AM                  0                  7gfhbhxrxx8wf

      2562             353 27-SEP-20 12.22.39.291 AM                408                  9v9n97qj8z1dg

      2562             353 27-SEP-20 12.22.49.421 AM                  0                  bzscyq07w79ab

      2562             353 27-SEP-20 12.22.59.451 AM                  0                  bzscyq07w79ab

      2562             353 27-SEP-20 12.23.09.511 AM                  0                  bzscyq07w79ab

      2562             353 27-SEP-20 12.23.19.551 AM                  0                  2whm2vvjb98k7

      2562             353 27-SEP-20 12.23.29.581 AM                186                  acc988uzvjmmt

      2562             353 27-SEP-20 12.23.39.621 AM                  0             1244

      2562             353 27-SEP-20 12.23.49.731 AM                  0             1244

      2562             353 27-SEP-20 12.23.59.751 AM                  0             1244

      2562             353 27-SEP-20 12.24.09.810 AM                  0             1244

.....................

      2562             353 27-SEP-20 12.32.32.240 AM                  0             1244

      2562             353 27-SEP-20 12.32.42.270 AM                  0             1244

      2562             353 27-SEP-20 12.32.52.370 AM                  0             1244

      2562             353 27-SEP-20 12.33.02.420 AM                149                  18yuw9npdrxwc

      2562             353 27-SEP-20 12.33.12.440 AM                  0                  9wygvu6cx2npy

      2562             353 27-SEP-20 12.33.22.480 AM                  0                  9wygvu6cx2npy

      2562             353 27-SEP-20 12.33.32.510 AM                  0                  gkbrnf9xhv80g


68 rows selected. 


acc988uzvjmmt 執行的SQL:

delete from WRH$_SYSMETRIC_HISTORY tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap 

and dbid = :dbid) and not exists 

(select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid) and (tab.snap_id >= b.start_snap_id) 

and (tab.snap_id <= b.end_snap_id))


6 查詢資料庫的AWR生成資訊,發現在故障點不能生成AWR:如下


Listing the last 3 days of Completed Snapshots


                                                        Snap

Instance     DB Name        Snap Id    Snap Started    Level

------------ ------------ --------- ------------------ -----

test3       test            46313 26 Sep 2020 00:00      1

                              46314 26 Sep 2020 01:00      1


                              46315 26 Sep 2020 02:00      1

                              46316 26 Sep 2020 03:00      1

                              46317 26 Sep 2020 04:00      1    --26日的4點 直接到27日的7:00

                              46325 27 Sep 2020 07:00      1

                              46326 27 Sep 2020 08:00      1

                              46327 27 Sep 2020 09:00      1

                              46328 27 Sep 2020 10:00      1

                              46329 27 Sep 2020 11:00      1

                              46330 27 Sep 2020 12:00      1

                              46331 27 Sep 2020 13:00      1

                              46332 27 Sep 2020 14:00      1






7 查詢x$KQLFBC, 發現等待2小時都無法出來結果,通過追蹤,找到執行計劃,發現此表由1.4億(138*1024*1024)條資料,

故導致執行2小時無法查詢出結果



SYS@test3> select count(*) from x$kqlfbc;


Plan hash value: 4012607404


----------------------------------------------------------------------

| Id  | Operation         | Name     | Rows  | Cost (%CPU)| Time     |

----------------------------------------------------------------------

|   0 | SELECT STATEMENT  |          |       |    18 (100)|          |

|   1 |  SORT AGGREGATE   |          |     1 |            |          |

|   2 |   FIXED TABLE FULL| X$KQLFBC |   138K|    18 (100)| 00:00:01 |

----------------------------------------------------------------------



14 rows selected.



SYS@test3>select count(*) from x$ksmsp;


  COUNT(*)

----------

   3102652


8 由於ORA-32701故障發生在wrh$_sql_bind_metadata及 x$kqlfbc表上,是由於系統有大量的繫結變數引起的,

故查詢如下引數,發現如下引數設定太小,需要調整。 

SYS@test1>show parameter cursor


NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cursor_bind_capture_destination      string      memory+disk

cursor_sharing                       string      EXACT

cursor_space_for_time                boolean     FALSE

open_cursors                         integer     300

session_cached_cursors               integer     500

        

PARAMETER              VALUE      USAGE

---------------------- ---------- -----

session_cached_cursors   500       258%

open_cursors             300       432%



HIGHEST_OPEN_CUR MAX_OPEN_CUR

---------------- --------------------

             246 300



PARAMETER                 VALUE                          USAGE

-------------------------------------------------------- ---------------

session_cached_cursors      500                           212%

open_cursors                300                           356%


                                                                                                 


HIGHEST_OPEN_CUR MAX_OPEN_CUR

---------------- --------------------

             236 300


PARAMETER              VALUE      USAGE

---------------------- ---------- -----

session_cached_cursors   500       249%

open_cursors             300       418%


ORA-32701問題的解決方法,以下的解決方法,是根據所有的Oracle官方文件總結出來的,

需要根據實際情況採用不同的方法,一般情況下一種方法不行,就使用第二種,直到解決問題。

一般情況下重啟資料庫例項就可以臨時的解決此問題,但生產環境一般是不允許重啟例項的,故需要

使用不重啟例項的方法解決。



9.1 蒐集如下表的統計資訊

exec dbms_stats.gather_table_stats('SYS', 'X$KEWRATTRNEW');

exec dbms_stats.gather_table_stats('SYS', 'X$KEWRTSQLPLAN');

exec dbms_stats.gather_table_stats('SYS', 'X$KEWRSQLIDTAB');

exec dbms_stats.gather_fixed_objects_stats;

exec dbms_stats.gather_schema_stats('SYS');


查詢相關表的統計資訊是否過期:    

select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRATTRNEW';

select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRTSQLPLAN';

select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRSQLIDTAB';



9.2 有時此表x$KQLFBC的資料量會特別大,會導致查詢不出來或不能生成AWR,故需要通過設定隱含引數和刷共享池來解決

SYS@test1>select count(1) from x$KQLFBC;


  COUNT(1)

----------

    113714

    

select count(*) from  X$KQLFBC;


刷共享池

alter system flush shared_pool;

設定隱含引數

alter system set "_awr_disabled_flush_tables"='wrh$_sql_bind_metadata';



部署自動重新整理shared的指令碼


10 8,18 * * *  /home/oracle/database_check_sxwy/scripts/flush_shared.sh    


#!/bin/bash                                                                                      

export ORACLE_SID=test1                                                                         

sqlplus / as sysdba <<EOF

alter system flush shared_pool;                           

exit;                                                                                            

EOF


9.3 第三種禁用方法

先查詢AWR Table Info (KEW layer)使用下面的SQL:


select table_id_kewrtb, table_name_kewrtb from x$kewrtb order by 1;


TABLE_ID_KEWRTB TABLE_NAME_KEWRTB

--------------- ----------------------------------------------------------------

              0 WRM$_DATABASE_INSTANCE

              1 WRM$_SNAPSHOT

              2 WRM$_BASELINE

              3 WRM$_WR_CONTROL

              4 WRH$_STAT_NAME

。。。。。。。。。。

             53 WRH$_TABLESPACE_SPACE_USAGE

             54 WRH$_METRIC_NAME

             55 WRH$_SYSMETRIC_SUMMARY

             56 WRH$_SQL_BIND_METADATA

             57 WRH$_JAVA_POOL_ADVICE

             58 WRH$_THREAD

             59 WRH$_SESSMETRIC_HISTORY

             60 WRH$_OSSTAT

             61 WRH$_OSSTAT_NAME

             62 WRH$_SYS_TIME_MODEL

             63 wri$_optstat_opr

             64 WRH$_OPTIMIZER_ENV

             65 WRM$_SNAP_ERROR

             66 WRH$_SERVICE_WAIT_CLASS

             67 WRH$_CR_BLOCK_SERVER

。。。。。。。。。。。

            115 WRM$_WR_USAGE

            116 WRH$_MVPARAMETER

            117 WRH$_PERSISTENT_QMN_CACHE

            118 WRH$_TABLESPACE

            119 WRM$_SNAPSHOT_DETAILS

            120 WRI$_SQLTEXT_REFCOUNT


121 rows selected.



從上面的查詢中得到table_id用於下面的level 值,比如我們這裡表wrh$_sql_bind_metadata對應的是56


alter session set events ‘immediate trace name awr_flush_table_off level 56’;


後期如果想開啟flushing to awr 如果第二種方法禁用,開啟使用


alter session set events ‘immediate trace name awr_flush_table_on level 56’ off;


9.4 第四種解決方法為資料庫打相關的補丁。


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/69996316/viewspace-2793845/,如需轉載,請註明出處,否則將追究法律責任。

相關文章