DB Issue Trouble Shooting Guideline (資料庫問題處理指導)

tolywang發表於2007-11-22

DB Issue Trouble Shooting Guideline

DB Issues :

Performance Issues

DB Hung

Internal Error Raised

Lock / Dead Lock

Other Issues


Performance Issues :

Most of (more 90%) performance issues are caused by Application.

Few (less than 10%) of issues are caused by resource limitation, OS/RDBMS configuration and other reasons.

Work flow of identifying/solving Performance Issues.

Identifying and Solving : DB Server , AP Server, Middle Ware, Client Program

Latent Reasons

Bottleneck in Client Program

PLSQL Issue

Unreasonable configuration in RDBMS

Network Issue

Memory Issue

CPU Issue

IO Issue

Top Processes

TOP command in UNIX

Glance command in HP-UX

Task Manager in Windows

Top Sessions in DB

Find out the PID of top process from OS

In dedicated server, find out the mapping session info in DB by the PID

select b.spid, a.sid, a.username, s.sql_text from v$session a , v$process b, v$sqlarea s where a.PADDR = b.ADDR and a.sql_hash_value = s.hash_value and b.spid=&PID;

Find out the long operation in the transaction

select username,sid,opname,

round ( sofar *100 / totalwork,0)||'%' as progress,

time_remaining,sql_text

from v$session_longops , v$sqlarea

where time_remaining <> 0

and sql_address = address

and sql_hash_value = hash_value;

Related View:

V$PROCESS

V$SESSION

V$SQLAREA

v$session_longops

Find more from Statspack Report

Generate the report during peak time

Analyze the report

Important Sections in the report

Instance Efficiency Percentages

Top 5 Wait Events

SQL ordered by Gets for DB

SQL ordered by Reads for DB

Tablespace IO Stats for DB & File IO Stats for DB

Instance Efficiency Percentages

Buffer Nowait Ratio

Buffer Hit Ratio

Library Hit Ratio

Redo no-wait Ratio

In-memory Sort Ratio

Instance Efficiency Percentages

Instance Efficiency Percentages (Target 100%)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Buffer Nowait %: 100.00 Redo NoWait %: 100.00

Buffer Hit %: 99.08 In-memory Sort %: 99.60

Library Hit %: 99.46 Soft Parse %: 98.99

Execute to Parse %: 45.61 Latch Hit %: 100.00

Parse CPU to Parse Elapsd %: 87.88 % Non-Parse CPU: 100.00

Top 5 Wait Events:

DB File Scattered Read.

DB File Sequential Read

Free Buffer

Enqueue

Log Buffer Space

Log File Sync

log file parallel write

Top 5 Wait Events

Top 5 Timed Events

~~~~~~~~~~~~~~~~~~ % Total

Event Waits Time (s) Ela Time

-------------------------------------------- ------------ ----------- --------

db file sequential read 7,544,366 25,591 48.49

CPU time 14,126 26.76

log file sync 50,593 6,241 11.82

db file scattered read 1,473,891 2,154 4.08

log file parallel write 60,964 1,546 2.93

-------------------------------------------------------------

SQL ordered by Gets for DB

SQL ordered by Gets for DB: ICSSPRD Instance: icssprd Snaps: 179 -181

-> End Buffer Gets Threshold: 10000

-> Note that resources reported for PL/SQL includes the resources used by

all SQL statements called within the PL/SQL code. As individual SQL

statements are also reported, it is possible and valid for the summed

total % to exceed 100

CPU Elapsd

Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value

--------------- ------------ -------------- ------ -------- --------- ----------

91,163,470 567 160,782.1 19.2 850.46 949.33 1513226087

Module: SQL*Plus

SELECT DISTINCT b.filename

|| :"SYS_B_00" FROM css_tp_transaction_event a

,css_tp_txn_event_activity b WHERE a.trdng_ptnr_id

= :"SYS_B_01" AND a.event_type_id in (:"SYS

_B_02",:"SYS_B_03") AND a.txn_event_id = b.txn_ev

SQL ordered by Reads for DB:

SQL ordered by Reads for DB: ICSSPRD Instance: icssprd Snaps: 179 -181

-> End Disk Reads Threshold: 1000

CPU Elapsd

Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value

--------------- ------------ -------------- ------ -------- --------- ----------

11,993,968 120 99,949.7 36.2 443.71 552.02 4292891611

Module: JDBC Connect Client

BEGIN CSSPLSQL.B2B_MSG_DETECTOR.GET_B2B_OUT_MSG_DETECTOR ( :V1,

:V2, :V3); END;

Tablespace IO Stats for DB & File IO Stats for DB

Tablespace IO Stats for DB: ICSSPRD Instance: icssprd Snaps: 179 -181

->ordered by IOs (Reads + Writes) desc

Tablespace

------------------------------

Av Av Av Av Buffer Av Buf

Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)

-------------- ------- ------ ------- ------------ -------- ---------- ------

CSS_BKG_DATA

1,699,930 472 4.3 4.3 109,215 30 5,034 10.9

Generate SQL report for further study

SQL> @?rdbmsadminsprepsql.sql

……

Enter value for begin_snap: 886

Enter value for end_snap: 910

Enter value for hash_value: 4206207905



Generate SQL report for further study

Plans in shared pool between Begin and End Snap Ids

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Shows the Execution Plans found in the shared pool between the begin and end

snapshots specified. The values for Rows, Bytes and Cost shown below are those

which existed at the time the first-ever snapshot captured this plan - these

values often change over time, and so may not be indicative of current values

-> Rows indicates Cardinality, PHV is Plan Hash Value

-> ordered by Plan Hash Value

--------------------------------------------------------------------------------

| Operation | PHV/Object Name | Rows | Bytes| Cost |

--------------------------------------------------------------------------------

|SELECT STATEMENT |----- 235857360 -----| | | 2021 |

|SORT UNIQUE | | 194 | 16K| 2021 |

| CONCATENATION | | | | |

SQL Trace on top SQL

SET TIMING ON

SET AUTOT[RACE] TRACE

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

159414 consistent gets

0 physical reads

0 redo size

146 bytes sent via SQL*Net to client

235 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

0 rows processed

DB Hung Issue

It’s fatal fault

Mostly, need work with OS & Oracle support team

Run RDA in DB server host to collect information as soon

Log SR in Oracle Metalink

Internal Error

ORA-00600 to ORA-07445

Mostly relate to Oracle internal bug

Log SR in Metalink

Send Trace file (under bdump folder) to Oracle

Lock Issue

Top Event Enqueue

Find out the TM, TX lock; Find out the holder

Work with dev team to change the logic to reduce/release the locks

select * from DBA_WAITERS;

SELECT /*+ rule */ lpad(' ',decode(l.xidusn ,0,3,0))||l.oracle_username User_name,

o.owner, o.object_name, o.object_type, s.sid, s.serial#

FROM v$locked_object l,dba_objects o,v$session s

WHERE l.object_id=o.object_id

AND l.session_id=s.sid

ORDER BY o.object_id,xidusn DESC;

select sid,type,id1,id2,lmode,request,ctime,block from v$lock where type

in ('TM','TX');

V$lock

dba_waiters

v$locked_objects

Dead Lock:

It’s logical issue

Two or more sessions locking each other

Find out the related SQL from trace file

Work with dev team to change the logic

DEADLOCK DETECTED

Current SQL statement for this session:

DELETE FROM CSS_SI_BL_DISTRIBUTION WHERE (SI_REQUEST_ID = :"SYS_B_0")

The following deadlock is not an ORACLE error. It is a

deadlock due to user error in the design of an application

or from issuing incorrect ad-hoc SQL. The following

information may aid in determining the deadlock:

Deadlock graph:

---------Blocker(s)-------- ---------Waiter(s)---------

Resource Name process session holds waits process session holds waits

TM-00005ed2-00000000 622 573 SX 156 226 SX SSX

TX-00060004-00063409 156 226 X 622 573 X

session 573: DID 0001-026E-00000012 session 226: DID 0001-009C-0000256C

session 226: DID 0001-009C-0000256C session 573: DID 0001-026E-00000012

Rows waited on:

Session 226: obj - rowid = 0000700D - AAAHANAC4AAAAAAAAA

(dictionary objn - 28685, file - 184, block - 0, slot - 0) -- CSS_SI_CARGO_FK4

Session 573: obj - rowid = 00005EB6 - AAAF62AC2AAApFQAAh

(dictionary objn - 24246, file - 182, block - 168272, slot - 33) -- CSS_SI_BL_DISTRIBUTION

Information on the OTHER waiting sessions:

Session 226:

pid=156 serial=25463 audsid=96195353 user: 33/CSSJAVA

O/S info: user: oracle, term: unknown, ospid: , machine: as04.cargosmart.com

program: JDBC Thin Client

application name: JDBC Thin Client, hash value=0

Current SQL Statement:

DELETE FROM CSS_SI_RELATED_BOOKING WHERE (SI_REQUEST_ID = :"SYS_B_0")

End of information on OTHER waiting sessions.

Other Issues

Rollback segment can not be extend

Snapshot of rollback segment is too old

Rollback segment can not be extend

Long transaction involve much of undo log

Find out the related SQL from the error information

SQL tuning

Split the transaction

Transfer to batch insert / direct loading

Extent rollback segment

Snapshot of rollback segment is too old

Long transaction running, unable to get the consistent snapshot in rollback segment

Find out the related SQL from the error information

SQL Tuning

Improve other much-undo consuming/holding transactions

Setup special large segment for long transactions

Extent rollback segment

Other OS commands for collect information

sar

iostat

vmstat

swapinfo

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/35489/viewspace-85032/,如需轉載,請註明出處,否則將追究法律責任。

相關文章