ADDM: Automatic Database Diagnostic Monitor

tthero00boo發表於2013-11-24
from:
http://blog.csdn.net/tianlesoftware/article/details/5630942

 


 ADDM(Automatic Database Diagnostic Monitor) 是植入Oracle資料庫的一個自診斷引擎.
 ADDM 通過檢查和分析AWR獲取的資料來判斷Oracle資料庫中可能的問題.

 Oracle10g中推出了新的優化診斷工具:資料庫自動診斷監視工具(Automatic Database Diagnostic Monitor :ADDM)
 和SQL優化建議工具(SQL Tuning Advisor: STA)。
 這兩個工具的結合使用,能使DBA節省大量優化時間,也大大減少了系統當機的危險。
 簡單點說,ADDM就是收集相關的統計資料到自動工作量知識庫(Automatic Workload Repository :AWR)中,
 而STA則根據這些資料,給出優化建議。

有了STA以後,它就可以根據ADDM採集到的資料直接給出優化建議,甚至給出優化後的語句。


與其他顧問一樣,ADDM也接受來自AWR的統計資訊和其他資訊。
只要生成快照,MMON程式就會自動執行ADDM(Automatic Database Diagnostic)。

自動生成的ADDM報告總是會包括當前快照與前一個快照之間的時間段,因此在預設情況下可以訪問每小時的ADDM報告。
如果希望ADDM報告跨越更長的時間段,那麼也可以手動的呼叫ADDM生成包括任意兩個快照之間時間段的報告。
自動快照以及手動收集快照都會觸發ADDM。ADDM報告預設在30天后清除。

 

ADDM能發現定位的問題包括:

 作業系統記憶體頁入頁出問題
 由於Oracle負載和非Oracle負載導致的CPU瓶頸問題
 導致不同資源負載的Top SQL語句和物件——CPU消耗、IO頻寬佔用、潛在IO問題、RAC內部通訊繁忙
 按照PLSQL和JAVA執行時間排的Top SQL語句.
 過多地連線 (login/logoff).
 過多硬解析問題——由於shared pool過小、書寫問題、繫結大小不適應、解析失敗原因引起的。
 過多軟解析問題
 索引查詢過多導致資源爭用.
 由於使用者鎖導致的過多的等待時間 (通過包dbms_lock加的鎖)
 由於DML鎖導致的過多等待時間(例如鎖住表了)
 由於管道輸出導致的過多等待時間(如通過包dbms_pipe.put進行管道輸出)
 由於併發更新同一個記錄導致的過多等待時間(行級鎖等待)
 由於ITL不夠導致的過多等待時間(大量的事務操作同一個資料塊)
 系統中過多的commit和rollback(logfile sync事件).
 由於磁碟頻寬太小和其他潛在問題(如由於logfile太小導致過多的checkpoint,
  MTTR設定問題,過多的undo操作等等)導致的IO效能問題I
 對於DBWR程式寫資料塊,磁碟IO吞吐量不足
 由於歸檔程式無法跟上redo日至產生的速度,導致系統變慢
 redo資料檔案太小導致的問題
 由於擴充套件磁碟分配導致的爭用
 由於移動一個物件的高水位導致的爭用問題
 記憶體太小問題——SGA Target, PGA, Buffer Cache, Shared Pool
 在一個例項或者一個機群環境中存在頻繁讀寫爭用的熱塊
 在一個例項或者一個機群環境中存在頻繁讀寫爭用的熱物件
 RAC環境中內部通訊問題
 LMS程式無法跟上導致鎖請求阻塞
 在RAC環境中由於阻塞和爭用導致的例項傾斜
 RMAN導致的IO和CPU問題
 Streams和AQ問題
 資源管理等待事件

ADDM提供了一個整體的優化方案.基於一段時間內的AWR snapshots(預設一小時一次)可以執行ADDM 分析,
它可以幫我們診斷在這段期間內資料庫可能存在的瓶頸.

ADDM分析的目標是減小吞吐量的度量值, 在這裡我們將它稱為"DB TIME".
DB TIME是一個累積值(資料庫伺服器處理使用者請求所花費的時間).
它包括了等待時間和CPU處理的時間(針對所有活躍的使用者程式而言),可以通過查詢下面兩個檢視來獲得它的值: 
V$SESS_TIME_MODEL, V$SYS_TIME_MODEL.

AWR收集的資料時放到記憶體中(share pool),通過一個新的後臺程式MMON定期寫到磁碟中。
所以10g的share pool要求比以前版本更大,一般推薦比以前大15-20%。

通過減小"DB TIME", 使用同樣多的系統資源,資料庫伺服器可以處理更多的使用者請求,也就是提高了吞吐量.
通過ADDM報告的問題是按照DB time排序的.

/* ADDM 分析的結果 */

ADDM 分析的結果以一些"Finding"的樣式來表達. 每個"Finding"都屬於以下三種型別之一:

1. 問題: 描述了導致資料庫效能問題的根源;
2. 徵兆: 包含了可能導致其他問題的資訊
3. 資訊: 報告其他沒有問題的模組


/* 設定ADDM */

預設情況下,ADDM已經被自動啟用,通過初始化引數檔案中的STATISTICS_LEVEL來控制.
這個引數應該被設定成TYPICAL或者ALL(預設值是TYPICAL).如果你將這個引數設定成basic,很多Oracle的特性將被遮蔽.

ADDM 對於I/O效能的評估分析在部分程度上依賴於這個DBIO_EXPECTED.
這個引數的含義是讀取一個資料塊所花費的平均時間(以微秒為單位). Oracle使用的是預設值(10毫秒=10000微秒),
對於現在流行的硬碟來說, 這是一個比較合適的值.如果你的硬碟比較陳舊,或者你有一個非常好的RAM DISK,請修改這個值.

DBIO_EXPECTED

ADDM analysis of I/O performance partially depends on a single argument, DBIO_
EXPECTED, that describes the expected performance of the I/O subsystem. The
value of DBIO_EXPECTED is the average time it takes to read a single database
block in microseconds. Oracle uses the default value of 10 milliseconds, which is an
appropriate value for most modern hard drives. If your hardware is signi?cantly
different, such as very old hardware or very fast RAM disks, consider using a
different value.


為了決定DBIO_EXPECTED這個引數該怎樣去正確地配置,需要完成下面的步驟

1. 基於你的機器的硬體,估量一下讀取單個資料庫塊所花費的平均時間.
注意:這個度量應該針對隨機的I/O(包括尋道的時間).傳統的值應該屬於5000-20000微秒這個區間.

2. 為接下來的ADDM執行設定一個時間引數. 例如:如果估計的值是8000微秒,你應該以SYS的身份執行
下面的過程:

EXECUTE DBMS_ADVISOR.SET_DEFAULT_PARAMETER ('ADDM','DBIO_EXPECTED',8000);

/* 診斷與ADDM相關的問題: */
為了診斷資料庫效能問題, ADDM分析可以跨越任意兩個snapshots,只要它們滿足下面兩個條件:
1. 兩個快照在建立過程中沒有錯誤並且沒有被刪除;
2. 兩個快照期間資料庫不能發生關閉和啟動的事件
(同statspack).

/* 執行 addm */
最簡單的執行ADDM分析的方法就是執行Enterprise Manager.
另外,也可以手工地執行 $ORACLE_HOME/rdbms/admin/addmrpt.sql以及dbms_advisor包.
這些指令碼和包可以被任何使用者執行,只要它們被賦予了ADVISOR的角色.

使用addmrpt.sql來執行
和statspack包中的spreport.sql非常相似

 /* 1. 手工建立一個AWR快照 */
SQL> exec DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

 /* 2. 進行一些高負荷操作 */
ed
:set nonumber

declare
    v_var number;
begin
    for i in 1..10
    loop
        select count(*) into v_var from xujh.small_table ,xujh.big_table;
        dbms_output.put_line('time:'||i||':'||v_var);
    end loop;
end;
/

SQL> /
time:1:10000000
time:2:10000000
time:3:10000000
time:4:10000000
time:5:10000000
time:6:10000000
time:7:10000000
time:8:10000000
time:9:10000000
time:10:10000000

PL/SQL procedure successfully completed.

/* 3. 再建立一個AWR快照 */
SQL> exec DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

/* 4. 建立一個優化診斷任務並執行 */

先獲取到兩次快照的ID:

SQL> select snap_id from
  2  (SELECT * FROM dba_hist_snapshot
  3   ORDER BY snap_id desc)
  4   where rownum <=2;

   SNAP_ID
----------
       393
       392

然後建立優化任務,並執行。

DECLARE
    task_name VARCHAR2(30) := 'DEMO_ADDM01';
    task_desc VARCHAR2(30) := 'ADDM Feature Test';
    task_id NUMBER;
BEGIN
    dbms_advisor.create_task('ADDM', task_id, task_name, task_desc, null);
    dbms_advisor.set_task_parameter(task_name, 'START_SNAPSHOT', 392);
    dbms_advisor.set_task_parameter(task_name, 'END_SNAPSHOT', 393);
    dbms_advisor.set_task_parameter(task_name, 'INSTANCE', 1);
    dbms_advisor.set_task_parameter(task_name, 'DB_ID', 2432893466);
    dbms_advisor.execute_task(task_name);
END;
/

SQL> /

PL/SQL procedure successfully completed.

/* 5. 檢視優化建議結果 */

通知函式dbms_advisor.get_task_report可以得到優化建議結果

SQL> SET LONG 1000000 PAGESIZE 0 LONGCHUNKSIZE 1000
SQL> COLUMN get_clob FORMAT a80
SQL> SELECT dbms_advisor.get_task_report('DEMO_ADDM01', 'TEXT', 'ALL') FROM DUAL;

/* 4,5 可以直接執行$ORACLE_HOME/rdbms/admin/addmrpt.sql來得到,
這個指令碼的執行過程和statspack指令碼執行過程類似:
*/

SQL> @?/rdbms/admin/addmrpt.sql

....

Listing the last 3 days of Completed Snapshots

                                                        Snap
Instance     DB Name        Snap Id    Snap Started    Level
------------ ------------ --------- ------------------ -----
myorcl11     MYORCL11           370 22 Nov 2013 01:16      1

                                371 22 Nov 2013 09:05      1
                                372 22 Nov 2013 10:00      1
                                373 22 Nov 2013 11:00      1
                                374 22 Nov 2013 12:00      1
                                375 22 Nov 2013 12:11      1
                                376 22 Nov 2013 13:00      1
                                377 22 Nov 2013 14:00      1

                                378 23 Nov 2013 13:35      1
                                379 23 Nov 2013 15:00      1

                                380 24 Nov 2013 07:51      1

                                381 24 Nov 2013 09:00      1
                                382 24 Nov 2013 10:00      1
                                383 24 Nov 2013 11:00      1
                                384 24 Nov 2013 12:00      1

                                385 24 Nov 2013 13:00      1

                                386 24 Nov 2013 14:23      1
                                387 24 Nov 2013 15:00      1
                                388 24 Nov 2013 15:53      1
                                389 24 Nov 2013 15:55      1
                                390 24 Nov 2013 16:30      1
                                391 24 Nov 2013 17:00      1
                                392 24 Nov 2013 17:24      1
                                393 24 Nov 2013 17:34      1

 

Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 386
Begin Snapshot Id specified: 386

Enter value for end_snap: 393
End   Snapshot Id specified: 393

 

Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is addmrpt_1_386_393.txt.  To use this name,
press to continue, otherwise enter an alternative.

Enter value for report_name:

Using the report name addmrpt_1_386_393.txt

$ less addmrpt_1_386_393.txt

/* 如果是RAC環境下,可以執行$ORACLE_HOME/rdbms/admin/addmrpti.sql,這指令碼的執行,
會多出要求輸入DB ID和instance ID的要求。
*/

 

/* 診斷結果分析 */

我們從上面的建議結果看到了,ADDM Report的結果與Statspack Report的結果大不相同。
Statspack Report的結果給出的都是統計資料、各種事件,然後由DBA根據這些資料給出優化建議,
而ADDM Report的結果包含就已經是給出的優化建議了
 
Summary of Findings
-------------------
   Description                     Active Sessions      Recommendations
                                   Percent of Activity
   ------------------------------  -------------------  ---------------
1  "User I/O" wait Class           .01 | 24.19          0
2  Hard Parse                      .01 | 17.97          0
3  Session Connect and Disconnect  0 | 5.17             1
4  "Scheduler" Wait Class          0 | 4.11             0
5  Commits and Rollbacks           0 | 3.72             1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          Findings and Recommendations
          ----------------------------

Finding 1: "User I/O" wait Class
Impact is .01 active sessions, 24.19% of total activity.
--------------------------------------------------------
Wait class "User I/O" was consuming significant database time.
The throughput of the I/O subsystem was not significantly lower than expected.
The Oracle instance memory (SGA and PGA) was adequately sized.

   No recommendations are available.


Finding 2: Hard Parse
Impact is .01 active sessions, 17.97% of total activity.
--------------------------------------------------------
Hard parsing of SQL statements was consuming significant database time.
Hard parses due to cursor environment mismatch were not consuming significant
database time.
Hard parsing SQL statements that encountered parse errors was not consuming
significant database time.
Hard parses due to literal usage and cursor invalidation were not consuming
significant database time.
The Oracle instance memory (SGA and PGA) was adequately sized.

   No recommendations are available.


Finding 3: Session Connect and Disconnect
Impact is 0 active sessions, 5.17% of total activity.
-----------------------------------------------------
Session connect and disconnect calls were consuming significant database time.

   Recommendation 1: Application Analysis
   Estimated benefit is 0 active sessions, 5.17% of total activity.
   ----------------------------------------------------------------
   Action
      Investigate application logic for possible reduction of connect and
      disconnect calls. For example, you might use a connection pool scheme in
      the middle tier.


Finding 4: "Scheduler" Wait Class
Impact is 0 active sessions, 4.11% of total activity.
-----------------------------------------------------
Wait class "Scheduler" was consuming significant database time.

   No recommendations are available.


Finding 5: Commits and Rollbacks
Impact is 0 active sessions, 3.72% of total activity.
-----------------------------------------------------
Waits on event "log file sync" while performing COMMIT and ROLLBACK operations
were consuming significant database time.

   Recommendation 1: Host Configuration
   Estimated benefit is 0 active sessions, 3.72% of total activity.
   ----------------------------------------------------------------
   Action
      Investigate the possibility of improving the performance of I/O to the
      online redo log files.

   Rationale
      The average size of writes to the online redo log files was 5 K and the
      average time per write was 3 milliseconds.
   Rationale
      The total I/O throughput on redo log files was 6.9 K per second for
      reads and 7.2 K per second for writes.
   Rationale
      The redo log I/O throughput was divided as follows: 0% by RMAN and
      recovery, 50% by Log Writer, 0% by Archiver, 0% by Streams AQ and 49% by
      all other activity.

   Symptoms That Led to the Finding:
   ---------------------------------
      Wait class "Commit" was consuming significant database time.
      Impact is 0 active sessions, 3.72% of total activity.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

          Additional Information
          ----------------------

Miscellaneous Information
-------------------------
Wait class "Application" was not consuming significant database time.
Wait class "Concurrency" was not consuming significant database time.
Wait class "Configuration" was not consuming significant database time.
CPU was not a bottleneck for the instance.
Wait class "Network" was not consuming significant database time.

The database's maintenance windows were active during 100% of the analysis
period.


找到了有問題的SQL後我們就可以用 Oracle SQL Tuning Advisor 工具來優化該SQL...

 


/* 關於DBIO_EXPECTED 讀取一個資料塊所花費的平均時間(以微秒為單位)
   vmware 亂搞了一通  */

/* test 1 太空閒 iostat */
[root@localhost ~]# iostat -x -k
Linux 2.6.32-358.el6.i686 (localhost.myrh6)     11/24/2013      _i686_  (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.81    0.00    1.48    3.12    0.00   91.59

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               1.00    11.52    3.79    3.61   104.22    60.50    44.51     0.14   18.77   6.21   4.60
sdb               0.00     1.07    0.03    0.06     0.35     4.52   110.25     0.00   48.96   8.73   0.08

/* 只開了一個會話,執行一個大查詢...,參考性結果 */

rkB/s = 104.22   blksize = 8k
select 1000*1000/(104.22/8) from dual;

76760.6985223566

/* test 2 非隨機 dd */
[root@localhost ~]# /usr/bin/time dd if=/opt/oracle/oradata/myorcl11/system01.dbf of=/dev/null bs=8k       
98561+0 records in
98561+0 records out
807411712 bytes (807 MB) copied, 30.2891 s, 26.7 MB/s
0.00user 2.23system 0:30.33elapsed 7%CPU (0avgtext+0avgdata 3072maxresident)k
1570632inputs+0outputs (0major+228minor)pagefaults 0swaps

[root@localhost ~]# echo $((807411712/1024/8))
98561 -- blocks
[root@localhost ~]# echo $((302891 * 100 / 98561)) --echo 不能算小數,30.2891*1000*1000懶得切計算器..
307

/* test 3 hdparm */
[root@localhost ~]# hdparm -T /dev/sda  --記憶體快取速度

/dev/sda:
 Timing cached reads:   7904 MB in  2.00 seconds = 3953.71 MB/sec

[root@localhost ~]# hdparm -t /dev/sda  --真正磁碟速度

/dev/sda:
 Timing buffered disk reads:  110 MB in  3.07 seconds =  35.86 MB/sec

select 1000*1000 /(35.86*1024/8) from dual;

217.86

/* 多次取平均,等到空了再看下...*/


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/28859270/viewspace-777443/,如需轉載,請註明出處,否則將追究法律責任。

相關文章