診斷與分析itpub壇友提出關於為何awr cpu usage非常高

dawn009發表於2014-08-19
個人簡介:
   8年oracle從業經驗,具備豐富的oracle技能,目前在國內北京某專業oracle服務公司從事高階技術顧問。
   
   服務過的客戶:
          中國電信
          中國移動
          中國聯通
          中國電通
          國家電網
          四川達州商業銀行
          湖南老百姓大藥房
          山西省公安廳
          中國郵政
          北京302醫院     
          河北廊坊新奧集團公司
  
 專案經驗:
           中國電信3G專案AAA系統資料庫部署及最佳化
           中國聯通CRM資料庫效能最佳化
           中國移動10086電商平臺資料庫部署及最佳化
           湖南老百姓大藥房ERR資料庫sql最佳化專案
           四川達州商業銀行TCBS核心業務系統資料庫模型設計和RAC部署及最佳化
           四川達州商業銀行TCBS核心業務系統後端批處理儲存過程功能模組編寫及最佳化
           北京高鐵訊號監控系統RAC資料庫部署及最佳化
           河南宇通客車資料庫效能最佳化
           中國電信電商平臺核心採購模組表模型設計及最佳化
           中國郵政儲蓄系統資料庫效能最佳化及sql最佳化
           北京302醫院資料庫遷移實施
           河北廊坊新奧data guard部署及最佳化
           山西公安廳身份證審計資料庫系統故障評估
           
  
 聯絡方式:
          手機:18201115468
           qq   :   305076427
           qq微博: wisdomone1
           
           新浪微博:wisdomone9
          
           qq群:275813900    
          
           itpub部落格名稱:wisdomone1    http://blog.itpub.net/9240380/


前言:
   itpub壇友提出的問題,詳細連結見下:
http://www.itpub.net/thread-1883599-1-1.html

分析:
DB Name DB Id Instance Inst num Startup Time Release RAC
HDDB 729144632 hddb1 1 19-6月 -14 03:06 11.2.0.4.0 YES




Host Name Platform CPUs Cores Sockets Memory (GB)
dbserver1 AIX-Based Systems (64-bit) 32 8   62.00


Snap Id Snap Time Sessions Cursors/Session Instances
Begin Snap 32367 13-8月 -14 10:02:33 587 4.8 2
End Snap: 32368 13-8月 -14 10:31:08 592 4.9 2
Elapsed:   28.60 (mins)      
DB Time:   1,036.10 (mins)      


資料庫版本11.2.0.4.0,rac架構

資料取樣間隔28分鐘
配置32個cpu,db time為1036分鐘,db time/elapsed為37,表明資料庫非常繁忙
32個物理CPU已滿負荷執行
另外,表明會存在等待5個cpu時間=5*28.6分鐘=143分鐘*60=8580s 

Top 10 Foreground Events by Total Wait Time

Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
DB CPU   9239.6   14.9  
reliable message 1,175,583 3924.9 3 6.3 Other
db file sequential read 429,967 1844.8 4 3.0 User I/O
SQL*Net more data to client 8,947,769 838.2 0 1.3 Network
log file sync 195,677 436.8 2 .7 Commit
gc current block 2-way 433,667 331.9 1 .5 Cluster
db file parallel read 40,693 260.8 6 .4 User I/O
gc cr block 2-way 137,863 125.1 1 .2 Cluster
SQL*Net message to client 32,641,422 99.9 0 .2 Network
gc current block busy 28,184 80.6 3 .1 Cluster 


上述TOP 10等待事件累計不到30%的 db time,資料庫這麼繁忙,還有70%的cpu的的資源消耗到哪兒去了

即db time*0.7=1036分鐘*0.7=725分鐘 

Wait Classes by Total Wait Time
Wait Class Waits Total Wait Time (sec) Avg Wait (ms) % DB time Avg Active Sessions
DB CPU   9,240   14.9 5.4
Other 6,603,229 4,114 1 6.6 2.4
User I/O 481,780 2,119 4 3.4 1.2
Network 42,069,399 1,190 0 1.9 0.7
Cluster 832,954 793 1 1.3 0.5
Commit 195,677 437 2 .7 0.3
System I/O 571,383 418 1 .7 0.2
Concurrency 2,157,806 100 0 .2 0.1
Configuration 512 3 5 .0 0.0
Application 714 1 1 .0 0.0 
這是各個等待事件型別的佔比分佈情況,由此圖可知哪些等待型別最嚴重,爾後進行針對性分析與診斷 

Host CPU

CPUs Cores Sockets Load Average Begin Load Average End %User %System %WIO %Idle
32 8   51.97 78.14 77.7 19.1 0.4 3.2


從主機層面來看,CPU空閒只有3.2%,表明目前cpu基本滿負荷運轉了

Instance CPU

%Total CPU %Busy CPU %DB time waiting for CPU (Resource Manager)
17.6 18.1 0.0


從資料庫佔用CPU資源來看,資料庫只消耗了主機cpu資源的17.6%,說明大量的cpu資源是由非oracle應用程式消耗了


Time Model Statistics 

Statistic Name Time (s) % of DB Time
sql execute elapsed time 52,477.58 84.42
DB CPU 9,239.60 14.86
parse time elapsed 422.57 0.68
PL/SQL execution elapsed time 41.28 0.07
hard parse elapsed time 18.58 0.03
sequence load elapsed time 17.49 0.03
hard parse (sharing criteria) elapsed time  15.02 0.02
connection management call elapsed time  12.12 0.02
hard parse (bind mismatch) elapsed time 3.45 0.01
repeated bind elapsed time 0.65 0.00
PL/SQL compilation elapsed time 0.38 0.00
failed parse elapsed time 0.05 0.00
DB time 62,166.08  
background elapsed time 2,233.68  
background cpu time 404.84

上述awr資訊源於如下檢視 

V$SYS_TIME_MODEL

V$SYS_TIME_MODEL displays the system-wide accumulated times for various operations. The time reported is the total elapsed or CPU time (in microseconds). Any timed operation will buffer at most 5 seconds of time data. Specifically, this means that if a timed operation (such as SQL execution) takes a long period of time to perform, the data published to this view is at most missing 5 seconds of the time accumulated for the operation.

The time values are 8-byte integers and can therefore hold approximately 580,000 years worth of time before wrapping. Background process time is not included in a statistic value unless the statistic is specifically for background processes.

Column Datatype Description
STAT_ID NUMBER Statistic identifier for the time statistic
STAT_NAME VARCHAR2(64) Name of the statistic (see )
VALUE NUMBER Amount of time (in microseconds) that the system has spent in this operation 

小結:顯示是一個指標的累計值
         如果指標和時間有關,其單位為microsecond,即一百萬分之一秒
         所有定時的操作至多在此檢視儲存5秒,也就是說,如果這個定時操作超過5秒,就把把超過5秒的資料清除出去
         此時間值有8個位元長,因此可以儲存接近580000年,才會重新開始初始化或截斷
         注意後面程式時間不會儲存在此,除非這個統計指標比較特別
      
  如下是上述統計相關的具體指標

Table 9-1 V$SESS_TIME_MODEL and V$SYS_TIME_MODEL Statistics

Statistic Name Description

DB Time

Amount of elapsed time (in microseconds) spent performing Database user-level calls. This does not include the elapsed time spent on instance background processes such as PMON.

DB CPU

Amount of CPU time (in microseconds) spent on database user-level calls. This does not include the CPU time spent on instance background processes such as PMON.

background elapsed time

Amount of elapsed time (in microseconds) consumed by database background processes.

background CPU time

Amount of CPU time (in microseconds) consumed by database background processes.

sequence load elapsed time

Amount of elapsed time spent getting the next sequence number from the data dictionary. If a sequence is cached, then this is the amount of time spent replenishing the cache when it runs out. No time is charged when a sequence number is found in the cache. For non-cached sequences, some time will be charged for every nextval call.

parse time elapsed

Amount of elapsed time spent parsing SQL statements. It includes both soft and hard parse time.

hard parse elapsed time

Amount of elapsed time spent hard parsing SQL statements.

SQL execute elapsed time

Amount of elapsed time SQL statements are executing. Note that for select statements this also includes the amount of time spent performing fetches of query results.

connection management call elapsed time

Amount of elapsed time spent performing session connect and disconnect calls.

failed parse elapsed time

Amount of time spent performing SQL parses which ultimately fail with some parse error.

failed parse (out of shared memory) elapsed time

Amount of time spent performing SQL parses which ultimately fail with error ORA-04031.

hard parse (sharing criteria) elapsed time

Amount of elapsed time spent performing SQL hard parses when the hard parse resulted from not being able to share an existing cursor in the SQL cache.

hard parse (bind mismatch) elapsed time

Amount of elapsed time spent performing SQL hard parses when the hard parse resulted from bind type or bind size mismatch with an existing cursor in the SQL cache.

PL/SQL execution elapsed time

Amount of elapsed time spent running the PL/SQL interpreter. This does not include time spent recursively executing/parsing SQL statements or time spent recursively executing the Java VM.

PL/SQL compilation elapsed time

Amount of elapsed time spent running the PL/SQL compiler.

inbound PL/SQL rpc elapsed time

Time inbound PL/SQL remote procedure calls have spent executing. It includes all time spent recursively executing SQL and JAVA, and therefore is not easily related to "PL/SQL execution elapsed time".

Java execution elapsed time

Amount of elapsed time spent running the Java VM. This does not include time spent recursively executing/parsing SQL statements or time spent recursively executing PL/SQL.

RMAN cpu time (backup/restore)

Amount of CPU time (in microseconds) spent in RMAN backup and restore operations.

repeated bind elapsed time

Amount of elapsed time spent giving new values to bind variables (rebinding).



如下是上述各個指標的圖譜關係
The relationships between the statistics listed in  form two trees in which all the time reported by a child in the tree is contained within the parent in the tree. The following are the relationship trees; the number is the level in the given tree.

1) background elapsed time
    2) background cpu time
          3) RMAN cpu time (backup/restore)
1) DB time
    2) DB CPU
    2) connection management call elapsed time
    2) sequence load elapsed time
    2) sql execute elapsed time
    2) parse time elapsed
          3) hard parse elapsed time
                4) hard parse (sharing criteria) elapsed time
                    5) hard parse (bind mismatch) elapsed time
          3) failed parse elapsed time
                4) failed parse (out of shared memory) elapsed time
    2) PL/SQL execution elapsed time
    2) inbound PL/SQL rpc elapsed time
    2) PL/SQL compilation elapsed time
    2) Java execution elapsed time
    2) repeated bind elapsed time





上述統計指標的一些指導原則(非常重要)
The relationship between a parent and a child in the tree indicates containment only. Keep the following in mind with regard to the tree:

  • Children do not necessarily add up to the parent.(子項不一定會累加到父項)

  • Children are not necessarily exclusive (that is, it is possible that they overlap).(子項之間不是互斥的,即可能子項之間有重疊計算部分

  • The union of children does not necessarily cover the whole of the parent.(子項合集並不一定會包括或覆蓋整個父項,即父項還可能包括其它的未列出來的子項



從上圖可知:
 
sql execute elapsed time/db time=84.42%
db cpu/db time=14.86%

表明sql execute elapsed time是db time的最大凶手,
把sql execute elapse time搞下去,就可以大大提升資料庫效能,
所以說只要最佳化sql,就可以把db time降下來,達到提升資料庫效能





沿著上述的分析邏輯,
我們檢視sql統計部分,主要檢視sql order by elapsed time即可

SQL Statistics

SQL ordered by Elapsed Time

  • Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
  • % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
  • %Total - Elapsed Time as a percentage of Total DB time
  • %CPU - CPU Time as a percentage of Elapsed Time
  • %IO - User I/O Time as a percentage of Elapsed Time
  • Captured SQL account for 82.7% of Total DB Time (s): 62,166
  • Captured PL/SQL account for 0.0% of Total DB Time (s): 62,166
Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id SQL Module SQL Text
15,434.99 10,719 1.44 24.83 15.49 0.00   select , aa.title, aa.con...
12,280.97 10,704 1.15 19.76 14.26 0.00   select * from ( select , ...
7,623.18 5,587 1.36 12.26 16.01 0.00   select , aa.title, aa.con...
6,119.31 5,578 1.10 9.84 14.74 0.00   select * from ( select , ...
1,237.11 170 7.28 1.99 17.53 0.00   select count(t.business_id) nu... 
1,161.12 3,589 0.32 1.87 19.46 0.00   select  as id0_...
912.41 119,289 0.01 1.47 2.46 84.38   select  as id4_...
616.80 1,435,837 0.00 0.99 17.73 0.00   select  as id12...
601.76 473 1.27 0.97 17.96 0.00   select , aa.title, aa.con...
581.22 4,662 0.12 0.93 24.05 0.00   select  as id12... 

大家注意看,上述是根據物理消耗時間排名的top 10 sql,我們僅分析top 1 sql ,
sql_id為bbjh8y6nbzrvx,
其執行時間為15434.99s,%total列表明這個sql執行時間佔據db time的比例,這個比值達到15.49,其它sql
同理,不再複述,只要把類似這種sql最佳化,就可以大大提升資料庫效能



結論:
      資料庫非常繁忙,但佔用物理CPU比例不高,且主機CPU使用很高,可以從主機導面進行診斷
          
          比如使用nmon,oswatcher或者vmstat看看cpu到底被哪些非ORACLE程式佔用了
         
          db time=db cpu+sql execute elapsed time+parse time elapsed +pl/sql execution elapsed time(compilation)
          
          上述公式可以分析出來db time到底是被哪些元件消耗最多,然後可針對性進行最佳化工作
 
          檢視sql statistics的

          70%的db time被sql所消耗


擴充套件閱讀:
          Bug 5213488  Column value in AWR (TEXT) report overflows and only shows ########

            "No data exists for this section of the report" in Segment Statistics Section of AWR Report (文件 ID 1570007.1)

            How to Interpret the OS stats section of an AWR report (
文件 ID 762526.1)

           AWR Report Shows Huge Numbers For Av/Rd (
文件 ID 465106.1)

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29119536/viewspace-1254100/,如需轉載,請註明出處,否則將追究法律責任。

相關文章