Oracle Performance Tuning 11g2 (5)

yuntui發表於2016-11-03

5 Automatic Performance Statistics

This chapter discusses the gathering of performance statistics. This chapter contains the following topics:

這章討論收集效能統計資料。包含以下主題:

  • Overview of Data Gathering 資料收集概要
  • Overview of the Automatic Workload Repository AWR簡介
  • Managing the Automatic Workload Repository 管理AWR

5.1 Overview of Data Gathering

To effectively diagnose performance problems, statistics must be available. Oracle Database generates many types of cumulative statistics for the system, sessions, and individual SQL statements. Oracle Database also tracks cumulative statistics on segments and services. When analyzing a performance problem in any of these scopes, you typically look at the change in statistics (delta value) over the period you are interested in. Specifically, you look at the difference between the cumulative value of a statistic at the start of the period and the cumulative value at the end.

Cumulative values for statistics are generally available through dynamic performance views, such as the V$SESSTAT and V$SYSSTAT views. Note that the cumulative values in dynamic views are reset when the database instance is shutdown. The Automatic Workload Repository (AWR) automatically persists the cumulative and delta values for most of the statistics at all levels except the session level. This process is repeated on a regular time period and the result is called an AWR snapshot. The delta values captured by the snapshot represent the changes for each statistic over the time period.

A metric is another type of statistic collected by Oracle Database. A metric is defined as the rate of change in some cumulative statistic. That rate can be measured against a variety of units, including time, transactions, or database calls. For example, the number database calls per second is a metric. Metric values are exposed in some V$ views, where the values are the average over a fairly small time interval, typically 60 seconds. A history of recent metric values is available through V$ views, and some data is also persisted by AWR snapshots.

A third type of statistical data collected by Oracle is sampled data. The active session history (ASH) sampler performs the sampling. ASH samples the current state of all active sessions. The database collects this data into memory, where you can access it with a V$ view. AWR snapshot processing also writes it to persistent storage.

A powerful tool for diagnosing performance problems is the use of statistical baselines. A statistical baseline is collection of statistic rates usually taken over time period where the system is performing well at peak load. Comparing statistics captured during a period of bad performance to a baseline helps discover specific statistics that have increased significantly and could be the cause of the problem.

AWR supports the capture of baseline data by enabling you to specify and preserve a pair or range of AWR snapshots as a baseline. Carefully consider the time period you choose as a baseline; the baseline should be a good representation of the peak load on the system. In the future, you can compare these baselines with snapshots captured during periods of poor performance.

Oracle Enterprise Manager is the recommended tool for viewing both real time data in the dynamic performance views and historical data from the AWR history tables. Enterprise Manager can also be used to capture operating system and network statistical data that can be correlated with AWR data.

為了有效的診斷資料庫問題,統計資料是必須要有的。Oracle產生了許多型別的累積性統計資料,包含系統的,session的,和獨立SQL語句的。資料庫同時跟蹤段和service的累積統計資料。當你在任何一個範圍內(比如網路,SQL級別等等)分析某個效能問題時,關注你感興趣的那段時間裡更改過的統計資料(delta值,應該是高中數學上的那個變數名,或者是增量值)。特別是應該關注兩個階段之間的累積資料差值。

統計資料累積值通常透過 v$sesstat, v$sysstat 檢視獲得。注意這些值在資料庫例項重啟後將重置。AWR將除了session級別以外的累積或者增量統計資料持久化(即寫到硬碟上了,一般在sysaux表空間裡)。這個收集過程,間隙性(定期性)的去收集,最終的統計結果我們稱之為AWR快照(收集一次就產生一個快照)。這個被快照捕捉到的增量統計資料代表了自上一次以來的一個變化。

度量是oracle收集的另一類統計資料。(度量翻譯著看很不舒服,我將直接用英文來表示,只需要知道matric就是度量的意思就行) matric被定義為:增量統計變化的速率。這個速率是一系列被衡量的單元值,包含時間,事務,資料庫呼叫。例如,每秒鐘資料庫呼叫的次數就是一個matric。matric的值存在於v$的動態檢視中,檢視中的值是一些非常小的時間間隙內的平均值,比如在60秒內的平均值。最近的歷史matric值存在在v$檢視中,有些資料在AWR快照中。

第三種oracle收集的統計資料型別是取樣資料。ASH取樣器執行這個取樣工作。ASH取樣當前的所有活躍session的狀態。然後將這個取樣資料放到v$檢視中。AWR快照將這些資訊寫到硬碟上。

一個最強大的診斷效能問題工具是使用統計基線。統計基線是系統正常執行時,且執行在高峰期的一段時間內,系統收集的統計速率值的集合。效能差的時候獲取的統計資料與這個基線值相比較,就有助於幫助我們發現特定的一些明顯增加的統計值,這些可能就是導致問題發生的原因。

AWR支援指定和保留一對或一段時間內AWR的快照作為基線(至少需要兩個快照才能確定一個基線範圍,所以是一對。但是也可以由多個連續的快照組成一個基線)。在選擇這些快照作為基線一部分時要考慮清楚;這個基線是在系統穩定執行在高峰期的表現。未來你需要將效能差的時候的快照與做的這個基線進行比較的。

建議使用OEM來檢視實時的或歷史的AWR報表。OEM也可以被用來捕捉作業系統和網路方面的統計資料,然後把他們作為AWR的一部分。

This section covers the following topics: 這節包括了下面一些主題:

  • Database Statistics 資料庫統計資料
  • Operating System Statistics 作業系統統計資料
  • Interpreting Statistics 解析統計資料

5.1.1 Database Statistics

Database statistics provide information on the type of load on the database and the internal and external resources used by the database. This section describes some of the more important statistics.

資料庫的統計資料提供了各種型別負荷下資料庫的資訊,以及資料庫內部和外部的資源資料。本章描述那麼更加重要的一些統計資料。

 

5.1.1.1 Wait Events 等待事件

Wait events are statistics that are incremented by a server process or thread to indicate that it had to wait for an event to complete before being able to continue processing. Wait event data reveals various symptoms of problems that might be impacting performance, such as latch contention, buffer contention, and I/O contention.

To enable easier high-level analysis of the wait events, events are grouped into classes. The classes include: Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System I/O, and User I/O.

The wait classes are based on a common solution that usually applies to fixing a problem with the wait event. For example, exclusive TX locks are generally an application level issue and HW locks are generally a configuration issue.

等待事件是指一個server process顯示出它必須去等待某一事件出現才能繼續它的工作的一種逐步增加的統計資料。等待事件提示出可能影響效能的一些問題徵兆,例如latch競爭,buffer競爭,io競爭。

為了更容易的分析這些等待事件,我們將其歸成類。這些類包括了:Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System I/O, and User I/O. 這些在v$system_wait_class中wait_class列中可以看到一些。

事件類基於通用的解決方案,即可以被用作解決等待事件問題。例如,排他的TX鎖通常是一個應用級別的問題(Application),HW鎖通常是配置的問題(Configuration)。

下面的列表中包含通用的的一些類中的等待事件例子:

The following list includes common examples of the waits in some of the classes:

  • Application: locks waits caused by row level locking or explicit lock commands   行級鎖導致的鎖等待 或者顯示的鎖命令
  • Commit: waits for redo log write confirmation after a commit   在commit後等待redo寫入完成
  • Idle: wait events that signify the session is inactive, such as SQL*Net message from client 會話非活躍等待,比如等等客戶端的訊息(在v$session中,如果客戶端沒有操作的話,檢視wait,wait_class,wait_class#等等就是上面那個等待語句)
  • Network: waits for data to be sent over the network   等待資料透過網路傳送出去
  • User I/O: wait for blocks to be read off a disk   等待從磁碟中讀取資料塊

Wait event statistics for an instance include statistics for both background and foreground processes. Because you would typically focus your effort in tuning foreground activities, overall instance activity is broken down into foreground and background statistics in the relevant V$ views to facilitate tuning.

The V$SYSTEM_EVENT view shows wait event statistics for the foreground activities of an instance and the wait event statistics for the instance. The V$SYSTEM_WAIT_CLASS view shows these foreground and wait event instance statistics after aggregating to wait classes. V$SESSION_EVENT and V$SESSION_WAIT_CLASS show wait event and wait class statistics at the session level.

例項級的等待事件統計包括後臺和前臺程式的統計。因為你通常主要集中在調優前臺程式活動上,總的例項活動被分解成前臺和後臺統計資料,儲存在v$ 檢視中以協助調優。

v$system_event 顯示前臺活躍等待事件統計,和例項等待事件統計。

v$system_wait_class 顯示在聚集到類的前臺和例項等待事件統計(透過這個檢視可以瞭解一類的總的統計,理解成是v$system_event按wait_class進行group by一下)。

v$session_event, v$session_wait_class 顯示session級別的統計。

 

5.1.1.2 Time Model Statistics

When tuning an Oracle database, each component has its own set of statistics. To look at the system as a whole, it is necessary to have a common scale for comparisons. For this reason, most Oracle Database advisories and reports describe statistics in terms of time. In addition, the V$SESS_TIME_MODEL and V$SYS_TIME_MODEL views provide time model statistics. Using the common time instrumentation helps to identify quantitative effects on the database operations.

The most important of the time model statistics is DB time. This statistics represents the total time spent in database calls and is an indicator of the total instance workload. It is calculated by aggregating the CPU and wait times of all sessions not waiting on idle wait events (non-idle user sessions).

DB time is measured cumulatively from the time of instance startup. Because DB time it is calculated by combining the times from all non-idle user sessions, it is possible that the DB time can exceed the actual time elapsed after the instance started. For example, an instance that has been running for 30 minutes could have four active user sessions whose cumulative DB time is approximately 120 minutes.

The objective for tuning an Oracle system could be stated as reducing the time that users spend in performing some action on the database, or simply reducing DB time. Other time model statistics provide quantitative effects (in time) on specific actions, such as logon operations and hard and soft parses.

當最佳化資料庫時,每個資料庫部件都有自己的統計資料集。將系統將為一個整體看,需要有一個通用的比較尺度。基於這個原因,資料庫的許多建議和報告都依據時間進行統計描述。另外, v$sess_time_model, v$sys_time_model 提供了時間模型統計。 使用通用的時間手段有助於資料庫操作的量化統計。

最重要的一個時間模型統計是DB TIME。 這個統計代表了總的資料庫呼叫時間,以及指示了總的例項壓力(資料庫的負荷)。它的計算方法是: CPU時間 + (未花費在idle等待事件上的)所有session所花費的等待時間(非idle的使用者會話) ---> db time = cpu time + session wait time(non-idle user session)。 上面介紹過比如連線到資料庫以後,沒有做任何的操作時,server process就是牌Idle狀態的;只能接收到sql語句才會從Idle中醒過來。因此這個DB TIME就是所有系統花在非空閒狀態的總時間。

DB TIME是從例項啟動就開始累計了。 因為DB TIME是一個包括了所有non-idle使用者會話的時間,那麼DB TIME有可能會超過例項啟動後的總時間。例如,一個例項啟動30分鐘,但是一啟動便有4個session都花費了30分鐘去執行,那麼DB TIME就大約是120分鐘了。(也就是說session是併發的,他們在多CPU上是併發執行的,因此這個時間在生產系統上一定會超過資料庫例項啟動以來的總時間的。可能從v$instance中檢視例項啟動的時間,也可能從alert日誌中檢視)

資料庫最佳化的目標就是減少使用者花費在資料庫上的時間,或者簡單說就是減少DB TIME。 其他的一個時間模型統計提供了特定操作的量化結果,比如登入操作,硬軟解析操作。

5.1.1.3 Active Session History

The V$ACTIVE_SESSION_HISTORY view provides sampled session activity in the instance. Active sessions are sampled every second and are stored in a circular buffer in SGA. Any session that is connected to the database and is waiting for an event that does not belong to the Idle wait class is considered as an active session. This includes any session that was on the CPU at the time of sampling.

Each session sample is a set of rows and the V$ACTIVE_SESSION_HISTORY view returns one row for each active session per sample, returning the latest session sample rows first. Because the active session samples are stored in a circular buffer in SGA, the greater the system activity, the smaller the number of seconds of session activity that can be stored in the circular buffer. This means that the duration for which a session sample appears in the V$ view, or the number of seconds of session activity that is displayed in the V$ view, is completely dependent on the database activity.

As part of the AWR snapshots, the content of V$ACTIVE_SESSION_HISTORY is also flushed to disk. Because the content of this V$ view can get quite large during heavy system activity, only a portion of the session samples is written to disk.

By capturing only active sessions, a manageable set of data is represented with the size being directly related to the work being performed rather than the number of sessions allowed on the system. Using ASH enables you to examine and perform detailed analysis on both current data in the V$ACTIVE_SESSION_HISTORY view and historical data in the DBA_HIST_ACTIVE_SESS_HISTORY view, often avoiding the need to replay the workload to gather additional performance tracing information. ASH also contains execution plan information for each captured SQL statement. You can use this information to identify which part of SQL execution contributed most to the SQL elapsed time. The data present in ASH can be rolled up on various dimensions that it captures, including the following:

v$active_session_history 檢視提供了例項的取樣會話活動。 active session是每隔1秒鐘取樣一次,並且被儲存在一個SGA中的迴圈快取中。任何連線到資料庫,正在等待事的session,只要它不是屬於idle狀態,就屬於atcive session。 這也包含了正在使用CPU資源的session取樣。(也就是說屬於DB TIME要統計範圍的session都要被取樣,如果在取樣時有多個session在執行那麼就記錄多條,如果沒有active session,那麼在v$active_session_history表中就沒有此時間的記錄)

每個會話取樣是一組記錄(幾行記錄),v$active_session_history 檢視每次取樣為每個active session記錄一行,返回最近的記錄(按照sample_id,sample_time進行逆序排列的)。因為active session取樣被儲存在一個迴圈的SGA快取中,那麼系統壓力越大(活躍性越強),那個這個快取中能夠儲存的資料就越少(即只能儲存幾秒鐘的資料了)。這就意味著一個session能夠出現在v$檢視中,或者能夠儲存幾秒的活躍會話在v$檢視中,是完全依賴於資料庫的活躍性的(壓力程度)

v$active_session_history 的內容作為awr快照的一部分也被儲存到硬碟上因為在系統繁忙時 v$ 檢視裡的內容非常的多,所以只有一部分的session取樣被寫到硬碟上了。

透過僅僅捕捉active session, 一個可管理的資料集展現出來,連同直接相關聯的正在執行的任務規模,而不是目前系統中允許的session數。使用ASH,你將能檢查和展開更詳細的分析,在v$active_session_history 檢視和 dba_hist_active_sess_history的歷史檢視中,這樣常常可以避免系統重放的需要以便去收集額外的跟蹤效能資訊。ASH 還包含了每個捕捉到的SQL的執行計劃資訊。你可以使用這些資訊去確定哪部件SQL執行消耗了時間。ASH中的資料可以被分成幾個維度,包括以下:

  • SQL identifier of SQL statement     ---> SQL語句
  • SQL plan identifier and hash value of the SQL plan used to execute the SQL statement     ---> SQL的hash值
  • SQL execution plan information     ---> 執行計劃資訊
  • Object number, file number, and block number    ---> 物件號,檔案號,塊號(可以關聯dba_objects,dba_data_files,dba_extents)
  • Wait event identifier and parameters    ---> 等待事件
  • Session identifier and session serial number    ---> sid和serial#(可以關聯v$session)
  • Module and action name       ---> module和action name(在v$session中的module,action列)
  • Client identifier of the session       ---> session的客戶端標識
  • Service hash identifier        ---> service的hash標識(可以看到是從怎樣連線上來的,如果按service分資源組的話,都可以推斷出下來的consumer組)
  • Consumer group identifier      ---> 消費組標識(可以看到當時它在哪個消費組中)

You can gather ASH information over a specified duration into a report.

Active session history sampling is also available for Active Data Guard physical standby instances and Oracle Automatic Storage Management (Oracle ASM) instances. On these instances, the current session activity is collected and displayed in the V$ACTIVE_SESSION_HISTORY view, but not written to disk.

你可以定期的將收集的ASH資訊寫到報表中。

活躍會話歷史取樣同樣適用於 physical standby例項和ASM例項。在這些例項中,資訊仍然收到到了v$active_session_history檢視中,但是沒有被寫到硬碟上。(也就是說雖然有資料,但是在高峰期這些資料基本上看不到,因此收集了等於沒有收集,中國電費很貴的!)

 

5.1.1.4 System and Session Statistics

A large number of cumulative database statistics are available on a system and session level through the V$SYSSTAT and V$SESSTAT views.

更大的一部分累積資料庫統計是儲存在 v$sysstat, v$sesstat 檢視中的

 

5.1.2 Operating System Statistics

Operating system statistics provide information on the usage and performance of the main hardware components of the system, and the performance of the operating system itself. This information is crucial for detecting potential resource exhaustion, such as CPU cycles and physical memory, and for detecting bad performance of peripherals, such as disk drives.

Operating system statistics are an indication of how the hardware and operating system are working. Many system analysts react to a hardware resource shortage by installing more hardware. This is a reactionary response to a series of symptoms shown in the operating system statistics. It is best to consider operating system statistics as a diagnostic tool, similar to the way doctors use body temperature, pulse rate, and patient pain when making a diagnosis. To help identify bottlenecks, gather operating system statistics for all servers in the system under performance analysis.

作業系統統計資料提供資訊:硬體部件的使用率和效能,以及作業系統本身的效能。在檢查系統資源不足時這些系統非常重要,例如CPU運轉和實體記憶體,極差的外圍裝置效能,例如磁碟驅動。

作業系統統計是一個關於硬體和作業系統工作的指示。許多系統分析反映硬體資源不足,需要增加硬體配置。這是一個對作業系統統計顯示的一系列症狀的保守響應。最好將作業系統統計作為一個診斷的工具,就像醫生使用溫度,脈搏率,患者痛苦程度做診斷一樣。為了確定效能瓶頸,在效能分析時,收集所有系統的作業系統統計。

Operating system statistics include the following: 作業系統統計包含以下內容:

  • CPU Statistics CPU統計
  • Virtual Memory Statistics 虛擬記憶體統計
  • Disk I/O Statistics 磁碟IO統計
  • Network Statistics 網路統計

5.1.2.1 CPU Statistics

CPU utilization is the most important operating system statistic in the tuning process. Get CPU utilization for the entire system and for each individual CPU on multi-processor environments. Utilization for each CPU can detect single-threading and scalability issues.

Most operating systems report CPU usage as time spent in user space or mode and time spent in kernel space or mode. These additional statistics allow better analysis of what is actually being executed on the CPU.

On a system running Oracle Database, where only one application is typically running, the system runs database activity in user space. Activities required to service database requests (such as scheduling, synchronization, I/O, memory management, and process/thread creation and tear down) run in kernel mode. In a system where CPU is fully utilized, a healthy Oracle database runs between 65% and 95% in user space.

The V$OSSTAT view captures machine-level information in the database, making it easier for you to determine if hardware-level resource issues exist. The V$SYSMETRIC_HISTORY view shows a one-hour history of the Host CPU Utilization metric, a representation of percentage of CPU usage at each one-minute interval. The V$SYS_TIME_MODEL view supplies statistics on the CPU usage by the Oracle database. Using both sets of statistics enable you to determine whether the Oracle database or other system activity is the cause of the CPU problems.

在效能調優時,CPU使用率真是最重要的一個作業系統統計資料。獲取整個系統的CPU使用率真,以及每個單獨CPU的使用率。單獨CPU統計可以偵測是單個執行緒和擴充套件性的問題。

多數作業系統會報告CPU使用率,以花費在使用者空間或模式的時間和花費在核心空間或模式的時間(我個人通常叫核心模式,在讀linux核心時這些詞出現的會非常多)。這些額外的統計可以更好的分析CPU到底在執行些什麼。

在一個執行oracle的系統上,通常只有一個應用在執行,資料庫是執行在使用者空間的(即通常來說資料庫伺服器只有資料庫在執行,不會再讓應用程式在上面跑)。而為資料庫服務的請求執行在核心模式下(或者核心空間下)(比如:任務排程,同步,IO,記憶體管理,程式建立和刪除)。在一個CPU合理高負荷執行的系統中,資料庫應該使用65%~95%的CPU。(即如果CPU多數被作業系統佔用了,那就有問題了)

v$osstat 檢視捕捉機器級別的資訊,當硬體資源問題出現時使你更容易地判斷。(包括CPU個數,IDLE_TIME,BUSY_TIME,USER_TIME,SYS_TIME,IOWAIT_TIME,PHYSICAL_MEMORY_BYTES)

v$sysmetric_history 檢視顯示1小時的歷史CPU使用度量以1分鐘的間隙去展示CPU使用率(CPU Usage Per Sec,CPU Usage Per Txn)。

v$sys_time_model 檢視提供了資料庫的CPU使用(DB time,DB CPU)。使用這幾個統計資料,以確定是資料庫還是其他程式消耗了CPU。

 

5.1.2.2 Virtual Memory Statistics

Virtual memory statistics should mainly be used as a check to validate that there is very little paging or swapping activity on the system. System performance degrades rapidly and unpredictably when paging or swapping occurs.

Individual process memory statistics can detect memory leaks due to a programming failure to deallocate memory taken from the process heap. These statistics are necessary to validate that memory usage does not increase after the system has reached a steady state after startup. This problem is particularly acute on shared server applications on middle tier computers where session state may persist across user interactions, and on completion state information that is not fully deallocated.

虛擬記憶體統計應該主要被用作檢查驗證是否有一些換頁發生。一旦發生換頁系統效能將迅速而不可預測的下降。

個別程式記憶體統計可以偵測由於程式未釋放記憶體而造成記憶體洩露問題(一般是C語言malloc記憶體,這種分配的記憶體一定是在heap上的。如果程式死掉的話,作業系統會回收程式分配的記憶體,但最怕就是程式沒死卻一直在消耗記憶體;通常在系統執行穩定之後透過記憶體監控就可以得知是否有洩露發生,或者直接掃描程式也會發現此類情況)。這些統計是必須的,用於驗證系統達到一個穩定狀態後,記憶體的使用未一直增加。這種問題在共享服務應用的中間層上表現的比較嚴重,session狀態可能存留在使用者互動中,但在完成後卻沒有釋放掉。

 

5.1.2.3 Disk I/O Statistics

Because the database resides on a set of disks, the performance of the I/O subsystem is very important to the performance of the database. Most operating systems provide extensive statistics on disk performance. The most important disk statistics are the current response time and the length of the disk queues. These statistics show if the disk is performing optimally or if the disk is being overworked.

Measure the normal performance of the I/O system; typical values for a single block read range from 5 to 20 milliseconds, depending on the hardware used. If the hardware shows response times much higher than the normal performance value, then it is performing badly or is overworked. This is your bottleneck. If disk queues start to exceed two, then the disk is a potential bottleneck of the system.

Oracle Database also maintains a consistent set of I/O statistics for the I/O calls it issues. These statistics are captured for both single and multi block read and write operations in the following dimensions:

因為資料庫使用許多的硬碟,因此IO的效能對於資料庫非常的重要。多數作業系統都提供了大量的硬碟效能統計。最重要的磁碟統計是當前的響應時間和磁碟佇列長度。這些統計會顯示出磁碟目前工作正常還是高負荷動作。

測量正常的IO系統效能;依據不同的硬體配置,通常磁碟單塊讀取時間是5~20微秒(我的虛擬機器上大概是34微秒左右,透過DBMS_RESOURCE_MANAGER.CALIBRATE_IO可以測量出來)。假如硬體顯示響應時間比正常要高的話,那麼說明磁碟負荷很高或者出現問題了。這就是你的瓶頸。假如磁碟佇列中有超過2個,那麼磁碟就是系統一個潛在的瓶頸

資料庫也維護了一份關於IO呼叫的統計。這些統計是在單塊或多塊讀或寫操作時被捕捉的,表現在以下幾個方面:

  • Consumer group 消費組

When Oracle Database Resource Manager is enabled, the V$IOSTAT_CONSUMER_GROUP view captures I/O statistics for all consumer groups that are part of the currently enabled resource plan. The database samples cumulative statistics every hour and stores them as historical statistics in the AWR.

當資料庫資源管理功能開啟的話, v$iostat_consumer_group 檢視儲存所有目前資源計劃中的消費組的IO統計資訊。資料庫將每隔1小時取樣一次將資料累積起來供AWR作為歷史統計使用。

  • Database file 資料庫檔案

I/O statistics of database files that are or have been accessed are captured in the V$IOSTAT_FILE view.

資料庫檔案被訪問的IO統計儲存在 v$iostat_file 檢視中

  • Database function 資料庫功能

I/O statistics for database functions (such as the LGWR and DBWR) are captured in the V$IOSTAT_FUNCTION view.

資料庫的後臺IO統計(LGWR,DBWR所產生的IO消耗)儲存在 v$iostat_function 檢視中。

 

5.1.2.4 Network Statistics

You can use network statistics in much the same way as disk statistics to determine if a network or network interface is overloaded or not performing optimally. In today's networked applications, network latency can be a large portion of the actual user response time. For this reason, these statistics are a crucial debugging tool.

Oracle Database maintains a set of network I/O statistics in the V$IOSTAT_NETWORK view.

當網路介面負荷超標或者非最佳執行時,你可以像硬碟統計一樣使用網路統計資料。在今天的網路應用中,網路延遲佔據著使用者響應時間的很大一部分。基於這個原因,這些統計資料是除錯的重要工具。資料庫將網路IO的統計資訊儲存在 v$iostat_network 檢視中。

 

5.1.2.5 Operating System Data Gathering Tools

Table 5-1 shows the various tools for gathering operating statistics on UNIX.

下圖顯示了UNIX上的一些收集統計資訊的工具

Table 5-1 UNIX Tools for Operating Statistics

Component

UNIX Tool

CPU

sar, vmstat, mpstat, iostat

Memory

sar, vmstat

Disk

sar, iostat

Network

netstat

 

5.1.3 Interpreting Statistics 統計解析

When initially examining performance data, you can formulate potential theories by examining your statistics. One way to ensure that your interpretation of the statistics is correct is to perform cross-checks with other data. This establishes whether a statistic or event is really of interest. Also, because foreground activities are tunable, it is better to first analyze the statistics from foreground activities before analyzing the statistics from background activities.

當開始檢查效能資料時,你可以透過檢查你的統計資料形成你的理論。一種確認你解析正確的方法是再檢查其他的資料共同去驗證。這樣確認一個統計或事件是真正有用的。同時因為前臺程式是可調優的,因此最好是先分析前臺程式的統計資料,然後再分析後臺的活動統計。

下面討論一下一些常見的陷阱:

Some pitfalls are discussed in the following sections:

  • Hit ratios 命中率

When tuning, it is common to compute a ratio that helps determine whether there is a problem. Such ratios include the buffer cache hit ratio, the soft-parse ratio, and the latch hit ratio. Do not use these ratios as definitive identifiers of whether a performance bottleneck exists. Rather, use them as indicators. To identify whether a bottleneck exists, examine other related evidence.

當調優的時候,通常去計算一個比率以判斷是否存在一個問題。一些比率包括快取命中率,軟解析率,以及latch命中率。不要使用這些比率去明確的判斷一個效能瓶頸是否存在。而是把它作為一個參考。為了確認是否存在瓶頸,需要再去檢查一下其他的證據。

  • Wait events with timed statistics 基於時間的等待事件

Setting TIMED_STATISTICS to true at the instance level directs the database to gather wait time for events, in addition to available wait counts. This data is useful for comparing the total wait time for an event to the total elapsed time between the data collections. For example, if the wait event accounts for only 30 seconds out of a 2-hour period, then little is to be gained by investigating this event, although it may be the highest ranked wait event when ordered by time waited. However, if the event accounts for 30 minutes of a 45-minute period, then the event is worth investigating.

在例項組設定 timed_statistics = true,讓資料庫收集等待事件,包含等待次數。這個資料對於比較某一資料收集段間一個事件的總等待時間與總的消耗時間。例如,假如在過去的2小時裡有30秒的等待事件,那麼調查這個事件就是沒有太大意義的,儘管根據時間等待排序後它排在最前面也是沒意義的。然而,如果在45分鐘裡有30分鐘在等待,那麼就需要去仔細地調查了。

Note:

Timed statistics are automatically collected for the database if the initialization parameter STATISTICS_LEVEL is set to TYPICAL or ALL. If STATISTICS_LEVEL is set to BASIC, then you must set TIMED_STATISTICS to TRUE to enable collection of timed statistics. Note that setting STATISTICS_LEVEL to BASIC disables many automatic features and is not recommended.

If you explicitly set DB_CACHE_ADVICE, TIMED_STATISTICS, or TIMED_OS_STATISTICS, either in the initialization parameter file or by using ALTER_SYSTEM or ALTER SESSION, then the explicitly set value overrides the value derived from STATISTICS_LEVEL.

時間統計在 statistics_level = [ typical | all ]時自動開啟了。假如statistics_level = BASIC的話,那麼你必須手動去開啟timed_statistics。建議你永遠別設定成BASIC。

假如你顯式的設定了DB_CACHE_ADVICE, TIMED_STATISTICS 或者 TIME_OS_STATISTICS,無論是透過alter system還是alter session方式都將覆蓋statistics_level的值。

  • Comparing Oracle Database statistics with other factors 使用其他因素對比資料庫的統計

When looking at statistics, it is important to consider other factors that influence whether the statistic is of value. Such factors include the user load and the hardware capability. Even an event that had a wait of 30 minutes in a 45-minute period might not be indicative of a problem if you discover that there were 2000 users on the system, and the host hardware was a 64-node computer.

當檢視統計資料的時候,考慮一些會影響統計資料的因素是非常重要的,即誤導了統計資料的正確性。包括了使用者的壓力及硬體的能力。即使你發現在45分鐘內有30分鐘的等待事件也不意味著有問題,因為可能有2000個使用者同時在訪問資料庫,同時主機是一個64-節點計算機(不清楚什麼是64-node)

  • Wait events without timed statistics 非時間統計的等待事件

If TIMED_STATISTICS is false, then the amount of time waited for an event is not available. Therefore, it is only possible to order wait events by the number of times each event was waited for. Although the events with the largest number of waits might indicate the potential bottleneck, they might not be the main bottleneck. This can happen when an event is waited for a large number of times, but the total time waited for that event is small. The converse is also true: an event with fewer waits might be a problem if the wait time is a significant proportion of the total wait time. Without having the wait times to use for comparison, it is difficult to determine whether a wait event is really of interest.

假如timed_statistics設定成了FALSE,那麼時間的統計是沒有的。因此,只有一種可能就是透過每個事件的次數排序等待事件。儘管最大的等待時間事件可能暗示一個可能的瓶頸,他們可能不是主要瓶頸。這種情況在一個需要大量呼叫次數的等待時發生,但是單個的那個事件等待時間是比較小的。(簡單說就是發生了10000次併發,每個併發處理時間是1ms,那麼一共處理時間是10S,即使每個併發等待時間是0.1ms,加起來也有1S,這個時間不算什麼,因為是10000個併發加起來的時間。但是如果有一個交易發生了2次,每次正常處理時間是0.1ms,但是等待事件發生了0.4S,那總的0.4S*2的等待時間是比10000個併發加起來的1S是少的,但是明顯處理是慢的。因為人家10000個每一個等待時間只有0.1ms,是你2個交易0.4S = 400ms,因此人家的效能比你的高250倍。)。 反過來也是正確的:一個等待事件僅僅發生幾次,但是這個等待事件佔據了等待事件的大部分時間就可能是一個問題。如果等待事件的發生次數作為比較,即使真是一個問題,判斷問題所在也是非常困難的。

  • Idle wait events 空閒等待事件

Oracle Database uses some wait events to indicate if the Oracle server process is idle. Typically, these events are of no value when investigating performance problems, and they should be ignored when examining the wait events.

資料庫使用一些等待事件指示那麼idle的服務程式。通常情況下,這些事件中是沒有值記錄的,因此在統計的時候要把它們去掉(也就是說在where條件中加上 xxx <> ‘idle’)

  • Computed statistics 計算的統計

When interpreting computed statistics (such as rates, statistics normalized over transactions, or ratios), it is important to cross-verify the computed statistic with the actual statistic counts. This confirms whether the derived rates are really of interest: small statistic counts usually can discount an unusual ratio. For example, on initial examination, a soft-parse ratio of 50% generally indicates a potential tuning area. If, however, there was only one hard parse and one soft parse during the data collection interval, then the soft-parse ratio would be 50%, even though the statistic counts show this is not an area of concern. In this case, the ratio is not of interest due to the low raw statistic counts.

當研究計算的統計資料時(比如速率,事務的標準化統計,或者比率),使用交叉驗證計算的統計與實際的統計數量值是非常重要的。這確保派生出來的比率是真正有用的:小的統計數可能使不尋常的比率大大折扣。例如,在一開始檢查時,發現一個50%的軟解析比率通常暗示一個可能的調優區域。但是假如現在只有1條硬解析和1個軟解析發生在這段時間裡,那麼軟解析就是50%,儘管這個統計數顯示這個不是一個關注區域(感覺這裡oracle寫錯了,應該是統計顯示是要關注的地方,這樣才能突出說明這個50%是沒有意義的)。在這種情況下,這個比率就不應該是關注點,因為統計數太低。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/30633755/viewspace-2127721/,如需轉載,請註明出處,否則將追究法律責任。

相關文章