6. RAC Performance Tuning

CPU and Wait Time Tuning Dimensions

一般情況下，等待事件顯著的話，比較可能需要instance/RAC tuning。這時增加CPU或節點很難改善問題。CPU使用率較高的話，有可能需要進行SQL tuning。這種情況下增加CPU或節點數一般可以得到較好的效果。（注意scale-up和speed-up的區別）

RAC-Specific Tuning

• Tune for a single instance first.

• Tune for RAC:

– Instance recovery

– Interconnect traffic

– Point of serialization can be exacerbated

• RAC-reactive tuning tools:

– Specific wait events

– System and enqueue statistics

– Enterprise Manager performance pages

– Statspack and AWR reports

• RAC-proactive tuning tools:

– AWR snapshots

– ADDM reports

RAC and Instance or Crash Recovery

1． Global Enqueue Services (GES) remasters the enqueues resource

2． Global Cache Services (GCS) remasters the cache resources (all GCS resource requests and write requests are temporarily suspended)

3． SMON of one surviving instance builds the recovery set

4． Resource claim (Buffer space for recovery and the resources that were identified in the previous reading of the redo logs)

5． Roll forward recovery set (GRD is now unfrozen. Any data blocks that are not in recovery can now be accessed)

Instance Recovery and Database Availability

A. Real Application Clusters is running on multiple nodes.

B. Node failure is detected.

C. The enqueue part of the GRD is reconfigured; resource management is redistributed to the surviving nodes. This operation occurs relatively quickly.

D. The cache part of the GRD is reconfigured and SMON reads the redo log of the failed instance to identify the database blocks that it needs to recover.

E. SMON issues the GRD requests to obtain all the database blocks it needs for recovery. After the requests are complete, all other blocks are accessible.

F. The Oracle server performs roll forward recovery. Redo logs of the failed threads are applied to the database, and blocks are available right after their recovery is completed.

G. The Oracle server performs rollback recovery. Undo blocks are applied to the database for all uncommitted transactions.

H. Instance recovery is complete and all data is accessible.

Instance Recovery and RAC

單機資料庫中，FAST_START_MTTR_TARGET引數代表的時間包括例項啟動和recover的時間。

在RAC環境中，此引數設定的意義不大。_FAST_START_INSTANCE_RECOVERY_TARGET引數設定了從instance recovery開始到GRD unfrozen，無須恢復的資料庫開放訪問開始的時間。

Here are some guidelines you can use to make sure that instance recovery in your RAC environment is faster:

• Use parallel instance recovery. (RECOVERY_PARALLISM)

• Increase PARALLEL_EXECUTION_MESSAGE_SIZE. (推薦設定4KB或8KB)

• Set PARALLEL_MIN_SERVERS. (推薦設定CPU_COUNT-1)

• Use Async I/O.

• Increase the size of the default buffer cache. (預設使用50%的buffer cache進行恢復，關注alertlog，如有必要可調大buffer cache)

Analyzing Cache Fusion Impact in RAC

• The cost of block access and cache coherency is represented by:

– Global Cache Services statistics

– Global Cache Services wait events

• The response time for Cache Fusion transfers is determined by:

– Overhead of the physical interconnect components

– IPC protocol

– GCS protocol

• The response time is not generally affected by disk I/O factors.

The time to process CR block request in the cache (build time + flush time + send time)

The time to process current block request in the cache (pin time + flush time + send time)

Wait Events for RAC

• Wait times are attributed to events that reflect the outcome of a request:

– Placeholders while waiting

– Precise events after waiting

Global cache waits are summarized in a broader category called Cluster Wait Class.

Global Cache Wait Events: Overview

• gc current/cr request: act as placeholders until the request completes

• gc [current/cr] [2/3]-way: The request is processed immediately

• gc [current/cr] block busy:

• gc [current/cr] grant 2-way: A grant is given without any significant delays (disk read followed)

• gc current grant busy

• gc [current/cr] [block/grant] congested: The request spent more than 1 ms in internal queues

• gc [current/cr] [failure/retry]

• gc buffer busy

2-way Block Request: Example

1. SGA1 sends a request to SGA2, begins to wait on gc current block request.

2. LGWR of SGA2 flushes recovery information to its local redo log files.

3. SGA2 sends the block to SGA1, when the block arrives, the wait event changes to gc current block 2-way

R(send) + W(small msg) + S(process msg,process block,send) + W(block) + R(receive block)

3-way Block Request: Example

R(send) + W(small msg) + S(process msg,send) + W(small msg) + S(process msg,process

block,send) + W(block) + R(receive block)

2-way Grant: Example

R(send) + W(small msg) + S(process msg,send) + W(small msg) + R(receive block)

Global Enqueue Waits: Overview

• Enqueues are synchronous.

• Enqueues are global resources in RAC.

• The most frequent waits are for:

– TX: Transaction enqueue; used for transaction demarcation and tracking

– TM: Table or partition enqueue; used to protect table definitions during DML operations

– HW: High-water mark enqueue; acquired to synchronize a new block operation

– SQ: Sequence enqueue; used to serialize incrementing of an Oracle sequence number

– US: Undo segment enqueue; mainly used by the Automatic Undo Management (AUM) feature

– TA: Enqueue used mainly for transaction recovery as part of instance recovery

• The waits may constitute serious serialization points.

Session and System Statistics

• Use V$SYSSTAT to characterize the workload.

• Use V$SESSTAT to monitor important sessions.

• V$SEGMENT_STATISTICS includes RAC statistics.

• RAC-relevant statistic groups are:

– Global Cache Service statistics

– Global Enqueue Service statistics

– Statistics for messages sent

• V$ENQUEUE_STATISTICS determines the enqueue with the highest impact.

• V$INSTANCE_CACHE_TRANSFER breaks down GCS statistics into block classes.

Most Common RAC Tuning Tips

• Application tuning is often the most beneficial

• Resizing and tuning the buffer cache

• Reducing long full-table scans in OLTP systems

• Using Automatic Segment Space Management

• Increasing sequence caches

• Using partitioning to reduce interinstance traffic

• Avoiding unnecessary parsing

• Minimizing locking usage

• Removing unselective indexes

• Configuring interconnect properly

Index Block Contention: Considerations

Wait events: enq: TX – index contention / gc buffer busy / gc current block busy / gc current split

加劇索引塊競爭的因素：key單調增加、頻繁的leaf block splits、Low tree depth

Ways to alleviate the performance impact of globally hot index blocks and leaf block splits:

• Global index hash partitioning

• Increasing the sequence cache, if the key value is derived from a sequence

• Use natural keys as opposed to surrogate keys

• Use reverse key indexes

Oracle Sequences and Index Contention

當sequence用作生成主鍵時，容易造成索引塊的競爭。增大sequence的cache值，有利於減少索引塊的競爭，提高leaf block的instance affinity

Undo Block Considerations

當包含活動事務的索引塊被多個instance同時訪問時，會造成劇烈的undo block shipment。這種情況經常發生在應用程式頻繁對最近插入的資料進行讀取時。如果資料插入相對於commit操作來說過於頻繁的話，一般會發生這種情況。解決方法包括增加提交次數以及增大sequence的cache值等。

High-Water Mark Considerations

Wait events: enq: HW – contention / gc current grant

當插入操作頻繁導致segment頻繁擴充套件時容易造成HWM enqueue。使用uniform and large extent sizes for locally managed and automatic space management segments能減輕這種情況。

Concurrent Cross-Instance Calls: Considerations

在資料倉儲應用中較為常見的truncate動作，如在多個例項中併發進行可能會造成一定的問題。這是由於truncate操作會產生一個cross-instance的呼叫，即髒塊的flush動作，這會產生一個serialization點。其他cross-instance的呼叫也可能引發類似的問題。

RAC-Specific ADDM Findings

• Hot block (with block details) with high read/write contention within an instance and across the cluster

• Hot object with high read/write contention within an instance and across the cluster

• Cluster interconnect latency issues in a RAC environment

• LMS congestion issues: LMS processes are not able to keep up with lock requests.

• Top SQL that encounters interinstance messaging

• Contention on other instances: Basically, multiple instances are updating the same set of blocks concurrently.

[@more@]

[RAC]ORACLE Database 10g RAC for Administrators學習筆記（三）

相關文章