按照oracle效能改進方法論的步驟來優化系統!

warehouse發表於2008-05-22

doc上的原話,記錄一下,感覺寫的很精彩!

[@more@]

Steps in The Oracle Performance Improvement Method

  1. Get candid feedback from users. Determine the performance project's scope and subsequent performance goals, as well as performance goals for the future. This process is key in future capacity planning.
  2. Get a full set of operating system, database, and application statistics from the system when the performance is both good and bad. If these are not available, then get whatever is available. Missing statistics are analogous to missing evidence at a crime scene: They make detectives work harder and it is more time-consuming.
  3. Sanity-check the operating systems of all machines involved with user performance. By sanity-checking the operating system, you look for hardware or operating system resources that are fully utilized. List any over-used resources as symptoms for analysis later. In addition, check that all hardware shows no errors or diagnostics.
  4. Check for the top ten most common mistakes with Oracle, and determine if any of these are likely to be the problem. List these as symptoms for analysis later. These are included because they represent the most likely problems.

  5. Build a conceptual model of what is happening on the system using the symptoms as clues to understand what caused the performance problems.

  6. Propose a series of remedy actions and the anticipated behavior to the system, and apply them in the order that can benefit the application the most. A golden rule in performance work is that you only change one thing at a time and then measure the differences. Unfortunately, system downtime requirements might prohibit such a rigorous investigation method. If multiple changes are applied at the same time, then try to ensure that they are isolated.
  7. Validate that the changes made have had the desired effect, and see if the user's perception of performance has improved. Otherwise, look for more bottlenecks, and continue refining the conceptual model until your understanding of the application becomes more accurate.
  8. Repeat the last three steps until performance goals are met or become impossible due to other constraints.

    This method identifies the biggest bottleneck and uses an objective approach to performance improvement. The focus is on making large performance improvements by increasing application efficiency and eliminating resource shortages and bottlenecks. In this process, it is anticipated that minimal (less than 10%) performance gains are made from instance tuning, and large gains (100% +) are made from isolating application inefficiencies.

    How to Check the Operating System

    The following list should be considered when checking operating system symptoms.

    • Check CPU utilization in user and kernel space for the total system and on each CPU.
    • Confirm that there is no paging or swapping.
    • Check that network latencies between machines are acceptable.
    • Find disks with poor response times or long queues.
    • Confirm that there are no hardware errors.

    A Sample Decision Process for Performance Conceptual Modeling

    Conceptual modeling is almost deterministic. However, as your performance tuning experience increases, you will appreciate that there are no real rules to follow. A flexible "heads up" approach is required to interpret the various statistics and make good decisions.

    This section illustrates how a performance engineer might look for bottlenecks. Use this only as a guideline for the process. With experience, performance engineers add to the steps involved. This analysis assumes that statistics for both the operating system and the database have been gathered.

    1. Is the response time/batch run time acceptable for a single user on an empty or lightly loaded machine?

      If it is not acceptable, then the application is probably not coded or designed optimally, and it will never be acceptable in a multiple user situation when system resources are shared. In this case, get application internal statistics, and get SQL Trace and SQL plan information. Work with developers to investigate problems in data, index, transaction SQL design, and potential deferral of work to batch/background processing.

    2. Is all the CPU being utilized?

      If the kernel utilization is over 40%, then investigate the operating system for network transfers, paging, swapping, or process thrashing. Otherwise, move onto CPU utilization in user space. Check to see if there are any non-database jobs consuming CPU on the machine limiting the amount of shared CPU resources, such as backups, file transforms, print queues, and so on. After determining that the database is using most of the CPU, investigate the top SQL by CPU utilization. These statements form the basis of all future analysis. Check the SQL and the transactions submitting the SQL for optimal execution. In Oracle Server releases prior to 9i, use buffer gets as the measure for CPU usage. With release 9i, Oracle provides the actual CPU statistics in V$SQL.

      See Also:

      Oracle9i Database Reference for more information on V$SQL

      If the application is optimal and there are no inefficiencies in the SQL execution, consider rescheduling some work to off-peak hours or using a bigger machine.

    3. At this point, the system performance is unsatisfactory, yet the CPU resources are not fully utilized.

      In this case, you have serialization and unscalable behavior within the server. Get the WAIT_EVENTS statistics from the server, and determine the biggest serialization point. If there are no serialization points, then the problem is most likely outside the database, and this should be the focus of investigation. Elimination of WAIT_EVENTS involves modifying application SQL and tuning database parameters. This process is very iterative and requires the ability to drill down on the WAIT_EVENTS systematically to eliminate serialization points.

    Top Ten Mistakes Found in Oracle Systems

    This section lists the most common mistakes found in Oracle systems. By following Oracle's performance improvement methodology, you should be able to avoid these mistakes altogether. If you find these mistakes in your system, then re-engineer the application where the performance effort is worthwhile.

    See Also:

    Oracle9i Database Performance Tuning Guide and Reference for more information on wait events

    1. Bad Connection Management

      The application connects and disconnects for each database interaction. This problem is common with stateless middleware in application servers. It has over two orders of magnitude impact on performance, and it is totally unscalable.

    2. Bad Use of Cursors and the Shared Pool

      Not using cursors results in repeated parses. If bind variables are not used, then there is hard parsing of all SQL statements. This has an order of magnitude impact in performance, and it is totally unscalable. Use cursors with bind variables that open the cursor and execute it many times. Be suspicious of applications generating dynamic SQL.

    3. Getting Database I/O Wrong

      Many sites lay out their databases poorly over the available disks. Other sites specify the number of disks incorrectly, because they configure disks by disk space and not I/O bandwidth.

    4. Redo Log Setup Problems

      Many sites run with too few redo logs that are too small. Small redo logs cause system checkpoints to continuously put a high load on the buffer cache and I/O system. If there are too few redo logs, then the archive cannot keep up, and the database will wait for the archive process to catch up.

    5. Serialization of data blocks in the buffer cache due to lack of free lists, free list groups, transaction slots (INITRANS), or shortage of rollback segments.

      This is particularly common on INSERT-heavy applications, in applications that have raised the block size to 8K or 16K, or in applications with large numbers of active users and few rollback segments.

    6. Long Full Table Scans

      Long full table scans for high-volume or interactive online operations could indicate poor transaction design, missing indexes, or poor SQL optimization. Long table scans, by nature, are I/O intensive and unscalable.

    7. In Disk Sorting

      In disk sorts for online operations could indicate poor transaction design, missing indexes, or poor SQL optimization. Disk sorts, by nature, are I/O-intensive and unscalable.

    8. High Amounts of Recursive (SYS) SQL

      Large amounts of recursive SQL executed by SYS could indicate space management activities, such as extent allocations, taking place. This is unscalable and impacts user response time. Recursive SQL executed under another user ID is probably SQL and PL/SQL, and this is not a problem.

    9. Schema Errors and Optimizer Problems

      In many cases, an application uses too many resources because the schema owning the tables has not been successfully migrated from the development environment or from an older implementation. Examples of this are missing indexes or incorrect statistics. These errors can lead to sub-optimal execution plans and poor interactive user performance. When migrating applications of known performance, export the schema statistics to maintain plan stability using the DBMS_STATS package.

      Likewise, optimizer parameters set in the initialization parameter file can override proven optimal execution plans. For these reasons, schemas, schema statistics, and optimizer settings should be managed together as a group to ensure consistency of performance.

    10. Use of Nonstandard Initialization Parameters

      These might have been implemented based on poor advice or incorrect assumptions. In particular, parameters associated with SPIN_COUNT on latches and undocumented optimizer features can cause a great deal of problems that can require considerable investigation.

    Performance Characteristics of Hardware Configurations

    Again, today's systems are so different and complex that hard and fast rules for performance analysis cannot be made. However, this section provides some of the numbers that you should consider.

    1. Disk characteristics:
      • Size 512MB - 36GB
      • Seek 5 - 10msec
      • Transfer 5 - 10msec
      • Thoughput 20 - 40 I/O seconds for each disk
      • Controller throughput at 1750 I/Os a second
    2. Speed reading from memory should be 1 - 10 microseconds.
    3. Point-to-point network latencies should be 1 - 25msec.
    4. Busy systems: (worst case)
      • Operational systems - 60% usr, 40% sys
      • Decision support systems - 90% usr, 10% sys

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/19602/viewspace-1004403/,如需轉載,請註明出處,否則將追究法律責任。

相關文章