RMAN: Monitoring Recovery Manager Jobs [ID 144640.1]

尛樣兒發表於2012-11-11

RMAN: Monitoring Recovery Manager Jobs [ID 144640.1]
修改時間:2012-3-30型別:BULLETIN狀態:PUBLISHED優先順序:3
***Checked for relevance on 04-AUG-2009***
***Checked for relevance on 29-Mar-2012*** 


Monitoring Recovery Manager Jobs
-----------------------------
 
PURPOSE
-------

To inform. all the Oracle Recovery Manager Documentation has the information
to resolve most problems.

 
SCOPE & APPLICATION
-------------------

All RMAN users should review the RMAN Troubleshooting section of the online 
documentation provided.


Sometimes it is useful to identify what a server session performing
a backup or copy operation is doing. You have access to several views
that can assist in monitoring the progress of or obtaining information
about RMAN jobs:

View               Description
----------------   -----------------------------------------------------
V$PROCESS          Identifies currently active processes. 

V$SESSION          Identifies currently active sessions. Use this view to
                   determine which Oracle database server sessions correspond
                   to which RMAN allocated channels.

V$SESSION_LONGOPS  Provides progress reports on long-running operations.

V$SESSION_WAIT     Lists the events or resources for which sessions are waiting. 


Correlating Server Sessions with Channels

To identify which server sessions correspond to which RMAN channels, use the
set command with the command id parameter. The command id parameter enters the
specified string into the CLIENT_INFO column of the V$SESSION dynamic performance
view. Join V$SESSION with V$PROCESS to correlate the server session with the channel.

The CLIENT_INFO column of V$SESSION contains information for each Recovery Manager
server session. The data appears in one of the following formats:

     o id=string 

     This form. appears for the first connection to the target database established
     by RMAN. 

     o id=string, ch=channel_id 

     This form. appears for all allocated channels. 

The SPID column of V$PROCESS identifies the operating system process number.


To correlate a process with a channel during a backup:

 1. Start RMAN and connect to the target database and, optionally, the recovery 
    catalog database. For example, enter: 

     % rman target / catalog rman/rman@rcat



 2. Set the command id parameter after allocating the channels and then back up the 
    desired object. For example, enter: 

    RMAN> run {
            allocate channel t1 type disk;
            allocate channel t2 type disk;
            set command id to 'rman';

            backup 
              incremental level 0
              filesperset 5
              tablespace 'SYSTEM';
          # optionally, issue a host command to access the operating system prompt
            host;
            sql 'ALTER SYSTEM ARCHIVE LOG ALL';
       }

3.Start a SQL*Plus session and then query the joined V$SESSION and V$PROCESS
  views while the RMAN job is executing. For example, enter: 

  SQL> SELECT sid, spid, client_info 
       FROM v$process p, v$session s 
       WHERE p.addr = s.paddr 
        AND client_info LIKE '%id=rman%';

     SID        SPID      CLIENT_INFO  
     ---------- --------- -----------------------------------------------------
     8          21973     id=rman  
     16         22057     id=rman    
     17         22068     id=rman,ch=t1 
     18         22070     id=rman,ch=t2


Monitoring Job Progress

Monitor the progress of backups, copies, and restores by querying the view
V$SESSION_LONGOPS. 

Each server session performing a backup, restore, or copy reports its progress
compared to the total amount of work required for that particular part of the
restore. For example, if you perform. a restore using two channels, and each
channel has two backup sets to restore (a total of 4 sets), then each server
session reports its progress through a single set. When that set is completely
restored, RMAN starts reporting progress on the next set to restore.

 1. Start RMAN and connect to the target database and, optionally, the recovery catalog
    database. For example, enter: 

     % rman target / catalog rman/rman@rcat



 2. Start an RMAN job. For example, enter: 

    RMAN> run {
            allocate channel t1 type disk;
            backup database;
          }


 3. While the job is running, execute a script. containing the following SQL statement: 

    SQL> SELECT sid, serial#, context, sofar, totalwork,
                round(sofar/totalwork*100,2) "% Complete"
         FROM v$session_longops
         WHERE opname LIKE 'RMAN%'
         AND opname NOT LIKE '%aggregate%'
         AND totalwork != 0
         AND sofar <> totalwork
        /

If you repeat the query while the backup progresses, then you see output such as the
following: 

     SQL> @longops
            SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK % Complete
     ---------- ---------- ---------- ---------- ---------- ----------
              8         19          1      10377      36617      28.34

     SQL> @longops
            SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK % Complete
     ---------- ---------- ---------- ---------- ---------- ----------
              8         19          1      21513      36617      58.75

     SQL> @longops
            SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK % Complete
     ---------- ---------- ---------- ---------- ---------- ----------
              8         19          1      29641      36617      80.95

     SQL> @longops
            SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK % Complete
     ---------- ---------- ---------- ---------- ---------- ----------
              8         19          1      35849      36617       97.9

     SQL> @longops
     no rows selected

 4. If you run the script. at intervals of two minutes or more and the % Complete
    column does not increase, then RMAN is encountering a problem. Query
    V$SESSION_WAIT to determine which events are being waited for.
    For example, enter: 

     SQL> SELECT sid, seconds_in_wait AS sec_wait, event FROM v$session_wait
          WHERE wait_time = 0
          ORDER BY sid;

            SID   SEC_WAIT EVENT
     ---------- ---------- -----------------------------------------------
              1  368383335 pmon timer
              2       1097 rdbms ipc message
              3     387928 rdbms ipc message
              4          0 rdbms ipc message
              5       1408 smon timer
              6     386114 rdbms ipc message
              7     387626 rdbms ipc message
              8       1060 SQL*Net message from client
              9       1060 SQL*Net message from client
             12       1060 SQL*Net message from client
             13       2366 SQL*Net message from client
             14       2757 SQL*Net message from client
     12 rows selected.

Note: The V$SESSION_WAIT view shows only Oracle events, not media manager events.

5. If you do not want to run the sqlplus queries and wish to see the percentage
   of completion as the job is running you can add "debug io" to you backup or
   script. This will produce output to stdio show the jobs progress.

   RMAN> run {
            allocate channel t1 type disk;
            debug io;
            backup database;
            debug off;
           }

You can change the option to "debug on" which is equivlent to "debug all" to
get even more detail about the sql and pl/sql being executed by RMAN.

Monitoring Job Performance

Monitor backup and restore performance by querying V$BACKUP_SYNC and
V$BACKUP_ASYNC_IO. For a complete description of the contents of these
views and how you can use them to tune backup performance, see Oracle8i
Designing and Tuning for Performance.

  The following rows will exist for a backup or restore:
   - One row for each datafile.
   - One aggregate datafile row.
   - One row for each backup piece.

  Whether or not the I/O is synchronous depends on how it is viewed by the
  controlling process.  Asynchronous I/O can be provided either by native
  platform. services or by I/O slaves.

  With synchronous I/O, it is difficult to identify specific bottlenecks because
  all synchronous I/O is a bottleneck to the process.

  Compare the bytes-per-second rate with the device's maximum throughput rate.
  If the bytes-per-second rate is lower than that device specifies, consider
  tuning that part of the backup/restore process.



1. Mb/s = Min(disk Mb/s, tape Mb/s)

2. Backup speed is a tradeoff to restore speed. The faster the backup the slower
   the restore can be.  When not all the datafiles are restored.

3. Disk Mb/s
   Read from physical disks during backup until all controllers saturate.
   The controller saturates as more disks are read: 1 disk 3 Mb/s, 2 disks 5 Mb/s,
   3 disks 6 Mb/s, 4 disks 6.2 Mb/s, 5 disks 6.2 Mb/s, 6 disks 6.2 Mb/s; nothing
   is gained by using more than 5 disks.

Performance Rules

Two ways to spin more than one disk.

a. Multiplex many datafiles into a backup set (FILESPERSET)
   Worst case scenario: Restoring one datafile out of N. Will need to read all the
   backup set. Example: A 10Gb backupset is created in an hour (10Gb/hour) with 10
   datafiles.  If restoring only one datafile, the perceived restore output would
   be 1Gb/hour.

b. Use logical disks that are interlaced into several physical disks. One read/write
   from one logical disk, spins many physical drives. If on restore the logical disks
   does not spin the same number of disks, the disk might be the bottleneck on restore.

4 Tape Mb/s
  The more tapes, the faster the backup and restore, so buy more tapes.
  The faster the tape drives (Redwood?s: 20 Mb/s, DLT7000: 3 Mb/s) the faster the
  backup. Allocate ONE channel per physical device. If more channels than physical
  drives are used, then the backup sets will be intermingled (similar to multiplexing
  datafiles).


Bottlenecks with Async IO

When using synchronous I/O, you can easily determine how much time
backup jobs require because devices only perform. one I/O task at a time. 
When using asynchronous I/O, it is more difficult to measure the
bytes-per-second rate, for the following reasons:

  -  Asynchronous processing implies that more than one task occurs at a time.
  -  Oracle I/O uses a polling rather than an interrupt mechanism to determine
     when  each I/O request completes.
  -  Because the backup or restore process is not immediately notified of I/O
     completion by the operating system, you cannot determine the duration of
     each I/O. 

If LONG_WAITS+SHORT_WAITS is a significant fraction of IO_COUNT, then the file
is probably a bottleneck. 

Some platforms implementation of AIO can cause the caller to wait for I/O
completion when performing a non-blocking poll for I/O.

Waits times should zero to avoid bottlenecks.

  -  LONG_WAITS= # of times the backup/restore process told the O/S to wait
     until an I/O was complete.
  -  SHORT_WAITS= # of times the backup/restore process made an O/S call to
     poll for I/O completion in a non-blocking mode.

  Note:

  If the SHORT_WAIT_TIME_TOTAL is low compared to the total time
  for this file, then the delay is most likely caused by other
  factors, such as process swapping. If possible, tune your operating
  system so the I/O wait time appears in the LONG_WAIT_TIME_TOTAL column.


Using Fixed Views to Monitor Backup Operations

Use V$BACKUP_SYNC_IO and V$BACKUP_ASYNC_IO to determine the source
of backup or restore bottlenecks and to determine the progress of
backup jobs.  V$BACKUP_SYNC_IO contains rows when the I/O is synchronous
to the process (or thread, on some platforms) performing the backup.
V$BACKUP_ASYNC_IO contains rows when the I/O is asynchronous. Asynchronous
I/O is obtained either with I/O processes or because it is supported by the
underlying operating system.


Information Sources:

Online Oracle Documentation
http://www.oracle.com/technetwork/indexes/documentation/index.html




RELATED DOCUMENTS
-----------------

Oracle Recovery Manager Users Guide
Oracle Backup and Recovery User's Guide Chapter 21 Tuning RMAN Performance

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/23135684/viewspace-748962/,如需轉載,請註明出處,否則將追究法律責任。

相關文章