oracle實驗記錄 (oracle 詳細分析redo(2))

fufuh2o發表於2009-10-23

*******關於log buffer的 設定************
 

 

 

Tuning the Log Buffer Size
Normally, tuning the log buffer size is a matter of careful balance, but you can have your cake and eat it too by using the hidden _log_io_size parameter.

How is log buffer space used?
The log buffer in the SGA is internally divided into blocks of the log block size. The value specified by the log_buffer parameter must be a multiple of the log block size. On some operating systems it is rounded up at instance start-up if necessary. Each block in the log buffer maps to a block in the current online log file, most of the time.

Consider for example the instance illustrated in the following figure. This instance has 100 log blocks in the log buffer. After a log switch, an SGA variable (b) is set to indicate that the base disk block for the log buffer is block 2. Another SGA variable (i) is used for the index into the log buffer for redo generation. This is shown as pointing to log buffer block 30 shortly after the log switch. A little later, after several log writes have occurred, the base disk block for the log buffer is disk block 82. This maps to log buffer block 81. Redo generation continues and wraps from log buffer block 100 to log buffer block 1. At the moment illustrated the index into the log buffer for redo generation is block 5. This maps to disk block 106. The log buffer is used cyclically in this way until i maps to the last block in the log file.

 

When does LGWR write?
There are two classes of LGWR writes: background writes, and sync writes. The distinction is merely as to whether another process has to wait for the log write to complete.

A background write by LGWR can be triggered in either of two ways. Whenever a process allocates space in the log buffer, the number of used log buffer blocks is calculated. This calculation is based on the two SGA variables mentioned above, and is performed under the protection of the redo allocation latch. If the number of used blocks is greater than or equal to the number of blocks specified by the _log_io_size parameter, and if LGWR is not already active, then LGWR is posted to perform. a background write. The default value for _log_io_size is 1/3 of the log buffer expressed in log blocks (with an upper bound equivalent to 1M under Oracle8). If more than this number of log buffer blocks are used, then the redo allocation latch is released and the redo writing latch is taken to check whether LGWR is already active.

When LGWR is waiting to be posted, it sleeps on an rdbms ipc message wait with a timeout of 3 seconds (as does DBWn). A background write may also be performed if this sleep times out and LGWR finds that there is some redo available to write.

The most common cause of log sync writes is the end of a transaction. When a process finishes a transaction, normally with a commit, it generates a commit marker in the redo stream. However, the transaction is not recoverable until the log block containing that commit marker has been flushed to disk. Therefore, before the process finishing the transaction can continue, it must wait for LGWR to flush that log block to disk. That process therefore posts LGWR and then sleeps on a log file sync wait with a timeout of 1 second. If several commits occur in distinct transactions before LGWR wakes up, then the commit markers are all flushed to disk in a single sync write. This is sometimes called a group commit .

The only other process to wait for log sync writes is DBWn. DBWn cannot write a database block to disc if the redo for the most recent change to that block has not yet been flushed to the redo log file. Otherwise, if the instance were to crash before that redo could be written, there would be no way to rollback those uncommitted changes, because the redo for the corresponding undo would not yet have been written either, because it would have been part of the same unwritten redo entry.

To prevent this, Oracle maintains a record of the most recent change to each current mode database block in the buffer header structure. This is stored as a redo block address (RBA) comprised of the log file sequence number, the log file block number and an offset into the block in bytes. There are three redo block addresses in the buffer header structure - the low RBA, the recovery RBA and the high RBA. It is the high RBA that is used at this point. Before writing a batch of database blocks, DBWn finds the highest high redo block address that needs to be synced before the batch can be written. DBWn then takes the redo allocation latch to ensure that the required redo block address has already been written by LGWR, and if not, it posts LGWR and sleeps on a log file sync wait.

What if the log buffer is too small?
If the log buffer is too small, then log buffer space waits will be seen during bursts of redo generation. LGWR may not begin to write redo until the _log_io_size threshold (by default, 1/3 of the log buffer or 1M whichever is less) has been exceeded, and the remainder of the log buffer may be filled before LGWR can complete its writes and free some space in the log buffer.

Ideally, the log buffer should be large enough to cope with all bursts of redo generation, without any log buffer space waits. Commonly, the most severe bursts of redo generation occur immediately after a log switch, when redo generation has been disabled for some time, and there is a backlog of demand for log buffer space.

Can the log buffer be too big?
However, if the log buffer is very big, then the default _log_io_size threshold will be big also, and so background writes may seldom be triggered. This means that all the redo will have to be flushed by sync writes, and so log file sync waits will take longer than otherwise. This impacts commit response time, and possibly DBWn performance as well.

Setting _log_io_size
Of course, it is not necessary to carefully balance log buffer space waits against log file sync waits. You can have both a large log_buffer setting to avoid log buffer space waits, and a relatively small _log_io_size setting to minimize log file sync waits and reduce their duration.

However, _log_io_size must not be so small as to keep LGWR unduly active. LGWR uses the redo allocation latch both before and after each write. If LGWR is over active, then there is increased risk of redo allocation latch contention. An over active LGWR also wastes CPU time by performing multiple small writes, and by running more often which involves extra context switches. This can impact LGWR's operating system scheduling priority, which in turn degrades LGWR performance even further.

For raw log files, or file system based log files with direct I/O enabled, the optimal setting for _log_io_size is normally just below the maximum physical I/O size supported by the operating system (device driver). For file system based log files without direct I/O, a smaller setting normally works better.

Note that the _log_io_size setting must be specified in log blocks, whereas the log_buffer parameter is set in bytes. Note further that setting _log_io_size is no excuse for having a wastefully large log_buffer setting. It is very rare to require a log buffer more than a few hundred kilobytes in size.


大致翻譯下:
LGWR 分為2種寫,一種為 後臺寫,一種為同步寫 ,後臺寫 在4種情況下發生 1.每3S LGWR啟動一次,2.若dbwn啟動時發現dirty block對應的redo entries還沒寫入redofile,則先觸發lgwr將redo entries寫入redofile,3._log_io_size控制 1/3 log buffer寫入(使用的日誌緩衝數超過了該值,那麼redo allocation latch將會被釋放並且將使用redo writing latch檢查lgwr是否為active。
),4.log buffer 中redo 1m,同步寫只有一種情況 COMMIT.

 

關於rba(redo block address)
the log file sequence number (4 bytes)
the log file block number (4 bytes)
the byte offset into the block at which the redo record starts (2 bytes)

rba:由日誌檔案序號,日誌檔案block號,redo record在 redo block中的起始位置


RBAs are used in the following important ways.

With respect to a dirty block in the buffer cache, the low RBA is the address of the redo for the first change that was applied to the block since it was last clean, and the high RBA is the address of the redo for the most recent change to have been applied to the block.
Dirty buffers are maintained on the buffer cache checkpoint queues in low RBA order. The checkpoint RBA is the point up to which DBWn has written buffers from the checkpoint queues if incremental checkpointing is enabled -- otherwise it is the RBA of last full thread checkpoint. The checkpoint RBA is copied into the checkpoint progress record of the controlfile by the checkpoint heartbeat once every 3 seconds. Instance recovery, when needed, begins from the checkpoint RBA recorded in the controlfile. The target RBA is the point up to which DBWn should seek to advance the checkpoint RBA to satisfy instance recovery objectives.

The on-disk RBA is the point up to which LGWR has flushed the redo thread to the online log files. DBWn may not write a block for which the high RBA is beyond the on-disk RBA. Otherwise transaction recovery (rollback) would not be possible, because the redo needed to undo a change is always in the same redo record as the redo for the change itself.

 

當user update一個塊 ,比如塊A,block a成為dirty block, oracle為這個block a 產生redo record ,此redo record在 logfile中的 位置就是 rba,過一會 user 又一個update更新block a ,block此時已經是dirty block, 又會為它生成一條redo record,則第一個update對應的 redo record 的rba叫做block a的 lrba(low redo block address),第2個update的redo record對應的 rba叫做 hrba(high redo block address),lrba-hrba就是 檢查點佇列上 dirty block 的 順序(checkpoint queue)

dbwn 寫 dirty bufer時候 ,dbwn會找到需要同步的high rba,然後獲得redo allocation latch 保證要寫入datafile的塊已經寫入(用high rba與 on disk rba相比),否則觸發lgwr(8I前在log file sync事件上等待,8i後不需要了)

ON DISK RBA:就是 redo log file中 最後一條redo record 的 rba,如果dbwr 要寫的dirty block的 rba比這個大,證明 對應的redo record 還沒寫入disk,要觸發lgwr


lgwr幾種情況具體解釋:

1. 3S 自動觸發lgwr
lgwr處於空閒狀態,將依賴於rdbms ipc message 等待,處於修面狀態,當 3S後 出現超時, 如果lgwr發現有要redo 寫出,那麼lgwr將執行寫操作 log file parallel write等待事件將會出現(LGWR的一個專屬事件,將log buffer寫到log file group member時 lgwr程式在該事件上等待完成)

SQL> alter system set event='10046 trace name context forever,level 12' scope=sp
file;

系統已更改。

SQL> startup force
ORA-32004: obsolete and/or deprecated parameter(s) specified

SQL> show parameter background_dump_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest                 string      F:\ORACLE\PRODUCT\10.2.0\ADMIN
                                                 \XHTEST\BDUMP
檢視lgwr的trace

 


WAIT #0: nam='rdbms ipc message' ela= 16934 timeout=204 p2=0 p3=0 obj#=-1 tim=17681800618
WAIT #0: nam='log file parallel write' ela= 2322 files=1 blocks=1 requests=1 obj#=-1 tim=17681820724
WAIT #0: nam='rdbms ipc message' ela= 889179 timeout=197 p2=0 p3=0 obj#=-1 tim=17682733749
WAIT #0: nam='log file parallel write' ela= 304 files=1 blocks=2 requests=1 obj#=-1 tim=17682734184
WAIT #0: nam='rdbms ipc message' ela= 14637 timeout=108 p2=0 p3=0 obj#=-1 tim=17682750577
WAIT #0: nam='log file parallel write' ela= 314 files=1 blocks=2 requests=1 obj#=-1 tim=17682750990
WAIT #0: nam='rdbms ipc message' ela= 3228 timeout=107 p2=0 p3=0 obj#=-1 tim=17682755665
WAIT #0: nam='log file parallel write' ela= 700 files=1 blocks=2 requests=1 obj#=-1 tim=17682756449
WAIT #0: nam='rdbms ipc message' ela= 3745 timeout=107 p2=0 p3=0 obj#=-1 tim=17682761894


可以看到 這2個事件交替出現,是3S 發生一次lgwr寫

 


SQL> alter system set event='10046 trace name context off' scope=spfile;

系統已更改。

SQL> startup force


2.事務提交

commit or rollback :發出這兩條命令的時候將 是同步提交,發生lgwr
當事務提交時 在這些redo沒有寫入 redo log file時,是不能恢復的,oracle在 redo stream中記錄一個 提交標記,在事務返回成功標誌給使用者前,必須等待lgwr寫完成,
程式通知lgwr寫執行,並以log file sync等待事件 休眠,超時1S


SQL> set linesize 132
SQL> column name format a30
SQL> column value format a25
SQL> select
  2    x.ksppinm  name,
  3    y.ksppstvl  value,
  4    y.ksppstdf  isdefault,
  5    decode(bitand(y.ksppstvf,7),1,'MODIFIED',4,'SYSTEM_MOD','FALSE')  ismod,
  6    decode(bitand(y.ksppstvf,2),2,'TRUE','FALSE')  isadj,x. KSPPDESC
  7  from
  8    sys.x$ksppi x,
  9    sys.x$ksppcv y
 10  where
 11    x.inst_id = userenv('Instance') and
 12    y.inst_id = userenv('Instance') and
 13    x.indx = y.indx and
 14    x.ksppinm like '%_&par%'
 15  order by
 16    translate(x.ksppinm, ' _', ' ')
 17  /
輸入 par 的值:  wait_for_sync
原值   14:   x.ksppinm like '%_&par%'
新值   14:   x.ksppinm like '%_wait_for_sync%'

NAME                           VALUE                     ISDEFAULT ISMOD      IS
ADJ
------------------------------ ------------------------- --------- ---------- --
---
KSPPDESC
--------------------------------------------------------------------------------
----------------------------------------------------
_wait_for_sync                 TRUE                      TRUE      FALSE      FA
LSE
wait for sync on commit MUST BE ALWAYS TRUE

 

_wait_for_sync   引數設定為false 可以 避免 redo file sync等待,提交的程式不會等待 lgwr完成.


3.log buffer 1/3滿

SQL> set linesize 132
SQL> column name format a30
SQL> column value format a25
SQL> select
  2    x.ksppinm  name,
  3    y.ksppstvl  value,
  4    y.ksppstdf  isdefault,
  5    decode(bitand(y.ksppstvf,7),1,'MODIFIED',4,'SYSTEM_MOD','FALSE')  ismod,
  6    decode(bitand(y.ksppstvf,2),2,'TRUE','FALSE')  isadj,x. KSPPDESC
  7  from
  8    sys.x$ksppi x,
  9    sys.x$ksppcv y
 10  where
 11    x.inst_id = userenv('Instance') and
 12    y.inst_id = userenv('Instance') and
 13    x.indx = y.indx and
 14    x.ksppinm like '%_&par%'
 15  order by
 16    translate(x.ksppinm, ' _', ' ')
 17  /
輸入 par 的值:  log_io_size
原值   14:   x.ksppinm like '%_&par%'
新值   14:   x.ksppinm like '%_log_io_size%'

NAME                           VALUE                     ISDEFAULT ISMOD      IS
ADJ
------------------------------ ------------------------- --------- ---------- --
---
KSPPDESC
--------------------------------------------------------------------------------
----------------------------------------------------
_log_io_size                   0                         TRUE      FALSE      FA
LSE
automatically initiate log write if this many redo blocks in buffer


default為0表示dirty redo block佔用log buffer 1/3時啟用,如果這個值改為非0,若該引數值不大於log buffer 1/2時,該值作為
啟動lgwr的一個啟動值,若大於 log buffer 1/2 則忽略設定的值以 log buffer  的1/2 為 lGWR的啟動條件.


4.dbwr寫前
dbwr要寫出的 dirty block的hight rba超過 lgwr的 on-disk rba , dbwr通知lgwr執行寫出,(8I 前此時dbwr 等待log file sync事件,8i後 dbwr將這些要寫到disk datafile的dirty block放入一個延遲佇列,同時通知lgwr寫出,這樣 dbwr可以繼續執行,無需等待。

 

 

 


關於log buffer space
當會話由於日誌緩衝區空間不足而無法將重做日誌條目複製到日誌緩衝區時,會話將在log buffer space事件上等待。LGWR負責寫出重做條目,騰出日誌緩衝區空間。
當會話必須等待日誌緩衝區中的空間變成可用以寫入新的資訊時產生該事件。
LGWR程式週期性地從日誌緩衝區寫入重做日誌檔案,使得日誌緩衝區可以重複可以。
該等待事件表示:應用程式生成重做日誌的速度比LGWR程式將其寫入重做日誌檔案的速度快。

SQL> conn / as sysdba
已連線。
SQL> desc v$sesstat;
 名稱                                      是否為空? 型別
 ----------------------------------------- -------- -----------------

 SID                                                NUMBER
 STATISTIC#                                         NUMBER
 VALUE                                              NUMBER
SQL> desc v$statname;
 名稱                                      是否為空? 型別
 ----------------------------------------- -------- -----------------

 STATISTIC#                                         NUMBER
 NAME                                               VARCHAR2(64)
 CLASS                                              NUMBER
 STAT_ID                                            NUMBER

連線一個這兩個表

SQL> ed
已寫入 file afiedt.buf

  1* select sid,value from v$sesstat s where statistic#=(select statistic# from
v$statname t where name='redo buffer allocation retries')
SQL> /

       SID      VALUE
---------- ----------
       137          0
       146          0
       148          0
       150          0
       156          0
       157          0
       159          0
       160          0
       161          0
       162          0
       163          0

       SID      VALUE
---------- ----------
       164          0
       165          0
       166          0
       167          0
       168          0
       169          0
       170          0

已選擇18行。
  1* select name,value,class from v$sysstat s where statistic#=(select statistic
# from v$statname t where name='redo buffer allocation retries')
SQL> /

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
     CLASS
----------
redo buffer allocation retries                                            0
         2


SQL> col name format a40
SQL> /

NAME                                          VALUE      CLASS
---------------------------------------- ---------- ----------
redo buffer allocation retries                    0          2


以上 查詢可以看到 session procss或 system級 必須等待 空間次數(等待lgwr flush  log buffer)

redo buffer allocation retries :表示再次嘗試從log buffer中分空間的次數(最好為0),process第一次沒請求成功 會觸發LGWR 然後等待完成或其它PROCESS 已經觸發而 這個程式等待lgwr完成,完成後再次嘗試從log buffer中分配空間
用 redo buffer allocation retries和redo entries可以計算出從PGA copy change vector到SGA LOG BUFFER 時必須等待的重做記錄的數量所佔的比例
最好為0 或<1%如果>1%且不斷邊大 說明從PGA copy change vector到SGA LOG BUFFER時必須等待log buffer空的日誌塊,可以考慮加大log buffer or  提高lgwr寫效率


所以設定 log buffer 最好是lgwr寫到online redofile時,log buffer還有剩餘空間可以 讓 其它程式使用,當lgwr完成後 log buffer不要有太多剩餘空間

LGWR 程式啟動 受 下面這個隱藏引數影響,default為0表示dirty redo block佔用log buffer 1/3時啟用,如果這個值改為非0,若該引數值不大於log buffer 1/2時,該值作為
啟動lgwr的一個啟動值,若大於 log buffer 1/2 則忽略設定的值以 log buffer  的1/2 為 lGWR的啟動條件

 

 

_log_io_size                   0                         TRUE      FALSE      FA
LSE
automatically initiate log write if this many redo blocks in buffer


另外 只要log buffer超過1M 那麼lgwr就會啟動


log buffer大小設定
1.5*(平均每個事務產生的redo record大小*每秒事務的提交量)


SQL> select to_number(value,99999999999) from v$sysstat where name='user commits';

TO_NUMBER(VALUE,99999999999)
----------------------------
                         1631

找到user commit次數

SQL> select (sysdate - startup_time)*24*60*60 as seconds from v$instance;

   SECONDS
----------
     19957

SQL> select 1631/19957 from dual;

1631/19957
----------
 .08172571
 實驗環境,可以看到每秒提交事務的數量非常少.

SQL> select value  from v$sysstat where name = 'redo blocks written
';

value
----------
     17494


  1* select a.redoblocks/b.trancount from (select value redoblocks from v$syssta
t where name='redo blocks written') a ,(select value trancount from v$sysstat wh
ere name='user commits') b
SQL> /

A.REDOBLOCKS/B.TRANCOUNT
------------------------
              10.3542499
計算 出平均 每個事務產生的redo record大小 (按塊)

SQL> select 1.5*10.3542499* .08172571 from dual;

1.5*10.3542499*.08172571
------------------------
              1.26931264

 

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/12020513/viewspace-617286/,如需轉載,請註明出處,否則將追究法律責任。

相關文章