Oracle Redo and Undo

chenoracle發表於2020-02-02

Oracle Redo and Undo

Oracle Core Essential Internals for DBAs and Developers

在我看來,Oracle 最重要的特性,是 Oracle 6 版本出現的 change vector 描述資料塊改變的機制,也是Redo undo 的核心。

but in my opinion the single most important feature of Oracle is one  that first appeared in version 6: the change vector, a mechanism for describing changes to data blocks, the heart of redo and undo.

 

Basic Data Change   基礎資料改變

Oracle 有趣的一個特點是會記錄兩次資料,第一次是將最近的資料寫入到 data files( 為了提高效率,最新的資料記錄在記憶體中,批量重新整理到 data files) ,第二次是將資料寫入到 redo log files( 用來描述如何重建 data files)

One of the strangest features of an Oracle database is that it records your data twice. One copy of the   data exists in a set of data files which hold something that is nearly the latest, up-to-date version of your data (although the newest version of some of the data will be in memory, waiting to be copied to disc); the other copy of the data exists as a set of instructions—the redo log files—telling you how to re-create   the content of the data files from scratch.

 

1.1 The Approach 方法

在資料庫更改時,你發出了更改資料的命令,Oracle並不是馬上在data file(如果資料在記憶體中,去data buffer cache查詢資料)裡找到對應的資料,然後去修改對應的資料。而是通過如下四個關鍵步驟來完成資料的修改:

1.建立如何修改資料的描述。

2.建立如何回退資料的描述。

3.建立如何產生回退資料描述的描述。

4.修改資料。

Under the Oracle approach to data change, when you issue an instruction to change an item of data, Oracle doesn’t just go to a data file (or the in-memory copy if the item happens to be buffered), find the item, and change it. Instead, Oracle works through four critical steps to make the change happen.

Stripped to the bare minimum of detail, these are

1. Create a description of how to change the data item.

2. Create a description of how to re-create the original data item if needed.

3. Create a description of how to create the description of how to re-create the original data   item.

4. Change the data item.

 

其中第三步看上去不好理解,下面換一種方式描述這四個步驟:

1.建立生成data block對應的Redo change vector

2.建立Undo record,在Undo表空間生成Undo Block,用於資料回退。

3.建立生成undo record對應的Redo change vector

4.修改資料

The tongue-twisting nature of the third step gives you some idea of how convoluted the mechanism is, but all will become clear. With the substitution of a few technical labels in these steps, here’s another way of describing the actions of changing a data block:

1. Create a redo change vector describing the change to the data block.

2. Create an undo record for insertion into an undo block in the undo tablespace.

3. Create a redo change vector describing the change to the undo block.

4. Change the data block.

 

具體技術細節、執行順序和Oracle版本、事務的性質、執行變更命令前各資料塊的狀態等有關。

The exact sequence of steps and the various technicalities around the edges vary depending on the version of Oracle, the nature of the transaction, how much work has been done so far in the transaction, what the states of the various database blocks were before you executed the instruction, whether or not you’re looking at the first change of a transaction, and so on.

 

1.2 An Example 例:

從一個最簡單的資料更改示例開始,更新OLTP事務中間的一行,該事務已經更新了一組分散的行。事實上,在歷史(以及最普遍的)案例中,步驟的順序與我在前一節中列出的順序不同。

I’m going to start with the simplest example of a data change, which you might expect to see as you updated a single row in the middle of an OLTP transaction that had already updated a scattered set of rows. In fact, the order of the steps in the historic (and most general) case is not the order I’ve listed in the preceding section.

這些步驟實際上是按照3、1、2、4的順序進行的,在修改undo block和data block之前,將這兩個重做更改向量組合成一個重做更改記錄並複製到redo log (buffer)中。這意味著更準確的版本是:

The steps actually go in the order 3, 1, 2, 4, and the two redo change vectors are combined into a single redo change record and copied into the redo log (buffer) before the undo block and data block are modified (in that order). This means a slightly more accurate version of my list of actions would be:

1.建立生成undo record對應的Redo change vector (描述undo block的改變)。

2.建立生成data block對應的Redo change vector (描述redo block的改變)。

3.建立生成undo record和data block的Redo change vector 合併成一個Redo record寫入log buffer(便於重做等)。

4.建立Undo record寫入Undo block(便於事務回退等)。

5.更改資料。

1. Create a redo change vector describing how to insert an undo record into an undo block.

2. Create a redo change vector for the data block change.

3. Combine the redo change vectors into a redo record and write it to the log buffer.

4. Insert the undo record into the undo block.

5. Change the data block.

下面是一個小示例,取自一個執行Oracle 9.2.0.8的系統(在上一個版本中,很容易建立該機制的最一般示例)。

Here’s a little sample, taken from a system running Oracle 9.2.0.8 (the last version in which it’s easy to create the most generic example of the mechanism).

我們將執行一個update語句,通過在兩個表塊之間來回跳轉來更新五行,並在更新前後將各種資訊位轉儲到流程跟蹤檔案中。

We’re going to execute an update statement that updates five rows by jumping back and forth between two table blocks, dumping various bits of information into our process trace file before and after the update.

我需要使我的更新有點複雜,因為我希望示例儘可能簡單,同時避免一些特殊情況。

I need to make my update a little bit complicated because I want the example to be as simple as possible while avoiding a few “special case” details.

 

我編寫的程式碼將更新表的第一個塊中的第三、第四和第五行 資料 並且在每更新一條後 更新第二個塊中的一行 ( core_demo_02.sql ),它會更改每個記錄的第三列(varchar2型別的欄位),將其由xxxxxx(小寫6個字元)更改為YYYYYYYYYY(大寫10個字元)。

The code I’ve written will update the third, fourth, and fifth rows in the first block of a table but will update a row in the second block of the table between each of these three updates (see core_demo_02.sql   in the code library on www.apress.com), and it’ll change the third column of each row—a varchar2()   column—from xxxxxx (lowercase, six characters) to YYYYYYYYYY (uppercase, ten characters).

 

https://github.com/Apress/oracle-core-esntl-internals-for-dbas-devs

--- 使用 core_demo_02.sql 指令碼如下:

---core_demo_02.sql

 

---1 準備

start  setenv

set  timing off

 

execute  dbms_random.seed ( 0 )

 

drop   table  t1 ;

 

begin

  execute immediate 'purge recyclebin';

exception

  when others then

    null;

end;

 

begin

  dbms_stats.set_system_stats ( 'MBRC' ,   8 );

  dbms_stats.set_system_stats ( 'MREADTIM' ,   26 );

  dbms_stats.set_system_stats ( 'SREADTIM' ,   12 );

  dbms_stats.set_system_stats ( 'CPUSPEED' ,   800 );

exception

   when   others   then

     null ;

end ;

 

begin

   execute   immediate   'begin dbms_stats.delete_system_stats; end;' ;

exception

   when   others   then

     null ;

end ;

 

begin

   execute   immediate   'alter session set "_optimizer_cost_model"=io' ;

exception

   when   others   then

     null ;

end ;

/

 

---2 建立表和索引

create   table  t1

as

select

   2   *   rownum   -   1        id ,

   rownum         n1 ,

   cast ( 'xxxxxx'   as   varchar2 ( 10 ))   v1 ,

   rpad ( '0' , 100 , '0' )     padding

from

  all_objects

where

   rownum   <=   60

union   all

select

   2   *   rownum        id ,

   rownum         n1 ,

   cast ( 'xxxxxx'   as   varchar2 ( 10 ))   v1 ,

   rpad ( '0' , 100 , '0' )     padding

from

  all_objects

where

   rownum   <=   60

;

 

create   index  t1_i1 on  t1 ( id );

 

---3 收集統計資訊

begin

  dbms_stats.gather_table_stats ( ownname     =>   user ,

                                tabname     =>   'T1' ,

                                method_opt =>   'for all columns size 1' );

end ;

 

---4 檢視錶佔用的塊情況,和每一個塊有多少條資料 每一個塊都有60條記錄

select  

  dbms_rowid.rowid_block_number ( rowid )   block_number ,  

   count (*)         rows_per_block

from  

  t1

group   by  

  dbms_rowid.rowid_block_number ( rowid )

order   by

  block_number

;

 

BLOCK_NUMBER ROWS_PER_BLOCK

------------ --------------

      114153    60

      114154    60

 

---5 轉儲資料塊       

alter   system   switch   logfile ;

execute  dbms_lock.sleep ( 2 )

 

spool core_demo_02.lst

 

---dump_seg儲存過程需要執行c_dump_seg.sql生成

---https://github.com/Apress/oracle-core-esntl-internals-for-dbas-devs/blob/master/ch_03/c_dump_seg.sql

execute  dump_seg ( 't1' )

 

---6 更新資料

update

   /*+ index(t1 t1_i1) */

  t1

set

  v1 =   'YYYYYYYYYY'

where

   id   between   5   and   9

;

 

---7 轉儲更新塊之後的資料塊和undo塊

alter   system   checkpoint ;

execute  dump_seg ( 't1' )

---dump_undo_block儲存過程

---https://github.com/Apress/oracle-core-esntl-internals-for-dbas-devs/blob/master/ch_02/c_dump_undo_block.sql

execute  dump_undo_block

 

---8 轉儲redo塊

rollback ;

commit ;

---dump_log儲存過程

---https://github.com/Apress/oracle-core-esntl-internals-for-dbas-devs/blob/master/ch_02/c_dump_log.sql

execute  dump_log

 

spool off

 

[oracle@cjcos trace]$ pwd

/u01/app/oracle/diag/rdbms/cjcdb/cjcdb/trace

[oracle@cjcos trace]$ vim cjcdb_ora_21778.trc

下面是更新前後塊中第五行的轉儲資訊:

Here’s a symbolic dump of the fifth row in the block before and after the update:

 

更新前 第三列(col 2)長度是6,資料是78(x的16進位制ASCII碼是78)

tab 0, row 4, @0x1d3f

tl: 117 fb: --H-FL-- lb: 0x0 cc: 4

col 0: [ 2] c1 0a

col 1: [ 2] c1 06

col 2: [ 6] 78 78 78 78 78 78

col 3: [100]

30 30 30 30 30 30 30 30 … 30 30 30 30 30 (for 100 characters)

 

更新後 第三列(col 2)長度是10,資料是59(Y的16進位制ASCII碼是59)

tab 0, row 4, @0x2a7

tl: 121 fb: --H-FL-- lb: 0x2 cc: 4

col 0: [ 2] c1 0a

col 1: [ 2] c1 06

col 2: [10] 59 59 59 59 59 59 59 59 59 59

col 3: [100]

30 30 30 30 30 30 30 30 … 30 30 30 30 30 (for 100 characters)

 

我們可以看到第三列(col2)長度變成了10,是10個59(Y的十六進位制ASCII碼是59),同時行地址由@0x1d3f變成了@0x2a7,說明這一行的空間容不下新增的資料,換了新地址。

As you can see, the third column (col 2:) of the table has changed from a string of 78s (x) to a longer string of 59s (Y). Since the update increased the length of the row, Oracle had to copy it into the block’s free space to make the change, which is why its starting byte position has moved from @0x1d3f to @0x2a7. It is still row 4 (the fifth row) in the block, though; if we were to check the block’s row directory, we would see that the fifth entry has been updated to point to this new row location.

同時,我們能看到lb(lock byte)由0x0變成了0x2,表明這條記錄被該塊事務槽列表中的第二個事務槽所標識的事務鎖定。事務槽可以在塊首部看到。我們將在第三章更深入地討論它。

I dumped the block before committing the change, which is why you can see that the lock byte (lb:) has changed from 0x0 to 0x2—the row is locked by a transaction identified by the second slot in the block’s interested transaction list (ITL). We will be discussing ITLs in more depth in Chapter 3.

 

我們來看看不同的變化向量。首先,從current redo log file轉儲中,我們可以檢查變化向量,描述我們對錶做了什麼:

So let’s look at the various change vectors. First, from a symbolic dump of the current redo log file, we can examine the change vector describing what we did to the table:

 

TYP:0 CLS: 1 AFN:11 DBA:0x02c0018a SCN:0x0000.03ee485a SEQ: 2 OP:11.5

KTB Redo

op: 0x02 ver: 0x01

op: C uba: 0x0080009a.09d4.0f

KDO Op code: URP row dependencies Disabled

 xtype: XA bdba: 0x02c0018a hdba: 0x02c00189

itli: 2 ispac: 0 maxfr: 4863

tabn: 0 slot: 4(0x4) flag: 0x2c lock: 2 ckix: 16

ncol: 4 nnew: 1 size: 4

col 2: [10] 59 59 59 59 59 59 59 59 59 59

可以看到,第五行URP(更新行塊),第六行告訴我們正在更新塊的塊地址(bdba)和物件的段頭塊(hdba)

I’ll pick out just the most significant bits of this change vector. You can see that the Op code: in line 5 is URP (update row piece). Line 6 tells us the block address of the block we are updating (bdba:) and the segment header block for that object (hdba:).

在第7行,我們看到執行這個更新的事務使用的是ITL條目2 (itli:),它是對tabn: 0 slot: 4的更新(第一個表中的第5行;請記住,叢集中的塊可以容納來自許多表的資料,因此每個塊必須包含一個列表,用於標識塊中有行的表)。

In line 7 we see that the transaction doing this update is using ITL entry 2 (itli:), which confirms

what we saw in the block dump: it’s an update to tabn: 0 slot: 4 (fifth row in the first table; remember that blocks in a cluster can hold data from many tables, so each block has to include a list identifying the tables that have rows in the block).

最後,在最後兩行中,我們看到該行有四列(ncol:),其中我們更改了一列(nnew:),將行長度(size:)增加了4個位元組,並且我們將第2列更改為YYYYYYYYYY。

Finally, in the last two lines, we see that the row has four columns (ncol:), of which we are changing one (nnew:), increasing the row length (size:) by 4 bytes, and that we are changing column 2 to YYYYYYYYYY.

接下來我們需要看到的是如何 撤銷我們的變更 。這將以undo record撤銷記錄的形式出現,從相關的undo block塊中轉儲。在第3章中將介紹尋找正確的undo block的方法。下面的文字顯示了來自塊轉儲的相關記錄:

The next thing we need to see is a description of how to put back the old data. This appears in the form of an undo record, dumped from the relevant undo block. The methods for finding the correct undo block will be covered in Chapter 3. The following text shows the relevant record from the symbolic block dump:

 

*-----------------------------

* Rec #0xf slt: 0x1a objn: 45810(0x0000b2f2) objd: 45810 tblspc: 12(0x0000000c)

* Layer: 11 (Row) opc: 1 rci 0x0e

Undo type: Regular undo Last buffer split: No

Temp Object: No

Tablespace Undo: No

rdba: 0x00000000

*-----------------------------

KDO undo record:

KTB Redo

op: 0x02 ver: 0x01

op: C uba: 0x0080009a.09d4.0d

KDO Op code: URP row dependencies Disabled

 xtype: XA bdba: 0x02c0018a hdba: 0x02c00189

itli: 2 ispac: 0 maxfr: 4863

tabn: 0 slot: 4(0x4) flag: 0x2c lock: 0 ckix: 16

ncol: 4 nnew: 1 size: -4

col 2: [ 6] 78 78 78 78 78 78

 

再次,我將忽略一些細節,只是指出這個undo記錄的重要組成部分(對我們來說) 我們看到的行大小減少4個位元組,第2列是xxxxxx。

Again, I’m going to ignore a number of details and simply point out that the significant part of this undo record (for our purposes) appears in the last five lines and comes close to repeating the content of the redo change vector, except that we see the row size decreasing by 4 bytes as column 2 becomes xxxxxx.

但是這是一個undo記錄,寫在undo塊中,儲存在其中一個資料檔案的undo表空間中,而且,正如我前面指出的,Oracle儲存所有內容的兩個副本,一個在資料檔案中,一個在重做日誌檔案中。因為我們已經將一些內容放入了資料檔案中(即使它在undo表空間中),所以我們需要建立一個關於我們所做事情的描述,並將該描述寫入重做日誌檔案中。我們需要另一個重做改變向量,它是這樣的:

But this is an undo record, written into an undo block and stored in the undo tablespace in one of the data files, and, as I pointed out earlier, Oracle keeps two copies of everything, one in the data files and one in the redo log files. Since we’ve put something into a data file (even though it’s in the undo tablespace), we need to create a description of what we’ve done and write that description into the redo log file. We need another redo change vector, which looks like this:

 

TYP:0 CLS:36 AFN:2 DBA:0x0080009a SCN:0x0000.03ee485a SEQ: 4 OP:5.1

ktudb redo: siz: 92 spc: 6786 flg: 0x0022 seq: 0x09d4 rec: 0x0f

 xid: 0x000a.01a.0000255b

ktubu redo: slt: 26 rci: 14 opc: 11.1 objn: 45810 objd: 45810 tsn: 12

Undo type: Regular undo Undo type: Last buffer split: No

Tablespace Undo: No

 0x00000000

KDO undo record:

KTB Redo

op: 0x02 ver: 0x01

op: C uba: 0x0080009a.09d4.0d

KDO Op code: URP row dependencies Disabled

 xtype: XA bdba: 0x02c0018a hdba: 0x02c00189

itli: 2 ispac: 0 maxfr: 4863

tabn: 0 slot: 4(0x4) flag: 0x2c lock: 0 ckix: 16

ncol: 4 nnew: 1 size: -4

col 2: [ 6] 78 78 78 78 78 78

重做更改向量的下半部分看起來非常像撤消記錄,這並不奇怪,因為它畢竟是我們希望放入撤消塊的內容的描述。

The bottom half of the redo change vector looks remarkably like the undo record, which shouldn’t be a surprise as it is, after all, a description of what we want to put into the undo block.

重做改變向量的上半部分告訴我們下半部分去了哪裡,幷包含了一些關於它要 寫入 塊的塊頭資訊。

The top half of the redo change vector tells us where the bottom half goes, and includes some information about the block header information of the block it’s going into.

最重要的細節,就我們的目的而言,是 第一行的 DBA(資料塊地址),該標識塊0 x0080009a:如果你知道Oracle塊在十六進位制數字,你就會認識到這是 2號資料檔案 154 (新建立的資料庫中撤消表空間的檔案號)。

The most significant detail, for our purposes, is the DBA: (data block address) in line 1, which identifies block 0x0080009a: if you know your Oracle block numbers in hex, you’ll recognize that this is block 154 of data file 2 (the file number of the undo tablespace in a newly created database).

 

1.3 Debriefing 總結

那麼,到目前為止,我們取得了什麼進展?當我們更改一個資料塊時,Oracle會在undo塊中插入一個undo記錄,告訴我們如何逆轉這個更改。但是,對於資料庫中發生的每一個塊更改,Oracle都會建立一個redo change向量來描述如何進行更改,並且在進行更改之前建立這些向量。歷史上,它在建立“正向”更改向量之前建立了撤銷更改向量,因此,我前面描述的事件序列(參見圖2-1)如下:

So where have we got to so far? When we change a data block, Oracle inserts an undo record into an undo block to tell us how to reverse that change. But for every change that happens to a block in the database, Oracle creates a redo change vector describing how to make that change, and it creates the vectors before it makes the changes. Historically, it created the undo change vector before it created the “forward” change vector, hence, the following sequence of events (see Figure 2-1) that I described earlier occurs:

2 - 1。在事務中間進行 更新的事件序列

1.為撤銷記錄建立更改向量。

2.為資料塊建立更改向量。

3.合併更改向量並將重做記錄寫入重做日誌(緩衝區)。

4.將undo記錄插入undo塊中。

5.對資料塊進行更改。

Figure 2-1. Sequence of events for a small update in the middle of a transaction

1. Create the change vector for the undo record.

2. Create the change vector for the data block.

3. Combine the change vectors and write the redo record into the redo log (buffer).

4. Insert the undo record into the undo block.

5. Make the change to the data block.

當你看到這裡的前兩個步驟,當然,沒有理由相信我把它們按正確的順序排列。我所描述或拋棄的任何東西都不能說明這些行為一定是按這個順序發生的。但有一個小細節我現在可以告訴你,我省略了轉儲的改變向量,部分是因為 Oracle 10g以後情況會有不同 ,另一部分是因為如果你一開始就以錯誤的順序思考,對活動的描述會更容易理解。

When you look at the first two steps here, of course, there’s no reason to believe that I’ve got them in the right order. Nothing I’ve described or dumped shows that the actions must be happening in that order. But there is one little detail I can now show you that I omitted from the dumps of the change vectors, partly because things are different from 10g onwards and partly because the description of the activity is easier to comprehend if you first think about it in the wrong order.

到目前為止,我已經向你們展示了我們的兩個變化向量作為單獨的實體;如果我向你展示了這些變化向量進入重做日誌的完整圖片,你就會看到它們是如何組合成一個重做記錄的:

So far I’ve shown you our two change vectors only as individual entities; if I had shown you the complete picture of the way these change vectors went into the redo log, you would have seen how they were combined into a single redo record:

REDO RECORD - Thread:1 RBA: 0x00036f.00000005.008c LEN: 0x00f8 VLD: 0x01

SCN: 0x0000.03ee485a SUBSCN: 1 03/13/2011 17:43:01

CHANGE #1 TYP:0 CLS:36 AFN:2 DBA:0x0080009a SCN:0x0000.03ee485a SEQ: 4 OP:5.1

CHANGE #2 TYP:0 CLS: 1 AFN:11 DBA:0x02c0018a SCN:0x0000.03ee485a SEQ: 2 OP:11.5

在重做日誌中,更改向量是成對出現的,撤銷記錄的更改向量出現在相應的正向更改的更改向量之前。

It is a common (though far from universal) pattern in the redo log that change vectors come in matching pairs, with the change vector for an undo record appearing before the change vector for the corresponding forward change.

在檢視前面的重做記錄的基本內容時,值得注意的是第一行中的LEN:—這是重做記錄的長度:0x00f8 = 248位元組。

While we’re looking at the bare bones of the preceding redo record, it’s worth noting the LEN: figure in the first line—this is the length of the redo record: 0x00f8 = 248 bytes.

我們所做的只是在一行中將xxxxxx更改為yyyyyyyy,這將花費我們248位元組的日誌資訊。

 All we did was change xxxxxx to YYYYYYYYYY in one row and it cost us 248 bytes of logging information.

實際上,考慮到最終結果,這似乎是一個非常昂貴的操作:我們必須生成兩個重做更改向量並更新兩個資料庫塊來進行一個很小的更改,這看起來是我們需要執行的步驟的四倍。我們希望所有這些額外的工作都能得到合理的回報。

In fact, it seems to have been a very expensive operation given the net result: we had to generate two redo change vectors and update two database blocks to make a tiny little change, which   looks like four times as many steps as we need to do. Let’s hope we get a decent payback for all that extra work.

 

1.4 Summary of Observations 總結

在繼續之前,我們可以總結一下我們的觀察結果如下:在資料檔案中,我們對自己的資料所做的每一個更改都與Oracle建立的undo記錄相匹配(這也是對資料檔案的更改);與此同時,Oracle在redo日誌中放入瞭如何進行更改和如何進行自身更改的描述。

Before we continue, we can summarize our observations as follows: in the data files, every change we make to our own data is matched by Oracle with the creation of an undo record (which is also a change to a data file); at the same time Oracle puts into the redo log a description of how to make our change and how to make its own change.

您可能會注意到,由於資料可以“就地”更改,因此我們可以建立“無限”(即。,任意大)對單行資料的更改,但是如果不增加undo表空間的資料檔案,顯然不能記錄無限多的undo記錄;如果不不斷新增更多的redo日誌檔案,也不能在redo日誌中記錄無限多的更改。為了簡單起見,我們將推遲無限更改的問題,暫時假裝我們可以記錄儘可能多的撤消和重做記錄。

You might note that since data can be changed “in place,” we could make an “infinite” (i.e.,

arbitrarily large) number of changes to our single row of data, but we clearly can’t record an infinite number of undo records without growing the data files of the undo tablespace, nor can we record an infinite number of changes in the redo log without constantly adding more redo log files. For the sake of simplicity, we’ll postpone the issue of infinite changes and simply pretend for the moment that we can record as many undo and redo records as we need.

 

二:ACID

儘管我們在本章中不會討論事務,但值得一提的是事務系統的ACID原則,以及Oracle如何實現undo和redo,從而使Oracle能夠滿足這些需求。表2-1列出了ACID。

Although we’re not going to look at transactions in this chapter, it is, at this point, worth mentioning the ACID requirements of a transactional system and how Oracle’s implementation of undo and redo gives Oracle the capability of meeting those requirements.

Table 2-1.  The ACID Requirements 

下面的列表詳細介紹了表2-1中的各項內容:

The following list goes into more detail about each of the requirements in Table 2-1:

原子性:當我們進行變更資料時,我們建立一個undo記錄描述如何回退修改的資料。這意味著當我們在交易過程中,如果另一個使用者試圖檢視我們修改過的任何資料,可以指示他使用撤消記錄以檢視該資料的舊版本,從而使我們的工作不可見直到我們決定釋出(提交)它的那一刻。我們可以保證對方使用者要麼什麼都看不到,要麼什麼都看到。

Atomicity: As we make a change, we create an undo record that describes how to

reverse the change. This means that when we are in the middle of a transaction,

another user trying to view any data we have modified can be instructed to use the

undo records to see an older version of that data, thus making our work invisible

until the moment we decide to publish (commit) it. We can ensure that the other

user either sees nothing of what we’ve done or sees everything.

 

一致性:這個要求實際上是關於定義合法狀態的約束的資料庫。

我們可以認為undo records撤銷記錄的存在意味著 其他使用者被阻止檢視事務的增量情況,因此無法看到事務通過臨時的非一致性狀態到另一個一致性狀態,他們看到的要麼是開始事務之前的狀態,要麼是事務完成後的狀態,不會看到事務的中間狀態。

Consistency: This requirement is really about constraints defining the legal states

of the database; but we could also argue that the presence of undo records means

that other users can be blocked from seeing the incremental application of our

transaction and therefore cannot see the database moving from one legal state to

another by way of a temporarily illegal state—what they see is either the old state

or the new state and nothing in between. (The internal code, of course, can see all

the intermediate states—and take advantage of being able to see them—but the

end-user code never sees inconsistent data.)

 

隔離 :我們可以再次看到,undo records撤消記錄的可用性阻止了其他使用者看到我們如何更改資料,直到我們決定事務已經完成並提交 事務 。實際上,我們做得更好:undo records撤消的可用性意味著其他使用者不必在他們的事務的整個持續期間看到我們的事務的影響,即使我們在他們的事務的開始和結束之間開始和結束我們的事務。 (這不是Oracle中的預設隔離級別,但它是可用的隔離級別;請參閱“隔離級別”)當然,當兩個使用者試圖同時更改相同的資料時,我們確實會遇到令人困惑的情況;在有限時間的事務裡,完美的隔離是不可能的。

Isolation: Yet again we can see that the availability of undo records stops other

users from seeing how we are changing the data until the moment we decide that

our transaction is complete and commit it. In fact, we do better than that: the

availability of undo means that other users need not see the effects of our

transactions for the entire duration of their transactions, even if we start and end

our transaction between the start and end of their transaction. (This is not the

default isolation level in Oracle, but it is an available isolation level; see the

“Isolation Levels” sidebar.) Of course, we do run into confusing situations when

two users try to change the same data at the same time; perfect isolation is not

possible in a world where transactions have to take a finite amount of time.

 

永續性: 此特性歸功於 redo log重做日誌。 如何確保已完成的事務在系統故障後仍然存在?最直接的策略是在磁碟發生變化或“完成”事務的最後一步時,繼續向磁碟寫入任何更改。如果沒有重做日誌,這可能意味著在更改資料時要寫入大量隨機資料塊。想象一下,將10行插入到包含三個索引的order_lines表中;這可能需要31個隨機分佈的磁碟寫來對1個表塊和30個索引塊進行持久的更改。但是Oracle有redo重做機制,不需要在更改時寫入整個資料塊,而是準備一個小的更改描述,31個小的描述可能只會在需要確保有整個事務的永久記錄時(相對而言)寫入到日誌檔案的末尾。

Durability: This is the requirement that highlights the benefit of the redo log. How

do you ensure that a completed transaction will survive a system failure? The

obvious strategy is to keep writing any changes to disc, either as they happen or as

the final step that “completes” the transaction. If you didn’t have the redo log, this

could mean writing a lot of random data blocks to disc as you change them.

Imagine inserting ten rows into an order_lines table with three indexes; this could

require 31 randomly distributed disk writes to make changes to 1 table block and

30 index blocks durable. But Oracle has the redo mechanism. Instead of writing an

entire data block as you change it, you prepare a small description of the change,

and 31 small descriptions could end up as just one (relatively) small write to the

end of the log file when you need to make sure that you’ve got a permanent record

of the entire transaction. (We’ll discuss in Chapter 6 what happens to the 31

changed data blocks, and the associated undo blocks, and how recovery might

take place.)

 

2.1 ISOLATION LEVELS   隔離級別

Oracle提供了三種不同的隔離級別:read committed (預設), read only, 和serializable作為差異的簡要說明,請考慮以下場景:表t1有1條資料,表t2和t1有相同的表結構,我們有兩個會話按照下面順序執行:

Oracle offers three isolation levels: read committed (the default), read only, and serializable. As a brief sketch of the differences, consider the following scenario: table t1 holds one row, and table t2 is identical to t1 in structure. We have two sessions that go through the following steps in order:

1. Session 1: select from t1;

2. Session 2: insert into t1 select * from t1;

3. Session 2: commit;

4. Session 1: select from t1;

5. Session 1: insert into t2 select * from t1;

如果會話1的隔離級別是read committed,將會在第一次查詢到1條資料,第二次查詢到2條資料,最後插入兩條資料。

如果會話1的隔離級別是 read only ,將會在第一次查詢到1條資料,第二次查詢到1條資料,插入資料時報錯 “ORA-01456: may not perform insert/delete/update operation inside a READ ONLY transaction.”

如果會話1的隔離級別是 serializable ,將會在第一次查詢到1條資料,第二次查詢到1條資料,最後插入1條資料。

If session 1 is operating at isolation level read committed, it will select one row on the first select, select two rows on the second select, and insert two rows.

If session 1 is operating at isolation level read only, it will select one row on the first select, select one row on the second select, and fail with Oracle error “ORA-01456: may not perform insert/delete/update operation inside a READ ONLY transaction.”

If session 1 is operating at isolation level serializable, it will select one row on the first select, select one row on the second select, and insert one row.

 

REDO UNDO 機制不僅實現ACID的基本要求,而且在效能和可恢復性方面也具有優勢。

在討論 永續性 已經 介紹了redo 重做 效能 帶來的 好處 ,如果您想要一個有關UNDO撤消的效能優勢的示例,請考慮隔離-如果您的使用者需要同時更新資料,那麼如何執行需要幾分鐘才能完成的報告?在缺少UNDO撤消機制的情況下,您必須在允許錯誤結果和鎖定更改資料兩種情況之間進行選擇。這是您必須與其他資料庫產品一起做出的選擇。UNDO撤銷機制允許非常程度的併發,因為根據Oracle的營銷宣傳,“讀者不會阻塞寫,寫也不會阻塞讀者。”

就可恢復性而言(我們將在第6章中更詳細地檢查可恢復性),如果我們記錄對資料庫所做更改的完整列表,那麼原則上,我們可以從一個全新的資料庫開始,然後簡單地重新應用每個更改描述來複制原始資料庫的最新副本。實際上,當然,我們不(通常)從一個新的資料庫開始;而是對資料檔案進行定期備份,這樣我們只需要重播生成的總重做的一小部分,就可以更新副本資料庫。

Not only are the mechanisms for undo and redo sufficient to implement the basic requirements of ACID, they also offer advantages in performance and recoverability.

The performance benefit of redo has already been covered in the comments on durability; if you want an example of the performance benefits of undo, think about isolation—how can you run a report that takes minutes to complete if you have users who need to update data at the same time? In the absence of something like the undo mechanism, you would have to choose between allowing wrong results and locking out everyone who wants to change the data. This is a choice that you have to make with some other database products. The undo mechanism allows for an extraordinary degree of concurrency because, per Oracle’s marketing sound bite, “readers don’t block writers, writers don’t block readers.”

就可恢復性而言(我們將在第6章中更詳細地檢查可恢復性),如果我們記錄下我們對資料庫所做的全部更改,那麼我們可以從一個全新的資料庫開始,然後簡單地重新應用每個更改描述來複制原始資料庫的最新副本 。當然,實際上我們(通常)不會從一個新資料庫開始;相反,我們採取資料檔案的常規備份副本,這樣我們只需要重播重做的一小部分就可以使副本資料庫保持最新。

As far as recoverability is concerned (and we will examine recoverability in more detail in Chapter 6), if we record a complete list of changes we have made to the database, then we could, in principle, start with a brand-new database and simply reapply every single change description to reproduce an upto-date copy of the original database. Practically, of course, we don’t (usually) start with a new database; instead we take regular backup copies of the data files so that we need only replay a small fraction of the total redo generated to bring the copy database up to date.

 

三: Redo Simplicity  --- Redo的簡單性

我們處理redo的方法很簡單:就是不斷地生成redo records的stream,並以最快的速度將它們寫入到redo log。最初進入共享記憶體的一個區域,稱為 redo log buffer 重做日誌緩衝區。之後在從 redo log buffer 寫入到磁碟,即online redo log files。線上重做日誌檔案的數量是有限的,因此我們必須以迴圈方式不斷地重用它們。

The way we handle redo is quite simple: we just keep generating a continuous stream of redo records and pumping them as fast as we can into the redo log, initially into an area of shared memory known as the redo log buffer. Eventually, of course, Oracle has to deal with writing the buffer to disk and, for operational reasons, actually writes the “continuous” stream to a small set of predefined files—the online redo log files. The number of online redo log files is limited, so we have to reuse them constantly in a round-robin fashion.

      為了讓online redo log files裡的資訊儲存更長時間,大多數系統會對online redo log files保留1個或多個副本,即歸檔日誌。但是,就redo而言,採用寫入忘記機制,一旦 redo record 寫入到 redo log (buffer) ,我們(通常)不希望例項重新讀取它。這種“寫和忘記”的方法使重做成為一種非常簡單的機制。

To protect the information stored in the online redo log files over a longer time period, most systems are configured to make a copy, or possibly many copies, of each file as it becomes full before allowing Oracle to reuse it: the copies are referred to as the archived redo log files. As far as redo is concerned, though, it’s essentially write it and forget it—once a redo record has gone into the redo log (buffer), we don’t (normally) expect the instance to reread it. At the basic level, this “write and forget” approach makes redo a very simple mechanism.

 

注意:

雖然我們只希望對online redo log files進行寫和忘記,但是還有一些特殊情況需要讀取online redo log files,例如檢查記憶體壞塊、從磁碟恢復壞塊等,還有一些特性需要讀取online redo log files,例如: Log Miner, Streams asynchronous Change Data Capture 等。近年來還有一些新的特性需要讀取online redo log files,如standby database,我們將在第六章討論這些特性。

Note

Although we don’t usually expect to do anything with the online redo log files except write them and forget them, there is a special case where a session can read the online redo log files when it discovers the inmemory version of a block to be corrupt and attempts to recover from the disk copy of the block. Of course, some

features, such as Log Miner, Streams, and asynchronous Change Data Capture, have been created in recent years to take advantage of the redo log files, and some of the newer mechanisms for dealing with Standby databases have become real-time and are bound into the process that writes the online redo. We will look at such features in Chapter 6.

然後,還有一個複雜的問題,將  redo record 寫入到  redo log buffer 過程會出現效能瓶頸。在10g版本之前,oracle會將每個會話下資料改變產生的 redo record (包含一對重做更改向量)寫入到 redo log buffer 。但是單個會話在短時間內可能會發生很多變化,也可能有多個會話同時併發操作,但是要訪問的redo log buffer只有一個。

There is, however, one complication. There is a critical bottleneck in redo generation, the moment when a redo record has to be copied into the redo log buffer. Prior to 10g, Oracle would insert a redo record (typically consisting of just one pair of redo change vectors) into the redo log buffer for each change a session made to user data. But a single session might make many changes in a very short period of time, and there could be many sessions operating concurrently—and there’s only one redo log buffer that everyone wants to access.

為了解決這個問題,建立一種機制來控制對共享記憶體每個部分的訪問,Oracle使用redo allocation latch來保護  redo log buffer 的使用。當要給程式需要使用log buffer裡的部分空間時,需要先去申請獲取 redo allocation latch ,一旦獲取到對於的latch,就可以把相關資訊寫入到buffer。這樣就避免了多個程式重寫log buffer相同塊的風險。但是如果有大量的程式申請獲取redo allocation latch ,會消耗大量資源(主要時CPU用來  latch spin),或者在第一次spin失敗後,離開佇列,進入 sleep time

It’s relatively easy to create a mechanism to control access to a piece of shared memory, and Oracle’s use of the redo allocation latch to protect the redo log buffer is fairly well known. A process that needs some space in the log buffer tries to acquire (get) the redo allocation latch, and once it has exclusive ownership of that latch, it can reserve some space in the buffer for the information it wants to write into the buffer. This avoids the threat of having multiple processes overwrite the same piece of memory in the log buffer, but if there are lots of processes constantly competing for the redo allocation latch, then the level of competition could end up “invisibly” consuming lots of resources (typically CPU spent on latch spinning) or even lots of sleep time as sessions take themselves off the run queue after failing to get the latch on the first spin.

Oracle老版本中,當資料庫空閒時,redo生成的會很少, “one change = one record = one allocation” 策略對大多數系統時夠用的,當隨著系統更龐大更繁忙,需要處理更大的併發請求(特別對於OLTP系統),需要更加優化的策略,因此在10g新策略裡出現了一種結合私有重做private redo和記憶體撤消 in-memory undo 的新機制。

In older versions of Oracle, when the databases were less busy and the volume of redo generated was much lower, the “one change = one record = one allocation” strategy was good enough for most systems, but as systems became larger, the requirement for dealing with large numbers of concurrent allocations (particularly for OLTP systems) demanded a more scalable strategy. So a new mechanism combining private redo and in-memory undo appeared in 10g.

事實上,流程可以遍歷整個事務,生成其所有更改向量change vectors,並將它們儲存在一對私有重做日誌緩衝區 private redo log buffers 中。當事務完成時,程式將所有私有儲存的重做複製到公共重做日誌緩衝區中,此時傳統的日誌緩衝區處理接管。這意味著一個程式僅在事務結束後獲取一次公共重做分配鎖存器 public redo allocation latch ,而不是在每個更改中獲得一次。

In effect, a process can work its way through an entire transaction, generating all its change vectors and storing them in a pair of private redo log buffers. When the transaction completes, the process copies all the privately stored redo into the public redo log buffer, at which point the traditional log buffer processing takes over. This means that a process acquires the public redo allocation latch only once per transaction, rather than once per change.

注意

Oracle9.2中優化了相關策略,使用log_parallelism引數控制多個log buffers,在CPU少於16時可以忽略這個引數。在Oracle 10g,如果CPU>=2,那麼至少由兩個 public log buffers(redo threads)

Note

As a step toward improved scalability, Oracle 9.2 introduced the option for multiple log buffers with the log_parallelism parameter, but this option was kept fairly quiet and the general suggestion was that you didn’t need to know about it unless you had at least 16 CPUs. In 10g you get at least two public log buffers (redo threads) if you have more than one CPU.

有許多細節(和限制)需要提到,但是在討論任何複雜性之前,讓我們先注意一下它是如何 通過 動態效能檢視 獲取資訊 。我使用了core_demo_02中的指令碼。刪除轉儲命令,並將其替換為呼叫來獲取v$latch和v$sesstat的快照(參見core_demo_02b)。程式碼庫中的sql)。我還修改了SQL以更新50行而不是5行,以便更清楚地顯示工作負載的差異 。下面的結果分別來自執行相同測試的9i和10g系統。首先是9i的結果:

There are a number of details (and restrictions) that need to be mentioned, but before we go into any of the complexities, let’s just take a note of how this changes some of the instance activity reported in the dynamic performance views. I’ve taken the script in core_demo_02.sql, removed the dump commands, and replaced them with calls to take snapshots of v$latch and v$sesstat (see core_demo_02b.sql in the code library). I’ve also modified the SQL to update 50 rows instead of 5 rows so that differences in workload stand out more clearly. The following results come from a 9i and a 10g system, respectively, running the same test. First the 9i results: 

特別要注意,在9i輸出中,我們已經執行了51次重做複製和重做分配鎖存,並且建立了51個重做條目。與10g的結果相比較:

Note particularly in the 9i output that we have hit the redo copy and redo allocation latches 51 times each (with a couple of extra gets on the allocation latch from another process), and have created 51 redo entries. Compare this with the 10g results: 

10g版本,我們的會話之僅申請了一次 redo copy latch redo allocation latch 上活動稍微多一些,我們可以看到生成的Redo size比9i版本測試的略小。這些結果在提交之後出現。如果我們在提交之前生成相同的快照,我們將看不到任何 redo entries (0 redo size),記憶體中的撤銷鎖存器將下降到51,重做分配鎖存器將是1,而不是5。

In 10g, our session has hit the redo copy latch just once, and there has been just a little more activity on the redo allocation latch. We can also see that we have generated a single redo entry with a size that is slightly smaller than the total redo size from the 9i test. These results appear after the commit; if we took the same snapshot before the commit, we would see no redo entries (and a zero redo size), the gets on the In memory undo latch would drop to 51, and the gets on the redo allocation latch would be 1, rather than 5.

因此,在一個關鍵點 的活動和競爭威脅明顯減少。在不利的方面,我們可以看到10g,測試過程中,擊中了被稱為記憶體撤銷鎖存器的新鎖存器 In memory undo latch   53次,這使它看起來好像我們只是把爭用問題從一個地方移到了另一個地方。我們把那個想法記下來,以後再 思考。

So there’s clearly a notable reduction in the activity and the threat of contention at a critical location. On the downside, we can see that 10g has, however, hit that new latch called the In memory undo latch 53 times in the course of our test, which makes it look as if we may simply have moved a contention problem from one place to another. We’ll take a note of that idea for later examination.

我們可以從資料庫的不同地方瞭解發生了什麼。我們可以檢查v$latch_children來理解為什麼閂鎖活動的變化不是一個新的威脅。我們可以檢查重做日誌檔案,看看一個大的重做條目是什麼樣子的。我們可以找到一些動態效能物件(x$kcrfstrand和x$ktifp),它們將幫助我們深入瞭解各種活動如何連結在一起

There are various places we can look in the database to understand what has happened. We can examine v$latch_children to understand why the change in latch activity isn’t a new threat. We can examine the redo log file to see what the one large redo entry looks like. And we can find a couple of dynamic performance objects (x$kcrfstrand and x$ktifp) that will help us to gain an insight into the way in which various pieces of activity link together.

     增強結構時基於兩組記憶體結構。一個(稱為x$kcrfstrand, private redo )處理“向前”更改向量,另一個集合(稱為x$ktifp,記憶體中的撤銷池  in-memory undo pool )處理撤銷更改向量。私有 private redo 重做結構碰巧包含關於 “public” redo log buffer(s) 的資訊,因此,如果在查詢時看到兩個不同的資訊模式,不要擔心。

The enhanced infrastructure is based on two sets of memory structures. One set (called

x$kcrfstrand, the private redo) handles “forward” change vectors, and the other set (called x$ktifp, the in-memory undo pool) handles the undo change vectors. The private redo structure also happens to hold information about the traditional “public” redo log buffer(s), so don’t be worried if you see two different patterns of information when you query it.

     x$ktifp( in-memory undo )中的池的數量取決於儲存事務陣列的大小(v$transaction),它是由引數事務設定的(但可以由引數 sessions 或引數 processes 派生)。本質上,池的數量預設為 transactions / 10,每個池由它自己的“ In memory undo latch ”管理。

The number of pools in x$ktifp (in-memory undo) is dependent on the size of the array that holds transaction details (v$transaction), which is set by parameter transactions (but may be derived from parameter sessions or parameter processes). Essentially, the number of pools defaults to transactions / 10 and each pool is covered by its own “In memory undo latch” latch.

     對於x$ktifp中的每個條目,x$kcrfstrand中都有一個對應的私有重做條目 private redo entry ,並且,正如我前面提到的,還有一些額外的條目是針對傳統的“公共”重做執行緒 “public” redo threads 的。公共重做執行緒的數量 public redo threads cpu_count引數決定,似乎是上限(1 + cpu_count / 16)。x$kcrfstrand中的每個條目都由它自己的重做分配鎖存器控制,每個公共重做執行緒都由每個CPU一個重做複製鎖存器 redo copy latch 控制(我們將在第6章中討論這些鎖存器的作用)

For each entry in x$ktifp there is a corresponding private redo entry in x$kcrfstrand, and, as I mentioned earlier, there are then a few extra entries which are for the traditional “public” redo threads. The number of public redo threads is dictated by the cpu_count parameter, and seems to be ceiling(1 + cpu_count / 16). Each entry in x$kcrfstrand is covered by its own redo allocation latch, and each public redo thread is additionally covered by one redo copy latch per CPU (we’ll be examining the role of these latches in Chapter 6).

    如果我們回到原來的測試,只更新表中的5行和2個塊,Oracle仍然會按照相同的順序訪問行和快取塊,但不會打包成對的重做更改向量,將它們寫入重做日誌緩衝區並修改塊,而是按如下操作:

     If we go back to our original test, updating just five rows and two blocks in the table, Oracle would still go through the action of visiting the rows and cached blocks in the same order, but instead of packaging pairs of redo change vectors, writing them into the redo log buffer, and modifying the blocks, it would operate as follows:

 

1. 通過獲取一對匹配的私有記憶體結構開始事務,一個來自x$ktifp,一個來自x$kcrfstrand。

2. 將每個受影響的塊標記為“具有私有重做”(但不更改塊)。

3.將每個撤消更改向量undo change vector寫入選定的記憶體撤消池in-memory undo pool。

4. 將每個重做更改向量 redo change vector 寫入選定的私有重做執行緒 private redo thread

5. 通過將兩個單個redo change記錄中來結束事務。

6. 將重做更改記錄複製到重做日誌中,並將更改應用到塊中。

1. Start the transaction by acquiring a matching pair of the private memory structures , one from x$ktifp  and one from x$kcrfstrand.

2. Flag each affected block as “has private redo” (but don’t change the block).

3. Write each undo change vector into the selected in-memory undo pool.

4. Write each redo change vector into the selected private redo thread.

5. End the transaction by concatenating the two structures into a single redo change record.

6. Copy the redo change record into the redo log and apply the changes to the blocks.

   如果我們檢視記憶體結構(參見core_imu_01)。就在我們從原始測試提交事務之前,我們看到了以下內容:

If we look at the memory structures (see core_imu_01.sql  in the code depot) just before we commit the transaction from the original test, we see the following: 

這表明,會話的私有記憶體區域允許大約64KB的“forward” changes,而“撤消”更改也是如此。對於64位的系統,這更接近於每個128KB。對5行的更新使用了來自這兩個區域的大約4KB。

如果我在提交更改後轉儲重做日誌檔案,則這是我獲得的重做記錄(最低限度):

This show us that the private memory areas for a session allow roughly 64KB for “forward” changes, and the same again for “undo” changes. For a 64-bit system this would be closer to 128KB each. The update to five rows has used about 4KB from each of the two areas.

If I then dump the redo log file after committing my change, this (stripped to a bare minimum) is the one redo record that I get: 

您將注意到撤消記錄(LEN:)的長度是0x594 = 1428,它與我在執行這個特定測試時看到的重做大小統計值相匹配。這比記憶體結構中報告的4352和3920位元組的總和要小得多,因此顯然有很多額外的位元組涉及到跟蹤私有的撤銷和重做——可能是緩衝區中的起始開銷。

You’ll notice that the length of the undo record (LEN: ) is 0x594 = 1428, which matched the value of the redo size statistic I saw when I ran this particular test. This is significantly smaller than the sum of the 4352 and 3920 bytes reported as used in the in-memory structures, so there are clearly lots of extra bytes involved in tracking the private undo and redo—perhaps as starting overhead in the buffers.

如果您通讀了12個單獨的更改向量的標題,特別注意OP:程式碼,您將看到程式碼11.5有5個更改向量,程式碼5.1有5個更改向量。這是五個前向更改向量 forward change vectors ,後面是五個撤消塊更改向量 undo block change vectors 。更改向量#2(程式碼5.2)是事務的開始,而更改向量#7(程式碼5.4)是所謂的提交記錄,事務的結束。我們將在第3章中更詳細地討論這些更改向量,但是這裡值得一提的是,雖然大多數更改向量僅在事務提交時應用於資料塊,事務開始時的更改向量是一種重要的特殊情況,在事務開始時應用於撤消段頭塊。

If you read through the headers of the 12 separate change vectors, taking note particularly of the OP: code, you’ll see that we have five change vectors for code 11.5 followed by five for code 5.1. These are the five forward change vectors followed by the five undo block change vectors. Change vector #2 (code 5.2) is the start of transaction, and change vector #7 (code 5.4) is the so-called commit record , the end of transaction. We’ll be looking at those change vectors more closely in Chapter 3, but it’s worth mentioning at this point that while most of the change vectors are applied to data blocks only when the transaction commits, the change vector for the start of transaction is an important special case and is applied to the undo segment header block as the transaction starts.

因此,Oracle有一種機制來減少會話從(公共)重做日誌緩衝區請求空間並將資訊複製到該緩衝區的次數,從而提高了我們可以實現的併發級別……在一定程度上。但你可能會想,我們必須為這種好處付出代價——當然,我們確實付出了代價

So Oracle has a mechanism for reducing the number of times a session demands space from, and copies information into, the (public) redo log buffer, and that improves the level of concurrency we can achieve . . . up to a point. But you’re probably thinking that we have to pay for this benefit somewhere— and, of course, we do.

在前面,我們看到我們所做的每一個更改都會導致對記憶體撤銷鎖存器的訪問。這是否意味著我們只是轉移了閂鎖活動的威脅,而不是真正緩解了它? Yes and no 。現在我們只命中一個鎖存器(在記憶體中是撤消鎖存器),而不是兩個(重做分配 redo allocation 和重做複製  redo copy ),所以我們至少減少了一半的閂鎖活動,但是,更重要的是,記憶體中的撤銷鎖存器有多個子鎖存器,每個子鎖存器對應一個記憶體中的撤銷池。在新機制出現之前,大多數系統執行時只有一個重做分配鎖存器,所以儘管我們現在使用記憶體中的撤銷鎖存器的次數與使用重做分配鎖存器的次數一樣多,但我們正在將訪問擴充套件到更多的鎖存器。

Earlier on we saw that every change we made resulted in an access to the In memory undo latch. Does that mean we have just moved  the threat of latch activity rather than actually relieving it? Yes and no. We now hit only one latch (In memory undo latch) instead of two (redo allocation and redo copy), so we have at least halved the latch activity, but, more significantly, there are multiple child latches for the In memory undo latches, one for each in-memory undo pool. Before the new mechanism appeared, most systems ran with just one redo allocation latch, so although we now hit an In memory undo latch just as many times as we used to hit the redo allocation latch, we are spreading the access across far

more latches.

另外值得注意的是,新機制還具有兩種重做分配閂鎖:一種覆蓋私有重做執行緒,一種覆蓋公共重做執行緒,每個執行緒都有自己的閂鎖。這有助於解釋我們前面看到的重做分配閂鎖統計的額外收穫:我們的會話使用私有重做分配閂鎖來獲取私有重做執行緒,然後在提交時必須獲取公共重做分配閂鎖,然後日誌編寫器(我們將在第6章中看到)獲取公共重做分配鎖存(我的測試系統有兩個公共重做執行緒)來將日誌緩衝區寫入檔案。

It’s also worth noting that the new mechanism also has two types of redo allocation latch—one type covers the private redo threads, one type covers the public redo threads, and each thread has its own latch. This helps to explain the extra gets on the redo allocation latch statistic that we saw earlier: our session uses a private redo allocation latch to acquire a private redo thread, then on the commit it has to acquire a public redo allocation latch, and then the log writer (as we shall see in Chapter 6) acquires the public redo allocation latches (and my test system had two public redo threads) to write the log buffer to file.

總的來說 latch 活動的數量減少了, latch 活動的分佈得更廣了,這是一件好事。但是在多使用者系統中,總是有其他的觀點需要考慮—使用舊的機制,在任何時刻複製到日誌緩衝區並應用到資料庫塊的會話的重做量非常小;使用這種新機制,複製和應用的重做量可能相對較大,這意味著將更多的時間應用於資料庫塊,這可能會阻止其他會話在進行更改時訪問這些塊。這可能是私有重做執行緒在大小上受到嚴格限制的原因之一。

Overall, then, the amount of latch activity decreases and the focus of latch activity is spread a little more widely, which is a good thing. But in a multiuser system, there are always other points of view to consider—using the old mechanism, the amount of redo a session copied into the log buffer and applied to the database blocks at any one instant was very small; using the new mechanism, the amount of redo to copy and apply could be relatively large, which means it takes more time to apply to the database blocks, potentially blocking other sessions from accessing those blocks as the changes are made. This may be one reason why the private redo threads are strictly limited in size.

此外,使用舊的機制, 第二個會話會立即看到資料的修改 ;使用新機制,第二個會話只能看到一個塊受到一些私有重做的影響,所以第二個會話現在負責跟蹤私有重做並將其應用到塊(如果需要),然後決定接下來對塊做什麼。(如果您不能立即看到另一個會話刪除了您需要的主鍵,請考慮引用完整性的問題) 這會導致另一個問題, 程式碼更長更復雜, 我們需要使用更多的CPU資源來獲取讀一致性。

Moreover, using the old mechanism, a second session reading a changed block would see the changes immediately; with the new mechanism, a second session can see only that a block is subject to some private redo, so the second session is now responsible for tracking down the private redo and applying it to the block (if necessary), and then deciding what to do next with the block. (Think about the problems of referential integrity if you can’t immediately see that another session has, for example, deleted a primary key that you need.) This leads to longer code paths, and more complex code, but even if the resulting code for read consistency does use more CPU than it used to, there is always an argument for making several sessions use a little more CPU as a way of avoiding a single point of contention.

Note   注意

有一個重要的優化原則經常被忽視。有時,如果這意味著每個人都在不同的地點工作,而不是不斷地在同一個競爭點上衝突,那麼最好每個人都多做一點工作——競爭 浪費資源。

There is an important principle of optimization that is often overlooked. Sometimes it is better for everyone to do a little more  work if that means they are operating in separate locations rather than constantly colliding on the same contention point—competition wastes resources.

我不知道有多少不同的事件 events 可以迫使會話從私有重做 private redo 和撤消 undo 中構造新版本的塊,但我知道有幾個事件會導致會話在提交之前放棄新策略。

I don’t know how many different events there are that could force a session to construct new versions of blocks from private redo and undo, but I do know that there are several events that result in a session abandoning the new strategy before the commit.

當私有重做執行緒 private redo thread或記憶體中的撤消池in-memory undo pool已滿時,Oracle必須放棄這種新機制。正如我們前面看到的,每個私有區域被限制在大約64KB(如果執行的是64位的Oracle,則為128KB)。當一個區域已滿時,Oracle建立一個redo記錄,將其複製到公共redo執行緒,然後繼續以舊方式使用公共redo執行緒。

An obvious case where Oracle has to abandon the new mechanism is when either the private redo thread or the in-memory undo pool becomes full. As we saw earlier, each private area is limited to roughly 64KB (or 128KB if you’re running a 64-bit copy of Oracle). When an area is full, Oracle creates a single redo record, copies it to the public redo thread, and then continues using the public redo thread in the old way.

但還有其他一些事件過早地導致了這種轉變。例如,您的SQL可能觸發一個遞迴語句。為了快速檢查可能的原因,以及每種原因發生的次數,可以 SYS 使用者 連線 資料庫 並執行以下SQL( 示例 10.2.0.3):

But there are other events that cause this switch prematurely. For example, your SQL might trigger a recursive statement. For a quick check on possible causes, and how many times each has occurred, you could connect as SYS and run the following SQL (sample taken from 10.2.0.3):

select ktiffcat, ktiffflc from x$ktiff; 

不幸的是,儘管在v$sysstat動態效能檢視中有各種與IMU相關的統計資料(例如,IMU重新整理),但它們似乎與x$structure中的數字沒有太好的關聯,儘管如果忽略其中的幾個數字,您可能會非常接近地認為找到了匹配的位。

Unfortunately, although there are various statistics relating to IMU in the v$sysstat dynamic performance view (e.g., IMU flushes), they don’t seem to correlate terribly well with the figures from the x$ structure—although, if you ignore a couple of the numbers, you can get quite close to thinking you’ve found the matching bits.

 

四: Undo Complexity --- Undo複雜性

Undo Redo 更復雜。 理論 上任何程式在任何時候 都可能 訪問任何撤銷記錄undo record, “隱藏”還不應該看到的資料項。為了有效地滿足這個需求,Oracle將undo記錄儲存在資料庫中的一個特殊的表空間中,這個表空間被稱為undo表空間。然後,程式碼必須維護指向撤消記錄的各種指標,以便程式知道在哪裡可以找到它需要的撤消記錄。將撤消資訊undo information儲存在“普通”資料檔案的資料庫中的優點是,這些塊與資料庫中的每個塊所受的緩衝、寫入和恢復演算法完全相同 管理撤消塊的基本程式碼與處理其他型別塊的程式碼相同。

Undo is more complicated than redo. Most significantly, any process may, in principle, need to access any undo record at any time to “hide” an item of data that it is not yet supposed to see. To meet this requirement efficiently, Oracle keeps the undo records inside the database in a special tablespace known, unsurprisingly, as the undo tablespace; then the code has to maintain various pointers to the undo records so that a process knows where to find the undo records it needs. The advantage of keeping undo information inside the database in “ordinary” data files is that the blocks are subject to exactly the same buffering, writing, and recovery algorithms as every block in the database—the basic code to manage undo blocks is the same as the code to handle every other type of block.

程式需要讀取undo記錄的原因有三個,因此指標鏈通過undo表空間的方式有三種。我們將在第3章中詳細討論這三種方法,但是我將對目前最常見的兩種方法做一些初步的評論。

There are three reasons why a process needs to read an undo record, and therefore three ways in which chains of pointers run through the undo tablespace. We will examine all three in detail in Chapter 3, but I will make some initial comments about the commonest two uses now.

注意

撤銷記錄  undo records 的連結串列用於處理讀一致性、回滾更改以及派生由於延遲塊清除而“丟失”的提交scn。第三個主題將推遲到第三章

Note

Linked lists of undo records are used to deal with read consistency, rolling back changes, and deriving commit SCNs that have been “lost” due to delayed block cleanout. The third topic will be postponed until Chapter 3.

 

4.1 Read Consistency 讀一致性

第一個也是最常呼叫的undo用法是read consistence,我已經對read consistence做了簡要的評論。UNDO的存在允許會話在尚未看到較新版本的情況下看到舊版本的資料。

The first, and most commonly invoked, use of undo is read consistency, and I have already commented briefly on read consistency. The existence of undo allows a session to see an older version of the data when it’s not yet supposed to see a newer version.

讀取一致性的要求意味著一個塊必須包含一個指向undo記錄的指標,undo記錄描述瞭如何隱藏對塊的更改。但是,可能有大量的更改需要隱藏,而且在一個塊中沒有足夠的空間容納這麼多指標。因此,Oracle允許每個塊中的指標數量有限(每個影響塊的併發事務一個指標),這些指標儲存在ITL條目中。當一個程式建立一個undo記錄時,它(通常)會覆蓋一個現有的指標,將之前的值儲存為undo記錄的一部分。

The requirement for read consistency means that a block must contain a pointer to the undo records that describe how to hide changes to the block. But there could be an   arbitrarily large number of changes that need to be concealed, and insufficient space for that many pointers in a single block. So Oracle allows a limited number of pointers in each block (one for each concurrent transaction affecting the block), which are stored in the ITL entries. When a process creates an undo record, it (usually) overwrites one of the existing pointers, saving the previous value as part of the undo record.

在更新了一個塊中的三行 資料 後,再看一看我前面顯示的撤消記錄

Take another look at the undo record I showed you earlier, after updating three rows in a single block:

這個撤銷記錄顯示我更新的第五行的表塊指向,我們可以從轉儲的第二行看到撤銷塊中的記錄0xf。從轉儲的底部向上七行可以看到,這個記錄有op:C,它告訴我們,它是同一事務先前更新的延續。這使Oracle知道行uba:0x0080009a.09d4.0d的其餘部分是用於重新建立塊的舊版本的資訊的一部分:由於將xxxxxx(78s)複製回第4行的第2列,因此必須將值0x0080009a.09d4.0d複製回ITL條目2。

The table block holding the fifth row I had updated was pointing to this undo record, and we can see from the second line of the dump that it is record 0xf in the undo block. Seven lines up from the bottom of the dump you see that this record has op: C, which tells us that it is the continuation of an earlier update by the same transaction. This lets Oracle know that the rest of the line uba: 0x0080009a.09d4.0d is part of the information that has to be used to re-create the older version of the block: as the xxxxxx (78s) are copied back to column 2 of row 4, the value 0x0080009a.09d4.0d has to be copied back to ITL entry 2.

當然,一旦Oracle採取了這些步驟來重建一箇舊版本的塊,它將發現資料還沒有走得足夠遠,但是ITL 2中的指標現在告訴它在哪裡找到下一個要應用的undo記錄。這樣,一個過程就可以隨著時間的推移而逐漸倒退;每個ITL條目中的指標告訴Oracle在何處應用撤消記錄,每個撤消記錄包含使ITL條目在時間上向後以及使資料在時間上向後的資訊。

Of course, once Oracle has taken these steps to reconstruct an older version of the block, it will discover that it hasn’t yet gone far enough, but the pointer in ITL 2 is now telling it where to find the next undo record to apply. In this way a process can gradually work its way backward through time; the pointer in each ITL entry tells Oracle where to find an undo record to apply, and each undo record includes the information to take the ITL entry backward in time as well as taking the data backward in time.

 

4.2 Rollback

第二,撤銷的主要用途是回滾更改,可以使用顯式回滾(或回滾到儲存點),也可以因為事務中的某個步驟失敗而Oracle發出了隱式的語句級回滾

讀取一致性是關於單個塊,並找到該塊的所有撤消記錄的連結列表。回滾與事務的歷史有關,因此我們需要一個連結串列,它以正確的順序(在本例中,這意味著反向)遍歷事務的所有undo記錄。

The second, major use of undo is in rolling back changes, either with an explicit rollback (or rollback to savepoint) or because a step in a transaction has failed and Oracle has issued an implicit, statement-level rollback.

Read consistency is about a single block, and finding a linked list of all the undo records for that block. Rolling back is about the history of a transaction, so we need a linked list that runs through all the undo records for a transaction in the correct (which, in this case, means reverse) order.

注意

下面是一個簡單的例子,演示了為什麼需要“反向”連結撤銷記錄。“假設我們更新一行兩次,將單個列值從 A 更改為B,然後從B更改為C,從而得到兩條undo記錄。如果我們想要逆轉改變,我們必須先把C變回B,然後才能應用“把B變回 A ”的撤銷記錄;換句話說,我們必須在應用第一個撤消記錄之前應用第二個撤消記錄

Note

Here is a simple example demonstrating why we need to link the undo records “backward.” Imagine we update a row twice, changing a single column value from A to B and then from B to C, giving us two undo records. If we want to reverse the change, we have to change the C back to B before we can apply an undo record that says “change a B to an A”; in other words, we have to apply the second undo record before we apply the first undo record.

再次檢視示例undo記錄,我們可以看到連結串列的跡象。轉儲的第3行包括rci 0x0e條目。這告訴Oracle在這個undo記錄之前立即建立的undo記錄是同一個undo塊中的第14號(0x0e)。這是有可能的,當然,之前的取消記錄將在一個不同的撤銷,但應該只有當前撤銷撤銷塊的記錄是第一個取消記錄,在這種情況下,rci條目將零和rdba:條目下面四行給前面的塊地址取消記錄。如果必須返回一個塊,那麼塊的最後一條記錄通常是必需的記錄,儘管從技術上講,您需要的是irb: entry指向的記錄。但是,如果您已經回滾到savepoint,那麼irb: entry可能不會指向最後一條記錄。

Looking again at the sample undo record, we can see signs of the linked list. Line 3 of the dump includes the entry rci 0x0e. This tells Oracle that the undo record created immediately before this undo record was number 14 (0x0e) in the same undo block. It’s possible, of course, that the previous undo record will be in a different undo block, but that should be the case only if the current undo record is the first undo record of the undo block, in which case the rci entry would be zero and the rdba: entry four lines below it would give the block address of the previous undo record. If you have to go back a block, then the last record of the block will usually be the required record, although technically what you need is the record pointed at by the irb: entry. However, the only case in which the irb: entry might not point to the last record is if you have done a rollback to savepoint.

讀一致性和回滾之間有一個重要的區別。為了讀取一致性,我們複製了記憶體中的資料塊並將撤銷記錄undo records應用到那個塊 data block 上,一旦我們完成了它,我們就可以很快地丟棄它;當回滾時,我們獲取當前塊current block並對其應用undo record。這有三個重要的影響 :

There’s an important difference between read consistency and rolling back, of course. For read consistency we make a copy of the data block in memory and apply the undo records to that block, and it’s a copy of the block that we can discard very rapidly once we’ve finished with it; when rolling back we acquire the current block and apply the undo record to that. This has three important effects:

1.資料塊 data block 是當前的塊 current block ,因此它是最終必須寫入到磁碟的塊的版本。

2. 因為它是當前塊,所以當我們更改它時,將生成 redo (即使我們“將它更改回原來的方式”)。

3.因為Oracle有儘可能高效地清理事故的崩潰恢復機制,所以我們需要確保在使用undo記錄時將其標記為“undo applied”,這樣做會生成更多的重做。

1. The data block is the current block, so it is the version of the block that must

eventually be written to disc.

2. Because it is the current block, we will be generating redo as we change it (even

though we are “changing it back to the way it used to be”).

3. Because Oracle has crash-recovery mechanisms that clean up accidents as

efficiently as possible, we need to ensure that the undo record is marked as “undo

applied” as we use it, and doing that generates even more redo.

如果undo記錄已經被用於回滾,那麼轉儲的第4行應該是這樣的

If the undo record was one that had already been used for rolling back, line 4 of the dump would have looked like this:

Undo type: Regular undo User Undo Applied Last buffer split: No

      在原始塊轉儲中,使用者撤銷應用的標誌只有1個位元組,而不是一個17個字元的字串。

回滾涉及大量工作,而且回滾所花費的時間與原始事務所花費的時間大致相同,可能會生成類似數量的重做。但是您必須記住,回滾是一個更改資料塊的活動,因此您必須重新獲取、修改和編寫這些塊,並編寫描述您如何更改這些塊的重做。此外,如果事務是一個大型的長時間執行的事務,您可能會發現您更改的一些塊已經被寫入到磁碟並從快取中清除了—所以在您可以回滾它們之前,必須從磁碟讀取它們到記憶體!

In the raw block dump, the User Undo Applied  flag is just 1 byte rather than a 17-character string.

Rolling back involves a lot of work, and a rollback can take roughly the same amount of time as the original transaction, possibly generating a similar amount of redo. But you have to remember that rolling back is an activity that changes data blocks, so you have to reacquire, modify, and write those blocks, and write the redo that describes how you’ve changed those blocks. Moreover, if the transaction was a large, long-running transaction, you may find that some of the blocks you’ve changed have been written to disc and flushed from the cache—so they’ll have to be read from disc before you can roll them back!

注意

有些系統使用Oracle表來儲存“臨時”或“暫存”資訊。使用的策略之一是插入資料而不提交資料,這樣讀一致性使其成為會話的私有資料,然後回滾使資料“消失”。這一戰略存在許多缺陷,回滾的潛在高成本只是其中之一。可以使用全域性臨時表代替這種策略。

Note

Some systems use Oracle tables to hold “temporary” or “scratchpad” information. One of the strategies used with such tables is to insert data without committing it so that read consistency makes it private to the session, and then roll back to make   the data “go away.” There are many flaws in this strategy, the potentially high cost of rolling back being just one of them. The ability to eliminate the cost of rollback is one of the things that makes global temporary tables useful.

當然,通過回滾還會引入其他開銷。當一個會話建立撤消記錄時,它一次獲取、固定和填充一個撤消塊;當它回滾時,它一次從undo塊獲取一條記錄,釋放並重新獲取每個記錄的塊。這意味著在撤消塊上生成的回滾緩衝區訪問比最初執行事務時生成的要多。而且,每次Oracle獲得undo記錄時,它都會檢查應該應用到的表空間是否仍然線上(如果不是,Oracle會將undo記錄轉移到system表空間中的save undo段);這在字典快取(特別是dc_tablespaces快取)上顯示為get。

There are other overheads introduced by rolling back, of course. When a session creates undo records, it acquires, pins, and fills one undo block at a time; when it is rolling back it gets one record from an undo block at a time, releasing and reacquiring the block for each record. This means that you generate more buffer visits on undo blocks to roll back than you generated when initially executing the transaction. Moreover, every time Oracle acquires an undo record, it checks that the tablespace it should be applied to is still online (if it isn’t, Oracle will transfer the undo record into a save undo  segment in the system tablespace); this shows up as get on the dictionary cache (specifically the dc_tablespaces cache).

我們可以用最後一個 特殊 的小細節來結束關於回滾的評論。如果會話發出回滾命令,則完成回滾的步驟是提交。我們會在第三章多花一點時間在這上面

We can finish the comments on rolling back with one last quirky little detail. If your session issues a rollback command, the step that completes the rollback is a commit. We’ll spend a little more time on that in Chapter 3.

 

Summary 總結:

在某些方面,redo 重做是一個非常簡單的概念:資料檔案中對塊的每個更改都用重做更改向量redo change vector來描述,並且這些更改向量被立即寫入重做日誌緩衝區redo log buffer(幾乎是),並最終寫入重做日誌檔案redo log file。

In some ways redo is a very simple concept: every change to a block in a data file is described by a redo change vector, and these change vectors are written to the redo log buffer (almost) immediately, and are ultimately written into the redo log file.

當我們對資料(包括索引項和結構後設資料)進行更改時,我們還在undo表空間中建立undo記錄,這些記錄描述瞭如何逆轉這些更改。由於undo表空間只是另一組資料檔案,所以我們建立了redo change vector來描述我們儲存在那裡的undo記錄

As we make changes to data (which includes index entries and structural metadata), we also create undo records  in the undo tablespace that describe how to reverse those changes. Since the undo tablespace is just another set of data files, we create redo change vectors to describe the undo records we store there.

Oracle的早期版本中,更改向量change vectors通常以成對的形式組合在一起—一個描述正向更改forward change,一個描述撤消記錄undo record— 把這一對record合併成一個redo record,並寫入到 redo log buffer

In earlier versions of Oracle, change vectors were usually combined in pairs—one describing the forward change, one describing the undo record—to create a single redo record  that was written (initially) into the redo log buffer.

Oracle的較新版本中,OLTP系統中將更改向量change vectors移動到重做日誌緩衝區redo log buffer的步驟被視為的一個重要瓶頸,並 引入 一個新機制, 將一個會話裡的所有操作存放在 private  redo log buffer中,事務完成後在一起寫入到public redo log buffer。

In later versions of Oracle, the step of moving change vectors into the redo log buffer was seen as an important bottleneck in OLTP systems, and a new mechanism was created to allow a session to accumulate all the changes for a transaction “in private” before creating one large redo record in the redo buffer.

新機制嚴格限制了會話在將其更改向量重新整理到重做日誌緩衝區並切換到舊機制之前所做的工作,並且有各種各樣的事件會使這種切換提前發生。

The new mechanism is strictly limited in the amount of work a session will do before it flushes its change vectors to the redo log buffer and switches to the older mechanism, and there are various events that will make this switch happen prematurely.

redo作為一個簡單的“寫 ---忽略 ”流操作時,undo可能會在資料庫的當前活動中頻繁地重新讀取,而undo記錄必須以不同的方式連結在一起,以實現有效的訪問。讀取一致性Read consistency要求給定塊的撤銷記錄鏈;回滾 rolling back需要一個給定事務的撤銷記錄鏈。(還有第三條鏈,將在第三章中討論。)

While redo operates as a simple “write it and forget it” stream, undo may be frequently reread in the ongoing activity of the database, and undo records have to be linked together in different ways to allow for efficient access. Read consistency requires chains of undo records for a given block; rolling back requires a chain of undo records for a given transaction. (And there is a third chain, which will be addressed in Chapter 3.)

 

 

2020-02-02 陳舉超

---翻譯整理自

Oracle Core Essential Internals for DBAs and Developers》

Chapter 2:Redo and Undo


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29785807/viewspace-2674361/,如需轉載,請註明出處,否則將追究法律責任。

相關文章