The 4K Offset in AIX Logical Volumes

Posted by perfychi on 2013-01-05

A few days ago, during a routine health check of a customer's database, I found the following warnings in the alert log:

Quote
WARNING: You are creating datafile /dev/rtbs_data01. 
WARNING: Oracle recommends creating new datafiles on devices with zero offset. The command "/usr/sbin/mklv -y LVname -T O -w n -s n -r n VGname NumPPs" can be used. Please contact Oracle customer support for more details.

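On AIX, if you create an LV that carries the 4K offset and use it as a datafile, Oracle 10g prints the warning above and suggests the -T O flag. The AIX documentation describes -T O as follows:

Quote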
-T O 
     For big vg format volume groups, the -T O option indicates that the logical volume control block will not occupy the first block of the logical volume. 
     Therefore, the space is available for application data. Applications can identify this type of logical volume with the IOCINFO ioctl. The logical volume 
     has a device subtype of DS_LVZ. A logical volume created without this option has a device subtype of DS_LV. This option is ignored for old and scalable 
     vg format volume groups.

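Let's expand on the AIX description.
When you create a VG, AIX offers three VG types: Original Volume Group, Big Volume Group, and Scalable Volume Group.
In an Original Volume Group, every LV is an ordinary DS_LV type LV, no matter which command options you create it with.
A Big VG is the only VG type in which both LV types can coexist. If you specify -T O (note that this is the capital letter O, not zero), a DS_LVZ type LV is created; otherwise an ordinary DS_LV type LV is created. For example:
/usr/sbin/mklv -y LVname -T O -w n -s n -r n VGname NumPPs
In a Scalable VG, every LV is a DS_LVZ type LV, no matter which command options you create it with.
The Oracle warning makes it clear that, for raw devices, Oracle recommends LVs without the 4K offset. That raises three questions:
(1) What is the 4K offset for?
(2) How can you tell whether an LV has the 4K offset?
(3) What harm does the 4K offset do?
The 4K offset exists to hold what AIX calls the LVCB (logical volume control block), which occupies the first 512 bytes of the 4K. Much like an Oracle datafile header, it records the LV's creation time, mirror copy information, filesystem mount point, and so on.
You can view the LVCB with the getlvcb command:

Quote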
# getlvcb -AT fslv02 
         AIX LVCB 
         intrapolicy = m 
         copies = 1 
         interpolicy = m 
         lvid = 000b56cc00004c000000012d264b87e5.14 
         lvname = fslv02 
         label = /ora10g 
         machine id = B56CC4C00 
         number lps = 112 
         relocatable = y 
         strict = y 
         stripe width = 0 
         stripe size in exponent = 0 
         type = jfs2 
         upperbound = 32 
         fs = vfs=jfs2:log=/dev/loglv00:mount=true:options=rw:account=false 
         time created  = Mon Apr 18 09:52:50 2011 
         time modified = Mon Apr 18 09:52:56 2011

There are two ways to check whether an LV has the 4K offset.
(1) At the host level
Without the 4K offset:

Quote
#lslv jfkdb_2G_044 
LOGICAL VOLUME: jfkdb_2G_044 VOLUME GROUP: jfk_dbvg_01 
LV IDENTIFIER: 00c3dff400004c00000001217a9d839e.84 PERMISSION: read/write 
VG STATE: active/complete LV STATE: closed/syncd 
TYPE: raw WRITE VERIFY: off 
MAX LPs: 1024 PP SIZE: 32 megabyte(s) 
COPIES: 1 SCHED POLICY: parallel 
LPs: 64 PPs: 64 
STALE PPs: 0 BB POLICY: relocatable 
INTER-POLICY: maximum RELOCATABLE: yes 
INTRA-POLICY: middle UPPER BOUND: 1024 
MOUNT POINT: N/A LABEL: None 
MIRROR WRITE CONSISTENCY: on/ACTIVE 
EACH LP COPY ON A SEPARATE PV ?: yes 
Serialize IO ?: NO 
DEVICESUBTYPE : DS_LVZ

With the 4K offset:

Quote
[root@jfk_p560q /]# lslv jfkdb_2G_044 
LOGICAL VOLUME: jfkdb_2G_044 VOLUME GROUP: jfk_db_vg01 
LV IDENTIFIER: 00ce76de00004c00000001134ee6bc51.84 PERMISSION: read/write 
VG STATE: active/complete LV STATE: opened/syncd 
TYPE: raw WRITE VERIFY: off 
MAX LPs: 1024 PP SIZE: 32 megabyte(s) 
COPIES: 1 SCHED POLICY: parallel 
LPs: 64 PPs: 64 
STALE PPs: 0 BB POLICY: relocatable 
INTER-POLICY: maximum RELOCATABLE: yes 
INTRA-POLICY: middle UPPER BOUND: 16 
MOUNT POINT: N/A LABEL: None 
MIRROR WRITE CONSISTENCY: on/ACTIVE 
EACH LP COPY ON A SEPARATE PV ?: yes 
Serialize IO ?: NO
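The host-level check can be automated. The following is a minimal sketch that only inspects saved lslv text; it assumes, based on the two listings above, that a zero-offset LV is the one whose output contains a `DEVICESUBTYPE : DS_LVZ` line. The function name `has_4k_offset` and the sample strings are hypothetical and exist only for illustration.

```python
def has_4k_offset(lslv_output: str) -> bool:
    """Guess from lslv text whether the LV keeps the LVCB in its first 4K.

    Assumption: only a 'DEVICESUBTYPE : DS_LVZ' line marks a zero-offset LV.
    A missing line, or any other subtype such as DS_LV, is treated as
    'has the 4K offset'.
    """
    for line in lslv_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().upper() == "DEVICESUBTYPE":
            return value.strip().upper() != "DS_LVZ"
    return True


# Shortened excerpts of the two listings above (illustrative only).
ZERO_OFFSET_SAMPLE = """\
LOGICAL VOLUME: jfkdb_2G_044 VOLUME GROUP: jfk_dbvg_01
Serialize IO ?: NO
DEVICESUBTYPE : DS_LVZ
"""

WITH_OFFSET_SAMPLE = """\
LOGICAL VOLUME: jfkdb_2G_044 VOLUME GROUP: jfk_db_vg01
Serialize IO ?: NO
"""

if __name__ == "__main__":
    print("zero-offset sample has 4K offset:", has_4k_offset(ZERO_OFFSET_SAMPLE))
    print("plain sample has 4K offset:", has_4k_offset(WITH_OFFSET_SAMPLE))
```

On a real host you would feed it the captured output of `lslv <lvname>`. Given the IY94343 behaviour quoted later in this post, it is worth re-running the check after a reboot or a chlv.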

(2) At the Oracle level
Oracle ships a small utility, dbfsize (in $ORACLE_HOME/bin), that reports whether an LV has the 4K offset.
Without the 4K offset:

Quote
$ dbfsize /dev/rlvsysaux_1g

 

Database file: /dev/rlvsysaux_1g 
Database file type: raw device without 4K starting offset 
Database file size: 40960 8192 byte blocks

With the 4K offset:

Quote
[oracle@jfk_p560q /dev]$ dbfsize /dev/rjfkdb_2G_054

 

Database file: /dev/rjfkdb_2G_054 
Database file type: raw device 
Database file size: 262016 8192 byte blocks
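If you need to check many raw devices, the dbfsize text can be parsed rather than read by eye. This is a rough sketch that assumes the output always has the three-line shape shown in the two samples above, and that the phrase "without 4K starting offset" is what marks a zero-offset device. The function name `parse_dbfsize` and the returned keys are hypothetical.

```python
import re


def parse_dbfsize(output: str) -> dict:
    """Extract the file type and size from dbfsize text shaped like the samples above."""
    type_match = re.search(r"Database file type:\s*(.+)", output)
    size_match = re.search(
        r"Database file size:\s*(\d+)\s+(\d+)\s+byte blocks", output
    )
    if not type_match or not size_match:
        raise ValueError("unrecognized dbfsize output")

    file_type = type_match.group(1).strip()
    blocks = int(size_match.group(1))
    block_size = int(size_match.group(2))
    return {
        "file_type": file_type,
        # Assumption: this phrase is what distinguishes a zero-offset device.
        "zero_offset": "without 4K starting offset" in file_type,
        "blocks": blocks,
        "block_size": block_size,
        "bytes": blocks * block_size,
    }


SAMPLE_ZERO_OFFSET = """\
Database file: /dev/rlvsysaux_1g
Database file type: raw device without 4K starting offset
Database file size: 40960 8192 byte blocks
"""

SAMPLE_WITH_OFFSET = """\
Database file: /dev/rjfkdb_2G_054
Database file type: raw device
Database file size: 262016 8192 byte blocks
"""

if __name__ == "__main__":
    for sample in (SAMPLE_ZERO_OFFSET, SAMPLE_WITH_OFFSET):
        print(parse_dbfsize(sample))
```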

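Suppose the database uses a 16K block size and the LV has the 4K offset and is striped across PVs with a 64K stripe size. Because striping counts the LVCB area as part of the stripe unit, the 4th Oracle block ends up straddling two PVs.
In other words, the 4th Oracle block of each stripe unit is split across disks. Performance aside, if the host or the storage crashes, this can easily corrupt database blocks and raise ORA-01578 (Metalink ID 261460.1).

Quote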
$   oerr ora 01578 
01578, 00000, "ORACLE data block corrupted (file # %s, block # %s)" 
// *Cause:  The data block indicated was corrupted, mostly due to software 
//          errors. 
// *Action: Try to restore the segment containing the block indicated. This 
//          may involve dropping the segment and recreating it. If there 
//          is a trace file, report the errors in it to your ORACLE 
//          representative.
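The striping arithmetic behind the scenario above can be checked with a few lines of Python. This is only a sketch of the offset calculation, under the assumption that Oracle blocks are laid out back to back starting right after the 4K offset and that stripe units sit at multiples of the stripe size from the start of the LV. The function name `straddling_blocks` is hypothetical.

```python
KB = 1024


def straddling_blocks(block_size, stripe_size, offset, count):
    """Return the 0-based indexes of the first `count` blocks that cross
    a stripe-unit boundary.

    Assumption: block i occupies bytes [offset + i*block_size,
    offset + (i+1)*block_size) of the LV address space.
    """
    crossing = []
    for i in range(count):
        start = offset + i * block_size
        last_byte = start + block_size - 1
        if start // stripe_size != last_byte // stripe_size:
            crossing.append(i)
    return crossing


if __name__ == "__main__":
    # 16K blocks, 64K stripe, 4K offset: the case described in the text.
    print(straddling_blocks(16 * KB, 64 * KB, 4 * KB, 12))
    # Same geometry on a zero-offset LV.
    print(straddling_blocks(16 * KB, 64 * KB, 0, 12))
```

With the 4K offset, the indexes 3, 7 and 11 are reported, meaning every 4th block (counting from 1) is split. With offset 0, no block crosses a boundary as long as the stripe size is a multiple of the block size.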

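So if we skip striping and keep the 4K offset on the LV, is everything fine?
The answer is still no. If the LV spans PVs and the PP size is 64M, then (64M - 4K) / 16K still does not divide evenly, so the problem remains.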
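To make that division concrete, here is a small sketch of the PP-boundary check. It assumes the LV address space maps linearly onto PPs and that Oracle blocks start immediately after the offset. The helper name `first_block_crossing_pp` is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def first_block_crossing_pp(pp_size, block_size, offset):
    """Return (block_index, bytes_in_first_pp) for the first block that is
    split by the first PP boundary, or None if blocks align with it.

    Assumption: blocks are laid out back to back starting at `offset`.
    """
    usable = pp_size - offset          # bytes available to Oracle in the first PP
    remainder = usable % block_size    # part of a block left over at the PP end
    if remainder == 0:
        return None
    return usable // block_size, remainder


if __name__ == "__main__":
    # 64M PP, 16K blocks, 4K offset: the case from the text.
    print(first_block_crossing_pp(64 * MB, 16 * KB, 4 * KB))
    # Same geometry on a zero-offset LV.
    print(first_block_crossing_pp(64 * MB, 16 * KB, 0))
```

For the 64M / 16K / 4K case, the block with index 4095 has 12288 bytes in the first PP and its remaining 4096 bytes in the next one, which may sit on a different PV.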
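Oracle has been able to recognize LVs without the 4K offset since 9.2.0.3. So does creating LVs without the 4K offset solve everything? Unfortunately not; a bug shows up right on cue: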

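According to the APAR quoted below, after a reboot or after running a command such as chlv, the DS_LVZ flag disappears. Oracle then assumes the LV has the 4K offset, which reopens the possibility of Oracle blocks straddling PVs.
With bad luck, ORA-01578 shows up again, and the nightmare begins.

Quote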
IY94343: MKLV -TO ON BIG VOLUME GROUPS FAILS TO PUT SOME LV INFORMATION APPLIES TO AIX 5300-07 
**************************************************************** 
* USERS AFFECTED: 
* Users of BIG volume groups with the bos.rte.lvm fileset at 
* the 5.3.0.53 or 5.3.0.54 level. 
**************************************************************** 
* PROBLEM DESCRIPTION: 
* When creating a logical volume with a device type of DS_LVZ 
* using the '-TO' flag, lslv reports a DEVICESUBTYPE of DS_LV 
* rather than DS_LVZ.  The problem shows up only after a reboot 
* or any subsequent chlv or other LVM command that can update 
* the VGDA on disk. 
* This problem can cause some applications, such as Oracle, to 
* fail to start, and could result in database corruption.

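Even without this bug, that is, with no 4K offset at all, the problem still exists if db_block_size is larger than the stripe size, because blocks will still straddle PVs. This is something to watch for when setting up striping, although in practice I have never seen an environment where the stripe size was smaller than the block size. This raises a further question: if the storage array has already spread data across all of its underlying disks, applied its own striping, and presented virtual disks to the host, the discussion becomes much more complicated. A block that appears to cross disks from the operating system's point of view may not cross physical disks at all.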

Source: ITPUB blog, link: http://blog.itpub.net/27042095/viewspace-752164/. If you repost this article, please credit the source; otherwise legal action may be taken.
