On the performance of filesystems vs. ASM

Posted by tengrid on 2010-11-09



If the difference between a filesystem and ASM is still fuzzy to you, that thread is worth reading more than once; old Tom's points are briefly summarized below.

The thread grew out of two questions from one user:
(1) After migrating a 10.2.0.4 database from 32-bit RHEL3 to 64-bit RHEL5, a batch job became 10 times slower (5 minutes before, 50 minutes after). The user queried DBA_HIST_SQLSTAT and compared the statement's delta statistics (two snapshots taken on each system and subtracted); apart from IOWAIT, everything was roughly the same. He asked Tom what could be making performance worse and what to check (a sketch of that kind of delta query follows the two questions).

(2) On an 11.1.0.7 DW system, OEM showed an average read time of 50ms. An RMAN backup with 12 channels was running on an 8-CPU machine, the CPU load was 1 and iowait was 15%, so it looked as though RMAN was waiting on ASM.
Before taking the storage performance question to the system administrators, the user asked Tom what should be checked first.
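For reference, the delta comparison in question (1) might be run with something like the sketch below; :sql_id, :begin_snap and :end_snap are placeholders to be filled in on each system, and the output is what gets compared side by side.

-- Sketch: per-snapshot delta statistics for one statement, from AWR.
-- :sql_id, :begin_snap and :end_snap are placeholders, not values from the thread.
select snap_id,
       executions_delta,
       elapsed_time_delta,
       cpu_time_delta,
       iowait_delta,
       buffer_gets_delta,
       disk_reads_delta
from   dba_hist_sqlstat
where  sql_id = :sql_id
and    snap_id between :begin_snap and :end_snap
order  by snap_id;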

The fix:
Make the SGA after the migration (on ASM) larger than it was before the migration (on the filesystem).
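A minimal sketch of what that looks like in practice; the sizes are invented placeholders, not recommendations, and should come from your own workload and from the cache advisory mentioned further down.

-- Sketch only: grow the SGA / buffer cache after moving to ASM.
-- 8G / 6G are placeholders; sga_max_size only takes effect after a restart.
alter system set sga_max_size = 8G scope=spfile;
alter system set sga_target = 8G scope=spfile;
-- or, if the buffer cache is sized manually rather than via sga_target:
alter system set db_cache_size = 6G scope=spfile;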

Tom's views on buffered file systems versus ASM/raw/cluster file systems can be summarized as follows:

1. Because of the "secondary SGA", when you migrate from a filesystem to ASM you need to allocate a larger buffer cache to make up for losing that secondary SGA; otherwise the system may well slow down, since there is no OS-level cache left to fall back on.

A buffered file system exhibits what Tom calls a "secondary SGA". The Oracle server process first looks for a block in the SGA buffer cache; if it is not there, it issues a system call to "read from disk". That is the view from Oracle's side. At the OS level, however, the block is first looked up in the filesystem cache, and only if it is missing there does a real disk read happen. (In fact the storage controller has its own cache, and the hard disk itself has an on-board cache, so the picture changes depending on which layer you stand at.) That is why Tom says the filesystem cache is effectively Oracle's second-level cache; presumably the analogy works because, like the SGA buffer cache, it is carved out of physical memory. In Tom's own words:

basically, you used to have a cooked, buffered file system. When you wanted to get a buffer from the buffer cache - and we didn't see it there - we'd issue a physical IO - but it would not be a true physical IO because the operating system would find it in ITS cache - the IO would be really fast because the OS never went to disk.

ASM (like raw, like clustered file systems) do not buffer - not in the OS, not anywhere (a SAN might, but that is a different story). So, the fix? Increase your buffer cache size to make up for the loss of that OS secondary SGA - this will result in even faster performance as we won't have to go to that secondary sga, we just get it from the real one.

2) ASM would not be slow there - once the data is on disk - ASM is really out of the way. The dedicated server reads from disk. If you are seeing really slow IO times - slow read times - it would have nothing to do with ASM at that level. It would have to do with your disk response times.

Do something on them *without oracle*. DD the devices to /dev/null or something and monitor your IO times.
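Alongside the dd test, the read latency the instance itself has recorded can be checked as a complement; a pile-up of waits in the high-millisecond buckets points at slow disk response rather than at ASM. A sketch, assuming single-block reads are the event of interest:

-- Sketch: read-latency histogram as seen by the database.
select event, wait_time_milli, wait_count
from   v$event_histogram
where  event = 'db file sequential read'
order  by wait_time_milli;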

2. When the SGA is sized well, ASM performance > filesystem performance.
The reason is that there is no longer a lookup in the OS cache (which costs context switches and has the OS repeat work Oracle has already done; besides, the OS cache is also LRU-managed, so large tables cannot always stay cached, and parallel reads flush the cache and read straight from disk). In other words, with ASM you avoid the system overhead that a filesystem carries.

On 11g the advice is to set a single memory parameter and let Oracle divide it among the PGA, SGA and so on; before 11g, size the PGA and SGA yourself and, barring special circumstances, do not leave a lot of free memory for the OS to use as filesystem cache.
As for how large an SGA is appropriate, use the buffer cache advisor.
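A sketch of reading that advisory from V$DB_CACHE_ADVICE (assuming the DEFAULT pool at the database block size is what matters); the idea is to find the size beyond which estd_physical_reads stops dropping noticeably.

-- Sketch: buffer cache advisory for the DEFAULT pool.
select size_for_estimate,
       size_factor,
       estd_physical_read_factor,
       estd_physical_reads
from   v$db_cache_advice
where  name = 'DEFAULT'
and    block_size = (select value from v$parameter where name = 'db_block_size')
order  by size_for_estimate;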

3. When choosing how to organize storage (this is not about the hardware), ASM is the recommendation.

(Note: "ASM" above can be read in a broad sense. Tom is really talking about every non-cooked file system, i.e. any storage organization with no OS-level filesystem cache, such as raw devices and certain cluster file systems.)

4. OS memory behaves like CPU as a resource: free memory that is not being used gets turned into filesystem cache, so from the user's point of view it is "use it or lose it"; when a program needs more memory, it is released back out of the filesystem cache. The lesson for the DBA: if you can use it, use it. As Tom puts it:

you don't need to "find it", it is just there - any memory you are not using for something else, the OS will use to buffer the buffered file systems.

If you run a memory intensive program, the OS file system cache will shrink and maybe virtually disappear.

Stop running things that use lots of physical RAM and it'll grow again.


Since memory is a "use it or lose it" resource, like CPU, they use it when it is there and do not when it isn't.

5. An ASM failgroup works on the same principle as a mirrored redo log. When dbwr writes blocks to ASM it does not write synchronously but asynchronously, flushing batches of blocks to disk at suitable moments and waiting for the OS to confirm completion; and there is no notion of "atomicity" across the two sides of the mirror. One side may succeed while the other fails, and that is exactly what ASM failgroups are for! In Tom's words:

2) they don't have to be atomic, in fact - one of them can fail (that is part of the purpose). But dbwr would typically do async IO to both devices and wait for the OS to notify it of the completed write.

3) we write in parallel, just like the hardware would be doing. dbwr does this all of the time, even in non mirrored setup. dbwr gets a big set of blocks to write and writes them all using async IO and waits for the "disk device" to tell it "all done"
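For completeness, the failgroup layout that those mirrored writes go to can be listed from the ASM instance; a sketch only, with disk group names and paths that will differ on any real system.

-- Sketch: disks and their failure groups, per disk group (run on the ASM instance).
select g.name as group_name,
       d.failgroup,
       d.name as disk_name,
       d.path
from   v$asm_disk d
       join v$asm_diskgroup g on g.group_number = d.group_number
order  by g.name, d.failgroup, d.name;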

Tom's views in this thread have to be taken with a caveat: if the workload is overwhelmingly physical IO, then after moving from a filesystem to ASM (with the underlying disks mostly on raw, rarely on block devices used directly) performance may drop no matter how the SGA is tuned, because the filesystem cache that existed before the migration is gone under ASM and the IO now genuinely lands on disk.

Conversely, if that is not the situation, performance is quite likely to improve, because more memory can be given to the SGA.

This topic is covered in another post.

From the ITPUB blog, link: http://blog.itpub.net/94384/viewspace-677823/. Please credit the source when reposting; otherwise legal liability may be pursued.
