14.6.5 Doublewrite buffer
1. Doublewrite Buffer (Mysql 官方介紹)
The doublewrite buffer is a storage area where InnoDB writes pages flushed from the buffer pool before writing the pages to their proper positions in the InnoDB data files. If there is an operating system,storage subsystem,or unexpected mysqld process exit in the middle of a page write, InnoDB can find a good copy of the page from the doublewrite buffer duing crash recovery.
Although data is written twice, the doublewrite buffer does not require twice as much I/O overhead or twice as many I/O operations. Data is written to the doublewrite buffer in a lage sequential chunk, with a single fsync() call to the operating system (expect in the case tha innodb_flush_method is set to O_DIRECT_ON_FSYNC).
The doublewrite buffer is enabled by default in most cases. To disable the doublewrite buffer, set innodb_doublewrite to 0.
If system tablespace files ("ibdata files") are located on Fusion-io devices that support atomic writes, doublewrite buffering is automatically disabled and fusion-io atomic writes are used for all datafile files. Because the doublewrite buffer setting is global, doublewrite buffering is also disabled for data files residing on non-Fusion-io hardware. This feature is only supported on Fusion-io hardware and is only enabled for Fusion-io NVMFS on Linux. To take full advantage of this feature, an innodb_flush_method setting of O_DIRECT is recommended.
2. partial page write
資料庫,OS和磁碟讀寫的基本單位是塊,也可以稱之為(page size) block size。
資料庫的塊一般為8k,16k; OS的塊則一般為4K; IO塊更小,linux 核心要求IO block size <= OS block size.
磁碟IO除了IO block size , 還有扇區的概念(IO sector), 扇區是磁碟物理操作的基本單位, 而IO 塊是磁碟操作的邏輯單位,一個IO塊對應一個或多個扇區,扇區大小一般為512位元組。
所以各個塊大小的關係如下:
DB block > OS Block >= IO Block> Disk Block ,而且他們之間保持著整數倍的關係。比如DB 以mysql 為例, OS 以linux 為例:
DB block size
mysql> show variables like 'innodb_page_size';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| innodb_page_size | 16384 |
+------------------+-------+
1 row in set (0.01 sec)
OS block size
[root@vm0064 backup]# getconf PAGESIZE
4096
IO block size
[root@vm0064 backup]# blockdev --getbsz /dev/sda1
1024
Sector size
[root@vm0064 backup]# fdisk -l | grep Sector
Sector size (logical/physical): 512 bytes / 512 bytes
從上面的結果可以看到 DB page=4*OS page=16*IO pages=32*sector size
由於任何DB page的寫入,最終都會轉為sector的寫入,如果在寫磁碟的過程中,出現異常重啟,就可能會發生一個DB 頁只寫了部分sector到磁碟,進而出現頁斷裂的情況。
InnoDB 的page size 一般是16KB,其資料校驗也是針對這16KB計算的,將資料寫入到磁碟是以page為單位進行操作的。而計算機硬體和作業系統,在極端情況下(比如斷電)往往並不能保證這一操作的原子性,16K的資料, 寫入4K時,發生了系統斷電/os crash , 只有一部分寫是成功的,這種情況寫就是partial page write。
很多DBA 會想到系統恢復後,Mysql 可以根據redo log 進行恢復, 而mysql 在恢復的過程中是檢查page的checksum, checksum就是page的最後事務號,發生 partial page write 問題時, page 已經損壞,找不到改page的事務號,就無法恢復。
所以說,當page 損壞後,應用redo 是沒有意義的,這時候無法使用redo 來恢復,因為原始頁已經損壞了,會發生資料丟失。
3. doublewrite
在InnoDB 將Buffer Pool 中的Dirty Page 刷到磁碟上時,首先會將(memcpy函式) page 刷到InnoDB system tablespace(ibdata1)的一個區域中,我們稱該區域為double write buffer (大小為2MB, 每次寫入1MB,128個頁)。在向double write buffer寫入成功後,第二步再將資料複製到資料檔案對應的位置。
當第二步過程中發生故障,也就是發生partial page write的問題。 恢復的時候先檢查頁內的checksum是否相同,不一致,則直接從doublewrite中恢復
1)如果寫doublewrite buffer失敗,那麼這些資料不會寫到磁碟,innodb會載入磁碟原始資料和redo日誌比較,並重新刷到doublewrite buffer。
2) 如果寫doublewrite buffer成功,但是重新整理到磁碟失敗,那麼innodb就不會透過事務日誌來恢復了,而是直接重新整理doublewrite buffer中的資料。
4. doublewrite 引數
mysql> show variables like '%double%';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| innodb_doublewrite | ON |
+--------------------+-------+
1 row in set (0.00 sec)
mysql> show status like '%innodb_dblw%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Innodb_dblwr_pages_written | 5951 |
| Innodb_dblwr_writes | 1383 |
+----------------------------+-------+
從上面可以看出,Flush 了5951次到doublewrite buffer中,寫檔案共1383次, 則每次write合併了5951/1383次flush。
5. 對效能的影響可參考:
Innodb Double Write
https://www.percona.com/blog/2006/08/04/innodb-double-write/
MySQL Performance: InnoDB Doublewrite Buffer Impact
http://dimitrik.free.fr/blog/archives/2009/08/entry_86.html
本文參考文章
14.6.5 Doublewrite Buffer
https://dev.mysql.com/doc/refman/5.7/en/innodb-doublewrite-buffer.html
[mysql] Innodb 三大特性之 double write
https://www.cnblogs.com/chenpingzhao/p/4876282.html
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/21374452/viewspace-2777601/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- (轉)innodb_doublewrite_file
- 頁斷裂(partial write)與doublewrite技術
- IO之核心buffer----"buffer cache"
- protocol bufferProtocol
- 專業HTML文字編輯器:BBEdit for Mac 附註冊碼 14.6.5啟用版HTMLMac
- 【BUFFER】Oracle buffer cache之 latch 學習記錄Oracle
- Java NIO - BufferJava
- JAVA NIO BufferJava
- gc buffer busyGC
- Oracle Data BufferOracle
- Buffer Cache 原理
- MySQL Join BufferMySql
- cache buffer chainAI
- Export Parameter : BufferExport
- Buffer和Channel
- Buffer Cache以及buffer busy waits/gc相關事件AIGC事件
- buffer cache實驗7-buffer busy waits-完成AI
- buffer cache實驗8-free buffer waits-完成AI
- Ask Hoegh(5)——buffer cache和buffer有什麼區別?
- Protocol Buffer 使用指北Protocol
- TarsGo支援Protocol BufferGoProtocol
- TarsGo 支援 protocol bufferGoProtocol
- Java NIO之BufferJava
- Mysql Key Buffer SizeMySql
- node Buffer 學習
- Buffer Cache Hit Ratio
- Oracle Buffer Cache原理Oracle
- Oracle database buffer cacheOracleDatabase
- CACHE BUFFER CHAINSAI
- Database Buffer Cache (79)Database
- Multiple Buffer Pools (83)
- buffer cache實驗2-詳解Buffer Header--DUMP buffer結合X$BH檢視各欄位Header
- tf.data.Dataset.shuffle(buffer_size)中buffer_size的理解
- mysql change buffer小結MySql
- Oracle Cache Buffer ChainsOracleAI
- Oracle Free Buffer WaitsOracleAI
- Oracle Buffer Busy WaitsOracleAI
- NodeAPI學習之BufferAPI