今天碰到了一次MYSQL CRASH,不像以前是BUG所致，這次是人為搞死的，又是一次血的教訓
直接觸發原因：
　在白天業務期間執行了200W的DELETE語句（自動提交）；
以下是當時的日誌和狀態資訊：

100811 17:06:06 InnoDB: ERROR: the age of the last checkpoint is 1887429810,
InnoDB: which exceeds the log group capacity 1887429428.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.

[@more@]

100811 17:06:22 InnoDB: ERROR: the age of the last checkpoint is 1928074432,
InnoDB: which exceeds the log group capacity 1887429428.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.
100811 17:08:41 InnoDB: ERROR: the age of the last checkpoint is 1955265899,
InnoDB: which exceeds the log group capacity 1887429428.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.
100811 17:08:59 InnoDB: ERROR: the age of the last checkpoint is 1955266310,
InnoDB: which exceeds the log group capacity 1887429428.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.

show Innodb status
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 98390, free list len 294851, seg size 393242,
1962952796 inserts, 1960901438 merged recs, 126408533 merges
Hash table size 17700857, used cells 6710294, node heap has 12444 buffer(s)
0.16 hash searches/s, 9.00 non-hash searches/s
---
LOG
---
Log sequence number 2017 503698566
Log flushed up to 2017 503698566
Last checkpoint at 2016 3207393311
0 pending log writes, 0 pending chkp writes
3659386378 log i/o's done, 0.05 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 10367101167; in additional pool allocated 281326592
Buffer pool size 524288
Free buffers 0
Database pages 511844
Modified db pages 159931
Pending reads 1
Pending writes: LRU 0, flush list 129, single page 0
Pages read 226580777, created 88398135, written 1231690245
11.26 reads/s, 0.00 creates/s, 161.68 writes/s
Buffer pool hit rate 750 / 1000

現象：　INSERT程式無法寫入，被HANG住；
直接用KILL方式殺掉MYSQL程式，並重啟MYSQL：
開始恢復，日誌資訊如下：

InnoDB: Log scan progressed past the checkpoint lsn 2016 3207393311
100811 17:21:50 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 2016 3212636160
InnoDB: Doing recovery: scanned up to log sequence number 2016 3217879040
InnoDB: Doing recovery: scanned up to log sequence number 2016 3223121920
InnoDB: Doing recovery: scanned up to log sequence number 2016 3228364800
InnoDB: Doing recovery: scanned up to log sequence number 2016 3233607680

...

InnoDB: Doing recovery: scanned up to log sequence number 2016 4271697920
InnoDB: Doing recovery: scanned up to log sequence number 2016 4276940800
InnoDB: Doing recovery: scanned up to log sequence number 2016 4282183680
InnoDB: Doing recovery: scanned up to log sequence number 2016 4287426560
InnoDB: Doing recovery: scanned up to log sequence number 2016 4292669440
InnoDB: Doing recovery: scanned up to log sequence number 2017 2945024
InnoDB: Doing recovery: scanned up to log sequence number 2017 8187904
InnoDB: Doing recovery: scanned up to log sequence number 2017 13430784
InnoDB: Doing recovery: scanned up to log sequence number 2017 18673664
InnoDB: Doing recovery: scanned up to log sequence number 2017 23916544
InnoDB: Doing recovery: scanned up to log sequence number 2017 29159424
InnoDB: Doing recovery: scanned up to log sequence number 2017 34402304
....

InnoDB: Doing recovery: scanned up to log sequence number 2017 322760704
InnoDB: Doing recovery: scanned up to log sequence number 2017 328003584
InnoDB: Doing recovery: scanned up to log sequence number 2017 333246464
InnoDB: Doing recovery: scanned up to log sequence number 2017 338489344
InnoDB: Doing recovery: scanned up to log sequence number 2017 343732224
到這裡，恢復以失敗告終，沒有任何其他資訊；多次嘗試均未果。

出現這個現象是由於手工以自動提交的方式，執行200W條DELETE語句導致；（正常應用的操作主要是INSERT，沒有UPDATE，1000QPS，有TEXT欄位）

有故障發生後我們需要分析原因：

在這裡首先有幾個配置需要列出：
OS ：RHEL 4u7
MYSQL: 5.0.51a
HD ：146G*15 ， RAID 5
MEM :16G
/etc/my.cnf:
innodb_log_buffer_size = 4M
innodb_log_file_size = 500M
innodb_log_files_in_group = 4
innodb_max_dirty_pages_pct = 5
innodb_flush_log_at_trx_commit = 2
innodb_buffer_pool_size = 8G
innodb_data_file_path = ibdata1:500M:autoextend
innodb_file_per_table

我們能提供的資訊有：
1) 在執行200W刪除操作時候出現問題，
2) 有報錯：
100811 17:06:06 InnoDB: ERROR: the age of the last checkpoint is 1887429810,
InnoDB: which exceeds the log group capacity 1887429428.
這裡已經表明，日誌已經全部用完，需要FLUSH 髒資料塊；

3) 從show innodb status 的LOG部分看：
Log flushed up to 2017 503698566
Last checkpoint at 2016 3207393311

髒資料的範圍從 2016 3207393311 --&gt 2017 503698566 ,
髒資料範圍：(2016 4292669440)-(2016 3207393311)+(2017 503698566) = 1588974695 ,也就是需要RECOVER的範圍。

為什麼這樣算呢：
這裡的2016,2017代表的是日誌輪循寫的次數（我猜的,因為我看到INNODb在恢復時，ERROR LOG裡:
InnoDB: Doing recovery: scanned up to log sequence number 2016 4287426560
InnoDB: Doing recovery: scanned up to log sequence number 2016 4292669440 --最大值
InnoDB: Doing recovery: scanned up to log sequence number 2017 2945024
InnoDB: Doing recovery: scanned up to log sequence number 2017 8187904

第2016輪循讀到4292669440，接著是下一個2017 2945024 ；
((( 但這裡的4292669440 = 4G, 卻與我的日誌大小不匹配（500M*4），這裡有點矛盾，先跳過。)))

4) 髒資料塊的量已經明顯超過：（8G*0.05）400M，為什麼MYSQL來不及刷資料到硬碟？一種可能是COMMIT太頻繁導致MYSQL頻繁寫BINLOG日誌，太多佔用了IOPS，

5) 我們從innodb_log資訊來看
-rw-rw---- 1 mysql mysql 524288000 Aug 11 19:57 ib_logfile0
-rw-rw---- 1 mysql mysql 524288000 Aug 11 16:50 ib_logfile1
-rw-rw---- 1 mysql mysql 524288000 Aug 11 16:58 ib_logfile2
-rw-rw---- 1 mysql mysql 524288000 Aug 11 17:04 ib_logfile3

小結：
1, commit 過於頻繁，導致產生大量BINLOG日誌IO請求，而導致MYSQL變得緩慢（髒塊刷不下去），進而導致HANG住；
2, 大量的刪除，產生大量髒資料塊，超過（8G*0.05=400M）CHECKPOINT threshold, 觸發CHECKPOINT,從而使得MYSQL需要刷髒資料到塊到硬碟；
3, 大量的刪除，產生大量的REDO（innodb_logs），超過當前(500M*4)可以承受的量，強制MYSQL需要CHECKPOINT，觸發MYSQL 刷BUFFER髒塊到硬碟；

經驗與教訓：
　1,在業務繁忙期間，絕對不允許做如此大的操作。比如大量的DELETE、INSERT，並且是頻繁的提交；
2,大操作需要批次提交，減少日誌IO請求；

在這裡，我們還有不少的知識點需要補充：

MYSQL INNODB CHECKPOINT：
--------------------------------------
參考： http://www.mysqlperformanceblog.com/2008/11/13/adaptive-checkpointing/
http://dev.mysql.com/doc/refman/5.0/en/innodb-checkpoints.html

CHECKPOINT 資訊被儲存在第一個INNODB LOG FILE中。

INNODB CRUSH 後，需要進行恢復（前滾，後滾），從哪個點開始恢復，就是CHECKPOINT來控制的；
我們在show innodb status 後可以找到以下資訊：
---
LOG
---
Log sequence number 2017 503698566
Log flushed up to 2017 503698566
Last checkpoint at 2016 3207393311
------------------

Last checkpoint 指向的值，就是我們需要恢復的起始點. 要恢復到“Log flushed“，資料才是完整的；

那麼什麼時候來做checkpoint呢？

由兩個點來控制: 髒資料塊數量 && INNODB LOG FILE 大小

在INNODB BUFFER中，允許的髒資料量= innodb_buffer_size * innodb_max_dirty_pages_pct /100
如果髒資料塊超過這個值，MYSQL後臺程式就會需要將髒資料塊向磁碟重新整理，並重新整理CHECKPOINT值；

另外一個，INNODB_LOG 裡的內容裡用來恢復資料的，類似於ORACLE的REDO日誌，只要記憶體的髒資料塊沒有被刷到硬碟，那麼這些相應的日誌也不能丟失，
否則資料恢復就會失敗；INNODBLOG是迴圈使用的；當日志一次迴圈使用到最後一個日誌也寫滿了，接著需要寫第一個日誌的時候，
如果第一個日誌的內容還未被標誌成”無用“狀態，那麼MYSQL就會觸發CHECKPOINT，刷髒資料；

如果確定INNODB_LOG_FILE大小
--------------------------------------
參考： http://www.mysqlperformanceblog.com/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/
INNODB_LOG_FILE 過於小，會直接觸發CHECKPOINT，導致頻繁IO請求；多大是合適的？
========================================
: (none) 16:13:13> pager grep sequence
PAGER set to 'grep sequence'
: (none) 16:13:14>
: (none) 16:13:15> SHOW engine innodb STATUSG SELECT sleep(60); SHOW engine innodb STATUSG
Log sequence number 1450 485101299
1 row in set (0.09 sec)

1 row in set (1 min 0.01 sec)

Log sequence number 1450 505024667
1 row in set (0.00 sec)

: (none) 16:14:37> nopager
PAGER set to stdout
: (none) 16:14:43> select (505024667-485101299)/1024/1024;
+---------------------------------+
| (505024667-485101299)/1024/1024 |
+---------------------------------+
| 19.00040436 |
+---------------------------------+
1 row in set (0.00 sec)
========================================
Notice the log sequence number. That's the total number of bytes written to the transaction log.
我們在高峰期間取樣可以得到，1分鐘產生19M的日誌；我覺得這個INNODB LOG大小設成 19M*60=1140M 已經足夠了；
60分鐘是一個經驗值，你也可以適當調大，比如 500M，3個檔案；這相對來說是安全的；
當然你也可以用以下命令來檢視日誌產生的大小：
show status like 'Innodb_os_log_written'; select sleep(60); show status like 'Innodb_os_log_written';

關於innodb_log_buffer_size
-----------------------
InnoDB用來往磁碟上的日誌檔案寫操作的緩衝區的大小。明智的值是從1MB到8MB。預設的是1MB。一個大的日誌緩衝允許大型事務執行而不需要在事務提交之前往磁碟寫日誌。因此，如果你有大型事務，使日誌緩衝區更大以節約磁碟I/O。
InnoDB flushes the buffer to the log files on disk
when the buffer gets full,
when a transaction commits,
when once per second

你可以檢視變數Innodb_os_log_written來估算每秒有多少日誌寫到磁碟.
if you see a peak of 100 KB written to the log per second,
a 1 MB log buffer is probably plenty.
a 256 MB log file is enough to store at least 2,560 seconds of log entries

CRASH with error- last checkpoint exceed log group capacity

相關文章