Today a reader asked me about a MySQL recovery problem and provided the screenshot below.

A Brief Analysis of MySQL Recovery After a Power Failure

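This kind of problem can indeed occur after a power failure. The first thing I need to confirm is whether this is a production system or a test environment, because for production the impact is serious. If the database cannot start, the top priority is to get it running again; only then can we assess how much data was lost and plan the data repair.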

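From my side, the question is how to reproduce the problem quickly. I used a script I wrote for quickly building a master-slave test environment (an expert later suggested rewriting it in Python, which I have been considering), and the environment was ready in minutes.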

We create a table named test with two columns, id and name, and then open an explicit transaction.

create table test(id int primary key,name varchar(30) not null);

Open the transaction explicitly:

begin;
insert into test values(1,'a');
insert into test values(2,'b');
insert into test values(3,'c');

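Without committing, we find the mysqld process and kill it outright to simulate a power failure. The "double 1" settings (innodb_flush_log_at_trx_commit=1 and sync_binlog=1) are in effect here; note that stock MySQL 5.6 defaults sync_binlog to 0, so check your own instance. Then we restart and watch what the error log reports: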

2017-09-13 15:05:11 35556 [Note] InnoDB: Highest supported file format is Barracuda.
2017-09-13 15:05:11 35556 [Note] InnoDB: The log sequence numbers 1625987 and 1625987 in ibdata files do not match the log sequence number 1640654 in the ib_logfiles!
2017-09-13 15:05:11 35556 [Note] InnoDB: Database was not shutdown normally!
2017-09-13 15:05:11 35556 [Note] InnoDB: Starting crash recovery.
2017-09-13 15:05:11 35556 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-09-13 15:05:11 35556 [Note] InnoDB: Restoring possible half-written data pages
2017-09-13 15:05:11 35556 [Note] InnoDB: from the doublewrite buffer...
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 3 row operations to undo
InnoDB: Trx id counter is 2304
2017-09-13 15:05:11 35556 [Note] InnoDB: 128 rollback segment(s) are active.
InnoDB: Starting in background the rollback of uncommitted transactions
2017-09-13 15:05:11 7f5ccc3d1700 InnoDB: Rolling back trx with id 1806, 3 rows to undo
2017-09-13 15:05:11 35556 [Note] InnoDB: Rollback of trx with id 1806 completed
2017-09-13 15:05:11 7f5ccc3d1700 InnoDB: Rollback of non-prepared transactions completed
2017-09-13 15:05:11 35556 [Note] InnoDB: Waiting for purge to start
2017-09-13 15:05:11 35556 [Note] InnoDB: Percona XtraDB () 5.6.14-rel62.0 started; log sequence number 1640654
2017-09-13 15:05:11 35556 [Note] Recovering after a crash using binlog
2017-09-13 15:05:11 35556 [Note] Starting crash recovery...
2017-09-13 15:05:11 35556 [Note] Crash recovery finished.
2017-09-13 15:05:11 35556 [Note] RSA private key file not found: /U01/mysql_test/m1//private_key.pem. Some authentication plugins will not work.
2017-09-13 15:05:11 35556 [Note] RSA public key file not found: /U01/mysql_test/m1//public_key.pem. Some authentication plugins will not work.
2017-09-13 15:05:11 35556 [Note] Server hostname (bind-address): '*'; port: 21804

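The log shows that the server detected the previous abnormal shutdown and started crash recovery. InnoDB found that the LSN recorded in the ibdata system tablespace files is 1625987, which does not match the LSN 1640654 in the ib_logfiles (the redo logs). What follows is the usual recovery sequence: roll forward from the redo log, then roll back the uncommitted transaction (3 row operations undone). Afterwards the table is empty, which confirms that the uncommitted inserts were rolled back.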
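To make the mismatch concrete, here is a minimal sketch that pulls the two LSNs out of the startup message and computes how far the data files lag behind the redo log. The function name lsn_gap and the regular expression are my own illustration, written against the exact message wording shown in the log above; other versions may phrase the message differently.

```python
import re

# Matches the InnoDB startup note printed when the ibdata files lag behind the redo log.
# The wording is taken from the 5.6 log shown above (assumption: other versions may differ).
MISMATCH_RE = re.compile(
    r"log sequence numbers (\d+) and (\d+) in ibdata files "
    r"do not match the log sequence number (\d+) in the ib_logfiles"
)

def lsn_gap(log_line):
    """Return (ibdata_lsn, redo_lsn, gap), or None if the line is not a mismatch note."""
    m = MISMATCH_RE.search(log_line)
    if m is None:
        return None
    ibdata_lsn = int(m.group(1))
    redo_lsn = int(m.group(3))
    return ibdata_lsn, redo_lsn, redo_lsn - ibdata_lsn

line = (
    "2017-09-13 15:05:11 35556 [Note] InnoDB: The log sequence numbers 1625987 and 1625987 "
    "in ibdata files do not match the log sequence number 1640654 in the ib_logfiles!"
)
print(lsn_gap(line))  # (1625987, 1640654, 14667)
```

The gap of 14667 is the span of redo log between the LSN stamped in the data files and the end of the log, which recovery has to scan and roll forward.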

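So from the above we can see that, with transactions in play and the redo logs intact, this problem generally does not occur. When, then, is the error from the screenshot thrown?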

Let's continue testing: open an explicit transaction again and do not commit it.

begin;
insert into test values(1,'a');
insert into test values(2,'b');
insert into test values(3,'c');

Then kill the mysqld process, go to the MySQL data directory, and delete the redo log files (ib_logfile*). After that, restart the database.

This time the server throws an error similar to the one in the screenshot.

2017-09-13 16:05:14 36896 [Note] InnoDB: Highest supported file format is Barracuda.
2017-09-13 16:05:14 7f73450a97e0 InnoDB: Error: page 7 log sequence number 1627722
InnoDB: is in the future! Current system log sequence number 1626124.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
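When many pages are affected, it can help to summarize these errors rather than read them one by one. The sketch below, assuming the two-line message format shown above, extracts the page number, the page LSN and the current system LSN, and reports how far "in the future" each page is. The name future_lsn_pages is hypothetical.

```python
import re

# The error spans two log lines, so the pattern allows whitespace (including a newline)
# between them. Wording taken from the 5.6 log above (assumption: may differ elsewhere).
FUTURE_LSN_RE = re.compile(
    r"page (\d+) log sequence number (\d+)\s+"
    r"InnoDB: is in the future! Current system log sequence number (\d+)"
)

def future_lsn_pages(log_text):
    """Return a list of (page_no, page_lsn, system_lsn, ahead_by) tuples."""
    results = []
    for m in FUTURE_LSN_RE.finditer(log_text):
        page_no, page_lsn, system_lsn = (int(g) for g in m.groups())
        results.append((page_no, page_lsn, system_lsn, page_lsn - system_lsn))
    return results

sample = (
    "2017-09-13 16:05:14 7f73450a97e0 InnoDB: Error: page 7 log sequence number 1627722\n"
    "InnoDB: is in the future! Current system log sequence number 1626124.\n"
)
print(future_lsn_pages(sample))  # [(7, 1627722, 1626124, 1598)]
```

Here page 7 carries an LSN that is 1598 ahead of what the freshly created redo logs know about, which is exactly what deleting the old redo files produces.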

At this point the impact is not yet obvious, because despite the error we can still write data.

mysql> insert into test values(1,'a');
Query OK, 1 row affected (0.04 sec)

mysql> select *from test;
+----+------+
| id | name |
+----+------+
| 1 | a |
+----+------+
1 row in set (0.00 sec)
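For crash recovery, one parameter deserves special attention: innodb_force_recovery. Its default value is 0; setting it to a non-zero value (1 to 6) has the following effects, and each higher value also includes the effects of all lower ones.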

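1 (SRV_FORCE_IGNORE_CORRUPT): ignores corrupt pages that are detected.

2 (SRV_FORCE_NO_BACKGROUND): prevents the master thread from running; this avoids a crash that would otherwise occur when the master thread performs a full purge.

3 (SRV_FORCE_NO_TRX_UNDO): does not run transaction rollbacks.

4 (SRV_FORCE_NO_IBUF_MERGE): does not run insert buffer merge operations.

5 (SRV_FORCE_NO_UNDO_LOG_SCAN): does not look at the undo logs on startup; InnoDB treats even uncommitted transactions as committed.

6 (SRV_FORCE_NO_LOG_REDO): does not perform the redo log roll-forward.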
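Because a higher value also includes the behavior of every lower value, it is worth spelling out what a given setting actually switches on. The following is a small lookup sketch; the table FORCE_RECOVERY_LEVELS and the function effects_at are illustrative names of my own, and the descriptions simply restate the list above.

```python
# Level -> (constant name, short description), restating the list above.
FORCE_RECOVERY_LEVELS = {
    1: ("SRV_FORCE_IGNORE_CORRUPT", "ignore corrupt pages"),
    2: ("SRV_FORCE_NO_BACKGROUND", "do not run the master thread"),
    3: ("SRV_FORCE_NO_TRX_UNDO", "do not roll back transactions"),
    4: ("SRV_FORCE_NO_IBUF_MERGE", "do not merge the insert buffer"),
    5: ("SRV_FORCE_NO_UNDO_LOG_SCAN", "do not look at undo logs"),
    6: ("SRV_FORCE_NO_LOG_REDO", "do not roll forward from the redo log"),
}

def effects_at(level):
    """Return the constant names in effect at the given level (cumulative)."""
    if not 0 <= level <= 6:
        raise ValueError("innodb_force_recovery must be between 0 and 6")
    return [name for lvl, (name, _desc) in sorted(FORCE_RECOVERY_LEVELS.items())
            if lvl <= level]

print(effects_at(2))  # ['SRV_FORCE_IGNORE_CORRUPT', 'SRV_FORCE_NO_BACKGROUND']
```

Level 0 returns an empty list, which corresponds to a normal startup with full crash recovery.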

Note that changing this parameter requires restarting the MySQL service; it cannot be set at runtime.

mysql> set global innodb_force_recovery=2;
ERROR 1238 (HY000): Variable 'innodb_force_recovery' is a read only variable

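Suppose we set it to 2 and reproduce the problem again. The database can now be started for the time being, but data can only be queried; every DML statement fails.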

mysql> select *from test;
Empty set (0.00 sec)
mysql>
mysql> insert into test values(1,'a');
ERROR 1030 (HY000): Got error -1 from storage engine
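With the effect of each level in mind, we can weigh which innodb_force_recovery value to use. If the MySQL service cannot start normally, adjust this parameter to get it up first and restore basic service availability, then assess the damage and export the important data.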
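One way to think about choosing the value is to raise it one step at a time and stop at the first level where the server comes up, since lower levels interfere less with the data. The sketch below only models that decision loop: can_start is a hypothetical callback standing in for "set the value in the configuration file, restart mysqld, check whether it started", and nothing here touches a real server.

```python
def lowest_working_level(can_start, max_level=6):
    """Return the smallest innodb_force_recovery value for which can_start(level)
    is true, or None if no level up to max_level works."""
    for level in range(1, max_level + 1):
        if can_start(level):
            return level
    return None

# Stand-in for a real restart attempt (assumption: the server only starts at 3 or above).
def fake_can_start(level):
    return level >= 3

print(lowest_working_level(fake_can_start))  # 3
```

Once the server is up at the chosen level, the remaining work is the one described above: export the important data before doing anything else.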
