【MySQL】One SQL Statement That Filled the Disk and Crashed MySQL

Posted by 神諭丶 on 2017-07-17
I received a disk alert for a MySQL instance. On the monitoring graph, disk usage was climbing very fast: in a little over two hours it had grown by more than 170 GB.
The binlog directory was not large, but the datadir was already over 180 GB, and the ibtmp1 file was huge and still growing.
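To confirm from inside MySQL that it really is the shared temporary tablespace doing the growing (rather than relying on du alone), MySQL 5.7 exposes it through INFORMATION_SCHEMA.FILES; a minimal check looks like this:

    -- Current size of the InnoDB temporary tablespace (ibtmp1), MySQL 5.7+
    SELECT FILE_NAME,
           ROUND(TOTAL_EXTENTS * EXTENT_SIZE / 1024 / 1024 / 1024, 2) AS size_gb
    FROM   INFORMATION_SCHEMA.FILES
    WHERE  TABLESPACE_NAME = 'innodb_temporary';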

I jumped on and looked at the processlist. There were not many threads, but one of them was a SELECT CASE ... WHEN ... query in the "Sending data" state, and it had already been running for several thousand seconds.
I hesitated and did not kill that thread right away. By the time I ran SHOW PROCESSLIST again... the server had apparently already crashed...


    mysql> show processlist;
    ERROR 2006 (HY000): MySQL server has gone away
    No connection. Trying to reconnect...
    ERROR 2002 (HY000): Can't connect to local MySQL server through socket '$datadir/mysqld.sock' (2)
    ERROR: Can't connect to the server
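With hindsight, that runaway SELECT should have been killed the moment it was spotted. A minimal sketch for finding and killing long-running SELECTs (the 600-second threshold is only an example):

    -- List SELECTs that have been running for more than 600 seconds
    SELECT ID, USER, HOST, DB, TIME, STATE, LEFT(INFO, 80) AS query_head
    FROM   INFORMATION_SCHEMA.PROCESSLIST
    WHERE  COMMAND = 'Query'
      AND  TIME > 600
      AND  INFO LIKE 'SELECT%';

    -- Then kill the offending thread by its ID
    KILL 12345;   -- placeholder ID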


Well, since it had already crashed, time to find the cause...

Looking at the disk that datadir is mounted on, the space had already been freed; sure enough, mysqld had been restarted.
    ext4    197G  4.6G  191G   3% $datadir


A check of the mysqld process showed it had already been brought back up by mysqld_safe.

Next, the error log.
The crash-related errors make it obvious: the disk that datadir is mounted on filled up.
There was no space left to write the binlog or the ibtmp1 temporary tablespace file (new in 5.7).
The ibtmp1 file had ultimately reached 201876045824 bytes, close to 190 GB, while the whole mounted filesystem is just under 200 GB.
The SQL-thread errors show that after mysqld came back up, replication's SQL thread also ran into trouble.
That gets fixed further down.

    [ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file $datadir/ibtmp1, desired size 67108864 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
    [ERROR] Disk is full writing '$datadir/mysql-bin.000004' (Errcode: 15870576 - No space left on device). Waiting for someone to free space...
    [ERROR] Retry in 60 secs. Message reprinted in 600 secs
    [Warning] InnoDB: 1048576 bytes should have been written. Only 647168 bytes written. Retrying for the remaining bytes.
    [Warning] InnoDB: Retry attempts for writing partial data failed.
    [ERROR] InnoDB: Write to file $datadir/ibtmp1 failed at offset 201911697408, 1048576 bytes should have been written, only 647168 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
    [ERROR] InnoDB: Error number 28 means 'No space left on device'
    [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
    [Warning] InnoDB: Error while writing 67108864 zeroes to $datadir/ibtmp1 starting at offset 201876045824
    [ERROR] $basedir/bin/mysqld: The table '$tmpdir/#sql_37c5_0' is full
    [ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file ./thread_quartz/QRTZ_FIRED_TRIGGERS.ibd, desired size 32768 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
    [Warning] InnoDB: Retry attempts for writing partial data failed.
    [Warning] InnoDB: Error while writing 32768 zeroes to ./thread_quartz/QRTZ_FIRED_TRIGGERS.ibd starting at offset 44236
    2017-07-06T11:49:21.893377Z mysqld_safe Number of processes running now: 0
    mysqld_safe mysqld restarted
    ……………………………………………………………………………………………………………………
    [Note] InnoDB: Last MySQL binlog file position 0 690908428, file name mysql-bin.000004
    [Note] InnoDB: Starting in background the rollback of uncommitted transactions
    [Note] InnoDB: Rollback of non-prepared transactions completed
    [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
    [Note] InnoDB: Creating shared tablespace for temporary tables
    [Note] InnoDB: Setting file '$datadir/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
    [Note] InnoDB: File '$datadir/ibtmp1' size is now 12 MB.
    [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
    [Note] InnoDB: 32 non-redo rollback segment(s) are active.
    [Note] InnoDB: Waiting for purge to start
    [Note] InnoDB: 5.7.12 started; log sequence number 4828513952
    [Note] InnoDB: page_cleaner: 1000ms intended loop took 7748ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
    [Note] InnoDB: Loading buffer pool(s) from $datadir/ib_buffer_pool
    [Note] Plugin 'FEDERATED' is disabled.
    [Note] InnoDB: Buffer pool(s) load completed at 170706 19:49:30
    [Note] Recovering after a crash using $basedir/mysql-bin
    [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 579, event_type: 2
    [Note] Starting crash recovery...
    [Note] InnoDB: Starting recovery for XA transactions...
    [Note] InnoDB: Transaction 6729603 in prepared state after recovery
    [Note] InnoDB: Transaction contains changes to 1 rows
    [Note] InnoDB: 1 transactions in prepared state after recovery
    [Note] Found 1 prepared transaction(s) in InnoDB
    [Note] Crash recovery finished.
    [Note] Crashed binlog file $basedir/mysql-bin.000004 size is 690909184, but recovered up to 690908428. Binlog trimmed to 690908428 bytes.
    [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key
    ……………………………………………………………………………………………………………………
    [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 835, event_type: 2
    [Warning] Error reading GTIDs from relaylog: -1
    [Note] Slave I/O thread: Start asynchronous replication to master '*****' in log 'mysql-bin.000014' at position 286486095
    [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
    [Note] Slave I/O thread for channel '': connected to master '*****',replication started in log 'mysql-bin.000014' at position 286486095
    [Warning] Slave SQL for channel '': If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
    [Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.000014' at position 286485153, relay log '$datadir/mysql-relay.000013' position: 286485326
    [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 835, event_type: 2
    [ERROR] Error reading relay log event for channel '': slave SQL thread aborted because of I/O error
    [ERROR] Slave SQL for channel '': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594
    [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000014' position 286485153
    [Note] Event Scheduler: Loaded 0 events
    [Note] $basedir/mysqld: ready for connections.
    Version: '5.7.12' socket: '$datadir/mysqld.sock' port: 3306 Source distribution


Because the server had already crashed and I never captured the running SQL in full, all I had was the small fragment visible at the front of that SHOW PROCESSLIST output.

I contacted the application owner, who said this instance is a slave with no application read traffic, though report-type SELECTs are occasionally run against it by hand.

So I checked the slow log for the two hours before the crash.
A handful of statements kept showing up:
    ……………………
    00:00:11.392881 select * from tb ……………………
    00:00:04.779748 select (case when ……………………
    00:00:04.779748 select ( case when t2.col1 ……………………
    00:00:03.328248 select ( case when t2.col1 ……………………
    00:00:04.276773 select count(t1.id) from tb1
    00:00:05.039027 select (case when t2.col ……………………
    00:00:10.263063 select (case when t2.col ……………………
    00:00:03.131713 select t2.* from tb1 t1 ……………………
    00:00:15.909456 select t2.* from tb1 t1 ……………………
    00:00:14.367047 select * from tb ……………………
    ……………………

None of them ran longer than 20 seconds, but remember: one second before the disk was blown up, there was a query that had already been running for several thousand seconds.

Also, these CASE WHEN statements carry only a single ORDER BY.
So I took one of them and checked its execution plan:
    +--------+---------------+---------+---------+---------------------+--------+----------+----------------------------------------------+
    | type   | possible_keys | key     | key_len | ref                 | rows   | filtered | Extra                                        |
    +--------+---------------+---------+---------+---------------------+--------+----------+----------------------------------------------+
    | ALL    | NULL          | NULL    | NULL    | NULL                | 472765 | 20.99    | Using where; Using temporary; Using filesort |
    | eq_ref | PRIMARY       | PRIMARY | 8       | $tb1.col1           | 1      | 10.00    | Using where                                  |
    +--------+---------------+---------+---------+---------------------+--------+----------+----------------------------------------------+
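"Using temporary; Using filesort" in a plan means the server may build an internal temporary table; once it outgrows tmp_table_size / max_heap_table_size (or contains types a MEMORY table cannot hold), it is converted to an on-disk table, and with 5.7's default internal_tmp_disk_storage_engine=InnoDB that on-disk table lives in ibtmp1. A quick way to see how often queries spill to disk:

    -- Internal temporary tables created in memory vs. converted to disk
    SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';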

The plan does not look great, but it is nowhere near bad enough to explain 190 GB of temporary tables on its own.
And although temporary usage can accumulate over time, the monitoring clearly showed the disk filling up within that one-to-two-hour window.

To recap: this slave carries no application traffic; the customer just connects to it occasionally by hand to run some queries.
With that in mind I went back over the slow log and noticed that the CASE WHEN statements in it are not all identical...
which probably means the customer was still debugging that SQL,
and the version that ran for several thousand seconds may have been written spectacularly badly.

Unfortunately, at the moment mysqld crashed that query still had not finished, so it never made it into the slow log,
and it can probably never be reproduced exactly.

So just how bad does a SQL statement have to be, or how big do the tables have to be, to need this much space for on-disk temporary tables or filesorts?
A quick look at the tables and data involved:

    ① Table 1: ~1,000,000 rows; Table 2: ~400,000 rows; neither has BLOB or TEXT columns (a quick way to check such numbers is sketched below).
    ② After the restart, the entire datadir held only 3-4 GB of data.
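A quick, approximate way to pull those numbers (InnoDB's row estimates in information_schema are not exact; the schema and table names here are placeholders):

    -- Approximate row counts and on-disk sizes of the two tables involved
    SELECT TABLE_NAME,
           TABLE_ROWS,
           ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 1) AS size_mb
    FROM   INFORMATION_SCHEMA.TABLES
    WHERE  TABLE_SCHEMA = 'your_db'
      AND  TABLE_NAME IN ('tb1', 'tb2');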

So here is my bold guess:
during their debugging, the customer probably produced a statement roughly like the one below.
Comparing the statements that did make it into the slow log, then masking and reformatting them, it reconstructs to:
    SELECT ( CASE
        WHEN t2.col='$info_000' THEN '$a'
        WHEN t2.col='$info_001' THEN '$b'
        WHEN t2.col='$info_002' THEN '$c'
        WHEN t2.col='$info_003' THEN '$d'
        WHEN t2.col='$info_004' THEN '$e'
        WHEN t2.col='$info_005' THEN '$f'
        WHEN t2.col='$info_006' THEN '$g'
        ELSE t2.col
        END ) AS 來源XX,
        ( CASE
        WHEN t2.clo2='$info_000' THEN '$a'
        WHEN t2.clo2='$info_001' THEN '$b'
        WHEN t2.clo2='$info_002' THEN '$c'
        WHEN t2.clo2='$info_003' THEN '$d'
        WHEN t2.clo2='$info_004' THEN '$e'
        WHEN t2.clo2='$info_005' THEN '$f'
        WHEN t2.clo2='$info_006' THEN '$g'
        ELSE t2.col
        END ) AS 目標XX,
        t2.col4 AS XX時間,
        t2.col5 AS 金額
    FROM $tb1 t1 JOIN $tb2 t2
    WHERE t1.col3 = 4 AND (t2.col LIKE '$info%' OR t2.clo2 LIKE '$info%')
    ORDER BY t1.col4 DESC;

Yes: the two tables are JOINed with no join condition at all,
producing a very, very large Cartesian product.

A quick bit of arithmetic:
Columns: 13 + 19 = 32
Rows: 1,088,181 × 440,650 = 479,506,957,650 rows

Time to reproduce it.
I dumped the two tables to a SQL file and checked its size: about 330 MB, not big at all.
Then I loaded it into my own test environment and ran the Cartesian-product SQL above.
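Something along these lines is enough for that kind of reproduction (host, credentials, and schema names are placeholders):

    # Dump only the two tables from the slave, then load them into a scratch schema
    mysqldump --single-transaction -h slave_host -u user -p src_db tb1 tb2 > two_tables.sql
    mysql -h test_host -u user -p test < two_tables.sql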

After a fair amount of waiting:


It had been running for more than half an hour at this point, and the SQL was still going.
You may not believe this, but here is the size of ibtmp1 at that moment:
    # du -sh *
    4.0K auto.cnf
    4.0K ib_buffer_pool
    2.9G ibdata1
    48M ib_logfile0
    48M ib_logfile1
    54G ibtmp1
    12M mysql
    1.1M performance_schema
    676K sys
    633M test
    47M test_2

That largely explains how two tables that are not particularly large can, with the right (wrong) query, generate a temporary tablespace this big.
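The fix on the SQL side is simply to give the JOIN a real join condition; a hypothetical corrected shape (the join column is made up, since the real schema was never captured) would be:

    SELECT t2.col4 AS XX時間,               -- plus the same CASE WHEN columns as above
           t2.col5 AS 金額
    FROM   $tb1 t1
    JOIN   $tb2 t2 ON t2.t1_id = t1.id      -- the join condition that was missing
    WHERE  t1.col3 = 4
      AND  (t2.col LIKE '$info%' OR t2.clo2 LIKE '$info%')
    ORDER  BY t1.col4 DESC;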

While waiting on the experiment, I also repaired the replication error mentioned earlier.
The replication error at the time:

    ……
    Slave_IO_Running: Yes
    Slave_SQL_Running: No
    ……
    Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
    Last_IO_Errno: 0
    Last_IO_Error:
    Last_SQL_Errno: 1594
    Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Combined with the error log shown earlier, the conclusion is that the relay log had been corrupted.


That is straightforward to repair:

    STOP SLAVE;

    -- master_log_file / master_log_pos are the coordinates recorded when the error occurred.
    CHANGE MASTER TO MASTER_LOG_FILE='xxx', MASTER_LOG_POS=xxx;

    START SLAVE;

I will not belabor it beyond the sketch below.
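For completeness: the coordinates come from SHOW SLAVE STATUS on the broken slave, reading Relay_Master_Log_File and Exec_Master_Log_Pos (the last event the SQL thread actually applied). Since no RELAY_LOG_FILE/RELAY_LOG_POS is given, CHANGE MASTER TO also discards the existing (corrupted) relay logs, and the I/O thread refetches them from the master.

    -- On the slave: note the last applied coordinates, then repoint replication at them
    SHOW SLAVE STATUS\G
    -- Relay_Master_Log_File: mysql-bin.000014   (values from this incident's log)
    -- Exec_Master_Log_Pos:   286485153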


By this point the datadir in my test environment had filled up as well, with one difference from the production incident:
when the disk filled, mysqld did not crash immediately; it only logged:
    [ERROR] InnoDB: Write to file ./ibtmp1 failed at offset 83579895808, 1048576 bytes should have been written, only 958464 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
    [ERROR] InnoDB: Error number 28 means 'No space left on device'
    [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
    [Warning] InnoDB: Error while writing 67108864 zeroes to ./ibtmp1 starting at offset 83563118592
    [ERROR] /data/mysql-base/mysql57/bin/mysqld: The table '#sql_b26_0' is full
    [ERROR] Disk is full writing '/data/mysql-data/mysql57-3357/binlog/mysql-bin.000015' (Errcode: 15868064 - No space left on device). Waiting for someone to free space...
    [ERROR] Retry in 60 secs. Message reprinted in 600 secs
This is the normal behavior: when the disk is full, MySQL re-checks for free space every minute and reprints the message to the error log every ten minutes.

Since this mysqld had not died yet, what would happen if I ran that dreadful SQL one more time?...

Sure enough... the error log printed the expected kill message...
and the mysqld process was killed. At that point:


    [ERROR] /data/mysql-base/mysql57/bin/mysqld: Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.
    14:42:48 UTC - mysqld got signal 6 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    Attempting to collect some information that could help diagnose the problem.
    As this is a crash and something is definitely wrong, the informati
Amusingly, the word "information" was not even written out in full; presumably there really was no room left for a single extra character.


Trying to use vim to append a few characters to the error log and save it also failed with an error.


To sum up, for reference:
SQL that runs against production should be reviewed; reporting and other OLAP-style work deserves a proper process too.
Consider limiting the size of ibtmp1, i.e. set a max value in innodb_temp_data_file_path (example below).
Note that you may then need to restart mysqld periodically to shrink the temporary tablespace.
Keep monitoring and response times tight. Here the response simply took too long: by the time the alert threshold was hit and I got onto the machine, it was already too late, and I watched mysqld crash right in front of me.
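As an illustration, capping ibtmp1 at 10G would look like this in my.cnf (the cap is an arbitrary example; a query that needs more temporary space then fails with a "The table ... is full" error instead of filling the disk, and the setting is not dynamic, so it only takes effect after a restart):

    # my.cnf: limit the shared temporary tablespace (example cap of 10G)
    [mysqld]
    innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:10G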







From the ITPUB blog. Link: http://blog.itpub.net/29773961/viewspace-2142197/. Please credit the source when reprinting; otherwise legal liability may be pursued.
