故障分析 | MySQL鎖等待超時一例分析

愛可生雲資料庫發表於2022-11-23

作者:付祥

現居珠海,主要負責 Oracle、MySQL、mongoDB 和 Redis 維護工作。

本文來源:原創投稿

*愛可生開源社群出品,原創內容未經授權不得隨意使用,轉載請聯絡小編並註明來源。


1、問題現象

開發反饋某業務持續性報鎖等待超時,相關錯誤資訊如下:

Lock wait timeout exceeded; try restarting transaction

為了能精確定位問題,繼續詢問開發有沒有鎖等待超時相關SQL,開發又給了相關報錯SQL:

INSERT INTO <TABLE_NAME> VALUES(...)

2、分析診斷

根據錯誤資訊得知,單條insert語句鎖等待超時,如果都是單條insert插入,不應該頻繁報鎖超時,似乎有點不尋常,當前資料庫版本為5.6,鎖等待超時引數設定時長30秒:

root@ (none)> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 30 |
+--------------------------+-------+

檢視慢日誌及 show engine innodb status\G ,發現有批次插入動作,由於自增鎖競爭產生死鎖:

詢問開發,批次插入SQL為定時作業,檢視當前innodb_autoinc_lock_mode引數設定:

+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_autoinc_lock_mode | 1 |
+--------------------------+-------+

innodb_autoinc_lock_mode=1,對於批次插入語句,需要等到語句執行結束才釋放自增鎖,故要解決鎖等待超時,可以將引數值設定為2,但該引數為靜態引數需要重啟MySQL才能生效,不能重啟情況下只能最佳化SQL執行時間,檢視慢日誌得知SQL執行一次需要100+秒,掃描行數86w,結果集卻為0,說明SQL有最佳化空間:

Query_time: 108.527499 Lock_time: 0.000342 Rows_sent: 0 Rows_examined: 862584

分析SQL執行計劃:

SELECT *
from ( SELECT * from aa WHERE add_time >= '2022-10-01' ) a
left JOIN ( SELECT * from bb WHERE add_time >= '2022-10-01' ) b
on a.account = b.accountb and a.end_time = b.end_timeb and a.app_id = b.app_idb WHERE
b.accountb is null;
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL |
NULL | 2722 | NULL |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL |
NULL | 595248 | Using where; Using join buffer (Block Nested Loop) |
| 3 | DERIVED | bb | ALL | NULL | NULL | NULL |
NULL | 595248 | Using where |
| 2 | DERIVED | aa | range | idx_add_time | idx_add_time | 6 |
NULL | 2722 | Using index condition |
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
4 rows in set (0.00 sec)

SQL有子查詢,使用到了派生表,首先執行子查詢

( SELECT * from aa WHERE add_time >= '2022-10-01' )

將結果集存入臨時表derived2 ,然後執行子查詢

( SELECT **** from bb WHERE add_time >= '2022-10-01' )

將結果集存入臨時表derived3,最後derived2和derived3根據關聯條件做表關聯,使用Block Nested Loop演算法,即使表chat_black(account , app_id , end_time)列有複合索引也使用不到。

如果mysql版本是5.7的話,optimizer_switch 引數會增加一個選項:derived_merge=on ,滿足一定條件,即子查詢中沒有如下條件:

  • Aggregate functions ( SUM()MIN()MAX() COUNT() , and so forth)
  • DISTINCT
  • GROUP BY
  • HAVING
  • LIMIT
  • UNION or UNION ALL
  • Subqueries in the select list
  • Assignments to user variables
  • Refererences only to literal values (in this case, there is no underlying table)

子查詢將被合併到外層查詢。

3、問題解決

知道SQL慢的原因後,對SQL進行改寫,執行計劃如下:

SELECT * FROM (
select * from aa where add_time >= '2022-10-01') a
left join bb b
on ( b.add_time >= '2022-10-01' and a.account = b.account and a.end_time =
b.end_time and a.app_id = b.app_id)
where b.account is null;
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+-----------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL
| NULL | NULL | 3096 | NULL |
| 1 | PRIMARY | b | ref | idx_ac_app_id_end_time |
idx_ac_app_id_end_time | 1542 | a.account,a.app_id,a.end_time | 1 | Using where
|
| 2 | DERIVED | aa | range | idx_add_time | idx_add_time
| 6 | NULL | 3096 | Using index condition |
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+---------------------

執行時間從原來的100+秒降低不到1秒:

root@ xsj_chat_filter> SELECT count(*) FROM (
-> select * from aa where add_time >= '2022-10-01') a
-> left join bb b
-> on ( b.add_time >= '2022-10-01' and a.account = b.account and a.end_time
= b.end_time and a.app_id = b.app_id)
-> where b.account is null;
+----------+
| count(*) |
+----------+
| 23       |
+----------+
1 row in set (0.65 sec)

執行時間短了,自然就不存在自增鎖等待超時了。

相關文章