故障分析 | MySQL鎖等待超時一例分析

作者：付祥
現居珠海，主要負責 Oracle、MySQL、mongoDB 和 Redis 維護工作。
本文來源：原創投稿
*愛可生開源社群出品，原創內容未經授權不得隨意使用，轉載請聯絡小編並註明來源。

1、問題現象

開發反饋某業務持續性報鎖等待超時，相關錯誤資訊如下：

Lock wait timeout exceeded; try restarting transaction

為了能精確定位問題，繼續詢問開發有沒有鎖等待超時相關SQL，開發又給了相關報錯SQL：

INSERT INTO <TABLE_NAME> VALUES(...)

2、分析診斷

根據錯誤資訊得知，單條insert語句鎖等待超時，如果都是單條insert插入，不應該頻繁報鎖超時，似乎有點不尋常，當前資料庫版本為5.6，鎖等待超時引數設定時長30秒：

root@ (none)> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 30 |
+--------------------------+-------+

檢視慢日誌及 show engine innodb status\G ，發現有批次插入動作，由於自增鎖競爭產生死鎖：

詢問開發，批次插入SQL為定時作業，檢視當前innodb_autoinc_lock_mode引數設定：

+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_autoinc_lock_mode | 1 |
+--------------------------+-------+

innodb_autoinc_lock_mode=1，對於批次插入語句，需要等到語句執行結束才釋放自增鎖，故要解決鎖等待超時，可以將引數值設定為2，但該引數為靜態引數需要重啟MySQL才能生效，不能重啟情況下只能最佳化SQL執行時間，檢視慢日誌得知SQL執行一次需要100+秒，掃描行數86w，結果集卻為0，說明SQL有最佳化空間：

Query_time: 108.527499 Lock_time: 0.000342 Rows_sent: 0 Rows_examined: 862584

分析SQL執行計劃：

SELECT *
from ( SELECT * from aa WHERE add_time >= '2022-10-01' ) a
left JOIN ( SELECT * from bb WHERE add_time >= '2022-10-01' ) b
on a.account = b.accountb and a.end_time = b.end_timeb and a.app_id = b.app_idb WHERE
b.accountb is null;
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL |
NULL | 2722 | NULL |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL |
NULL | 595248 | Using where; Using join buffer (Block Nested Loop) |
| 3 | DERIVED | bb | ALL | NULL | NULL | NULL |
NULL | 595248 | Using where |
| 2 | DERIVED | aa | range | idx_add_time | idx_add_time | 6 |
NULL | 2722 | Using index condition |
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
4 rows in set (0.00 sec)

SQL有子查詢，使用到了派生表，首先執行子查詢

( SELECT * from aa WHERE add_time >= '2022-10-01' )

將結果集存入臨時表derived2 ，然後執行子查詢

( SELECT **** from bb WHERE add_time >= '2022-10-01' )

將結果集存入臨時表derived3，最後derived2和derived3根據關聯條件做表關聯，使用Block Nested Loop演算法，即使表chat_black（account , app_id , end_time）列有複合索引也使用不到。

如果mysql版本是5.7的話，optimizer_switch 引數會增加一個選項：derived_merge=on ，滿足一定條件，即子查詢中沒有如下條件：

Aggregate functions ( SUM() , MIN() , MAX() , COUNT() , and so forth)
DISTINCT
GROUP BY
HAVING
LIMIT
UNION or UNION ALL
Subqueries in the select list
Assignments to user variables
Refererences only to literal values (in this case, there is no underlying table)

子查詢將被合併到外層查詢。

3、問題解決

知道SQL慢的原因後，對SQL進行改寫，執行計劃如下：

SELECT * FROM (
select * from aa where add_time >= '2022-10-01') a
left join bb b
on ( b.add_time >= '2022-10-01' and a.account = b.account and a.end_time =
b.end_time and a.app_id = b.app_id)
where b.account is null;
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+-----------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL
| NULL | NULL | 3096 | NULL |
| 1 | PRIMARY | b | ref | idx_ac_app_id_end_time |
idx_ac_app_id_end_time | 1542 | a.account,a.app_id,a.end_time | 1 | Using where
|
| 2 | DERIVED | aa | range | idx_add_time | idx_add_time
| 6 | NULL | 3096 | Using index condition |
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+---------------------

執行時間從原來的100+秒降低不到1秒：

root@ xsj_chat_filter> SELECT count(*) FROM (
-> select * from aa where add_time >= '2022-10-01') a
-> left join bb b
-> on ( b.add_time >= '2022-10-01' and a.account = b.account and a.end_time
= b.end_time and a.app_id = b.app_id)
-> where b.account is null;
+----------+
| count(*) |
+----------+
| 23       |
+----------+
1 row in set (0.65 sec)

執行時間短了，自然就不存在自增鎖等待超時了。

故障分析 | MySQL鎖等待超時一例分析

1、問題現象

2、分析診斷

3、問題解決

相關文章