作者:付祥
現居珠海,主要負責 Oracle、MySQL、mongoDB 和 Redis 維護工作。
本文來源:原創投稿
*愛可生開源社群出品,原創內容未經授權不得隨意使用,轉載請聯絡小編並註明來源。
1、問題現象
開發反饋某業務持續性報鎖等待超時,相關錯誤資訊如下:
Lock wait timeout exceeded; try restarting transaction
為了能精確定位問題,繼續詢問開發有沒有鎖等待超時相關SQL,開發又給了相關報錯SQL:
INSERT INTO <TABLE_NAME> VALUES(...)
2、分析診斷
根據錯誤資訊得知,單條insert語句鎖等待超時,如果都是單條insert插入,不應該頻繁報鎖超時,似乎有點不尋常,當前資料庫版本為5.6,鎖等待超時引數設定時長30秒:
root@ (none)> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 30 |
+--------------------------+-------+
檢視慢日誌及 show engine innodb status\G ,發現有批次插入動作,由於自增鎖競爭產生死鎖:
詢問開發,批次插入SQL為定時作業,檢視當前innodb_autoinc_lock_mode引數設定:
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_autoinc_lock_mode | 1 |
+--------------------------+-------+
innodb_autoinc_lock_mode=1,對於批次插入語句,需要等到語句執行結束才釋放自增鎖,故要解決鎖等待超時,可以將引數值設定為2,但該引數為靜態引數需要重啟MySQL才能生效,不能重啟情況下只能最佳化SQL執行時間,檢視慢日誌得知SQL執行一次需要100+秒,掃描行數86w,結果集卻為0,說明SQL有最佳化空間:
Query_time: 108.527499 Lock_time: 0.000342 Rows_sent: 0 Rows_examined: 862584
分析SQL執行計劃:
SELECT *
from ( SELECT * from aa WHERE add_time >= '2022-10-01' ) a
left JOIN ( SELECT * from bb WHERE add_time >= '2022-10-01' ) b
on a.account = b.accountb and a.end_time = b.end_timeb and a.app_id = b.app_idb WHERE
b.accountb is null;
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL |
NULL | 2722 | NULL |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL |
NULL | 595248 | Using where; Using join buffer (Block Nested Loop) |
| 3 | DERIVED | bb | ALL | NULL | NULL | NULL |
NULL | 595248 | Using where |
| 2 | DERIVED | aa | range | idx_add_time | idx_add_time | 6 |
NULL | 2722 | Using index condition |
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
4 rows in set (0.00 sec)
SQL有子查詢,使用到了派生表,首先執行子查詢
( SELECT * from aa WHERE add_time >= '2022-10-01' )
將結果集存入臨時表derived2 ,然後執行子查詢
( SELECT **** from bb WHERE add_time >= '2022-10-01' )
將結果集存入臨時表derived3,最後derived2和derived3根據關聯條件做表關聯,使用Block Nested Loop演算法,即使表chat_black(account
, app_id
, end_time
)列有複合索引也使用不到。
如果mysql版本是5.7的話,optimizer_switch 引數會增加一個選項:derived_merge=on ,滿足一定條件,即子查詢中沒有如下條件:
- Aggregate functions (
SUM()
,MIN()
,MAX()
,COUNT()
, and so forth) DISTINCT
GROUP BY
HAVING
LIMIT
UNION
orUNION ALL
- Subqueries in the select list
- Assignments to user variables
- Refererences only to literal values (in this case, there is no underlying table)
子查詢將被合併到外層查詢。
3、問題解決
知道SQL慢的原因後,對SQL進行改寫,執行計劃如下:
SELECT * FROM (
select * from aa where add_time >= '2022-10-01') a
left join bb b
on ( b.add_time >= '2022-10-01' and a.account = b.account and a.end_time =
b.end_time and a.app_id = b.app_id)
where b.account is null;
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+-----------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL
| NULL | NULL | 3096 | NULL |
| 1 | PRIMARY | b | ref | idx_ac_app_id_end_time |
idx_ac_app_id_end_time | 1542 | a.account,a.app_id,a.end_time | 1 | Using where
|
| 2 | DERIVED | aa | range | idx_add_time | idx_add_time
| 6 | NULL | 3096 | Using index condition |
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+---------------------
執行時間從原來的100+秒降低不到1秒:
root@ xsj_chat_filter> SELECT count(*) FROM (
-> select * from aa where add_time >= '2022-10-01') a
-> left join bb b
-> on ( b.add_time >= '2022-10-01' and a.account = b.account and a.end_time
= b.end_time and a.app_id = b.app_id)
-> where b.account is null;
+----------+
| count(*) |
+----------+
| 23 |
+----------+
1 row in set (0.65 sec)
執行時間短了,自然就不存在自增鎖等待超時了。