水稻:最近有個朋友生產環境出現MySQL死鎖問題,一聽是死鎖,那必須去看看啊,於是饒(si)有(qu)興(huo)致(lai)的研究了好幾天
菜瓜:MySQL死鎖,趕緊分享一下
水稻:能否先讓我裝完X,我從朋友那裡拿到資料結構,復現,分析,查資料,總。。。
菜瓜:今天的菜真香
水稻:。。。好吧,進入正題(資料已加工處理)
- 一開始朋友拿出了死鎖日誌中記錄的兩條SQL語句(暫且把第一條SQL叫SQL1,第二條SQL2)
-
-- 兩句SQL結構一致,只是引數不一樣。
explain update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35610745 and `from` = '1' and `to` = 'c' ; explain update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35611183 and `from` = '1' and `to` = '2' ;
-
- 然後是表結構和資料 (這裡只錄入模擬資料,資料量並不是這次分析的關鍵原因)
-
CREATE TABLE `deadlock` ( `id` int(11) NOT NULL AUTO_INCREMENT, `from` varchar(64) CHARACTER SET utf8mb4 DEFAULT '', `to` varchar(64) CHARACTER SET utf8mb4 DEFAULT '', `status` int(11) DEFAULT NULL, `expired` datetime DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_from` (`from`), KEY `idx_to` (`to`) ) ENGINE=InnoDB AUTO_INCREMENT=35611184 DEFAULT CHARSET=utf8; INSERT INTO `deadlock` (`id`, `from`, `to`, `status`, `expired`) VALUES (5681309, '8', 'b', NULL, NULL), (5681310, 'b', '4', NULL, NULL), (5681311, '8', 'b', NULL, NULL), (5681312, '4', 'f', NULL, NULL), (5681313, '4', 'f', NULL, NULL), (5681314, '4', 'f', NULL, NULL), (5681315, 'f', '4', NULL, NULL), (35610742, '1', 'c', 1, '2020-07-07 02:03:58'), (35610744, '1', 'c', 1, '2020-07-07 02:03:58'), (35610745, '1', 'c', 1, '2020-07-07 02:03:58'), (35611180, '1', '2', 1, '2020-07-07 02:03:58'), (35611182, '1', '2', 1, '2020-07-07 02:03:58'), (35611183, '1', '2', 1, '2020-07-07 02:03:58');
-
-
兩條SQL命中的記錄各三條。一看是死鎖,第一反應是發生記錄資源互斥等待。猜想會不會是這6行記錄在執行update的時候SQL1和SQL2修改的記錄發生了互斥
菜瓜:所以你最開始想的是更新時兩條SQL獲取記錄的順序反了,譬如說SQL1先拿35610742,再拿35610744前,SQL2先把35610744拿到了且它要拿35610742在阻塞
水稻:是的,但是很快發現這個思路很離譜,我們說鎖的時候應該是和索引聯絡起來,分析記錄就有點偏了。而且按記錄來看,兩個SQL命中的記錄都不一樣。於是回到索引上,就能說通他們都命中了from='1'這個條件,在idx_from上發生了互斥。
菜瓜:soga,真實情況呢
水稻:事情如果這麼簡單我就不會研究它好幾天了。分析完了之後開始準備復現死鎖。思路很簡單,因為SQL只有一句,要模擬出來就用多執行緒併發執行
- Java Code
-
@Resource private DeadlockMapper deadlockMapper; @Resource ThreadPoolTaskExecutor threadPoolTaskExecutor; @Test public void deadlock() { for (; ; ) { threadPoolTaskExecutor.execute(() -> { try { Long id = 35611183L; String from = "1"; String to = "2"; deadlockMapper.updateDeadlock(id, from, to); } catch (Exception e) { System.out.println("transaction tututu"); e.printStackTrace(); System.exit(0); } }); threadPoolTaskExecutor.execute(() -> { try { Long id = 35610745L; String from = "1"; String to = "c"; deadlockMapper.updateDeadlock(id, from, to); } catch (Exception e) { System.out.println("transaction kakaka"); e.printStackTrace(); System.exit(0); } }); System.out.println("once loop finish ..."); } }
-
- MyBatis Code
-
public interface DeadlockMapper extends Mapper<Deadlock> { int updateDeadlock(@Param("id") Long id, @Param("from") String from, @Param("to") String to); } <update id="updateDeadlock"> update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <![CDATA[<=]]> #{id} and `from` = #{from} and `to` = #{to} </update>
-
- 執行結果
-
once loop finish ... once loop finish ... transaction tututu transaction tututu org.springframework.dao.DeadlockLoserDataAccessException: ### Error updating database. Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction ### The error may involve defaultParameterMap ### The error occurred while setting parameters ### SQL: update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= ? and `from` = ? and `to` = ? ### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction 。。。
-
- 死鎖日誌 - show engine innodb status;
-
LATEST DETECTED DEADLOCK ------------------------ 2020-07-07 08:00:54 0x7f2c38b6a700 *** (1) TRANSACTION: TRANSACTION 61382, ACTIVE 0 sec starting index read mysql tables in use 3, locked 3 LOCK WAIT 4 lock struct(s), heap size 1136, 3 row lock(s) MySQL thread id 147, OS thread handle 139827907593984, query id 22005 101.68.68.234 root updating update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35611183 and `from` = '1' and `to` = '2' *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 66 page no 4 n bits 80 index idx_from of table `qc`.`deadlock` trx id 61382 lock_mode X waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 4; hex 821f6076; asc `v;; *** (2) TRANSACTION: TRANSACTION 61381, ACTIVE 0 sec fetching rows mysql tables in use 3, locked 3 5 lock struct(s), heap size 1136, 11 row lock(s) MySQL thread id 150, OS thread handle 139827906782976, query id 22004 101.68.68.234 root updating update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35610745 and `from` = '1' and `to` = 'c' *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 66 page no 4 n bits 80 index idx_from of table `qc`.`deadlock` trx id 61381 lock_mode X Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 4; hex 821f6076; asc `v;; Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 4; hex 821f6078; asc `x;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 4; hex 821f6079; asc `y;; Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 4; hex 821f622c; asc b,;; *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 66 page no 3 n bits 80 index PRIMARY of table `qc`.`deadlock` trx id 61381 lock_mode X locks rec but not gap waiting Record lock, heap no 12 PHYSICAL RECORD: n_fields 7; compact format; info bits 0 0: len 4; hex 821f622c; asc b,;; 1: len 6; hex 00000000e962; asc b;; 2: len 7; hex 27000001d901fd; asc ' ;; 3: len 1; hex 31; asc 1;; 4: len 1; hex 32; asc 2;; 5: len 4; hex 80000001; asc ;; 6: len 5; hex 99a6ce7d03; asc } ;; *** WE ROLL BACK TRANSACTION (1)
-
菜瓜:這不是很順利嗎
水稻:我會告訴你在我本地環境中一直沒有復現出來嗎?我會告訴你我找了一天才找到原因是因為MySQL配置問題(需要開啟bin log,開啟成功後執行 select version() 會看到 5.7.30-log)?我會告訴你這過程中我放棄了好幾次又決定再看看嗎?當然不會
菜瓜:好厲害哦!那你很棒棒哦!
水稻:。。。來,進入分析環節
- 兩個SQL語句的explain執行計劃
-
explain update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35610745 and `from` = '1' and `to` = 'c' ; -- Using intersect(idx_from,idx_to); Using where explain update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35611183 and `from` = '1' and `to` = '2' ; -- Using intersect(idx_to,idx_from); Using where
- 可以看到執行這兩個SQL的時候命中的索引竟然是NB的Merge Index(鬼知道這是啥),而且它兩順序還不一致(有興趣的朋友可以上搜尋一下這個玩意)
- Merge Index 索引合併技術是MySQL的一個優化,會將兩個獨立的索引掃描的結果進行取交集或者並集來加快資料的檢索。
- 這裡的關鍵點是它會導致MySQL在拿鎖的順序上不一致
-
菜瓜:你這個鋪墊太多了,我已經記不清楚前面那麼多東西了,請直接上結論吧
水稻:好的,前面的code是方便您自己下去復現的,對於動手黨有用。您可以直接忽略
- 這裡原因的解析只需要注意死鎖日誌和最後的explain執行計劃結果
- 事務Transaction1
-
事務Transaction2
-
結論就是:id為35610742記錄上的索引順序不一致出現了死鎖
-
SQL1(事務2)持有idx_from,在等待主鍵primary key。執行計劃中它命中索引 intersect(idx_from,idx_to) ,非聚簇索引除了拿到自己的索引之外還要獲取主鍵索引,所以它拿鎖的順序是先獲取idx_from,再獲取primary key,最後獲取idx_to,再primary key
- SQL2(事務1)持有primary key,在等待idx_from。同理它命中索引intersect(idx_to,idx_from),它拿鎖的順序是先獲取idx_to,再獲取primary key,最後獲取idx_from,再primary key
(日誌裡面看不出來SQL2持有主鍵索引,這裡是根據出現死鎖的互斥條件得出它持有主鍵索引)
-
菜瓜:這是何等的wocao!!!
水稻:解決這個問題只要把兩個單個索引改成聯合索引即可
總結:
- 單行update語句引發的死鎖問題
- 單行記錄拿鎖順序不一致出現死鎖
- 索引合併Merge Index 導致獲取鎖順序不一致
- 場景復現需要把log_bin日誌開啟,注意檢視select version()
- 多執行緒模擬併發場景
- 死鎖日誌分析