Innodb:RR隔離級別下insert...select 對select表加鎖模型和死鎖案列

gaopengtttt發表於2017-10-20

最近有網友遇到了在RR隔離級別下insert A select B where B.COL=** 由於select表也就是B表引發的死鎖的問題。分析死鎖日誌後,筆者進行模擬重現了這位網友遇到了2種場景並且在本文中進行詳細的描述。

  • 本文使用版本percona 5.7.14修改版,能夠列印出事務所有的行鎖資訊結構鏈(不包含隱含鎖)
  • 本文中的測試是在RR隔離級別下完成的,RC不存在這樣的問題
  • 筆者對原始碼的理解有限,如有錯誤請指正
  • 本文使用了自制工具innblock和bcview,前者用於掃描塊結構後者用於更加方便的檢視二進位制檔案資訊獲取地址
  1. innblock http://pan.baidu.com/s/1qYnyVWo
  2. bcview http://pan.baidu.com/s/1num76RJ

感謝葉金榮老師對本文的稽核,筆者也曾是一名知數堂的學生

一、基本概念

在開始正文之前我打算介紹一下一些基本概念,特別是鎖模型和相容矩陣會對本文的閱讀有相當大的幫助。

1、 innodb lock模型
  • [LOCK_ORDINARY[next_key_lock]:]
    原始碼定義:
#define LOCK_ORDINARY   0   /*!< this flag denotes an ordinary
next-key lock in contrast to LOCK_GAP
or LOCK_REC_NOT_GAP */ 

預設是LOCK_ORDINARY即普通的next_key_lock,鎖住行及以前的間隙。

  • [LOCK_GAP:]
    原始碼定義:
#define LOCK_GAP    512 /*!< when this bit is set, it means that the
lock holds only on the gap before the record;
for instance, an x-lock on the gap does not
give permission to modify the record on which
the bit is set; locks of this type are created
when records are removed from the index chain 

間隙鎖,鎖住行以前的間隙,不鎖住本行。

  • [LOCK_REC_NOT_GAP:]
    原始碼定義:
#define LOCK_REC_NOT_GAP 1024   /*!< this bit means that the lock is only on
the index record and does NOT block inserts
to the gap before the index record; this is
used in the case when we retrieve a record
with a unique key, and is also used in
locking plain SELECTs (not part of UPDATE
or DELETE) when the user has set the READ
COMMITTED isolation level */ 

行鎖,鎖住行而不鎖住任何間隙。

  • [LOCK_INSERT_INTENTION:]
    原始碼定義:
#define LOCK_INSERT_INTENTION 2048 /*!< this bit is set when we place a waiting
gap type record lock request in order to let
an insert of an index record to wait until
there are no conflicting locks by other
transactions on the gap; note that this flag
remains set when the waiting lock is granted,
or if the lock is inherited record */ 

插入意向鎖,如果插入的記錄在某個已經鎖定的間隙內為這個鎖。

2、 innodb lock相容矩陣
/* LOCK COMPATIBILITY MATRIX
*    IS IX S  X  AI
* IS +  +  +  -  +
* IX +  +  -  -  +
* S  +  -  +  -  -
* X  -  -  -  -  -
* AI +  +  -  -  - 
3、infimum和supremum

一個page中包含這兩個偽記錄。頁中所有的行未刪除(或刪除未purge)的行邏輯上都連線到這兩個虛列之間,表現為一個邏輯連結串列資料結構,其中supremum偽記錄的鎖始終為next_key_lock。

4、heap no

heap no儲存在fixed_extrasize 中。heap no 為物理儲存填充的序號,頁的空閒空間掛載在page free連結串列中(頭插法)可以重用,但是重用此heap no不變,如果一直是insert 則heap no 不斷增加,並不是按照ROWID(主鍵)排序的邏輯連結串列順序,而是物理填充順序。

5、n bits

和這個page相關的鎖點陣圖的大小,每一行記錄都有1 bit的點陣圖資訊與其對應,用來表示是否加鎖,並且始終預留64bit。例如我的表有9條資料,同時包含infimum和supremum虛擬記錄即 64+9+2 bits,即75bits但是必須被8整除向上取整為一個位元組,結果也就是就是80 bits。注意不管是否加鎖每行都會對應一bit的點陣圖。

6、lock struct

這是LOCK的記憶體結構體原始碼中用lock_t表示其可以包含

lock_table_t    tab_lock;/*!< table lock */
lock_rec_t  rec_lock;/*!< record lock */ 

一般來說innodb上鎖都會對錶級加上IX,這佔用一個結構體。然後分別對二級索引和主鍵進行加鎖,每一個BLOCK會佔用這樣一個結構體。

7、row lock

這個資訊描述了當前事務加鎖的行數,他是所有lock struct結構體中排除table lock以外所有加鎖記錄的總和,並且包含了infimum和supremum偽列。

8、逐步加鎖

如果細心的朋友應該會發現在show engine 中事務資訊中的row lock在對大量行進行加鎖的時候會不斷的增加,因為加行鎖最終會呼叫lock_rec_lock逐行加鎖,這也會增加了大資料量加鎖的觸發死鎖的可能性。

二、Innodb層對insert...select 中select表的加鎖模式

RR隔離級別下insert A select B where B.COL=**,innodb層會對B表滿足條件的資料進行加鎖,但是RC模式下B表記錄不會加任何innodb層的鎖,表現如下:

  1. 如果B.COL有二級(非唯一),並且執行計劃使用到了(非using index)
  • B表二級索引對選中記錄加上LOCK_S|LOCK_ORDINARY[next_key_lock],並且對下一條記錄加上LOCK_S|LOCK_GAP
  • B表PRIMARY加上LOCK_S|LOCK_REC_NOT_GAP
  1. 如果B.COL有二級(唯一),並且執行計劃使用到了(非using index)
  • B表二級索引對選中記錄加上LOCK_S|LOCK_REC_NOT_GAP
  • B表PRIMARY加上LOCK_S|LOCK_REC_NOT_GAP
  1. 如果B.COL沒有二級索引
  • 對整個B表上的所有記錄加上LOCK_S|LOCK_ORDINARY[next_key_lock]

三、Innodb層對insert...select中select表的加鎖測試

下面我們分別對其進行測試和列印輸出:

1. 如果B.COL有二級(唯一),並且執行計劃使用到了(非using index)

使用語句:

drop table  t1;
drop table  t2;
create table t1(id int primary key,n1 varchar(20),n2 varchar(20),key(n1));
create table t2 like t1;
insert into t1 values(1,'gao1','gao'),(2,'gao1','gao'),(3,'gao1','gao'),(4,'gao2','gao'),(5,'gao2','gao'),(6,'gao2','gao'),(7,'gao3','gao'),(8,'gao4','gao'); 

檢視執行計劃:

mysql> desc insert into t2 select * from t1 force index(n1) where n1='gao2';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | INSERT      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL  | NULL |     NULL | NULL  |
|  1 | SIMPLE      | t1    | NULL       | ref  | n1            | n1   | 23      | const |    3 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+ 

執行語句:

begin;insert into t2 select * from t1 force index(n1) where n1='gao2'; 
觀察結果:
  • 二級索引對記錄加上LOCK_S|LOCK_ORDINARY[next_key_lock]
-----TRX NO:28470 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 86 page no 4 n bits 80 index n1 of table `test`.`t1` trx id 28470 lock mode S(LOCK_S) locks gap and rec(LOCK_ORDINARY[next_key_lock])
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f32; asc gao2;;
1: len 4; hex 80000004; asc     ;;
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f32; asc gao2;;
1: len 4; hex 80000005; asc     ;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f32; asc gao2;;
1: len 4; hex 80000006; asc     ;; 
  • PRIMARY加上LOCK_S|LOCK_REC_NOT_GAP
-----TRX NO:28470 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 86 page no 3 n bits 80 index PRIMARY of table `test`.`t1` trx id 28470 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000004; asc     ;;
1: len 6; hex 000000006f20; asc     o ;;
2: len 7; hex bc000001300134; asc     0 4;;
3: len 4; hex 67616f32; asc gao2;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000005; asc     ;;
1: len 6; hex 000000006f20; asc     o ;;
2: len 7; hex bc000001300140; asc     0 @;;
3: len 4; hex 67616f32; asc gao2;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000006; asc     ;;
1: len 6; hex 000000006f20; asc     o ;;
2: len 7; hex bc00000130014c; asc     0 L;;
3: len 4; hex 67616f32; asc gao2;;
4: len 3; hex 67616f; asc gao;; 
  • 對二級索引下一條記錄加上LOCK_S|LOCK_GAP
-----TRX NO:28470 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 86 page no 4 n bits 80 index n1 of table `test`.`t1` trx id 28470 lock mode S(LOCK_S) locks gap before rec(LOCK_GAP)
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f33; asc gao3;;
1: len 4; hex 80000007; asc     ;; 
  • 如圖紅色部分都是需要鎖定的記錄
1.png
1.png
2. 如果B.COL有二級(唯一),並且執行計劃使用到了(非using index)

使用語句:

drop table  t1;
drop table  t2;
create table t1(id int primary key,n1 varchar(20),n2 varchar(20),unique key(n1));
create table t2 like t1;
insert into t1 values(1,'gao1','gao'),(2,'gao2','gao'),(3,'gao3','gao'),(4,'gao4','gao'),(5,'gao5','gao'),(6,'gao6','gao'),(7,'gao7','gao'),(8,'gao8','gao'); 

檢視執行計劃:

mysql> desc insert into t2 select * from t1 force index(n1) where n1 in ('gao2','gao3','gao4');
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
|  1 | INSERT      | t2    | NULL       | ALL   | NULL          | NULL | NULL    | NULL | NULL |     NULL | NULL                  |
|  1 | SIMPLE      | t1    | NULL       | range | n1            | n1   | 23      | NULL |    3 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+ 

執行語句:

begin;insert into t2 select * from t1 force index(n1) where n1 in ('gao2','gao3','gao4'); 
觀察輸出:
  • B表二級索引對選中記錄加上LOCK_S|LOCK_REC_NOT_GAP
-----TRX NO:30514 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 94 page no 4 n bits 80 index n1 of table `test`.`t1` trx id 30514 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f32; asc gao2;;
1: len 4; hex 80000002; asc     ;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f33; asc gao3;;
1: len 4; hex 80000003; asc     ;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f34; asc gao4;;
1: len 4; hex 80000004; asc     ;; 
  • B表PRIMARY加上LOCK_S|LOCK_REC_NOT_GAP
-----TRX NO:30514 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 94 page no 3 n bits 80 index PRIMARY of table `test`.`t1` trx id 30514 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000002; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a200000115011c; asc        ;;
3: len 4; hex 67616f32; asc gao2;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000003; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a2000001150128; asc       (;;
3: len 4; hex 67616f33; asc gao3;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000004; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a2000001150134; asc       4;;
3: len 4; hex 67616f34; asc gao4;;
4: len 3; hex 67616f; asc gao;; 
  • 如圖紅色部分都是需要鎖定的記錄
2.png
2.png
3.如果B.COL沒有二級索引

使用語句:

drop table  t1;
drop table  t2;
create table t1(id int primary key,n1 varchar(20),n2 varchar(20));
create table t2 like t1;
insert into t1 values(1,'gao1','gao'),(2,'gao2','gao'),(3,'gao3','gao'),(4,'gao4','gao'),(5,'gao5','gao'),(6,'gao6','gao'),(7,'gao7','gao'),(8,'gao8','gao'); 

檢視執行計劃:

mysql> desc insert into t2 select * from t1  where n1 in ('gao2','gao3','gao4');
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | INSERT      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | NULL |     NULL | NULL        |
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    8 |    37.50 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+ 

執行語句:

begin;insert into t2 select * from t1  where n1 in ('gao2','gao3','gao4'); 
觀察輸出:
-----TRX NO:30535 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 94 page no 3 n bits 80 index PRIMARY of table `test`.`t1` trx id 30535 lock mode S(LOCK_S) locks gap and rec(LOCK_ORDINARY[next_key_lock])
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000001; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a2000001150110; asc        ;;
3: len 4; hex 67616f31; asc gao1;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000002; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a200000115011c; asc        ;;
3: len 4; hex 67616f32; asc gao2;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000003; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a2000001150128; asc       (;;
3: len 4; hex 67616f33; asc gao3;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000004; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a2000001150134; asc       4;;
3: len 4; hex 67616f34; asc gao4;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000005; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a2000001150140; asc       @;;
3: len 4; hex 67616f35; asc gao5;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000006; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a200000115014c; asc       L;;
3: len 4; hex 67616f36; asc gao6;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 8 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000007; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a2000001150158; asc       X;;
3: len 4; hex 67616f37; asc gao7;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 9 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000008; asc     ;;
1: len 6; hex 000000007728; asc     w(;;
2: len 7; hex a2000001150164; asc       d;;
3: len 4; hex 67616f38; asc gao8;;
4: len 3; hex 67616f; asc gao;; 
  • 如圖紅色部分都是需要鎖定的記錄
3.png
3.png

四、insert...select由於select查詢表引起的死鎖

有了上面的理論,我們知道在RR隔離級別下insert...select會對select符合條件的資料加上LOCK_S鎖,我曾經總結過出現死鎖的條件:

  1. 至少2個獨立的執行緒(會話)
  2. 單位操作中包含多個相對獨立的加鎖步驟,有一定的時間差
  3. 多個執行緒(會話)之間加鎖物件必須有相互等待的情況發生,並且等待出現環狀。

由於存在對select符合條件的資料加上LOCK_S鎖的情況,RR模式下insert...select出現死鎖的概率無疑更加高,我通過測試模擬出這種情況,嚴格意義上是相同的語句在高併發情況下表現為兩種死鎖情況。

測試指令碼:

create table b(id int primary key,name1 varchar(20),name2 varchar(20));
alter table b add key(name1);
DELIMITER //  
CREATE PROCEDURE test_i()
begin 
 declare num int;
 set num = 1; 
while num <= 3000 do
 insert into b values(num,concat('gao',num),'gaopeng');
 set num=num+1;
end while;
end// 
call test_i()//
create table a like b//
DELIMITER ; 

語句都是一樣的:

TX1 TX2
begin; -
update b set name2='test' where id=2999; -
- insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999);
update b set name2='test' where id=999; -

但是在高併發下相同的語句卻表現出不同的死鎖情況,現在進行分析:

情況1:
  • TX1:執行update將表b主鍵id=2999的記錄加上LOCK_X
  • TX2:執行insert...select語句b表上的記錄(996,997,998,999,2995,2996,2997,2998,2999)會申請加上LOCK_S,
    但是id=2999已經加上LOCK_X,顯然不能獲得只能等待.
  • TX1:執行update需要獲得表b主鍵id=999的LOCK_X顯然這個記錄已經被TX2加鎖LOCK_S,只能等待,觸發死鎖檢測

如下圖紅色記錄為不能獲得鎖的記錄:

情況1.jpg
情況1.jpg
情況2:

這種情況比較極端只能在高併發上出現

  • TX1:執行update將表b主鍵id=2999的記錄加上LOCK_X
  • TX2:執行insert...select語句b表上的記錄(996,997,998,999,2995,2996,2997,2998,2999)會申請加上LOCK_S,因為上鎖是有一個逐步加鎖的過程,假設此時加鎖到2997前那麼TX2並不會等待
  • TX1:執行update需要獲得表b主鍵id=999的LOCK_X顯然這個記錄已經被TX2加鎖LOCK_S,只能等待
  • TX2:繼續加鎖LOCK_S 2997、2998、2999 發現2999已經被TX1加鎖LOCK_X,只能等待,觸發死鎖檢測

如下圖紅色記錄為不能獲得鎖的記錄:

情況2.jpg
情況2.jpg

五、原始碼修改和引數增加

情況2的測試需要在高併發下才會出現,因為insert...select語句是一條語句很難人為控制,也就是很讓他在特定條件下停止。但是為了能夠模擬出這種情況筆者對innodb增加了4個引數如下,為了方便識別我都加上了自己的名字的拼音:

mysql> show variables like '%gaopeng%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| innodb_gaopeng_sl_heap_no | 0     |
| innodb_gaopeng_sl_ind_id  | 0     |
| innodb_gaopeng_sl_page_no | 0     |
| innodb_gaopeng_sl_time    | 0     |
+---------------------------+-------+ 

預設情況都是0,即不啟用。他們的意思如下:

  • innodb_gaopeng_sl_heap_no:記錄所在的heap no
  • innodb_gaopeng_sl_ind_id:記錄所在的index_id
  • innodb_gaopeng_sl_page_no:記錄所在的page_no
  • innodb_gaopeng_sl_time:睡眠多少秒
    有了index_id、page_no、heap no就能唯一限定一條資料了,並且睡眠時間也是可以人為指定的。

並且在原始碼lock_rec_lock 開頭增加如下程式碼:

 //add by gaopeng 
   /*if find index_id heap no page no to sleep srv_gaopeng_sl_time secs*/
       if(srv_gaopeng_sl_ind_id && srv_gaopeng_sl_page_no && srv_gaopeng_sl_heap_no)
       {
           if(heap_no == (ulint)(srv_gaopeng_sl_heap_no) && (block->page.id).page_no() ==(ib_uint32_t)(srv_gaopeng_sl_page_no) 
               && index->id ==(index_id_t)(srv_gaopeng_sl_ind_id) )
               {
                   lock_mutex_exit();
                   sleep(srv_gaopeng_sl_time);
                   lock_mutex_enter();
               }
       }
   //add end 

這樣一旦判定為符合條件的記錄,本條記錄的加鎖錢便會休眠指定的秒如果我們擬定在LOCK_S:id=2997之前睡眠30秒,那麼情況2能夠必定發生如下圖:

情況3.jpg
情況3.jpg

六、實際測試

情況1:
TX1 TX2
begin; -
update b set name2='test' where id=2999;對id:2999加LOCK_X鎖 -
- insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999);對id:996,997,998,999,2995,2996,2997,2998加LOCK_S鎖,但是對id:2999加LOCK_S鎖時發現已經加LOCK_X鎖,需等待
update b set name2='test' where id=999;對id:999加LOCK_X鎖,但是發現已經加LOCK_S鎖,需等待,觸發死鎖檢測 -
TX1觸發死鎖,TX1在權重判定下回滾 -

死鎖報錯語句:

mysql> update b set name2='test' where id=999;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction 

死鎖日誌:

*** (1) TRANSACTION:
TRANSACTION 48423, ACTIVE 7 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 5 lock struct(s), heap size 1160, 9 row lock(s), undo log entries 8
MySQL thread id 4, OS thread handle 140737223177984, query id 9110 localhost root Sending data
insert into a  select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999)

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 119 page no 18 n bits 160 index PRIMARY of table `test`.`b` trx id 48423 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP) waiting(LOCK_WAIT)
Record lock, heap no 86 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000bb7; asc     ;;
1: len 6; hex 00000000bd26; asc      &;;
2: len 7; hex 21000001511e7d; asc !   Q };;
3: len 7; hex 67616f32393939; asc gao2999;;
4: len 4; hex 74657374; asc test;;

*** (2) TRANSACTION:
TRANSACTION 48422, ACTIVE 24 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1160, 2 row lock(s), undo log entries 1
MySQL thread id 3, OS thread handle 140737223444224, query id 9111 localhost root updating
update b set name2='test' where id=999
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 119 page no 18 n bits 160 index PRIMARY of table `test`.`b` trx id 48422 lock_mode X(LOCK_X) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 86 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000bb7; asc     ;;
1: len 6; hex 00000000bd26; asc      &;;
2: len 7; hex 21000001511e7d; asc !   Q };;
3: len 7; hex 67616f32393939; asc gao2999;;
4: len 4; hex 74657374; asc test;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 119 page no 10 n bits 456 index PRIMARY of table `test`.`b` trx id 48422 lock_mode X(LOCK_X) locks rec but not gap(LOCK_REC_NOT_GAP) waiting(LOCK_WAIT)
Record lock, heap no 11 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e7; asc     ;;
1: len 6; hex 00000000b534; asc      4;;
2: len 7; hex bd000001310110; asc     1  ;;
3: len 6; hex 67616f393939; asc gao999;;
4: len 7; hex 67616f70656e67; asc gaopeng;;

*** WE ROLL BACK TRANSACTION (2) 

資訊提取如下:

TRX1:48423 
   LOCK HOLD:死鎖資訊不提供
   LOCK WAIT: 
            表:b
            索引:`PRIMARY`  
            鎖模式:LOCK_S|LOCK_REC_NOT_GAP|LOCK_WAIT
            記錄:主鍵為0Xbb7(2999)
            附加資訊:space id 119 page no 18 heap no 86
   CURRENT SQL: 
            insert into a  select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999)
           
TRX2:48422(觸發死鎖、權重回滾)
    LOCK HOLD:
             表:b
             索引:`PRIMARY`  
             鎖模式:LOCK_X|LOCK_REC_NOT_GAP
             記錄:主鍵為0Xbb7(2999)
             附加資訊:pace id 119 page no 18 heap no 86          
    LOCK WAIT:
             表:b
             索引:`PRIMARY`  
             鎖模式:LOCK_X|LOCK_REC_NOT_GAP|LOCK_WAIT
             記錄:主鍵為0X3e7(999)
             附加資訊:space id 119 page no 10 heap no 11
    CURRENT SQL:
             update b set name2='test' where id=999 
情況2:

如上我們擬定在語句

insert into a  select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999) 

對b表記錄加鎖時在2997加鎖前停頓30秒,那麼我就需要找到b表主鍵2997的index_id、page_no、heap_no三個資訊,這裡使用到我的innblock工具

./innblock b.ibd scan 16
===INDEX_ID:121
level1 total block is (1)
block_no:         3,level:   1|*|
level0 total block is (9)
block_no:         5,level:   0|*|block_no:         6,level:   0|*|block_no:         7,level:   0|*|
block_no:        10,level:   0|*|block_no:        11,level:   0|*|block_no:        13,level:   0|*|
block_no:        15,level:   0|*|block_no:        17,level:   0|*|block_no:        18,level:   0|*| 

因為為順序插入那麼2997必定到page 18中然後如下:

./innblock b.ibd 18 16
==== Block base info ====
block_no:18         space_id:121          index_id:121       
....
(84) normal record offset:3287 heapno:83 n_owned 0,delflag:N minflag:0 rectype:0
(85) normal record offset:3326 heapno:84 n_owned 0,delflag:N minflag:0 rectype:0
(86) normal record offset:3365 heapno:85 n_owned 0,delflag:N minflag:0 rectype:0
(87) normal record offset:3404 heapno:86 n_owned 0,delflag:N minflag:0 rectype:0
(88) normal record offset:3443 heapno:87 n_owned 0,delflag:N minflag:0 rectype:0 

因為為順序插入heap_no 84就是id為2997的記錄。我們使用另外一個工具bcview進行驗證

./bcview b.ibd 16 3326 4
current block:00000018--Offset:03326--cnt bytes:04--data is:80000bb5 

當然0Xbb5就是2997
因此設定引數為:

set global innodb_gaopeng_sl_heap_no=84;
set global innodb_gaopeng_sl_ind_id=121;
set global innodb_gaopeng_sl_page_no=18;
set global innodb_gaopeng_sl_time=30;

mysql> show variables like '%gaopeng%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| innodb_gaopeng_sl_heap_no | 84    |
| innodb_gaopeng_sl_ind_id  | 121   |
| innodb_gaopeng_sl_page_no | 18    |
| innodb_gaopeng_sl_time    | 30    |
+---------------------------+-------+ 

那麼情況2執行順序如下:

TX1 TX2
begin; -
update b set name2='test' where id=2999; 對id:2999加LOCK_X鎖 -
- insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999);對id:在加鎖到996,997,998,999,2995,2996加LOCK_S鎖,在對id:2997加鎖前睡眠30秒,為下面的update語句騰出時間)
update b set name2='test' where id=999;對id:999加LOCK_X鎖等待但發現已經加LOCK_S鎖,需等待 -
- 醒來後繼續對2997、2998、2999加LOCK_S鎖,但是發現id:2999已經加LOCK_X鎖,需等待,觸發死鎖檢測
TX1權重回滾 -

死鎖報錯語句:

mysql> update b set name2='test' where id=999;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction 

死鎖日誌:

*** (1) TRANSACTION:
TRANSACTION 51545, ACTIVE 41 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1160, 2 row lock(s), undo log entries 1
MySQL thread id 9, OS thread handle 140737223444224, query id 18310 localhost root updating
update b set name2='test' where id=999

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 121 page no 10 n bits 456 index PRIMARY of table `test`.`b` trx id 51545 lock_mode X(LOCK_X) locks rec but not gap(LOCK_REC_NOT_GAP) waiting(LOCK_WAIT)
Record lock, heap no 11 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e7; asc     ;;
1: len 6; hex 00000000c167; asc      g;;
2: len 7; hex bc000001300110; asc     0  ;;
3: len 6; hex 67616f393939; asc gao999;;
4: len 7; hex 67616f70656e67; asc gaopeng;;

*** (2) TRANSACTION:
TRANSACTION 51546, ACTIVE 30 sec starting index read
mysql tables in use 2, locked 2
5 lock struct(s), heap size 1160, 9 row lock(s), undo log entries 8
MySQL thread id 8, OS thread handle 140737223177984, query id 18309 localhost root Sending data
insert into a  select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 121 page no 10 n bits 456 index PRIMARY of table `test`.`b` trx id 51546 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 8 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e4; asc     ;;
1: len 6; hex 00000000c164; asc      d;;
2: len 7; hex b90000012d0110; asc     -  ;;
3: len 6; hex 67616f393936; asc gao996;;
4: len 7; hex 67616f70656e67; asc gaopeng;;
Record lock, heap no 9 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e5; asc     ;;
1: len 6; hex 00000000c165; asc      e;;
2: len 7; hex ba0000014f0110; asc     O  ;;
3: len 6; hex 67616f393937; asc gao997;;
4: len 7; hex 67616f70656e67; asc gaopeng;;
Record lock, heap no 10 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e6; asc     ;;
1: len 6; hex 00000000c166; asc      f;;
2: len 7; hex bb0000012f0110; asc     /  ;;
3: len 6; hex 67616f393938; asc gao998;;
4: len 7; hex 67616f70656e67; asc gaopeng;;
Record lock, heap no 11 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e7; asc     ;;
1: len 6; hex 00000000c167; asc      g;;
2: len 7; hex bc000001300110; asc     0  ;;
3: len 6; hex 67616f393939; asc gao999;;
4: len 7; hex 67616f70656e67; asc gaopeng;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 121 page no 18 n bits 160 index PRIMARY of table `test`.`b` trx id 51546 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP) waiting(LOCK_WAIT)
Record lock, heap no 86 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000bb7; asc     ;;
1: len 6; hex 00000000c959; asc      Y;;
2: len 7; hex 00000002240110; asc     $  ;;
3: len 7; hex 67616f32393939; asc gao2999;;
4: len 4; hex 74657374; asc test;;

*** WE ROLL BACK TRANSACTION (1) 

資訊提取如下:

TRX1:51545 
   LOCK HOLD:死鎖資訊不提供
   LOCK WAIT: 
            表:b
            索引:`PRIMARY`  
            鎖模式:LOCK_MODE:LOCK_X|LOCK_REC_NOT_GAP|LOCK_WAIT
            記錄:主鍵為0X3e7 
            附加資訊: space id 121 page no 10 heap no 11
   CURRENT SQL: 
            update b set name2='test' where id=999
           
TRX2:51546
    LOCK HOLD:
             表:b
             索引:`PRIMARY`  
             鎖模式:LOCK_MODE:LOCK_S|LOCK_REC_NOT_GAP
             記錄:主鍵為0X3e4到0X3e7的多個行鎖
             附加資訊:space id 121 page no 10
                          
    LOCK WAIT:
             表:b
             索引:`PRIMARY`  
             鎖模式:LOCK_MODE:LOCK_S|LOCK_REC_NOT_GAP|LOCK_WAIT
             記錄:主鍵為0Xbb7
             附加資訊:space id 121 page no 10  heap no 86
    CURRENT SQL:
             insert into a  select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999) 

我們通過死鎖日誌明顯的看出同樣的語句報出來的死鎖資訊卻不一樣,在高併發下相同語句,兩種死鎖場景都是可能發生的。

七、總結

分析死鎖一般要從死鎖日誌中獲取如下資訊

  • 1、加鎖發生在主鍵還是輔助索引
  • 2、加鎖的模式是什麼
  • 3、是單行還是多行加鎖
  • 4、觸發死鎖事務最後的語句
  • 5、死鎖資訊中事務順序是怎麼樣的

在重現的時候,必須要做到和線上死鎖資訊完全匹配那麼這個死鎖場景才叫測試成功了,從這個例子我們就發現,同樣的語句產生的死鎖資訊卻不一樣,我們當然就要按照不通的場景去考慮,本文中的情況2比較複雜一般只是在高併發先出現,測試也相對麻煩,本文通過修改原始碼的方式進行測試的,否則很難重現。找到原因後就需要採取必要的措施,比如本文中的例子需要考慮:

  • 對insert...select中select表的修改是否及時提交。
  • insert...select是否可以用其他方式代替。這種語句在自增鎖上也存在一定風險。
  • 是否考慮使用RC隔離級別,在RC隔離級別下不存在對select表記錄加鎖的情況。

強調一點對於出現LOCK_S這樣的鎖最好深入分析,因為這種鎖並不多見。

作者微信:

微信.jpg
微信.jpg

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/7728585/viewspace-2146183/,如需轉載,請註明出處,否則將追究法律責任。

相關文章