MySQL 死鎖與日誌二三事

發表於2017-07-31

原文網址 : http://blog.jobbole.com/112054/

最近線上 MySQL 接連發生了幾起資料異常，都是在凌晨爆發，由於業務場景屬於典型的資料倉儲型應用，白天壓力較小無法復現。甚至有些異常還比較詭異，最後 root cause 分析頗費周折。那實際業務當中我們們如何能快速的定位線上 MySQL 問題，修復異常呢？下文我會根據兩個實際 case，分享下相關的經驗與方法。

1、Case1：部分資料更新失敗

某天渠道同學反饋某報表極個別渠道資料為 0，大部分渠道資料正常。這個資料是由一個統計程式每天凌晨例行更新的，按理來說，要麼全部正常，要麼全部失敗，那會是什麼原因導致極個別資料異常呢？

首先我們能想到的自然是根據統計任務日誌來看了，但是看了統計程式列印的日誌沒有發現諸如 SQL update 失敗的異常描述，那當時的資料庫究竟發生了什麼呢？在檢視 MySQL-server 日誌之前，習慣性的看了下資料庫狀態：

1	SHOW ENGINE INNODB STATUS\G

恰好看到了凌晨這個 update 發生了死鎖：

------------------------
LATEST DETECTED DEADLOCK
------------------------
2017-07-17 04:09:01 0x7f6de03c8700
*** (1) TRANSACTION:
TRANSACTION 215208479, ACTIVE 0 sec fetching rows
mysql tables in use 3, locked 3
LOCK WAIT 5 lock struct(s), heap size 1136, 3 row lock(s)
MySQL thread id 27844824, OS thread handle 140092183037696, query id 412503674 10.126.95.84 zeye Searching rows for update
update t_channel_final_datas set nr_register=133,nr_add_goods=29,nr_order_normal=11,nr_pay_normal=8,nr_order_special=0,nr_pay_special=0,n_add_user_num=16 where count_date='2017-07-16' and channel_id='16' and channel_type='10' and terminal='26'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 464 page no 5459 n bits 392 index index_countdate_type_terminal of table `db_zz_flow`.`t_channel_final_datas` trx id 215208479 lock_mode X locks rec but not gap waiting
Record lock, heap no 304 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 10; hex 323031372d30372d3136; asc 2017-07-16;;
 1: len 1; hex 30; asc 0;;
 2: len 4; hex 80000010; asc     ;;
 3: len 4; hex 8009055e; asc    ^;;

*** (2) TRANSACTION:
TRANSACTION 215208474, ACTIVE 0 sec fetching rows
mysql tables in use 3, locked 3
6 lock struct(s), heap size 1136, 7 row lock(s)
MySQL thread id 27844825, OS thread handle 140109890225920, query id 412503669 10.135.6.41 zeye Searching rows for update
update t_channel_final_datas set nr_register=24,nr_add_goods=32,nr_order_normal=0,nr_pay_normal=0,nr_order_special=0,nr_pay_special=0,n_add_user_num=11 where count_date='2017-07-16' and channel_id='114' and channel_type='10' and terminal='116'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 464 page no 5459 n bits 392 index index_countdate_type_terminal of table `db_zz_flow`.`t_channel_final_datas` trx id 215208474 lock_mode X locks rec but not gap
Record lock, heap no 304 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 10; hex 323031372d30372d3136; asc 2017-07-16;;
 1: len 1; hex 30; asc 0;;
 2: len 4; hex 80000010; asc     ;;
 3: len 4; hex 8009055e; asc    ^;;

...

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 464 page no 4743 n bits 264 index PRIMARY of table `db_zz_flow`.`t_channel_final_datas` trx id 215208474 lock_mode X locks rec but not gap waiting
Record lock, heap no 168 PHYSICAL RECORD: n_fields 32; compact format; info bits 0
 0: len 4; hex 80090569; asc    i;;
 1: len 6; hex 00000cd3b9d0; asc       ;;
...

*** WE ROLL BACK TRANSACTION (1)

------------------------

LATEST DETECTED DEADLOCK

------------------------

2017-07-17 04:09:01 0x7f6de03c8700

*** (1) TRANSACTION:

TRANSACTION 215208479, ACTIVE 0 sec fetching rows

mysql tables in use 3, locked 3

LOCK WAIT 5 lock struct(s), heap size 1136, 3 row lock(s)

MySQL thread id 27844824, OS thread handle 140092183037696, query id 412503674 10.126.95.84 zeye Searching rows for update

update t_channel_final_datas set nr_register=133,nr_add_goods=29,nr_order_normal=11,nr_pay_normal=8,nr_order_special=0,nr_pay_special=0,n_add_user_num=16 where count_date='2017-07-16' and channel_id='16' and channel_type='10' and terminal='26'

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 464 page no 5459 n bits 392 index index_countdate_type_terminal of table `db_zz_flow`.`t_channel_final_datas` trx id 215208479 lock_mode X locks rec but not gap waiting

Record lock, heap no 304 PHYSICAL RECORD: n_fields 4; compact format; info bits 0

0: len 10; hex 323031372d30372d3136; asc 2017-07-16;;

1: len 1; hex 30; asc 0;;

2: len 4; hex 80000010; asc ;;

3: len 4; hex 8009055e; asc ^;;

*** (2) TRANSACTION:

TRANSACTION 215208474, ACTIVE 0 sec fetching rows

mysql tables in use 3, locked 3

6 lock struct(s), heap size 1136, 7 row lock(s)

MySQL thread id 27844825, OS thread handle 140109890225920, query id 412503669 10.135.6.41 zeye Searching rows for update

update t_channel_final_datas set nr_register=24,nr_add_goods=32,nr_order_normal=0,nr_pay_normal=0,nr_order_special=0,nr_pay_special=0,n_add_user_num=11 where count_date='2017-07-16' and channel_id='114' and channel_type='10' and terminal='116'

*** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 464 page no 5459 n bits 392 index index_countdate_type_terminal of table `db_zz_flow`.`t_channel_final_datas` trx id 215208474 lock_mode X locks rec but not gap

Record lock, heap no 304 PHYSICAL RECORD: n_fields 4; compact format; info bits 0

0: len 10; hex 323031372d30372d3136; asc 2017-07-16;;

1: len 1; hex 30; asc 0;;

2: len 4; hex 80000010; asc ;;

3: len 4; hex 8009055e; asc ^;;

...

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 464 page no 4743 n bits 264 index PRIMARY of table `db_zz_flow`.`t_channel_final_datas` trx id 215208474 lock_mode X locks rec but not gap waiting

Record lock, heap no 168 PHYSICAL RECORD: n_fields 32; compact format; info bits 0

0: len 4; hex 80090569; asc i;;

1: len 6; hex 00000cd3b9d0; asc ;;

...

*** WE ROLL BACK TRANSACTION (1)

篇幅所限，上下文我這裡省略了很多，從這段日誌裡可以看到，TRANSACTION 1 和 TRANSACTION 2 分別持有一定數量的行鎖，然後又等待對方的鎖，最後 MySQL 檢測到 deadlock ，然後選擇回滾了 TRANSACTION 1：Innodb目前處理死鎖的方法是將持有最少行級排他鎖的事務進行回滾。

那這裡就有 3 個問題了：

（1）innodb 行鎖不是隻鎖一行？

因為這張表是 innodb 引擎的，InnoDB 支援行鎖和表鎖。而InnoDB行鎖是通過給索引上的索引項加鎖來實現的，這一點MySQL與Oracle不同，後者是通過在資料塊中對相應資料行加鎖來實現的。InnoDB這種行鎖實現特點意味著：只有通過索引條件檢索資料，InnoDB才使用行級鎖，否則，InnoDB將使用表鎖，會把所有掃描過的行都鎖定！在實際應用中，要特別注意InnoDB行鎖的這一特性，不然的話，可能導致大量的鎖衝突，從而影響併發效能。由於MySQL的行鎖是針對索引加的鎖，不是針對記錄加的鎖，所以雖然是訪問不同行的記錄，但是如果是使用相同的索引鍵，是會出現鎖衝突的。當我們用範圍條件而不是相等條件檢索資料，並請求共享或排他鎖時，InnoDB會給符合條件的已有資料記錄的索引項加鎖；另外間隙鎖也會鎖多行，InnoDB除了通過範圍條件加鎖時使用間隙鎖外，如果使用相等條件請求給一個不存在的記錄加鎖，InnoDB也會使用間隙鎖！

話都說到這了，那就看下我們們業務表的索引情況：

show INDEX from `db_zz_flow`.`t_channel_final_datas`;
+-----------------------+--------------+-------------------------------+----------------+-------------+----------+--------+--------------+-----------+-----------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_namt | Packed | Null | Index_type | Comment | Index_comment |
|-----------------------+--------------+-------------------------------+----------------+-------------+----------+--------+--------------+-----------+-----------------|
| t_channel_final_datas | 0 | PRIMARY | 1 | id > | <null> | | BTREE | | |
| t_channel_final_datas | 1 | index_countdate_type_terminal | 1 | count_date> | <null> | YES | BTREE | | |
| t_channel_final_datas | 1 | index_countdate_type_terminal | 2 | channel_ty> | <null> | YES | BTREE | | |
| t_channel_final_datas | 1 | index_countdate_type_terminal | 3 | terminal > | <null> | YES | BTREE | | |
| t_channel_final_datas | 1 | index_countdate_channelid | 1 | count_date> | <null> | YES | BTREE | | |
| t_channel_final_datas | 1 | index_countdate_channelid | 2 | channel_id> | <null> | YES | BTREE | | |
+-----------------------+--------------+-------------------------------+----------------+-------------+----------+--------+--------------+-----------+-----------------+

show INDEX from `db_zz_flow`.`t_channel_final_datas`;

+-----------------------+--------------+-------------------------------+----------------+-------------+----------+--------+--------------+-----------+-----------------+

|-----------------------+--------------+-------------------------------+----------------+-------------+----------+--------+--------------+-----------+-----------------|

| t_channel_final_datas | 0 | PRIMARY | 1 | id > | <null> | | BTREE | | |

+-----------------------+--------------+-------------------------------+----------------+-------------+----------+--------+--------------+-----------+-----------------+

可以看到這張表的索引極不合理：有3個索引，但是 update 卻沒有完全的用上索引，導致 update 沒有精確的用上索引，需要鎖定多行範圍資料，從而引發死鎖。

知道原理後，我們們再精心構建一個四欄位的組合索引即可讓 update 精準的走 innodb 索引，實際上，我們更新索引後，這個死鎖問題即得到了解決。
注：innodb不僅會列印出事務和事務持有和等待的鎖，而且還有記錄本身，不幸的是，它可能超過innodb為輸出結果預留的長度（只能列印1M的內容且只能保留最近一次的死鎖資訊），如果你無法看到完整的輸出，此時可以在任意庫下建立innodb_monitor或innodb_lock_monitor表，這樣innodb status資訊會完整且每15s一次被記錄到錯誤日誌中。如：create table innodb_monitor(a int)engine=innodb;，不需要記錄到錯誤日誌中時就刪掉這個表即可。

（2）回滾的話，為什麼只有部分 update 語句失敗，而不是整個事務裡的所有 update 都失敗？

這是因為我們們的 innodb 預設是自動提交的：

show variables like 'autocommit';

+-----------------+---------+

| Variable_name | Value |

|-----------------+---------|

| autocommit | ON |

+-----------------+---------+

在多個 update 或 insert 語句情況下，每執行完一條 SQL，innodb 就立即 commit 一次以持久化變更，同時釋放鎖，這也正是本例中死鎖回滾事務後只有極個別語句失敗的原因。

需要注意的是，通常還有另外一種情況也可能導致部分語句回滾，需要格外留意。在 innodb 裡有個引數叫：innodb_rollback_on_timeout

show VARIABLES LIKE 'innodb_rollback_on_timeout'

+----------------------------+---------+

| Variable_name | Value |

|----------------------------+---------|

| innodb_rollback_on_timeout | OFF |

+----------------------------+---------+

官方手冊裡這樣描述：

In MySQL 5.1, InnoDB rolls back only the last statement on a transaction timeout by default. If –innodb_rollback_on_timeout is specified, a transaction timeout causes InnoDB to abort and roll back the entire transaction (the same behavior as in MySQL 4.1). This variable was added in MySQL 5.1.15.

解釋：這個引數關閉或不存在的話遇到超時只回滾事務最後一個Query，開啟的話事務遇到超時就回滾整個事務。

（3）怎樣降低 innodb 死鎖機率？

死鎖在行鎖及事務場景下很難完全消除，但可以通過表設計和SQL調整等措施減少鎖衝突和死鎖，包括：

儘量使用較低的隔離級別，比如如果發生了間隙鎖，你可以把會話或者事務的事務隔離級別更改為 RC(read committed)級別來避免，但此時需要把 binlog_format 設定成 row 或者 mixed 格式
精心設計索引，並儘量使用索引訪問資料，使加鎖更精確，從而減少鎖衝突的機會；
選擇合理的事務大小，小事務發生鎖衝突的機率也更小；
給記錄集顯示加鎖時，最好一次性請求足夠級別的鎖。比如要修改資料的話，最好直接申請排他鎖，而不是先申請共享鎖，修改時再請求排他鎖，這樣容易產生死鎖；
不同的程式訪問一組表時，應儘量約定以相同的順序訪問各表，對一個表而言，儘可能以固定的順序存取表中的行。這樣可以大大減少死鎖的機會；
儘量用相等條件訪問資料，這樣可以避免間隙鎖對併發插入的影響；
不要申請超過實際需要的鎖級別；除非必須，查詢時不要顯示加鎖；
對於一些特定的事務，可以使用表鎖來提高處理速度或減少死鎖的可能。

2、Case2：詭異的 Lock wait timeout

連續幾天凌晨6點和早上8點都分別有一個任務失敗，load data local infile 的時候報 Lock wait timeout exceeded try restarting transaction innodb 的 Java SQL 異常，和平臺的同學溝通得知，這是我們自己的業務資料庫的 Lock 時間太短或者鎖衝突的問題。但是回頭一想不應該啊？這不一直好好的嗎？而且基本都是單表單任務，不存在多人衝突。

甭管誰的問題，那我們們還是先看自己的資料庫有沒有問題：

show variables like 'innodb_lock_wait_timeout';

+--------------------------+---------+

| Variable_name | Value |

|--------------------------+---------|

| innodb_lock_wait_timeout | 50 |

+--------------------------+---------+

預設 lock 超時時間 50s，這個時間真心不短了，估計調了也沒用，事實上確實死馬當活馬醫的試了下沒用。。。

而且這次 SHOW ENGINE INNODB STATUSG 也沒出現任何死鎖資訊，然後又將目光轉向 MySQL-server 日誌，希望能從日誌裡看一看那個時刻前後資料究竟在做什麼操作。這裡先簡單的介紹下MySQL日誌檔案系統的組成：

(a) error 日誌：記錄啟動、執行或停止 mysqld 時出現的問題，預設開啟。
(b) general 日誌：通用查詢日誌，記錄所有語句和指令，開啟資料庫會有 5% 左右效能損失。
(c) binlog 日誌：二進位制格式，記錄所有更改資料的語句，主要用於 slave 複製和資料恢復。
(d) slow 日誌：記錄所有執行時間超過 long_query_time 秒的查詢或不使用索引的查詢，預設關閉。
(e) Innodb日誌：innodb redo log、undo log，用於恢復資料和撤銷操作。

從上面的介紹可以看到，目前這個問題的日誌可能在 d 和 b 中，看了下 d 中沒有，那就只能開啟 b 了，但 b 對資料庫的效能有一定損耗，由於是全量日誌，量非常巨大，所以開啟一定要謹慎：

-- general_log 日誌預設關閉，開啟會影響資料庫 5% 左右效能：

show variables like 'general%';

+------------------+---------------------------------+

| Variable_name | Value |

|------------------+---------------------------------|

| general_log | OFF |

| general_log_file | /opt/data/mysql/tjtx-103-26.log |

+------------------+---------------------------------+

-- 全域性 session 級別開啟：

set global general_log=1

-- 如果需要對當前 session 生效需要：

set general_log=1

-- set 指令設定的動態引數在 MySQL 重啟後失效，如果需要永久生效需要在 /etc/my.cnf 中配置靜態變數/引數。

-- 如果不知道 my.cnf 位置，可以根據 mysql -? | grep ".cnf" 查詢

order of preference, my.cnf, $MYSQL_TCP_PORT,

/etc/my.cnf /etc/mysql/my.cnf /usr/etc/my.cnf ~/.my.cnf

我這裡只是每天在出問題的前後半小時開啟下全量日誌，結果沒有發現任何 MySQL-client 請求到我們的業務資料庫！該日誌格式如下，記錄了所有的連線與命令：

/usr/sbin/mysqld, Version: 5.7.12-log (MySQL Community Server (GPL)). started with:
Tcp port: 3306  Unix socket: /opt/data/mysql/mysql.sock
Time                 Id Command    Argument
2017-07-20T21:45:01.880828Z28556028 Quit
2017-07-20T21:45:02.708621Z28401469 Query       SELECT 1
2017-07-20T21:45:02.736734Z28556029 Connect     ooxx@127.0.0.1 on db_zz_system using TCP/IP
2017-07-20T21:45:02.737242Z28556029 Query       /* mysql-connector-java-5.1.6 ( Revision: ${svn.Revision} ) */SHOW VARIABLES WHERE Variable_name ='language' OR Variable_name = 'net_write_timeout' OR Variable_name = 'interactive_timeout' OR Variable_name = 'wait_timeout' OR Variable_name = 'character_set_client' OR Variable_name = 'character_set_connection' OR Variable_name = 'character_set' OR Variable_name = 'character_set_server' OR Variable_name = 'tx_isolation' OR Variable_name = 'transaction_isolation' OR Variable_name = 'character_set_results' OR Variable_name = 'timezone' OR Variable_name = 'time_zone' OR Variable_name = 'system_time_zone' OR Variable_name = 'lower_case_table_names' OR Variable_name = 'max_allowed_packet' OR Variable_name = 'net_buffer_length' OR Variable_name = 'sql_mode' OR Variable_name = 'query_cache_type' OR Variable_name = 'query_cache_size' OR Variable_name = 'init_connect'
2017-07-20T21:45:02.738868Z28556029 Query       SHOW COLLATION
2017-07-20T21:45:02.739941Z28556029 Query       SET character_set_results = NULL
2017-07-20T21:45:02.740275Z28556029 Query       SET autocommit=1

/usr/sbin/mysqld, Version: 5.7.12-log (MySQL Community Server (GPL)). started with:

Tcp port: 3306 Unix socket: /opt/data/mysql/mysql.sock

Time Id Command Argument

2017-07-20T21:45:01.880828Z28556028 Quit

2017-07-20T21:45:02.708621Z28401469 Query SELECT 1

2017-07-20T21:45:02.736734Z28556029 Connect ooxx@127.0.0.1 on db_zz_system using TCP/IP

2017-07-20T21:45:02.737242Z28556029 Query /* mysql-connector-java-5.1.6 ( Revision: ${svn.Revision} ) */SHOW VARIABLES WHERE Variable_name ='language' OR Variable_name = 'net_write_timeout' OR Variable_name = 'interactive_timeout' OR Variable_name = 'wait_timeout' OR Variable_name = 'character_set_client' OR Variable_name = 'character_set_connection' OR Variable_name = 'character_set' OR Variable_name = 'character_set_server' OR Variable_name = 'tx_isolation' OR Variable_name = 'transaction_isolation' OR Variable_name = 'character_set_results' OR Variable_name = 'timezone' OR Variable_name = 'time_zone' OR Variable_name = 'system_time_zone' OR Variable_name = 'lower_case_table_names' OR Variable_name = 'max_allowed_packet' OR Variable_name = 'net_buffer_length' OR Variable_name = 'sql_mode' OR Variable_name = 'query_cache_type' OR Variable_name = 'query_cache_size' OR Variable_name = 'init_connect'

2017-07-20T21:45:02.738868Z28556029 Query SHOW COLLATION

2017-07-20T21:45:02.739941Z28556029 Query SET character_set_results = NULL

2017-07-20T21:45:02.740275Z28556029 Query SET autocommit=1

那問題基本確定了，客戶端請求都沒到我們這邊就丟擲了上述的異常，和平臺方再三溝通確認下，最後平臺方查證是因為在執行插入前他們需要先從 SQL task 表取出 SQL 和更新 task 狀態，結果這張表由於在整點存在大量 insert 和 update 併發，導致部分 SQL 等待 lock 超時了。。。

3、MySQL 日誌分析指令碼

由於凌晨是資料倉儲的業務高峰，很多問題都是在這個時候爆發，一些詭異的問題往往是過了這個村就沒這個店了，白天無法復現。如何能捕獲我們關心的日誌，便於快速的定位問題，這個是重中之重，這裡我寫了個小指令碼，crontab 部署，可以選擇時間範圍開啟，每分鐘取樣一次日誌，需要說明的是 general log 沒事別輕易開啟，否則對資料庫效能損耗較大。

# crontab 部署方式：
# */1 0-10 * * * cd /opt/ooxx/script && bash mysql_perf.sh >> logs/mysql_perf.log.`date -I` 2>&1

date -Iseconds

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> top -bn1|head'
top -bn1|head 
echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SHOW ENGINE INNODB STATUS\G '
mysql -uroot -pooxx -h127.0.0.1 -e 'SHOW ENGINE INNODB STATUS\G'
echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> show open tables where in_use>0;'
mysql -uroot -pooxx -h127.0.0.1 -e 'show open tables where in_use>0;'
echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> show full processlist;'
mysql -uroot -pooxx -h127.0.0.1 -e 'show full processlist;'
echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SELECT * FROM `information_schema`.`innodb_trx` ORDER BY `trx_started`; '
mysql -uroot -pooxx -h127.0.0.1 -e 'SELECT * FROM `information_schema`.`innodb_trx` ORDER BY `trx_started`;' 
echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SELECT * FROM `information_schema`.`innodb_locks`;'
mysql -uroot -pooxx -h127.0.0.1 -e 'SELECT * FROM `information_schema`.`innodb_locks`;'
echo

echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>> show status like '%lock%';"
mysql -uroot -pooxx -h127.0.0.1 -e "show status like '%lock%';"
echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> show global status like "table_locks%";'
mysql -uroot -pooxx -h127.0.0.1 -e 'show global status like "table_locks%";'
echo

echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>> too long omit..."
mysql -uroot -pooxx -h127.0.0.1 -e "select r.trx_isolation_level, r.trx_id waiting_trx_id, r.trx_mysql_thread_id waiting_trx_thread, r.trx_state waiting_trx_state, lr.lock_mode waiting_trx_lock_mode, lr.lock_type waiting_trx_lock_type, lr.lock_table waiting_trx_lock_table, lr.lock_index waiting_trx_lock_index, r.trx_query waiting_trx_query, b.trx_id blocking_trx_id, b.trx_mysql_thread_id blocking_trx_thread, b.trx_state blocking_trx_state, lb.lock_mode blocking_trx_lock_mode, lb.lock_type blocking_trx_lock_type, lb.lock_table blocking_trx_lock_table, lb.lock_index blocking_trx_lock_index, b.trx_query blocking_query from information_schema.innodb_lock_waits w inner join information_schema.innodb_trx b on b.trx_id=w.blocking_trx_id inner join information_schema.innodb_trx r on r.trx_id=w.requesting_trx_id inner join information_schema.innodb_locks lb on lb.lock_trx_id=w.blocking_trx_id inner join information_schema.innodb_locks lr on lr.lock_trx_id=w.requesting_trx_id\G"
echo

echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>> too long omit..."
mysql -uroot -pooxx -h127.0.0.1 -e "SELECT r.trx_id waiting_trx_id, r.trx_mysql_thread_id waiting_thread, r.trx_query waiting_query, b.trx_id blocking_trx_id, b.trx_mysql_thread_id blocking_thread, b.trx_query blocking_query FROM information_schema.innodb_lock_waits w INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id\G"
echo

date -Iseconds
echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>=================================================================================='
echo

file_name=mysql_perf.log.`date -I`
if [[ `date +%-H%-M` == 1059 ]]
then
 cd /opt/ooxx/script/logs
 chmod 777 /home/work/ooxx/$file_name
 find /home/work/ooxx -name 'mysql_perf.log.*' -mtime +7 -delete
 
 cd /opt/data/mysql
 cp tjtx-ooxx-slow.log /home/work/ooxx/tjtx-ooxx-slow.log.`date -I`
 chmod 777 /home/work/ooxx/tjtx-ooxx-slow.log.`date -I`
 find /home/work/ooxx -name 'tjtx-ooxx-slow.log.*' -mtime +7 -delete
 
 cp mysqld.log /home/work/ooxx/mysqld.log.`date -I`
 chmod 777 /home/work/ooxx/mysqld.log.`date -I`
 find /home/work/ooxx -name 'mysqld.log.*' -mtime +7 -delete
fi

################
# 開啟 general_log 全量明細日誌會降低資料庫 5% 效能
#if [[ "`date +%H%M`" == "0545" ]]
#then
# echo "`date +%H%M` ------- set global general_log=1;"
# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=1;'
#elif [[ "`date +%H%M`" == "0630" ]]
#then
# echo "`date +%H%M` ------- set global general_log=0;"
# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=0;'
#elif [[ "`date +%H%M`" == "0745" ]]
#then
# echo "`date +%H%M` ------- set global general_log=1;"
# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=1;'
#elif [[ "`date +%H%M`" == "0830" ]]
#then
# echo "`date +%H%M` ------- set global general_log=0;"
# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=0;'
#elif [[ "`date +%H%M`" == "0001" ]]
#then
# echo "`date +%H%M` ------- set global general_log=1;"
# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=1;'
#elif [[ "`date +%H%M`" == "0002" ]]
#then
# echo "`date +%H%M` ------- set global general_log=0;"
# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=0;'
#fi

#[[ 10#`date +%H%M` -lt 10#0550 || 10#`date +%H%M` -gt 10#0830 ]] && echo "`date +%H%M` ------- set global general_log=0;" && mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=0;'

# mysql -uroot -pooxx -h127.0.0.1 -e 'show open tables where in_use>0;show full processlist;SELECT * FROM `information_schema`.`innodb_trx` ORDER BY `trx_started`;SELECT * FROM `information_schema`.`innodb_locks`;SHOW ENGINE INNODB STATUS\G'

# --show variables like '%tx_isolation%';
# --SELECT @@GLOBAL.tx_isolation, @@tx_isolation, @@session.tx_isolation;
# -- SET GLOBAL tx_isolation = 'READ-COMMITTED';
# 
# --show variables like '%timeout%';
# --show variables like 'innodb_lock_wait_timeout';
# -- SET GLOBAL innodb_lock_wait_timeout=60
# 
# --show variables like 'long_query_time';
# -- SET global long_query_time=3;

# -- show variables like 'innodb_rollback_on_timeout';

# -- show VARIABLES like '%max_allowed_packet%';
# -- set global max_allowed_packet = 100*1024*1024;

# 自動提交
# -- show variables like 'autocommit';

# 慢查詢
# -- show variables like '%slow_query_log%';
# set global 只對當前資料庫生效，如果MySQL重啟後則會失效。如果要永久生效，就必須修改配置檔案my.cnf（其它系統變數也是如此）。
# set global slow_query_log=1;
# -- show variables like 'long_query_time%';
# set global long_query_time=4;
# show global variables like 'long_query_time'; 
# select sleep(5);
# -- show variables like 'log_queries_not_using_indexes';
# set global log_queries_not_using_indexes=1;
# -- show variables like 'log_slow_admin_statements';
# -- show global status like '%Slow_queries%';
# http://www.cnblogs.com/kerrycode/p/5593204.html

# -- show variables like "%time_zone%";
#set global time_zone = '+8:00';

#開啟general_log日誌
# -- show variables like 'general%';
#可以在my.cnf裡新增,1開啟（0關閉）,當然了,這樣要重啟才能生效,有點多餘了
#general-log = 1
#log = /log/mysql_query.log路徑
#也可以設定變數那樣更改,1開啟（0關閉）,即時生效,不用重啟,首選當然是這樣的了
# set global general_log=1
#這個日誌對於操作頻繁的庫,產生的資料量會很快增長,出於對硬碟的保護,可以設定其他存放路徑
#set global general_log_file=/tmp/general_log.log

#mysql記錄客戶端IP：init_connect，有super許可權的使用者是不記錄的，
# create table t1 ( cur_user varchar(100), n_user varchar(100),in_time timestamp default current_timestamp()) ;
# set global init_connect='insert into test.t1 (cur_user,n_user) values (current_user(),user())'; 
# SHOW CREATE TABLE mysql.general_log\G ，開啟general_log日誌也行：
# https://dba.stackexchange.com/questions/33654/mysql-logging-activity-from-specific-user-or-ip 
#SELECT REVERSE(SUBSTRING_INDEX(REVERSE(USER()),'@',1)) as ip;
#SELECT SUBSTRING(USER(), LOCATE('@', USER())+1) as ip;
#select SUBSTRING_INDEX(host,':',1) as 'ip' from information_schema.processlist WHERE ID=connection_id();

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

# crontab 部署方式：

# */1 0-10 * * * cd /opt/ooxx/script && bash mysql_perf.sh >> logs/mysql_perf.log.`date -I` 2>&1

date -Iseconds

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> top -bn1|head'

top -bn1|head

echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SHOW ENGINE INNODB STATUS\G '

mysql -uroot -pooxx -h127.0.0.1 -e 'SHOW ENGINE INNODB STATUS\G'

echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> show open tables where in_use>0;'

mysql -uroot -pooxx -h127.0.0.1 -e 'show open tables where in_use>0;'

echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> show full processlist;'

mysql -uroot -pooxx -h127.0.0.1 -e 'show full processlist;'

echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SELECT * FROM `information_schema`.`innodb_trx` ORDER BY `trx_started`; '

mysql -uroot -pooxx -h127.0.0.1 -e 'SELECT * FROM `information_schema`.`innodb_trx` ORDER BY `trx_started`;'

echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SELECT * FROM `information_schema`.`innodb_locks`;'

mysql -uroot -pooxx -h127.0.0.1 -e 'SELECT * FROM `information_schema`.`innodb_locks`;'

echo

echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>> show status like '%lock%';"

mysql -uroot -pooxx -h127.0.0.1 -e "show status like '%lock%';"

echo

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> show global status like "table_locks%";'

mysql -uroot -pooxx -h127.0.0.1 -e 'show global status like "table_locks%";'

echo

echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>> too long omit..."

mysql -uroot -pooxx -h127.0.0.1 -e "select r.trx_isolation_level, r.trx_id waiting_trx_id, r.trx_mysql_thread_id waiting_trx_thread, r.trx_state waiting_trx_state, lr.lock_mode waiting_trx_lock_mode, lr.lock_type waiting_trx_lock_type, lr.lock_table waiting_trx_lock_table, lr.lock_index waiting_trx_lock_index, r.trx_query waiting_trx_query, b.trx_id blocking_trx_id, b.trx_mysql_thread_id blocking_trx_thread, b.trx_state blocking_trx_state, lb.lock_mode blocking_trx_lock_mode, lb.lock_type blocking_trx_lock_type, lb.lock_table blocking_trx_lock_table, lb.lock_index blocking_trx_lock_index, b.trx_query blocking_query from information_schema.innodb_lock_waits w inner join information_schema.innodb_trx b on b.trx_id=w.blocking_trx_id inner join information_schema.innodb_trx r on r.trx_id=w.requesting_trx_id inner join information_schema.innodb_locks lb on lb.lock_trx_id=w.blocking_trx_id inner join information_schema.innodb_locks lr on lr.lock_trx_id=w.requesting_trx_id\G"

echo

echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>> too long omit..."

mysql -uroot -pooxx -h127.0.0.1 -e "SELECT r.trx_id waiting_trx_id, r.trx_mysql_thread_id waiting_thread, r.trx_query waiting_query, b.trx_id blocking_trx_id, b.trx_mysql_thread_id blocking_thread, b.trx_query blocking_query FROM information_schema.innodb_lock_waits w INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id\G"

echo

date -Iseconds

echo '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>=================================================================================='

echo

file_name=mysql_perf.log.`date -I`

if [[ `date +%-H%-M` == 1059 ]]

then

cd /opt/ooxx/script/logs

chmod 777 /home/work/ooxx/$file_name

find /home/work/ooxx -name 'mysql_perf.log.*' -mtime +7 -delete

cd /opt/data/mysql

cp tjtx-ooxx-slow.log /home/work/ooxx/tjtx-ooxx-slow.log.`date -I`

chmod 777 /home/work/ooxx/tjtx-ooxx-slow.log.`date -I`

find /home/work/ooxx -name 'tjtx-ooxx-slow.log.*' -mtime +7 -delete

cp mysqld.log /home/work/ooxx/mysqld.log.`date -I`

chmod 777 /home/work/ooxx/mysqld.log.`date -I`

find /home/work/ooxx -name 'mysqld.log.*' -mtime +7 -delete

################

# 開啟 general_log 全量明細日誌會降低資料庫 5% 效能

#if [[ "`date +%H%M`" == "0545" ]]

#then

# echo "`date +%H%M` ------- set global general_log=1;"

# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=1;'

#elif [[ "`date +%H%M`" == "0630" ]]

#then

# echo "`date +%H%M` ------- set global general_log=0;"

# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=0;'

#elif [[ "`date +%H%M`" == "0745" ]]

#then

# echo "`date +%H%M` ------- set global general_log=1;"

# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=1;'

#elif [[ "`date +%H%M`" == "0830" ]]

#then

# echo "`date +%H%M` ------- set global general_log=0;"

# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=0;'

#elif [[ "`date +%H%M`" == "0001" ]]

#then

# echo "`date +%H%M` ------- set global general_log=1;"

# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=1;'

#elif [[ "`date +%H%M`" == "0002" ]]

#then

# echo "`date +%H%M` ------- set global general_log=0;"

# mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=0;'

#fi

#[[ 10#`date +%H%M` -lt 10#0550 || 10#`date +%H%M` -gt 10#0830 ]] && echo "`date +%H%M` ------- set global general_log=0;" && mysql -uroot -pooxx -h127.0.0.1 -e 'set global general_log=0;'

# --show variables like '%tx_isolation%';

# --SELECT @@GLOBAL.tx_isolation, @@tx_isolation, @@session.tx_isolation;

# -- SET GLOBAL tx_isolation = 'READ-COMMITTED';

# --show variables like '%timeout%';

# --show variables like 'innodb_lock_wait_timeout';

# -- SET GLOBAL innodb_lock_wait_timeout=60

# --show variables like 'long_query_time';

# -- SET global long_query_time=3;

# -- show variables like 'innodb_rollback_on_timeout';

# -- show VARIABLES like '%max_allowed_packet%';

# -- set global max_allowed_packet = 100*1024*1024;

# 自動提交

# -- show variables like 'autocommit';

# 慢查詢

# -- show variables like '%slow_query_log%';

# set global 只對當前資料庫生效，如果MySQL重啟後則會失效。如果要永久生效，就必須修改配置檔案my.cnf（其它系統變數也是如此）。

# set global slow_query_log=1;

# -- show variables like 'long_query_time%';

# set global long_query_time=4;

# show global variables like 'long_query_time';

# select sleep(5);

# -- show variables like 'log_queries_not_using_indexes';

# set global log_queries_not_using_indexes=1;

# -- show variables like 'log_slow_admin_statements';

# -- show global status like '%Slow_queries%';

# http://www.cnblogs.com/kerrycode/p/5593204.html

# -- show variables like "%time_zone%";

#set global time_zone = '+8:00';

#開啟general_log日誌

# -- show variables like 'general%';

#可以在my.cnf裡新增,1開啟（0關閉）,當然了,這樣要重啟才能生效,有點多餘了

#general-log = 1

#log = /log/mysql_query.log路徑

#也可以設定變數那樣更改,1開啟（0關閉）,即時生效,不用重啟,首選當然是這樣的了

# set global general_log=1

#這個日誌對於操作頻繁的庫,產生的資料量會很快增長,出於對硬碟的保護,可以設定其他存放路徑

#set global general_log_file=/tmp/general_log.log

#mysql記錄客戶端IP：init_connect，有super許可權的使用者是不記錄的，

# create table t1 ( cur_user varchar(100), n_user varchar(100),in_time timestamp default current_timestamp()) ;

# set global init_connect='insert into test.t1 (cur_user,n_user) values (current_user(),user())';

# SHOW CREATE TABLE mysql.general_log\G ，開啟general_log日誌也行：

# https://dba.stackexchange.com/questions/33654/mysql-logging-activity-from-specific-user-or-ip

#SELECT REVERSE(SUBSTRING_INDEX(REVERSE(USER()),'@',1)) as ip;

#SELECT SUBSTRING(USER(), LOCATE('@', USER())+1) as ip;

#select SUBSTRING_INDEX(host,':',1) as 'ip' from information_schema.processlist WHERE ID=connection_id();

Refer：

[1] MySQL批量SQL插入效能優化

http://database.51cto.com/art/201309/411050.htm

[2] MySQL 四種事務隔離級的說明

http://www.cnblogs.com/zhoujinyi/p/3437475.html

[3] innodb_rollback_on_timeout引數對鎖的影響

http://q.fireflyclub.org/?/article/37

[4] 日均萬條資料丟失，一個隱式騷操作導致的奇葩事故！

https://mp.weixin.qq.com/s/OpPV24ILewVc2lRDrV4vvA

[5] mysql之show engine innodb status解讀（轉）

http://www.cnblogs.com/ajianbeyourself/p/6941905.html

http://www.cnblogs.com/olinux/p/5497176.html

[6] Mysql 鎖機制筆記

http://www.jianshu.com/p/fa28035656a9

http://www.jianshu.com/p/ec4a343c49f0

http://hudeyong926.iteye.com/blog/1490687

[7] MySQL索引原理及慢查詢優化

https://tech.meituan.com/mysql-index.html

[8] [譯文]MySQL發生死鎖腫麼辦？

http://www.10tiao.com/html/672/201612/2653929751/1.html

[9] MySQL的timeout那點事

http://www.penglixun.com/tech/database/mysql_timeout.html

[10] mysql死鎖(deadlock)分析及解決（索引間隙鎖 sql順序）

http://www.youyong.top/article/1158d0e3bcf

MySQL列印死鎖日誌
2021-11-15
MySql
簡單介紹MySQL列印死鎖日誌的方法
2022-03-14
MySql
MySQL 事務、日誌、鎖、索引學習總結，
2021-08-11
MySql索引
MySQL 事務日誌
2024-07-23
MySql
mysql 鎖的慢日誌
2021-12-13
MySql
MySQL鎖等待與死鎖問題分析
2021-03-30
MySql
MySQL-14.MySQL事務日誌
2024-06-23
MySql
MySQL事務與鎖
2022-02-06
MySql
MySQL死鎖分析與解決之路
2022-02-16
MySql
mysql之鎖與事務
2018-03-24
MySql
Mysql 事務日誌(Ib_logfile)
2018-11-06
MySql
MySQL 死鎖和鎖等待
2020-04-13
MySql
MySQL入門--事務與鎖
2019-06-27
MySql
mysql鎖與事務總結
2018-12-28
MySql
《MySQL 進階篇》十九：事務日誌
2024-09-22
MySql
面試：什麼是死鎖，如何避免或解決死鎖；MySQL中的死鎖現象，MySQL死鎖如何解決
2024-11-10
面試MySql
MySQL 死鎖解決
2019-02-28
MySql
MySQL解決死鎖
2024-06-09
MySql
MySQL死鎖問題
2022-10-21
MySql
MySQL死鎖系列-線上死鎖問題排查思路
2020-10-19
MySql
MySQL（一）：MySQL資料庫事務與鎖
2020-12-12
MySql資料庫
mysql行鎖和死鎖檢測
2019-05-12
MySql
Mysql 兩階段鎖和死鎖
2022-10-12
MySql
MySQL redo與undo日誌解析
2020-08-28
MySql
S 鎖與 X 鎖的愛恨情仇《死磕MySQL系列四》
2021-11-02
MySql
mysql死鎖最佳化
2024-05-07
MySql
MySQL：死鎖一例
2019-07-19
MySql
MySQL 死鎖問題分析
2020-04-09
MySql
【MySQL】死鎖案例之六
2018-03-27
MySql
【MySQL】死鎖案例之七
2018-03-27
MySql
【MySQL】死鎖案例之八
2018-03-28
MySql
Mysql如何處理死鎖
2021-09-11
MySql
MySQL：MTS和mysqldump死鎖
2021-02-16
MySql
MySQL：一個死鎖分析 (未分析出來的死鎖)
2018-10-16
MySql
例項詳解 Java 死鎖與破解死鎖
2022-02-28
Java
mysql事務處理與鎖機制
2020-12-14
MySql
Mysql鎖與事務隔離級別
2020-12-07
MySql
Mysql日誌
2024-09-11
MySql
EventLoop二三事
2018-09-12
OOP

MySQL 死鎖與日誌二三事

1、Case1：部分資料更新失敗

（1）innodb 行鎖不是隻鎖一行？

（2）回滾的話，為什麼只有部分 update 語句失敗，而不是整個事務裡的所有 update 都失敗？

（3）怎樣降低 innodb 死鎖機率？

2、Case2：詭異的 Lock wait timeout

3、MySQL 日誌分析指令碼

Refer：

相關文章