檢視MySQL鎖等待的原因

czxin788發表於2018-07-17

--sys庫的介紹

    mysql 5.6也可以有sys庫(基於performance_schema的檢視)。sys庫是一個開源專案,在githup上早就有,是一個DBA的開源工具,後來mysql感覺好,就放在了mysql5.7上。

    下載地址:

    [tms@m-db3 ~]$ cd mysql-sys-master

    [tms@m-db3 ~]$ mysql < sys_56.sql

    這樣,就可以在mysql5.6裡面加入sys庫了,不過mysql 5.6只有88張表,而mysql 5.7有101張,這是因為Mysql 5.7的performace_schema庫裡面又多了幾張表。

    sys庫是performance_schema的檢視。

--MySQL鎖等待

    當Mysql發生鎖等待情況時,可以透過如下語句來線上檢視:

mysql> select * from sys.innodb_lock_waits \G;
*************************** 1. row ***************************
                wait_started: 2018-07-16 16:25:17  //鎖等待開始的時間,16:25開始等待
                    wait_age: 00:10:08    //發現問題時已經等待了10分鐘了
               wait_age_secs: 608   //608秒,也就是等10分鐘了
                locked_table: `iws`.`busi_reconciliationgbgsinfo_inputdetails` //被鎖住的表名
                locked_index: PRIMARY  //被鎖住的索引
                 locked_type: RECORD //鎖的型別為行鎖
              waiting_trx_id: 13666265  //waiting transaction id,正在等待事務的id號
         waiting_trx_started: 2018-07-16 16:24:54 //這個事務是從16:24開始等待
             waiting_trx_age: 00:10:31 //等了10分鐘了
     waiting_trx_rows_locked: 1 //正在等待的這個事務鎖住了1行記錄
   waiting_trx_rows_modified: 0 //正在等待的這個事務修改了0行記錄
                 waiting_pid: 441805 //這個等待事務的執行緒id是多少,透過show processlist 命令可以查到它,結果看到是一個sleep的執行緒,沒有執行具體sql語句,見下
               waiting_query: update busi_reconciliationgbgs ...                where id = 4510 //等待鎖釋放的語句
             waiting_lock_id: 13666265:2924:21:94 //正在等待的鎖id
           waiting_lock_mode: X //等待鎖的型別是排它鎖
             blocking_trx_id: 13666259 //這個事務id阻塞了waiting lock
                blocking_pid: 441803 阻塞事務的pid
              blocking_query: NULL  //阻塞事務的sql語句
            blocking_lock_id: 13666259:2924:21:94
          blocking_lock_mode: X
        blocking_trx_started: 2018-07-16 16:24:51
            blocking_trx_age: 00:10:34
    blocking_trx_rows_locked: 1
  blocking_trx_rows_modified: 1
     sql_kill_blocking_query: KILL QUERY 441803
sql_kill_blocking_connection: KILL 441803
1 row in set (0.00 sec)
ERROR: 
No query specified

    上面看到輸出了很多的東西,看的我都蒙圈了。後來檢視mysql官方文件,慢慢的才發現,其實只關注上面的waiting_pid、waiting_query和blocking_pid、blocking_query四個引數即可;其中waiting_pid和blocking_pid兩個引數就是透過執行show processlist命令裡面輸出的執行緒id號,如下:

mysql> show full processlist \G;
*************************** 8. row ***************************
     Id: 441803
   User: iws
   Host: 172.16.21.7:46121
     db: iws
Command: Sleep
   Time: 655
  State: 
   Info: NULL
*************************** 9. row ***************************
     Id: 441805
   User: iws
   Host: 172.16.21.7:46122
     db: iws
Command: Query
   Time: 652
  State: updating
   Info: update busi_reconciliationgbgsinfo_inputdetails     set                bgs_id = 1622              ,         date = '2018-06-24 00:00:00'              ,         awbnumber = '006-85516771'              ,         incidental = 15.00              ,         entry_exit = 23.00              ,         warehousing_fee = 0.00              ,         loading_unloading = 0.00              ,         other = 0.00              ,         total = 38.00                     ,         state = 20              ,         comparison_resultsid = 30              ,         confirmation_method = '人工'              ,         confirmationid = 'root'              ,         confirmationtime = '2018-07-16 16:25:17'              ,         confirmation_note = '.'              ,         createtime = '2018-06-24 20:00:07'                     ,         createrid = '9862ebdbaf3249a88bcaa8f01bde0471'                        where id = 4510

    透過上面兩個的輸出結果,我們明白了,是441803執行緒鎖住了表,造成執行緒441805的等待。

    我們看到發生等待的執行緒441805對應的sql語句是:update busi_reconciliationgbgs ... where id = 4510,但是鎖表的執行緒441803對應的sql語句竟然是Null。這就更讓人迷惑了。

    於是我默默的翻開了ysql官方文件,原來裡面已經對這個null專門做了說明。

    官方文件說,要想找到這個null值對應的阻塞語句,可以透過下面幾個步驟尋找:

    a)、根據鎖表的processlist id 441803,運用如下sql,找到null對應的sql語句,如下:

SELECT  SQL_TEXT FROM performance_schema.events_statements_current WHERE THREAD_ID
in 
(SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID=441803)

    b)、如果上面找到的sql語句,你還是不能分析出為什麼他們會鎖表,一直拿著鎖不釋放,那麼你可以檢視 performance_schema.events_statements_history表裡面最近執行過的10條sql(假設上面查到的thread_id=28):

SELECT EVENT_ID,CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history WHERE THREAD_ID
in 
(SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID=441803) order by event_id

   其他:

    上面查詢鎖的sql可以只關注已下幾個列,如下:

SELECT
wait_started,
wait_age,
waiting_pid,
waiting_query,
blocking_trx_id,
blocking_pid,
blocking_query,
blocking_lock_mode,
sql_kill_blocking_query
FROM
sys.innodb_lock_waits

~~~~~~~~~~~~分割線~~~~~~~~~~~~~~~~~

    最近我用python 2.6寫了個自動殺鎖的指令碼,只要發現sys.innodb_lock_waits表裡面有鎖表的內容,就殺死相應的sql執行緒,並輸出殺死sql的內容到當前目錄下:

#!/usr/bin/env python
#-*-coding:utf8-*-
#下載rpm包安裝,下載地址:https://dev.mysql.com/downloads/connector/python/,注意mysql-connector-python版本需要是1.1.17的,2.x的版本執行會有問題
from __future__ import print_function
import mysql.connector as mdb
import os
#全域性變數
username = 'root'
password = ''
hostname = 'localhost'
database = 'sys'
#配置資訊
config = {
    'user': username,
    'password': password,
    'host': hostname,
    'database': database
}
#定義函式,檢視鎖表的行數
def Get_sys_lock():
    show_locked_num = "select count(*) from sys.innodb_lock_waits"
    cursor.execute(show_locked_num)
    for i in cursor:
        locked_sql_num = i[0]
    return locked_sql_num
    
#定義函式,如果有鎖表,就重定向到locked_sql.txt檔案裡面
def show_locked_sql():
    count = 0
    count1 = 0
    #如果日誌檔案存在就刪除
    if os.path.isfile('locked_sql.txt'):
        os.remove('locked_sql.txt')
    if os.path.isfile('null_sql.txt'):
        os.remove('null_sql.txt')
    if os.path.isfile('last_10_null_sql.txt'):
        os.remove('last_10_null_sql.txt')
    #引用函式
    locked_sql_num = Get_sys_lock()
    print("鎖表的行數是:{0}".format(locked_sql_num))
    if locked_sql_num > 0: #如果有鎖表
        show_locked_sql = " SELECT \
            wait_started, \
            wait_age, \
            waiting_pid, \
            waiting_query, \
            blocking_trx_id, \
            blocking_pid, \
            blocking_query, \
            blocking_lock_mode, \
            sql_kill_blocking_query \
            FROM \
            sys.innodb_lock_waits \
            "
        cursor.execute(show_locked_sql)
        for i in cursor:
            wait_started = i[0]
            wait_age = i[1]
            waiting_pid = i[2]
            waiting_query = i[3]
            blocking_trx_id = i[4]
            blocking_pid = i[5]
            blocking_query = i[6]
            blocking_lock_mode = i[7]
            sql_kill_blocking_query = i[8]
            if not str(blocking_query).strip(): #如果blocking_query字串為Null
                #import pdb;pdb.set_trace()
                show_null_sql = "SELECT  SQL_TEXT FROM performance_schema.events_statements_current WHERE THREAD_ID in (SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID=%s)" % blocking_pid
                conn = mdb.connect(**config)
                cursor1 = conn.cursor()
                cursor1.execute(show_null_sql)
                #print(cursor1.fetchall())
                for j in cursor1:
                    SQL_TEXT = j[0]
                    print(SQL_TEXT)
                cursor1.close
                try:
                    count1 += 1
                    f = open('null_sql.txt','a') #a表示追加
                    f.write (
                        '##########' + 'The ' + str(count1) + ' rows ' + 'Blocking null query對應的具體sql為##########\n' +
                        'blocking_pid: ' + str(blocking_pid) + '\n'
                        'sql_text: ' + str(SQL_TEXT) + '\n\n'
                    )
                except OSError as reason:
                    print('出錯了:' + str(reason))
                finally:
                    f.close
                #再檢視null對應的最後10條sql
                show_last_10_null_sql = "SELECT EVENT_ID,CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history WHERE THREAD_ID in (SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID=%s) order by event_id" % blocking_pid
                cursor2 = conn.cursor()
                cursor2.execute(show_last_10_null_sql)
                cursor2.close
                #print(cursor1.fetchall())
                count2 = 0
                for j in cursor2:
                    EVENT_ID = j[0]
                    CURRENT_SCHEMA = j[1]
                    SQL_TEXT = j[2]
                    try:
                        count2 += 1
                        f = open('last_10_null_sql.txt','a') #a表示追加
                        f.write (
                            '##########' + 'The ' + str(count2) + ' rows ' + 'laster blocking null query對應的具體sql為##########\n' +
                            'blocking_pid: ' + str(blocking_pid) + '\n'
                            'EVENT_ID: ' + str(EVENT_ID) + '\n'
                            'CURRENT_SCHEMA: ' + str(CURRENT_SCHEMA) + '\n'
                            'SQL_TEXT: ' + str(SQL_TEXT) + '\n\n'
                        )   
                    except OSError as reason:
                        print('出錯了:' + str(reason))
                    finally:
                        f.close
            #把鎖表的情況重定向到一個locked_sql.txt檔案裡面
            try:
                count += 1
                f = open('locked_sql.txt','a') #a表示追加
                f.write('##########' + 'The ' + str(count) + ' rows' + '###########\n')
                f.write (
                    'wait_started: ' + str(wait_started) + '\n' +
                    'wait_age: ' + str(wait_age) + '\n' + 
                    'waiting_pid: ' + str(waiting_pid )  + '\n' +
                    'waiting_query: ' + str(waiting_query) + '\n' +
                    'blocking_trx_id: ' + str(blocking_trx_id) + '\n' +
                    'blocking_pid: ' + str(blocking_pid) + '\n' +
                    'blocking_query: ' + str(blocking_query) + '\n' +
                    'blocking_lock_mode: ' + str(blocking_lock_mode)  + '\n' +
                    'sql_kill_blocking_query: ' + str(sql_kill_blocking_query) + '\n\n'
                )
                '''
                f.write (
                    '##########' + 'Blocking null query對應的具體sql為##########\n' +
                    'blocking_pid:' + str(blocking_pid) +
                    'sql_text:' + str(SQL_TEXT)
                    
                )
                '''
            except OSError as reason:
                print('出錯了:' + str(reason))
            finally:
                f.close
#定義函式,列出當前所有執行的sql執行緒
def show_processlist():
    count = 0
    #如果日誌檔案存在就刪除
    if os.path.isfile('show_processlist.txt'):
        os.remove('show_processlist.txt')
    #引用函式
    locked_sql_num = Get_sys_lock()
    #print("鎖表的行數是:{0}".format(locked_sql_num))
    if locked_sql_num > 0: #如果有鎖表
        show_processlist = "select \
            id, \
            user, \
            host, \
            db, \
            time, \
            state, \
            info \
            from information_schema.`PROCESSLIST`  order  by time desc \
            "
        cursor.execute(show_processlist)
        for i in cursor:
            id = i[0]
            user = i[1]
            host = i[2]
            db = i[3]
            time = i[4]
            state = i[5]
            info = i[6]
            #把鎖表的情況重定向到一個show_processlist.txt檔案裡面
            try:
                count += 1
                f = open('show_processlist.txt','a') #a表示追加
                f.write('##########' + 'The ' + str(count) + ' rows' + '###########\n')
                f.write (
                    'id: ' + str(id) + '\n' +
                    'user: ' + str(user) + '\n' + 
                    'host: ' + str(host)  + '\n' +
                    'db: ' + str(db) + '\n' +
                    'time: ' + str(time) + '\n' +
                    'state: ' + str(state) + '\n' +
                    'info: ' + str(info) + '\n\n' 
                )
            except OSError as reason:
                print('出錯了:' + str(reason))
            finally:
                f.close
#定義函式,如果有鎖表,就殺死
def kill_locked_sql():
    #引用函式
    locked_sql_num = Get_sys_lock()
    #print("鎖表的行數是:{0}".format(locked_sql_num))
    if locked_sql_num > 0: #如果有鎖表
        execute_locked_sql = " SELECT \
            sql_kill_blocking_query \
            FROM \
            sys.innodb_lock_waits \
            "
        cursor.execute(execute_locked_sql)
        for i in cursor:
            sql_kill_blocking_query = i[0]
            conn = mdb.connect(**config)
            cursor1 = conn.cursor()
            try:
                cursor1.execute(sql_kill_blocking_query)
            except:
                print('出錯了')
            cursor1.close
        
#主程式
conn = mdb.connect(**config)
cursor = conn.cursor()
show_locked_sql()
show_processlist()
kill_locked_sql()
cursor.close
conn.close



 --參考文件:

 https://dev.mysql.com/doc/refman/8.0/en/sys-innodb-lock-waits.html

 https://dev.mysql.com/doc/refman/8.0/en/innodb-information-schema-examples.html#innodb-information-schema-examples-null-blocking-query


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/28916011/viewspace-2158012/,如需轉載,請註明出處,否則將追究法律責任。

相關文章