之前寫過一篇文章MySQL如何獲取binlog的開始時間和結束時間[1],文章裡面介紹瞭如何獲取MySQL資料庫二進位制日誌(binlog)的開始時間與結束時間的一些方法。實際應用當中,我們可能還會遇到效率/效能方面的問題。最近對這個問題做了一些研究,這裡就介紹一下如何快速獲取MySQL二進位制日誌(binlog)的開始時間和結束時間。
我們下來看看當MySQL二進位制日誌(binlog)的Size很大的時候,獲取起開始時間和結束時間,如下測試所示
$ du -sh mysql_binlog.000105
1.1G mysql_binlog.000105
$ time mysqlbinlog mysql_binlog.000105 |grep "Start: binlog" | awk -F "server id" '{print $1}'
#240425 9:20:26
real 0m34.136s
user 0m25.941s
sys 0m11.985s
從上面實驗可以看出,在MySQL二進位制日誌(binlog)變大的情況下,這種方法需要34秒,非常低效和耗時,那麼我們怎麼提升效能呢? 我們改寫一下指令碼,如下所示
$ time mysqlbinlog mysql_binlog.000105 | head -10 | grep "Start: binlog" | awk -F "server id" '{print $1}'
#240425 9:20:26
real 0m0.010s
user 0m0.006s
sys 0m0.005s
如上所示,這樣改進指令碼後,效能效率已經提升到0.01秒,已經相當的高效了。那麼獲取結束時間能否也可以這樣提升呢? 很遺憾的是由於MySQL二進位制日誌(binlog)的結束時間/滾動時間(Rotate Time)位於檔案的末尾,由於管道的一些基本特性,獲取MySQL二進位制日誌(binlog)的結束時間無法透過上面方法來最佳化,這裡不打算介紹Linux管道相關概念,所以我們只需知道這麼一個事實。
如果你對Linux管道的一些原理不是很清楚,那麼就用實驗測試驗證一下,如下所示:
$ time mysqlbinlog mysql_binlog.000105 |grep Rotate | awk -F "server id" '{print $1}'
#240426 11:11:37
real 0m34.223s
user 0m27.202s
sys 0m11.551s
$ time mysqlbinlog mysql_binlog.000105 | tail -10 | grep Rotate | awk -F "server id" '{print $1}'
#240426 11:11:37
real 0m33.917s
user 0m25.528s
sys 0m11.395s
那麼怎麼來最佳化獲取MySQL二進位制日誌(binlog)的結束時間呢?經過一番觀察與實驗,我發現一個MySQL二進位制日誌(binlog)的結束時間,就是下一個二進位制日誌(binlog)的開始時間。如下實驗所示
[mysql@dbtest04 bin_logs]$ ls -lrt
total 28
-rw-r----- 1 mysql mysql 207 May 9 15:25 mysql_binlog.000055
-rw-r----- 1 mysql mysql 207 May 9 15:27 mysql_binlog.000056
-rw-r----- 1 mysql mysql 207 May 10 11:02 mysql_binlog.000057
-rw-r----- 1 mysql mysql 207 May 10 11:34 mysql_binlog.000058
-rw-r----- 1 mysql mysql 207 May 10 11:38 mysql_binlog.000059
-rw-r----- 1 mysql mysql 157 May 10 11:38 mysql_binlog.000060
-rw-r----- 1 mysql mysql 246 May 10 11:38 mysql_binlog.index
[mysql@dbtest04 bin_logs]$ mysqlbinlog mysql_binlog.000055 | head -10 | grep "Start: binlog" | awk -F "server id" '{print $1}'
#240509 14:48:10
[mysql@dbtest04 bin_logs]$ mysqlbinlog mysql_binlog.000055 |grep Rotate | awk -F "server id" '{print $1}'
#240509 15:25:57
[mysql@dbtest04 bin_logs]$ mysqlbinlog mysql_binlog.000056 | head -10 | grep "Start: binlog" | awk -F "server id" '{print $1}'
#240509 15:25:57
[mysql@dbtest04 bin_logs]$ mysqlbinlog mysql_binlog.000056 |grep Rotate | awk -F "server id" '{print $1}'
#240509 15:27:37
[mysql@dbtest04 bin_logs]$ mysqlbinlog mysql_binlog.000057 | head -10 | grep "Start: binlog" | awk -F "server id" '{print $1}'
#240509 15:27:37
[mysql@dbtest04 bin_logs]$ mysqlbinlog mysql_binlog.000057 |grep Rotate | awk -F "server id" '{print $1}'
#240510 11:02:00
[mysql@dbtest04 bin_logs]$
如果全部符合這個規律的話,那麼我們直接用下一個binlog的開始時間作為上一個binlog的結束時間即可,於是我寫了一個指令碼find_binlog_start_end_time.sh:
#!/bin/bash
#########################################################################################
# #
# This script is used for get the binlog start time and end time #
# #
#########################################################################################
# #
# ScriptName : find_binlog_start_end_time.sh #
# Author : Kerry #
# CreateDate : 2024-05-10 #
# Email : kerry2008code@qq.com #
#***************************************************************************************#
# 引數配置 #
#---------------------------------------------------------------------------------------#
# 指令碼引數 binlog檔案存放的路徑 #
#---------------------------------------------------------------------------------------#
# MYSQLBINLOG mysqlbinlog的位置,以防沒有設定環境變數 #
# BINLOG_BASENAME binlog的字首名 #
#---------------------------------------------------------------------------------------#
# 注意事項: #
# 1:如果維護的MySQL資料庫都規範化安裝、配置的化,下面很多引數都不需要修改 #
#***************************************************************************************#
# Version Modified Date Description #
#***************************************************************************************#
# V.1.0 2024-05-10 建立此指令碼 #
#########################################################################################
# mysqlbinlog的路徑,一般無需設定,以防沒有設定環境變數時
MYSQLBINLOG="/opt/mysql/mysql8.0/bin/mysqlbinlog"
BINLOG_BASENAME="mysql_binlog"
if [ $# = 0 ]
then
echo "find_binlog_start_end_time.sh Usage:"
echo "for eg: find_binlog_start_end_time.sh /data/mysql/binlogs"
exit
fi
BINLOG_FILE_PATH=$1
if [ ! -d $BINLOG_FILE_PATH ];then
echo "the folder $BINLOG_FILE_PATH does not exist, please check it!"
exit 1
fi
index=1
start_time=""
end_time=""
last_binlog_name=""
BINLOG_FILE_NUM=`ls -lrt $BINLOG_FILE_PATH | grep $BINLOG_BASENAME |grep -v $BINLOG_BASENAME.index | wc -l`
if [ $BINLOG_FILE_NUM -lt 1 ];then
echo "pelase check the binlog or the parameter of this script"
exit 1;
fi
cd $BINLOG_FILE_PATH
for binlog_file in `ls -rt | grep $BINLOG_BASENAME |grep -v $BINLOG_BASENAME.index`;
do
if [ $index -eq 1 ];then
start_time=`$MYSQLBINLOG $binlog_file | head -10 | grep "Start: binlog" | awk -F "server id" '{print $1}'`
last_binlog_name=$binlog_file
else
end_time=`$MYSQLBINLOG $binlog_file | head -10 | grep "Start: binlog" | awk -F "server id" '{print $1}'`
echo "file name:$last_binlog_name" , "start time:$start_time", "end time:$end_time"
if [ $index -eq $BINLOG_FILE_NUM ];then
last_end_time=`$MYSQLBINLOG $binlog_file |tail -10 | egrep "Rotate|Stop" | awk -F "server id" '{print $1}'`
echo "file name:$binlog_file" , "start time:$end_time", "end time:$last_end_time"
else
start_time=$end_time
last_binlog_name=$binlog_file
fi
fi
let index++
done
如下測試所示
[mysql@dbtest04 kerry]$ sh find_binlog_start_end_time.sh /data/mysql/bin_logs/
file name:mysql_binlog.000055 , start time:#240509 14:48:10 , end time:#240509 15:25:57
file name:mysql_binlog.000056 , start time:#240509 15:25:57 , end time:#240509 15:27:37
file name:mysql_binlog.000057 , start time:#240509 15:27:37 , end time:#240510 11:02:00
file name:mysql_binlog.000058 , start time:#240510 11:02:00 , end time:#240510 11:34:01
file name:mysql_binlog.000059 , start time:#240510 11:34:01 , end time:#240510 11:38:24
file name:mysql_binlog.000060 , start time:#240510 11:38:24 , end time:#240510 16:45:34
file name:mysql_binlog.000061 , start time:#240510 16:45:34 , end time:
另外,還有一種比較高效的方法是解析二進位制日誌的頭部資訊(此篇文章統統指binlog v 4),因為binlog的頭部由固定的4個位元組組成,而頭部資訊的FORMAT_DESCRIPTION_EVENT部分包含了binlog的開始時間,我在搜尋/學習相關資料時,結果發現有人已經總結過這方面的內容,而且已經有相關Python指令碼或shell指令碼了,這裡就重複造輪子了,Python指令碼來自MySQL 查詢binlog生成時間[2]我們來看看實驗結果,如下所示
$ python3 check_bintime.py /data/mysql/bin_logs/mysql_binlog.index
{'file_name': 'mysql_binlog.000055', 'binlog_size': '207.0 B', 'start_time': '2024-05-09 14:48:10', 'end_time': '2024-05-09 15:25:57'}
{'file_name': 'mysql_binlog.000056', 'binlog_size': '207.0 B', 'start_time': '2024-05-09 15:25:57', 'end_time': '2024-05-09 15:27:37'}
{'file_name': 'mysql_binlog.000057', 'binlog_size': '207.0 B', 'start_time': '2024-05-09 15:27:37', 'end_time': '2024-05-10 11:02:00'}
{'file_name': 'mysql_binlog.000058', 'binlog_size': '207.0 B', 'start_time': '2024-05-10 11:02:00', 'end_time': '2024-05-10 11:34:01'}
{'file_name': 'mysql_binlog.000059', 'binlog_size': '207.0 B', 'start_time': '2024-05-10 11:34:01', 'end_time': '2024-05-10 11:38:24'}
{'file_name': 'mysql_binlog.000060', 'binlog_size': '180.0 B', 'start_time': '2024-05-10 11:38:24', 'end_time': '2024-05-10 16:45:34'}
{'file_name': 'mysql_binlog.000061', 'binlog_size': '157.0 B', 'start_time': '2024-05-10 16:45:34', 'end_time': 'now'}
shell指令碼來自一種快速取得binlog開始時間的方法[3]:
[mysql@dbtest04 kerry]$ function getBinlogStartTime()
> {
> theFile="$1"
> #取出檔案頭做分析
> binlogHead=`hexdump ${theFile} | head -1`
> #binlog檔案校驗
> binlogCrc=`echo $binlogHead | awk '{print $1$2$3}'`
> if [ "${binlogCrc}" != '000000062fe6e69' ]; then
> echo '${theFile} is damage.'
> exit 1
> fi
> #計算binlog的開始時間
> binlogBeginTimeInt=`echo $binlogHead | awk '{print $5$4}' | gawk ' { printf strtonum("0x" $0)}' `
> binlogBeginTime=`date -d "1970-01-01 UTC $binlogBeginTimeInt seconds" "+%F %T"`
> echo $binlogBeginTime
> }
[mysql@dbtest04 kerry]$ getBinlogStartTime /data/mysql/bin_logs/mysql_binlog.000055
2024-05-09 14:48:10
不過你看其指令碼也會發現,解析MySQL的binlog的頭部檔案,比較容易獲取binlog的開始時間,而獲取binlog的結束時間/輪轉時間就比較麻煩了。所以python指令碼中獲取binlog結束時間的思路跟我的思路也是一樣的。
總結:
這裡介紹了兩種快速獲取binlog的開始時間和結束時間的兩種方法,這兩種方法都非常高效,至於我寫的指令碼find_binlog_start_end_time.sh目前還比較粗糙,後面有時間再完善補充。
參考資料
1: https://www.cnblogs.com/kerrycode/p/17377899.html
[2]2: https://blog.csdn.net/qq_42768234/article/details/126970988
[3]3: https://blog.csdn.net/shaochenshuo/article/details/120549377