一.專案說明
1.1 專案目的
1.盤活伺服器資源,提高資源的使用率;資源是公司的資產,只有儘可能發揮其價值,才能創造更多的價值。所以,有必要知道,公司整體(或某業務、產品)所屬的 DB Server的資源使用情況。主要從CPU、記憶體、Disk的平均數和中位數來反映。實現更合理的資源分配和集中性的管理,節省資源成本。
2.慢查詢的次數,既可以說明程式的效能和Server的壓力,說明了待確認和優化的情況,也說明了資源的緊張性。
3.此類歷史資料的積累,可以生成一個變化趨勢圖,說明資源使用趨勢。
4.之前的監控大部分診斷具體的一個DB Server或應用,這個是針對公司整體(或某業務、產品)所屬的 DB Server;是監控體系的一個完善和補充。
即:資源盤活、充分利用、降本增效、監控補充。
1.2 部署環境及架構
現有的監控資料已收集到InfluxDB 和 elasticsearch 中,本次要實現的功能是將資料計算聚合到MySQL中,然後通過郵件傳送給相關人員。儲存到MySQL 資料庫中,一是因為 此類資料有一定的價值(具有追溯性和便於歷史趨勢分析),二是 InfluxDB 、elasticsearch 資料都有過期時間,資料保留的天數不是太長。
二.表的建立
2.1 儲存DB資源使用情況的表
表名定義為weekly_dbperformance,具體的指令碼如下:
CREATE TABLE `weekly_dbperformance` ( `id` int(11) NOT NULL AUTO_INCREMENT, `cpu_mean` varchar(255) NOT NULL DEFAULT '', `cpu_median` varchar(255) NOT NULL DEFAULT '', `mem_mean` varchar(255) NOT NULL DEFAULT '', `mem_median` varchar(255) NOT NULL DEFAULT '', `disk_mean` varchar(255) NOT NULL DEFAULT '', `disk_median` varchar(255) NOT NULL DEFAULT '', `datetime_created` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '資料行建立時間', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=3740 DEFAULT CHARSET=utf8mb4;
在記錄資料生成的時間欄位新增個索引
create index idx_datetime on weekly_dbperformance (datetime_created);
2.2 儲存DB 例項慢查詢情況的表
表名定義為weekly_dbslowqty,具體的指令碼如下:
CREATE TABLE `weekly_dbslowqty` ( `id` int(11) NOT NULL AUTO_INCREMENT, `qindex_name` varchar(50) NOT NULL DEFAULT '', `qstartdate` varchar(50) NOT NULL DEFAULT '', `qenddate` varchar(50) NOT NULL DEFAULT '', `slowqty` varchar(20) NOT NULL DEFAULT '', `datetime_created` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '資料行建立時間', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=3740 DEFAULT CHARSET=utf8mb4;
在記錄查詢的開始時間的欄位上新增個索引
create index idx_qstartdate on weekly_dbslowqty (qstartdate);
三.主要功能程式碼
3.1 統計DB Server資源使用率
可執行檔案為collect_dbperformance.py
從InfluxDB中查詢DB Server的資源使用情況。包括CPU的平均數、CPU的中位數、記憶體使用的平均數、記憶體使用的中位數、磁碟平均使用率、磁碟使用的中位數。
拉取計算的是過去7天的資料。
#!/usr/bin/python # -*- coding: UTF-8 -*- from influxdb import InfluxDBClient import pytz import time import dateutil.parser import datetime import db_monitor_conn mysqldb = db_monitor_conn.db # use cursor cursor = mysqldb.cursor() class DBApi(object): """ 通過infludb獲取資料 """ def __init__(self, ip, port): """ 初始化資料 :param ip:influxdb地址 :param port: 埠 """ self.db_name = 'telegraf' self.use_cpu_table = 'cpu' # cpu使用率表 self.phy_mem_table = 'mem'# 實體記憶體表 self.disk_table = 'disk'# 磁碟表 self.client = InfluxDBClient(ip, port, '用*戶*名', '密*碼', self.db_name) # 連線influxdb資料庫 print ('test link influxdb') def get_use_dbperformance(self, s_time, e_time): """ 獲取磁碟io使用率 :param host: 查詢的主機host (telegraf 配置引數中的host欄位) :param s_time: 開始時間 :param e_time: 結束時間 :return: """ response = {} ### 時間還需轉換,否則報錯 TypeError: Tuple or struct_time argument required #s = time.strptime(s_time, '%Y-%m-%d %H:%M:%S') #e = time.strptime(e_time, '%Y-%m-%d %H:%M:%S') s = time.strptime(s_time, '%Y-%m-%d') e = time.strptime(e_time, '%Y-%m-%d') start_time = int(time.mktime(s)) * 1000 * 1000 * 1000 end_time = int(time.mktime(e)) * 1000 * 1000 * 1000 #start_time = s_time #end_time = e_time cpu_mean_list = cpu_median_list = mem_mean_list = mem_median_list = disk_mean_list = disk_median_list = ['0.0'] ##print('開始查詢CPU使用率的平均數') cpu_mean_list = self.client.query( "select mean(usage_user) from cpu where time>=%s and time<=%s and cpu = 'cpu-total' AND host != 'qqlog_XXX_XXX' ;" % ( start_time, end_time)) ##print(cpu_mean_list) ### cpu_mean_list的格式 ResultSet({'('cpu', None)': [{'time': '2018-06-21T16:00:00Z', 'mean': 1.7141865567279297}]}) cpu_mean_points = list(cpu_mean_list.get_points(measurement='cpu')) ##print(cpu_mean_points) ### cpu_mean_points的格式[{'time': '2018-06-21T16:00:00Z', 'mean': 1.7141865567279297}] cpu_mean = cpu_mean_points[0]['mean'] ##print(cpu_mean) ### cpu_mean 的格式1.7141865567279297 ##print('查詢CPU使用率的平均數結束') ##print('開始查詢CPU使用率的中位數') cpu_median_list = self.client.query( "SELECT median(usage_user) from cpu where time>=%s and time<=%s and cpu = 'cpu-total' AND host != 'qqlog_XXX_XXX';" % ( start_time, end_time)) ##print(cpu_median_list) #### cpu_median_list的格式為ResultSet({'('cpu', None)': [{'time': '2018-06-21T16:00:00Z', 'median': 0.726817042581142}]}) cpu_median_points = list(cpu_median_list.get_points(measurement='cpu')) cpu_median = cpu_median_points[0]['median'] ##print(cpu_median) ##print('開始查詢mem使用率的平均數') mem_mean_list = self.client.query( "SELECT mean(used) /mean(total) from mem where time>=%s and time<=%s and host != 'qqlog_XXX_XXX';" % ( start_time, end_time)) print(mem_mean_list) ### mem_mean_list的格式為ResultSet({'('mem', None)': [{'time': '2018-06-21T16:00:00Z', 'mean_mean': 0.729324184536873}]}) mem_mean_points = list(mem_mean_list.get_points(measurement='mem')) mem_mean = mem_mean_points[0]['mean_mean'] ##print(mem_mean) ##print('開始查詢mem使用率的中位數') mem_median_list = self.client.query( "SELECT median(used) /median(total) from mem where time>=%s and time<=%s AND host != 'qqlog_XXX_XXX' ;" % ( start_time, end_time)) ##print(mem_median_list) ###mem_median_list的格式為ResultSet({'('mem', None)': [{'time': '2018-06-21T16:00:00Z', 'median_median': 0.8698493636354012}]}) mem_median_points = list(mem_median_list.get_points(measurement='mem')) mem_median = mem_median_points[0]['median_median'] ##print('開始查詢disk使用率的平均數') disk_mean_list = self.client.query( "SELECT mean(used) /mean(total) from disk where time>=%s and time<=%s AND host != 'qqlog_XXX_XXX';" % ( start_time, end_time)) ##print (disk_mean_list) ###disk_mean_list的格式為esultSet({'('disk', None)': [{'time': '2018-06-21T16:00:00Z', 'mean_mean': 0.31204798557786284}]}) disk_mean_points = list(disk_mean_list.get_points(measurement='disk')) disk_mean = disk_mean_points[0]['mean_mean'] ##print(disk_mean) ##print('開始查詢disk使用率的中位數') disk_median_list = self.client.query( "SELECT median(used) /median(total) from disk where time>=%s and time<=%s and host != 'qqlog_XXX_XXX';" % ( start_time, end_time)) ##print (disk_median_list) ###disk_median_list的格式ResultSet({'('disk', None)': [{'time': '2018-06-21T16:00:00Z', 'median_median': 0.08009824336938143}]}) disk_median_points = list(disk_median_list.get_points(measurement='disk')) ##print(disk_median_points) disk_median = disk_median_points[0]['median_median'] ##print(disk_median) ### 將計算統計的結果放到MySQl中,以便彙總傳送Report sql_insert = "insert into weekly_dbperformance(cpu_mean,cpu_median,mem_mean,mem_median,disk_mean,disk_median) " \ "values('%s','%s','%s','%s','%s','%s')" % \ (cpu_mean,cpu_median,mem_mean,mem_median,disk_mean,disk_median) cursor.execute(sql_insert) mysqldb.commit() def change_time(self, params): """ 時間轉換 :param params: :return: """ item = dateutil.parser.parse(params).astimezone(pytz.timezone('Asia/Shanghai')) result = str(item).split("+")[0] response = time.strptime(result, '%Y-%m-%d %H:%M:%S') param = time.strftime('%Y-%m-%d %H:%M:%S', response) return param # 連線 influxdb # INFLUXDB_IP influxdb所在主機 # INFLUXDB_PROT influxdb埠 db = DBApi(ip='XXX.110.119.XXX', port='?????') ###查詢的時間範圍 ### TypeError: strptime() argument 0 must be str, not <class 'datetime.datetime'> ##e_time = datetime.datetime.now() e_time = datetime.datetime.now().strftime('%Y-%m-%d') ##s_time = e_time + datetime.timedelta(-7) s_time = (datetime.datetime.now() + datetime.timedelta(-7)).strftime('%Y-%m-%d') print('列印查詢範圍----時間引數如下:') print(e_time) print(s_time) db.get_use_dbperformance(s_time,e_time) #print(disk_points)
注意:此份程式碼的執行環境是Python 3.6.8;此外還要注意下influxdb的query返回值的處理;可執行檔案可以通過crontab設定定時任務,週期性抓取資料。
3.2 統計DB例項的慢查詢
可執行檔案為count_dbslow.py
從elasticsearch中讀取慢查詢的資料,主要是統計符合條件的個數。
需要說明的是某產品線下的資料庫慢查詢放到Index命名一樣。本例中mysql-slow-qqorders-*,是查詢mysql-slow-qqorders-開通的所有慢查詢的個數,qqorders是具體的產品線程式碼,*是日期的模糊匹配。
#coding:utf8 import os import time from datetime import date ### 匯入模組 timedelta ,否則date.today()+ timedelta(days = -2) 報錯: AttributeError: 'datetime.date' object has no attribute 'timedelta' from datetime import timedelta from os import walk ###匯入模組的from datetime import datetime改成import datetime;否則在day = datetime.datetime.now()報錯:AttributeError: type object 'datetime.datetime' has no attribute 'datetime' ##from datetime import datetime import datetime from elasticsearch import Elasticsearch from elasticsearch.helpers import bulk import db_monitor_conn mysqldb = db_monitor_conn.db # use cursor cursor = mysqldb.cursor() ###資料收集前,清除之前收集的資料 ##sql_delete = "delete from weekly_dbslowqty " ##cursor.execute(sql_delete) ##mysqldb.commit() class ElasticObj: def __init__(self, index_name,index_type,ip ="ES例項所在的ServerIP"): ''' :param index_name: 索引名稱 :param index_type: 索引型別,預設為_doc ''' self.index_name =index_name self.index_type = index_type # 無使用者名稱密碼狀態 #self.es = Elasticsearch([ip]) #使用者名稱密碼狀態 self.es = Elasticsearch([ip],http_auth=('ES用*戶*名', 'ES用*戶*密*碼'),port=ES埠號) #### 獲取資料量 def Get_SlowQty_By_Indexname(self,dstart,dend): doc = { "query": { "bool": { "must": [ {"exists":{"field": "query_time"}}, {"range":{ "@timestamp": { "gte": dstart.strftime('%Y-%m-%d %H:%M:%S'), "lte": dend.strftime('%Y-%m-%d %H:%M:%S'), "format": "yyyy-MM-dd HH:mm:SS", "time_zone": "+08:00" } }} ], "must_not": [ ## 排除不符合條件的server,例如 排除 XXX.XXX.XXX.XXX {"term": {"fields.db_host": "XXX.110.119.XXX"}} ] } } } _slowqty = self.es.count(index=self.index_name, doc_type=self.index_type, body=doc) print(_slowqty) #### _slowqty 的返回格式是字典型別,如下{'count': 2374, '_shards': {'total': 16, 'successful': 16, 'skipped': 0, 'failed': 0}} slowqty = _slowqty['count'] print(slowqty) #### 將資料儲存到mysql中,以便傳送報表 sql_insert = "insert into weekly_dbslowqty(qindex_name,qstartdate,qenddate,slowqty) " \ "values('%s','%s','%s','%s')" % \ (self.index_name,dstart,dend,slowqty) cursor.execute(sql_insert) mysqldb.commit() obj =ElasticObj("mysql-slow-qqorders-*","_doc",ip ="ES 所在機器的 ServerIP") ###時間引數 ##day = datetime.datetime.now() ##start = datetime.datetime.strptime('20180628 00:00:00','%Y%m%d %H:%M:%S') ##end = datetime.datetime.strptime('20180629 00:00:00','%Y%m%d %H:%M:%S') ##dstart = (datetime.datetime.now() + datetime.timedelta(-2)) ##dend = (datetime.datetime.now() + datetime.timedelta(-1)) today = date.today() dstart = (date.today()+ timedelta(days = -2)).strftime('%Y-%m-%d') dend = (date.today()+ timedelta(days = -1)).strftime('%Y-%m-%d') ####print(dstart) ####print(dend) ###新增.strftime('%Y-%m-%d'),,否則報錯TypeError: strptime() argument 1 must be str, not datetime.date dstart = datetime.datetime.strptime(dstart,'%Y-%m-%d') dend = datetime.datetime.strptime(dend,'%Y-%m-%d') print(dstart) print(dend) obj.Get_SlowQty_By_Indexname(dstart,dend)
注意:此份程式碼的執行環境也是Python 3.6.8
3.3 傳送Server資源效能週報
可執行檔案為dbperformance_report_weekly.py
#!/usr/bin/python # -*- coding: UTF-8 -*- import sys reload(sys) sys.setdefaultencoding( "utf-8" ) import db_monitor_conn import os import time import smtp_config_dbperformance from email.mime.text import MIMEText from email.header import Header def send_mail(mail_msg): # 呼叫send_mail函式 mail_body = """ <style type="text/css"> table.gridtable { font-family: verdana,arial,sans-serif; font-size:11px; color:#333333; border-width: 1px; border-color: #666666; border-collapse: collapse; } table.gridtable th { border-width: 1px; padding: 8px; border-style: solid; border-color: #666666; background-color: #dedede; } table.gridtable td { border-width: 1px; padding: 8px; border-style: solid; border-color: #666666; background-color: #ffffff; } </style> <!-- Table goes in the document BODY --> <table class="gridtable"> <tr> <th>CPU平均數</th><th>CPU中位資料</th><th>記憶體平均數</th><th>記憶體中位資料</th> <th>Disk平均數</th><th>Disk中位數</th><th>統計時間</th> </tr> """ mail_body = mail_body + mail_msg + "</table>" message = MIMEText(mail_body, 'html', 'utf-8') subject = 'DB伺服器效能週報[資源效能]' message['Subject'] = Header(subject, 'utf-8') smtp_config_dbperformance.send_mail(message) return #定義郵件體變數 mail_msg = "" # 獲取資料庫連線 db = db_monitor_conn.db print(db) # 使用cursor()方法獲取操作遊標 cursor = db.cursor() # SQL 查詢語句 # 備份日報 sql_dbper_report = " select ROUND(cpu_mean,2) as cpu_mean,ROUND(cpu_median,2) as cpu_median ,ROUND(mem_mean *100 ,2)as mem_mean , " \ " ROUND(mem_median *100,2) as mem_median ,ROUND(disk_mean * 100,2) as disk_mean,ROUND(disk_median *100,2) as disk_median,date_format(datetime_created, '%Y-%m-%d') as datetime_created " \ " FROM weekly_dbperformance " \ " where 1=1" \ " order by datetime_created limit 1 " try: # 執行SQL語句 cursor.execute(sql_dbper_report) # 獲取所有記錄列表 results = cursor.fetchall() for row in results: cpu_mean = str(row[0]) cpu_median = str(row[1]) mem_mean = str(row[2]) mem_median = str(row[3]) disk_mean = str(row[4]) disk_median = str(row[5]) rdatetime = str(row[6]) # 生成郵件內容 注意郵件列數和引數的個數一直(<type 'exceptions.Exception'> not all arguments converted during string formatting) mail_msg_single = """ <tr> <td align="center">%s</td><td>%s</td><td align="right">%s</td> <td>%s</td><td align="right">%s</td><td align="right">%s</td> <td align="right">%s</td> </tr> """ % \ (cpu_mean, cpu_median, mem_mean, mem_median, disk_mean, disk_median, rdatetime) mail_msg = mail_msg + mail_msg_single # 傳送郵件 send_mail(mail_msg) except Exception as e: print str(Exception) print str(e) # 關閉遊標 cursor.close() # 關閉資料庫連線 db.close()
注意:此份程式碼的執行環境是Python 2.7.5
2.4 傳送DB 慢查詢週報
可執行檔案為dbslowlog_report_weekly.py
#!/usr/bin/python # -*- coding: UTF-8 -*- import sys reload(sys) sys.setdefaultencoding( "utf-8" ) import db_monitor_conn import os import time import smtp_config_dbperformance from email.mime.text import MIMEText from email.header import Header def send_mail(mail_msg): # 呼叫send_mail函式 mail_body = """ <style type="text/css"> table.gridtable { font-family: verdana,arial,sans-serif; font-size:11px; color:#333333; border-width: 1px; border-color: #666666; border-collapse: collapse; } table.gridtable th { border-width: 1px; padding: 8px; border-style: solid; border-color: #666666; background-color: #dedede; } table.gridtable td { border-width: 1px; padding: 8px; border-style: solid; border-color: #666666; background-color: #ffffff; } </style> <!-- Table goes in the document BODY --> <table class="gridtable"> <tr> <th>統計時間開始引數</th><th>時間結束引數</th><th>DB慢查詢個數</th> </tr> """ mail_body = mail_body + mail_msg + "</table>" message = MIMEText(mail_body, 'html', 'utf-8') subject = 'DB伺服器效能週報[DB慢查詢]' message['Subject'] = Header(subject, 'utf-8') smtp_config_dbperformance.send_mail(message) return #定義郵件體變數 mail_msg = "" # 獲取資料庫連線 db = db_monitor_conn.db print(db) # 使用cursor()方法獲取操作遊標 cursor = db.cursor() # SQL 查詢語句 # 備份日報 sql_dbslow_report = " select distinct qstartdate,qenddate,slowqty " \ " FROM weekly_dbslowqty " \ " where qindex_name ='mysql-slow-qqorders-*' and qstartdate >= date_sub(curdate(),interval 8 day) and qstartdate < date_sub(curdate(),interval 1 day) " \ " order by datetime_created asc " try: # 執行SQL語句 cursor.execute(sql_dbslow_report) # 獲取所有記錄列表 results = cursor.fetchall() for row in results: qstartdate = str(row[0]) qenddate = str(row[1]) slowqty = str(row[2]) # 生成郵件內容 注意郵件列數和引數的個數一直(<type 'exceptions.Exception'> not all arguments converted during string formatting) mail_msg_single = """ <tr> <td align="center">%s</td><td align="right">%s</td> <td align="right">%s</td> </tr> """ % \ (qstartdate, qenddate, slowqty) mail_msg = mail_msg + mail_msg_single # 傳送郵件 send_mail(mail_msg) except Exception as e: print str(Exception) print str(e) # 關閉遊標 cursor.close() # 關閉資料庫連線 db.close()
注意:此份程式碼的執行環境也是Python 2.7.5
3.5 其他模組
mysql的連線模組:db_monitor_conn
相應的程式碼可在《通過Python將監控資料由influxdb寫入到MySQL》一文中檢視,參閱db_conn.py的編寫,在此不再贅述。
簡訊傳送的模組:smtp_config_dbperformance
請參閱前面的分享《MySQL資料歸檔小工具推薦及優化--mysql_archiver》,github地址:https://github.com/dbarun/mysql_archiver 下載的程式碼,有傳送郵件的模組smtp_config.py,在此不再贅述。
四 實現
4.1 DBServer資源報告示樣
下圖是通過郵件的形式傳送某業務線下面DB Server資源使用率的郵件。
4.2 慢查詢報告示樣
下圖是通過郵件的形式傳送某業務線下面所有DB 例項的一週的SQL慢查詢的個數。
這是個簡單的Demo,專案規劃是隨著DB資源的監控指標清晰、完善,資料豐富,整合到一個Dashboard上。
五 題外話--DAS
阿里雲的DAS(Database Autonomy Service)是一種基於機器學習和專家經驗實現資料庫自感知、自修復、自優化、自運維及自安全的雲服務,幫助使用者消除資料庫管理的複雜性及人工操作引發的服務故障,有效保障資料庫服務的穩定、安全及高效,解決方案架構 如下圖。
個人認為, DAS 要實現的目標(自感知、自修復、自優化、自運維及自安全)是我們DBA的努力的方向。