故障分析 | MySQL 相同 SQL 不同環境執行時間不一樣案例分析

愛可生雲資料庫發表於2023-02-16

作者:付祥

現居珠海,主要負責 Oracle、MySQL、mongoDB 和 Redis 維護工作。

本文來源:原創投稿

*愛可生開源社群出品,原創內容未經授權不得隨意使用,轉載請聯絡小編並註明來源。


1、問題現象

開發反饋同一條SQL在qa環境執行需要0.1s,而在dev環境需要0.3~0.5s,SQL如下:

SELECT machine.id,
       machine.asset_number,
       machine.sn,
       machine.state,
       machine.idc_id,
       machine.cabinet_id,
       machine.cabinet_order,
       machine.unit_size,
       machine.brand_model,
       machine.buy_time,
       machine.expiration_time,
       machine.warranty,
       machine.renewstart_time,
       machine.renewend_time,
       machine.warranty_company_id,
       machine.renewal_type,
       machine.check_hardware,
       machine.machine_purchase_price,
       machine.tags,
       machine.memo,
       machine.cpu_core_count,
       machine.cpu_model,
       machine.cpu_count,
       machine.memory_count,
       machine.memory_size,
       machine.wire_standard,
       machine.disk_num,
       machine.netcard_total_count,
       machine.netcard_1g,
       machine.netcard_10g,
       machine.os_version,
       machine.kernel_version,
       machine.raid,
       machine.power,
       machine.firmware,
       machine.manage_card_ip,
       machine.hostname,
       machine.private_mac,
       machine.public_mac,
       machine.private_ip,
       machine.public_ip,
       machine.other_ips,
       machine.create_time,
       machine.update_time,
       machine.creator,
       machine.updater,
       machine.delete_flag,
       machine.disk_desc_id,
       res.id res_id,
       res.owner_company_code,
       res.owner_company_name,
       res.project_id,
       res.project_group_id,
       res.sub_project_id,
       res.finance_product_id,
       res.finance_product_name,
       res.sub_project_name,
       res.admin_id,
       res.admin_name,
       res.owner_id,
       res.owner_name,
       2 AS resource_type,
       res.resource_id,
       res.machine_usage_types,
       res.machine_usage_names,
       cdl1.display AS check_hardware_name,
       cdl2.display AS state_name,
       cdl3.display AS brand_model_name,
       cdl4.display AS renewal_type_name,
       cdl5.display AS power_name,
       cdl6.display AS unit_size_name,
       cec.company_name AS warranty_company_name,
       cc.serial_number AS cabinet_name,
       ci.name AS idc_name,
       dd.disk_desc AS disk_desc_name,
       machine.virtual_ip,
       machine.qingteng_binded,
       machine.qingteng_id,
       machine.remark
  FROM CMDB_PHYSICAL_MACHINE machine
  LEFT JOIN cmdb_dropdown_list cdl1
    ON (machine.check_hardware=cdl1.code and cdl1.type="HardwareCheck")
  LEFT JOIN cmdb_dropdown_list cdl2
    ON (machine.state=cdl2.code and cdl2.type="DeviceStatus")
  LEFT JOIN cmdb_dropdown_list cdl3
    ON (machine.brand_model=cdl3.code and cdl3.type="BrandModels")
  LEFT JOIN cmdb_dropdown_list cdl4
    ON (machine.renewal_type=cdl4.code and cdl4.type="RenewalType")
  LEFT JOIN cmdb_dropdown_list cdl5
    ON (machine.power=cdl5.code and cdl5.type="PowerInfo")
  LEFT JOIN cmdb_dropdown_list cdl6
    ON (machine.unit_size=cdl6.code and cdl6.type="UnitSize")
  LEFT JOIN cmdb_external_company cec
    ON (machine.warranty_company_id=cec.id)
  LEFT JOIN cmdb_cabinet cc
    ON (machine.cabinet_id=cc.id)
  LEFT JOIN cmdb_disk_desc dd
    ON (machine.disk_desc_id=dd.id)
 inner JOIN cmdb_idc ci
    ON (machine.idc_id=ci.id and ci.delete_flag=0)
  left join cmdb_resource_group res
    on (machine.id = res.resource_id and res.resource_type = 2)
 where 1=1
   AND machine.delete_flag=0
 order by id desc
 LIMIT 0,30

2、分析

檢視SQL執行計劃,發現2個環境執行計劃不一樣,導致執行效率不同。

qa環境SQL執行計劃:

+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+-------------+
| id | select_type | table   | partitions | type   | possible_keys               | key           | key_len | ref                              | rows | filtered | Extra       |
+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+-------------+
|  1 | SIMPLE      | machine | NULL       | index  | NULL                        | PRIMARY       | 4       | NULL                             |    1 |    10.00 | Using where |
|  1 | SIMPLE      | cdl1    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    3 |   100.00 | Using where |
|  1 | SIMPLE      | cdl2    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    9 |   100.00 | Using where |
|  1 | SIMPLE      | ci      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.idc_id              |    1 |    10.00 | Using where |
|  1 | SIMPLE      | res     | NULL       | eq_ref | resource_id,idx_resource_id | resource_id   | 5       | omms.machine.id,const            |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | cdl3    | NULL       | ref    | idx_type_code               | idx_type_code | 124     | const,omms.machine.brand_model   |    1 |   100.00 | Using where |
|  1 | SIMPLE      | cdl4    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    3 |   100.00 | Using where |
|  1 | SIMPLE      | cdl5    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    2 |   100.00 | Using where |
|  1 | SIMPLE      | cdl6    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |   10 |   100.00 | Using where |
|  1 | SIMPLE      | cec     | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.warranty_company_id |    1 |   100.00 | Using where |
|  1 | SIMPLE      | cc      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.cabinet_id          |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | dd      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.disk_desc_id        |    1 |   100.00 | NULL        |
+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+-------------+
12 rows in set, 1 warning (0.01 sec)

dev環境SQL執行計劃:

+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+----------------------------------------------------+
| id | select_type | table   | partitions | type   | possible_keys               | key           | key_len | ref                              | rows | filtered | Extra                                              |
+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | ci      | NULL       | ALL    | PRIMARY                     | NULL          | NULL    | NULL                             |    8 |    12.50 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | machine | NULL       | ALL    | NULL                        | NULL          | NULL    | NULL                             | 1976 |     1.00 | Using where; Using join buffer (Block Nested Loop) |
|  1 | SIMPLE      | res     | NULL       | eq_ref | resource_id,idx_resource_id | resource_id   | 5       | omms.machine.id,const            |    1 |   100.00 | NULL                                               |
|  1 | SIMPLE      | cdl1    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    3 |   100.00 | Using where                                        |
|  1 | SIMPLE      | cdl2    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    9 |   100.00 | Using where                                        |
|  1 | SIMPLE      | cdl3    | NULL       | ref    | idx_type_code               | idx_type_code | 124     | const,omms.machine.brand_model   |    1 |   100.00 | Using where                                        |
|  1 | SIMPLE      | cdl4    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    3 |   100.00 | Using where                                        |
|  1 | SIMPLE      | cdl5    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    2 |   100.00 | Using where                                        |
|  1 | SIMPLE      | cdl6    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |   10 |   100.00 | Using where                                        |
|  1 | SIMPLE      | cec     | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.warranty_company_id |    1 |   100.00 | Using where                                        |
|  1 | SIMPLE      | cc      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.cabinet_id          |    1 |   100.00 | NULL                                               |
|  1 | SIMPLE      | dd      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.disk_desc_id        |    1 |   100.00 | NULL                                               |
+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+----------------------------------------------------+

其中,qa環境選擇machine作為驅動表,ci作為被驅動表,ci.id有主鍵索引,故表關聯採用Index Nested Loop 演演算法,並利用主鍵索引有序性避免了排序,這裡驅動表machine基數為1,實際上應該為30,而dev環境選擇ci作為驅動表,machine是被驅動表,由於machine.idc_id列無索引,故表關聯採用Block Nested Loop演演算法,且需要排序,導致了SQL執行效率不一樣。

為何相同SQL不同環境執行計劃不一樣,帶著這個疑問做了如下操作:

2.1、檢查表、索引、資料分佈

結果:基本一致

2.2、重新收集統計資訊

     結果:重新收集了dev環境表machine、ci統計資訊,還是同樣執行計劃。

2.3、資料庫版本

結果:qa環境為5.7.34,dev環境為5.7.25,會不會因為版本差異,檢視了引數optimizer_switch,發現5.7.34多了一個選項:prefer_ordering_index=on,官方檔案解釋如下:

Controls whether, in the case of a query having an ORDER BY or GROUP BY with a LIMIT clause, the optimizer tries to use an ordered index instead of an unordered index, a filesort, or some other optimization. This optimzation is performed by default whenever the optimizer determines that using it would allow for faster execution of the query.

Because the algorithm that makes this determination cannot handle every conceivable case (due in part to the assumption that the distribution of data is always more or less uniform), there are cases in which this optimization may not be desirable. Prior to MySQL 5.7.33, it ws not possible to disable this optimization, but in MySQL 5.7.33 and later, while it remains the default behavior, it can be disabled by setting the prefer_ordering_index flag to off.

當引數prefer_ordering_index為on,order by 帶有limit時,最佳化器傾向於使用索引去避免排序,5.7.33以前預設就是開啟的,5.7.33以後可以關閉。似乎也排除了版本差異,但心有不甘,抱著試試看態度把dev環境升級到了5.7.34,果然和版本差異無關,還是同樣執行計劃。

2.4、STRAIGHT_JOIN人工幹預執行計劃

透過STRAIGHT_JOIN提示選擇machine作為驅動表,利用其主鍵索引有序性避免排序

SELECT ......省略輸出......
 STRAIGHT_JOIN cmdb_idc ci
    ON (machine.idc_id=ci.id and ci.delete_flag=0)
  left join cmdb_resource_group res
    on (machine.id = res.resource_id and res.resource_type = 2)
 where 1=1
   AND machine.delete_flag=0
 order by id desc
 LIMIT 0,30


+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+-------------+
| id | select_type | table   | partitions | type   | possible_keys               | key           | key_len | ref                              | rows | filtered | Extra       |
+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+-------------+
|  1 | SIMPLE      | machine | NULL       | index  | NULL                        | PRIMARY       | 4       | NULL                             |    1 |    10.00 | Using where |
|  1 | SIMPLE      | res     | NULL       | eq_ref | resource_id,idx_resource_id | resource_id   | 5       | omms.machine.id,const            |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | cdl1    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    3 |   100.00 | Using where |
|  1 | SIMPLE      | cdl2    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    9 |   100.00 | Using where |
|  1 | SIMPLE      | cdl3    | NULL       | ref    | idx_type_code               | idx_type_code | 124     | const,omms.machine.brand_model   |    1 |   100.00 | Using where |
|  1 | SIMPLE      | cdl4    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    3 |   100.00 | Using where |
|  1 | SIMPLE      | cdl5    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    2 |   100.00 | Using where |
|  1 | SIMPLE      | cdl6    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |   10 |   100.00 | Using where |
|  1 | SIMPLE      | cec     | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.warranty_company_id |    1 |   100.00 | Using where |
|  1 | SIMPLE      | cc      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.cabinet_id          |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | dd      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.disk_desc_id        |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | ci      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.idc_id              |    1 |    12.50 | Using where |
+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+-------------+
12 rows in set, 1 warning (0.00 sec)

這樣雖然能解決問題,但是解決問題的方式並不優雅。

2.5、分析SQL、改寫SQL

為了排除干擾,將無關緊要left join表去掉,簡化SQL如下:

      SELECT *
      FROM CMDB_PHYSICAL_MACHINE machine
      JOIN cmdb_idc ci 
        ON (machine.idc_id=ci.id and ci.delete_flag=0)
     where 1=1
       AND machine.delete_flag=0
     order by machine.id desc
     LIMIT 0,30;

dev和qa環境執行計劃一致:

root@3306 omms>  explain SELECT *
    ->       FROM CMDB_PHYSICAL_MACHINE machine
    ->       JOIN cmdb_idc ci 
    ->         ON (machine.idc_id=ci.id and ci.delete_flag=0)
    ->      where 1=1
    ->        AND machine.delete_flag=0
    ->      order by machine.id desc
    ->      LIMIT 0,30;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | machine | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2087 |    10.00 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | ci      | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL |   21 |     4.76 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

雖然選擇machine作為驅動表,但是卻選擇了Block Nested Loop演演算法,也產生了排序,仔細分析SQL,其實條件ci.delete_flag=0是多餘的,因為有效的機器所在機房一定是有效的,可以去跟開發核實,這個條件可以去掉,正是因為這個條件影響了驅動表選擇,使得執行計劃不穩定,將ci.delete_flag=0去掉後執行計劃:

root@3306 omms>  explain SELECT *
    ->       FROM CMDB_PHYSICAL_MACHINE machine
    ->       JOIN cmdb_idc ci 
    ->         ON (machine.idc_id=ci.id)
    ->      where 1=1
    ->        AND machine.delete_flag=0
    ->      order by machine.id desc
    ->      LIMIT 0,30;
+----+-------------+---------+------------+--------+---------------+---------+---------+---------------------+------+----------+-------------+
| id | select_type | table   | partitions | type   | possible_keys | key     | key_len | ref                 | rows | filtered | Extra       |
+----+-------------+---------+------------+--------+---------------+---------+---------+---------------------+------+----------+-------------+
|  1 | SIMPLE      | machine | NULL       | index  | NULL          | PRIMARY | 4       | NULL                |   30 |    10.00 | Using where |
|  1 | SIMPLE      | ci      | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | omms.machine.idc_id |    1 |   100.00 | NULL        |
+----+-------------+---------+------------+--------+---------------+---------+---------+---------------------+------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)

原始SQL,去掉ci.delete_flag=0條件後執行計劃如下:

root@3306 omms> explain SELECT machine.id,
    ......省略輸出......
    ->  inner JOIN cmdb_idc ci
    ->     ON (machine.idc_id=ci.id)
    ->   left join cmdb_resource_group res
    ->     on (machine.id = res.resource_id and res.resource_type = 2)
    ->  where 1=1
    ->    AND machine.delete_flag=0
    ->  order by id desc
    ->  LIMIT 0,30;
+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+-------------+
| id | select_type | table   | partitions | type   | possible_keys               | key           | key_len | ref                              | rows | filtered | Extra       |
+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+-------------+
|  1 | SIMPLE      | machine | NULL       | index  | NULL                        | PRIMARY       | 4       | NULL                             |    1 |    10.00 | Using where |
|  1 | SIMPLE      | ci      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.idc_id              |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | res     | NULL       | eq_ref | resource_id,idx_resource_id | resource_id   | 5       | omms.machine.id,const            |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | cdl1    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    3 |   100.00 | Using where |
|  1 | SIMPLE      | cdl2    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    9 |   100.00 | Using where |
|  1 | SIMPLE      | cdl3    | NULL       | ref    | idx_type_code               | idx_type_code | 124     | const,omms.machine.brand_model   |    1 |   100.00 | Using where |
|  1 | SIMPLE      | cdl4    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    3 |   100.00 | Using where |
|  1 | SIMPLE      | cdl5    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |    2 |   100.00 | Using where |
|  1 | SIMPLE      | cdl6    | NULL       | ref    | idx_type_code               | idx_type_code | 62      | const                            |   10 |   100.00 | Using where |
|  1 | SIMPLE      | cec     | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.warranty_company_id |    1 |   100.00 | Using where |
|  1 | SIMPLE      | cc      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.cabinet_id          |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | dd      | NULL       | eq_ref | PRIMARY                     | PRIMARY       | 4       | omms.machine.disk_desc_id        |    1 |   100.00 | NULL        |
+----+-------------+---------+------------+--------+-----------------------------+---------------+---------+----------------------------------+------+----------+-------------+
12 rows in set, 1 warning (0.01 sec)

3、總結

書寫SQL時,心裡要明白哪種執行計劃是最優的,比如多張表關聯時,是否可以適當利用標量子查詢、排除干擾驅動表選擇因素,使執行計劃簡單穩定。

相關文章