一個left join SQL 簡單優化分析

darren__chan發表於2018-12-04

有個關聯查詢的sql,需要2秒多,於是進行檢視一番:

SELECT
	a.id,
	a.brand_id,
	a.series_id,
	a.product_id,
	a.material_id,
	a.custom_category_id,
	a.price,
	a.product_url,
	a.organ_id,
	.....
FROM
	pm_brand_xxxx a
LEFT JOIN pm_brand_yyyyy d ON a.series_id = d.id
WHERE
	a.is_delete = 0
AND d.is_delete = 0
AND a.organ_id = 'Cxxx'
AND a.brand_id = 6491603
AND d.brand_id = 6491603
AND a.model_flag = 14;
mysql> show profile for query 4;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000072 |
| checking permissions | 0.000002 |
| checking permissions | 0.000002 |
| Opening tables       | 0.000011 |
| init                 | 0.000026 |
| System lock          | 0.000007 |
| optimizing           | 0.000016 |
| statistics           | 0.000142 |
| preparing            | 0.000018 |
| executing            | 0.000002 |
| Sending data         | 2.281192 |<<<<<<<執行的主要時間消耗
| end                  | 0.000007 |
| query end            | 0.000011 |
| closing tables       | 0.000011 |
| freeing items        | 0.000030 |
| logging slow query   | 0.000003 |
| logging slow query   | 0.000102 |
| cleaning up          | 0.000022 |
+----------------------+----------+
+----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------+-------+----------+------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type        | possible_keys                                                                                                 | key                                                                       | key_len | ref   | rows  | filtered | Extra                                                                                                                                          |
+----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------+-------+----------+------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | SIMPLE      | d     | NULL       | ref         | PRIMARY,idx_pm_yyyy_bid                                                                                        | idx_pm_yyyyy_bid                                                        | 9       | const |     1 |    10.00 | Using where                                                                                                                                    |
|  1 | SIMPLE      | a     | NULL       | index_merge | idx_pm_xxxx_sid,idx_pm_xxx_bid,idx_pm_brand_xxxx_organ                                                         | idx_pm_xxx_organ,idx_pm_brand_xxxx_bid                                   | 99,9    | NULL  | 11314 |     0.04 | Using intersect(idx_pm_xxxxx_organ,idx_pm_xxxx_bid); Using where; Using join buffer (Block Nested Loop)                                        |
+----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------+-------+----------+------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)


從執行計劃來看,d表是做了驅動表,a做了被驅動表

d表 type = ref ,使用非唯一性索引或者唯一索引的字首掃描,返回匹配某個單獨值的記錄行,這裡使用了索引idx_pm_yyyyy_bid,該索引正是brand_id上的索引,

即是說,在和a表的關聯中d先通過brand_id來查詢記錄行,再通過相應記錄的id去和a表的series_id做匹配。

我檢視相應的記錄數,發現a表145萬的大表,d表是4075的小表。


a表

mysql> select count(*) from pm_xxxxxx;

+----------+

| count(*) |

+----------+

|  1459777 |

+----------+

1 row in set (0.27 sec)


d表:


mysql> select count(*) from pm_yyyyyy;

+----------+

| count(*) |

+----------+

|     4075 |

+----------+

1 row in set (0.00 sec)



而 a表是type=index_merge 索引合併,這裡走了idx_pm_xxx_organ(organ_id),idx_pm_brand_xxxx_bid(brand_id) ,extra 是

Using intersect(idx_pm_xxxxx_organ,idx_pm_xxxx_bid); Using where; Using join buffer (Block Nested Loop) 

Using intersect正說明了這裡使用了(idx_pm_xxxxx_organ,idx_pm_xxxx_bid)的交集

Using where 是用model_flag等這些其他條件的過濾

Using join buffer (Block Nested Loop) 說明使用BNL的演算法進行匹配

 BNL 演算法是將外層迴圈的行/結果集(驅動表)存入join buffer, 內層迴圈的每一行與整個buffer中的記錄做比較,從而減少內層迴圈的次數.


舉例來說,外層迴圈的結果集是100行,使用NLJ 演算法需要掃描內部表100次,如果使用BNL演算法,先把對Outer Loop表(外部表)每次讀取的10行記錄放到join buffer,然後在InnerLoop表(內部表)中直接匹配這10行資料,記憶體迴圈就可以一次與這10行進行比較, 這樣只需要比較10次,對內部表的掃描減少了9/10。所以BNL演算法就能夠顯著減少內層迴圈表掃描的次數.


在這裡就是d表中取得結果集分批放入buffer中與a表進行匹配。


而這個語句無論如何都要2秒中,也在我們的認識中小表驅動大表並沒錯,我的猜想應該就是在進行BNL時消耗了時間,表現到過程中就是 Sending data 的時間消耗增多。

吐槽的是mysql中貌似沒有什麼辦法來多方面看查詢消耗了。

我想到的是如果該表現有sql關聯的順序是否效能能改善,在該sql中,我發現了兩個條件:

AND a.brand_id = 6491603

AND d.brand_id = 6491603

在業務邏輯上這兩個表的欄位應該是一致的,如果我將d表的d.brand_id = 6491603去掉,以上的執行計劃應該會改變,於是去掉之後執行,執行時間非常小。

mysql> show profile for query 1;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000080 |
| checking permissions | 0.000002 |
| checking permissions | 0.000002 |
| Opening tables       | 0.000012 |
| init                 | 0.000030 |
| System lock          | 0.000006 |
| optimizing           | 0.000014 |
| statistics           | 0.000130 |
| preparing            | 0.000016 |
| executing            | 0.000001 |
| Sending data         | 0.027325 |
| end                  | 0.000003 |
| query end            | 0.000015 |
| closing tables       | 0.000005 |
| freeing items        | 0.000014 |
| cleaning up          | 0.000009 |
+----------------------+----------+
16 rows in set, 1 warning (0.00 sec)
看其執行計劃:
+----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------------------------+-------+----------+---------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type        | possible_keys                                                                                                 | key                                                                       | key_len | ref                     | rows  | filtered | Extra                                                                                                   |
+----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------------------------+-------+----------+---------------------------------------------------------------------------------------------------------+
|  1 | SIMPLE      | a     | NULL       | index_merge | idx_pm_xxxxx_sid,idx_pm_xxxxx_bid,idx_pm_xxxx_organ                                                           | idx_pm_xxxxx_organ,idx_pm_xxxx_bid                                        | 99,9    | NULL                    | 11315 |     1.00 | Using intersect(idx_pm_xxxxx_organ,idx_pm_xxxx_bid); Using where                                        |
|  1 | SIMPLE      | d     | NULL       | eq_ref      | PRIMARY                                                                                                       | PRIMARY                                                                   | 8       | xxxx.a.series_id |     1 |    10.00 | Using where                                                                                             |
+----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------------------------+-------+----------+---------------------------------------------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)




發現變成了a表做驅動表,d表做被驅動表,從extra列看

a表是Using intersect(idx_pm_xxxxx_organ,idx_pm_xxxx_bid); Using where   依然是使用索引合併,where條件來取結果,使用了idx_pm_xxxxx_organ,idx_pm_xxxx_bid 連個索引。

d表走PRIMARY 主鍵索引,從ref列來看是通過a表的series_id 來關聯,這樣效率表提升了。


需要說的一點是,小結果集並不代表就是小表,大表也可以有小結果集,當大表用來被匹配並被掃描多次,自然效率並不高.



來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29863023/viewspace-2284069/,如需轉載,請註明出處,否則將追究法律責任。

相關文章