NULL在oracle和mysql索引上的區別

小豹子加油發表於2021-12-25

一、問題

oracle的btree索引不儲存NULL值,所以用is null或is not null都不會用到索引範圍掃描,但是在mysql中也是這樣嗎?

二、實驗

先看看NULL在oracle(11g)中的情況
準備測試資料

SQL> create table t1 as select * from dba_objects;
SQL> update t1 set object_id = null where object_id > 17840;
SQL> update t1 set data_object_id = null where data_object_id > 60;
SQL> commit;
SQL> create index idx1_id on t1(object_id);
SQL> create index idx2_data on t1(data_object_id);

蒐集統計資訊

SQL> begin
dbms_stats.gather_table_stats(ownname => 'SCOTT',
tabname => 'T1',
estimate_percent => 100,
cascade => true,  
method_opt => 'for all indexed columns size auto', 
no_invalidate => false,
degree => 4);
end;
/

檢視資料分佈

SQL> select count(*) "總行數", 
  2         count(distinct object_id) "object_id非空不同值", 
  3         count(decode(object_id,null,1,null)) "object_id空值總數",
  4         count(distinct data_object_id) "data_object_id非空不同值",
  5         count(decode(data_object_id,null,1,null)) "data_object_id空值總數"
  6    from t1;

    總行數 object_id非空不同值 object_id空值總數 data_object_id非空不同值 data_object_id空值總數
---------- ------------------- ----------------- ------------------------ ----------------------
     13582               13578                 4                       47                  13510

執行sql,並檢視執行計劃
第1條sql:is null返回行數少

SQL> select * from t1 where object_id is null;  

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    50 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     4 |   352 |    50   (0)| 00:00:01 |
--------------------------------------------------------------------------

第2條sql:is not null返回行數多

SQL> select * from t1 where object_id is not null;  

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    50 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   | 13578 |  1166K|    50   (0)| 00:00:01 |
--------------------------------------------------------------------------

第3條sql:is null返回行數多

SQL> select * from t1 where data_object_id is null;  

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    50 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   | 13510 |  1161K|    50   (0)| 00:00:01 |
--------------------------------------------------------------------------

第4條sql:is not null返回行數少

SQL> select * from t1 where data_object_id is not null;  

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     7 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |    72 |  6336 |     7   (0)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | IDX2_DATA |    72 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

可以看到第1條和第3條sql不會用到索引,這是由於oracle的btree索引並不儲存NULL,所以用is null作為條件在索引中找不到任何結果,只能全表掃。
第2條sql也沒有用到索引,因為返回的行數多。第4條sql用到了索引,但用的是索引全掃描,原理其實還是由於索引不儲存NULL,is not null正好跟索引特性相同。

接下來我們看看在mysql(8.0)中又會是什麼情形,通過工具把上面的表匯入到mysql中
更新t1表的統計資訊

analyze table t1;

檢視執行計劃
第5條sql:is null返回行數少

(scott@localhost)[hello]> explain select * from t1 where object_id is null; 
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | t1    | NULL       | ref  | IDX1_ID       | IDX1_ID | 5       | const |    4 |   100.00 | Using index condition |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-----------------------+

第6條sql:is not null返回行數多

(scott@localhost)[hello]> explain select * from t1 where object_id is not null; 
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | IDX1_ID       | NULL | NULL    | NULL | 13541 |    50.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-------------+

第7條sql:is null返回行數多

(scott@localhost)[hello]> explain select * from t1 where data_object_id is null;
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key       | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | t1    | NULL       | ref  | IDX2_DATA     | IDX2_DATA | 5       | const | 6770 |   100.00 | Using index condition |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-----------------------+

第8條sql:is not null返回行數少

(scott@localhost)[hello]> explain select * from t1 where data_object_id is not null;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | t1    | NULL       | range | IDX2_DATA     | IDX2_DATA | 5       | NULL |   72 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+

可以看到在mysql中,is not null會根據返回的行數來決定用不用索引,返回行數多不用索引,返回行數少用索引,這一點跟oracle相同。但is null都會用到索引,不管你返回的行數是多少,這點的確是出乎我的意料。既然is null用到索引,那麼難道是mysql的btree索引中包含NULL值?

檢視索引的資訊

(scott@localhost)[hello]> select * from mysql.innodb_index_stats where database_name='hello' and table_name='t1' and index_name in ('IDX1_ID', 'IDX2_DATA');

微信截圖_20211225163627.png
從索引的統計資訊中可以看到,mysql認為t1表的OBJECT_ID,DATA_OBJECT_ID的不同值分別是13579,48。而前面我們知道object_id非空不同值和data_object_id非空不同值分別為13578和47。兩者都相差1,那也就是說索引的確是含NULL值。

三、總結

  1. mysql中btree索引含NULL,這點跟oracle不一樣。
  2. mysql中用is null都會用到索引,不管返回的行數多少,我認為這是一個bug。
    水平有限,如果有誤,懇請大家指正!

相關文章