NULL列時,如何使得IS NULL或者IS NOT NULL可以使用索引來提高查詢效率

zhang41082發表於2019-06-30


雖然我們一直強調在進行資料庫設計的時候,要儘量設定所有的列為NOT NULL,這樣的最直接的好處就是假如這個列上有一個索引,那麼對這個表進行COUNT(*)統計的時候(假設這個表只有這一列有索引,並且索引全掃描比 全表掃描COST低),ORACLE會直接對這個列的索引進行一個快速掃描得到COUNT結果,而不是進行一個全表掃描。雖然B*TREE索引是不儲存 NULL值的,但是因為你定義列的時候明確的告訴ORACLE這一列是NOT NULL的,所以ORACLE會知道掃描索引得到的結果跟全表掃描得到的結果是一致的(因為列的值都是NOT NULL的,也就是說列的所有的值都會儲存在了索引中)。但是,很多時候,有些列確實就是有空值存在,不能設定為NOT NULL。當然,這個時候可以使用DEFAULT值來代替空值從而使得列非空。或者單純為了統計的時候可以在列上建BITMAP索引使得NULL也被包含 在索引中。

這裡提到的將是另外方法,不改變列的屬性,而是透過改變索引來實現同樣的功能。[@more@]



create table bear(id number,c1 char(1000),c2 char(1000));
首先建立一個表,為了使得掃描索引的COST跟掃描全表的COST拉開差距,這裡跟了兩個CHAR(1000)的列

create sequence seq_bear;
建一個SEQUENCE用來生成ID列的資料
SQL> begin
2 for r in 1..10000 loop
3 insert into bear(id,c1,c2) values(seq_bear.nextval,'c1','c2');
4 end loop;
5 commit;
6 end;
7 /
SQL> create index idx_bear on bear(id);
SQL> analyze table bear compute statistics for table for all indexes for all indexed columns;
插入1萬條資料並建立ID欄位上的索引,同時對錶和索引進行分析。

1、先來看直接對錶進行COUNT的情況:
SQL> select count(*) from bear;

Execution Plan
----------------------------------------------------------
Plan hash value: 2093678280

-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 907 (1)| 00:00:11 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| BEAR | 10000 | 907 (1)| 00:00:11 |
-------------------------------------------------------------------
這個時候雖然ID列上有索引,並且所有值都非空,但是ORACLE無法判斷裡面是否有空值,所以是一個全表掃描。同樣IS NULL查詢也是一個全表掃描:
SQL> select count(*) from bear where id is null;

Execution Plan
----------------------------------------------------------
Plan hash value: 2093678280

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 907 (1)| 00:00:11 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| BEAR | 1 | 3 | 907 (1)| 00:00:11 |
---------------------------------------------------------------------------
但是IS NOT NULL查詢會是一個索引掃描
SQL> select count(*) from bear where id is not null;

Execution Plan
----------------------------------------------------------
Plan hash value: 4145915393

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 7 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | INDEX FAST FULL SCAN| IDX_BEAR | 10000 | 30000 | 7 (0)| 00:00:01 |
----------------------------------------------------------------------------------

2、修改ID列為NOT NULL,然後從新進行COUNT:
SQL> alter table bear modify id not null;
Table altered.

SQL> select count(*) from bear;

Execution Plan
----------------------------------------------------------
Plan hash value: 4145915393

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| IDX_BEAR | 10000 | 7 (0)| 00:00:01 |
--------------------------------------------------------------------------
可以看到,修改為NOT NULL後,ORACLE明確的知道這個索引中不包含NULL值,所以進行了一個索引快速全掃描。同時,IS NULL和IS NOT NULL也都會使用一個索引掃描:
SQL> select count(*) from bear where id is null;

Execution Plan
----------------------------------------------------------
Plan hash value: 1833075309

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 0 (0)| |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | FILTER | | | | | |
| 3 | INDEX FAST FULL SCAN| IDX_BEAR | 10000 | 30000 | 7 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
其實這裡只是一個執行計劃,ORACLE貌似不走這個SCAN直接就返回結果了,這個從執行後的consistent gets為0可以看的出來


SQL> select count(*) from bear where id is not null;

Execution Plan
----------------------------------------------------------
Plan hash value: 4145915393

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 7 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
| 2 | INDEX FAST FULL SCAN| IDX_BEAR | 10000 | 30000 | 7 (0)| 00:00:01 |
----------------------------------------------------------------------------------

4、接下來我們修改ID列為允許NULL,並且調整ID列上的索引:
SQL> alter table bear modify id null;
Table altered.

SQL> drop index idx_bear;
Index dropped.

SQL> create index idx_bear_null on bear(id,0);
Index created.

5、再次執行COUNT,IS NULL和IS NOT NULL相關查詢:
SQL> select count(*) from bear;

Execution Plan
----------------------------------------------------------
Plan hash value: 2000879051

-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| IDX_BEAR_NULL | 10000 | 8 (0)| 00:00:01 |
-------------------------------------------------------------------------------

SQL> select count(*) from bear where id is null;

Execution Plan
----------------------------------------------------------
Plan hash value: 1944383045

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | INDEX RANGE SCAN| IDX_BEAR_NULL | 1 | 3 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
注意這裡是一個RANGE SCAN,而不是FULL SCAN

SQL> select count(*) from bear where id is not null;

Execution Plan
----------------------------------------------------------
Plan hash value: 2000879051

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | INDEX FAST FULL SCAN| IDX_BEAR_NULL | 10000 | 30000 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

可以看到,這三種情況都正確的使用了索引,從而提高了查詢效率,但需要犧牲的就是單列索引變成了複合索引,增加了維護成本,佔用了更多的空間,同時增加了索引掃描的成本。但這樣索引中也就可以把ID列為NULL的值存了進去,從而實現了對NULL和IS NOT NULL時候的正確查詢。

最佳化沒有最好的辦法,只有最合適的,很多時候,我們要根據不同的場景來選擇最適合的辦法。


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/25016/viewspace-1030587/,如需轉載,請註明出處,否則將追究法律責任。

相關文章