oracle中distinct和group by的區別
其實二者沒有什麼可比性,但是對於不包含聚集函式的GROUP BY
操作來說,和DISTINCT
操作是等價的。不過雖然二者的結果是一樣的,但是二者的執行計劃並不相同。
在Oracle9i
中:
SQL> SELECT * FROM V$VERSION;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production PL/SQL Release 9.2.0.4.0 - Production
CORE 9.2.0.3.0 Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production
SQL> CREATE TABLE T AS SELECT ROWNUM ID, A.* FROM DBA_OBJECTS A;
表已建立。
SQL> CREATE INDEX IND_T_CREATED ON T (CREATED);
索引已建立。
SQL> ALTER TABLE T MODIFY CREATED NOT NULL;
表已更改。
SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
會話已更改。
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T')
PL/SQL 過程已成功完成。
SQL> SET AUTOT ON EXP
SQL> SELECT COUNT(*) FROM (SELECT DISTINCT CREATED FROM T);
COUNT(*)
----------
4794
執行計劃
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=65 Card=1)
1 0 SORT (AGGREGATE)
2 1 VIEW (Cost=65 Card=4794)
3 2 SORT (UNIQUE) (Cost=65 Card=4794 Bytes=38352)
4 3 INDEX (FAST FULL SCAN) OF 'IND_T_CREATED' (NON-UNIQUE) (Cost=4 Card=41802 Bytes=334416)
SQL> SELECT COUNT(*) FROM (SELECT CREATED FROM T GROUP BY CREATED);
COUNT(*)
----------
4794
執行計劃
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=65 Card=1 Bytes=2)
1 0 SORT (AGGREGATE)
2 1 VIEW (Cost=65 Card=4794 Bytes=9588)
3 2 SORT (GROUP BY) (Cost=65 Card=4794 Bytes=38352)
4 3 INDEX (FAST FULL SCAN) OF 'IND_T_CREATED' (NON-UNIQUE) (Cost=4 Card=41802 Bytes=334416)
從執行計劃上看,DISTINCT
的操作是SORT (UNIQUE)
,而GROUP BY
是SORT (GROUP BY)
。DISTINCT
操作只需要找出所有不同的值就可以了。而GROUP BY
操作還要為其他聚集函式進行準備工作。從這一點上將,GROUP BY
操作做的工作應該比DISTINCT
所做的工作要多一些。
除了這一點,基本上看不到DISTINCT
和GROUP BY
(沒有聚集函式的情況)有什麼區別,而且從執行效率上也看不到明顯的差異。
不過從10g開始,二者的差異開始體現出來了。
SQL> CONN YANGTK/YANGTK@YTK已連線。
SQL> SET AUTOT OFF
SQL> SET TIMING OFF
SQL> CREATE TABLE T AS SELECT ROWNUM ID, A.* FROM DBA_OBJECTS A;
表已建立。
SQL> CREATE INDEX IND_T_CREATED ON T (CREATED);
索引已建立。
SQL> ALTER TABLE T MODIFY CREATED NOT NULL;
表已更改。
SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
會話已更改。
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T')
PL/SQL 過程已成功完成。
SQL> SET AUTOT ON
SQL> SET TIMING ON
建立好測試環境後,看一看標準分頁函式中,兩個操作的差異:
SQL> SELECT *
2 FROM
3 (
4 SELECT ROWNUM RN, A.*
5 FROM
6 (
7 SELECT CREATED
8 FROM T
9 GROUP BY CREATED
10 ) A
11 WHERE ROWNUM < 20
12 )
13 WHERE RN >= 10;
RN CREATED
---------- -------------------
10 2005-12-19 17:07:57
11 2005-12-19 17:07:58
12 2005-12-19 17:08:24
13 2005-12-19 17:08:25
14 2005-12-19 17:08:26
15 2005-12-19 17:08:27
16 2005-12-19 17:08:28
17 2005-12-19 17:08:29
18 2005-12-19 17:08:33
19 2005-12-19 17:08:35
已選擇10行。
已用時間: 00: 00: 00.06
執行計劃
----------------------------------------------------------
Plan hash value: 3639065582
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 418 | 1 (0)|
|* 1 | VIEW | | 19 | 418 | 1 (0)|
|* 2 | COUNT STOPKEY | | | | |
| 3 | VIEW | | 969 | 8721 | 1 (0)|
|* 4 | SORT GROUP BY STOPKEY| | 969 | 7752 | 1 (0)|
| 5 | INDEX FULL SCAN | IND_T_CREATED | 969 | 7752 | 1 (0)|
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RN">=10)
2 - filter(ROWNUM<20)
4 - filter(ROWNUM<20)
統計資訊
----------------------------------------------------------
1 recursive calls
0 db block gets
67 consistent gets
0 physical reads
0 redo size
642 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
10 rows processed
SQL> SELECT *
2 FROM
3 (
4 SELECT ROWNUM RN, A.*
5 FROM
6 (
7 SELECT DISTINCT CREATED
8 FROM T
9 ) A
10 WHERE ROWNUM < 20
11 )
12 WHERE RN >= 10;
RN CREATED
---------- -------------------
10 2005-12-19 17:07:57
11 2005-12-19 17:07:58
12 2005-12-19 17:08:24
13 2005-12-19 17:08:25
14 2005-12-19 17:08:26
15 2005-12-19 17:08:27
16 2005-12-19 17:08:28
17 2005-12-19 17:08:29
18 2005-12-19 17:08:33
19 2005-12-19 17:08:35
已選擇10行。
已用時間: 00: 00: 00.03
執行計劃
----------------------------------------------------------
Plan hash value: 1650124153
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 418 | 14 (36)|
|* 1 | VIEW | | 19 | 418 | 14 (36)|
|* 2 | COUNT STOPKEY | | | | |
| 3 | VIEW | | 987 | 8883 | 14 (36)|
|* 4 | SORT GROUP BY STOPKEY| | 987 | 7896 | 14 (36)|
| 5 | INDEX FAST FULL SCAN| IND_T_CREATED | 50333 | 393K| 10 (10)|
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RN">=10)
2 - filter(ROWNUM<20)
4 - filter(ROWNUM<20)
統計資訊
----------------------------------------------------------
1 recursive calls
0 db block gets
73 consistent gets
0 physical reads
0 redo size
642 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
10 rows processed
出乎意料的是,GROUP BY
操作的COST
更低,而且邏輯讀也小,這似乎與二者的工作量成反比。仔細觀察執行計劃發現,問題的根源來自於GROUP BY
使用INDEX FULL SCAN
,而DISTINCT
使用了INDEX FAST FULL SCAN
。也許有人會感到奇怪,索引的快速全掃描不是要比索引全掃描效率更高嗎?對於讀取所有資料的情況下,確實是索引快速全掃效率更高。但是由於這裡採用了分頁,只取前20條資料,而且Oracle的10g增加了GROUP BY STOPKEY
這種新的執行路徑,因此在這裡GROUP BY
操作的效率更高。
觀察執行計劃中的處理行數可以發現,索引全掃描由於是按照索引的順序掃描,因此利用了STOPKEY
,僅僅處理了969條記錄就停了下來。而對於DISTINCT
操作的快速索引全速而言,顯然沒有使用STOPKEY
,讀取了所有的50333條記錄。這就是GROUP BY
和DISTINCT
的效能差異原因。
相關文章
- MySQL 中的 distinct 和 group by 的效能比較MySql
- SQL -去重Group by 和Distinct的效率SQL
- Oracle中Date和Timestamp的區別Oracle
- oracle之優化一用group by或exists優化distinctOracle優化
- MySQL和Oracle的區別MySqlOracle
- Oracle和MySQL的區別OracleMySql
- PostgreSQL DBA(169) - Develop(Distinct vs Group by)SQLdev
- Oracle中單引號和雙引號的區別Oracle
- [20200117]push_pred distinct group by.txt
- Oracle dba角色和sysdba的區別Oracle
- Oracle 和 mysql的9點區別OracleMySql
- Oracle 中varchar2 和nvarchar2區別Oracle
- oracle partition by group by,詳解partition by和group by對比Oracle
- JavaScript中for in 和for of的區別JavaScript
- Js中for in 和for of的區別JS
- mysql中!=和is not的區別MySql
- Python中is和==的區別Python
- JavaScript中==和===的區別JavaScript
- Linux中“>”和“>>”的區別Linux
- Python 中 is 和 == 的區別Python
- mysql中“ ‘ “和 “ ` “的區別MySql
- dg和ogg的區別--oracle資料庫Oracle資料庫
- NULL在oracle和mysql索引上的區別NullOracleMySql索引
- PHP 中的 -> 和 :: 的區別PHP
- oracle知識整理(1) union和union all的區別,left join和right join的區別(各種join的區別)Oracle
- Oracle vs PostgreSQL Develop(15) - DISTINCT ONOracleSQLdev
- java 中equals和==的區別Java
- SQL中where和on的區別SQL
- deferred中done和then的區別
- Oracle普通檢視和物化檢視的區別Oracle
- DM7,DM8和ORACLE中對分割槽split的區別Oracle
- JS中的!=、== 、!==、=== 的用法和區別JS
- mysql與Oracle的區別MySqlOracle
- Oracle中select for update ...一些區別Oracle
- 淺析Oracle(rownum)和Mysql(limit)分頁的區別OracleMySqlMIT
- jquery中prop和attr的區別jQuery
- javascrit中undefined和null的區別JavaUndefinedNull
- swift中Class和Struct的區別SwiftStruct