DATE型別擷取到天的效率

yangtingkun發表於2012-01-09

ITPUB上看了一個帖子,根據日期型別對每天的記錄進行GROUP BY,帖子的地址如下:http://www.itpub.net/thread-1564295-1-1.html

 

 

這種包含全表掃描執行GROUP BY的語句是否還有最佳化的餘地嗎,事實上確實還有,因為對於處理日期型別,TO_CHAR並沒有TRUNC高效。

下面看一個簡單的例子:

SQL> CREATE TABLE T_DATE AS
  2  SELECT ROWNUM ID, CREATED
  3  FROM DBA_OBJECTS A, (SELECT 1 FROM DUAL CONNECT BY ROWNUM < 100) 
  4  WHERE ROWNUM <= 1000000;

Table created.

SQL> SELECT COUNT(*) FROM T_DATE;

  COUNT(*)
----------
   1000000

SQL> SET TIMING ON
SQL> SELECT TO_CHAR(CREATED, 'YYYY-MM-DD'), COUNT(*)
  2  FROM T_DATE
  3  GROUP BY TO_CHAR(CREATED, 'YYYY-MM-DD');

TO_CHAR(CR   COUNT(*)
---------- ----------
2012-01-07       3600
2012-01-08       3750
2012-01-09       4650
2012-01-06     987925
2012-01-10         75

Elapsed: 00:00:00.46
SQL> SELECT TO_CHAR(CREATED, 'YYYY-MM-DD'), COUNT(*)
  2  FROM T_DATE
  3  GROUP BY TO_CHAR(CREATED, 'YYYY-MM-DD');

TO_CHAR(CR   COUNT(*)
---------- ----------
2012-01-07       3600
2012-01-08       3750
2012-01-09       4650
2012-01-06     987925
2012-01-10         75

Elapsed: 00:00:00.40
SQL> SELECT TO_CHAR(CREATED, 'YYYY-MM-DD'), COUNT(*)
  2  FROM T_DATE
  3  GROUP BY TO_CHAR(CREATED, 'YYYY-MM-DD');

TO_CHAR(CR   COUNT(*)
---------- ----------
2012-01-07       3600
2012-01-08       3750
2012-01-09       4650
2012-01-06     987925
2012-01-10         75

Elapsed: 00:00:00.39
SQL> SELECT TO_CHAR(CREATED, 'YYYY-MM-DD'), COUNT(*)
  2  FROM T_DATE
  3  GROUP BY TO_CHAR(CREATED, 'YYYY-MM-DD');

TO_CHAR(CR   COUNT(*)
---------- ----------
2012-01-07       3600
2012-01-08       3750
2012-01-09       4650
2012-01-06     987925
2012-01-10         75

Elapsed: 00:00:00.44
SQL> SELECT TO_CHAR(TRUNC(CREATED), 'YYYY-MM-DD'), COUNT(*)
  2  FROM T_DATE
  3  GROUP BY TRUNC(CREATED);

TO_CHAR(TR   COUNT(*)
---------- ----------
2012-01-06     987925
2012-01-10         75
2012-01-08       3750
2012-01-07       3600
2012-01-09       4650

Elapsed: 00:00:00.36
SQL> SELECT TO_CHAR(TRUNC(CREATED), 'YYYY-MM-DD'), COUNT(*)
  2  FROM T_DATE
  3  GROUP BY TRUNC(CREATED);

TO_CHAR(TR   COUNT(*)
---------- ----------
2012-01-10         75
2012-01-07       3600
2012-01-09       4650
2012-01-06     987925
2012-01-08       3750

Elapsed: 00:00:00.35
SQL> SELECT TO_CHAR(TRUNC(CREATED), 'YYYY-MM-DD'), COUNT(*)
  2  FROM T_DATE
  3  GROUP BY TRUNC(CREATED);

TO_CHAR(TR   COUNT(*)
---------- ----------
2012-01-10         75
2012-01-07       3600
2012-01-09       4650
2012-01-06     987925
2012-01-08       3750

Elapsed: 00:00:00.36
SQL> SELECT TO_CHAR(TRUNC(CREATED), 'YYYY-MM-DD'), COUNT(*)
  2  FROM T_DATE
  3  GROUP BY TRUNC(CREATED);

TO_CHAR(TR   COUNT(*)
---------- ----------
2012-01-10         75
2012-01-07       3600
2012-01-09       4650
2012-01-06     987925
2012-01-08       3750

Elapsed: 00:00:00.34

如果僅從執行計劃和邏輯讀上進行分析,兩個SQL沒有任何區別:

SQL> set autot on
SQL> SELECT TO_CHAR(CREATED, 'YYYY-MM-DD'), COUNT(*)
  2  FROM T_DATE
  3  GROUP BY TO_CHAR(CREATED, 'YYYY-MM-DD');

TO_CHAR(CR   COUNT(*)
---------- ----------
2012-01-07       3600
2012-01-08       3750
2012-01-09       4650
2012-01-06     987925
2012-01-10         75

Elapsed: 00:00:00.43

Execution Plan
----------------------------------------------------------
Plan hash value: 534547868

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |  1294K|    11M|   726   (6)| 00:00:09 |
|   1 |  HASH GROUP BY     |        |  1294K|    11M|   726   (6)| 00:00:09 |
|   2 |   TABLE ACCESS FULL| T_DATE |  1294K|    11M|   694   (1)| 00:00:09 |
-----------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2490  consistent gets
       2487  physical reads
          0  redo size
        754  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

SQL> SELECT TO_CHAR(TRUNC(CREATED), 'YYYY-MM-DD'), COUNT(*)
  2  FROM T_DATE
  3  GROUP BY TRUNC(CREATED);

TO_CHAR(TR   COUNT(*)
---------- ----------
2012-01-10         75
2012-01-07       3600
2012-01-09       4650
2012-01-06     987925
2012-01-08       3750

Elapsed: 00:00:00.34

Execution Plan
----------------------------------------------------------
Plan hash value: 534547868

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |  1294K|    11M|   726   (6)| 00:00:09 |
|   1 |  HASH GROUP BY     |        |  1294K|    11M|   726   (6)| 00:00:09 |
|   2 |   TABLE ACCESS FULL| T_DATE |  1294K|    11M|   694   (1)| 00:00:09 |
-----------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2490  consistent gets
       2487  physical reads
          0  redo size
        761  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

但是觀察兩個SQL的平均執行時間,會發現使用TRUNC方式比TO_CHAR1/8的效能提升,對於執行計劃完全相同的情況而言,這個比率已經很高了。

其實導致問題的原因在於DATE型別的儲存,DATE7個位元組組成,分別為世紀、年、月、日、時、分、秒。對於TRUNC函式而言,只是簡單的捨棄掉後面三個位元組,因此效率最高,而TO_CHAR需要將內部的儲存格式轉化為字元格式,顯然會消耗更多的資源。

兩個SQL返回結果順序的不同也說明了這一點,TRUNC函式進行HASH GROUP的是日期格式,而TO_CHAR函式進行HASH GROUP的是字元型別,導致了最終結果返回順序的差異性。

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/4227/viewspace-714647/,如需轉載,請註明出處,否則將追究法律責任。

相關文章