[20180928]避免表示式在sql語句中.txt

lfree發表於2018-09-28

[20180928]避免表示式在sql語句中.txt

--//在sql語句中避免表示式是很困難的,但是連結https://blog.jooq.org/2016/11/01/why-you-should-avoid-expressions-in-sql-predicates/
--//提到這麼大的差異,我個人還是非常懷疑的,還是以自己的測試為準:

1.環境與建立測試:
SCOTT@test> @ &r/ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- ----------------------------------------------------------------
x86_64/Linux 2.4.xx            10.2.0.4.0     Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

CREATE TABLE payment (
  id            NOT NULL PRIMARY KEY,
  payment_date  NOT NULL,
  text
) AS
SELECT
  level,
  SYSDATE - dbms_random.value(1, 500),
  LPAD ('a', 500, 'a')
--    dbms_random.string('a', 500)
FROM dual
CONNECT BY level <= 50000
ORDER BY dbms_random.value;

--//執行有點慢,可以修改dbms_random.string('a', 500)=>lpad('a',500,'a'),這樣可以快一些,不影響測試結果.
 
--//CREATE INDEX i_payment_date ON payment(payment_date);
--//先不建立索引,實際上索引沒用,查詢範圍很大.因為作者使用ORDER BY dbms_random.value,不按日期排序.
--//走全表掃描更能看出問題在那裡,因為這樣每行都要比較.
 
EXEC dbms_stats.gather_table_stats('SCOTT', 'PAYMENT');

2.建立測試指令碼:
$ cat expr.txt
SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 500;
  v_range CONSTANT NUMBER := 470;
 -- v_date CONSTANT DATE := SYSDATE - v_range;
  v_date  DATE ;
BEGIN
  v_ts := SYSTIMESTAMP;

  -- Original query with inline expression
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT *
      FROM payment
      WHERE payment_date < SYSDATE - v_range
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  dbms_output.put_line('Statement 1 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;

  -- Pre-calculated PL/SQL local variable
  FOR i IN 1..v_repeat LOOP
    v_date  := SYSDATE - v_range;
    FOR rec IN (
      SELECT *
      FROM payment
      WHERE payment_date < v_date
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  dbms_output.put_line('Statement 2 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;

  -- Magical 11g scalar subquery caching
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT *
      FROM payment
      WHERE payment_date < (SELECT SYSDATE - v_range FROM dual)
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  dbms_output.put_line('Statement 3 : ' || (SYSTIMESTAMP - v_ts));
END;
/
--//測試前說明一下,v_repeat原來定義100,我修改500.這樣測試差距明顯一點點.
--//注:我修改計算v_date應該重複500次.這樣測試公平一些.

3.測試:
--//第1次執行使用表示式:
SELECT * FROM payment WHERE payment_date < SYSDATE - v_range
--//第2次執行使用變數.感覺這裡有問題,v_date僅僅計算1次.我個人認為作為比較,應該每次要重新計算v_date.
SELECT * FROM payment WHERE payment_date < v_date
--//第3次執行使用標量子查詢.
SELECT * FROM payment WHERE payment_date < (SELECT SYSDATE - v_range FROM dual)

--//執行計劃應該都是一樣的(我沒有建立索引,僅僅標量子查詢有一點點不同).測試結果:
SCOTT@test> @expr.txt
Statement 1 : +000000000 00:00:16.639160000
Statement 2 : +000000000 00:00:07.997651000
Statement 3 : +000000000 00:00:09.716406000
PL/SQL procedure successfully completed.

SCOTT@test> @expr.txt
Statement 1 : +000000000 00:00:17.143361000
Statement 2 : +000000000 00:00:08.381461000
Statement 3 : +000000000 00:00:10.362664000
PL/SQL procedure successfully completed.

SCOTT@test> @expr.txt
Statement 1 : +000000000 00:00:18.866096000
Statement 2 : +000000000 00:00:09.923500000
Statement 3 : +000000000 00:00:11.905165000
PL/SQL procedure successfully completed.

--//沒想到在10g下存在這麼大的差異.可以發現使用表示式最慢,而標量子查詢能快取結果,這樣看上去第2,第3執行時間相差感覺還是有點大.
--//差別在於多執行500次的SELECT SYSDATE - v_range FROM dual. 感覺這裡將近2秒的差距還是有點大.
--//可以看出第2種直接使用變數快一些.
--//不過並不像作者測試那樣,使用標量子查詢最快,估計三種方式可能有一些使用索引.
--//建立索引重複測試:

SCOTT@test> CREATE INDEX i_payment_date ON payment(payment_date);
Index created.

SCOTT@test> alter system flush shared_pool;
System altered.

SCOTT@test> @expr.txt
Statement 1 : +000000000 00:00:15.888974000
Statement 2 : +000000000 00:00:07.082431000
Statement 3 : +000000000 00:00:04.654471000
PL/SQL procedure successfully completed.

SCOTT@test> @expr.txt
Statement 1 : +000000000 00:00:15.862751000
Statement 2 : +000000000 00:00:07.081248000
Statement 3 : +000000000 00:00:04.524595000

PL/SQL procedure successfully completed.

SCOTT@test> @expr.txt
Statement 1 : +000000000 00:00:15.854954000
Statement 2 : +000000000 00:00:07.104939000
Statement 3 : +000000000 00:00:04.487967000
PL/SQL procedure successfully completed.

--//建立索引後結果與作者測試相近,標量子查詢最快.實際上作者沒有注意執行計劃發生了變化:

SCOTT@test> @ &r/dpc  bnf6jkmr2sh8c ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  bnf6jkmr2sh8c, child number 0
-------------------------------------
SELECT * FROM PAYMENT WHERE PAYMENT_DATE < SYSDATE - :B1
Plan hash value: 684176532
------------------------------------------------------------------------------
| Id  | Operation         | Name    | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |
------------------------------------------------------------------------------
|*  1 |  TABLE ACCESS FULL| PAYMENT |   3002 |  1503K|   864   (1)| 00:00:11 |
------------------------------------------------------------------------------

SCOTT@test> @ &r/dpc 5tqgbsqwxsjtt ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  5tqgbsqwxsjtt, child number 0
-------------------------------------
SELECT * FROM PAYMENT WHERE PAYMENT_DATE < :B1
Plan hash value: 684176532
------------------------------------------------------------------------------
| Id  | Operation         | Name    | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |
------------------------------------------------------------------------------
|*  1 |  TABLE ACCESS FULL| PAYMENT |   3002 |  1503K|   862   (1)| 00:00:11 |
------------------------------------------------------------------------------

SCOTT@test> @ &r/dpc bumyan15pbchp ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  bumyan15pbchp, child number 0
-------------------------------------
SELECT * FROM PAYMENT WHERE PAYMENT_DATE < (SELECT SYSDATE - :B1 FROM DUAL)
Plan hash value: 2871123539
-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |
-----------------------------------------------------------------------------------------------
|   1 |  TABLE ACCESS BY INDEX ROWID| PAYMENT        |   2500 |  1252K|   453   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | I_PAYMENT_DATE |    450 |       |     3   (0)| 00:00:01 |
|   3 |    FAST DUAL                |                |      1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
--//450/50000 = .009,真不知道這個比率如何確定的.

--//只有標量子查詢的語句走了索引,問題在於估算的返回記錄450,這樣比較的次數減少許多,所以出現了使用標量子查詢CPU消耗時間更
--//少,出現標量子查詢更快的假象.
--//不過給承認在10g下確實存在存在表示式執行比較慢的情況.

4.繼續11g下測試:
SCOTT@book> @ ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

SCOTT@book> alter session set "_serial_direct_read"=never;
Session altered.
--//避免採用直接路徑讀.
--//重複測試指令碼略,看前面:
--//先不建立索引看看:
SCOTT@book> @ expr.txt
Statement 1 : +000000000 00:00:04.520582000
Statement 2 : +000000000 00:00:03.725714000
Statement 3 : +000000000 00:00:04.309029000
PL/SQL procedure successfully completed.

SCOTT@book> @ expr.txt
Statement 1 : +000000000 00:00:05.054053000
Statement 2 : +000000000 00:00:04.325762000
Statement 3 : +000000000 00:00:05.000122000
PL/SQL procedure successfully completed.

SCOTT@book> @ expr.txt
Statement 1 : +000000000 00:00:06.489944000
Statement 2 : +000000000 00:00:05.142213000
Statement 3 : +000000000 00:00:06.300671000
PL/SQL procedure successfully completed.

--//還是可以發現使用表示式比較慢,不過這個需要量的累積.

--//建立索引重複測試:
CREATE INDEX i_payment_date ON payment(payment_date);
Index created.

SCOTT@book> CREATE INDEX i_payment_date ON payment(payment_date);
Index created.

--//退出再登入:
SCOTT@book> alter system flush shared_pool;
System altered.

SCOTT@book> alter session set "_serial_direct_read"=never;
Session altered.

SCOTT@book> @ expr.txt
Statement 1 : +000000000 00:00:05.703140000
Statement 2 : +000000000 00:00:04.711262000
Statement 3 : +000000000 00:00:05.617969000
PL/SQL procedure successfully completed.

SCOTT@book> @ expr.txt
Statement 1 : +000000000 00:00:05.700170000
Statement 2 : +000000000 00:00:04.738476000
Statement 3 : +000000000 00:00:05.615570000
PL/SQL procedure successfully completed.

SCOTT@book> @ expr.txt
Statement 1 : +000000000 00:00:05.719636000
Statement 2 : +000000000 00:00:04.709207000
Statement 3 : +000000000 00:00:05.611807000
PL/SQL procedure successfully completed.

--//執行計劃全部是全表掃描,不再貼出,不像10g那樣出現標量子查詢語句走索引的情況

5.測試11g使用提示使用索引的情況:
$ cat expr1.txt
SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 500;
  v_range CONSTANT NUMBER := 470;
--  v_date CONSTANT DATE := SYSDATE - v_range;
  v_date  DATE ;
BEGIN
  v_ts := SYSTIMESTAMP;

  -- Original query with inline expression
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT /*+ index(payment i_payment_date) */ *
      FROM payment
      WHERE payment_date < SYSDATE - v_range
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  dbms_output.put_line('Statement 1 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;

  -- Pre-calculated PL/SQL local variable
  FOR i IN 1..v_repeat LOOP
  v_date   := SYSDATE - v_range;
    FOR rec IN (
      SELECT /*+ index(payment i_payment_date) */ *
      FROM payment
      WHERE payment_date < v_date
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  dbms_output.put_line('Statement 2 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;

  -- Magical 11g scalar subquery caching
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT /*+ index(payment i_payment_date) */ *
      FROM payment
      WHERE payment_date < (SELECT SYSDATE - v_range FROM dual)
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  dbms_output.put_line('Statement 3 : ' || (SYSTIMESTAMP - v_ts));
END;
/

SCOTT@book> alter system flush shared_pool;
System altered.

SCOTT@book> alter session set "_serial_direct_read"=never;
Session altered.

SCOTT@book> @ expr1.txt
Statement 1 : +000000000 00:00:02.385992000
Statement 2 : +000000000 00:00:02.257972000
Statement 3 : +000000000 00:00:02.249990000
PL/SQL procedure successfully completed.

SCOTT@book> @ expr1.txt
Statement 1 : +000000000 00:00:02.197627000
Statement 2 : +000000000 00:00:02.002312000
Statement 3 : +000000000 00:00:01.987825000
PL/SQL procedure successfully completed.

SCOTT@book> @ expr1.txt
Statement 1 : +000000000 00:00:02.279598000
Statement 2 : +000000000 00:00:01.991588000
Statement 3 : +000000000 00:00:01.986305000
PL/SQL procedure successfully completed.

--//你可以發現比較次數少量,3者差距很小.

6.在12c下測試如下:
SCOTT@test01p> @ver1
PORT_STRING                    VERSION        BANNER                                                                               CON_ID
------------------------------ -------------- -------------------------------------------------------------------------------- ----------
IBMPC/WIN_NT64-9.1.0           12.1.0.1.0     Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0

SCOTT@test01p> @ expr.txt
Statement 1 : +000000000 00:00:06.851000000
Statement 2 : +000000000 00:00:06.048000000
Statement 3 : +000000000 00:00:06.812000000
PL/SQL procedure successfully completed.

SCOTT@test01p> @ expr.txt
Statement 1 : +000000000 00:00:06.803000000
Statement 2 : +000000000 00:00:06.187000000
Statement 3 : +000000000 00:00:07.083000000
PL/SQL procedure successfully completed.

--//注:建立不建立索引,在12c先全部是全表掃描.
--//3者差距很小.不過使用表示式還是有點慢,不是很明顯.

7.總結:
1.可以看出在sql語句出現表示式最慢的,重點還是在於比較的數量上.不管10g,11g,12c.特別注意的情況是10g差距明顯.有2倍的差距.
2.500次,大約有1秒的差距(11g的情況,12c小一些),也就是1000/500=2ms,每次2ms的差距,如果語句大量累積還是很可觀的.
3.另外作者10g下測試我猜測一定是標量子查詢的語句使用索引,這樣比較次數明顯減少.
4.給人一種感覺走索引更快,當然我僅僅一個使用者執行.
5.我自己還有1個疑問,我在10g,11g的測試環境硬體環境相似.全表掃描執行時間上為什麼存在這麼大的差異.

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2215206/,如需轉載,請註明出處,否則將追究法律責任。

相關文章