從語句繁簡和效能優化想到的......

cow977發表於2011-04-20
 
 
 
select a.*,b.object_name from TEST_OBJECT a, TEST_OBJ1 b
  where substr(a.object_name,1,length(b.object_name))=b.object_name;
 
一句簡單的語句,join的演算法註定只能走Nested Loop,執行計劃如下:
55546 rows selected.
Elapsed: 00:09:28.48
----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |   213K|    29M|  3310K  (2)| 11:02:09 |
|   1 |  NESTED LOOPS      |             |   213K|    29M|  3310K  (2)| 11:02:09 |
|   2 |   TABLE ACCESS FULL| TEST_OBJECT | 50806 |  6350K|   159   (3)| 00:00:02 |
|*  3 |   TABLE ACCESS FULL| TEST_OBJ1   |     4 |    68 |    65   (2)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("B"."OBJECT_NAME"=SUBSTR("A"."OBJECT_NAME",1,LENGTH("B"."OBJ
              ECT_NAME")))
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
   14055414  consistent gets
          0  physical reads
          0  redo size
    2126831  bytes sent via SQL*Net to client
      41133  bytes received via SQL*Net from client
       3705  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      55546  rows processed
 
 
 
 網友anlinew提供了一種看似不可思義的非常精妙的優化方法,將語句改寫成:
select a.*, b.object_name
  from TEST_OBJECT a, TEST_OBJ1 b
 where substr(a.object_name, 1, length(b.object_name)) = b.object_name
   and substr(a.object_name, 1, 4) = substr(b.object_name, 1, 4)
   and length(b.object_name) > 3
union all
select a.*, b.object_name
  from TEST_OBJECT a, TEST_OBJ1 b
 where substr(a.object_name, 1, length(b.object_name)) = b.object_name
   and length(b.object_name) < 4;
將資料進行了切片,語句複雜了,但效能卻大幅提高。
優化的核心思想是將資料“分片”,該例子中分片的依據是多少位首字母(這裡是4),其中“大頭”由Hash Join處理,而“小頭”走Nested Loop,這種“抓大放小”的做法直接就從複雜度上進行了優化。
執行計劃如下:
55546 rows selected.
Elapsed: 00:00:09.59
-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |  2827 |   400K| 12224  (99)| 00:02:27 |
|   1 |  UNION-ALL          |             |       |       |            |  |
|*  2 |   HASH JOIN         |             |  2128 |   301K|   391  (44)| 00:00:05 |
|*  3 |    TABLE ACCESS FULL| TEST_OBJ1   | 22731 |   377K|    67   (2)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| TEST_OBJECT | 50806 |  6350K|   159   (3)| 00:00:02 |
|   5 |   NESTED LOOPS      |             |   699 |    98K| 11833   (3)| 00:02:22 |
|*  6 |    TABLE ACCESS FULL| TEST_OBJ1   |    75 |  1275 |    67   (2)| 00:00:01 |
|*  7 |    TABLE ACCESS FULL| TEST_OBJECT |     9 |  1152 |   157   (3)| 00:00:02 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(SUBSTR("A"."OBJECT_NAME",1,4)=SUBSTR("B"."OBJECT_NAME",1,4))
       filter("B"."OBJECT_NAME"=SUBSTR("A"."OBJECT_NAME",1,LENGTH("B"."OBJECT_NAME")))
   3 - filter(LENGTH("B"."OBJECT_NAME")>3)
   6 - filter(LENGTH("B"."OBJECT_NAME")<4)
   7 - filter("B"."OBJECT_NAME"=SUBSTR("A"."OBJECT_NAME",1,LENGTH("B"."OBJECT_NAME")))
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      39190  consistent gets
          0  physical reads
          0  redo size
    2216003  bytes sent via SQL*Net to client
      41133  bytes received via SQL*Net from client
       3705  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      55546  rows processed
 
按照anlinew的思路,試了下:
select a.*, b.object_name
  from TEST_OBJECT a, TEST_OBJ1 b
 where substr(a.object_name, 1, length(b.object_name)) = b.object_name
   and substr(a.object_name, 1, 1) = substr(b.object_name, 1, 1);
55546 rows selected.
Elapsed: 00:01:30.57
----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |  2135 |   302K|   391  (44)| 00:00:05 |
|*  1 |  HASH JOIN         |             |  2135 |   302K|   391  (44)| 00:00:05 |
|   2 |   TABLE ACCESS FULL| TEST_OBJ1   | 22806 |   378K|    67   (2)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| TEST_OBJECT | 50806 |  6350K|   159   (3)| 00:00:02 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access(SUBSTR("A"."OBJECT_NAME",1,1)=SUBSTR("B"."OBJECT_NAME",1,1))
       filter("B"."OBJECT_NAME"=SUBSTR("A"."OBJECT_NAME",1,LENGTH("B"."OBJECT_NAME")))
Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
       4706  consistent gets
          0  physical reads
          0  redo size
    2136652  bytes sent via SQL*Net to client
      41133  bytes received via SQL*Net from client
       3705  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      55546  rows processed
 
select a.*, b.object_name
  from TEST_OBJECT a, TEST_OBJ1 b
 where substr(a.object_name, 1, length(b.object_name)) = b.object_name
   and substr(a.object_name, 1, N) = substr(b.object_name, 1, N)
   and length(b.object_name) >= N
union all
select a.*, b.object_name
  from TEST_OBJECT a, TEST_OBJ1 b
 where substr(a.object_name, 1, length(b.object_name)) = b.object_name
   and length(b.object_name) < N;
N=1:Elapsed: 00:01:30.91
N=2:Elapsed: 00:00:16.76
N=3:Elapsed: 00:00:09.75
N=4:Elapsed: 00:00:09.63
N=5:Elapsed: 00:00:07.17
N=6:Elapsed: 00:00:06.97
N=7:Elapsed: 00:00:11.01
N=8:Elapsed: 00:00:19.21
 
當N=1時,第二個分片可以去掉,從執行計劃看,也是HashJoin,但效率不是最高,從實驗結果看,N=6是最佳分片。
SQL> select avg(length(object_name)) from test_obj1;
AVG(LENGTH(OBJECT_NAME))
------------------------
              17.2789778
SQL> select length(object_name),count(*) from test_obj1 group by length(object_name) order by 1;
LENGTH(OBJECT_NAME)   COUNT(*)
------------------- ----------
                  1          2
                  2          7
                  3         41
                  4         66
                  5        102
                  6        209
                  7        429
                  8        569
                  9        442
                 10        569
                 11        729
                 12        728
                 13        935
                 14        947
                 15       1010
                 16       1268
                 17       1428
                 18       1594
                 19       1864
                 20       1877
                 21       2185
                 22       2260
                 23       2497
23 rows selected.
 
效能的優化,是經驗的積累,但也要靠實驗來驗證,因為資料是基礎,基礎改變了,結論也就變了。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/81227/viewspace-692953/,如需轉載,請註明出處,否則將追究法律責任。

相關文章