[20190810]如何索引一個超長欄位.txt

lfree發表於2019-08-10

[20190810]如何索引一個超長欄位.txt

--//連結問的一個問題:http://www.itpub.net/thread-2119521-1-1.html
因業務需要,一個欄位的字元要很長,最長有4000, 並且還要用欄位來判斷唯一性,如果直接建立索引他會報錯的,而且怎麼長查詢起
來是否很慢?請問各位前輩,有什麼好的方法來處理這個問題。

--//他還加了一個需求保證唯一性。不知道對方使用什麼版本,自己測試看看:

1.環境:
SCOTT@test01p> @ ver1

PORT_STRING                    VERSION        BANNER                                                                               CON_ID
------------------------------ -------------- -------------------------------------------------------------------------------- ----------
IBMPC/WIN_NT64-9.1.0           12.2.0.1.0     Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production              0

2.測試例子建立:
SCOTT@test01p> create table t (id number ,v1 varchar2(4000) );
Table created.

SCOTT@test01p> insert into t select level ,lpad('x',100,'x')||level v1 from dual connect by level < 2e5;
199999 rows created.

SCOTT@test01p> create unique index i_t_v1 on t(v1);
Index created.

--//實際上這樣的索引還是能建立的,估計重新rebuild online 就會報錯。
SCOTT@test01p> alter index i_t_v1 rebuild online ;
alter index i_t_v1 rebuild online
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01450: maximum key length (3215) exceeded

--//因為rebuild online 要建立一張IOT表,導致maximum key length (3215) exceeded.

SCOTT@test01p> alter index i_t_v1 rebuild ;
Index altered.

--//取消online 沒有問題。

3.測試:
--//分析略。
SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T');
SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS
-------------------- ----------- ------------ ---------- ----------
T                             11          306   27262976       3328
I_T_V1                        11         1066   28311552       3456

--//你可以發現在我的例子中,索引佔用的空間比表還大,因為我索引的欄位長度佔10X位元組。如果想原連結的情況也許更大。

SCOTT@test01p> alter session set statistics_level = all;
Session altered.

SCOTT@test01p> select substr( v1,100,3) from t where v1=lpad('x',100,'x')||42;
SUBSTR
------
x42

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  9bky8g0hf1td9, child number 1
-------------------------------------
select substr( v1,100,3) from t where v1=lpad('x',100,'x')||42
Plan hash value: 4077016850
----------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      1 |        |       |     2 (100)|          |      1 |00:00:00.01 |       3 |
|*  1 |  INDEX UNIQUE SCAN| I_T_V1 |      1 |      1 |   107 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx42')

4.繼續測試:
--//12c 提供standard_hash 函式。

SCOTT@test01p> create unique index if_t_v1_1 on t(standard_hash(v1));
Index created.

SCOTT@test01p> create unique index if_t_v1_2 on t(ora_hash(v1));
create unique index if_t_v1_2 on t(ora_hash(v1))
                                            *
ERROR at line 1:
ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found

--//可以發現使用ora_hash出現重複值,無法建立唯一約束。而使用standard_hash函式沒有問題。

SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T','IF_T_V1_1');
SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS
-------------------- ----------- ------------ ---------- ----------
T                             11          306   27262976       3328
IF_T_V1_1                     11          434    7340032        896
I_T_V1                        11         1066   28311552       3456

--//可以發現建立standard_hash的函式索引相對小一些。

SCOTT@test01p> select substr( v1,100,3) from t where v1=lpad('x',100,'x')||43;
SUBSTR
------
x43

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  0jc1sx44ckdj9, child number 0
-------------------------------------
select substr( v1,100,3) from t where v1=lpad('x',100,'x')||43

Plan hash value: 4077016850

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      1 |        |       |     2 (100)|          |      1 |00:00:00.03 |       3 |      2 |
|*  1 |  INDEX UNIQUE SCAN| I_T_V1 |      1 |      1 |   111 |     2   (0)| 00:00:01 |      1 |00:00:00.03 |       3 |      2 |
-------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
              x43')
       filter(STANDARD_HASH("V1")=HEXTORAW('FBDADC99CB387566504D65A6003CFA57CA05F238'))

--//產生的執行有點怪怪的,仔細看filter條件就明白了不知道是否存在standard_hash函式索引的原因,也許使用I_T_V1不用回表的原因,
--//但是使用的索引是I_T_V1.修改執行語句如下:

SCOTT@test01p> select id,substr( v1,100,3) from t where v1=lpad('x',100,'x')||44;
        ID SUBSTR
---------- ------
        44 x44

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  dazgkpkj95mfs, child number 0
-------------------------------------
select id,substr( v1,100,3) from t where v1=lpad('x',100,'x')||44
Plan hash value: 1893196135
-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       4 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T         |      1 |      1 |   132 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |
|*  2 |   INDEX UNIQUE SCAN         | IF_T_V1_1 |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
   2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx44'
   2 - access("T"."SYS_NC00003$"=HEXTORAW('CC14E546ACC59A5B03CD75C49F62248C12C6D8D0'))

--//注意我執行的查詢條件是v1=lpad('x',100,'x')||44,並沒有使用standard_hash函式查詢,如果寫成如下:

SCOTT@test01p> select id,substr( v1,100,3) from t where standard_hash(v1)=standard_hash(lpad('x',100,'x')||44);
        ID SUBSTR
---------- ------
        44 x44

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  9z17dmg6hk84n, child number 0
-------------------------------------
select id,substr( v1,100,3) from t where
standard_hash(v1)=standard_hash(lpad('x',100,'x')||44)
Plan hash value: 1893196135
-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       4 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T         |      1 |      1 |   132 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |
|*  2 |   INDEX UNIQUE SCAN         | IF_T_V1_1 |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
   2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T"."SYS_NC00003$"=HEXTORAW('CC14E546ACC59A5B03CD75C49F62248C12C6D8D0'))

--//上下比較可以發現前者多了一個filter("V1"='xx......xx44'.
--//也就是使用standard_hash建立的索引也可以使用像v1=lpad('x',100,'x')||44這樣的謂詞查詢。
--//類似的情況到目前位置我僅僅知道trunc以及substr(v1,1,N)的情況。

5.繼續測試..2:
--//作者並沒有提示使用的是12c,如果不是12c應該沒有standard_hash函式。自己建立1個md5函式看看。

CREATE OR REPLACE FUNCTION FN_MD5(par1 IN VARCHAR) RETURN VARCHAR2
  DETERMINISTIC IS
  retval varchar2(32);
BEGIN
  retval := utl_raw.cast_to_raw(DBMS_OBFUSCATION_TOOLKIT.MD5(INPUT_STRING => par1));
  RETURN retval;
END FN_MD5;

SCOTT@test01p> create unique index if_t_v1_2 on t(fn_md5(v1));
Index created.

SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T','IF_T_V1_1','IF_T_V1_2');
SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS
-------------------- ----------- ------------ ---------- ----------
T                             11          306   27262976       3328
IF_T_V1_1                     11          434    7340032        896
I_T_V1                        11         1066   28311552       3456
IF_T_V1_2                     11        14850   10485760       1280

--//使用fn_md5自定義函式的索引比standard_hash索引稍微大一些。

SCOTT@test01p> select id,substr( v1,100,3) from t where fn_md5(v1)=fn_md5(lpad('x',100,'x')||48);
        ID SUBSTR
---------- ------
        48 x48

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  ggv02trp66xas, child number 0
-------------------------------------
select id,substr( v1,100,3) from t where
fn_md5(v1)=fn_md5(lpad('x',100,'x')||48)
Plan hash value: 675596911
-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       4 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T         |      1 |      1 |   132 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |
|*  2 |   INDEX UNIQUE SCAN         | IF_T_V1_2 |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
   2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T"."SYS_NC00004$"="FN_MD5"('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
              xxxxxxxxxxxxxxxxxxxx48'))

--//OK沒有問題。

6.繼續測試..3:
--//如果不考慮唯一性,可以考慮ora_hash函式。
SCOTT@test01p> create  index if_t_v1_3 on t(ora_hash(v1));
Index created.

SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and
               segment_name in ('I_T_V1','T','IF_T_V1_1','IF_T_V1_2','IF_T_V1_3');
SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS
-------------------- ----------- ------------ ---------- ----------
T                             11          306   27262976       3328
IF_T_V1_1                     11          434    7340032        896
I_T_V1                        11         1066   28311552       3456
IF_T_V1_2                     11        14850   10485760       1280
IF_T_V1_3                     11        16130    5242880        640
--//使用ora_hash函式索引更小。

SCOTT@test01p>  select count(*),ora_hash(v1) from t group by ora_hash(v1) order by 1 desc FETCH FIRST 2 ROWS ONLY;
  COUNT(*) ORA_HASH(V1)
---------- ------------
         2   3933908345
         2   3975299677

SCOTT@test01p> select id,substr(v1,100,10) from t where ora_hash(v1)=3933908345;
        ID SUBSTR(V1,100,10)
---------- --------------------
     36510 x36510
    144288 x144288        

SCOTT@test01p> select id,substr( v1,100,3) from t where ora_hash(v1)=ora_hash(lpad('x',100,'x')||36510);
        ID SUBSTR
---------- ------
     36510 x36
    144288 x14

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  4dvtgan3r97pp, child number 1
-------------------------------------
select id,substr( v1,100,3) from t where
ora_hash(v1)=ora_hash(lpad('x',100,'x')||36510)
Plan hash value: 1660760003
-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |           |      1 |        |       |   801 (100)|          |      2 |00:00:00.01 |       5 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T         |      1 |   2000 |   244K|   801   (0)| 00:00:01 |      2 |00:00:00.01 |       5 |
|*  2 |   INDEX RANGE SCAN                  | IF_T_V1_3 |      1 |    800 |       |     1   (0)| 00:00:01 |      2 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
   2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T"."SYS_NC00005$"=3933908345)
--//也可以使用索引,缺點返回多餘的行。

總結:
--//使用standard_hash,自定義md5函式,或者ora_hash,但是ora_hash不保證唯一。我知道就這些,不知道還有什麼其它方法。
--//另外關於ORA-01450: maximum key length (3215) exceeded錯誤,可以考慮定義更大的資料塊db_16k_cache_size,
--//db_32k_cache_size.

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2653279/,如需轉載,請註明出處,否則將追究法律責任。

相關文章